Extracting 22-character strings from text using sed/awk?


 
# 8  
Old 09-15-2013
Try:
Code:
perl -nle 'print $& while /.{20}GG/g' file

# 9  
Old 09-17-2013
@wisecracker - thanks. The code works great, but doesn't get the desired results because there are line breaks.
# 10  
Old 09-17-2013
Hi Twinklefingers...

Can you copy and paste what you are getting...

Thanks...
# 11  
Old 09-17-2013
@wisecracker

Input:

Code:
TCAAATTTTATGGCATAGTAAAGATAACAAAAGAAGTGATCGAATTTATTCTTTAGATAT
TCAACCATATCCAAATTATTATAGTGTTAAAAAATTAAATAGGAAATATGAATTGTTTAT
CAAATATTTAAAAGAAAAAGGAAAAATAGAATGTAATAAATTCGACACTTTAGAAGAAAA
ACAAAAAATTATAAATGATTCAACATTGCCCTCAAATAATAATTATAATAATAGTAGTAA
TAATAATAATAATAATAATAAATATGAAACATTAAAATTAGATAAAGAACTGAATCAAGG
AAATGAGGATAACATTATAAAGACGGTAATTGGGAATAACACGGAAGTTAGTAATAATAA
CTTATTAGATGATACAAACAATAAATTAAACGAAATAAAAACAAATACATCTACGGAAGA
TCACCAAGAACACAATTTAGTAAATAAAAAAAACGAAACGAATAGTTCATCTAATGACAA
TATTTCACATAATAAAACACCAATGCAATCAAATAAATTACTTACATCATTACAAGATGA
TAAAACAAAAAAAAAACCTATCAAATTTAATATAGCTACATGTGGTGCTGACGAATTCGT
ACATTTGTGGAGAATATTTATAAAAGATGATATTTCAATTAAATGCTTAGGTAGATTTAT
AGGTCATAGTGGTGAAATAAATTGTGTTCGTTTTAACAAAAATGGTAGATATATAGCAAG
TGGGGGAGAAGATAAATTTTTATATATATGGGAAAAAAGTAAAAAACCAAAAAATATACC
ATTAGGTTATGATATAAGCTTTTTAGATTATAAAGAATGGTGGAATGTCGTAGGGTCTTT
CAGATGTAGTGGTGTAATAAATAGTATTATTTGGTCAAACAATGATACCTTATATGTAGC

Output:

Code:
L01883730:dir$ sh 22mers_17Sept13.sh 

TCAAATTTTATGGCATAGTAAAGATAACAAAAGAAGTGATCGAATTTATTCTTTAGATAT

TTTAGATAT

Thank you!
# 12  
Old 09-17-2013
Try GNU awk:
Code:
awk 'NR>1{print p} {gsub(/\n/,x); p=substr($0,length-19,20) RS}' RS=GG file

Or can there also be strings that have GG among the 20 characters?
# 13  
Old 09-17-2013
@Scrutinizer

Thank you very much. I think this is getting into the right ballpark. However,

for the input:
Code:
TCAAATTTTATGGCATAGTAAAGATAACAAAAGAAGTGATCGAATTTATTCTTTAGATAT
TCAACCATATCCAAATTATTATAGTGTTAAAAAATTAAATAGGAAATATGAATTGTTTAT
CAAATATTTAAAAGAAAAAGGAAAAATAGAATGTAATAAATTCGACACTTTAGAAGAAAA
ACAAAAAATTATAAATGATTCAACATTGCCCTCAAATAATAATTATAATAATAGTAGTAA

the desired output is:
Code:
TAGTGTTAAAAAATTAAATAGG
TCAAATATTTAAAAGAAAAAGG

the script returns:
Code:
TCAAATTTTATGG
GG
CATAGG
TAAAGG
ATAACAAAAGG
AAGG
TGG
ATCGG
AATTTATTCTTTAGG
ACCATATCCAAATTATTATAGG
TGG
TTAAAAAATTAAATAGG
GG
AAATATGG
AATTGG
TTTATCAAATATTTAAAAGG
AAAAAGG
GG
AAAAATAGG
AATGG
TAATAAATTCGG
ACACTTTAGG
AAGG
AAAAACAAAAAATTATAAATGG
ATTCAACATTGG
AAATAATAATTATAATAATAGG
TAGG

# 14  
Old 09-17-2013
Would something like this work then (GNU awk or mawk):
Code:
awk 'length(p)==22{print p} {$0=p $0; gsub(/\n/,x); p=substr($0,length-19,20) RS}' RS=GG file

It should also find strings that were part of a previous match, except for the first GG on line 1 of the previous example, because it does not have 20 characters before it.

--
Edit: this should also find the first short one:
Code:
awk 'NR>1{print p; $0=p $0} {gsub(/\n/,x); p=substr($0,length-19,20) RS}' RS=GG file

Code:
GGTCAAATTTTATGG
TAGTGTTAAAAAATTAAATAGG
TCAAATATTTAAAAGAAAAAGG
TAGATAAAGAACTGAATCAAGG
ACTGAATCAAGGGGAAATGAGG
GGGGATAACATTATAAAGACGG
ATTATAAAGACGGGGTAATTGG
GGTAATTGGGGGAATAACACGG
ATAAAAACAAATACATCTACGG
AATTTAATATAGCTACATGTGG
TGACGAATTCGTACATTTGTGG
ATATTTCAATTAAATGCTTAGG
TGCTTAGGGGTAGATTTATAGG
AGATTTATAGGGGTCATAGTGG
GTGTTCGTTTTAACAAAAATGG
GGGGTAGATATATAGCAAGTGG
TAGATATATAGCAAGTGGGGGG
AGATAAATTTTTATATATATGG
AACCAAAAAATATACCATTAGG
CTTTTTAGATTATAAAGAATGG
TAGATTATAAAGAATGGGGTGG
ATGGGGTGGGGAATGTCGTAGG
GGGGGTCTTTCAGATGTAGTGG
TGTAATAAATAGTATTATTTGG


Last edited by Scrutinizer; 09-17-2013 at 05:57 PM.. Reason: change > 19 to ==22
 