awk split


 
# 1  
Old 08-12-2011
awk split

Hi Folks,

I have lines that look like this:

Code:
>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG
>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/469_894ACGTGCTATGCGG

I want to split all lines into:

Code:
>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426     ACGTGCTATGCGG
>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/469_894 ACGTGCTATGCGG

Then I want to print:

Code:
ACGTGCTATGCGG
ACGTGCTATGCGG

Code:
awk '{split($0,a,"[A;C;G;T]");print a[1]}'

gives:

Code:
>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426
>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/469_894

but

Code:
awk '{split($0,a,"[A;C;G;T]");print a[2]}'

gives nothing.

I can split in other ways, e.g.,

Code:
awk '{split($0,a,"/[0-9]_[0-9]");print a[2]}'

or

Code:
awk '{split($0,a,"/[0-9]_[0-9][A;C;G;T]");print a[2]}'

a[1] always prints correctly, but a[2] is always "empty".

What am I doing wrong?

Thanks for your help.
Robert

# 2  
Old 08-12-2011
You can't expect characters that are used to split a string to be part of the result. If you split "1,2,3,4" on the comma, by definition the comma is not an allowed member of a field. Same goes with a bracket expression such as "[ACGT]"; splitting on such an expression forbids A, C, G, and T from occurring in a field.
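To see this concretely, here is a small sketch (the variable names `count_and_second`, `line`, and `second` are only for illustration). Splitting on a comma drops the commas, and splitting the sample line on "[ACGT]" leaves an empty string in a[2], because nothing sits between the adjacent "A" and "C":

```shell
# Commas are separators, so they never appear in the resulting fields.
count_and_second=$(printf '1,2,3,4\n' | awk '{ n = split($0, a, ","); print n, a[2] }')
echo "$count_and_second"    # 4 2

line='>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG'
# Every A, C, G, and T is a separator, so a[2] is the (empty) text between "A" and "C".
second=$(printf '%s\n' "$line" | awk '{ split($0, a, "[ACGT]"); print "[" a[2] "]" }')
echo "$second"              # []
```

That is why a[1] looks right while a[2] prints nothing: the whole base sequence is made of separators.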

Assuming I understood what you were trying to do, the semicolons in your bracket expressions are incorrect. Characters in a bracket expression should not be delimited. To split on the four letters "A", "C", "G", and "T", "[ACGT]" is all that's needed. Adding those semicolons will cause splitting on semicolons as well.
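A quick way to check the effect of the semicolons is to count the fields that split() returns. This is only a sketch on a made-up test string ("x;yAz" is not from the original data):

```shell
# With semicolons inside the brackets, ";" is a separator too: x, y, z.
with_semis=$(printf 'x;yAz\n' | awk '{ print split($0, a, "[A;C;G;T]") }')
# Without them, only the letters split the string: "x;y" and "z".
without_semis=$(printf 'x;yAz\n' | awk '{ print split($0, a, "[ACGT]") }')
echo "$with_semis $without_semis"   # 3 2
```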

Looking at your data:
Code:
>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG

If you just want to print the base sequence (the trailing ACGTGCTATGCGG), and if it's always preceded by the final digit in the line, the following will do:
Code:
sed 's/.*[[:digit:]]//'

Or if the base sequence always begins at the 4th character past the final underscore:
Code:
sed 's/.*_...//'
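
For reference, here is how both sed commands behave on the two sample lines from the first post, plus a rough awk equivalent of the first one (the awk version and the variable names are my own sketch, not something posted earlier in the thread):

```shell
lines='>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG
>m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/469_894ACGTGCTATGCGG'

# Greedy .* runs up to the last digit on the line; everything through it is deleted.
by_digit=$(printf '%s\n' "$lines" | sed 's/.*[[:digit:]]//')
# Greedy .* runs up to the last underscore; it and the next three characters are deleted.
by_offset=$(printf '%s\n' "$lines" | sed 's/.*_...//')
# Same idea as the first sed command, written with awk's sub().
from_awk=$(printf '%s\n' "$lines" | awk '{ sub(/.*[0-9]/, ""); print }')

echo "$by_digit"    # ACGTGCTATGCGG on each of the two lines
```

All three leave only the base sequence on each line.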

Regards,
Alister

# 3  
Old 08-12-2011
Try this awk one-liner and see if it helps:

Code:
echo "m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG"|awk -F'A' '{print $1 " " "A" $2 $3}'

Of course, if you want to print just ACGT...

Code:
echo "m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG"|awk -F'A' '{print  "A" $2 $3}'
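
One caveat with -F'A': every "A" is consumed as a separator, so the "A" in the middle of the sequence is lost when $2 and $3 are glued back together, and any field past $3 is dropped. A possible alternative, sketched here with match() and RSTART (my own variation, not part of the one-liner above), cuts at the start of the trailing run of bases instead:

```shell
line='m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG'

# With -F'A' the second "A" of the sequence disappears from the output.
lossy=$(printf '%s\n' "$line" | awk -F'A' '{ print "A" $2 $3 }')
echo "$lossy"        # ACGTGCTTGCGG

# match() sets RSTART to where the trailing run of A/C/G/T begins.
bases=$(printf '%s\n' "$line" | awk 'match($0, /[ACGT]+$/) { print substr($0, RSTART) }')
echo "$bases"        # ACGTGCTATGCGG

# Same cut point, but printing the header and the bases separated by a space.
split_line=$(printf '%s\n' "$line" |
  awk 'match($0, /[ACGT]+$/) { print substr($0, 1, RSTART - 1) " " substr($0, RSTART) }')
echo "$split_line"
```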


# 4  
Old 08-12-2011
Alister,
Thanks for taking a look, and for your comments, quite helpful. The solution you offered did the trick. I sincerely appreciate your time.

Best,
Robert
# 5  
Old 08-12-2011
Quote:
Originally Posted by dude2cool
Try this awk one-liner and see if it helps:

Code:
echo "m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG"|awk -F'A' '{print $1 " " "A" $2 $3}'

Of course, if you want to print just ACGT...

Code:
echo "m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG"|awk -F'A' '{print "A" $2 $3}'

Alister's solution is certainly simpler and better, but if you want to do it with awk, then using a different separator rather than "A" could be better, I guess:
Code:
echo ">m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG" | awk -F_ 'c=substr($NF,4,13){print c}'
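
Note that substr($NF,4,13) assumes exactly three digits after the last underscore and a 13-character sequence. If that is not guaranteed, a variation like the following sketch strips however many leading digits there are (the second input line, ">read/7/5_12345ACGT", is a made-up example, not real data):

```shell
# Original approach: last "_"-separated field, skip 3 digits, take 13 characters.
fixed=$(echo ">m110730_101608_00120_c100168052554400000315046108261127_s1_p0/7/29_426ACGTGCTATGCGG" |
  awk -F_ 'c=substr($NF,4,13){print c}')
echo "$fixed"       # ACGTGCTATGCGG

# Variation: copy the last field and remove its leading digits, whatever their count.
flexible=$(printf '%s\n' '>read/7/5_12345ACGT' |
  awk -F_ '{ s = $NF; sub(/^[0-9]+/, "", s); print s }')
echo "$flexible"    # ACGT
```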
