Substr


 
Top Forums UNIX for Dummies Questions & Answers Substr
# 1  
07-24-2015
Substr

Code:
awk '/^>/{id=$0;next}length>=7 { print id, "\n"$0}' Test.txt

Can I use substr to achieve the same task?
Thanks!

Last edited by Xterra; 07-24-2015 at 07:44 PM. Reason: updated code
# 2  
07-24-2015
This is equivalent to this:
Code:
awk 'NR%2 || length()>7' file

It prints every odd line, and every even line whose length is more than 7 bytes.
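A quick way to see that behaviour on a short sequence (a minimal sketch; the four input lines are copied from the sample file later in the thread, and `out` is just an illustrative variable name): because NR%2 is true on every odd line, the header of the SHORT record is printed even though its 7-character sequence is dropped.

```shell
#!/bin/sh
# Two records from the sample input; the first sequence is exactly 7 characters.
out=$(printf '%s\n' \
  '>GHL8OVD01CMQVT SHORT' \
  'TTGATGT' \
  '>GHL8OVD01CMQVT Freq 1' \
  'TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGCNTA' |
  awk 'NR%2 || length()>7')   # odd lines always; even lines only if longer than 7
printf '%s\n' "$out"
```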

I do not see how substr() could be used for this.
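For what it's worth, substr() can stand in for the regular expression in the header test: substr($0, 1, 1) is the first character of the line (awk strings are 1-indexed), so comparing it with ">" does the same job as /^>/. This is a sketch, assuming input without leading spaces and the length>7 threshold discussed later in the thread; it is a matter of style rather than an improvement, since the length test is unchanged.

```shell
#!/bin/sh
# Header test done with substr(): the first character equals ">" on header lines.
# Sequence lines are printed, with the saved header, only if longer than 7.
out=$(printf '%s\n' \
  '>GHL8OVD01CMQVT SHORT' \
  'TTGATGT' \
  '>GHL8OVD01BNNCA Freq 10' \
  'TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTNGCAGCAYaaaMz12' |
  awk 'substr($0, 1, 1) == ">" { id = $0; next }
       length > 7 { print id "\n" $0 }')
printf '%s\n' "$out"
```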
# 3  
07-24-2015
Sorry, I have just updated the code
Code:
awk '/^>/{id=$0;next}length>=7 { print id, "\n"$0}' Test.txt

I should add that if the sequence in the even line is shorter than 7, both the odd line and "id" should be removed.
infile:
Code:
>GHL8OVD01BNNCA Freq 10
TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTNGCAGCAYaaaMz12
>GHL8OVD01CMQVT SHORT
TTGATGT
>GHL8OVD01CMQVT Freq 1
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGCNTA
>GHL8OVD01CMQVW Freq 1
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAG-AC
>GHL8OVD01A45V3 Freq 1
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTA-AGGACCYTC
>GHL8OVD01AV2U9 Freq 1
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGTCGCAGCGTTA 
>GHL8OVD01CMQVT Freq 1
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGCTTA
>GHL8OVD01CMQVW Freq 1
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAGTAC
>GHL8OVD01A45V3 Freq 1
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTA-AGGACCTTC
>GHL8OVD01AV2U9 Freq 1
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGTCGCAGCGTTA

outfile:
Code:
>GHL8OVD01BNNCA Freq 10
TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTNGCAGCAYaaaMz12
>GHL8OVD01CMQVT Freq 1
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGCNTA
>GHL8OVD01CMQVW Freq 1
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAG-AC
>GHL8OVD01A45V3 Freq 1
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTA-AGGACCYTC
>GHL8OVD01AV2U9 Freq 1
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGTCGCAGCGTTA
>GHL8OVD01CMQVT Freq 1
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGCTTA
>GHL8OVD01CMQVW Freq 1
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAGTAC
>GHL8OVD01A45V3 Freq 1
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTA-AGGACCTTC
>GHL8OVD01AV2U9 Freq 1
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGTCGCAGCGTTA


Last edited by Xterra; 07-24-2015 at 09:45 PM.
# 4  
07-24-2015
Xterra,
A couple of things:

1. I'm not sure what your original code was before you edited it, as your updated code is identical.
2. There is a space in the first position of each line on your input file so the condition ^> will not find any lines.
3. What is the requirement as to why you need to use substr as your solution appears to work with a slight modification?
4. Based on your input and output files, length>=7 should be length>7 in your solution after removing the spaces in the first position of each line.
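Item 4 comes down to one record in the sample: the SHORT sequence TTGATGT is exactly 7 characters long, so length>=7 keeps that record and length>7 drops it. A minimal check with just that record (`rec`, `ge` and `gt` are illustrative variable names):

```shell
#!/bin/sh
# The SHORT record from the sample input: its sequence is exactly 7 characters.
rec=$(printf '%s\n' '>GHL8OVD01CMQVT SHORT' 'TTGATGT')
# Same program, two thresholds.
ge=$(printf '%s\n' "$rec" | awk '/^>/{id=$0;next} length>=7 {print id "\n" $0}')
gt=$(printf '%s\n' "$rec" | awk '/^>/{id=$0;next} length>7 {print id "\n" $0}')
printf '%s\n' '--- length>=7 ---' "$ge" '--- length>7 ---' "$gt"
```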

Code:
awk '/^>/{id=$0;next}length>7 { print id, "\n"$0}' Test.txt
>GHL8OVD01BNNCA Freq 10
TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTNGCAGCAYaaaMz12
>GHL8OVD01CMQVT Freq 1
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGCNTA
>GHL8OVD01CMQVW Freq 1
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAG-AC
>GHL8OVD01A45V3 Freq 1
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTA-AGGACCYTC
>GHL8OVD01AV2U9 Freq 1
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGTCGCAGCGTTA
>GHL8OVD01CMQVT Freq 1
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGCTTA
>GHL8OVD01CMQVW Freq 1
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAGTAC
>GHL8OVD01A45V3 Freq 1
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTA-AGGACCTTC
>GHL8OVD01AV2U9 Freq 1
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGTCGCAGCGTTA
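One detail that is invisible in the output above: print id, "\n"$0 puts OFS (a single space by default) between its two arguments, so every header line ends with a trailing space before the newline. A minimal sketch that makes it visible by appending | to each output line (the record is taken from the sample input; `with_comma` and `no_comma` are illustrative names):

```shell
#!/bin/sh
# One record from the sample input.
rec=$(printf '%s\n' '>GHL8OVD01BNNCA Freq 10' \
  'TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTNGCAGCAYaaaMz12')
# With a comma, print separates its arguments with OFS (a space by default).
with_comma=$(printf '%s\n' "$rec" | awk '/^>/{id=$0;next} length>7 {print id, "\n"$0}')
# Without the comma, the strings are simply concatenated.
no_comma=$(printf '%s\n' "$rec" | awk '/^>/{id=$0;next} length>7 {print id "\n" $0}')
# Append "|" to each line so that a trailing space becomes visible.
printf '%s\n' "$with_comma" | awk '{print $0 "|"}'
printf '%s\n' "$no_comma" | awk '{print $0 "|"}'
```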

# 5  
07-24-2015
Quote:
There is a space in the first position of each line on your input file so the condition ^> will not find any lines.
Sorry! I did not notice that (corrected).
Quote:
What is the requirement as to why you need to use substr as your solution appears to work with a slight modification?
I know my code works. I want to improve it. Is there a better way to achieve this task using awk? Any suggestions?
# 6  
07-24-2015
Xterra,

Quote:
Is there a better way to achieve this task using awk?
What do you mean by a better way? Most efficient? The shortest or most cryptic? Your solution appears to work on your input/output once #2 and #4 from my last post are applied, and it is readable. There are surely other ways to do it with awk or other commands/utilities, but without further clarification of the requirements, or of what you mean by a better way, it's hard for us to make suggestions.

Your requirement
Quote:
if the sequence in the even line is shorter than 7, both the odd line and "id" should be removed
does not match your desired output: the line below from your input file does not meet this condition (its sequence is exactly 7 characters, so it is not shorter than 7) and should therefore appear in the output file, but your output file does not have it.

Quote:
>GHL8OVD01CMQVT SHORT
TTGATGT

BTW, editing/correcting your prior posts based on suggestions made in subsequent posts makes it difficult to follow the history of the thread.
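As one example of the "other commands/utilities" route (a sketch, not something proposed in the thread): paste - - joins each header/sequence pair into one tab-separated line, awk keeps the pairs whose second field is longer than 7 characters, and tr splits them back into two lines. It assumes strictly alternating two-line records and no tab characters in the headers.

```shell
#!/bin/sh
# paste - - reads two lines at a time and joins them with a tab:
#   header<TAB>sequence
# awk -F'\t' keeps pairs whose sequence ($2) is longer than 7 characters,
# and tr turns the tab back into a newline.
out=$(printf '%s\n' \
  '>GHL8OVD01CMQVT SHORT' \
  'TTGATGT' \
  '>GHL8OVD01CMQVW Freq 1' \
  'TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAG-AC' |
  paste - - | awk -F'\t' 'length($2) > 7' | tr '\t' '\n')
printf '%s\n' "$out"
```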
# 7  
07-25-2015
There isn't a lot of room for improvement in:
Code:
awk '/^>/{id=$0;next}length>=7 { print id, "\n"$0}' Test.txt

But the comma in your print statement adds a space (the default output field separator, OFS) to the end of the odd-numbered output lines. And, as mjf said, to match the output you said you wanted, you need length>7 instead of length>=7. A slightly different way of doing it is:
Code:
awk '{if($1~/^>/)id=$0;else if(length>7)print id"\n"$0}' Test.txt

but I don't know that it is any better (other than taking out the comma). The following is the best I can do using your naming conventions. It is a little bit shorter and requires fewer tests:
Code:
awk '{id=$0;getline;if(length>7)print id"\n"$0}' Test.txt
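A caveat on the getline version, offered as a hedged note: it assumes every header is followed by exactly one sequence line. If the file ends on a header, getline returns 0 and leaves $0 unchanged, so the header is tested as if it were the sequence and, being longer than 7 characters, is printed twice. Checking getline's return value avoids that; a minimal sketch with a deliberately truncated input (`input`, `plain` and `checked` are illustrative names):

```shell
#!/bin/sh
# Sample input that deliberately ends with a header and no sequence line.
input=$(printf '%s\n' \
  '>GHL8OVD01CMQVT Freq 1' \
  'TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGCNTA' \
  '>GHL8OVD01AV2U9 Freq 1')
# The getline one-liner as posted: at end of file $0 still holds the header.
plain=$(printf '%s\n' "$input" | awk '{id=$0;getline;if(length>7)print id"\n"$0}')
# Checked variant: test the length only when getline really read a new line.
checked=$(printf '%s\n' "$input" |
  awk '{ id = $0; if ((getline) > 0 && length($0) > 7) print id "\n" $0 }')
printf '%s\n' '--- plain ---' "$plain" '--- checked ---' "$checked"
```

The extra test costs nothing on well-formed input; it only matters when the last record is incomplete.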

This User Gave Thanks to Don Cragun For This Post:
 