Correct use of substr

 
# 1  
Old 04-18-2017

I have a file that looks like this:
Code:
 >ID_1
 ATGCATGC
 >ID_2
 ATGCATGC
 >ID_3
 ATGCATGC
 >ID_4
 ATGCATGC

And I am using the following script to "extract" specific positions from the sequences:
Code:
 awk '/^>/ { id = $0; next } { print id "\n" substr($1, 1, 1) substr($1, 4, 2) substr($1, 7, 1) }' test.txt

It actually works, but I suspect it is the wrong way to use substr. This is the output:
Code:
 >ID_1
 ACAG
 >ID_2
 ACAG
 >ID_3
 ACAG
 >ID_4
 ACAG

Ideally, I would like to read the sites to extract from a file, positions.txt, with one position per line:
Code:
 1
 4
 5
 7

I would appreciate it if anyone could point me in the right direction.
Thanks in advance!
# 2  
Old 04-18-2017
Actually -- I see nothing wrong. That's how strings, substr, concatenation, and variables work in awk.
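For reference, substr(s, start, length) counts characters starting at 1, so the three calls in the original one-liner pick out position 1, positions 4-5, and position 7, and writing them side by side concatenates the pieces. A minimal sketch using the sample sequence from post #1 (the function name pick_sites is made up for this illustration):

```shell
# Hypothetical helper: extract sites 1, 4-5 and 7 from one sequence string.
pick_sites() {
    printf '%s\n' "$1" |
        awk '{ print substr($1, 1, 1) substr($1, 4, 2) substr($1, 7, 1) }'
}

pick_sites ATGCATGC   # prints ACAG
```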

Anyway, the code you wanted:
Code:
awk '   NR==FNR { POS[++P]=$1+0 ; next } # Load into array POS while in file 1
        /^>/ { print ; next } # Print IDs immediately
        {
                S="";
                for(N=1; N in POS; N++) S=S substr($0, POS[N], 1); # Assemble substrings
                print S; # Print
        }' positions.txt inputfile

NR==FNR is an old trick. NR is the total cumulative number of lines, while FNR is the line number in the current file. The two are equal only while awk is processing its first file.
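To see the two counters side by side, here is a small sketch that runs awk over two throwaway files and prints NR, FNR, and which branch NR==FNR would take. The file names and the function name show_nr_fnr are assumptions made for the demonstration only.

```shell
# Hypothetical demo: print NR and FNR for every line of two temporary files.
show_nr_fnr() {
    tmpdir=$(mktemp -d) || return 1
    printf '1\n4\n' > "$tmpdir/first.txt"
    printf '>ID_1\nATGCATGC\n' > "$tmpdir/second.txt"
    # NR keeps counting across files; FNR restarts at 1 for each file.
    awk '{ print NR, FNR, (NR == FNR ? "first" : "later") }' \
        "$tmpdir/first.txt" "$tmpdir/second.txt"
    rm -rf "$tmpdir"
}

show_nr_fnr
```

This prints "1 1 first", "2 2 first", "3 1 later", "4 2 later". One caveat worth keeping in mind: if the first file is empty, NR==FNR is also true while the second file is being read, so the trick assumes the first file has at least one line.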
This User Gave Thanks to Corona688 For This Post:
# 3  
Old 04-18-2017
Thanks a TON Corona!
Could you please explain the following part of your code to me:

Code:
 POS[++P]=$1+0

Once again thank you very much!
# 4  
Old 04-18-2017
++P is the pre-increment operator, which increments the variable before it's used. Which means it goes POS[1], POS[2], POS[3], ...

If I'd used the post-increment operator, P++, it would do POS[""], POS[1], POS[2], ... because unset variables are blank strings.

$1+0 is to make sure awk stores it as a number, not a string. Doing any arithmetic on a string converts it into a number. Might not be necessary here.
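Both points can be checked with a short sketch. The first function loads the positions from post #1 and lists the indices that ++P produced; the second shows one case where $1+0 changes what gets stored, namely a field with a trailing carriage return (an assumed input, such as a file saved with Windows line endings). The function names are made up for this example.

```shell
# Hypothetical demo 1: POS[++P] fills POS[1], POS[2], ... in input order.
load_positions() {
    printf '1\n4\n5\n7\n' |
        awk '{ POS[++P] = $1 + 0 }
             END { for (N = 1; N in POS; N++) print N, POS[N] }'
}

# Hypothetical demo 2: "4\r" is a 2-character string, but $1 + 0 is the number 4.
plus_zero_demo() {
    printf '4\r\n' | awk '{ print length($1), length($1 + 0) }'
}

load_positions    # prints: 1 1 / 2 4 / 3 5 / 4 7 (one pair per line)
plus_zero_demo    # prints: 2 1
```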
# 5  
Old 04-18-2017
Got it! Just one more quick question: how could I change the separator between the characters returned by substr from "" to " "? In other words, how can I modify your script to generate the following output:
Code:
 >ID_1
 A C A G
 >ID_2
 A C A G
 >ID_3
 A C A G
 >ID_4
 A C A G

# 6  
Old 04-18-2017
Interesting: shouldn't the post-increment operator P++ immediately convert the unset variable to a number, i.e. give 0 rather than ""?
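A quick check suggests it does. The sketch below stores one element with POS[P++] while P is still unset and then prints the index that was actually used; as far as I know, the increment operators yield a numeric value, so the first index comes out as 0. The function name is made up for this test.

```shell
# Hypothetical check: which array index does POS[P++] use when P is unset?
post_increment_index() {
    awk 'BEGIN { POS[P++] = "x"; for (K in POS) print "[" K "]" }'
}

post_increment_index   # prints [0]
```

So with P++ the array would run POS[0], POS[1], POS[2], ... and a loop starting at N=1 would skip the first position.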

---------- Post updated at 15:42 ---------- Previous update was at 14:56 ----------

Because the output is assembled in a variable there is no simple OFS option.
Two solutions,
1. with a separator variable
Code:
                S=sep=""
                for(N=1; N in POS; N++) { S=S sep substr($0, POS[N], 1); sep=" " } # Assemble substrings

2. with an embedded if clause
Code:
                S=""
                for(N=1; N in POS; N++) S=S (S=="" ? S : " ") substr($0, POS[N], 1) # Assemble substrings
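Either fragment replaces only the two lines inside the last block of the script from post #2; the rest of that script has to stay in place. A sketch of the whole thing with the separator-variable variant, wrapped in a shell function (the name extract_sites and the temporary sample files are assumptions made for the demonstration):

```shell
# Hypothetical wrapper: $1 = positions file, $2 = sequence file.
extract_sites() {
    awk 'NR == FNR { POS[++P] = $1 + 0; next }   # load positions from the first file
         /^>/      { print; next }               # print ID lines unchanged
         {
             S = sep = ""
             # Append each requested character, with a space before all but the first.
             for (N = 1; N in POS; N++) { S = S sep substr($0, POS[N], 1); sep = " " }
             print S
         }' "$1" "$2"
}

# Try it on a copy of the sample data from post #1.
tmpdir=$(mktemp -d)
printf '1\n4\n5\n7\n' > "$tmpdir/positions.txt"
printf '>ID_1\nATGCATGC\n>ID_2\nATGCATGC\n' > "$tmpdir/test.txt"
extract_sites "$tmpdir/positions.txt" "$tmpdir/test.txt"
rm -rf "$tmpdir"
```

On the sample data this prints each ID line followed by "A C A G". The order of the two file arguments matters: the positions file must come first so that the NR==FNR block sees it.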

# 7  
Old 04-18-2017
I guess I am doing something wrong, because only the headers are printed.
I could modify the file with sed, but I would really like to get a feel for how to do it with awk.