Help extracting info when a condition is fulfilled


 
# 1  
Old 05-19-2012

Input file (four DATA records shown in this case):
Code:
DATA       AA0110            
ACCESSION   AA0110
VERSION     AA0110  GI:157412239
FEATURES             Location/Qualifiers
     length            1..1170
                      1..1700
                     /length="1170"
     position            1..1170
                     /length="1170"
     band             1..948
                     /length="948"
//

DATA       BC599              
DEFINITION  USA
ACCESSION   BC599
VERSION     BC599  GI:239744030
FEATURES             Location/Qualifiers

     position          1..3159
                     /length="3159"
     length            1..40000
                     /length="40000"
//

DATA       HI101               
DEFINITION  UK
ACCESSION   HI101
VERSION     HI101  GI:239745142

FEATURES             Location/Qualifiers

     band             1..757
                     /length="757"
     length            1..747
                     /length="747"
//

DATA       AVE111
ACCESSION   AVE111
VERSION     AVE111  GI:157412223
FEATURES             Location/Qualifiers
     position            1..1170
                     /length="1170"
//

Desired output file:
Code:
157412239 1170
239744030 40000
239745142 747
157412223 -

Conditions required:
1. The first column of the desired output is taken from the "VERSION" line: it is the content after "GI:".
2. The second column of the desired output is the value on the /length="XXX" line that follows the "length" feature line.
3. If the first column is available but there is no value for the second column, print a "-" in its place.

Command tried:
Code:
awk 'BEGIN {RS=""; FS="//"} /VERSION/ {for (i=1;i<=NF;i++) {if ($i~/\/length=/) {print $i}}}' input_file.txt
DATA       AA0110            
ACCESSION   AA0110
VERSION     AA0110  GI:157412239
FEATURES             Location/Qualifiers
     length            1..1170
                     /length="1170"
     position            1..1170
                     /length="1170"
     band             1..948
                     /length="948"

The command I tried fails to give my desired output.
I was thinking of using "//" as the field separator for each record.

Thanks for any advice.

---------- Post updated at 09:01 PM ---------- Previous update was at 04:54 AM ----------

Is there any advice or hint that could help me solve this?
I'm still stuck on this problem.
Thanks in advance!

Last edited by perl_beginner; 05-22-2012 at 01:39 PM..
# 2  
Old 05-20-2012
I think you want RS to be //, not FS.

Code:
#!/usr/bin/awk -f
BEGIN { RS="//"; FS="\n[[:space:]]*" }

{
        ver=len=""

        for (i=1;i<=NF;i++) {
                if (match($i,/^VERSION .* GI:/))
                        ver=substr($i,RSTART+RLENGTH)
                if ($i ~ /^length / && split($(i+1),b,/="/))
                        len=b[2]
        }
        if (ver) print ver, len ? 0+len : "-"
}

# 3  
Old 05-20-2012
// cannot be used as a record separator in standard awk; RS needs to be a single character. The special case RS= (splitting the records on empty lines, i.e. two consecutive newlines) cannot be used here because there are empty lines within the records.

Try:
Code:
awk -F'[ \t:"=]*' '$1=="VERSION"{if(p)print p; printf "%s ",$4; p="-"} $2=="length"{ getline; if($2=="/length") p=$3 } END{print p}' infile
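For anyone trying to follow the one-liner above, here is a rough expanded sketch of the same logic with comments. The wrapper name extract_gi_length is made up for illustration. It differs from the one-liner in three small ways, all of which are my own assumptions rather than part of the original: the field separator uses + instead of * so it can never match an empty string, the return value of getline is checked, and nothing is printed at the end if no VERSION line was seen.

```shell
#!/bin/sh
# Expanded sketch of the one-liner; extract_gi_length is a hypothetical name.
# Reads the files given as arguments, or standard input if there are none.
extract_gi_length() {
    awk -F'[ \t:"=]+' '
        # VERSION line: the fields are VERSION, the accession, GI, the number.
        $1 == "VERSION" {
            if (p != "") print p      # finish the line of the previous record
            printf "%s ", $4          # GI number followed by a space
            p = "-"                   # default until a length value is found
        }
        # Indented feature line: $1 is empty and $2 is the feature name.
        $2 == "length" {
            # Read the next line; use it only if it is the /length qualifier.
            if ((getline) > 0 && $2 == "/length") p = $3
        }
        # Finish the line of the last record.
        END { if (p != "") print p }
    ' "$@"
}
```

Usage would be something like `extract_gi_length infile`. Because only the single line after "length" is inspected, this sketch (like the one-liner) reports "-" when /length does not directly follow the "length" line.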


Last edited by Scrutinizer; 05-20-2012 at 02:31 AM..
# 4  
Old 05-20-2012
Thanks, neutronscott.
Many thanks for your awk script.
Give me some time to digest it.

---------- Post updated at 01:45 AM ---------- Previous update was at 01:43 AM ----------

Dear Scrutinizer,

Your awk script works perfectly for my case!
I really appreciate your detailed explanation.
I will keep the points about "//", FS and RS in mind in future.
Currently I'm trying to understand your awk command.
I will ask you later if I get stuck on it.
Thanks a lot!
# 5  
Old 05-22-2012
Hi Scrutinizer,

Do you have any idea what to do if one of my records looks like this:
Code:
     length            1..1170
                      1..1700
                     /length="1170"

instead of:
Code:
      length            1..1170
                      /length="1170"

In the new case, the awk code that you wrote does not give "1170".
It gives "-" instead.
Code:
157412239 - 
239744030 40000 
239745142 747 
157412223 -

I just found out that some of my "/length="XXX"" lines do not appear immediately on the line after "length".

Thanks for advice.

---------- Post updated at 11:45 AM ---------- Previous update was at 11:41 AM ----------

Hi neutronscott,

I just found out that some of my "/length="XXX"" lines do not appear immediately on the line after "length".
I tried your awk code.
It does not work if "/length="XXX"" is not on the line right after "length".
Thanks for any further advice.
# 6  
Old 05-22-2012
Can /length appear on the same line as length? I have no idea what data I am looking at, but it seems to use indentation to separate the feature categories, so I assume we're in the "length" feature until we reach a line preceded by 5 or fewer spaces.

Code:
#!/usr/bin/awk -f
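# New record: take the number after "GI:" and default the length to "-".
$1 == "VERSION" { ver=substr($3,4); len="-" }
# Entering the "length" feature; its qualifiers follow on deeper-indented lines.
$1 == "length" { l=1; next }
# While the indentation is more than 5 spaces we are still in the feature;
# the last such line (the /length="N" qualifier) sets the value.
l&&match($0,/^[[:space:]]*/)&&(l=(RLENGTH>5)) { len=0+substr($1,10) }
# End of record: print the pair.
$1 == "//" { print ver, len }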

Code:
$ ./script input
157412239 1170
239744030 40000
239745142 747
157412223 -

Code:
awk -F'[ \t:"=]*' '$1=="VERSION"{v=$4;l="-"}$1=="//"{print v,l}$2=="length"{p=1;next}p&&match($0,/^[[:space:]]*/)&&(p=(RLENGTH>5)){l=$3}' input
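A slightly more defensive sketch of the same indentation idea, in case the "length" feature carries other qualifier or continuation lines after /length: here only lines that start with /length=" may set the value. The function name gi_length_by_indent and the variable names are made up for illustration, and the 5-space threshold is simply taken from the assumption described above.

```shell
#!/bin/sh
# Defensive variant of the indentation approach.
# gi_length_by_indent is a hypothetical name used only for this sketch.
gi_length_by_indent() {
    awk '
        # New record: remember the GI number and reset the length to "-".
        $1 == "VERSION" { ver = substr($3, 4); len = "-"; infeat = 0 }
        # Start of the "length" feature; its qualifiers are indented deeper.
        $1 == "length"  { infeat = 1; next }
        # End of record: print the pair.
        $1 == "//"      { print ver, len; infeat = 0; next }
        infeat {
            match($0, /^[ \t]*/)                 # measure the indentation
            if (RLENGTH <= 5) {
                infeat = 0                       # next feature or a blank line
            } else if ($1 ~ /^\/length="/) {
                val = $1
                gsub(/^\/length="|"$/, "", val)  # keep the text between the quotes
                len = val
            }
        }
    ' "$@"
}
```

Called as `gi_length_by_indent input`, this should give the same four lines as the script above on the sample data; the difference only shows up if some other line follows /length inside the feature.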


Last edited by neutronscott; 05-22-2012 at 03:29 PM..