Help with a data re-arrangement problem


 
# 8  
Old 12-13-2011
Try this...
Code:
awk -F"[><]" '/name/{if(x){sub("/$","",x);print x;x=""}x=$3"\t"}/symbol/{x=x $3"/"}END{sub("/$","",x);print x}' input_file

--ahamed

---------- Post updated at 07:34 PM ---------- Previous update was at 07:30 PM ----------

Quote:
Originally Posted by cpp_beginner
Thanks ahamed.
Do you mind explaining a little more about your code?
I'm quite confused about the following code:
Code:
'/symbol/{x=x?x"/"$2:$2}/name/{print x"\t"$2;x=""}'

This uses the ternary operator to avoid adding a "/" at the beginning: "/" is added only if x is not empty.

Something like this...
Code:
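# long form of: x = x ? x "/" $2 : $2
if (x) {
  x = x "/" $2   # x already has a value: append with a "/" separator
} else {
  x = $2         # x is empty: store the first value without a leading "/"
}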

HTH
--ahamed

Last edited by ahamed101; 12-14-2011 at 06:17 AM..
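
A minimal runnable sketch of the same idea, for anyone who wants to try the ternary on its own. The function name join_symbols and the one-value-per-line input are assumptions made for illustration; they are not part of the original command. The sketch tests x != "" instead of plain x, so that a first value of 0 is not mistaken for "empty":

```shell
# join_symbols: hypothetical helper, not from the thread.
# Reads one value per line on stdin and joins the values with "/".
join_symbols() {
  # Append with "/" only when x already holds something;
  # otherwise store the first value without a leading "/".
  awk '{ x = (x != "") ? x "/" $1 : $1 } END { print x }'
}

printf 'A\nB\nC\n' | join_symbols   # prints A/B/C
```

Because the separator is only added between values, no trailing or leading "/" has to be stripped afterwards.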
# 9  
Old 12-14-2011
Hi ahamed,

I found a bug when running your awk code on input like this:
Code:
<name>A2ASPNC</name>
<symbol>Remark:_alternate_name_YHR211W</symbol>

<name>9STRA</name>
<symbol>Unnamed_product</symbol>

Your awk code gives the following result:
Code:
A2ASPNC	
Remark:_alternate_name_YHR211W	Remark:_alternate_name_YHR211W
9STRA	
Unnamed_product	Unnamed_product

Ideally, it should show the result like this:
Code:
A2ASPNC	  Remark:_alternate_name_YHR211W	
9STRA   Unnamed_product

Do you have any idea how to solve this bug?
I found that the error occurs because the text between "<symbol>" and "</symbol>" contains the string "name".
Many thanks.
# 10  
Old 12-14-2011
Try this...
Code:
awk -F"[><]" '/<name>/{if(x){sub("/$","",x);print x;x=""}x=$3"\t"}
/<symbol>/{x=x $3"/"}END{sub("/$","",x);print x}' input_file

--ahamed
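
To see why anchoring the patterns to the tags matters, the sketch below counts how many lines of the sample from post #9 each pattern matches. The variable names are made up for illustration. The bare /name/ pattern also matches "alternate_name" and "Unnamed", which is what produced the duplicated lines:

```shell
# Sample records from post #9, kept in a shell variable.
sample='<name>A2ASPNC</name>
<symbol>Remark:_alternate_name_YHR211W</symbol>

<name>9STRA</name>
<symbol>Unnamed_product</symbol>'

# Loose pattern: matches every line that contains "name" anywhere.
loose=$(printf '%s\n' "$sample" | awk '/name/ { n++ } END { print n }')
# Anchored pattern: matches only lines that contain the <name> tag.
tight=$(printf '%s\n' "$sample" | awk '/<name>/ { n++ } END { print n }')

echo "loose=$loose tight=$tight"   # prints loose=4 tight=2
```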
# 11  
Old 12-15-2011
Thanks again ahamed, your awk code worked fine.
I'm facing another problem when my data looks like this:
Code:
__<tmp>SAST</tmp>
______<Reference_id="92320298"_key="4"_type="PAPER"/>
______<Reference_id="1621096"_key="5"_type="TEDT"/>
____</citation>
____<scope>SEQUENCE</scope>
__</reference>
__<Reference_id="Q9UWM9"_key="6"_type="ModelPortal"/>
__<Reference_id="AO:0005525"_key="7"_type="Go">
____<property_type="term"_value="F:GTP_binding"/>
____<property_type="evidence"_value="IEA:InterPro"/>
__</Reference>

__<tmp>G3FH</tmp>
____<scope>Sample</scope>
__</reference>
__<Reference_evidence="2"_id="JF460418"_key="3"_type="EMBL">
____<property_type="protein_sequence_ID"_value="AEN93129.1"/>
____<property_type="molecule_type"_value="Genomic_DNA"/>
__</Reference>

__<tmp>STAAD</tmp>
______<Reference_id="92320298"_key="4"_type="PAPER"/>
______<Reference_id="1621096"_key="5"_type="TEDT"/>
____</citation>
____<scope>SEQUENCE</scope>
__</reference>
__<Reference_id="AO:0005525"_key="7"_type="Go">
____<property_type="term"_value="F:TMP_binding"/>
__</Reference>

Desired output:
Code:
SAST AO:0005525 F:GTP_binding
G3FH - -
STAAD AO:0005525 F:TMP_binding

I just want to extract the text between "__<tmp>" and "</tmp>" as the first column of the output file;
Column 2 is the value of "__<Reference_id="" when it has the form "AO:XXXXXXX". If there is no such line, use "-" instead;
Column 3 is taken from the line immediately after the "AO:XXXXXXX" line: only the text between "term"_value=" and ""/>" is extracted.

Many thanks for any advice.
# 12  
Old 12-15-2011
What about __<Reference_id="Q9UWM9"_key="6"_type="ModelPortal"/>? That also has __<Reference_id="

--ahamed
# 13  
Old 12-15-2011
Hi ahamed,

I only want to extract lines that have "__<Reference_id=" and whose content also contains "AO:XXXXX".
So "__<Reference_id="Q9UWM9"_key="6"_type="ModelPortal"/>" is not included, because it does not contain "AO:XXXXXX".
Thanks.
# 14  
Old 12-16-2011
Try this...
Code:
awk -F'[><"]' '/<tmp>/{if(!x&&y){printf "- -"}x=0;printf"\n"}
/<tmp>/{y=9;printf $3 OFS}/AO:/{x=1;printf $3 OFS;getline;printf $5 OFS}
END{printf"\n"}' input_file

--ahamed
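
A more defensive variant, offered only as a sketch. The one-liner above handles the posted sample, but it prints an empty first line, it passes data to printf as the format string, and a final record without an "AO:" line gets no "- -". The version below buffers each record and prints it once. The names extract_terms, emit, id, acc, term, have and want are all made up for illustration, and it assumes at most one "AO:" reference per <tmp> record:

```shell
# extract_terms is a hypothetical helper name; it reads the data on stdin.
extract_terms() {
  awk -F'[><"]' '
    # Print the pending record, with "-" for any column that was not found.
    function emit() {
      if (have) print id, (acc != "" ? acc : "-"), (term != "" ? term : "-")
    }
    # A <tmp> line starts a new record: flush the previous one first.
    /<tmp>/ { emit(); have = 1; id = $3; acc = ""; term = ""; want = 0; next }
    # Line right after an AO reference: keep its value if it is a "term" property.
    want { if ($3 == "term") term = $5; want = 0 }
    # Assumption: at most one AO reference per record (a later one overwrites).
    /Reference_id="AO:/ { acc = $3; want = 1 }
    END { emit() }
  '
}

extract_terms <<'EOF'
__<tmp>SAST</tmp>
__<Reference_id="Q9UWM9"_key="6"_type="ModelPortal"/>
__<Reference_id="AO:0005525"_key="7"_type="Go">
____<property_type="term"_value="F:GTP_binding"/>
__<tmp>G3FH</tmp>
____<scope>Sample</scope>
EOF
# prints:
# SAST AO:0005525 F:GTP_binding
# G3FH - -
```

On a real file it would be called as extract_terms < input_file. Using a flag instead of getline means a <tmp> line that directly follows an "AO:" line is still seen as the start of a new record.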