How to arrange records in a particular order?


 
# 1  
Old 04-06-2009

Hi guys,
I have a problem; please help if you have a solution.

I have two files. FILE1 contains records, each of which starts with '>'.

FILE1
>LOG_Ps04g30040.1|12004.m08110|test lc-like prot, test1
MGASPSREEAHSNSSFSGNGKAMAVASSASSSGSNQAQSKRAPALHMFQEIVAEKDFTAS
LPKQ*
>ab|22329085|xyz|PP_194957.2| (CALRE);on biding [liana]
MGLPQNKLSFFCFFFLVSVLTLAPLAFSEIFLEEHFEGGWKSRWVLSDWKRNEGKAGTFKHTAGKWPGD
DNKGIQTYNDAKHYAISAKIPEFSNKNRTLVVQYSVKIEQDIECGGAYIKLLSGYVNQKQFGGDTPYSL
HDEL
>pnl_A5C9U1_VITVI A7Q0M0 chr scaffold_42,shotgun
MKLLSGEVDQKKFGGDTPYSIMFGPDICGYSTKKVHAIFSYQGSNHLIKKDVPCETDQLT
HVYTFVLRPDATYSILIDNVEKQSGSLYSDWDILPPKQIKDAKAKKPEAWDDKEYIPDPE
WKAPMIDNPDFKDDPDFFIYPHLKYVGIELWQVKSG*TMFDNILVCDDPDYAKKLAEETW
KHKGAEKTAFEEQEKKREEEESKDDPDDSDVSSIKSLPGFH
>Glyta16g22680.1
MSFEDSSKKHQSVGPWGGNGGSRWDDGIYSGVRQLVIVHGTGIDSIQIEYDKKGSSIWSEKHGGSGGRK
KVKSKKHQSVGPWGGNGGSRWDDGIYSGVRQLVIVHGTGIDSIQIEYDKKGSSIWSEKHGGSGGRKTIK
NAKKPEDWDDREYIDDPNDVKPEGFDSIPREIPDRKAKEPEDWDEEENGLWEPPKIPNSA
>cgi|Poptr1_1|555048|eu3.00031681 on binding [test2]
MGLPQNKLSFFCFFFLVSVLTLAPLAFSEIFLEEHFEGGWKSRWVLSDWKRNEGKAGTFKHTAGKWPGD
DNKGIQTYNDAKHYAISAKIPEFSNKNRTLVVQYSVKIEQDI

I want to arrange the records in the same order as the entries in FILE2. Each FILE2 entry contains only the text before the first space on the corresponding '>' line of FILE1, without the '>' itself.

FILE2
cgi|Poptr1_1|555048|eu3.00031681
pnl_A5C9U1_VITVI
ab|22329085|xyz|PP_194957.2|
Glyta16g22680.1
LOG_Ps04g30040.1|12004.m08110|test

The OUTPUT should look like this:

>cgi|Poptr1_1|555048|eu3.00031681 on binding [test2]
MGLPQNKLSFFCFFFLVSVLTLAPLAFSEIFLEEHFEGGWKSRWVLSDWKRNEGKAGTFKHTAGKWPGD
DNKGIQTYNDAKHYAISAKIPEFSNKNRTLVVQYSVKIEQDI
>pnl_A5C9U1_VITVI A7Q0M0 chr scaffold_42,shotgun
MKLLSGEVDQKKFGGDTPYSIMFGPDICGYSTKKVHAIFSYQGSNHLIKKDVPCETDQLT
HVYTFVLRPDATYSILIDNVEKQSGSLYSDWDILPPKQIKDAKAKKPEAWDDKEYIPDPE
WKAPMIDNPDFKDDPDFFIYPHLKYVGIELWQVKSG*TMFDNILVCDDPDYAKKLAEETW
KHKGAEKTAFEEQEKKREEEESKDDPDDSDVSSIKSLPGFH
>ab|22329085|xyz|PP_194957.2|(CALRE);on biding [liana]
MGLPQNKLSFFCFFFLVSVLTLAPLAFSEIFLEEHFEGGWKSRWVLSDWKRNEGKAGTFKHTAGKWPGD
DNKGIQTYNDAKHYAISAKIPEFSNKNRTLVVQYSVKIEQDIECGGAYIKLLSGYVNQKQFGGDTPYSL
HDEL
>Glyta16g22680.1
MSFEDSSKKHQSVGPWGGNGGSRWDDGIYSGVRQLVIVHGTGIDSIQIEYDKKGSSIWSEKHGGSGGRK
KVKSKKHQSVGPWGGNGGSRWDDGIYSGVRQLVIVHGTGIDSIQIEYDKKGSSIWSEKHGGSGGRKTIK
NAKKPEDWDDREYIDDPNDVKPEGFDSIPREIPDRKAKEPEDWDEEENGLWEPPKIPNSA
>LOG_Ps04g30040.1|12004.m08110|test lc-like prot, test1
MGASPSREEAHSNSSFSGNGKAMAVASSASSSGSNQAQSKRAPALHMFQEIVAEKDFTAS
LPKQ*

Thanks
# 2  
Old 04-06-2009
Code:
awk -v ORS= 'NR==FNR{a[$1]=$0;next}$0=">"a[$0]' RS='>' file1 RS='\n' file2

# 3  
Old 04-06-2009
Thanks Rubin, there is a small error in the output.

The output I am getting is:
>cgi|Poptr1_1|555048|eu3.00031681 on binding [test2]
MGLPQNKLSFFCFFFLVSVLTLAPLAFSEIFLEEHFEGGWKSRWVLSDWKRNEGKAGTFKHTAGKWPGD
DNKGIQTYNDAKHYAISAKIPEFSNKNRTLVVQYSVKIEQDI


>pnl_A5C9U1_VITVI A7Q0M0 chr scaffold_42,shotgun
MKLLSGEVDQKKFGGDTPYSIMFGPDICGYSTKKVHAIFSYQGSNHLIKKDVPCETDQLT
HVYTFVLRPDATYSILIDNVEKQSGSLYSDWDILPPKQIKDAKAKKPEAWDDKEYIPDPE
WKAPMIDNPDFKDDPDFFIYPHLKYVGIELWQVKSG*TMFDNILVCDDPDYAKKLAEETW
KHKGAEKTAFEEQEKKREEEESKDDPDDSDVSSIKSLPGFH
>>Glyta16g22680.1
MSFEDSSKKHQSVGPWGGNGGSRWDDGIYSGVRQLVIVHGTGIDSIQIEYDKKGSSIWSEKHGGSGGRK
KVKSKKHQSVGPWGGNGGSRWDDGIYSGVRQLVIVHGTGIDSIQIEYDKKGSSIWSEKHGGSGGRKTIK
NAKKPEDWDDREYIDDPNDVKPEGFDSIPREIPDRKAKEPEDWDEEENGLWEPPKIPNSA
>LOG_Ps04g30040.1|12004.m08110|test lc-like prot, test1
MGASPSREEAHSNSSFSGNGKAMAVASSASSSGSNQAQSKRAPALHMFQEIVAEKDFTAS
LPKQ*

One entry is missing, there is a blank line between the first and second records, and there is an extra '>' in the third entry.

Thanks again. I would also be glad if you could explain the code, especially what "-v ORS=" does.
# 4  
Old 04-06-2009
The code works fine and produces the same output as in your first post, with both old awk and gawk (tested). If it works for some entries, it should work for all of them.
I'm assuming the files are formatted exactly like your samples, but I suspect that file2 is not an exact copy of the first field of file1, at least for some entries. It may contain extra characters, so double-check it.
-v ORS= sets the output record separator to the empty string (the default is a newline, ORS="\n").
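To make the effect of ORS concrete, here is a small sketch. The two-record sample and the variable names are made up for illustration and are not taken from the poster's files. Because each record read with RS='>' already ends in its own newline, the default ORS adds a second one, while an empty ORS prints the record exactly as it was read.

```shell
# Illustrative two-record sample (not the poster's data).
sample='>id1 desc
AAAA
>id2
BBBB
'

# Default ORS ("\n"): every record already ends in a newline,
# so print adds a second one and a blank line follows each record.
default_ors=$(printf '%s' "$sample" | awk -v RS='>' '$0 != "" { print ">" $0 }')

# Empty ORS: print appends nothing, so records come out as they were read.
empty_ors=$(printf '%s' "$sample" | awk -v RS='>' -v ORS= '$0 != "" { print ">" $0 }')

printf 'default ORS:\n%s\n\nempty ORS:\n%s\n' "$default_ors" "$empty_ors"
```

The `$0 != ""` test skips the empty record that precedes the first '>' in the input.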
# 5  
Old 04-07-2009
Thanks. Yes, there was some difference in the characters, but an empty line is still introduced after the first record in the output file.
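The thread ends without resolving the remaining empty line. One possible cause, which is an assumption and is not confirmed anywhere in the thread, is a run of blank lines at the end of file1: with RS='>' they become part of the last record, and FILE2 lists that record first. The stray '>' reported earlier is consistent with a FILE2 line that has no match, because the pattern `$0=">"a[$0]` then evaluates to the non-empty string ">" and prints it. The following is a sketch of a more defensive variant under those assumptions; the sample data, the temporary directory and the array name `rec` are illustrative only.

```shell
#!/bin/sh
# Sketch: reorder the '>'-delimited records of file1 by the IDs listed in file2.
# The sample files below are illustrative, not the poster's data.
tmp=$(mktemp -d)
printf '>id1 first record\nAAAA\nBBBB\n>id2\nCCCC\n\n\n' > "$tmp/file1"
printf 'id2\nid3 \nid1\n' > "$tmp/file2"

reordered=$(awk '
    # First file: records are separated by ">".
    NR == FNR {
        sub(/\n+$/, "")                # drop trailing newline(s), including stray blank lines
        if ($0 != "") rec[$1] = $0     # key on the text before the first whitespace
        next
    }
    # Second file: one ID per line; using $1 ignores surrounding blanks.
    ($1 in rec) { print ">" rec[$1]; next }
    # Report unmatched IDs on stderr instead of printing a bare ">".
    NF { print "no record for: " $1 > "/dev/stderr" }
' RS='>' "$tmp/file1" RS='\n' "$tmp/file2")

printf '%s\n' "$reordered"
rm -rf "$tmp"
```

Here ORS is left at its default because the trailing newlines are stripped from each stored record, so print supplies exactly one newline per record. Unmatched IDs such as the sample's `id3` are reported on stderr, which makes mismatches between the two files easier to spot.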