Comparing two text files


 
# 1  
Old 07-10-2010

Hi All,

I have two files in the following formats:

file 1 - this is a big file

Code:
>AB_1 gi|229194403|ref|ZP_04321208.1| group II intron reverse transcriptase/maturase [asd ert 456]
gdfjafhlkhlnlklaklskckcfhhahgfahajfkkallalfafafa
>AB_2 gi|229194404|ref|ZP_04321209.1|
gfksjgfkjsfjslfslfslhf
>AB_3 gi|229194405|ref|ZP_04321210.1|
gjksdfhkjshfvlshlshl
>AB_4 gi|229194406|ref|ZP_04321211.1| alkylphosphonate uptake protein [Bder ce 33L]
fjhfjhfhjfHDHDjhghfdghdfgkffg
>AB_5 gi|229194407|ref|ZP_04321212.1| hypothetical protein pE33L466_0459 [Badfr cereus 33L]
hghcGhcGchGchkGcjcjxgxjgcjxgcjxx

file 2

Code:
>AB_1 gi|229194403|ref|ZP_04321208.1| group II intron reverse transcriptase/maturase [asd ert 456]
atggcgtgatgcgatgtgcath
>AB_2 gi|229194404|ref|ZP_04321209.1|
atgctagtcgatttgcaagttaaattt
>AB_4 gi|229194406|ref|ZP_04321211.1| alkylphosphonate uptake protein [Bder ce 33L]
atttttcccaaatgcaaagggccttggaaa

Only the headers (the lines that begin with '>') are common to file1 and file2; the sequence lines under them differ. file2 is also smaller than file1.

I would like to compare file2 against file1: for every header in file2 that also appears in file1, extract that header and the characters that follow it from file1 into a new output file.

The output file should look like this:

Code:
>AB_1 gi|229194403|ref|ZP_04321208.1| group II intron reverse transcriptase/maturase [asd ert 456]
gdfjafhlkhlnlklaklskckcfhhahgfahajfkkallalfafafa
>AB_2 gi|229194404|ref|ZP_04321209.1|
gfksjgfkjsfjslfslfslhf
>AB_4 gi|229194406|ref|ZP_04321211.1| alkylphosphonate uptake protein [Bder ce 33L]
fjhfjhfhjfHDHDjhghfdghdfgkffg

Please let me know the best way to do this comparison using awk.

cheers
# 2  
Old 07-10-2010
Try:
Code:
awk 'NR==FNR&&/^>/{A[$1]=$0;next}/^>/{if(A[$1]){$0=A[$1];p=1}else p=0}p' file2 file1
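
For anyone reading along, here is a commented sketch of the same idea, not the exact one-liner above. It assumes, as in the sample data, that the first field of a header (for example `>AB_1`) is enough to identify a record. It prints the header as it appears in file1 rather than copying it over from file2. The temporary directory and the variable names `want` and `keep` are made up for illustration.

```shell
# Recreate the sample data from the thread in a scratch directory.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
>AB_1 gi|229194403|ref|ZP_04321208.1| group II intron reverse transcriptase/maturase [asd ert 456]
gdfjafhlkhlnlklaklskckcfhhahgfahajfkkallalfafafa
>AB_2 gi|229194404|ref|ZP_04321209.1|
gfksjgfkjsfjslfslfslhf
>AB_3 gi|229194405|ref|ZP_04321210.1|
gjksdfhkjshfvlshlshl
>AB_4 gi|229194406|ref|ZP_04321211.1| alkylphosphonate uptake protein [Bder ce 33L]
fjhfjhfhjfHDHDjhghfdghdfgkffg
>AB_5 gi|229194407|ref|ZP_04321212.1| hypothetical protein pE33L466_0459 [Badfr cereus 33L]
hghcGhcGchGchkGcjcjxgxjgcjxgcjxx
EOF

cat > "$tmp/file2" <<'EOF'
>AB_1 gi|229194403|ref|ZP_04321208.1| group II intron reverse transcriptase/maturase [asd ert 456]
atggcgtgatgcgatgtgcath
>AB_2 gi|229194404|ref|ZP_04321209.1|
atgctagtcgatttgcaagttaaattt
>AB_4 gi|229194406|ref|ZP_04321211.1| alkylphosphonate uptake protein [Bder ce 33L]
atttttcccaaatgcaaagggccttggaaa
EOF

awk '
    # First file on the command line (file2): NR equals FNR only here.
    # Remember the ID of every header and skip all other lines.
    NR == FNR { if (/^>/) want[$1]; next }

    # Second file (file1): at each header, decide whether this record is wanted.
    /^>/ { keep = ($1 in want) }

    # Print the header and every sequence line after it while keep is true.
    keep
' "$tmp/file2" "$tmp/file1" > "$tmp/out"

cat "$tmp/out"
```

The file order matters: file2 goes first so that its headers are loaded before file1 is scanned. One caveat with the `NR == FNR` test is that it also holds for the second file if the first one is empty, in which case nothing is printed.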

This User Gave Thanks to Scrutinizer For This Post: