Matching the entries and printing data


 
# 1  
Old 09-18-2012

Hi all,

I have a file of IDs that I want to compare against a second file in order to get the sequence for each ID.

File 1 with ID
Code:
Q7L8J4
Q676U5
Q8NAA4
Q5TYW2
Q5SQ80
Q5VUR7
Q4UJ75
Q96IX9
Q7Z4T9
Q6NTF7
Q8IZP0
Q9NYB9
Q9P2A4
O14639
Q9ULW3
Q969K4
Q15057
Q5T8D3
Q8N7X0
Q9Y2D8
Q8TED9
Q8N4X5

File 2 with sequence information
Code:
>sp|Q7L8J4|3BP5L_HUMAN SH3 domain-binding protein 5-like OS=Homo sapiens GN=SH3BP5L PE=1 SV=1
MAELRQVPGGRETPQGELRPEVVEDEVPRSPVAEEPGGGGSSSSEAKLSPREEEELDPRI
QEELEHLNQASEEINQVELQLDEARTTYRRILQESARKLNTQGSHLGSCIEKARPYYEAR
RLAKEAQQETQKAALRYERAVSMHNAAREMVFVAEQGVMADKNRLDPTWQEMLNHATCKV
NEAEEERLRGEREHQRVTRLCQQAEARVQALQKTLRRAIGKSRPYFELKAQFSQILEEHK
AKVTELEQQVAQAKTRYSVALRNLEQISEQIHARRRGGLPPHPLGPRRSSPVGAEAGPED
MEDGDSGIEGAEGAGLEEGSSLGPGPAPDTDTLSLLSLRTVASDLQKCDSVEHLRGLSDH
VSLDGQELGTRSGGRRGSDGGARGGRHQRSVSL
>sp|Q676U5|A16L1_HUMAN Autophagy-related protein 16-1 OS=Homo sapiens GN=ATG16L1 PE=1 SV=2
MSSGLRAADFPRWKRHISEQLRRRDRLQRQAFEEIILQYNKLLEKSDLHSVLAQKLQAEK
HDVPNRHEISPGHDGTWNDNQLQEMAQLRIKHQEELTELHKKRGELAQLVIDLNNQMQRK
DREMQMNEAKIAECLQTISDLETECLDLRTKLCDLERANQTLKDEYDALQITFTALEGKL
RKTTEENQELVTRWMAEKAQEANRLNAENEKDSRRRQARLQKELAEAAKEPLPVEQDDDI
EVIVDETSDHTEETSPVRAISRAATKRLSQPAGGLLDSITNIFGRRSVSSFPVPQDNVDT
HPGSGKEVRVPATALCVFDAHDGEVNAVQFSPGSRLLATGGMDRRVKLWEVFGEKCEFKG
SLSGSNAGITSIEFDSAGSYLLAASNDFASRIWTVDDYRLRHTLTGHSGKVLSAKFLLDN
ARIVSGSHDRTLKLWDLRSKVCIKTVFAGSSCNDIVCTEQCVMSGHFDKKIRFWDIRSES
IVREMELLGKITALDLNPERTELLSCSRDDLLKVIDLRTNAIKQTFSAPGFKCGSDWTRV
VFSPDGSYVAAGSAEGSLYIWSVLTGKVEKVLSKQHSSSINAVAWSPSGSHVVSVDKGCK
AVLWAQY
>sp|Q8NAA4|A16L2_HUMAN Autophagy-related protein 16-2 OS=Homo sapiens GN=ATG16L2 PE=2 SV=2
MAGPGVPGAPAARWKRHIVRQLRLRDRTQKALFLELVPAYNHLLEKAELLDKFSKKLQPE
PNSVTPTTHQGPWEESELDSDQVPSLVALRVKWQEEEEGLRLVCGEMAYQVVEKGAALGT
LESELQQRQSRLAALEARVAQLREARAQQAQQVEEWRAQNAVQRAAYEALRAHVGLREAA
LRRLQEEARDLLERLVQRKARAAAERNLRNERRERAKQARVSQELKKAAKRTVSISEGPD
TLGDGMRERRETLALAPEPEPLEKEACEKWKRPFRSASATSLTLSHCVDVVKGLLDFKKR
RGHSIGGAPEQRYQIIPVCVAARLPTRAQDVLDAHLSEVNAVRFGPNSSLLATGGADRLI
HLWNVVGSRLEANQTLEGAGGSITSVDFDPSGYQVLAATYNQAAQLWKVGEAQSKETLSG
HKDKVTAAKFKLTRHQAVTGSRDRTVKEWDLGRAYCSRTINVLSYCNDVVCGDHIIISGH
NDQKIRFWDSRGPHCTQVIPVQGRVTSLSLSHDQLHLLSCSRDNTLKVIDLRVSNIRQVF
RADGFKCGSDWTKAVFSPDRSYALAGS

I want to compare these two files by matching each ID in the first file against the ID enclosed between the pipe characters ("|") in the header lines of the second file. If they are the same, the complete sequence should be printed.

For example, Q676U5 is found in both the first and the second file, so the output should look like this:
Expected output
Code:
>sp|Q676U5|A16L1_HUMAN Autophagy-related protein 16-1 OS=Homo sapiens GN=ATG16L1 PE=1 SV=2
MSSGLRAADFPRWKRHISEQLRRRDRLQRQAFEEIILQYNKLLEKSDLHSVLAQKLQAEK
HDVPNRHEISPGHDGTWNDNQLQEMAQLRIKHQEELTELHKKRGELAQLVIDLNNQMQRK
DREMQMNEAKIAECLQTISDLETECLDLRTKLCDLERANQTLKDEYDALQITFTALEGKL
RKTTEENQELVTRWMAEKAQEANRLNAENEKDSRRRQARLQKELAEAAKEPLPVEQDDDI
EVIVDETSDHTEETSPVRAISRAATKRLSQPAGGLLDSITNIFGRRSVSSFPVPQDNVDT
HPGSGKEVRVPATALCVFDAHDGEVNAVQFSPGSRLLATGGMDRRVKLWEVFGEKCEFKG
SLSGSNAGITSIEFDSAGSYLLAASNDFASRIWTVDDYRLRHTLTGHSGKVLSAKFLLDN
ARIVSGSHDRTLKLWDLRSKVCIKTVFAGSSCNDIVCTEQCVMSGHFDKKIRFWDIRSES
IVREMELLGKITALDLNPERTELLSCSRDDLLKVIDLRTNAIKQTFSAPGFKCGSDWTRV
VFSPDGSYVAAGSAEGSLYIWSVLTGKVEKVLSKQHSSSINAVAWSPSGSHVVSVDKGCK
AVLWAQY

Thanks in advance
# 2  
Old 09-18-2012
I assume you want every entry from the second file printed if its ID is in the first. This might do it:

Code:
awk -F "|" '
    NR == FNR { idx[$1] = 1; next; }   # first file: remember each ID
    /^>sp/ { snarf = $2 in idx  }      # header line: is its ID wanted?
    snarf                              # print while the flag is set
' file1 file2 >output-file

# 3  
Old 09-19-2012
Hi,

It does work, but it also prints records whose IDs are not in file 1. These extra records are printed after the last entry from file 1. Why does this happen?
# 4  
Old 09-19-2012
Try the code below. Put it in a script such as test.sh, give it execute permission, and then run it:

Code:
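# read the IDs one per line; /path/to/file1 is a placeholder
while IFS= read -r i
do
    # note: grep prints only the matching header line, not the sequence
    grep -F "|$i|" file2 >> outputfile
done < /path/to/file1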


thanks
# 5  
Old 09-19-2012
Quote:
Originally Posted by manigrover
Hi,

It does work, but it also prints records whose IDs are not in file 1. These extra records are printed after the last entry from file 1. Why does this happen?
Unsure. I tested my code with a file that contained entries that weren't matched in file 1 and never got anything but what was matched. The only thing I can think of is an awk version issue. Do you get the same results with this:


Code:
awk -F "|" '
    NR == FNR { idx[$1] = 1; next; }
    /^>sp/ {  snarf = idx[$2]+0; }
    snarf
' file1 file2 >output-file
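
Another possibility worth ruling out, offered as a guess rather than a diagnosis: both versions above only update the flag on headers that start with ">sp". If the real file2 also contains headers with a different prefix (for example ">tr|..."), such a record that follows a matched one would be printed as well, because the flag is never reset. Windows line endings in file1 could also keep IDs from matching. A minimal sketch that re-evaluates the flag on every ">" header and strips a trailing carriage return; the function name extract_records is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper (not from the posts above): print every record of
# the sequence file whose ID appears in the ID list.
#   $1 = ID list, one ID per line
#   $2 = sequence file with ">xx|ID|..." header lines
extract_records() {
    awk -F '|' '
        { sub(/\r$/, "") }                          # drop a trailing CR, if any
        NR == FNR { if ($1 != "") idx[$1] = 1; next }   # first file: store IDs
        /^>/      { snarf = ($2 in idx) }           # any header: set or clear flag
        snarf                                       # print while the flag is set
    ' "$1" "$2"
}
```

It would be called as: extract_records file1 file2 >output-file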

# 6  
Old 09-20-2012
Are the data lines really split across several lines, or did you wrap them only for your demo file2?