Matching two file contents and extracting associated information

# 1  
Old 07-02-2010

Hi,
I am new to shell programming and need help. I have File1, which contains some ID numbers, and File2, which contains ID numbers together with some associated information.

I want to match the ID numbers from File1 against the contents of File2 and write a third file containing the matching ID lines together with their associated information.


For example

cat File1
Code:
 
pc00123
pc345
pc1255

cat File2
Code:
>sequence 1a, prod, (pc00123)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFEEGTRFSSMFGFFVQAIVTGKGP
>sequence 45e, padam, (pc00123;pc345;pc3213)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFSMFGFFVQAIVTGKGPABBBGAAAFF
AKGMLMOIHRGNBGBSSSVFGHDSF
>sequence 332, paadat, (pc555;pc10623)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG

I want to match the ID numbers from File1 against File2 and output not only the ">sequence" header lines that match but also the associated sequence lines that follow each of them, up to (but not including) the next ">sequence" line.

The needed output is :

Code:
>sequence 1a, prod, (pc00123)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFEEGTRFSSMFGFFVQAIVTGKGP
>sequence 45e, padam, (pc00123;pc345;pc3213)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFSMFGFFVQAIVTGKGPABBBGAAAFF
AKGMLMOIHRGNBGBSSSVFGHDSF

It would be very helpful if you could suggest how to do this. Thanks.

Last edited by vgersh99; 07-02-2010 at 12:17 PM.. Reason: code tags, please!
# 2  
Old 07-02-2010
nawk -f new.awk file1.txt file2.txt

new.awk:
Code:
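# Split on "(", ")" and ";" so that every ID in a header is its own field.
BEGIN {
  FS="[();]"
}
# First file (file1.txt): remember every ID as an array key.
FNR==NR {f1[$0];next}
# Header line of the second file: if any of its IDs is wanted,
# print the header and set the flag f; otherwise clear the flag.
/^>sequence/{
  for(i=2; i<NF;i++)
    if ($i in f1) {f=1;print; next}
  f=0
  next
}
# Any other line (sequence data): print it only while the flag is set.
f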
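
For anyone who wants to try this without creating the files by hand, here is a minimal sketch that recreates the sample data from post #1 and runs the script on it. It makes a few assumptions that are not in the thread: it calls plain `awk` instead of `nawk` (on older Solaris systems `nawk` is probably still the one to use), it works in a temporary directory made with `mktemp -d`, and it writes the result to a hypothetical `file3.txt`. It also assumes the IDs in file1.txt have no trailing spaces or Windows line endings, since each whole line is used as the lookup key.

```shell
#!/bin/sh
# Sketch only: recreate the sample files from the thread in a scratch directory.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# IDs to look for (File1 in the question).
cat > file1.txt <<'EOF'
pc00123
pc345
pc1255
EOF

# Sequence records (File2 in the question).
cat > file2.txt <<'EOF'
>sequence 1a, prod, (pc00123)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFEEGTRFSSMFGFFVQAIVTGKGP
>sequence 45e, padam, (pc00123;pc345;pc3213)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFSMFGFFVQAIVTGKGPABBBGAAAFF
AKGMLMOIHRGNBGBSSSVFGHDSF
>sequence 332, paadat, (pc555;pc10623)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
EOF

# The awk program from this post, saved as new.awk.
cat > new.awk <<'EOF'
BEGIN { FS = "[();]" }                 # IDs sit between "(", ";" and ")"
FNR == NR { f1[$0]; next }             # first file: collect wanted IDs
/^>sequence/ {
  for (i = 2; i < NF; i++)
    if ($i in f1) { f = 1; print; next }   # wanted record: print header
  f = 0                                # unwanted record: stop printing
  next
}
f                                      # sequence lines of a wanted record
EOF

# file3.txt is a made-up name for the third file asked for in post #1.
awk -f new.awk file1.txt file2.txt > file3.txt
cat file3.txt
```

With the sample data this should keep the first two records and drop the `pc555;pc10623` record, since neither of its IDs appears in file1.txt.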

# 3  
Old 07-02-2010
Thanks a lot. It works nicely.