Match values/IDs from column and text files


 
# 1  
Old 02-23-2012

Hello,

I am trying to combine two files to produce a third file.
File-1 is an 8-column, tab-separated file:

Code:
  
1234:1 xyz1234 blah blah blah blah blah blah
1234:1 xyz1233 blah blah blah blah blah blah
1234:1 abc1234 blah blah blah blah blah blah
n/a RRR0000 blah blah blah blah blah blah
n/a RRR0000 blah blah blah blah blah blah
9876:2 htg234 blah blah blah blah blah blah
9876:2 dkj1234 blah blah blah blah blah blah
9876:2 htg234 blah blah blah blah blah blah
n/a QQQ0000 blah blah blah blah blah blah

File-2:
Code:
 
>1234:1 some_text_to_be_deleted
gtcgcatgcatcgactagcgagctacga
>9876:2 some_text_to_be_deleted
cgatcgatgctagctagctgggggccccaaaa
>RRR0000 some_text_to_be_deleted
gctagctagtcgatcgtagctacgatgctagctagtcg
>QQQ0000 some_text_to_be_deleted
cgaaaaagggaaattttaaaggggcggcgcgcg

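My output should look like this: each File-2 header keeps its ID, followed by "|" and every matching File-1 row, with the rows separated by ";". Rows whose first column is "n/a" are matched on the second column instead: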

Code:
 
>1234:1 | ID1-1234:1 ID2-xyz1234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-1234:1 ID2-xyz1233 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-1234:1 ID2-abc1234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah
gtcgcatgcatcgactagcgagctacga
>9876:2 | ID1-9876:2 ID2-htg234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-9876:2 ID2-dkj1234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-9876:2 ID2-htg234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah
cgatcgatgctagctagctgggggccccaaaa
>RRR0000 | ID1-n/a ID2-RRR0000 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-n/a ID2-RRR0000 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah
gctagctagtcgatcgtagctacgatgctagctagtcg
>QQQ0000 | ID1-n/a ID2-QQQ0000 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah
cgaaaaagggaaattttaaaggggcggcgcgcg

Any help is appreciated...Thanks!

Last edited by ad23; 02-23-2012 at 06:12 PM..
# 2  
Old 02-23-2012
Code:
$ cat tabfile.awk

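# Load the table named by -v TABFILE=... and build one annotation string per ID.
BEGIN {
        # Compare against 0: getline returns -1 on error, which would loop forever.
        while ((getline < TABFILE) > 0)
        {
                # Key on column 1, or on column 2 when column 1 is "n/a".
                ID = $1
                if (ID == "n/a") ID = $2

                for (N = 1; N <= 2; N++)        $N = "ID" N "-" $N
                for (N = 3; N <= NF; N++)       $N = sprintf("f%d-%s", N - 2, $N)

                # Join multiple rows for the same ID with "; ".
                if (ID in TAB)  TAB[ID] = TAB[ID] "; " $0
                else            TAB[ID] = $0
        }
        close(TABFILE)
}

# Header lines: keep only the ID and append the collected annotations.
/^>/ {
        ID = substr($1, 2)
        printf(">%s | %s\n", ID, TAB[ID])
        next
}

# Sequence lines pass through unchanged.
1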

$  awk -v TABFILE=tabfile -f tabfile.awk data 
>1234:1 | ID1-1234:1 ID2-xyz1234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-1234:1 ID2-xyz1233 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-1234:1 ID2-abc1234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah
gtcgcatgcatcgactagcgagctacga
>9876:2 | ID1-9876:2 ID2-htg234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-9876:2 ID2-dkj1234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-9876:2 ID2-htg234 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah
cgatcgatgctagctagctgggggccccaaaa
>RRR0000 | ID1-n/a ID2-RRR0000 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah; ID1-n/a ID2-RRR0000 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah
gctagctagtcgatcgtagctacgatgctagctagtcg
>QQQ0000 | ID1-n/a ID2-QQQ0000 f1-blah f2-blah f3-blah f4-blah f5-blah f6-blah
cgaaaaagggaaattttaaaggggcggcgcgcg

$

 