Multiple file matching in awk


 
# 1  
Old 01-31-2012

head 1.txt
Code:
chr1	1   2   s1
chr1	3   4   s1
chr1	5   6   s1
chr1	20 11  s1
chr1	7   90  s1

head 2.txt
Code:
chr1	1   2   s2
chr1	3   4   s2
chr1	5   6   s2
chr1	20 11  s2
chr1	7   90  s2

Code I have used
Code:
awk ' NR==FNR{(a[$1]=$2) && (a[$2]=$3) && (a[$3]=$4);next} (a[$1]) && (a[$2]) && (a[$3]) {print $1"\t"$2"\t"$3"\t"$4"\t"a[$3]}' 2.txt 1.txt | head -5

Output
Code:
chr1	1   2   s1 s2
chr1	3   4   s1 s2
chr1	5   6   s1 s2
chr1	20 11  s1 s2
chr1	7   90  s1 s2
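
A side note on the one-liner above: as far as I can tell, it stores each field under its own array index (a[$1], a[$2], a[$3]), so it tests whether each field value was seen somewhere in 2.txt, not whether the whole three-column record matches. Below is a minimal sketch of the same two-file join using one composite key built from the first three columns. The function name join_two is made up for illustration, and the sketch assumes whitespace-separated input with at most one line per key in the first file.

```shell
# Hypothetical helper: join two files on their first three columns.
# Usage: join_two LOOKUP_FILE MAIN_FILE
join_two() {
    awk -v OFS='\t' '
        # First file: remember column 4 under the composite key.
        NR == FNR { seen[$1, $2, $3] = $4; next }
        # Second file: print the row plus the stored value when the key matches.
        ($1, $2, $3) in seen { print $1, $2, $3, $4, seen[$1, $2, $3] }
    ' "$1" "$2"
}
```

Called as join_two 2.txt 1.txt, it prints the lines of 1.txt that have a matching record in 2.txt, in the order of 1.txt, so neither file needs to be sorted.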

My question now: what if I have two or more additional files with data like the following?

head 3.txt
Code:
chr1	1   2   s3
chr1	3   4   s3
chr1	5   6   s3
chr1	20 11  s3
chr1	7   90  s3

head 4.txt
Code:
chr1	1   2   s4
chr1	3   4   s4
chr1	5   6   s4
chr1	20 11  s4
chr1	7   90  s4

How do I get the following output?
Code:
chr1	1   2   s1 s2 s3 s4
chr1	3   4   s1 s2 s3 s4
chr1	5   6   s1 s2 s3 s4
chr1	20 11  s1 s2 s3 s4
chr1	7   90  s1 s2 s3 s4

Please keep in mind that my files are unsorted.

Thanks in advance.

Moderator's Comments:
Mod Comment Use code tags please, see PM.

Last edited by zaxxon; 01-31-2012 at 07:34 PM.. Reason: code tags
# 2  
Old 01-31-2012
How about this (output will be in same order as last file):

Code:
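awk 'FNR==1{f++}     # f = index of the current input file
# key = first three columns (stored by line number), value = column 4 for file f
{key[FNR]=$1"\t"$2"\t"$3; val[f,key[FNR]]=$4}
END {
   for(i=1;i in key; i++) {
       printf "%s", key[i];
       # one output column per input file
       for(c=1;c<=f;c++) printf "\t%s", val[c,key[i]]
       printf "\n";
   }
}' *.txt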
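
One caveat, offered as a sketch and not as the author's intent: key[FNR] is overwritten by every file, so a key that only occurs in an earlier file, on a line number that a later file also reaches, drops out of the output. If the files list their records in different orders, a variant that records each key the first time it appears anywhere may be safer. The function name merge_by_key and the array names are made up for illustration.

```shell
# Hypothetical helper: merge column 4 of several files, keyed on columns 1-3.
# Usage: merge_by_key FILE...
merge_by_key() {
    awk -v OFS='\t' '
        FNR == 1 { nfiles++ }            # a new (non-empty) input file starts
        {
            k = $1 OFS $2 OFS $3
            if (!(k in seen)) {          # first time this key appears anywhere
                seen[k] = 1
                order[++nkeys] = k       # remember first-appearance order
            }
            val[nfiles, k] = $4
        }
        END {
            for (i = 1; i <= nkeys; i++) {
                line = order[i]
                # one column per file; a missing record leaves an empty column
                for (f = 1; f <= nfiles; f++)
                    line = line OFS val[f, order[i]]
                print line
            }
        }
    ' "$@"
}
```

Called as merge_by_key *.txt, it prints one line per distinct key, in the order the keys are first seen across all files.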

This User Gave Thanks to Chubler_XL For This Post:
# 3  
Old 02-01-2012
head 1.txt
Code:
chr1	1	2	s1
chr1	3	4	s1
chr1	20	11	s1
chr1	7	90	s1

head 2.txt
Code:
chr1	1	2	s2
chr1	3	4	s2
chr1	5	6	s2
chr1	20	11	s2
chr1	7	90	s2

head 3.txt
Code:
chr1	1	2	s3
chr1	5	6	s3
chr1	20	11	s3
chr1	7	90	s3

head 4.txt
Code:
chr1	1	2	s4
chr1	3	4	s4
chr1	5	6	s4
chr1	20	11	s4

Code:
awk 'FNR==1{f++} {key[FNR]=$1"\t"$2"\t"$3; val[f,key[FNR]]=$4} END { for(i=1;i in key; i++) {printf "%s", key[i]; for(c=1;c<=f;c++) printf "\t%s", val[c,key[i]]; printf "\n";}}' *.txt

chr1	1	2	s1	s2	s3	s4
chr1	3	4	s1	s2		s4
chr1	5	6		s2	s3	s4
chr1	20	11	s1	s2	s3	s4
chr1	7	90	s1	s2	s3


Thanks the code works fine.

But if some records are present in only two or three of the files and not in all four (or however many there are), the code still prints a line for them and leaves a blank field wherever a file has no value for that record. Please check my example data above.

Thanks in advance.

@Mods: Please excuse me for not including the code tags. Could someone tell me how to include code tags?


Moderator's Comments:
Mod Comment How to use code tags when posting data and code samples.

Last edited by Franklin52; 02-01-2012 at 05:23 PM.. Reason: Code tags
# 4  
Old 02-01-2012
You didn't say what you wanted when a record is missing; this version prints *NONE* in its place:

Code:
awk 'FNR==1{f++}
{key[FNR]=$1"\t"$2"\t"$3; val[f,key[FNR]]=$4}
END { 
   for(i=1;i in key; i++) {
       printf "%s", key[i];
       for(c=1;c<=f;c++) printf "\t%s", (((c,key[i]) in val) ? val[c,key[i]] : "*NONE*")
       printf "\n";
   }
}' *.txt

# 5  
Old 02-01-2012
Quote:
Originally Posted by Chubler_XL
You didn't say what you wanted when a record is missing; this version prints *NONE* in its place:


Sorry, friend. New cases keep turning up as I work through the data, so I keep wondering how to handle them. Thanks for the update.

If I have more than 4 columns in each file, the first three columns still need to be matched, and the remaining columns need to be printed, how do I do it?

1.txt
Quote:
chr1 1 2 3 4 5 6
chr2 a b c d e f
chr3 f g h l o p
2.txt
Quote:
chr1 1 2 3 7 8 9
chr2 a b c u j i
chr3 f g h i j k
3.txt
Quote:
chr1 1 2 3 10 11 12
chr2 a b c 3 4 5
chr3 f g h 6 7 8
4.txt
Quote:
chr1 1 2 3 9 8 7
chr2 a b c 0 2 6
chr3 f g h 3 1 2
Output.txt
Quote:
chr1 1 2 3 4 5 6 7 8 9 10 11 12 9 8 7
chr2 a b c d e f u j i 3 4 5 0 2 6
chr3 f g h l o p i j k 6 7 8 3 1 2
Please note that my files are unsorted and not all files have common values. If file one has 100 records, another might have only 50, but some records will be common.

Thanks in advance.
# 6  
Old 02-01-2012
How about this:

Code:
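awk 'FNR==1{f++}                     # f = index of the current input file
{
    key=$1"\t"$2"\t"$3;              # match on the first three columns
    C[key]++;                        # how many lines (across all files) have this key
    for(i=4;i<=NF;i++)
        val[f,key]=val[f,key]"\t"$i  # append the remaining columns
}
END {
   for(key in C) {                   # note: for-in order is unspecified
       # print only keys seen once per file (assumes no duplicate keys within a file)
       if (C[key] == f) {
           printf "%s", key;
           for(c=1;c<=f;c++) printf "%s", val[c,key]
           printf "\n";
       }
   }
}' *.txt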
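
Two things to watch with the script above. First, for (key in C) visits keys in an unspecified order. Second, the Output.txt sample in the question shows the fourth column only once per line, which is what a four-column key would give; with a three-column key the fourth column is repeated for each file. The following is a rough sketch that takes the number of key columns as a parameter and prints keys in the order they first appear. The function name merge_rest and its first argument are assumptions for illustration, not part of the original answer.

```shell
# Hypothetical helper: merge the non-key columns of several files.
# Usage: merge_rest NUM_KEY_COLUMNS FILE...
merge_rest() {
    nkey=$1; shift
    awk -v nkey="$nkey" -v OFS='\t' '
        FNR == 1 { nfiles++ }                    # a new (non-empty) input file starts
        {
            k = $1
            for (i = 2; i <= nkey; i++) k = k OFS $i
            if (!(k in count)) order[++nkeys] = k    # first-appearance order
            if (!((nfiles, k) in infile)) {          # count each file once per key
                infile[nfiles, k] = 1
                count[k]++
            }
            for (i = nkey + 1; i <= NF; i++)         # append the remaining columns
                val[nfiles, k] = val[nfiles, k] OFS $i
        }
        END {
            for (n = 1; n <= nkeys; n++) {
                k = order[n]
                if (count[k] != nfiles) continue     # keep keys present in every file
                line = k
                for (f = 1; f <= nfiles; f++) line = line val[f, k]
                print line
            }
        }
    ' "$@"
}
```

For example, merge_rest 4 1.txt 2.txt 3.txt 4.txt keys on the first four columns, while merge_rest 3 *.txt keeps the three-column behaviour of the script above.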

This User Gave Thanks to Chubler_XL For This Post: