Match value in column and append file with new values


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Match value in column and append file with new values
# 1  
Old 09-10-2014
Match value in column and append file with new values

Hi,

I need help to match two files based on two columns.

file_1

Code:
ID	AA	An	Ca	Ele	Pro	Su	Ot	Tra
g13950	No	No	Yes	No	Yes	Yes	Yes	Yes
g05760	Yes	No	No	No	No	Yes	Yes	Yes
g12640	No	No	No	No	No	No	No	No
k17720	No	Yes	No	No	No	No	No	Yes
g05640	Yes	Yes	Yes	No	No	Yes	Yes	Yes

file_2
Code:
192.1	g13950	192.168.1.2.58	79%	324	BRK brag	11.1663	Dressrossa season
192.1	k17720	192.168.1.2.77	81%	777	BRK brag	12.9651	Dragon era
192.1	g12640	192.168.1.2.78	73%	569	FA match	26.4916	Dragon era
192.1	g05640	192.168.1.2.86	93%	961	Zorro Crack	20.1251	In the sky
188.1	g05760	192.168.1.2.86	91%	1225	Lucy Sanji	25.974	Island


$1 in file_1 need to match with $2 in file_2. When there is a match, that match with "Yes" should be changed according to their column header except for the last column (Tra) of file_1. If there is a match but all columns in file_1 is 'No', then the last 2 columns in output file should be set "-" and 'No' respectively. The results need to be appended at the end of the output file as follows:-

Output
Code:
192.1	g13950	192.168.1.2.58	79%	324	BRK brag	11.1663	Dressrossa season	Ca,Pro,Su,Ot	Yes
192.1	k17720	192.168.1.2.77	81%	777	BRK brag	12.9651	Dragon era	An	Yes	
192.1	g12640	192.168.1.2.78	73%	569	FA match	26.4916	Dragon era	-	No
192.1	g05640	192.168.1.2.86	93%	961	Zorro Crack	20.1251	In the sky	AA,An,Ca,Su,Ot	Yes
188.1	g05760	192.168.1.2.86	91%	1225	Lucy Sanji	25.974	Island	AA,Su,Ot	Yes

This is really complicated for me to work on and i have no idea how to do it. I googled and looked for related threads but found only the way how to convert from columns to row but i could not figure out how i can get the output as i expected above Smilie. appreciate your kind help on this. thanks.
# 2  
Old 09-10-2014
And if you have a line in file_1 like:
Code:
k12345	No	No	No	No	No	No	No	Yes

what is supposed to be added to the corresponding line in file_2?
Code:
	-	No

or:
Code:
	-	Yes

This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 09-10-2014
Hello,

Following may help you. (It will give Tra word at last)

Code:
cat check_file2.ksh
awk 'NR==FNR{
{if(FNR==1)
        {for(i=1;i<=NF;i++)
                {Y[i]=$i}
        }
}
{if(NR>1)
        {for(j=1;j<=NF;j++)
                {if($j == "Yes")
                        {A=A?A","Y[j]:Y[j]}
                                {if(j==NF && A)
                                        {X[$1]=A OFS "Yes";A=""}
                                }
 
                {if($j == "No")
                        {p++}
                }
                {if(p==NF)
                        {X[$1]="-" OFS "No";}
                }
                }
 
        }
}
            }
($2 in X) {print $0 OFS X[$2]}
' OFS="\t" file_1 file_2

Output will be as follows.

Code:
192.1   g13950  192.168.1.2.58  79%     324     BRK brag        11.1663 Dressrossa season       Ca,Pro,Su,Ot,Tra        Yes
192.1   k17720  192.168.1.2.77  81%     777     BRK brag        12.9651 Dragon era      An,Tra  Yes
192.1   g12640  192.168.1.2.78  73%     569     FA match        26.4916 Dragon era      -       No
192.1   g05640  192.168.1.2.86  93%     961     Zorro Crack     20.1251 In the sky      AA,An,Ca,Su,Ot,Tra      Yes
188.1   g05760  192.168.1.2.86  91%     1225    Lucy Sanji      25.974  Island  AA,Su,Ot,Tra    Yes


EDIT: Following command will remove Tra word as per your condition.

Code:
cat check_file2.ksh
awk 'NR==FNR{
{if(FNR==1)
        {for(i=1;i<=NF;i++)
                {Y[i]=$i}
        }
}
{if(NR>1)
        {for(j=1;j<=NF;j++)
                {if($j == "Yes" && Y[j] !~ "Tra")
                        {A=A?A","Y[j]:Y[j]}
                                {if(j==NF && A)
                                        {X[$1]=A OFS "Yes";A=""}
                                }
 
                {if($j == "No")
                        {p++}
                }
                {if(p==NF)
                        {X[$1]="-" OFS "No";}
                }
                }
 
        }
}
            }
($2 in X) {print $0 OFS X[$2]}
' OFS="\t" file_1 file_2

Output will be as follows.

Code:
192.1   g13950  192.168.1.2.58  79%     324     BRK brag        11.1663 Dressrossa season       Ca,Pro,Su,Ot    Yes
192.1   k17720  192.168.1.2.77  81%     777     BRK brag        12.9651 Dragon era      An      Yes
192.1   g12640  192.168.1.2.78  73%     569     FA match        26.4916 Dragon era      -       No
192.1   g05640  192.168.1.2.86  93%     961     Zorro Crack     20.1251 In the sky      AA,An,Ca,Su,Ot  Yes
188.1   g05760  192.168.1.2.86  91%     1225    Lucy Sanji      25.974  Island  AA,Su,Ot        Yes

Thanks,
R. Singh

Last edited by RavinderSingh13; 09-10-2014 at 02:56 AM.. Reason: changed OFS to tab + improved solution which will remove Tra in Output
This User Gave Thanks to RavinderSingh13 For This Post:
# 4  
Old 09-10-2014
Code:
awk -F '\t' 'NR == FNR{if(NR == 1) {for(i=2; i<=NF-1; i++) H[i]=$i; next};
  for(i=2; i<=NF-1; i++) if($i == "Yes")
    A[$1] = (A[$1] == "") ? H[i] : (A[$1] "," H[i]); B[$1] = $NF; next}
{print $0, (A[$2] == "") ? "-" : A[$2], B[$2]}' OFS='\t' file_1 file_2

This User Gave Thanks to SriniShoo For This Post:
# 5  
Old 09-10-2014
Quote:
Originally Posted by Don Cragun
And if you have a line in file_1 like:
Code:
k12345	No	No	No	No	No	No	No	Yes

what is supposed to be added to the corresponding line in file_2?
Code:
	-	No

or:
Code:
	-	Yes

Hi Don,

forgot that one. sorry bout that. it should be the second one:
Code:
	-	Yes

[/QUOTE]

thanks.

---------- Post updated at 09:23 AM ---------- Previous update was at 08:58 AM ----------

Hi RavinderSingh13,

Thanks so much.. It worked perfectly on my sample data. I just need to add a little bit code due to situation highlighted by Don Crugan. Will look into that. Thanks

---------- Post updated at 09:33 AM ---------- Previous update was at 09:23 AM ----------

Quote:
Originally Posted by SriniShoo
Code:
awk -F '\t' 'NR == FNR{if(NR == 1) {for(i=2; i<=NF-1; i++) H[i]=$i; next};
  for(i=2; i<=NF-1; i++) if($i == "Yes")
    A[$1] = (A[$1] == "") ? H[i] : (A[$1] "," H[i]); B[$1] = $NF; next}
{print $0, (A[$2] == "") ? "-" : A[$2], B[$2]}' OFS='\t' file_1 file_2

Hi SriniShoo,

Your code worked perfectly too on my sample data. It is simpler but a little bit difficult for me to understand. Can you pls explain your code? thanks.

Last edited by redse171; 09-10-2014 at 10:29 AM..
# 6  
Old 09-10-2014
Here is how I would do it (now that we know how you want to handle the case that wasn't covered by your description). It is similar to SriniShoo's code, but formatted in a way that I find easier to follow:
Code:
awk '
BEGIN {	FS = OFS = "\t"		# Set input and output field separators
}
NR == 1 {			# Grab field headers from 1st line in 1st file
	for(i = 2; i < NF; i++)
		h[i] = $i
	next			# Skip to next input line
}
FNR == NR {			# Gather data for field 1 in 1st file
	d[$1]			# Initialize to empty string
	for(i = 2; i < NF; i++)	# For each Yes add header to the list...
		if($i == "Yes")
			d[$1] = ((d[$1] == "") ? "" : d[$1] ",") h[i]
	if(d[$1] == "")		# Set last field based on what we have found
		d[$1] = "-" OFS $NF	# No "Yes"; use lsat field on line
	else	d[$1] = d[$1] OFS "Yes"	# 1 or more "Yes"; "Yes"j
	next			# Skip to next input llne
}
{	print $0, d[$2]		# Print lines from 2nd file with data from 1st file
}' file_[12]

This User Gave Thanks to Don Cragun For This Post:
# 7  
Old 09-10-2014
Quote:
Originally Posted by Don Cragun
Here is how I would do it (now that we know how you want to handle the case that wasn't covered by your description). It is similar to SriniShoo's code, but formatted in a way that I find easier to follow:
Code:
awk '
BEGIN {	FS = OFS = "\t"		# Set input and output field separators
}
NR == 1 {			# Grab field headers from 1st line in 1st file
	for(i = 2; i < NF; i++)
		h[i] = $i
	next			# Skip to next input line
}
FNR == NR {			# Gather data for field 1 in 1st file
	d[$1]			# Initialize to empty string
	for(i = 2; i < NF; i++)	# For each Yes add header to the list...
		if($i == "Yes")
			d[$1] = ((d[$1] == "") ? "" : d[$1] ",") h[i]
	if(d[$1] == "")		# Set last field based on what we have found
		d[$1] = "-" OFS $NF	# No "Yes"; use lsat field on line
	else	d[$1] = d[$1] OFS "Yes"	# 1 or more "Yes"; "Yes"j
	next			# Skip to next input llne
}
{	print $0, d[$2]		# Print lines from 2nd file with data from 1st file
}' file_[12]

Hi Don Crugan,

This code worked awesome!! I tried on my real data and the results are perfect. Thanks so much for your detail explanation too. I was working on codes by RavinderSingh13. It worked great on the sample data but i have a problem with my real data, where results for Tra with 'No' are missing together with the one that u highlighted earlier. Trying to figure out the reason and solution.

Last edited by redse171; 09-10-2014 at 10:51 PM.. Reason: english
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Compare 1st column from 2 file and if match print line from 1st file and append column 7 from 2nd

hi I have 2 file with more than 10 columns for both 1st file apple,0,0,0...... orange,1,2,3..... mango,2,4,5..... 2nd file apple,2,3,4,5,6,7... orange,2,3,4,5,6,8... watermerlon,2,3,4,5,6,abc... mango,5,6,7,4,6,def.... (1 Reply)
Discussion started by: tententen
1 Replies

2. Shell Programming and Scripting

awk script to append suffix to column when column has duplicated values

Please help me to get required output for both scenario 1 and scenario 2 and need separate code for both scenario 1 and scenario 2 Scenario 1 i need to do below changes only when column1 is CR and column3 has duplicates rows/values. This inputfile can contain 100 of this duplicated rows of... (1 Reply)
Discussion started by: as7951
1 Replies

3. UNIX for Dummies Questions & Answers

Match sum of values in each column with the corresponding column value present in trailer record

Hi All, I have a requirement where I need to find sum of values from column D through O present in a CSV file and check whether the sum of each Individual column matches with the value present for that corresponding column present in the trailer record. For example, let's assume for column D... (9 Replies)
Discussion started by: tpk
9 Replies

4. Shell Programming and Scripting

awk Print New Column For Every Two Lines and Match On Multiple Column Values to print another column

Hi, My input files is like this axis1 0 1 10 axis2 0 1 5 axis1 1 2 -4 axis2 2 3 -3 axis1 3 4 5 axis2 3 4 -1 axis1 4 5 -6 axis2 4 5 1 Now, these are my following tasks 1. Print a first column for every two rows that has the same value followed by a string. 2. Match on the... (3 Replies)
Discussion started by: jacobs.smith
3 Replies

5. Shell Programming and Scripting

Adding Column Values Using Pattern Match

Hi All, I have a file with data as below: A,FILE1_MYFILE_20130309_1038,80,25.60 B,FILE1_MYFILE_20130309_1038,24290,18543.38 C,FILE1_dsc_dlk_MYFILE_20130309_1038,3,10.10 A,FILE2_MYFILE_20130310_1039,85,110.10 B,FILE2_MYFILE_20130310_1039,10,12.10... (10 Replies)
Discussion started by: angshuman
10 Replies

6. UNIX for Dummies Questions & Answers

Match values/IDs from column and text files

Hello, I am trying to modify 2 files, to yield results in a 3rd file. File-1 is a 8-columned file, separted with tab. 1234:1 xyz1234 blah blah blah blah blah blah 1234:1 xyz1233 blah blah blah blah blah blah 1234:1 abc1234 blah blah blah blah blah blah n/a RRR0000 blah blah blah... (1 Reply)
Discussion started by: ad23
1 Replies

7. UNIX for Dummies Questions & Answers

Comparing two text files by a column and printing values that do not match

I have two text files where the first three columns are exactly the same. I want to compare the fourth column of the text files and if the values are different, print that row into a new output file. How do I go about doing that? File 1: 100 rs3794811 0.01 0.3434 100 rs8066551 0.01... (8 Replies)
Discussion started by: evelibertine
8 Replies

8. UNIX for Dummies Questions & Answers

Match values from 2 files and append certain fields

Hi, I need help on appending certain field in my file1.txt based on matched patterns in file2.txt using awk or sed. The blue color need to match with one of the data in field $2 in file2.txt. If match, BEGIN and FINISHED value in red will have a new value from field $3 and $4 accordingly. ... (1 Reply)
Discussion started by: redse171
1 Replies

9. Shell Programming and Scripting

Match column 3 in file1 to column 1 in file 2 and replace with column 2 from file2

Match column 3 in file1 to column 1 in file 2 and replace with column 2 from file2 file 1 sample SNDK 80004C101 AT XLNX 983919101 BB NETL 64118B100 BS AMD 007903107 CC KLAC 482480100 DC TER 880770102 KATS ATHR 04743P108 KATS... (7 Replies)
Discussion started by: rydz00
7 Replies

10. Shell Programming and Scripting

Finding multiple column values and match in a fixed length file

Hi, I have a fixed length file where I need to verify the values of 3 different fields, where each field will have a different value. How can I do that in a single step. (6 Replies)
Discussion started by: naveen_sangam
6 Replies
Login or Register to Ask a Question