New files based off match or no match


 
# 1  
Old 05-04-2015

I am trying to match $2 in original_targets with $2 in new_targets. If the two numbers match exactly, a line is written to match.txt using the first four fields ($1, $2, $3, $4) of new_targets plus the value of $4 from original_targets. If there is no match, a line is written to a "No Match" file using $1, $2, $3, $4 from original_targets. Thank you.

So, for example,
the first $2 in original_targets is 34529 and that value does not exactly match any $2 in new_targets, so that line is copied to a "No Match" file as
Code:
chr1	34529	35031     DTE3504500000001

the 150 $2 in original_targets is 1114780 and that value matches row 251201 exactly, so it is copied to match.txt as
Code:
chr1	1114780	1115142	PXL-A0000150

Code:
awk 'FNR==NR { E[$2]=$2 ; next } { $2=$2 in E?E[$2]:"No Match" } 1' OFS="\t" original_targets.txt new_targets.txt > match.txt


Last edited by Don Cragun; 05-04-2015 at 07:52 PM.. Reason: Add and fix CODE tags.
# 2  
Old 05-05-2015
I'm very confused.

Your description says that, for matched lines, the output consists of fields 1-4 of new_targets.txt and field 4 from the matched original_targets.txt line. Your example says the matched line is written unchanged. The sample code changes one or more spaces between fields to a single tab and keeps all input fields no matter how many fields are present in the input.

Your description says that fields 1-4 from unmatched lines are written to a different file. Your sample code changes field 2 to the string "No Match", changes every other sequence of one or more spaces in an input line to a tab (keeping all input fields no matter how many are present), and writes the result to the same output file as the matched lines.
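To make that behavior concrete, here is a small sketch that runs the same logic on made-up data. The demo_* file names and their contents are hypothetical (the real input files were not posted), and parentheses have been added around the "in" test purely for readability.

```shell
# Hypothetical sample data: space-separated on purpose, with one extra field.
printf 'chr1 1114780 1115142 PXL-A0000150\n' > demo_original.txt
printf 'chr1 1114780 1115142 NEW-0001 extra\nchr1 2000000 2000500 NEW-0002\n' > demo_new.txt

# Same logic as the posted one-liner: remember every $2 from the first file,
# then overwrite $2 in the second file with either itself or "No Match".
# Assigning to $2 makes awk rebuild the record with OFS (a tab) between fields.
awk 'FNR==NR { E[$2]=$2; next } { $2 = (($2 in E) ? E[$2] : "No Match") } 1' OFS="\t" demo_original.txt demo_new.txt > demo_out.txt

cat demo_out.txt
```

Both lines of demo_new.txt end up in the single file demo_out.txt: the matched line keeps all five fields (now tab-separated), and the unmatched line has "No Match" in place of field 2.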

Your example says that there are 150 matches for field 2 containing 1114780 in original_targets (without .txt), so line 251201 from one of the files (which file is not specified) is to be copied to match.txt without adding field #4 from any of the 150 matching lines from original_targets.txt.

Please rewrite your requirements clearly, provide (small) samples of the two input files, and a corresponding sample output file that should be produced from those sample input files.
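For reference, one possible reading of the written description (not necessarily what was intended) can be sketched as follows. The sample_* file names, the sample data, and the output name sample_no_match.txt are all assumptions made for illustration; nothing here comes from the real input files.

```shell
# Hypothetical tab-separated sample inputs.
printf 'chr1\t34529\t35031\tDTE3504500000001\nchr1\t1114780\t1115142\tPXL-A0000150\n' > sample_original.txt
printf 'chr1\t1114780\t1115142\tNEW-0001\nchr1\t2000000\t2000500\tNEW-0002\n' > sample_new.txt

# Pass 1 (sample_new.txt): remember fields 1-4, keyed by field 2.
# Pass 2 (sample_original.txt): if field 2 was seen in pass 1, write the
# remembered fields plus the original field 4 to sample_match.txt;
# otherwise write the original fields 1-4 to sample_no_match.txt.
awk -v OFS='\t' '
FNR == NR   { new[$2] = $1 OFS $2 OFS $3 OFS $4; next }
($2 in new) { print new[$2], $4 > "sample_match.txt"; next }
            { print $1, $2, $3, $4 > "sample_no_match.txt" }
' sample_new.txt sample_original.txt
```

With these samples, the 1114780 line lands in sample_match.txt (with PXL-A0000150 appended as a fifth field) and the 34529 line lands in sample_no_match.txt. If field 2 can repeat in the new file, only the last occurrence is remembered, which is one of the details the requirements would need to settle.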
# 3  
Old 05-05-2015
I apologize for the confusion. I think I have found a method to do what I need. There seems to be one remaining issue, but I am going to try to figure it out and will re-post if I need assistance. Thank you again.

I was able to figure it out using your explanation from this post:

Match or no match

Last edited by cmccabe; 05-05-2015 at 12:46 PM.. Reason: Thank you