Sponsored Content
Top Forums UNIX for Beginners Questions & Answers Data match 2 files based on first 2 columns matching only and join if match Post 303041351 by axis88 on Thursday 21st of November 2019 05:38:16 PM
Old 11-21-2019
Data match 2 files based on first 2 columns matching only and join if match

Hi,

i have 2 files , the data i need to match is in masterfile and i need to pull out column 3 from master if column 1 and 2 match and output entire row to new file

I have tried with join and awk and i keep getting blank outputs or same file

is there an easier way than what i am trying to do, with grep or seb

here is what i have so far
Code:
awk 'FNR == NR { h[$1,$2] = 1; next } h[$1,$2]' file1 file2 > output

awk 'NR==FNR {a[substr($1,1,5)]=$2; next} 
               {print $1,a[$1],$2,$3,a[$3],$4}' file2 file1

awk 'BEGIN{while((getline<"file1")>0)l[$1","$2]=1}!l[$1","$2]' file2

any help would be appreciated, also i will contribute to this forum with anything i can help with

thanks

Last edited by RavinderSingh13; 11-22-2019 at 01:20 AM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Matching same columns and finding the smallest match

Hi all, I am wondering if its possible to solve my problem with a simple code. Basically I have a file that looks like this (tab delimited) bob 8 250 tina 8 225 sam 8 225 ellen 9 315 kyle 9 275 sally 9 135 So what I want to do is match columns 2 and 5. If columns 2 and 5... (2 Replies)
Discussion started by: phil_heath
2 Replies

2. Shell Programming and Scripting

Extract data based on match against one column data from a long list data

My input file: data_5 Ali 422 2.00E-45 102/253 140/253 24 data_3 Abu 202 60.00E-45 12/23 140/23 28 data_1 Ahmad 256 7.00E-45 120/235 140/235 22 data_4 Aman 365 8.00E-45 15/65 140/65 20 data_10 Jones 869 9.00E-45 65/253 140/253 18... (12 Replies)
Discussion started by: patrick87
12 Replies

3. Shell Programming and Scripting

Matching string on two files based on match rules.

Hi, How to check if a string on file2 exactly matches with a part or complete string on file1, and return a match indicator based on some match rules. 1) only records on file1 with category A should be matched. for other category, the output match indicator should default to 'N' 2) on file2... (13 Replies)
Discussion started by: effay
13 Replies

4. Shell Programming and Scripting

Match files based on either of the two columns awk

Dear Shell experts, I have 2 files with structure: File 1: ID and count head test_GI_count1.txt 1000094 2 10039307 1 10039641 1 10047177 11 10047359 1 1008555 2 10120302 1 10120672 13 10121776 1 10121865 32 And 2nd file: head Protein_gi_GeneID_symbol.txt protein_gi GeneID... (11 Replies)
Discussion started by: smitra
11 Replies

5. Shell Programming and Scripting

Join lines from two files based on match

I have two files. File1 >gi|11320906|gb|AF197889.1|_Buchnera_aphidicola ATGAAATTTAAGATAAAAAATAGTATTTT >gi|11320898|gb|AF197885.1|_Buchnera_aphidicola ATGAAATTTAATATAAACAATAAAA >gi|11320894|gb|AF197883.1|_Buchnera_aphidicola ATGAAATTTAATATAAACAATAAAATTTTT File2 AF197885 Uroleucon aeneum... (2 Replies)
Discussion started by: pathunkathunk
2 Replies

6. UNIX for Advanced & Expert Users

Match and print based on columns

HI, I have 2 different questions in this thread. Consider 2 files as input (input file have different line count ) File 1 1 1 625 56 1 12 657 34 1 9 25 45 1 2 20 54 67 3 25 35 27 4 45 73 36 5 125 56 45 File2 1 1 878 76 1 9 83 67 2 20 73 78 4 47 22 17 3 25 67 99 (4 Replies)
Discussion started by: rossi
4 Replies

7. Shell Programming and Scripting

Join two files combining multiple columns and produce mix and match output

I would like to join two files when two columns in each file matches with each other and then produce an output when taking multiple columns. Like I have file A 1234,ABCD,23,JOHN,NJ,USA 2345,ABCD,24,SAM,NY,USA 5678,GHIJ,24,TOM,NY,USA 5678,WXYZ,27,MAT,NJ,USA and file B ... (2 Replies)
Discussion started by: mady135
2 Replies

8. Shell Programming and Scripting

New files based off match or no match

Trying to match $2 in original_targets with $2 of new_targets . If the two numbers match exactly then a match.txt file is outputted using the information in the new_targets in the beginning 4 fields $1, $2, $3, $4 and value of $4 in the original_targets . If there is "No Match" then a no... (2 Replies)
Discussion started by: cmccabe
2 Replies

9. Shell Programming and Scripting

awk to update file based on partial match in field1 and exact match in field2

I am trying to create a cronjob that will run on startup that will look at a list.txt file to see if there is a later version of a database using database.txt as the source. The matching lines are written to output. $1 in database.txt will be in list.txt as a partial match. $2 of database.txt... (2 Replies)
Discussion started by: cmccabe
2 Replies

10. Shell Programming and Scripting

Comparing two columns in two files and printing a third based on a match

Hello all, First post here. I did not notice a previous post to help me down the right path. I am looking to compare a column in a CSV file against another file (which is not a column match one for one) but more or less when a match is made, I would like to append a third column that contains a... (17 Replies)
Discussion started by: dis0wned
17 Replies
join(1) 						      General Commands Manual							   join(1)

NAME
join - relational database operator SYNOPSIS
[options] file1 file2 DESCRIPTION
forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 or file2 is the standard input is used. file1 and file2 must be sorted in increasing collating sequence (see Environment Variables below) on the fields on which they are to be joined; normally the first in each line. The output contains one line for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists of the common field followed by the rest of the line from file1, then the rest of the line from file2. The default input field separators are space, tab, or new-line. In this case, multiple separators count as one field separator, and lead- ing separators are ignored. The default output field separator is a space. Some of the below options use the argument n. This argument should be a or a referring to either file1 or file2, respectively. Options In addition to the normal output, produce a line for each unpairable line in file n, where n is or Replace empty output fields by string s. Join on field m of both files. The argument m must be delimited by space characters. This option and the following two are provided for backward compatibility. Use of the and options ( see below ) is recommended for portability. Join on field m of file1. Join on field m of file2. Each output line comprises the fields specified in list, each element of which has the form where n is a file number and m is a field number. The common field is not printed unless specifically requested. Use character c as a separator (tab character). Every appearance of c in a line is significant. The character c is used as the field sepa- rator for both input and output. Instead of the default output, produce a line only for each unpairable line in file_number, where file_number is or Join on field f of file 1. Fields are numbered starting with 1. Join on field f of file 2. Fields are numbered starting with 1. EXTERNAL INFLUENCES
Environment Variables determines the collating sequence expects from input files. determines the alternative blank character as an input field separator, and the interpretation of data within files as single and/or multi- byte characters. also determines whether the separator defined through the option is a single- or multi-byte character. If or is not specified in the environment or is set to the empty string, the value of is used as a default for each unspecified or empty variable. If is not specified or is set to the empty string, a default of ``C'' (see lang(5)) is used instead of If any internationaliza- tion variable contains an invalid setting, behaves as if all internationalization variables are set to ``C'' (see environ(5)). International Code Set Support Single- and multi-byte character code sets are supported with the exception that multi-byte-character file names are not supported. EXAMPLES
The following command line joins the password file and the group file, matching on the numeric group ID, and outputting the login name, the group name, and the login directory. It is assumed that the files have been sorted in the collating sequence defined by the or environment variable on the group ID fields. The following command produces an output consisting all possible combinations of lines that have identical first fields in the two sorted files sf1 and sf2, with each line consisting of the first and third fields from and the second and fourth fields from WARNINGS
With default field separation, the collating sequence is that of with the sequence is that of a plain sort. The conventions of and are incongruous. Numeric filenames may cause conflict when the option is used immediately before listing filenames. AUTHOR
was developed by OSF and HP. SEE ALSO
awk(1), comm(1), sort(1), uniq(1). STANDARDS CONFORMANCE
join(1)
All times are GMT -4. The time now is 03:11 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy