Merging columns based on one or more column in two files (Post 302694501 by agama on Thursday 30th of August 2012 10:31:51 PM)
If file1 isn't too large, then this should work:

Code:
# single pass across each file, but requires the entire first file
# to be held in memory which might not be realistic.
# order is preserved based on file2
awk '
    NR == FNR { cache[$1] = $0; next; }
    $1 in cache {
        printf( "%s", cache[$1] );
        $1 = "";
        print;
    }
' file1 file2 >output

If file1 is large (i.e. it's not practical to cache it in memory), then this is one way. It may not be the most efficient, but it should work. The output is sorted by the first field.

Code:
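# multiple passes across the data, but memory requirement is eliminated
# order of file2 is not preserved.
# note: the sort key is compared numerically (-k 2n,2); use -k 2,2 if the
# match column is not numeric.
(
    sed 's/^/a /' file1
    sed 's/^/b /' file2
) | sort -k 2n,2 -k 1,1 | awk '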
    $1 == "a" {
        x = $2;
        $1 = "";
        cache = $0;
        next;
    }
    $2 == x {
        $1 = $2 ="";
        printf( "%s%s\n", substr( cache, 2 ), $0 );
    }
'

You could do this without the seds, and depend on the number of columns to determine if an unmatched pair exists, but this works without having to know the exact layout of either file, other than the desired column to compare.

Yes, multiple columns can be used to match.
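
For example, here is a minimal sketch of the in-memory approach keyed on the first two columns instead of one. The function name merge_two_keys, the temp file names, and the sample data are made up for illustration; the only real change is the compound array subscript, which awk joins with SUBSEP.

```shell
#!/bin/sh
# Sketch only: match on columns 1 AND 2 rather than column 1 alone.
# merge_two_keys is a hypothetical helper name, not from the original post.
merge_two_keys() {
    awk '
        # first file: cache each line under the compound key (col1, col2)
        NR == FNR { cache[$1, $2] = $0; next }
        # second file: print the cached line plus the non-key columns
        ($1, $2) in cache {
            printf( "%s", cache[$1, $2] )
            $1 = $2 = ""
            sub( /^ +/, " " )   # collapse the blanks left by the emptied key fields
            print
        }
    ' "$1" "$2"
}

# made-up sample data
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'chr1 100 A' 'chr1 200 B' 'chr2 100 C' > "$tmp/file1"
printf '%s\n' 'chr2 100 x' 'chr1 100 y' 'chr3 500 z' > "$tmp/file2"

merge_two_keys "$tmp/file1" "$tmp/file2"
# prints:
# chr2 100 C x
# chr1 100 A y
```

As with the single-column version, the output order follows file2 and the whole of file1 is held in memory. To match on more columns, add them to the subscript and to the list of fields that get blanked.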

Last edited by agama; 08-30-2012 at 11:38 PM. Reason: small efficiency change.
 
