Sponsored Content
Top Forums Shell Programming and Scripting Compare files column to column based on keys Post 302408874 by blackjack101 on Tuesday 30th of March 2010 05:50:30 PM
Old 03-30-2010
Compare files column to column based on keys

Here is my situation. I need to compare two tab separated files (diff is not useful since there could be known difference between files).

I have found similar posts , but not fully matching.I was thinking of writing a shell script using cut and grep and while loop but after going thru posts it appears awk or perl would be more appropriate. Thanks

req 1- I need to extract keys from file1 (two columns) and match with file2 based on keys. This is done by the following which I found in other posts.

Code:
 
awk -F"\t" '
    FILENAME=="f1.txt" {
        Keys[$1 $2]++
    }
    FILENAME=="f2.txt" {
        if (Keys[$1 $2] == 0) {
            print $0
        }
    }
' f1.txt f2.txt > rf.txt

req 2- match rows based on keys for all columns except keys one at a time and produce a report if column value mismatch between files. Report will be examined to ignore some known differences.

File samples -
Code:
f1.txt

210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11
410	998877	web	1003		

f2.txt

210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11.00
410	998877	web	1203		

report -

f2	310	998877	column6 11	11.00
f2	410	998877	column4	1003	1203

 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Sum a column value based on multiple keys

Hi, I have below as i/p file: 5ABC 36488989 K 000010000ASB BYTRES 5PQR 45757754 K 000200005KPC HGTRET 5ABC 36488989 K 000045000ASB HGTRET 5GTH 36488989 K 000200200ASB BYTRES 5FTU ... (2 Replies)
Discussion started by: nirnkv
2 Replies

2. Shell Programming and Scripting

Compare Two Files(Column By Column) In Perl or shell

Hi, I am writing a comparator script, which comapre two txt files(column by column) below are the precondition of this comparator 1)columns of file are not seperated Ex. file1.txt 8888812341181892 1243548895685687 8945896789897789 1111111111111111 file2.txt 9578956789567897... (2 Replies)
Discussion started by: kumar96877
2 Replies

3. Shell Programming and Scripting

Nawk script to compare records of a file based on a particular column.

Hi Gurus, I am struggling with nawk command where i am processing a file based on columns. Here is the sample data file. UM113570248|24-AUG-11|4|man1|RR211 Alert: Master Process failure |24-AUG-11 UM113570624|24-AUG-11|4|man1| Alert: Pattern 'E_DCLeDAOException' found |24-AUG-11... (7 Replies)
Discussion started by: usha rao
7 Replies

4. Shell Programming and Scripting

Compare 2 files and match column data and align data from 3 column

Hello experts, Please help me in achieving this in an easier way possible. I have 2 csv files with following data: File1 08/23/2012 12:35:47,JOB_5330 08/23/2012 12:35:47,JOB_5330 08/23/2012 12:36:09,JOB_5340 08/23/2012 12:36:14,JOB_5340 08/23/2012 12:36:22,JOB_5350 08/23/2012... (5 Replies)
Discussion started by: asnandhakumar
5 Replies

5. Shell Programming and Scripting

Compare based on column value

Hi Experts, I want to compare 2 text files based on their column values text1 is like prd-1234 yes no yes yes prd-2345 no no no yes prd-6475 yes yes yes no and test 2 is prd-1234 no no no yes prd-2345 yes no no no desired out put as follows prd-1234 1 3 prd-235 1 4 basically it shows... (5 Replies)
Discussion started by: tijomonmathew
5 Replies

6. Shell Programming and Scripting

Compare two files based on column

Hi, I have two files roughly 1200 fields in length for each row, sorted on the 2nd field. I need to compare based on that 2nd column between file1 and file2 and print lines that exist in both files into separate files (I can't guarantee that every line in file1 is in file2). Example: File1: ... (1 Reply)
Discussion started by: origon
1 Replies

7. Shell Programming and Scripting

Combine multiple rows based on selected column keys

Hello I want to collapse a file with multiple rows into consolidated lines of entries based on selected columns as the 'key'. Example: 1 2 3 Abc def ghi 1 2 3 jkl mno p qrts 6 9 0 mno def Abc 7 8 4 Abc mno mno abc 7 8 9 mno mno abc 7 8 9 mno j k So if columns 1, 2 and 3 are... (6 Replies)
Discussion started by: linuxlearner123
6 Replies

8. Shell Programming and Scripting

Compare two csv's with column based

Hi, I am having below two CSV's col_1,col_2,col_3 1,2,4 1,3,6 col_1,col_3,col2,col_5,col_6 1,2,3,4,5 1,6,3,,, I need to compare based on the columns where the mismatch is expected output col_1,col_2,col_3 1,2,4 (3 Replies)
Discussion started by: rohit_shinez
3 Replies

9. Shell Programming and Scripting

Need awk or Shell script to compare Column-1 of two different CSV files and print if column-1 matche

Example: I have files in below format file 1: zxc,133,joe@example.com cst,222,xyz@example1.com File 2 Contains: hxd hcd jws zxc cst File 1 has 50000 lines and file 2 has around 30000 lines : Expected Output has to be : hxd hcd jws (5 Replies)
Discussion started by: TestPractice
5 Replies

10. UNIX for Beginners Questions & Answers

UNIX script to compare 3rd column value with first column and display

Hello Team, My source data (INput) is like below EPIC1 router EPIC2 Targetdefinition Exp1 Expres rtr1 Router SQL SrcQual Exp1 Expres rtr1 Router EPIC1 Targetdefinition My output like SQL SrcQual Exp1 Expres Exp1 Expres rtr1 Router rtr1 Router EPIC1 Targetdefinition... (5 Replies)
Discussion started by: sekhar.lsb
5 Replies
DIFF(1) 							   User Commands							   DIFF(1)

NAME
diff - compare files line by line SYNOPSIS
diff [OPTION]... FILES DESCRIPTION
Compare files line by line. -i --ignore-case Ignore case differences in file contents. --ignore-file-name-case Ignore case when comparing file names. --no-ignore-file-name-case Consider case when comparing file names. -E --ignore-tab-expansion Ignore changes due to tab expansion. -b --ignore-space-change Ignore changes in the amount of white space. -w --ignore-all-space Ignore all white space. -B --ignore-blank-lines Ignore changes whose lines are all blank. -I RE --ignore-matching-lines=RE Ignore changes whose lines all match RE. --strip-trailing-cr Strip trailing carriage return on input. -a --text Treat all files as text. -c -C NUM --context[=NUM] Output NUM (default 3) lines of copied context. -u -U NUM --unified[=NUM] Output NUM (default 3) lines of unified context. --label LABEL Use LABEL instead of file name. -p --show-c-function Show which C function each change is in. -F RE --show-function-line=RE Show the most recent line matching RE. -q --brief Output only whether files differ. -e --ed Output an ed script. --normal Output a normal diff. -n --rcs Output an RCS format diff. -y --side-by-side Output in two columns. -W NUM --width=NUM Output at most NUM (default 130) print columns. --left-column Output only the left column of common lines. --suppress-common-lines Do not output common lines. -D NAME --ifdef=NAME Output merged file to show `#ifdef NAME' diffs. --GTYPE-group-format=GFMT Similar, but format GTYPE input groups with GFMT. --line-format=LFMT Similar, but format all input lines with LFMT. --LTYPE-line-format=LFMT Similar, but format LTYPE input lines with LFMT. LTYPE is `old', `new', or `unchanged'. GTYPE is LTYPE or `changed'. GFMT may contain: %< lines from FILE1 %> lines from FILE2 %= lines common to FILE1 and FILE2 %[-][WIDTH][.[PREC]]{doxX}LETTER printf-style spec for LETTER LETTERs are as follows for new group, lower case for old group: F first line number L last line number N number of lines = L-F+1 E F-1 M L+1 LFMT may contain: %L contents of line %l contents of line, excluding any trailing newline %[-][WIDTH][.[PREC]]{doxX}n printf-style spec for input line number Either GFMT or LFMT may contain: %% % %c'C' the single character C %c'OOO' the character with octal code OOO -l --paginate Pass the output through `pr' to paginate it. -t --expand-tabs Expand tabs to spaces in output. -T --initial-tab Make tabs line up by prepending a tab. -r --recursive Recursively compare any subdirectories found. -N --new-file Treat absent files as empty. --unidirectional-new-file Treat absent first files as empty. -s --report-identical-files Report when two files are the same. -x PAT --exclude=PAT Exclude files that match PAT. -X FILE --exclude-from=FILE Exclude files that match any pattern in FILE. -S FILE --starting-file=FILE Start with FILE when comparing directories. --from-file=FILE1 Compare FILE1 to all operands. FILE1 can be a directory. --to-file=FILE2 Compare all operands to FILE2. FILE2 can be a directory. --horizon-lines=NUM Keep NUM lines of the common prefix and suffix. -d --minimal Try hard to find a smaller set of changes. --speed-large-files Assume large files and many scattered small changes. -v --version Output version info. --help Output this help. FILES are `FILE1 FILE2' or `DIR1 DIR2' or `DIR FILE...' or `FILE... DIR'. If --from-file or --to-file is given, there are no restrictions on FILES. If a FILE is `-', read standard input. AUTHOR
Written by Paul Eggert, Mike Haertel, David Hayes, Richard Stallman, and Len Tower. REPORTING BUGS
Report bugs to <bug-gnu-utils@gnu.org>. COPYRIGHT
Copyright (C) 2002 Free Software Foundation, Inc. This program comes with NO WARRANTY, to the extent permitted by law. You may redistribute copies of this program under the terms of the GNU General Public License. For more information about these matters, see the file named COPYING. SEE ALSO
The full documentation for diff is maintained as a Texinfo manual. If the info and diff programs are properly installed at your site, the command info diff should give you access to the complete manual. diffutils 2.8.1 April 2002 DIFF(1)
All times are GMT -4. The time now is 08:43 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy