Sponsored Content
Top Forums Shell Programming and Scripting Compare two text file and output the same to third file Post 302601779 by drl on Friday 24th of February 2012 10:52:57 AM
Old 02-24-2012
Hi.

Using your sample data files, here is an example of the use of join, which is unlikely to run into memory limits. There is a lot of set-up and comparison code, so just look at the heart of the computation, the join:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate join.

# Section 1, setup, pre-solution.
# Infrastructure details, environment, debug commands for forum posts. 
# Uncomment export command to run script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
LC_ALL=C ; LANG=C ; export LC_ALL LANG
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
C=$HOME/bin/context && [ -f $C ] && $C join
set -o nounset

FILE1=${1-data1}
shift
FILE2=${1-data2}

# Sample data files.
pe
specimen $FILE1 $FILE2 expected-output.txt
# edges $FILE1 3
# pe
# edges $FILE2 3
# pe
# edges expected-output.txt 3

# Section 2, solution.
pl " Results:"
db " Section 2: solution."
join -t: -1 1 -2 2 -o 2.2,2.1,2.2,2.3 <( sort -t: $FILE1 ) <( sort -t: -k2,2 $FILE2 ) |
sed -e 's/:/ /' |	# first separator only
tee f1


# Section 3, post-solution, check results, clean-up, etc.
v1=$(wc -l <expected-output.txt)
v2=$(wc -l < f1)
pl " Comparison of $v2 created lines with $v1 lines of desired results:"
db " Section 3: validate generated calculations with desired results."

pl " Comparison with desired results:"
if [ ! -f expected-output.txt -o ! -s expected-output.txt ]
then
  pe " Comparison file \"expected-output.txt\" zero-length or missing."
  exit
fi
if cmp expected-output.txt f1
then
  pe " Succeeded -- files have same content."
else
  pe " Failed -- files not identical -- detailed comparison follows."
  if diff -b expected-output.txt f1
  then
    pe " Succeeded by ignoring whitespace differences."
  fi
fi

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
join (GNU coreutils) 6.10

Whole: 5:0:5 of 2 lines in file "data1"
123-23456-2341
783-ueu72-k812

Whole: 5:0:5 of 3 lines in file "data2"
linenumber1-sometext:123-23456-2341:another text
linenumber2--sometext:637-7288-ju18:some other text
linenumber3--sometext:783-ueu72-k812:some more text

Whole: 5:0:5 of 2 lines in file "expected-output.txt"
123-23456-2341 linenumber1-sometext:123-23456-2341:another text
783-ueu72-k812 linenumber3--sometext:783-ueu72-k812:some more text

-----
 Results:
123-23456-2341 linenumber1-sometext:123-23456-2341:another text
783-ueu72-k812 linenumber3--sometext:783-ueu72-k812:some more text

-----
 Comparison of 2 created lines with 2 lines of desired results:

-----
 Comparison with desired results:
 Succeeded -- files have same content.

The join utility requires files ordered on the join field. This means that the result might be out of order with respect to the original positions of lines in the file. If so, you will need to re-order, say on the line number field.

See man join and info coreutils join for details.

Best wishes ... cheers, drl
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

compare file size from a output file from a script

Hi guys, firstly I'm working on SunOS 5.10 Generic_125100-10 sun4u sparc SUNW,Sun-Fire-V240 I've made a script to compress two directory and then send them to an other server via ftp. This is working very well. Inside theis script I decide to log usefull data for troubleshooting in case of... (7 Replies)
Discussion started by: moustik
7 Replies

2. Shell Programming and Scripting

Ping text file of ip addressese and output to text file

I am basically a scripting noob, I have some programming logic, and I wouldn't post here if my 3 hours of searching actually found something. So far this is what I have: " #! /bin/ksh List=./pinglist1.txt cat $List | while read ip do Pingable="" ping $ip -n 2 | awk '/100%/ {print... (11 Replies)
Discussion started by: Lasthitlarry
11 Replies

3. Shell Programming and Scripting

Compare two file & output

Hi every body i have a problem need help urgently file 1 (approx 200K entries) aaaaa bbbb cccccc dddd ffff file 2 (approx 2 million entries) aaaaa,1,ee,44,5t,6y, bbbb,3,ff,66,5u,8r, cccccc, ..... dddd, ..... eeeeee, ..... ffff, ...... (5 Replies)
Discussion started by: The_Archer
5 Replies

4. Shell Programming and Scripting

Dynamic output file generation using a input text file with predefined output format

Hi, I have two files , one file with data file with attributes that need to be sent to another file to generate a predefined format. Example: File.txt AP|{SSHA}VEEg42CNCghUnGhCVg== APVG3|{SSHA}XK|"password" AP3|{SSHA}XK|"This is test" .... etc --------- test.sh has... (1 Reply)
Discussion started by: hudson03051nh
1 Replies

5. Shell Programming and Scripting

Match list of strings in File A and compare with File B, C and write to a output file in CSV format

Hi Friends, I'm a great fan of this forum... it has helped me tone my skills in shell scripting. I have a challenge here, which I'm sure you guys would help me in achieving... File A has a list of job ids and I need to compare this with the File B (*.log) and File C (extend *.log) and copy... (6 Replies)
Discussion started by: asnandhakumar
6 Replies

6. Shell Programming and Scripting

Compare 2 text file with 1 column in each file and write mismatch data to 3rd file

Hi, I need to compare 2 text files with around 60000 rows and 1 column. I need to compare these and write the mismatch data to 3rd file. File1 - file2 = file3 wc -l file1.txt 58112 wc -l file2.txt 55260 head -5 file1.txt 101214200123 101214700300 101250030067 101214100500... (10 Replies)
Discussion started by: Divya Nochiyil
10 Replies

7. UNIX for Dummies Questions & Answers

To compare two files,Output into a new file

Hi Please help me to compare two files and output into a new file file1.txt 15114933 |4001 15291649 |933502 15764675 |4316 15764678 |4316 15761974 |282501 15673104 |933505 15673577 |933505 15673098 |933505 15673096 |933505 15673092 |933505 15760705 ... (13 Replies)
Discussion started by: Ankita Talukdar
13 Replies

8. Shell Programming and Scripting

Compare output of UNIX command and match data to text file

I am working on an outage script and I run a command from the command line which tells me the amount of generator failures in my market. The output of this command only gives me three digits to identify the site by. I have a master list of all sites in a separate file, call it list.txt. If my... (7 Replies)
Discussion started by: jbrass
7 Replies

9. Shell Programming and Scripting

Read in search strings from text file, search for string in second text file and output to CSV

Hi guys, I have a text file named file1.txt that is formatted like this: 001 , ID , 20000 002 , Name , Brandon 003 , Phone_Number , 616-234-1999 004 , SSNumber , 234-23-234 005 , Model , Toyota 007 , Engine ,V8 008 , GPS , OFF and I have file2.txt formatted like this: ... (2 Replies)
Discussion started by: An0mander
2 Replies

10. Shell Programming and Scripting

Shell script (sh file) logic to compare contents of one file with another file and output to file

Shell script logic Hi I have 2 input files like with file 1 content as (file1) "BRGTEST-242" a.txt "BRGTEST-240" a.txt "BRGTEST-219" e.txt File 2 contents as fle(2) "BRGTEST-244" a.txt "BRGTEST-244" b.txt "BRGTEST-231" c.txt "BRGTEST-231" d.txt "BRGTEST-221" e.txt I want to get... (22 Replies)
Discussion started by: pottic
22 Replies
COMM(1) 							   User Commands							   COMM(1)

NAME
comm - compare two sorted files line by line SYNOPSIS
comm [OPTION]... FILE1 FILE2 DESCRIPTION
Compare sorted files FILE1 and FILE2 line by line. When FILE1 or FILE2 (not both) is -, read standard input. With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files. -1 suppress column 1 (lines unique to FILE1) -2 suppress column 2 (lines unique to FILE2) -3 suppress column 3 (lines that appear in both files) --check-order check that the input is correctly sorted, even if all input lines are pairable --nocheck-order do not check that the input is correctly sorted --output-delimiter=STR separate columns with STR --total output a summary -z, --zero-terminated line delimiter is NUL, not newline --help display this help and exit --version output version information and exit Note, comparisons honor the rules specified by 'LC_COLLATE'. EXAMPLES
comm -12 file1 file2 Print only lines present in both file1 and file2. comm -3 file1 file2 Print lines in file1 not in file2, and vice versa. AUTHOR
Written by Richard M. Stallman and David MacKenzie. REPORTING BUGS
GNU coreutils online help: <http://www.gnu.org/software/coreutils/> Report comm translation bugs to <http://translationproject.org/team/> COPYRIGHT
Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. SEE ALSO
join(1), uniq(1) Full documentation at: <http://www.gnu.org/software/coreutils/comm> or available locally via: info '(coreutils) comm invocation' GNU coreutils 8.28 January 2018 COMM(1)
All times are GMT -4. The time now is 02:41 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy