Several file comparison not uniq or comm command


 
# 1  
Old 02-08-2010

When comparing several files, is there a way to find the values that are unique to each file?

File1
Code:
a
b
c
d

File2
Code:
a
b
t

File3
Code:
a
c
h

I want to print d for File1, because it is not in the other two files,
t for File2, and
h for File3.

Many thanks

L

Last edited by zaxxon; 02-09-2010 at 03:52 AM.. Reason: use code tags please, ty
# 2  
Old 02-09-2010
Why not uniq or comm? They should be available on all Unixes and Linuxes.
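
For reference, comm can do this, but it only compares two sorted inputs at a time, so each file has to be checked against the merged contents of all the others. A minimal sketch, assuming the sample data from post #1 is saved as File1, File2 and File3 (the scratch directory and the function name unique_per_file are only for this demonstration):

```shell
# Sketch: report values found in only one file, using comm.
# Assumption: the sample data from post #1 is recreated in a scratch directory.
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' a b c d > File1
printf '%s\n' a b t   > File2
printf '%s\n' a c h   > File3

# unique_per_file is a hypothetical helper name, not a standard command.
unique_per_file() {
  for f in "$@"; do
    # Merge every file except "$f" into one sorted, de-duplicated list.
    for g in "$@"; do
      [ "$g" = "$f" ] || cat "$g"
    done | sort -u > others.tmp
    # comm -23 keeps only the lines that appear in the first input alone.
    sort -u "$f" | comm -23 - others.tmp | awk -v f="$f" '{ print f ": " $0 }'
  done
  rm -f others.tmp
}

result=$(unique_per_file File1 File2 File3)
printf '%s\n' "$result"
```

With the sample data this prints "File1: d", "File2: t" and "File3: h". The cost is one sort of the merged data per file, which is fine for a handful of small files.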
# 3  
Old 02-09-2010
Assuming that this is small scale with small text files, we can find the values that occur only once across all of the files and then look each one up in the original files to see which file it came from.
For example:

Code:
cat filename* | sort | uniq -u | while IFS= read -r line
do
       echo "Unique value found : ${line} : in file"
       # -F treats the value as a fixed string rather than a regular expression
       grep -lFx -- "${line}" filename*
done
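
One caveat with the loop above: uniq -u also drops a value that is repeated inside a single file, even if no other file contains it. A sketch of a variant that de-duplicates each file first, assuming three demo files named filename1 to filename3 (hypothetical names and contents, created here only so the example runs):

```shell
# Sketch: same idea as the loop above, but tolerant of values repeated
# inside one file.
# Assumption: filename1..filename3 are demo files created just for this run.
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# "d" occurs twice, but only in filename1.
printf '%s\n' a b c d d > filename1
printf '%s\n' a b t     > filename2
printf '%s\n' a c h     > filename3

result=$(
  # Collapse repeats within each file, then keep the values that occur
  # exactly once across all files.
  for f in filename*; do
    sort -u "$f"
  done | sort | uniq -u |
  while IFS= read -r line; do
    # -l: print only the file name, -F: fixed string, -x: match the whole line
    printf '%s %s\n' "$line" "$(grep -lFx -- "$line" filename*)"
  done
)
printf '%s\n' "$result"
```

For this data the output is "d filename1", "h filename3" and "t filename2".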


Last edited by methyl; 02-09-2010 at 03:27 PM.. Reason: typos
# 4  
Old 02-19-2010
Hi.

If I needed to get this done quickly, I would use the usual *nix commands: append the file name to each line, combine the results into a single file, sort it, collect the lines whose data items are the same onto one line, and then keep only the lines with exactly 2 fields (a value plus the one file it appears in). For example:
Code:
#!/usr/bin/env bash

# @(#) s2	Demonstrate solving the unique-values problem with collection.

# Infrastructure details, environment, commands for forum posts. 
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo ; echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p awk
# set -o nounset

rm -f t1 t2
for file in data*
do
  # \t in the replacement is a GNU sed extension
  sed "s/$/\t$file/" "$file" >> t1
done

echo
echo " Sample at beginning & end of $( wc -l < t1) lines in combined data file:"
head -3 t1
echo ...
tail -3 t1

echo
echo " Collector script:"
cat collect

echo
echo " Results for lines with 2 fields:"
sort t1 |
./collect |
tee t2 |
awk ' NF == 2 '

echo
echo " Intermediate file from awk collector script:"
cat t2

exit 0

producing for your data:
Code:
% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
GNU Awk 3.1.5

 Sample at beginning & end of 10 lines in combined data file:
a	data1
b	data1
c	data1
...
a	data3
c	data3
h	data3

 Collector script:
#!/usr/bin/env sh

# @(#) collect	Demonstrate collection script, awk.

FILE="$1"

# Use nawk or /usr/xpg4/bin/awk on Solaris.

awk '
BEGIN	{ FS = OFS = "\t" ; previous = "" ; line = "" ; first = "true"}
first == "true" { first = "false" ; previous = $1 ; line = $0 ; next }
$1 == previous	{ line = line "\t" $2 ; next }
		{ print line ; previous = $1 ; line = $0 }
END	{ print line }
' $FILE

exit 0

 Results for lines with 2 fields:
d	data1
h	data3
t	data2

 Intermediate file from awk collector script:
a	data1	data2	data3
b	data1	data2
c	data1	data3
d	data1
h	data3
t	data2

The awk script is written for this specific instance. If this were going to be an ongoing task, I would write a more general multi-file join, with a self-join mode for when only one file is specified. In fact, all the operations could probably be placed into perl code, so that the data is touched as few times as possible.
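
As a rough illustration of touching the data only once, the tagging, collecting and filtering steps can be folded into a single pass. This is only a sketch, and it uses awk rather than perl to stay with the tools already shown; the array names seen, files and owner are arbitrary, and the data files are recreated in a scratch directory for the demonstration:

```shell
# Sketch: one awk pass over all files, no intermediate t1/t2 files.
# Assumption: data1..data3 hold the sample values from post #1.
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' a b c d > data1
printf '%s\n' a b t   > data2
printf '%s\n' a c h   > data3

result=$(
  awk '
    # Count each value once per file and remember the last file it came from.
    !seen[FILENAME, $0]++ { files[$0]++ ; owner[$0] = FILENAME }
    # A value counted in exactly one file is unique to that file.
    END { for (v in files) if (files[v] == 1) print v "\t" owner[v] }
  ' data* | sort
)
printf '%s\n' "$result"
```

The result is the same three tab-separated lines as the "Results for lines with 2 fields" section above (d data1, h data3, t data2). The final sort is there only because awk does not guarantee the order of "for (v in files)".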

Best wishes ... cheers, drl
 