  #1  
Old 02-08-2010
Several file comparison not uniq or comm command

When comparing several files, is there a way to find the values unique to each file?

File1

Code:
a
b
c
d

File2

Code:
a
b
t

File3

Code:
a
c
h

I want to print d for File1, because it is not in the other two files,
t for File2, and
h for File3.

Many thanks

L

Last edited by zaxxon; 02-09-2010 at 02:52 AM.. Reason: use code tags please, ty
  #2  
Old 02-09-2010
zaxxon (Forum Staff)
Why not uniq or comm? They should be available on all Unixes and Linuxes.
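
For reference, here is a rough sketch of how comm could be applied to more than two files: compare each file against the sorted union of all the others and keep only column 1. The file names (file1 to file3) and the scratch files others.tmp and others.sorted are made up for this demo, and it assumes plain sh with POSIX sort and comm.

```shell
#!/bin/sh
# Sketch only: file names and scratch files are assumptions for this demo.
LC_ALL=C ; export LC_ALL          # make sort and comm agree on ordering
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' a b c d > file1
printf '%s\n' a b t   > file2
printf '%s\n' a c h   > file3

set -- file1 file2 file3
comm_out=$(
  for f in "$@"
  do
    # Collect the contents of every file except $f.
    : > others.tmp
    for g in "$@"
    do
      [ "$g" = "$f" ] || cat "$g" >> others.tmp
    done
    sort -u others.tmp > others.sorted
    # comm -23 keeps lines found only in the first input ($f, read from stdin).
    sort -u "$f" | comm -23 - others.sorted | sed "s/^/$f: /"
  done
)
printf '%s\n' "$comm_out"
```

With the sample data from post #1 this should report d for file1, t for file2 and h for file3. The cost is one extra sort per file, so it only makes sense for a handful of small files.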
  #3  
Old 02-09-2010
Assuming that this is small scale with small text files, we can first find the values that occur only once across the whole list of files, and then look each of those values up in the original files to see which file it came from.
For example:


Code:
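# uniq -u keeps only the lines that occur exactly once across all the files
sort filename* | uniq -u | while IFS= read -r line
do
       echo "Unique value found : ${line} : in file"
       # -l: print the file name only, -x: match the whole line, -F: fixed string, not a regex
       grep -lxF -e "${line}" filename*
done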
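
One caveat with the loop above: a value that is repeated inside a single file is dropped by uniq -u even though no other file has it. A possible one-pass alternative in awk, sketched here under the assumption that the inputs are called file1 to file3 (made-up names for the demo), counts each value once per file:

```shell
#!/bin/sh
# Sketch only: the directory and file names are assumptions for this demo.
LC_ALL=C ; export LC_ALL
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' a b c d > file1
printf '%s\n' a b t   > file2
printf '%s\n' a c h   > file3

result=$(
  awk '
    # Count each (value, file) pair once, so repeats inside one file are harmless.
    !seen[$0, FILENAME]++ { nfiles[$0]++ ; where[$0] = FILENAME }
    # Report the values that were seen in exactly one file.
    END { for (v in nfiles) if (nfiles[v] == 1) print where[v] ": " v }
  ' file1 file2 file3 | sort
)
printf '%s\n' "$result"
```

The trailing sort is only there because awk's for-in loop visits the array in no particular order. Everything is held in memory, which fits the "small text files" assumption.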


Last edited by methyl; 02-09-2010 at 02:27 PM.. Reason: typos
  #4  
Old 02-19-2010
drl (Forum Advisor)
Hi.

If I needed to get this done quickly, I would make use of the usual *nix commands. I would add the file name to each line, then manipulate the results so that I had a single file, sort it, collect the lines on which the data items were the same, and then filter for lines which had exactly 2 fields. For example:

Code:
#!/usr/bin/env bash

# @(#) s2	Demonstrate solving the problem of unique values with collection.

# Infrastructure details, environment, commands for forum posts. 
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo ; echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p awk
# set -o nounset

rm -f t1 t2
for file in data*
do
  # Append a tab and the file name to every line (\t in the replacement needs GNU sed).
  sed "s/$/\t$file/" "$file" >> t1
done

echo
echo " Sample at beginning & end of $( wc -l < t1) lines in combined data file:"
head -3 t1
echo ...
tail -3 t1

echo
echo " Collector script:"
cat collect

echo
echo " Results for lines with 2 fields:"
sort t1 |
./collect |
tee t2 |
awk ' NF == 2 '

echo
echo " Intermediate file from awk collector script:"
cat t2

exit 0

producing for your data:

Code:
% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
GNU Awk 3.1.5

 Sample at beginning & end of 10 lines in combined data file:
a	data1
b	data1
c	data1
...
a	data3
c	data3
h	data3

 Collector script:
#!/usr/bin/env sh

# @(#) collect	Demonstrate collection script, awk.

FILE="$1"

# Use nawk or /usr/xpg4/bin/awk on Solaris.

awk '
BEGIN	{ FS = OFS = "\t" ; previous = "" ; line = "" ; first = "true"}
first == "true" { first = "false" ; previous = $1 ; line = $0 ; next }
$1 == previous	{ line = line "\t" $2 ; next }
		{ print line ; previous = $1 ; line = $0 }
END	{ print line }
' $FILE

exit 0

 Results for lines with 2 fields:
d	data1
h	data3
t	data2

 Intermediate file from awk collector script:
a	data1	data2	data3
b	data1	data2
c	data1	data3
d	data1
h	data3
t	data2

The awk script is for this specific instance. If this were going to be an ongoing task, I would write a more general multi-file join, and have a self-join mode when only one file was specified. In fact, all the operations could probably be placed into perl code, so that the data need be touched a minimum number of times.
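
As a rough illustration of touching the data fewer times, the tag, sort, collect and filter steps can be squeezed into a single pipeline with no t1/t2 temporary files. This is only a sketch in awk, not the perl generalization mentioned above. It assumes the sample files are named data1 to data3, and it uses awk's FILENAME for tagging so that GNU sed is not required.

```shell
#!/bin/sh
# Sketch only: condensed tag / sort / collect / filter pipeline.
LC_ALL=C ; export LC_ALL
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' a b c d > data1
printf '%s\n' a b t   > data2
printf '%s\n' a c h   > data3

result=$(
  # Tag each line with a tab and the name of the file it came from.
  awk -v OFS='\t' '{ print $0, FILENAME }' data1 data2 data3 |
  # sort -u also drops a value repeated within the same file.
  sort -u |
  # Collect the file names for each value onto one line.
  awk -F '\t' '
    NR == 1 || $1 != prev { if (NR > 1) print line ; prev = $1 ; line = $0 ; next }
                          { line = line "\t" $2 }
    END                   { if (NR > 0) print line }
  ' |
  # A value with exactly 2 fields was seen in one file only.
  awk -F '\t' 'NF == 2'
)
printf '%s\n' "$result"
```

For the sample data this should give the same three lines as the "Results for lines with 2 fields" section above (d, h and t, each with its file name).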

Best wishes ... cheers, drl