comparing 2 text files to get unique values??

#8, posted 12-16-2008
Code:
nawk -v OFS='\t' '$1==1 {$1=$1;print}' | cut -f2-

" -v OFS='\t' " - sets the output field separator (OFS) to a tab.
" $1 == 1 " - if the first field ($1), which holds the number of occurrences of the line in the combined file, equals 1, then run the action in {...}.

" {$1=$1; print} " - assigning a field to itself forces awk to rebuild the current record (line), joining all of its fields with OFS, so the output is tab-delimited; print then writes the rebuilt line.

" | cut -f2- " - since cut's default field delimiter is a tab, this outputs everything from field 2 onward, i.e. the line without its leading count.

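For context, here is a minimal end-to-end sketch of the whole pipeline. It assumes that the part not shown in this post fed nawk the output of `sort file1 file2 | uniq -c` (a count followed by each distinct line), which matches the explanation of `$1 == 1` above; the function name `unique_lines` and the sample data are hypothetical. It calls `awk` instead of `nawk`, because any POSIX awk accepts this program while `nawk` is not installed everywhere.

```shell
#!/bin/sh
# Sketch: print the lines that occur exactly once across two files.
# Assumption: the elided start of the pipeline was `sort FILE1 FILE2 | uniq -c`.
# unique_lines is a hypothetical helper name.
unique_lines() {
  # uniq -c prefixes each distinct line with its count; keep count == 1,
  # rebuild the record with tab separators, then cut off the count field.
  sort "$1" "$2" | uniq -c |
    awk -v OFS='\t' '$1 == 1 { $1 = $1; print }' |
    cut -f2-
}

# Demo with throwaway files (hypothetical data).
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/file1"
printf 'banana\ncherry\ndate\n'  > "$tmp/file2"
unique_lines "$tmp/file1" "$tmp/file2"   # prints: apple, then date
rm -r "$tmp"
```

Because `$1=$1` rebuilds the whole record, every run of blanks inside a line also becomes a single tab (and leading blanks are lost), so `red  apple` comes back as `red<TAB>apple`. If lines may contain blanks, `sort file1 file2 | uniq -u`, which prints only the lines that are not repeated, selects the same lines without touching the whitespace.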
diff(1) 						      General Commands Manual							   diff(1)

Name
       diff - differential file comparator

Syntax
       diff [options] dir1 dir2
       diff [options] file1 file2

Description
       The diff command compares the contents of files or groups of files and lists any differences it finds. When run on regular files, and when
       comparing text files that differ during directory comparison, diff tells what lines must be changed in the files to bring them into agreement.
       Except in rare circumstances, diff finds a smallest sufficient set of file differences. If neither file1 nor file2 is a directory, then either
       can be specified as `-', in which case the standard input is used. If file1 is a directory, then the file in that directory whose filename
       is the same as the filename of file2 is used, and likewise if file2 is a directory.
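The two argument forms described above can be tried with a short script. It assumes a current diff such as GNU diffutils, which supports both `-` for standard input and a directory paired with a file; all file and directory names are hypothetical.

```shell
#!/bin/sh
# Sketch of the "-" and directory-plus-file argument forms (hypothetical files).
tmp=$(mktemp -d)
mkdir "$tmp/dir"
printf 'alpha\nbeta\n' > "$tmp/notes.txt"
printf 'alpha\nbeta\n' > "$tmp/dir/notes.txt"

# "-" reads the first file from standard input.
if printf 'alpha\nbeta\n' | diff - "$tmp/notes.txt"; then
  echo "stdin matches notes.txt"
fi

# With a directory as file1, diff compares dir/notes.txt against notes.txt.
if diff "$tmp/dir" "$tmp/notes.txt"; then
  echo "dir/notes.txt matches notes.txt"
fi

rm -r "$tmp"
```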

       If both arguments are directories, diff sorts the contents of the directories by name and then runs the regular file algorithm on text files
       that are different. Binary files that differ, common subdirectories, and files that appear in only one directory are listed.
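A small directory comparison shows the behavior described above: a differing text file gets a normal diff, an identical file produces no output, and a file present in only one directory is listed. The names and contents are hypothetical, and the message wording (for example `Only in old: gone.txt`) is what GNU diff prints; other implementations may phrase it differently.

```shell
#!/bin/sh
# Sketch: compare two directories (hypothetical names and contents).
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new"
printf 'one\n'  > "$tmp/old/same.txt"
printf 'one\n'  > "$tmp/new/same.txt"
printf 'red\n'  > "$tmp/old/color.txt"
printf 'blue\n' > "$tmp/new/color.txt"
printf 'x\n'    > "$tmp/old/gone.txt"

# Run from inside $tmp so the report shows short relative paths.
# diff exits with status 1 here because the directories differ.
(cd "$tmp" && diff old new) || echo "diff exit status: $?"

rm -r "$tmp"
```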

Options
       The following options are used when comparing directories:

       -l	 Displays the output in long format. Each text file diff is piped through pr(1) to paginate it; other differences are summarized after all
		 text file differences are reported.

       -n	 Produces a script similar to that of -e, but in reverse order and with a count of changed lines on each insert or delete command.

       -r	 Recursively checks files in common subdirectories.

       -s	 Displays names of files that are the same.

       -Sname	 Starts a directory diff in the middle, beginning with the file name.
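The directory options above can be combined. This sketch assumes GNU diff, whose -r and -s behave as described here: without -r a common subdirectory is only reported, while -r descends into it and -s also names the files that are identical. Names and contents are hypothetical.

```shell
#!/bin/sh
# Sketch of -r and -s on two small trees (hypothetical names and contents).
tmp=$(mktemp -d)
mkdir -p "$tmp/a/sub" "$tmp/b/sub"
printf 'same\n' > "$tmp/a/top.txt"
printf 'same\n' > "$tmp/b/top.txt"
printf 'v1\n'   > "$tmp/a/sub/cfg.txt"
printf 'v2\n'   > "$tmp/b/sub/cfg.txt"

echo "== without -r: sub/ is only reported as a common subdirectory =="
(cd "$tmp" && diff a b) || true

echo "== with -rs: sub/ is compared, and identical files are named =="
# || true: diff exits 1 because sub/cfg.txt differs.
(cd "$tmp" && diff -rs a b) || true

rm -r "$tmp"
```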

       Except for the -b, -i, -t, and -w options, which may be given with any of the others, the following formatting options are mutually exclusive:

       -b	 Ignores trailing blanks (spaces and tabs) and treats other strings of blanks as equal.

       -c	 Displays three lines of context around each change. For backward compatibility, -cn displays n lines of context.

       -C n	 Displays n lines of context around each change. With -c or -C the output format is modified slightly: the output begins with
		 identification of the files involved and their creation dates, and then each change is separated by a line with a dozen
		 asterisks (*). The lines removed from file1 are marked with a minus sign (-); those added to file2 are marked with a plus sign (+).
		 Lines that are changed from one file to the other are marked in both files with an exclamation point (!).

		 Changes within n context lines of each other are grouped together in the output.  This results in output  that  is  usually  much
		 easier to interpret.

       -Dstring  Causes diff to create a merged version of file1 and file2 on the standard output, with C preprocessor controls included so that
		 compiling the result without defining string is equivalent to compiling file1, while defining string yields file2.

       -e	 Writes output as an ed script. In connection with -e, the following shell program can help maintain multiple versions of a file.
		 Only an ancestral file ($1) and a chain of version-to-version scripts ($2,$3,...) made by diff need be available. The latest
		 version appears on the standard output.
		  (shift; cat $*; echo '1,$p') | ed - $1
		 If you specify -e when comparing directories, the result is a script for converting text files that are common to the two
		 directories from their state in dir1 to their state in dir2.

       -f	 Writes a script similar to that of -e, but in the opposite order; it is not usable with ed.

       -h	 Makes a hasty comparison. It works only when changed portions are short and well separated, but does work on files of unlimited
		 length.

       -i	 Ignores the case of letters. For example, `A' compares equal to `a'.

       -t	 Expands tabs in output lines. Normal or -c output adds characters to the front of each line, which may affect the indentation of
		 the original source lines and make the output listing difficult to interpret. This option preserves the original indentation.

       -w	 Causes whitespace (blanks and tabs) to be totally ignored.  For example, `if ( a == b )' will compare equal to `if(a==b)'.
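The blank- and case-insensitive options above can be checked by looking only at the exit status. The sample lines reuse the `if ( a == b )` example from the -w entry; the file names and the helper `same` are hypothetical, and the results noted in the comments are those of GNU diff, where -b still treats "blank versus no blank" as a difference while -w does not.

```shell
#!/bin/sh
# Sketch: how -b, -w and -i change what diff treats as equal (hypothetical files).
tmp=$(mktemp -d)
printf 'if ( a == b )\n' > "$tmp/spaced.c"
printf 'if(a==b)\n'      > "$tmp/tight.c"
printf 'x  y \n'         > "$tmp/blanks1"   # two inner blanks, one trailing
printf 'x y\n'           > "$tmp/blanks2"
printf 'Hello\n'         > "$tmp/upper"
printf 'hello\n'         > "$tmp/lower"

# same OPTION FILE1 FILE2: hypothetical helper; reports whether diff,
# run with OPTION, finds the two files equal (exit status 0).
same() {
  if diff "$1" "$2" "$3" >/dev/null; then
    printf '%s: equal\n' "$1"
  else
    printf '%s: different\n' "$1"
  fi
}

same -b "$tmp/blanks1"  "$tmp/blanks2"   # -b: equal
same -b "$tmp/spaced.c" "$tmp/tight.c"   # -b: different (blanks added where there were none)
same -w "$tmp/spaced.c" "$tmp/tight.c"   # -w: equal
same -i "$tmp/upper"    "$tmp/lower"     # -i: equal

rm -r "$tmp"
```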

       There are several options for output format; the default output format contains lines of these forms:

	    n1 a n3,n4
	    n1,n2 d n3
	    n1,n2 c n3,n4

       These lines resemble commands to convert file1 into file2. The numbers after the letters pertain to file2. In fact, by exchanging `a' for
       `d' and reading backward, you can tell how to convert file2 into file1. As in ed, identical pairs, where n1 = n2 or n3 = n4, are abbreviated
       as a single number.

       Each of these lines is followed by all the affected lines from the first file, flagged by a left angle bracket (<), and then all the affected
       lines from the second file, flagged by a right angle bracket (>).
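To see the default format next to the -e and -c formats, this sketch diffs one hypothetical pair of files three ways. The expected output in the comments is what GNU diff produces for this input; note that the default format also prints a `---` line between the `<` and `>` lines of a change, and that the -e script lists its changes from the end of the file backward so that earlier line numbers stay valid while it is applied.

```shell
#!/bin/sh
# Sketch: one pair of hypothetical files shown in three output formats.
tmp=$(mktemp -d)
printf 'a\nb\nc\n'    > "$tmp/file1"
printf 'a\nx\nc\nd\n' > "$tmp/file2"

# diff exits 1 because the files differ; || true keeps the script going.
echo "== default =="
diff "$tmp/file1" "$tmp/file2" || true
# 2c2
# < b
# ---
# > x
# 3a4
# > d

echo "== -e (ed script, last change first) =="
diff -e "$tmp/file1" "$tmp/file2" || true
# 3a
# d
# .
# 2c
# x
# .

echo "== -c (context; changed lines marked with !) =="
diff -c "$tmp/file1" "$tmp/file2" || true

rm -r "$tmp"
```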

Restrictions
       Editing scripts produced under the -e or -f option have trouble creating lines consisting of a single period (.).

       When comparing directories with the -b, -i, -t, or -w options specified, diff first compares the files as cmp(1) does, and then runs the diff
       algorithm if they are not equal. If the only differences are in the blank strings, diff may report these as differences.

Diagnostics
       Exit status is 0 for no differences, 1 for some differences, and 2 if the specified file cannot be found.
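Scripts usually act on the exit status rather than parse the output. A minimal sketch follows, with a hypothetical function name; current implementations such as GNU diff also return 2 for other kinds of trouble, not only a missing file. The status is captured with `|| status=$?` so the function also works in a script running under `set -e`.

```shell
#!/bin/sh
# compare_files FILE1 FILE2: hypothetical helper that maps diff's exit
# status to a word: 0 -> same, 1 -> different, anything else -> error.
compare_files() {
  status=0
  diff "$1" "$2" >/dev/null 2>&1 || status=$?
  case $status in
    0) echo same ;;
    1) echo different ;;
    *) echo error ;;
  esac
}

# Demo with throwaway files (hypothetical data).
tmp=$(mktemp -d)
printf 'a\n' > "$tmp/x"
printf 'b\n' > "$tmp/y"
compare_files "$tmp/x" "$tmp/x"        # same
compare_files "$tmp/x" "$tmp/y"        # different
compare_files "$tmp/x" "$tmp/missing"  # error
rm -r "$tmp"
```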

Files
       for		   -h

See Also
       cc(1), cmp(1), comm(1), diff3(1), ed(1)
