Comparing 2 text files to get unique values?


 
Post #8 (12-16-2008)
Code:
nawk -v OFS='\t' '$1==1 {$1=$1;print}'

"-v OFS='\t'" - sets the output field separator (OFS) to '\t' (tab).
"$1==1" - if field 1 ($1) equals 1, run the action {...}. Field 1 holds the number of times the line occurs in the combined file, so this selects lines that occur exactly once.

"{$1=$1; print}" - assigning $1 to itself forces awk to rebuild the current record/line, joining its fields with OFS, so the printed line is tab-delimited.

"| cut -f2-" - the default field delimiter of 'cut' is '\t', so this prints everything from field 2 onward and drops the leading count.
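
A minimal sketch of how these pieces might fit together, for illustration only. It assumes the count in field 1 comes from an upstream `sort file1 file2 | uniq -c` stage, which is not shown in this post. The sample file names and contents and the helper name `unique_lines` are hypothetical, and plain `awk` stands in for `nawk`.

```shell
#!/bin/sh
# unique_lines A B: print the lines that occur exactly once across A and B combined.
# Assumption: the occurrence count comes from "sort | uniq -c",
# which emits "<count> <line>" for each distinct line.
unique_lines() {
    sort "$1" "$2" | uniq -c |
        awk -v OFS='\t' '$1==1 {$1=$1; print}' |
        cut -f2-
}

# Hypothetical sample data, created in a scratch directory.
tmp=$(mktemp -d)
printf 'banana\ncherry\nfig\n' > "$tmp/file1"
printf 'banana\ncherry\ndate\n' > "$tmp/file2"

unique_lines "$tmp/file1" "$tmp/file2"   # prints "date" and "fig"

rm -rf "$tmp"
```

Because `$1=$1` rebuilds the whole record, any run of spaces inside a line also comes out as a single tab. A line repeated within one file counts as more than one occurrence and is dropped.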

DIFF(1) 						      General Commands Manual							   DIFF(1)

NAME
       diff - differential file and directory comparator

SYNOPSIS
       diff [ -l ] [ -r ] [ -s ] [ -cefhn ] [ -biwt ] dir1 dir2
       diff [ -cefhn ] [ -biwt ] file1 file2
       diff [ -Dstring ] [ -biw ] file1 file2

DESCRIPTION
       If both arguments are directories, diff sorts the contents of the directories by name, and then runs the regular file diff algorithm (described below) on text files which are different. Binary files which differ, common subdirectories, and files which appear in only one directory are listed. Options when comparing directories are:

       -l     long output format; each text file diff is piped through pr(1) to paginate it, other differences are remembered and summarized after all text file differences are reported.

       -r     causes application of diff recursively to common subdirectories encountered.

       -s     causes diff to report files which are the same, which are otherwise not mentioned.

       -Sname starts a directory diff in the middle beginning with file name.

       When run on regular files, and when comparing text files which differ during directory comparison, diff tells what lines must be changed in the files to bring them into agreement. Except in rare circumstances, diff finds a smallest sufficient set of file differences. If neither file1 nor file2 is a directory, then either may be given as `-', in which case the standard input is used. If file1 is a directory, then a file in that directory whose file-name is the same as the file-name of file2 is used (and vice versa).

       There are several options for output format; the default output format contains lines of these forms:

              n1 a n3,n4
              n1,n2 d n3
              n1,n2 c n3,n4

       These lines resemble ed commands to convert file1 into file2. The numbers after the letters pertain to file2. In fact, by exchanging `a' for `d' and reading backward one may ascertain equally how to convert file2 into file1. As in ed, identical pairs where n1 = n2 or n3 = n4 are abbreviated as a single number.

       Following each of these lines come all the lines that are affected in the first file flagged by `<', then all the lines that are affected in the second file flagged by `>'.

       Except for -b, -w, -i or -t which may be given with any of the others, the following options are mutually exclusive:

       -e     produces a script of a, c and d commands for the editor ed, which will recreate file2 from file1. In connection with -e, the following shell program may help maintain multiple versions of a file. Only an ancestral file ($1) and a chain of version-to-version ed scripts ($2,$3,...) made by diff need be on hand. A `latest version' appears on the standard output.

                     (shift; cat $*; echo '1,$p') | ed - $1

              Extra commands are added to the output when comparing directories with -e, so that the result is a sh(1) script for converting text files which are common to the two directories from their state in dir1 to their state in dir2.

       -f     produces a script similar to that of -e, not useful with ed, and in the opposite order.

       -n     produces a script similar to that of -e, but in the opposite order and with a count of changed lines on each insert or delete command. This is the form used by rcsdiff(1).

       -c     produces a diff with lines of context. The default is to present 3 lines of context and may be changed, e.g. to 10, by -c10. With -c the output format is modified slightly: the output begins with identification of the files involved and their creation dates, and then each change is separated by a line with a dozen *'s. The lines removed from file1 are marked with `- '; those added to file2 are marked `+ '. Lines which are changed from one file to the other are marked in both files with `! '. Changes which lie within <context> lines of each other are grouped together on output. (This is a change from the previous ``diff -c'' but the resulting output is usually much easier to interpret.)

       -h     does a fast, half-hearted job. It works only when changed stretches are short and well separated, but does work on files of unlimited length.

       -Dstring
              causes diff to create a merged version of file1 and file2 on the standard output, with C preprocessor controls included so that a compilation of the result without defining string is equivalent to compiling file1, while defining string will yield file2.

       -b     causes trailing blanks (spaces and tabs) to be ignored, and other strings of blanks to compare equal.

       -w     is similar to -b but causes whitespace (blanks and tabs) to be totally ignored. E.g., ``if ( a == b )'' will compare equal to ``if(a==b)''.

       -i     ignores the case of letters. E.g., ``A'' will compare equal to ``a''.

       -t     will expand tabs in output lines. Normal or -c output adds character(s) to the front of each line which may screw up the indentation of the original source lines and make the output listing difficult to interpret. This option will preserve the original source's indentation.

FILES
       /tmp/d?????
       /usr/libexec/diffh  for -h
       /bin/diff           for directory diffs
       /bin/pr

SEE ALSO
       cmp(1), cc(1), comm(1), ed(1), diff3(1)

DIAGNOSTICS
       Exit status is 0 for no differences, 1 for some, 2 for trouble.

BUGS
       Editing scripts produced under the -e or -f option are naive about creating lines consisting of a single `.'.

       When comparing directories with the -b, -w or -i options specified, diff first compares the files ala cmp, and then decides to run the diff algorithm if they are not equal. This may cause a small amount of spurious output if the files then turn out to be identical because the only differences are insignificant blank string or case differences.

4th Berkeley Distribution        October 21, 1996        DIFF(1)
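
The exit status listed under DIAGNOSTICS can be tested directly in a script. This is a minimal sketch rather than part of the manual page: the helper name `files_status` and the sample files are hypothetical.

```shell
#!/bin/sh
# files_status A B: report how diff's exit status classifies the two files.
# 0 = no differences, 1 = some differences, 2 (or anything else) = trouble.
files_status() {
    diff "$1" "$2" >/dev/null 2>&1
    case $? in
        0) echo same ;;
        1) echo different ;;
        *) echo trouble ;;
    esac
}

# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/old"
printf 'a\nc\n' > "$tmp/new"

files_status "$tmp/old" "$tmp/old"       # same
files_status "$tmp/old" "$tmp/new"       # different
files_status "$tmp/old" "$tmp/missing"   # trouble (file does not exist)

rm -rf "$tmp"
```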