Compare - 1st col of file


 
# 15  
Old 09-15-2009
Where's your code to produce the 2 desired files?
# 16  
Old 09-15-2009
Quote:
Originally Posted by gch
Except syntax is simpler without awk. Mine was also one line of code and faster. You can test that with command "time". As you noticed I did not have to explain syntax to user.
Well, you will be disappointed. I just ran a benchmark on files with 13000 lines each and here are the results:
Code:
jeanluc@ibm:~/scripts/test$ time nawk -F'|' 'FNR==NR {f1[$1];next} !($1 in f1)' file1 file2 > /dev/null

real	0m0.261s
user	0m0.248s
sys	0m0.008s


jeanluc@ibm:~/scripts/test$ time mawk -F'|' 'FNR==NR {f1[$1];next} !($1 in f1)' file1 file2 > /dev/null

real	0m0.093s
user	0m0.080s
sys	0m0.008s


jeanluc@ibm:~/scripts/test$ time cat file1 file2 | cut -f1 -d \| | sort | uniq -u > /dev/null

real	0m0.943s
user	0m0.888s
sys	0m0.052s
jeanluc@ibm:~/scripts/test$

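In your solution you are using four different external programs: cat, cut, sort and uniq. BTW, cat is the one that could be dropped, as cut accepts both files as arguments; uniq -u cannot be replaced by sort -u, which keeps one copy of every key instead of only the keys that occur once. The penalty for your system (memory and CPU wise) is higher than with a single awk run.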
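To see what the benchmarked commands actually print, it may help to run them on a tiny input. The sketch below is only an illustration: the two three-line sample files are invented, and plain `awk` stands in for nawk/mawk on the assumption that whichever awk is installed accepts the same one-liner. On this input the two approaches answer slightly different questions, which is worth keeping in mind when comparing timings.

```shell
#!/bin/sh
# Work in a scratch directory so no real file1/file2 is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Invented sample data, '|' delimited, key in the first column.
printf '%s\n' 'a|1' 'b|2' 'c|3'    > file1
printf '%s\n' 'b|20' 'c|30' 'd|40' > file2

# awk: while reading file1 (FNR==NR) remember each first field,
# then print the lines of file2 whose first field was never seen in file1.
awk_out=$(awk -F'|' 'FNR==NR {f1[$1]; next} !($1 in f1)' file1 file2)

# pipeline: print the first-column keys that occur exactly once
# across both files taken together.
pipe_out=$(cat file1 file2 | cut -f1 -d'|' | sort | uniq -u)

printf 'awk output:\n%s\n' "$awk_out"
printf 'pipeline output:\n%s\n' "$pipe_out"
```

Here the awk command prints the full line `d|40`, while the pipeline prints the bare keys `a` and `d`.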
# 17  
Old 09-15-2009
From the example he shows, he only needed a third file containing the first-column values that occur only once across both files.
If one needs the full lines from both files, this will do:
Code:
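# Keys (first '|'-delimited field) that occur exactly once across both files
cat file1 file2 | cut -f1 -d'|' | sort | uniq -u |
while IFS= read -r key
do
    # Anchor on the first field so the key cannot match elsewhere in a line
    grep -h "^$key|" file1 >> fil1
    grep -h "^$key|" file2 >> fil2
done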

If one wants to save output, one can redirect it to some file. It still runs faster than awk and it is self-explanatory.

Last edited by gch; 09-15-2009 at 02:24 PM..
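For comparison, the same two output files can probably be produced with a single awk invocation that reads each input twice. This is a sketch under assumptions, not code from the thread: the sample data and the output names `file1.uniq` and `file2.uniq` are invented, and the `pass` variable is a hypothetical helper set through awk's command-line assignments.

```shell
#!/bin/sh
# Work in a scratch directory so no real file1/file2 is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Invented sample data, '|' delimited, key in the first column.
printf '%s\n' 'a|1' 'b|2' 'c|3'    > file1
printf '%s\n' 'b|20' 'c|30' 'd|40' > file2

# Pass 1 counts every first field across both files.
# Pass 2 writes each line whose key was counted exactly once
# to "<input file name>.uniq" (hypothetical output names).
awk -F'|' '
    pass == 1      { count[$1]++; next }
    count[$1] == 1 { print > (FILENAME ".uniq") }
' pass=1 file1 file2 pass=2 file1 file2

cat file1.uniq   # a|1
cat file2.uniq   # d|40
```

The counting step mirrors `sort | uniq -u`: a key is kept only if it appears once in the two files combined.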
# 18  
Old 09-15-2009
Just realised that I used vgersh99's solution. Here are the updated results with Franklin52's solution.

with nawk:
real 0m0.279s
user 0m0.272s
sys 0m0.008s

with mawk:
real 0m0.141s
user 0m0.084s
sys 0m0.016s

with cat | cut | sort | uniq:
real 0m0.943s
user 0m0.888s
sys 0m0.052s
# 19  
Old 09-15-2009
ripat, this is interesting. Which system are you using and which shell?
# 20  
Old 09-15-2009
Linux and ksh. But in this case I don't think that the type of shell is relevant as all solutions are using external programs. I ran that test on large files as one can assume that the OP was just giving a sample and will be working on larger files.
# 21  
Old 09-15-2009
Quote:
Originally Posted by ripat
Linux and ksh. But in this case I don't think that the type of shell is relevant as all solutions are using external programs. I ran that test on large files as one can assume that the OP was just giving a sample and will be working on larger files.
You are right. How big were your files? Only the OS and system load would make a difference. I want to try it on Solaris 10. With large files, there would be a threshold at which awk could gain an advantage. It all depends on how these utilities are written. The multiple commands each have a smaller memory footprint but pass data through several pipes (waiting for I/O).
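One way to look for such a threshold is to repeat the timing at several file sizes. The harness below is a rough sketch, not the setup used in the thread: the generated key format, the 50% key overlap and the chosen sizes are all assumptions, `awk` stands in for nawk/mawk, and `time` is used as the bash keyword so that it covers the whole pipeline.

```shell
#!/bin/bash
# Work in a scratch directory so no real file1/file2 is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for n in 1000 13000 100000
do
    # file1 holds keys 1..n; file2 holds keys n/2+1..n+n/2,
    # so half of the keys overlap (an arbitrary choice).
    awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print i "|payload" }' > file1
    awk -v n="$n" 'BEGIN { h = int(n / 2); for (i = h + 1; i <= n + h; i++) print i "|payload" }' > file2

    echo "== $n lines per file: awk =="
    time awk -F'|' 'FNR==NR {f1[$1]; next} !($1 in f1)' file1 file2 > /dev/null

    echo "== $n lines per file: cat | cut | sort | uniq =="
    time cat file1 file2 | cut -f1 -d'|' | sort | uniq -u > /dev/null
done
```

Results will vary with the OS, the awk implementation and the system load, so several runs per size are advisable before drawing conclusions.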