Compare - 1st col of file


 
# 22  
Old 09-15-2009
Code:
$ wc -l file1 file2
  13872 file1
 171942 file2
 185814 total

As for the threshold, I really have no idea, but as the Unix saying goes: "do one thing and do it well". And, again, awk is the right tool for the job. It is hard to beat when it comes to processing data files. Don't be put off by its syntax and paradigm; have a look here:
awk.info
# 23  
Old 09-15-2009
Your script gives a syntax error on Solaris 10.
I don't have time to look into it right now.
# 24  
Old 09-15-2009
Use nawk or /usr/xpg4/bin/awk on Solaris.
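A script that has to run on both Solaris and other systems could pick the interpreter at run time. This is only a minimal sketch: the variable name AWK is made up for illustration, and it assumes that a plain awk found on PATH elsewhere already handles the newer syntax.

```shell
# Prefer the XPG4 awk, then nawk, and fall back to whatever "awk" is on PATH
if [ -x /usr/xpg4/bin/awk ]; then
  AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1; then
  AWK=nawk
else
  AWK=awk
fi

# Quick check that the chosen awk accepts the "in" operator used in this thread
"$AWK" 'BEGIN { a["x"]; if ("x" in a) print "ok" }'
```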
# 25  
Old 09-15-2009
Thanks. nawk does it.
I timed both the nawk script and the command-line version with timex. The results are not really conclusive; the times vary widely from run to run:
nawk
first time

real 0.58
user 0.19
sys 0.06

second time

real 0.35
user 0.18
sys 0.05

modified script (second version, for loop and grep, writing to 2 files)
Code:
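# Keys (field 1, "|"-delimited) that occur exactly once across both files
for i in `cat file1 file2 | cut -f1 -d \| | sort | uniq -u`
do
  echo "$i"
  # Append the lines containing that key to one output file per input file
  grep -h "$i" file1 >> fil1
  grep -h "$i" file2 >> fil2
done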

first time

real 0.63
user 0.01
sys 0.00

second time

real 0.57
user 0.42
sys 0.09

The two files were as follows:
Code:
# wc -c file1 file2
 1574508 file1
 1148400 file2
 2722908 total
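For comparison, the selection done by the for loop above can be sketched as a single awk command that reads each file twice: the first pass counts every key, the second prints the lines whose key occurred exactly once. This is an illustration only, not the script that was timed; the sample data, the temporary directory and the variable names pass, count and only_once are made up. Unlike the grep version, it compares field 1 exactly instead of matching the key anywhere in the line, and it prints to standard output instead of writing two files.

```shell
# Hypothetical sample data: "|"-delimited, key in field 1
tmp=$(mktemp -d)
printf 'a|1\nb|2\nc|3\n' > "$tmp/file1"
printf 'b|9\nc|8\nd|7\n' > "$tmp/file2"

only_once=$(awk -F'|' '
  pass == 1 { count[$1]++; next }   # first pass: count each key over both files
  count[$1] == 1                    # second pass: print lines whose key occurred once
' pass=1 "$tmp/file1" "$tmp/file2" pass=2 "$tmp/file1" "$tmp/file2")

printf '%s\n' "$only_once"
rm -rf "$tmp"
```

The pass=1 and pass=2 operands are ordinary awk command-line assignments; each takes effect when awk reaches it in the argument list, which is what separates the two passes.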

My conclusion:
These tests are not rigorous; the results depend on whatever else is running on the system.
The performance gain is negligible, and both approaches are fine from a performance point of view. Separate Unix commands are much easier to learn, and the resulting scripts are much easier to maintain. If you are not fluent in awk and need a more powerful language, learning Perl is probably a better idea. Awk's syntax is not very friendly, and I think of it as a legacy program.
# 26  
Old 09-16-2009
Thanks, Franklin52, for taking the time to explain this to a newbie. I'm still studying the code you shared. May I ask a few more questions about it, please?

Quote:
Code:
'NR==FNR{a[$1];next}

While reading the first file, define an element of array a with the first field as its index.
I understand that NR and FNR are equal only while the first file is being read. Is this some kind of "if" condition? I could not see any "if" there.
Kindly bear with my ignorance.

As for the solution by gcp, it is self-explanatory and I have no questions about it.
# 27  
Old 09-16-2009
It just defines an array element indexed by the first field, and the expression:

Code:
$1 in a

checks whether that index has been defined in the array.
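On the "if" question: an awk program is a list of pattern { action } rules, so NR==FNR plays the role of the condition and the braces hold the body; no explicit "if" is needed. The sketch below is an illustration only. The full original script is not quoted in this post, so the sample files, the "|" delimiter and the goal (print the lines of the second file whose first field also occurs in the first file) are assumptions, and the variable name matched is made up.

```shell
# Hypothetical sample data
tmp=$(mktemp -d)
printf 'apple|1\nmango|2\n' > "$tmp/file1"
printf 'apple|x\norange|y\nmango|z\n' > "$tmp/file2"

matched=$(awk -F'|' '
  # NR counts records over all input; FNR restarts at 1 for each file,
  # so NR==FNR is true only while the first file is being read.
  NR == FNR { a[$1]; next }   # remember field 1 as an array index, then skip to the next record
  $1 in a                     # second file: a pattern with no action prints the matching line
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$matched"
rm -rf "$tmp"
```

One caveat: if the first file is empty, NR==FNR is also true while the second file is read, so the first rule would consume those lines instead.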

Regards
