Compare - 1st col of file


 
# 8  
Old 09-14-2009
one change to it pls....

This also works fine.
I'm sorry, but could I have one small change in how the output is written?
Data missing from the first file should go into one file, and data missing from the second file should go into another.
# 9  
Old 09-14-2009
Quote:
Originally Posted by swame_sp
This also works fine.
I'm sorry, but could I have one small change in how the output is written?
Data missing from the first file should go into one file, and data missing from the second file should go into another.
Should be something like this:

Code:
awk -F"|" 'NR==FNR{a[$1];next}
$1 in a{a[$1]++;next}
{print > "NotInFirstfile"}
END{for(i in a){if(!a[i]){print i}}}
' Firstfile Secondfile > NotInSecondfile

# 10  
Old 09-14-2009
@Franklin52,

I would really appreciate it if you could explain how it works.
I'm trying to learn.

I have not specified any input file name inside the script, but it still works fine. How is that possible?
How does it identify the source files when creating the two new files above?

Thanks,

Last edited by swame_sp; 09-14-2009 at 05:37 PM..
# 11  
Old 09-15-2009
Code:
awk -F"|" 'NR==FNR{a[$1];next}
$1 in a{a[$1]++;next}
{print > "NotInFirstfile"}
END{for(i in a){if(!a[i]){print i}}}
' Firstfile Secondfile > NotInSecondfile

Explanation:

Code:
-F"|"

Set the field separator to the pipe character, so $1 is everything before the first |.

Code:
'NR==FNR{a[$1];next}

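While the first file is being read (NR==FNR holds only for the first file on the command line), create an element in array a with the first field as its index, then skip to the next line.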
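
To see why NR==FNR singles out the first file, here is a minimal sketch that prints both counters for every line. The file names match the thread, but the sample contents are made up; run it in an empty scratch directory.

```shell
#!/bin/sh
# Hypothetical sample data (not from the thread).
printf 'a|1\nb|2\n' > Firstfile
printf 'b|9\nc|8\n' > Secondfile

# NR counts records across all input files, FNR restarts at 1 for each file,
# so the two are equal only while the first file is being read.
awk -F"|" '{print NR, FNR, FILENAME, (NR==FNR ? "first" : "later")}' \
    Firstfile Secondfile > nr_fnr.txt
cat nr_fnr.txt
```

With this data the third line reads "3 1 Secondfile later": NR keeps counting while FNR has restarted. One caveat that follows from this: if the first file is empty, NR==FNR is also true for the second file.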

The next lines are for processing the second file:

Code:
$1 in a{a[$1]++;next}

If the first field is an index in array a, the key is present in both files: increment that element's value by 1 and read the next line.

Code:
{print > "NotInFirstfile"}

If the first field is NOT an index in array a, print the whole line to the file "NotInFirstfile". awk itself opens this file; its name is given inside the script, not on the command line.

Code:
END{for(i in a){if(!a[i]){print i}}}

Finally, once all input has been read, we print the indexes of array a whose value is still empty, which awk treats as 0 (they were never incremented while reading the second file).
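
The END test relies on how awk treats an array element that was only referenced and never assigned. A small sketch of that behaviour, using a made-up index "x" and a made-up output file name:

```shell
#!/bin/sh
# A BEGIN-only awk program needs no input file.
awk 'BEGIN {
    a["x"]                                  # referencing the element creates it, with an empty value
    if ("x" in a) print "x exists"
    if (!a["x"])  print "x is empty, treated as 0"
    a["x"]++                                # the empty value counts as 0, so this gives 1
    print "after increment:", a["x"]
}' > uninit_demo.txt
cat uninit_demo.txt
```

This is why a key seen only in the first file still fails the `a[i]` test in END, while a key seen in both files has a value of at least 1.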

Code:
' Firstfile Secondfile > NotInSecondfile

Firstfile and Secondfile are the input files; awk reads them in the order given. The output of the print in the END section goes to standard output, which the shell redirects to the file NotInSecondfile.
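
Putting the pieces together, here is a small end-to-end run of the command. The three-line sample files are invented for illustration; run it in an empty scratch directory.

```shell
#!/bin/sh
# Hypothetical sample data: key 101 is only in Firstfile, key 104 only in Secondfile.
printf '101|alice\n102|bob\n103|carol\n' > Firstfile
printf '102|bob\n103|carol\n104|dave\n' > Secondfile

# The command from the thread, unchanged.
awk -F"|" 'NR==FNR{a[$1];next}
$1 in a{a[$1]++;next}
{print > "NotInFirstfile"}
END{for(i in a){if(!a[i]){print i}}}
' Firstfile Secondfile > NotInSecondfile

echo "NotInFirstfile:";  cat NotInFirstfile
echo "NotInSecondfile:"; cat NotInSecondfile
```

With this data NotInFirstfile holds the full line 104|dave, while NotInSecondfile holds only the key 101, because the array stores first fields and not whole lines. With more than one missing key, the order of the lines in NotInSecondfile is not guaranteed, since `for (i in a)` visits indexes in an unspecified order.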


Regards
# 12  
Old 09-15-2009
Quote:
Originally Posted by Franklin52
But the use of 4 external programs is not the most efficient way, try this:

Code:
awk -F"|" 'NR==FNR{a[$1];next}
$1 in a{a[$1]++;next}
{print}
END{for(i in a){if(!a[i]){print i}}}
' Firstfile Secondfile

awk is the least efficient program to use. If you look at the awk binary, it is about half the size of ksh, which means you are loading all of that on top of the shell you are already using. On top of that, it makes code hard to read and debug, and it keeps programmers from gaining in-depth experience with UNIX. It was developed at a time when only sh was available, and sh could not process and format character strings. That need vanished with the advent of ksh and bash. Right now awk is a crutch for people who never really learned UNIX commands.

These are the byte counts of the binaries:

Code:
# cd /usr/bin
# wc -c ksh
  171412 ksh
# wc -c awk
   80184 awk
# wc -c sort
    5816 sort
# wc -c uniq
   10036 uniq
# wc -c cut
    9928 cut

Which of these takes more computer resources?
# 13  
Old 09-15-2009
I am an awk *and* ksh user, and I tend to pick the right tool for the job. For the question raised in the OP, awk *is* the right tool. Try to achieve the same result with ksh, or any other shell, with just one line of code. Oh, and awk will be _much_ faster, too.
# 14  
Old 09-15-2009
Except that the syntax is simpler without awk. Mine was also one line of code, and faster. You can test that with the "time" command. As you noticed, I did not have to explain the syntax to the user.
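
For anyone who wants to compare the two approaches on their own data, here is a rough sketch. The cut/sort/comm pipeline is an assumed stand-in, not the code from the earlier post, and the sample files and output names are made up. The awk command is the one from the thread, changed only to print keys so the two results can be compared directly.

```shell
#!/bin/sh
# Hypothetical sample data; run in an empty scratch directory.
printf '101|alice\n102|bob\n103|carol\n' > Firstfile
printf '102|bob\n103|carol\n104|dave\n' > Secondfile

# Pipeline variant (assumed stand-in): extract the keys, sort them, compare with comm.
cut -d"|" -f1 Firstfile  | sort -u > keys1
cut -d"|" -f1 Secondfile | sort -u > keys2
comm -23 keys1 keys2 > comm_NotInSecondfile   # keys found only in Firstfile
comm -13 keys1 keys2 > comm_NotInFirstfile    # keys found only in Secondfile

# awk variant from the thread, printing $1 only so both outputs are key lists.
awk -F"|" 'NR==FNR{a[$1];next}
$1 in a{a[$1]++;next}
{print $1 > "awk_NotInFirstfile"}
END{for(i in a){if(!a[i]){print i}}}
' Firstfile Secondfile | sort > awk_NotInSecondfile

# Both variants should report the same keys.
cmp comm_NotInSecondfile awk_NotInSecondfile &&
cmp comm_NotInFirstfile  awk_NotInFirstfile  &&
echo "same keys"
```

To compare speed, each variant can be saved as its own script and run under the shell's `time` on realistically sized files; a three-line sample says nothing about performance.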