Awk: compare values in two columns of the same file


 
Thread Tools Search this Thread
Top Forums UNIX for Beginners Questions & Answers Awk: compare values in two columns of the same file
# 1  
Old 07-24-2019
Awk: compare values in two columns of the same file

I'm trying to learn awk, but I've hit a roadblock with this problem. I have a hierarchy stored in a file with 3 columns:
Code:
id	name	parentID
4	D	2
2	B	1
3	C	1
1	A	5

I need to check if there are any values in column 3 that are not represented anywhere in column 1. I've tried this:
Code:
awk '{arr[$1];} !($3 in arr) {print $0}' file.txt

The desired output would be:
Code:
1	A	5

But it prints the entire file instead. What am I doing wrong?
# 2  
Old 07-24-2019
Hi, to determine if a value is not present in a column, you have to read the entire file first. There are two choices, read the file and put all relevant information in memory and then print the results, or read the same file twice.

With the latter approach, something like this should work:

Code:
awk 'NR==FNR{A[$1]; next} !($3 in A)' file.txt file.txt

Code:
id	name	parentID
1	A	5


--
Note: NR==FNR is a condition that only applies when the file is being read for the first time. The next statement ensures the rest of the code is used when reading the file for the second time.

Last edited by Scrutinizer; 07-24-2019 at 08:35 PM..
These 2 Users Gave Thanks to Scrutinizer For This Post:
# 3  
Old 07-24-2019
Thanks, that worked beautifully! I had a bit of trouble getting it to work in my real life application (a 2 GB file with dozens of columns and over 2 million lines), but I managed to get it to work by specifying the field separator:
Code:
awk -F '\t' 'NR==FNR{A[$1]; next} !($3 in A)' file.txt file.txt

There were blank spaces in some of the fields.

Thanks a lot for your help.

--- Post updated at 02:57 AM ---

Actually, one more thing. The current output includes lines if there's no value in column 3, e. g., with this file:
Code:
id	name	parentID
4	D	2
2	B	1
3	C	1
1	A	5
6	E

I get this result:
Code:
id	name	parentID
1	A	5
6	E

Since the purpose of this exercise is to find parentIDs that are missing from the id column, I am not interested in lines where $3 is empty. How can I get it to omit those?

Last edited by kaktus; 07-24-2019 at 10:37 PM..
# 4  
Old 07-25-2019
Hi, try modifying the condition like so:
Code:
$3!="" && !($3 in A)

This User Gave Thanks to Scrutinizer For This Post:
# 5  
Old 07-25-2019
Works great, thanks!
# 6  
Old 07-25-2019
To avoid one logical test for every line in a file, this (based on Scrutinizer's proposal) might be interesting, too:


Code:
awk 'BEGIN {A[""]} NR==FNR{A[$1]; next} !($3 in A)' file file

This User Gave Thanks to RudiC For This Post:
# 7  
Old 07-25-2019
Thanks, for my 2GB file, this saves a little bit of processing time: 2m38.580s vs. 2m43.779s. Could you please explain what adding A[""] does?
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Compare two files column values using awk

Judi # cat File1 judi /export/home 76 judi /usr 83 judi # judi # cat File2 judi /export/home 79 judi /usr 82 judi # if COLUMN3 of File2 is greater that COLUMN3 of File1, then print File2's lines juid /export/home 79 Code tags please (2 Replies)
Discussion started by: judi
2 Replies

2. Shell Programming and Scripting

Compare 2 columns of files awk

hello everybody I have 2 files the file1 has 10 columns and the form: ... 110103 0802 1.16 38 20.16 22 1.21 8.77 0.00 20 120103 0832 23.40 38 22.10 21 46.35 10.17 0.00 28 120103 1413 45.00 38 24.50 21 48.85 7.89 0.00 38 130103 1112 23.40 38 22.10 21 48.85 ... (5 Replies)
Discussion started by: phaethon
5 Replies

3. Shell Programming and Scripting

How to compare the values of a column in a same file using awk?

Dear Unix experts, I have got a file where I would like to compare the values of second column if first column is same in such a way that the difference between the values is >50. If not, I would like to discard both values. For example, my input file looks like - comp275_c0_seq2 73... (7 Replies)
Discussion started by: utritala
7 Replies

4. Shell Programming and Scripting

[Solved] awk compare two different columns of two files and print all from both file

Hi, I want to compare two columns from file1 with another two column of file2 and print matched and unmatched column like this File1 1 rs1 abc 3 rs4 xyz 1 rs3 stu File2 1 kkk rs1 AA 10 1 aaa rs2 DD 20 1 ccc ... (2 Replies)
Discussion started by: justinjj
2 Replies

5. Shell Programming and Scripting

Compare values in two files. For matching rows print corresponding values from File 1 in File2.

- I have two files (File 1 and File 2) and the contents of the files are mentioned below. - I am trying to compare the values of Column1 of File1 with Column1 of File2. If a match is found, print the corresponding value from Column2 of File1 in Column5 of File2. - I tried to modify and use... (10 Replies)
Discussion started by: Santoshbn
10 Replies

6. Shell Programming and Scripting

How to compare the values of a column in awk in a same file and consecutive lines..

I would like to compare the values of 2nd column of consecutive lines of same file in such a way so that if the difference between first value and second value is more than 100 it should print complete line else ignore line. Input File ========== PDB 2500 RTDB 123 RTDB-EAGLE 122 VSCCP 2565... (4 Replies)
Discussion started by: manuswami
4 Replies

7. Shell Programming and Scripting

Compare columns in two different files using awk

Hi, I want to compare the columns of two files excluding column 2 from both the files. I tried this awk command. awk -F":" 'NR==FNR{++a;next} !(a)' file1.txt file2.txt . Example: File1.txt 123:09-15-2011:abc:123456 123:09-15-2011:abc:234567 123:09-15-2011:abc:345678 ... (5 Replies)
Discussion started by: shell_newbie
5 Replies

8. UNIX for Dummies Questions & Answers

Removing columns from a text file that do not have any values in second and third columns

I have a text file that has three columns. But at the end of the text file, there are trailing lines that have missing second and third columns: 4 0.04972604 KLHL28 4 0.0497332 CSTB 4 0.04979822 AIF1 4 0.04983331 DECR2 4 0.04990344 KATNB1 4 4 4 4 How can I remove the trailing... (3 Replies)
Discussion started by: evelibertine
3 Replies

9. Shell Programming and Scripting

awk compare specific columns from 2 files, print new file

Hello. I have two files. FILE1 was extracted from FILE2 and modified thanks to help from this post. Now I need to replace the extracted, modified lines into the original file (FILE2) to produce the FILE3. FILE1 1466 55.27433 14.72050 -2.52E+03 3.00E-01 1.05E+04 2.57E+04 1467 55.27433... (1 Reply)
Discussion started by: jm4smtddd
1 Replies

10. Shell Programming and Scripting

compare columns for equal values and output a summary

Hi all I am trying to scan a file that has 3 columns: red blue 123351 red blue 848655 red blue 126354 red blue 023158 black white 654896 red blue 650884 I want an output that sums the rows that have matching columns 1 and 2 :wall: red blue has 5 entries black white has 1 entry ... (4 Replies)
Discussion started by: reno
4 Replies
Login or Register to Ask a Question