Awk: compare values in two columns of the same file

07-24-2019

I'm trying to learn awk, but I've hit a roadblock with this problem. I have a hierarchy stored in a file with 3 columns:

Code:

id	name	parentID
4	D	2
2	B	1
3	C	1
1	A	5

I need to check if there are any values in column 3 that are not represented anywhere in column 1. I've tried this:

Code:

awk '{arr[$1];} !($3 in arr) {print $0}' file.txt

The desired output would be:

Code:

1	A	5

But it prints the entire file instead. What am I doing wrong?
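Tracing the lookups shows why. In a single pass, arr only contains the ids from lines that have already been read. In this sample every parentID (including the header's "parentID") appears as an id only on a later line, if at all, so the !($3 in arr) test is true for every line. The sketch below recreates the sample in a temporary file (an assumption made only for the demonstration) and prints the lookup result for each line instead of filtering:

```shell
# Recreate the tab-separated sample from the question in a temporary file.
tmp=$(mktemp)
printf 'id\tname\tparentID\n4\tD\t2\n2\tB\t1\n3\tC\t1\n1\tA\t5\n' > "$tmp"

# Same one-pass logic as the question: remember $1, then look up $3.
# Instead of filtering, report the lookup result for every line.
result=$(awk -F '\t' '{
  arr[$1]
  status = ($3 in arr) ? "found" : "missing"
  print FNR, $3, status
}' "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

Every line reports "missing", which is why the original command prints the whole file.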

kaktus

07-24-2019

Hi, to determine whether a value is absent from a column, you have to read the entire file first. There are two choices: read the file once, keeping all relevant information in memory, and print the results at the end; or read the same file twice.
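The first choice is not shown in the thread. As a minimal sketch, assuming the tab-separated sample from the question and enough memory to hold the whole file, a single pass can record every id and every line, then decide what to print in the END block (the array names ids, line and parent are chosen here for illustration only):

```shell
# Recreate the tab-separated sample from the question in a temporary file.
tmp=$(mktemp)
printf 'id\tname\tparentID\n4\tD\t2\n2\tB\t1\n3\tC\t1\n1\tA\t5\n' > "$tmp"

# Single pass: store every id, every line and its parentID.
# In END, all ids are known, so each stored parentID can be checked;
# looping over 1..NR keeps the original line order.
result=$(awk -F '\t' '
  { ids[$1]; line[NR] = $0; parent[NR] = $3 }
  END {
    for (i = 1; i <= NR; i++)
      if (!(parent[i] in ids)) print line[i]
  }' "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

Unlike reading the file twice, this keeps the whole file in memory, which may matter for very large inputs such as the 2 GB file mentioned later in this thread.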

With the second approach (reading the file twice), something like this should work:

Code:

awk 'NR==FNR{A[$1]; next} !($3 in A)' file.txt file.txt

Code:

id	name	parentID
1	A	5

--
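Note: NR==FNR is true only while awk reads the first file argument, because NR (the total number of records read so far) equals FNR (the record number within the current file) only then. The next statement skips the remaining rules for the current record, so the !($3 in A) test is only applied while the file is read for the second time.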
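To see the NR==FNR idiom in action, this small sketch (using a throwaway two-line file, assumed only for the demonstration) labels every record with the pass it belongs to:

```shell
# A two-line file, passed to awk twice.
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"

# NR keeps counting across files; FNR restarts at 1 for each file.
result=$(awk '{
  pass = (NR == FNR) ? "first" : "second"
  print pass, NR, FNR
}' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

This prints "first 1 1", "first 2 2", "second 3 1" and "second 4 2", one per line.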

Last edited by Scrutinizer; 07-24-2019 at 09:35 PM..

Scrutinizer

07-25-2019

Thanks, that worked beautifully! I had a bit of trouble getting it to work in my real-life application (a 2 GB file with dozens of columns and over 2 million lines), but I managed to get it working by specifying the field separator:

Code:

awk -F '\t' 'NR==FNR{A[$1]; next} !($3 in A)' file.txt file.txt

There were blank spaces in some of the fields.
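A minimal illustration of why the field separator matters, using a hypothetical record whose name field contains a space:

```shell
# One hypothetical tab-separated record: id, name (with a space), parentID.
line=$(printf '7\tJohn Smith\t4')

# Default FS splits on runs of blanks (spaces and tabs), so the space in
# "John Smith" creates an extra field and $3 becomes "Smith".
default_fs=$(printf '%s\n' "$line" | awk '{ print $3 }')

# With FS set to a tab, only tabs separate fields, so $3 is the parentID.
tab_fs=$(printf '%s\n' "$line" | awk -F '\t' '{ print $3 }')

printf 'default FS: %s\ntab FS: %s\n' "$default_fs" "$tab_fs"
```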

Thanks a lot for your help.

--- Post updated at 02:57 AM ---

Actually, one more thing. The current output includes lines that have no value in column 3, e.g., with this file:

Code:

id	name	parentID
4	D	2
2	B	1
3	C	1
1	A	5
6	E

I get this result:

Code:

id	name	parentID
1	A	5
6	E

Since the purpose of this exercise is to find parentIDs that are missing from the id column, I am not interested in lines where $3 is empty. How can I get it to omit those?

Last edited by kaktus; 07-24-2019 at 11:37 PM..

kaktus

07-25-2019

Hi, try modifying the condition like so:

Code:

awk 'NR==FNR{A[$1]; next} $3!="" && !($3 in A)' file.txt file.txt
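A quick check of the modified condition against the sample above, assuming tab-separated input as in the earlier reply (the temporary file is only for the demonstration):

```shell
# Sample with one line that has no parentID.
tmp=$(mktemp)
printf 'id\tname\tparentID\n4\tD\t2\n2\tB\t1\n3\tC\t1\n1\tA\t5\n6\tE\n' > "$tmp"

# First pass: collect ids. Second pass: print lines whose parentID
# is non-empty and was never seen as an id.
result=$(awk -F '\t' 'NR==FNR{A[$1]; next} $3!="" && !($3 in A)' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

The line "6 E" is no longer printed; the header and "1 A 5" remain.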

Scrutinizer

07-25-2019

Works great, thanks!

kaktus

07-25-2019

To avoid evaluating an extra test ($3!="") on every line, this variation of Scrutinizer's proposal might be interesting, too:

Code:

awk 'BEGIN {A[""]} NR==FNR{A[$1]; next} !($3 in A)' file file
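As far as the awk semantics go, the BEGIN block stores the empty string as a key in A before any input is read. A line with an empty third field then satisfies ($3 in A), so !($3 in A) is false and the line is skipped without a separate $3!="" test. A minimal sketch on the sample with the empty parentID, using -F '\t' as in the original poster's real command (an assumption; the version above uses the default separator):

```shell
# Sample with one line that has no parentID.
tmp=$(mktemp)
printf 'id\tname\tparentID\n4\tD\t2\n2\tB\t1\n3\tC\t1\n1\tA\t5\n6\tE\n' > "$tmp"

# BEGIN: referencing A[""] creates an element whose key is "".
# An empty $3 is therefore "in A", so such lines are never printed.
result=$(awk -F '\t' 'BEGIN {A[""]} NR==FNR{A[$1]; next} !($3 in A)' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

The in operator only tests for membership and does not create elements, so the second-pass lookups leave A unchanged.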

RudiC

07-25-2019

Thanks! For my 2 GB file, this saves a little processing time: 2m38.580s vs. 2m43.779s. Could you please explain what adding A[""] does?

kaktus
