Shell do loop to compare two columns (duplicates)


 
# 8  
Old 06-05-2013
One more way:

awk '{print $1}' input | awk 'NR==FNR{a[$1]++;next}$2 in a{print $2" "$3}' - input
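Here the first awk just prints column 1, and the second awk reads that stream first (the "-" argument means stdin) to build a lookup table before scanning the file itself. A minimal sketch of the same pipeline, using a hypothetical three-column sample file:

```shell
# Hypothetical sample file: col1 col2 col3
printf '%s\n' '2 1 10' '3 6 11' '6 8 14' > input

# Column 1 arrives via stdin ("-") and fills array a; then the file is
# scanned and any row whose $2 occurs somewhere in column 1 prints "$2 $3"
awk '{print $1}' input | awk 'NR==FNR{a[$1]++;next} $2 in a{print $2" "$3}' - input
# prints:
# 6 11
```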


Quote:
Originally Posted by MadeInGermany
This can look backwards
Code:
awk '{seen[$2]=$3}
($1 in seen) {print $1,seen[$1]}' input

This looks like an interesting solution. Can you explain how it works?
# 9  
Old 06-06-2013
With those changed requirements, try:
Code:
awk 'NR==FNR{A[$2]; next} $1 in A{print $1,$3}' file file

(the input file is specified twice)
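For anyone new to this idiom: NR counts records across all input files, while FNR restarts at 1 for each file, so NR==FNR is true only while awk reads the first copy of the file. A tiny illustration (file name hypothetical):

```shell
printf '%s\n' 'a' 'b' > demo

# First pass (NR==FNR): just count the lines.
# Second pass: FNR restarts at 1 while NR keeps climbing.
awk 'NR==FNR{n++; next} {print FNR "of" n, $0}' demo demo
# prints:
# 1of2 a
# 2of2 b
```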


---
Quote:
Originally Posted by MadeInGermany
This can look backwards
[..]
Indeed, but it cannot look forward.
# 10  
Old 06-06-2013
@Scru, that prints $1,$3 pairs, not $2,$3.
So, with "look forward" feature (Required?) it needs two passes of the input, either like juzz4fun's solution
Code:
awk 'NR==FNR{a[$1]; next} ($2 in a){print $2,$3}' input input

or, with the "keep order" feature (Required?) you need to store $2,$3 pairs
Code:
awk 'NR==FNR{A[$2]=$3; next} $1 in A{print $1,A[$1]}' input input
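The two variants differ only in output order: the first prints a match at the row where the value occurs in column 2, the second at the row where it occurs in column 1. A minimal sketch with hypothetical data chosen so that the two orders diverge:

```shell
# Two-line sample where each column-1 value also appears
# in column 2 of the other line
printf '%s\n' '3 6 11' '6 3 20' > pairs

# Order follows where the match occurs in column 2:
awk 'NR==FNR{a[$1]; next} ($2 in a){print $2,$3}' pairs pairs
# prints:
# 6 11
# 3 20

# Order follows where the match occurs in column 1:
awk 'NR==FNR{A[$2]=$3; next} $1 in A{print $1,A[$1]}' pairs pairs
# prints:
# 3 20
# 6 11
```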

# 11  
Old 06-06-2013
@MadeInGermany, I mean that if a value in column 1 has a corresponding value below it in column 2, it would not get printed. For example, if the input were:
Code:
2  1  10
3  6  11
4  4  12
9  2  34
5  5  13
6  8  14

Then it seems the proper output should be:
Code:
2 34
4 12
5 13
6 11

--
Indeed, I had the order reversed, so this should give the proper result, as you suggested:
Code:
awk 'NR==FNR{A[$2]=$3; next} $1 in A{print $1,A[$1]}' file file
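As a quick sanity check, feeding the sample data from this post through that command (here written to a temporary file) reproduces the expected output:

```shell
# Sample input from this thread
cat > input <<'EOF'
2  1  10
3  6  11
4  4  12
9  2  34
5  5  13
6  8  14
EOF

# First pass stores $3 keyed by $2; second pass looks up each $1
awk 'NR==FNR{A[$2]=$3; next} $1 in A{print $1,A[$1]}' input input
# prints:
# 2 34
# 4 12
# 5 13
# 6 11
```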

# 12  
Old 06-09-2013
Thanks for your reply. When I add another two lines to my data:

Code:
2 1 10
3 6 11
4 4 12
5 5 13
6 8 14
0.5 8 9
5 0.5 17

I expect to get :
Code:
4 12
5 13
6 11
0.5 17
5 13

but the output looks like:
Code:
4 12
5 13
6 11
5 13

and it misses the 0.5 17. Any idea why?

Thanks again.

Paul

# 13  
Old 06-09-2013
Quote:
Originally Posted by Paul Moghadam
[..]
and misses the 0.5 17? Any idea why?
When I use the above data with the last code Scrutinizer suggested:
Code:
awk 'NR==FNR{A[$2]=$3; next} $1 in A{print $1,A[$1]}' file file

I get the results you expected when using awk on OS X.

What OS are you using? If it is a Solaris system, which version of awk are you using? (I.e., if you are using bash or ksh, what is the output of type awk or which awk?)
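To find out, you can check what awk resolves to. On Solaris, the old /usr/bin/awk predates POSIX and, to my knowledge, lacks the in operator these one-liners rely on; nawk or /usr/xpg4/bin/awk are the usual replacements (those paths assume a standard Solaris layout). A quick probe:

```shell
# Which awk is first in PATH?
command -v awk

# An awk that supports the "in" operator will print a line here;
# a pre-POSIX awk will instead report a syntax error
awk 'BEGIN{a["k"]; if ("k" in a) print "has in operator"}' </dev/null
```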