Selecting lines having same values for first two columns


 
# 1  
Old 12-27-2012

Hello to all.

This is my first post. Kindly excuse me if I fail to follow any of this forum's rules.

I have a file in which each row has three columns separated by a space. For some rows the first two columns have the same values but the third column differs. In that case I wish to retain only the row with the maximum value in the third column. I also want to write the rows that are not required to another file for further processing.

Code:
 
1123022201 9310777627 3976
1127021871 9312262893 3600  -- not required, sent to another file.
1127021871 9312262893 4016  -- to be retained.
1122000518 9350066745 4464
1127455152 7827493958 3600  -- not required, sent to another file.
1127455152 7827493958 5138  -- to be retained.

Kindly help me in this regard.

Thanks in anticipation.

Manoj
# 2  
Old 12-27-2012
Try

Code:
awk 'NR==FNR{A[$1,$2]++; B[$1,$2]=B[$1,$2]?B[$1,$2]<$3?$3:B[$1,$2]:$3;next}{if(A[$1,$2]>1){if(B[$1,$2]==$3){print}else{print > "another_file"}}else{print }}' file file

# 3  
Old 12-27-2012
PS: Please read the code below as written by an awk noob :)

Code:
# List every key pair (columns 1 and 2) that occurs more than once.
sort -k 1,2 file | awk '{print $1,$2}' | uniq -d > temp

# Assumes each duplicated key pair occurs exactly twice.
while read -r i
do
    row1=$(grep "^$i " file | head -1)  # first row of the duplicated pair
    row2=$(grep "^$i " file | tail -1)  # last row of the duplicated pair

    # Third column of the last row minus third column of the first row
    diff=$(grep "^$i " file | awk 'NR==1 {first=$3} {last=$3} END {print last - first}')

    if [ "$diff" -lt 0 ]
    then
        echo "$row2 moved to new file"
        echo "$row2" >> new_file  # the row with the smaller third column goes to new_file
    else
        echo "$row1 moved to new file"
        echo "$row1" >> new_file
    fi
done < temp
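
The loop above only records the rejected rows in new_file; it does not produce the list of rows to keep. One way to derive that list, sketched here under the assumption that no row to be kept is textually identical to a rejected row, is to filter the original file against new_file. The output name `retained` is made up for illustration, and the contents of new_file below are what the loop would write for the sample data in the first post.

```shell
# Sample data from the first post.
cat > file <<'EOF'
1123022201 9310777627 3976
1127021871 9312262893 3600
1127021871 9312262893 4016
1122000518 9350066745 4464
1127455152 7827493958 3600
1127455152 7827493958 5138
EOF

# Rows the loop would have set aside for this sample.
cat > new_file <<'EOF'
1127021871 9312262893 3600
1127455152 7827493958 3600
EOF

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: print the lines that do NOT match, -f: read patterns from a file.
grep -vxFf new_file file > retained
cat retained
```

The original line order is preserved because grep reads `file` from top to bottom.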

---------- Post updated at 03:08 AM ---------- Previous update was at 03:06 AM ----------

Comparing the two replies above (pamu's and mine), you can see how powerful awk is...
# 4  
Old 12-27-2012
You did not specify what should happen
a) if the third column's values are identical
b) if values can go negative
c) to lines that do not have duplicates.
Thus any proposal might need further refinement. Try:
Code:
$ awk   '{x=$1" "$2}
         $3 > Ar[x] {if (Ar[x]) print x, Ar[x] > "further"; Ar[x]=$3}
         $3 < Ar[x] {print  > "further"}
         END {for (x in Ar) print x, Ar[x]}
        ' file
1127455152 7827493958 5138
1122000518 9350066745 4464
1127021871 9312262893 4016
1123022201 9310777627 3976
$ cat further
1127021871 9312262893 3600
1127455152 7827493958 3600

# 5  
Old 12-27-2012
Thanks for replying so quickly and providing me the desired solution.

I tried the solutions provided by RudiC and sathyaonnuix. Both give me the desired result. I am still looking into the script by pamu (I find it difficult to understand because my Unix expertise is limited, so kindly give me some time).

Regarding the queries raised by RudiC:

(i) if the third column entries are identical, then only one row is to be retained in the original file.

(ii) values cannot be negative.

(iii) lines without duplicate values are to be retained in the original file as they are.
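
The three rules above can be sketched with a sort followed by a single awk pass. This is only one possible approach under stated assumptions, not the method used by the replies in this thread: the input is named `file`, the third column is a non-negative integer, and the output names `retained` and `rejected` are made up for illustration. Note that sorting changes the original line order.

```shell
# Sample data from the first post, plus one exact duplicate row to exercise rule (i).
cat > file <<'EOF'
1123022201 9310777627 3976
1127021871 9312262893 3600
1127021871 9312262893 4016
1122000518 9350066745 4464
1127455152 7827493958 3600
1127455152 7827493958 5138
1127455152 7827493958 5138
EOF

# Sort by columns 1 and 2, then by column 3 numerically in descending order,
# so the first row of every key group carries the maximum.
sort -k1,1 -k2,2 -k3,3nr file |
awk '
    !seen[$1,$2]++ { print > "retained"; next }  # first row of a group: keep it
                   { print > "rejected" }        # any later row: set it aside
'
```

Rows with a unique key pair are always the first of their group, so they land in `retained` unchanged, which covers rule (iii).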

Thank you once again.

Manoj
# 6  
Old 12-27-2012
Thanks for providing the above details.

I've modified the script a little to match your current requirements.

Please check it now.

Code:
awk 'NR==FNR{A[$1,$2]++; B[$1,$2]=B[$1,$2]?B[$1,$2]<$3?$3:B[$1,$2]:$3;next}
{if(A[$1,$2]>1){if(B[$1,$2]==$3){B[$1,$2]=0;print }else{print > "another_file"}}else{print }}' file file

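A[$1,$2]++ # Counts how many rows share the key pair $1, $2.

B[$1,$2]=B[$1,$2]?B[$1,$2]<$3?$3:B[$1,$2]:$3 # Compares $3 with the largest $3 seen so far for this $1, $2 pair and stores the larger value in B[$1,$2].

B[$1,$2]=0 # Reset after the first matching row is printed, so that when several rows share the maximum only one is kept and the rest go to another_file.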
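
The nested ternary above can be hard to read. As a rough sketch, here is the same counting and max-tracking step written with an explicit if/else and run over the sample data from the first post. The file name `file` comes from the thread; the `key` variable, the `+ 0` used to force numeric comparison, and the `summary` output file are additions made here for illustration.

```shell
# Sample data from the first post.
cat > file <<'EOF'
1123022201 9310777627 3976
1127021871 9312262893 3600
1127021871 9312262893 4016
1122000518 9350066745 4464
1127455152 7827493958 3600
1127455152 7827493958 5138
EOF

awk '
{
    key = $1 SUBSEP $2            # same key that A[$1,$2] and B[$1,$2] use
    A[key]++                      # number of rows sharing this key pair
    if (!(key in B))              # first row for this key pair
        B[key] = $3 + 0
    else if (B[key] < $3 + 0)     # a larger third column: remember it
        B[key] = $3 + 0
}
END {
    for (key in B) {
        split(key, k, SUBSEP)
        print k[1], k[2], "rows=" A[key], "max=" B[key]
    }
}' file | sort > summary
cat summary
```

Key pairs with rows=1 are the lines without duplicates; for the others, max is the third-column value of the row to retain.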

I hope this helps :)

pamu
# 7  
Old 12-27-2012
@Pamu

The script you provided works correctly.

Thanks.

Manoj