Remove duplicate lines (the first matching line by field criteria)


 
# 1  
Old 05-03-2010

Hello to all,

I have this file
Code:
2002     1       23      0       0       2435.60         131.70   5.60   20.99    0.89      0.00         285.80  2303.90
2002     1       23      15      0       2436.60         132.90   6.45   21.19    1.03      0.00         285.80  2303.70
2002     1       23      30      0       2438.10         134.90   7.20   21.50    1.15      0.00         285.80  2303.20
2002     1       23      45      0       2437.85         134.65  11.64   21.47    1.86      0.00         285.80  2303.20
2002     2       0       0       0       2437.60         134.60  14.80   21.46    2.36      0.00         285.80  2303.00
2002     2       0       0       0       2442.70         139.70  16.00   22.27    2.55      0.00         285.80  2303.00
2002     2       0       15      0       2442.50         139.70  14.40   22.27    2.30      0.00         285.80  2302.80
2002     2       0       30      0       2442.30         139.70  12.60   22.27    2.01      0.00         285.80  2302.60
2002     2       0       45      0       2442.55         140.15  11.20   22.34    1.79      0.00         285.80  2302.40
2002     2       1       0       0       2443.30         141.40   9.60   22.54    1.53      0.00         285.80  2301.90
2002     2       1       15      0       2443.85         141.95   9.11   22.63    1.45      0.00         285.80  2301.90

and I want to remove the first line of each pair of lines whose columns up to the 4th match, keeping the later one, like this:
Code:
2002     1       23      0       0       2435.60         131.70   5.60   20.99    0.89      0.00         285.80  2303.90
2002     1       23      15      0       2436.60         132.90   6.45   21.19    1.03      0.00         285.80  2303.70
2002     1       23      30      0       2438.10         134.90   7.20   21.50    1.15      0.00         285.80  2303.20
2002     1       23      45      0       2437.85         134.65  11.64   21.47    1.86      0.00         285.80  2303.20
2002     2       0       0       0       2442.70         139.70  16.00   22.27    2.55      0.00         285.80  2303.00
2002     2       0       15      0       2442.50         139.70  14.40   22.27    2.30      0.00         285.80  2302.80
2002     2       0       30      0       2442.30         139.70  12.60   22.27    2.01      0.00         285.80  2302.60
2002     2       0       45      0       2442.55         140.15  11.20   22.34    1.79      0.00         285.80  2302.40
2002     2       1       0       0       2443.30         141.40   9.60   22.54    1.53      0.00         285.80  2301.90
2002     2       1       15      0       2443.85         141.95   9.11   22.63    1.45      0.00         285.80  2301.90

I've tried the uniq command
Code:
uniq -w 15 filename

and AWK
Code:
awk '!_[$1,$2,$3,$4]++' filename

but both remove the second matching line, not the first.

Thanks for any help.
# 2  
Old 05-03-2010
Try this:
Code:
awk '{A[$4]=$0}END{for (i in A){print A[i]}}' filename

# 3  
Old 05-03-2010
Thanks for the quick answer, but the output is not what I expected:
Code:
2002     2       0       45      0       2442.55         140.15  11.20   22.34    1.79      0.00         285.80  2302.40
2002     2       0       30      0       2442.30         139.70  12.60   22.27    2.01      0.00         285.80  2302.60
2002     2       1       0       0       2443.30         141.40   9.60   22.54    1.53      0.00         285.80  2301.90
2002     2       1       15      0       2443.85         141.95   9.11   22.63    1.45      0.00         285.80  2301.90

I want to remove only the lines where the 4th column repeats on consecutive lines, not all the others.
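
Since the matching lines are adjacent in this file, one possible approach is to hold each line back until the key changes, so only the last line of each run is printed and the input order is kept. This is a minimal sketch, not anything posted in the thread: the function name keep_last_of_run and the shortened sample rows are made up for illustration, and it assumes the key is fields 1 to 4.

```shell
# Hypothetical helper: print only the last line of each run of consecutive
# lines that share fields 1-4. Input order is preserved.
keep_last_of_run() {
    awk '
        { key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 }
        NR > 1 && key != prev_key { print prev_line }  # run ended: emit its last line
        { prev_key = key; prev_line = $0 }             # remember the current line
        END { if (NR > 0) print prev_line }            # emit the last line of the final run
    ' "$1"
}

# Shortened sample rows (illustrative, only six columns kept)
sample=$(mktemp)
printf '%s\n' \
    '2002 1 23 45 0 2437.85' \
    '2002 2 0 0 0 2437.60' \
    '2002 2 0 0 0 2442.70' \
    '2002 2 0 15 0 2442.50' > "$sample"

keep_last_of_run "$sample"
rm -f "$sample"
```

Because only the previous line is stored, this should use constant memory, but it only handles duplicates that sit next to each other.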
# 4  
Old 05-03-2010
Just alter the awk you tried:

Code:
awk '{A[$1,$2,$3,$4]=$0}END{for (i in A){print A[i]}}' filename

# 5  
Old 05-03-2010
Thanks, I think it worked, but the output is unsorted:
Code:
2002     2       1       15      0       2443.85         141.95   9.11   22.63    1.45      0.00         285.80  2301.90
2002     2       0       45      0       2442.55         140.15  11.20   22.34    1.79      0.00         285.80  2302.40
2002     1       23      45      0       2437.85         134.65  11.64   21.47    1.86      0.00         285.80  2303.20
2002     2       0       0       0       2442.70         139.70  16.00   22.27    2.55      0.00         285.80  2303.00
2002     2       1       0       0       2443.30         141.40   9.60   22.54    1.53      0.00         285.80  2301.90
2002     2       0       30      0       2442.30         139.70  12.60   22.27    2.01      0.00         285.80  2302.60
2002     1       23      30      0       2438.10         134.90   7.20   21.50    1.15      0.00         285.80  2303.20
2002     1       23      0       0       2435.60         131.70   5.60   20.99    0.89      0.00         285.80  2303.90
2002     2       0       15      0       2442.50         139.70  14.40   22.27    2.30      0.00         285.80  2302.80
2002     1       23      15      0       2436.60         132.90   6.45   21.19    1.03      0.00         285.80  2303.70

I think I can manage to get a sorted output like the one described above.

thanks a lot.
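
The output comes back shuffled because awk does not define the order in which for (i in A) visits array keys. A sketch of one way around this, assuming the key is fields 1 to 4: record each key the first time it appears, overwrite the stored line on every later match, and print in the recorded order at the end. The name keep_last_in_order and the sample rows are hypothetical.

```shell
# Hypothetical helper: keep the last line seen for each key (fields 1-4),
# printed in the order the keys first appeared in the input.
keep_last_in_order() {
    awk '
        {
            key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4
            if (!(key in last)) order[++n] = key   # remember first-seen position
            last[key] = $0                         # later lines overwrite earlier ones
        }
        END { for (i = 1; i <= n; i++) print last[order[i]] }
    ' "$1"
}

# Illustrative input: the first and third rows share the same key
sample=$(mktemp)
printf '%s\n' \
    '2002 2 0 0 0 2437.60' \
    '2002 2 0 15 0 2442.50' \
    '2002 2 0 0 0 2442.70' > "$sample"

keep_last_in_order "$sample"
rm -f "$sample"
```

Unlike a version that compares only neighbouring lines, this also handles matches that are not adjacent, at the cost of holding one line per distinct key in memory.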
# 6  
Old 05-03-2010
Code:
perl  -wlane '$h{"@F[0..3]"}=$_ ; END{$,="\n" ; print sort values %h}' infile.txt


or using nawk:


Code:
nawk '{_[$1,$2,$3,$4]=$0}END{for (i in _) print _[i]}' infile.txt | sort

# 7  
Old 05-03-2010
Just piping the output through sort solved the problem:

Code:
awk '{A[$1,$2,$3,$4]=$0}END{for (i in A){print A[i]}}' filename | sort

Thanks a lot, it worked like a charm.
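
A plain sort compares lines as text, which happens to give the right order for the sample above but may misplace rows once a field has a different number of digits (for example, 10 sorting before 2). A hedged sketch that sorts the first four fields numerically instead; the function name sort_by_timestamp and the two sample rows are made up for illustration.

```shell
# Hypothetical helper: sort rows numerically on fields 1-4.
# Reads the named files, or standard input when none are given.
sort_by_timestamp() {
    sort -k1,1n -k2,2n -k3,3n -k4,4n "$@"
}

# Illustrative rows: a text sort would put the "10" row first
printf '%s\n' \
    '2002 1 10 0 0 2440.00' \
    '2002 1 2 0 0 2436.00' | sort_by_timestamp
```

It can be appended to any of the awk commands above in place of the bare sort.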

---------- Post updated at 04:18 PM ---------- Previous update was at 04:17 PM ----------

Thank you both.