Extract duplicate rows with conditions


 
# 1  
Old 06-08-2015

Gents

Can you help, please?

Input file

Code:
5490921425          1          7    1310342 54909214251
5490921425          2          1          1 54909214252
5491120937          1          1          3 54911209371
5491120937          3          1          1 54911209373
5491320785          1          7    1305158 54913207851
5491320785          2          1          1 54913207852
5491521081          1         49    1307593 54915210811
5491521081          2         49    1307593 54915210812
5491521089          1          1          2 54915210891
5491521089          2         49    1307655 54915210892
5508520753          1          1          3 55085207531
5508520753          2          1          3 55085207532
5508521065          1          1          0 55085210651
5508521065          1          1          4 55085210651
5508521089          1          1          1 55085210891
5508521089          2          1          1 55085210892
5508720777          1          1          1 55087207771
5508720777          2          1          3 55087207772
5508721325          1          7    1311208 55087213251
5508721325          2          1          4 55087213252

Using this code:
Code:
awk 'X[$1] {print X[$1]} {X[$1]=$0}' file

I got this output

Code:
5490921425          1          7    1310342 54909214251
5491120937          1          1          3 54911209371
5491320785          1          7    1305158 54913207851
5491521081          1         49    1307593 54915210811
5491521089          1          1          2 54915210891
5508520753          1          1          3 55085207531
5508521065          1          1          0 55085210651
5508521089          1          1          1 55085210891
5508720777          1          1          1 55087207771
5508721325          1          7    1311208 55087213251
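As an aside, the one-liner works because awk stores the most recent row for each key in X[$1]; when a key repeats, the row saved on the previous occurrence is printed, so you always get the first row of each duplicate group in file order. A minimal sketch with made-up data (sample.txt is just an illustrative name):

```shell
# Key "A" is duplicated, key "B" is not.
printf 'A 1\nA 2\nB 9\n' > sample.txt

# X[$1] is empty (false) on a key's first appearance, so nothing prints;
# on a repeat, the row stored for that key is printed, then overwritten.
awk 'X[$1] {print X[$1]} {X[$1]=$0}' sample.txt
# prints: A 1
```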

Desired output

Condition to get the desired output file.
For each set of duplicate rows (same value in column 1):
1.- Keep the row with the maximum value in column 3.

Code:
5490921425          1          7    1310342 54909214251
5491120937          1          1          3 54911209371
5491320785          1          7    1305158 54913207851
5491521081          1         49    1307593 54915210811
5491521089          2         49    1307655 54915210892
5508520753          1          1          3 55085207531
5508521065          1          1          0 55085210651
5508521089          1          1          1 55085210891
5508720777          1          1          1 55087207771
5508721325          1          7    1311208 55087213251

Thanks

Last edited by jiam912; 06-08-2015 at 12:04 PM..
# 2  
Old 06-08-2015
How about
Code:
sort -rnk1,1 -k3,3 file | awk 'X[$1] {print X[$1]}   {X[$1]=$0}'
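Worth noting: in `sort`, global ordering flags apply to every key that has no flags of its own, so with `-rn` up front both `-k1,1` and `-k3,3` are compared reverse-numeric. A quick demo of that inheritance (toy data, not the thread's file):

```shell
# -r and -n are global, so both sort keys become reverse-numeric:
# field 1 descending first, then field 3 descending within equal keys.
printf '1 x 2\n1 x 9\n2 y 5\n' | sort -rnk1,1 -k3,3
# → 2 y 5
#   1 x 9
#   1 x 2
```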

# 3  
Old 06-08-2015
Yes, it works.
Thanks a lot

---------- Post updated at 02:20 PM ---------- Previous update was at 10:22 AM ----------

Dear RudiC,

I notice that it doesn't work completely.

Using the sort, I got:
Code:
5490921425          2          1          1 54909214252
5491120937          3          1          1 54911209373
5491320785          2          1          1 54913207852
5491521081          2         49    1307593 54915210812
5491521089          2         49    1307655 54915210892
5508520753          2          1          3 55085207532
5508521065          1          1          4 55085210651
5508521089          2          1          1 55085210892
5508720777          2          1          3 55087207772
5508721325          2          1          4 55087213252

and I should get

Code:
5490921425          1          7    1310342 54909214251
5491120937          1          1          3 54911209371
5491320785          1          7    1305158 54913207851
5491521081          1         49    1307593 54915210811
5491521089          2         49    1307655 54915210892
5508520753          1          1          3 55085207531
5508521065          1          1          0 55085210651
5508521089          1          1          1 55085210891
5508720777          1          1          1 55087207771
5508721325          1          7    1311208 55087213251

Column 2 has changed in many rows; it should stay as in the desired example.
Please help me, thanks.

Last edited by jiam912; 06-08-2015 at 04:26 PM..
# 4  
Old 06-09-2015
Not sure I understand, but this seems to get quite close:
Code:
sort -nk1,1 -k3,3r -k2,2 file | awk 'X[$1] {print X[$1]}   {X[$1]=$0}'
5490921425          1          7    1310342 54909214251
5491120937          1          1          3 54911209371
5491320785          1          7    1305158 54913207851
5491521081          1         49    1307593 54915210811
5491521089          2         49    1307655 54915210892
5508520753          1          1          3 55085207531
5508521065          1          1          0 55085210651
5508521089          1          1          1 55085210891
5508720777          1          1          1 55087207771
5508721325          1          7    1311208 55087213251
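If you'd rather avoid the sort entirely, a single awk pass can do it (a sketch, not from the thread): count occurrences of each key, remember the row with the largest column 3, and print only keys seen more than once. Ties keep the first row encountered, which matches the desired output above; the trailing `sort -n` restores key order, since awk's END loop visits keys in arbitrary order.

```shell
awk '
    { cnt[$1]++ }                        # occurrences per key in column 1
    !($1 in max) || $3+0 > max[$1] {     # first row for key, or larger $3
        max[$1] = $3 + 0
        row[$1] = $0
    }
    END { for (k in row) if (cnt[k] > 1) print row[k] }
' file | sort -n
```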

# 5  
Old 06-09-2015
RudiC,

Yes, it works now. Thanks!