Remove the partial duplicates by checking the length of a field


 
# 1  
Old 09-23-2011

Hi Folks -

I'm quite new to awk and haven't come across this kind of problem before. I have a pipe-delimited file in which several records share the same values in the 3rd and 4th fields. A sample is below:

Code:
aaaaaa|a12|45|56
abbbbaaa|a12|45|56
bbaabb|b1|51|45
bbbbbabbb|b2|51|45
aaabbbaaaa|a11|45|56

Here, the combination of field 3 and field 4 is the same for several records: 45|56 for the first two rows and the last row, and 51|45 for the third and fourth rows.

The output file is expected to look like this:

Code:
aaabbbaaaa|a11|45|56
bbbbbabbb|b2|51|45

That is, for the rows where field 3 and field 4 match, compare the length of the first field and return the row whose first field is longest. One row is therefore picked from each set of duplicates, based on the length of the first field.

Could you please help with a one-line awk command to achieve this?



Last edited by radoulov; 09-23-2011 at 10:21 AM..
# 2  
Old 09-23-2011
Try:
Code:
awk -F"|" 'length($1)>l[$3"|"$4]{l[$3"|"$4]=length($1);a[$3"|"$4]=$0}END{for (i in a) print a[i]}' file
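For anyone reading along, here is the same idea spelled out over several lines with comments. It is only a sketch of how the one-liner above works: the function name `longest_per_key` and the array names `best` and `row` are made up for illustration, and the trailing `sort` is there only because awk does not guarantee the order of a `for (key in array)` loop.

```shell
# Hypothetical wrapper: reads pipe-delimited records on stdin and keeps,
# for each field3|field4 pair, the row whose first field is longest.
longest_per_key() {
    awk -F'|' '
    {
        key = $3 "|" $4            # group on fields 3 and 4
        len = length($1)
        if (len > best[key]) {     # an unset best[key] compares as 0
            best[key] = len        # remember the longest length so far
            row[key]  = $0         # and the whole record that had it
        }
    }
    END {
        for (key in row) print row[key]   # one row per key, order unspecified
    }'
}

# Sample data from post #1; sort only makes the output order predictable.
result=$(longest_per_key <<'EOF' | sort
aaaaaa|a12|45|56
abbbbaaa|a12|45|56
bbaabb|b1|51|45
bbbbbabbb|b2|51|45
aaabbbaaaa|a11|45|56
EOF
)
printf '%s\n' "$result"
```

Because the comparison is a strict `>`, the first row seen wins when two rows in the same group have first fields of equal length.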

This User Gave Thanks to bartus11 For This Post:
# 3  
Old 09-23-2011
Code:
awk -F\| 'END { 
 for (R in r)
   print r[R]
 }
length($1) > l[$3, $4] {  
  l[$3, $4] = length($1)
  r[$3, $4] = $0
  }' infile
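One thing to keep in mind with the `for (R in r)` loop: awk does not define the order in which array keys are visited, so the output rows may not come out in input order. If order matters, a possible variant is to record each key the first time it appears and print in that sequence. This is a sketch under that assumption, not part of the original answers; the function name `keep_longest_ordered` and the `order`, `len` and `row` arrays are hypothetical names.

```shell
# Hypothetical variant: same selection rule, but groups are printed
# in the order their field3/field4 key first appears in the input.
keep_longest_ordered() {
    awk -F'|' '
    {
        key = $3 SUBSEP $4                 # same key that l[$3, $4] builds
        isnew = !(key in len)
        if (isnew) order[++n] = key        # remember first-seen order
        if (isnew || length($1) > len[key]) {
            len[key] = length($1)          # longest first field so far
            row[key] = $0                  # record that holds it
        }
    }
    END {
        for (i = 1; i <= n; i++) print row[order[i]]
    }'
}

# Sample data from post #1.
ordered=$(keep_longest_ordered <<'EOF'
aaaaaa|a12|45|56
abbbbaaa|a12|45|56
bbaabb|b1|51|45
bbbbbabbb|b2|51|45
aaabbbaaaa|a11|45|56
EOF
)
printf '%s\n' "$ordered"
```

Unlike the pattern `length($1) > l[$3, $4]`, this version also keeps a group whose first field is empty in every row, since a key is stored the first time it is seen regardless of length.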

# 4  
Old 09-23-2011
Hi Bartus, thanks a lot. Your solution worked.