Sponsored Content
Top Forums Shell Programming and Scripting Remove the partial duplicates by checking the length of a field Post 302558306 by asyed on Friday 23rd of September 2011 09:15:13 AM
Old 09-23-2011
Remove the partial duplicates by checking the length of a field

Hi Folks -

I'm quite new to awk and didn't come across such issues before. The problem statement is that, I've a file with duplicate records in 3rd and 4th fields. The sample is as below:

Code:
aaaaaa|a12|45|56
abbbbaaa|a12|45|56
bbaabb|b1|51|45
bbbbbabbb|b2|51|45
aaabbbaaaa|a11|45|56

Here,the combination of field3 and field is same for few records viz. 4556 for the first 2 and last rows and so on..

Now,the output file is expected to be like this:

Code:
aaabbbaaaa|a11|45|56
bbbbbabbb|b2|51|45

That is, checking the length of first field for the rows where field3&field4 match and return the row with highest length in first field among them. So, one row will be picked from each set of duplicates based on the length on first field

Could you please help with a one line awk command to achieve this?


Moderator's Comments:
Mod Comment Video tutorial on how to use code tags in The UNIX and Linux Forums.

Last edited by radoulov; 09-23-2011 at 10:21 AM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Checking file for duplicates

Hi all, I am due to start receiving a weekly csv containing around 6 million rows. I need to do some processing on this file and then send it on elsewhere. My problem is that after week 1 the files that I will receive are likely to contain data already received in previous files and I need... (8 Replies)
Discussion started by: pxy2d1
8 Replies

2. Shell Programming and Scripting

AWK - Print partial line/partial field

Hello, this is probably a simple request but I've been toying with it for a while. I have a large list of devices and commands that were run with a script, now I have lines such as: a-router-hostname-C#show ver I want to print everything up to (and excluding) the # and everything after it... (3 Replies)
Discussion started by: ippy98
3 Replies

3. Shell Programming and Scripting

CSV with commas in field values, remove duplicates, cut columns

Hi Description of input file I have: ------------------------- 1) CSV with double quotes for string fields. 2) Some string fields have Comma as part of field value. 3) Have Duplicate lines 4) Have 200 columns/fields 5) File size is more than 10GB Description of output file I need:... (4 Replies)
Discussion started by: krishnix
4 Replies

4. UNIX for Dummies Questions & Answers

remove duplicates based on a field and criteria

Hi, I have a file with fields like below: A;XYZ;102345;222 B;XYZ;123243;333 C;ABC;234234;444 D;MNO;103345;222 E;DEF;124243;333 desired output: C;ABC;234234;444 D;MNO;103345;222 E;DEF;124243;333 ie, if the 4rth field is a duplicate.. i need only those records where... (5 Replies)
Discussion started by: wanderingmind16
5 Replies

5. Shell Programming and Scripting

Flat file-make field length equal to header length

Hello Everyone, I am stuck with one issue while working on abstract flat file which i have to use as input and load data to table. Input Data- ------ ------------------------ ---- ----------------- WFI001 Xxxxxx Control Work Item A Number of Records ------ ------------------------... (5 Replies)
Discussion started by: sonali.s.more
5 Replies

6. Shell Programming and Scripting

Remove duplicates based on a field's value

Hi All, I have a text file with three columns. I would like a simple script that removes lines in which column 1 has duplicate entries, but use the largest value in column 3 to decide which one to keep. For example: Input file: 12345a rerere.rerere len=23 11111c fsdfdf.dfsdfdsf len=33 ... (3 Replies)
Discussion started by: anniecarv
3 Replies

7. Shell Programming and Scripting

Replace a field with a character as per the field length

Hi all, I have a requirement to replace a field with a character as per the length of the field. Suppose i have a file where second field is of 20 character length. I want to replace second field with 20 stars (*). like ******************** As the field is not a fixed one, i want to do the... (2 Replies)
Discussion started by: gani_85
2 Replies

8. Shell Programming and Scripting

Trying to remove duplicates based on field and row

I am trying to see if I can use awk to remove duplicates from a file. This is the file: -==> Listvol <== deleting /vol/eng_rmd_0941 deleting /vol/eng_rmd_0943 deleting /vol/eng_rmd_0943 deleting /vol/eng_rmd_1006 deleting /vol/eng_rmd_1012 rearrange /vol/eng_rmd_0943 ... (6 Replies)
Discussion started by: newbie2010
6 Replies

9. Shell Programming and Scripting

Script to compare partial filenames in two folders and delete duplicates

Background: I use a TV tuner card to capture OTA video files (.mpeg) and then my Plex Media Server automatically optimizes the files (transcodes for better playback) and places them in a new directory. I have another Plex Library pointing to the new location for the optimized .mp4 files. This... (2 Replies)
Discussion started by: shaky
2 Replies

10. UNIX for Beginners Questions & Answers

How can I remove partial duplicates and manipulate text?

Hello, How can I remove partial duplicates and manipulate text in bash using either awk, grep or sed? Thanks. Input: ted,"foo,bar,zoo" john-son,"foot,ben,zoo" bob,"bar,foot" Expected Output: foo,ted bar,ted zoo,ted foot,john-son ben,john-son (4 Replies)
Discussion started by: tara123
4 Replies
CHECKBASHISMS(1)					      General Commands Manual						  CHECKBASHISMS(1)

NAME
checkbashisms - check for bashisms in /bin/sh scripts SYNOPSIS
checkbashisms script ... checkbashisms --help|--version DESCRIPTION
checkbashisms, based on one of the checks from the lintian system, performs basic checks on /bin/sh shell scripts for the possible presence of bashisms. It takes the names of the shell scripts on the command line, and outputs warnings if possible bashisms are detected. Note that the definition of a bashism in this context roughly equates to "a shell feature that is not required to be supported by POSIX"; this means that some issues flagged may be permitted under optional sections of POSIX, such as XSI or User Portability. In cases where POSIX and Debian Policy disagree, checkbashisms by default allows extensions permitted by Policy but may also provide options for stricter checking. OPTIONS
--help, -h Show a summary of options. --newline, -n Check for "echo -n" usage (non POSIX but required by Debian Policy 10.4.) --posix, -p Check for issues which are non POSIX but required to be supported by Debian Policy 10.4 (implies -n). --force, -f Force each script to be checked, even if it would normally not be (for instance, it has a bash or non POSIX shell shebang or appears to be a shell wrapper). --extra, -x Highlight lines which, whilst they do not contain bashisms, may be useful in determining whether a particular issue is a false posi- tive which may be ignored. For example, the use of "$BASH_ENV" may be preceded by checking whether "$BASH" is set. --version, -v Show version and copyright information. EXIT VALUES
The exit value will be 0 if no possible bashisms or other problems were detected. Otherwise it will be the sum of the following error val- ues: 1 A possible bashism was detected. 2 A file was skipped for some reason, for example, because it was unreadable or not found. The warning message will give details. SEE ALSO
lintian(1). AUTHOR
checkbashisms was originally written as a shell script by Yann Dirson <dirson@debian.org> and rewritten in Perl with many more features by Julian Gilbey <jdg@debian.org>. DEBIAN
Debian Utilities CHECKBASHISMS(1)
All times are GMT -4. The time now is 05:39 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy