This is similar in spirit to the solution from RudiC, but it uses a local version of uniq that includes several features beyond the system uniq. To treat the first two fields as a single comparison key, the first separator is temporarily changed. It appears that the last of the duplicate lines is the one desired. The local utility does not require the file to be sorted. Here is the script:
Code:
#!/usr/bin/env bash
# @(#) s1 Demonstrate elimination of duplicate lines, local uniq.
# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }    # no-op redefinition disables the debug version above
C=$HOME/bin/context && [ -f "$C" ] && $C sed my-uniq dixf
FILE=${1-data1}
E=expected-output.txt
pl " Input data file $FILE:"
cat "$FILE"
pl " Input data file transform separator:"
sed 's/:/_/' "$FILE" |
tee t1
pl " Expected output:"
cat "$E"
pl " Results, re-transform separator:"
my-uniq --separator=":" --last --field=1 t1 |
sed 's/_/:/' |
tee f1
pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f "$C" ] && "$C" || ( pe; pe " Results cannot be verified." ) >&2
pl " Help in my-uniq:"
my-uniq -h
dixf my-uniq
exit 0
producing:
Code:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution : Debian 8.7 (jessie)
bash GNU bash 4.3.30
sed (GNU sed) 4.2.2
my-uniq (local) 1.11
dixf (local) 1.42
-----
Input data file data1:
FM:Chicago:Development
FM:Chicago:Development:Score
SR:Cary:Testing:Testcases
PM:Newyork:Scripting
PM:Newyork:Scripting:Audit
-----
Input data file transform separator:
FM_Chicago:Development
FM_Chicago:Development:Score
SR_Cary:Testing:Testcases
PM_Newyork:Scripting
PM_Newyork:Scripting:Audit
-----
Expected output:
FM:Chicago:Development:Score
SR:Cary:Testing:Testcases
PM:Newyork:Scripting:Audit
-----
Results, re-transform separator:
FM:Chicago:Development:Score
SR:Cary:Testing:Testcases
PM:Newyork:Scripting:Audit
-----
Verify results if possible:
-----
Comparison of 3 created lines with 3 lines of desired results:
Succeeded -- files (computed) f1 and (standard) expected-output.txt have same content.
-----
Help in my-uniq:
my-uniq - print or omit unique lines in non-sorted file
Synopsis
This code exists because one common requirement of a task is to
find (or omit) unique (or replicated) lines in a file, but also
to preserve the original order of the lines. Standard versions
of "uniq" have usually required a sorted input file.
An additional common requirement is to consider only the content
of one field in each line rather than the entire line. my-uniq
satisfies these requirements.
Usage: my-uniq options files
options:
--count
place count on each processed line, default is off.
--duplicate
print items that have more than one occurrence, default off.
--unique
print items that have only one occurrence, default is off.
--field=n
select a specific field, delimited by the separator, to be
used for the comparison, the default is the entire line.
--separator=string
choose an alternate separator, such as "|", or ",", the
default separator is "whitespace".
--last
allows over-writing, effectively keeping the most-recently
seen instance. Some versions of uniq on other *nix systems use
the most recent (Solaris), the default is compatibility with
GNU/Linux uniq, which keeps the first occurrence.
--quick
omit the operation that prints the lines in the order that
they were read. This prints according to a hash order,
therefore somewhat random -- a quick way to re-order a
file. This also requires less storage, a consideration for
large-volume files.
--help
print this and quit.
--version
print version number and quit.
my-uniq Like GNU/Linux uniq, but files need not be sorted. (what)
Path : ~/bin/my-uniq
Version : 1.11
Length : 282 lines
Type : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Help : probably available with --help
Modules : (for perl codes)
warnings 1.23
strict 1.08
Carp 1.3301
Data::Dumper 2.151_01
Getopt::Long 2.42
We often create work-alikes for system utilities, incorporating options that seem (to us) obviously useful. We currently don't publish the codes, but perhaps the documentation will help others develop similar codes for their shops.
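As a starting point, here is a minimal, hypothetical sketch of the --last --field behavior in bash and awk. This is not the actual my-uniq code (which is unpublished); the function name lastuniq and its argument order are invented for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sketch only -- not drl's my-uniq; names are invented.
# Keep the LAST occurrence of each key while preserving first-seen order.
#   lastuniq SEPARATOR FIELD [FILE]   (FIELD 0 means compare whole line)
lastuniq() {
  awk -v f="$2" -F"$1" '
    { key = (f ? $f : $0)                  # key on field f, or whole line
      if (!(key in last)) order[++n] = key # remember first-seen key order
      last[key] = $0 }                     # later lines overwrite earlier
    END { for (i = 1; i <= n; i++) print last[order[i]] }' "${3:--}"
}

printf 'a:1\na:2\nb:9\n' | lastuniq ':' 1   # -> a:2, then b:9
```

Because awk buffers one line per key rather than the whole file, this also stays modest in memory on large inputs, much like the --quick note in the help above suggests.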
I think the technique of joining fields could also be used with the system uniq; however, which duplicate is kept would then depend on the uniq implementation of the OS in use.
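For example, since GNU uniq can only skip leading whitespace-delimited fields (-f) or limit the comparison width (-w), it cannot key on a colon-delimited first field directly; a sketch of the joined-field idea with standard GNU tools therefore substitutes tac and awk for uniq, and assumes the data contains no underscores:

```shell
#!/usr/bin/env bash
# Keep the last of the lines sharing joined fields 1-2, preserving order.
data='FM:Chicago:Development
FM:Chicago:Development:Score
SR:Cary:Testing:Testcases
PM:Newyork:Scripting
PM:Newyork:Scripting:Audit'

result=$(printf '%s\n' "$data" |
  sed 's/:/_/' |            # join fields 1 and 2 into one key
  tac |                     # reverse, so "last" becomes "first"
  awk -F: '!seen[$1]++' |   # keep the first occurrence of each key
  tac |                     # restore the original line order
  sed 's/_/:/')             # restore the original separator
printf '%s\n' "$result"
```

Run on the sample data, this reproduces the expected output shown above.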
Best wishes ... cheers, drl
Last edited by drl; 05-03-2017 at 12:10 PM..
Reason: Correct minor typo (spelling).