sort | uniq question


 
# 1  
Old 02-02-2011

Hello,
I have a large data file:
Code:
1234 8888 bbb
2745 8888 bbb
9489 8888 bbb
1234 8888 aaa
4838 8888 aaa
3977 8888 aaa

I need to remove duplicate lines (where the first column is the duplicate). I have been using:
Code:
sort file.txt | uniq -w4 > newfile.txt

However, it seems to keep the first of the duplicate pair alphabetically. So in the example above,
Code:
1234 8888 aaa    would be kept, and
1234 8888 bbb    would be excluded

I need to modify the command so that the first of the two lines *chronologically*, that is, the one that appears first in the file, is kept (in this case, 1234 8888 bbb).

Thanks so much!

# 2  
Old 02-02-2011
Hi.

This seems to work, assuming a sort command whose -u option keeps the first line of each run of equal keys, as GNU sort does:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate sort and uniq.

# Section 1, setup, pre-solution.
# Infrastructure details, environment, commands for forum posts. 
# Uncomment export command to test script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
C=$HOME/bin/context && [ -f $C ] && . $C specimen sort uniq
set -o nounset
pe

FILE=${1-data1}

# Section 2, display input file.
# Display sample of data file, with head & tail as a last resort.
pe " || start [ first:middle:last ]"
specimen "$FILE" \
|| { pe "(head/tail)"; head -n 5 "$FILE"; pe " ||"; tail -n 5 "$FILE"; }
pe " || end"

# Section 3, solution.
pl " Results, sort | uniq:"
sort -k1,1 "$FILE" | uniq -w4

pl " Results, sort -u:"
sort -k1,1 -u "$FILE"

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.7 (lenny) 
GNU bash 3.2.39
specimen (local) 1.17
sort (GNU coreutils) 6.10
uniq (GNU coreutils) 6.10

 || start [ first:middle:last ]
Whole: 5:0:5 of 6 lines in file "data1"
1234 8888 bbb
2745 8888 bbb
9489 8888 bbb
1234 8888 aaa
4838 8888 aaa
3977 8888 aaa
 || end

-----
 Results, sort | uniq:
1234 8888 aaa
2745 8888 bbb
3977 8888 aaa
4838 8888 aaa
9489 8888 bbb

-----
 Results, sort -u:
1234 8888 bbb
2745 8888 bbb
3977 8888 aaa
4838 8888 aaa
9489 8888 bbb

See man sort for details.
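
The two results above differ only in which line ends up first within each group of equal keys. A minimal sketch of that difference, assuming GNU sort and uniq and the six sample lines from the first post (the variable names are made up here; -s is added only to make the stable ordering explicit):

```shell
# Sample data from the first post, held in a variable for the demo.
data='1234 8888 bbb
2745 8888 bbb
9489 8888 bbb
1234 8888 aaa
4838 8888 aaa
3977 8888 aaa'

# Plain sort compares whole lines, so "1234 8888 aaa" lands ahead of
# "1234 8888 bbb" and uniq -w4 (a GNU option) keeps the aaa line.
by_line=$(printf '%s\n' "$data" | sort | uniq -w4)

# Sorting stably on field 1 only leaves equal keys in input order;
# -u then keeps the first line of each group, here the bbb line.
by_key=$(printf '%s\n' "$data" | sort -s -k1,1 -u)

printf '%s\n' "$by_key"
```

Note that the output is still ordered by key, not by position in the original file.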

Best wishes ... cheers, drl
# 3  
Old 02-02-2011
Hi,

Another solution, using perl:
Code:
$ perl -lane 'print if not $no{ $F[0] }++' infile

Regards,
Birei
# 4  
Old 02-02-2011
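Try the "-u" switch to uniq, which prints only lines that are not repeated. Note that this drops both lines of a duplicate pair rather than keeping one of them.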

Code:
sort file.txt | uniq -w4 -u > newfile.txt
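
For comparison, a small sketch of what -u does to the sample data from the first post, assuming GNU uniq (needed for -w). The variable names are made up for illustration. Because -u prints only lines that are not repeated, every line with key 1234 is dropped, whereas plain uniq -w4 keeps one line per key:

```shell
# Sample data from the first post.
data='1234 8888 bbb
2745 8888 bbb
9489 8888 bbb
1234 8888 aaa
4838 8888 aaa
3977 8888 aaa'

# With -u: only keys that occur exactly once survive.
only_unrepeated=$(printf '%s\n' "$data" | sort | uniq -w4 -u)

# Without -u: one line is kept for every key.
one_per_key=$(printf '%s\n' "$data" | sort | uniq -w4)

printf '%s\n' "$only_unrepeated"
```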

# 5  
Old 02-02-2011
Code:
awk '!a[$1$2]++'
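
Since the original question treats only the first column as the key, a variant keyed on $1 alone may be closer to what was asked. A minimal sketch, assuming the sample data from the first post (the array name seen and the variable names are made up here). awk reads lines in file order, so the first occurrence of each key is printed and the original order is preserved without any sorting:

```shell
# Sample data from the first post.
data='1234 8888 bbb
2745 8888 bbb
9489 8888 bbb
1234 8888 aaa
4838 8888 aaa
3977 8888 aaa'

# seen[$1]++ is 0 (false) the first time a key appears, so !seen[$1]++
# is true only for the first line carrying that key.
first_seen=$(printf '%s\n' "$data" | awk '!seen[$1]++')

printf '%s\n' "$first_seen"
```

If two columns really are meant to form the key, seen[$1,$2] avoids the accidental collisions that plain concatenation can produce (for example "12" "3" and "1" "23" both concatenate to "123").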


# 6  
Old 02-02-2011
I tried this
Code:
awk '!a[$1$2]++' filename

on this
Code:
01/Feb/2011   -- User Count : 27
  31/Jan/2011   --  User Count : 21
  02/Feb/2011   -- User Count : 24
  30/Jan/2011   --  User Count : 4

and it didn't sort by month and day. I assume that is because I didn't specify the correct columns.
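
The awk filter above only drops repeated keys; it never reorders lines, so the output stays in input order. If ordering by month and day is the goal, a separate sort is needed. A minimal sketch, assuming a sort that supports the M (month name) key modifier, which GNU sort provides but POSIX does not require, and the four lines quoted above with their leading spaces removed:

```shell
# The four date lines from the post above.
log='01/Feb/2011   -- User Count : 27
31/Jan/2011   --  User Count : 21
02/Feb/2011   -- User Count : 24
30/Jan/2011   --  User Count : 4'

# Split on "/" and order by year (field 3, numeric), then month name
# (field 2, M), then day (field 1, numeric). LC_ALL=C makes sort use
# the English month abbreviations.
by_date=$(printf '%s\n' "$log" | LC_ALL=C sort -t/ -k3,3n -k2,2M -k1,1n)

printf '%s\n' "$by_date"
```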
# 7  
Old 02-02-2011
Try:

Code:
awk 'BEGIN { FS = OFS = "/" } !a[$1$2]++' filename

Note: this keeps the first occurrence of each key, so make sure the first one is the one you want.
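
A sketch of how the separator affects the key, assuming the date lines from the previous post plus one made-up repeat of 01/Feb added to show the effect. One way to set both FS and OFS is a BEGIN block; writing FS=OFS="/" after -v would instead assign the literal text OFS=/ to FS, so the lines would not be split on slashes:

```shell
# Date lines from the previous post. The last line is NOT from the
# thread; it is a hypothetical repeat of 01/Feb added for the demo.
log='01/Feb/2011   -- User Count : 27
31/Jan/2011   --  User Count : 21
02/Feb/2011   -- User Count : 24
30/Jan/2011   --  User Count : 4
01/Feb/2011   -- User Count : 99'

# Split on "/" and key on day and month. The comma joins the two fields
# with SUBSEP, which avoids collisions between concatenated values.
kept=$(printf '%s\n' "$log" | awk 'BEGIN { FS = OFS = "/" } !a[$1,$2]++')

printf '%s\n' "$kept"
```

The made-up last line is dropped because its day and month match the first line, which is the one that is kept.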