| Help | unix | grep | sort | uniq - Different output from what I thought would be the same


 
# 1  
Old 10-12-2009

Hello,

I'm having a consistency issue.

Code:
 
grep 'a' /usr/share/dict/words

1) This will highlight every 'a' in each word.

Code:
 
grep 'a\{1,\}' /usr/share/dict/words

2) This will highlight every run of one or more consecutive 'a' characters, so again every 'a'.

I would expect the output of 1) to be identical to the output of 2).

I redirected both results into separate text files:
8647903 Oct 12 21:34 holding2a.txt (this is where 1) went)
8642625 Oct 12 21:34 holding2b.txt (this is where 2) went)
Why is there a difference in file size?
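
One possible explanation, offered as an assumption rather than something the thread confirms: if grep runs with color forced on (for example through an alias that adds the GNU grep option --color=always), the highlight escape sequences are written into the files along with the words. Pattern 1) would wrap each 'a' separately, while pattern 2) would wrap a whole run of 'a' characters once, so the two files could hold the same words but a different number of bytes. A minimal sketch on a small made-up word list (the file names and words here are hypothetical):

```shell
# Hypothetical sample words; the real input was /usr/share/dict/words.
tmp=$(mktemp -d)
printf '%s\n' aa aah abaca bob > "$tmp/words"

# Without color, both patterns select the same lines, byte for byte.
plain_a=$(grep --color=never 'a' "$tmp/words")
plain_b=$(grep --color=never 'a\{1,\}' "$tmp/words")

# With color forced on, every match is wrapped in escape sequences.
grep --color=always 'a' "$tmp/words" > "$tmp/a.txt"
grep --color=always 'a\{1,\}' "$tmp/words" > "$tmp/b.txt"

# Same number of lines, different number of bytes.
lines_a=$(wc -l < "$tmp/a.txt")
lines_b=$(wc -l < "$tmp/b.txt")
bytes_a=$(wc -c < "$tmp/a.txt")
bytes_b=$(wc -c < "$tmp/b.txt")
echo "lines: $lines_a $lines_b  bytes: $bytes_a $bytes_b"
rm -rf "$tmp"
```

If this is what happened, running `type grep` or `alias grep` would show the color option, and redirecting with --color=never (or --color=auto) should produce two identical files.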

more holding2a.txt | wc -l results in 276975.
The same holds true for holding2b.txt.

So I wanted to compare the text in each file.

more holding2a.txt | sort revealed this output

Code:
aaa
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals

more holding2b.txt | sort revealed this output

Code:
abaca
abacas
abacate
abacaxi
abacay
abacinate
abacination
abacterial
abactinal
abactinally

I checked the word and line count on both files again and they were the same.

Finally, I merged the two files:
Code:
 
more holding2a.txt > holdingall.txt
more holding2b.txt | cat >> holdingall.txt
more holdingall.txt | sort | uniq -u | wc -w -l

The result is 798 for both -w and -l.
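
For reference, uniq -u does not collapse duplicates down to one copy. It prints only the lines that occur exactly once in its (sorted) input and drops repeated lines entirely. So the 798 lines are presumably lines that are not byte-for-byte identical between the two files. A small sketch of the difference between the uniq modes, using made-up input:

```shell
# Hypothetical sample input with one repeated line, sorted so repeats are adjacent.
sample=$(printf '%s\n' aa aa aah | sort)

# Plain uniq collapses adjacent repeats to a single copy.
collapsed=$(printf '%s\n' "$sample" | uniq)

# uniq -u prints only lines that occur exactly once; repeated lines vanish.
only_unique=$(printf '%s\n' "$sample" | uniq -u)

# uniq -d prints one copy of each repeated line.
only_repeated=$(printf '%s\n' "$sample" | uniq -d)

printf 'uniq:\n%s\nuniq -u:\n%s\nuniq -d:\n%s\n' \
    "$collapsed" "$only_unique" "$only_repeated"
```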

Sample output:
Code:
 
more holdingall.txt | sort | uniq -u -c

Top part.
Code:
  
      1 aaa
      1 aa
      1 aah
      1 aahed
      1 aahing
      1 aahs
      1 aal
      1 aalii
      1 aaliis
      1 aals
      1 aam
      1 aardwolf
      1 aardwolves
      1 aargh
      1 aaron
      1 aaronic
      1 aarrgh
      1 aarrghh
      1 aas
      1 aasvogel
      1 aasvogels
      1 aardvark
      1 aardvarks
      1 advocaat
      1 advocaat
      1 afrikaans
      1 afrikaans
      1 ahaaina
      1 ahaaina
      1 akaakai
      1 akaakai
      1 amaas
      1 amaas
      1 assbaa
      1 assbaa
      1 aa
      1 aah
      1 aahed
      1 aahing
      1 aahs
      1 aal
      1 aalii
      1 aaliis
      1 aals
      1 aam
      1 aardwolf
      1 aardwolves
      1 aargh
      1 aaron
      1 aaronic

Bottom few.
Code:
      
      1 Wraac
      1 Wraac
      1 Yaakov
      1 Yaakov
      1 Zaandam
      1 Zaandam
      1 Zitvaa
      1 Zitvaa

1) I don't know why uniq -u isn't removing what appear to be duplicates.

2) I don't know why sort isn't sorting properly. For example, 'aa' is in two different places at the top of the list; I would expect them to be together.

I tried sort -d and sort -s, which resulted in what appeared to be the correct order. sort -d did take noticeably longer to finish.
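
Lines that look identical on screen but are treated as different by sort and uniq often contain invisible bytes. A hedged way to check, assuming the files hold terminal color escape sequences (an assumption, not something shown in the thread), is to make the bytes visible with cat -v and to compare again after stripping them. The two sample lines and the strip_color helper below are hypothetical:

```shell
# ESC byte, built with printf so the sed pattern does not depend on \033 support.
esc=$(printf '\033')

# Two made-up lines that look the same on a terminal but differ in bytes:
# one highlight around "aa" versus one highlight around each "a".
one_run=$(printf '\033[01;31maa\033[mh')
per_char=$(printf '\033[01;31ma\033[m\033[01;31ma\033[mh')

# cat -v shows the escape byte as ^[ so the hidden content becomes readable.
visible=$(printf '%s\n' "$one_run" | cat -v)
echo "$visible"

# Strip color (m) and erase-line (K) sequences, then look for unmatched lines.
strip_color() { sed "s/${esc}\[[0-9;]*[mK]//g"; }
left=$(printf '%s\n' "$one_run" "$per_char" | strip_color | sort | uniq -u)
echo "lines unique after stripping: ${left:-none}"
```

On the real files, something like `head holding2a.txt | cat -v` would show whether such sequences are present.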
# 2  
Old 10-13-2009
Code:
cat holdingall.txt | sort | uniq -c > uniq_file

I am not sure whether you are redirecting the sorted values to a file; sort does not overwrite the input file implicitly.

HTH,
PL
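
The point about sort not overwriting its input can be sketched as follows. The temporary directory and file contents are made up for illustration; -o is the standard sort option for naming an output file, and it may name the input file itself:

```shell
tmp=$(mktemp -d)
printf '%s\n' b a b > "$tmp/holdingall.txt"   # hypothetical stand-in file

# sort writes to standard output; the file on disk is left as it was.
sort "$tmp/holdingall.txt" > /dev/null
first_before=$(head -n 1 "$tmp/holdingall.txt")

# -o writes the sorted result to the named file, here the input itself.
sort -o "$tmp/holdingall.txt" "$tmp/holdingall.txt"
first_after=$(head -n 1 "$tmp/holdingall.txt")

# With the file sorted on disk, uniq -c can count adjacent repeats.
counts=$(uniq -c "$tmp/holdingall.txt" | awk '{print $1 ":" $2}')
echo "$first_before -> $first_after"
echo "$counts"
rm -rf "$tmp"
```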
 