duplicated lines not recognized by sort and uniq


 
# 1  
10-22-2008

Hello all,
I've run into strange behaviour with the sort and uniq commands: they do not recognise apparently duplicated lines in a file (already sorted). The lines look identical, but they must differ in something, because when I put them in two separate files, the files have slightly different sizes.
What could cause this difference?

Thanks a lot.
# 2  
10-22-2008
Can you post those two files? It would help.
# 3  
10-22-2008
Quote:
Originally Posted by redoubtable
Can you post those two files? It would help.
Yes, sure, here they are. Thanks.
# 4  
10-22-2008
The best way to see the difference is to diff both files.

diff ID_file1.txt ID_file2.txt says the files differ.
To find out what the difference is, I ran hexdump on both files, and it is easy to spot at the end of each line:
Code:
redoubtable@Tsunami ~ $ hexdump ID_file2.txt |head -n1
0000000 6f43 746e 6769 0d31 430a 6e6f 6974 3267
redoubtable@Tsunami ~ $ hexdump ID_file1.txt |head -n1
0000000 6f43 746e 6769 0a31 6f43 746e 6769 0a32
redoubtable@Tsunami ~ $

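As you can see, ID_file2.txt has a 0xd followed by a 0xa at the end of each line, while ID_file1.txt has just a 0xa.

PS: by default, hexdump prints 16-bit words in the machine's byte order (little-endian on x86), so the two bytes of each word appear swapped. Read the output as follows:
1234 5678 9123 4567 -> 34, 12, 78, 56, 23, 91, 67, 45.
So, 6f43 746e 6769 0d31 430a 6e6f 6974 3267 is 0x43 0x6f 0x6e 0x74 0x69 0x67 0x31 0xd 0xa 0x43 0x6f ..., i.e. "Contig1\r\nCo...". (hexdump -C prints the bytes in file order instead.)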
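If the attachments are not at hand, the situation is easy to reproduce. This is a minimal sketch assuming a POSIX shell; unix.txt and dos.txt are hypothetical file names, and their two lines are the ones decoded from the hexdump above. cmp names the first byte that differs, and od -c prints the bytes in file order with the carriage return shown as \r, so nothing has to be byte-swapped by hand.

```shell
#!/bin/sh
# Hypothetical sample files: "Contig1" and "Contig2" are decoded from the
# hexdump output above. unix.txt ends lines with LF (0x0a); dos.txt ends
# them with CR LF (0x0d 0x0a), like ID_file2.txt.
printf 'Contig1\nContig2\n' > unix.txt
printf 'Contig1\r\nContig2\r\n' > dos.txt

# Same text when printed, but dos.txt is one byte longer per line.
wc -c unix.txt dos.txt

# cmp reports the first differing byte (the CR right after "Contig1").
# Its exit status 1 only means "files differ", so it is not treated as an error.
cmp unix.txt dos.txt || true

# od -c shows every byte in file order; the carriage return appears as \r.
od -c dos.txt
```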
# 5  
10-22-2008
Yeah, thanks for clarifying! Excuse my ignorance: is the difference only in how the line ends are defined? Does that mean scripts (e.g. Perl) will treat such lines as different? If so, how can the line endings be fixed?

Thanks a lot.
# 6  
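10-22-2008
One file ends each line with a newline (0xa) only; the other ends each line with a carriage return (0xd) followed by a newline (0xa), which makes it a DOS text file.

To fix it, use dos2unix (or dos2ux on some machines). Reading from standard input sidesteps differences between versions, since some versions convert a named file in place instead of writing to standard output. If neither tool is installed, tr can simply delete the carriage returns.
Code:
dos2unix < dostextfile.txt > unixfile
tr -d '\r' < dostextfile.txt > unixfile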
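As for scripts: yes, they see such lines as different. sort and uniq compare whole lines, and Perl's chomp, for example, removes only the trailing newline by default, so the \r stays part of the line. The sketch below uses made-up input to show the effect and that deleting the carriage returns before sort | uniq collapses the duplicates. LC_ALL=C is an added assumption so that the comparison is byte-wise regardless of the locale's collation rules.

```shell
#!/bin/sh
# Byte-wise comparison, independent of the current locale.
export LC_ALL=C

# Two lines that look the same; the second has a trailing CR (\r).
# uniq keeps both, so this prints 2.
printf 'Contig1\nContig1\r\n' | sort | uniq | wc -l

# Delete the carriage returns first: the lines become identical and
# uniq collapses them, so this prints 1.
printf 'Contig1\nContig1\r\n' | tr -d '\r' | sort | uniq | wc -l
```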

A little spider told me, eh? Her name was Acanthoscurria gomesiana...
# 7  
10-22-2008
Thank you, that's clear. One thing remains a mystery, though: both files were created on Unix with Unix commands (different ones in the two cases). How could the DOS line endings have got there?
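The thread leaves that question open. One way to narrow it down is to check each input and intermediate file for carriage returns and see where they first appear. This is a minimal sketch assuming a POSIX shell and grep; step1.txt and step2.txt are hypothetical stand-ins for the outputs of two processing steps.

```shell
#!/bin/sh
# Hypothetical intermediate files; only step2.txt has CR LF line endings.
printf 'Contig1\nContig2\n' > step1.txt
printf 'Contig1\r\nContig2\r\n' > step2.txt

# A literal carriage return to use as the grep pattern.
cr=$(printf '\r')

# List the files that contain at least one carriage return.
grep -l "$cr" step1.txt step2.txt

# Count the affected lines in each file (printed as file:count).
grep -c "$cr" step1.txt step2.txt
```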
 