uniq -c in the pipeline

05-20-2012


What's your definition of irony?

Mine is when the intended outcome or meaning diverges far enough from the actual outcome or meaning that I seek out the unix.com forums, become a member, post a problem, read a WORKAROUND solution, and am then told that the WORKAROUND is in fact the intended usage and that I had subtly misread the man page (I missed that the lines must be ADJACENT). All I wanted was for uniq's -c switch to do what the man page says:

-c Precede each output line with the count of the number of times the line occurred in the input, followed by a single space.

Imagine if I implemented "sort" and said "applies only to letters I R O N and Y" (but buried that subtly with one word in a man page)

At least the uniq man page should clarify this with a note, and uniq should offer a switch to presort the input (at some cost in performance) to deliver the EXPECTED -c results.

I would like to see the uniq source code - is there a reference?

thanks

fletch00


05-20-2012


You are making a really big issue out of this trivial matter and trying to blame the tool, instead of making it a learning experience.

This has nothing to do with the -c switch; -c just adds a count. It is the default behavior of uniq: it filters only adjacent (consecutive) repeated lines.
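A minimal sketch of the adjacency rule, assuming Python 3 and using a hypothetical helper named uniq_c (this is an illustration, not the GNU source). itertools.groupby, like uniq, only merges consecutive equal items, so a repeated line that is not next to its twin starts a new count.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def uniq_c(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (count, line) for each run of identical *adjacent* lines,
    mimicking the semantics of `uniq -c` (hypothetical helper, not GNU code)."""
    for line, run in groupby(lines):
        # groupby only merges consecutive equal items, just like uniq
        yield sum(1 for _ in run), line


if __name__ == "__main__":
    data = ["apple", "apple", "banana", "apple"]
    for count, line in uniq_c(data):
        print(count, line)
    # The last "apple" is not adjacent to the first run,
    # so it is counted separately:
    # 2 apple
    # 1 banana
    # 1 apple
```

Running uniq_c on sorted(data) instead yields (3, 'apple') and (1, 'banana'), which is exactly what sorting before uniq -c achieves.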

Quote:

Imagine if I implemented "sort" and said "applies only to letters I R O N and Y" (but buried that subtly with one word in a man page)

What are you trying to say with this comment? The fact that it operates on consecutive lines makes it more general and useful, not less.
So how would you write uniq, if you took the effort? How would you deal with the repeated lines? Would you rather slurp the whole file into memory and make this completely useless for large files? Or do you have a better solution? I'd be very interested to hear it.
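To illustrate the memory trade-off described here, a sketch with two hypothetical Python helpers (neither is taken from uniq's source): stream_uniq_c remembers only the previous line and a counter, while global_counts finds every duplicate without sorting but must keep a table entry for each distinct line it has seen.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def stream_uniq_c(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Adjacent-only counting (uniq-style): keeps just the previous line
    and its count, so memory use does not grow with the input size."""
    prev = None
    count = 0
    for line in lines:
        if count and line == prev:
            count += 1                # same as previous line: extend the run
        else:
            if count:
                yield count, prev     # emit the finished run
            prev, count = line, 1     # start a new run
    if count:
        yield count, prev             # emit the final run


def global_counts(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Global counting without sorting: needs a table entry for every
    distinct line, so memory grows with the number of distinct lines."""
    table = Counter(lines)
    return [(n, line) for line, n in table.items()]
```

Because stream_uniq_c accepts any iterable, it can consume an open file line by line with bounded memory regardless of file size; global_counts gives the "expected" totals without a sort, but only by holding every distinct line in memory.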

Quote:

At least the uniq man page should clarify this with a note

But it does! Didn't you read my post?

Code:

Note:  'uniq'  does  not detect repeated lines unless they are adjacent.   You may want to sort the input first, or use `sort -u' without `uniq'.
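The man page advice can be checked with a small Python sketch. uniq, sort_then_uniq and sort_u are hypothetical helpers that model the behavior of `uniq`, `sort | uniq` and `sort -u` on a list of lines.

```python
from itertools import groupby
from typing import Iterable, List


def uniq(lines: Iterable[str]) -> List[str]:
    """Drop repeated *adjacent* lines, like plain `uniq` (hypothetical helper)."""
    return [line for line, _ in groupby(lines)]


def sort_then_uniq(lines: Iterable[str]) -> List[str]:
    # Models `sort | uniq`: sorting makes duplicates adjacent first.
    return uniq(sorted(lines))


def sort_u(lines: Iterable[str]) -> List[str]:
    # Models `sort -u`: deduplicate and order in one step.
    return sorted(set(lines))


if __name__ == "__main__":
    data = ["pear", "apple", "pear", "fig", "apple"]
    print(uniq(data))            # no adjacent repeats, so all five lines remain
    print(sort_then_uniq(data))  # ['apple', 'fig', 'pear']
    print(sort_u(data))          # ['apple', 'fig', 'pear']
```

One caveat: Python's sorted() orders strings by code point, whereas sort follows the current locale's collation rules, so the output order may differ from the real commands.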

Which uniq do you have installed? What does your man page say?

Quote:

I would like to see the uniq source code - is there a reference?

Of course, help yourself:
GNU Project Archives
Again, I do not know whether it's GNU coreutils that you are using.

mirni


05-20-2012


uniq duplicates solution fix

No, no, I thank you for the solution, and I think you miss the irony: to get unique results from the uniq command, one is required to presort the input.
Why a self-respecting computer scientist would ship such a half-assed implementation without fully disclosing the Big O tradeoff and the duplicate results, and without offering a switch for a slower but accurate version of uniq -c, is beyond me.

Everyone:

If you see duplicates in uniq -c output, this is a feature, not a bug: lines must be ADJACENT to be counted together.

If you want your expected results, first sort, then uniq, then sort again.
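The sort, uniq, sort-again recipe can be sketched in Python as follows (count_lines is a hypothetical helper). The last step only matters if you want the output ranked by count, as in the common `sort | uniq -c | sort -rn` pipeline; `sort | uniq -c` alone already returns the lines in sorted order.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def count_lines(lines: Iterable[str], by_count: bool = False) -> List[Tuple[int, str]]:
    """Sketch of `sort | uniq -c`, optionally followed by `sort -rn`."""
    # Step 1 (sort): bring identical lines next to each other.
    # Step 2 (uniq -c): count each run of adjacent identical lines.
    counted = [(sum(1 for _ in run), line) for line, run in groupby(sorted(lines))]
    if by_count:
        # Step 3 (sort -rn): highest count first. Python's sort is stable,
        # so lines with equal counts keep their sorted order
        # (GNU sort -rn may order such ties differently).
        counted.sort(key=lambda pair: pair[0], reverse=True)
    return counted
```

For example, count_lines(["b", "a", "b", "c", "b", "a"]) returns [(2, 'a'), (3, 'b'), (1, 'c')], and with by_count=True it returns [(3, 'b'), (2, 'a'), (1, 'c')].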

May the google duplicate uniq sort fix solution find you

Thanks to mirni. I owe you a vBeer.

fletch00


05-20-2012


Quote:

If you want your expected results, first sort, then uniq, then sort again.

No need for the last sort, it's already sorted.

O(n log n) is so close to O(n) that sorting does not make much difference at all. And everything is fully disclosed in the documentation; you just have to read carefully, since every word can carry significant meaning.

It is not half-assed at all. Again, you are missing an important point: working only on adjacent lines lets you filter huge outputs without worrying about memory limits. It is cleverly designed to be as useful as possible.

Glad I could help. (And I don't drink, but thanks!)


mirni
