Numbering duplicates


 
# 1  
08-02-2009

Hi,

I have a large file that sometimes contains duplicates, and I want to find them and figure out how many there are.

The file has multiple columns, and the last column (column 9) holds the duplicates.

e.g.

yan
tar
tar
man
ban
tan
tub
tub
tub

Basically, I want to label non-duplicates as "0", duplicates as "0" and "1", and, in the case of triplicates, "0", "1" and "2".

So the output file should look like this:

yan 0
tar 0
tar 1
man 0
ban 0
tan 0
tub 0
tub 1
tub 2

thanks

Kylle
# 2  
08-02-2009
Code:
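awk '
# count how often each column-9 value occurs (lines with >= 9 fields)
NF >= 9 { word[$9]++ }
# at the end, print each value with its total count (order is unspecified)
END { for (w in word) {
            print w, word[w]
            }
       }
'  inputfile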
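
If you want the running number and the total at the same time, here is a minimal sketch, assuming tab-separated input with the value in column 9 (the function name count_and_number is made up for illustration). It reads the file twice: the first pass collects totals like the code above, and the second pass prints each value with its 0-based occurrence index and its total.

```shell
# count_and_number (hypothetical name): two passes over the same file.
# Assumes tab-separated input with the value of interest in column 9.
count_and_number() {
  awk -F '\t' -v OFS='\t' '
    NR == FNR { total[$9]++; next }       # pass 1: count every value
    { print $9, seen[$9]++, total[$9] }   # pass 2: value, index, total
  ' "$1" "$1"
}
```

Usage: count_and_number yourfile. For the sample data the three tub lines come out as tub 0 3, tub 1 3 and tub 2 3 (tab-separated).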

# 3  
08-02-2009
You didn't say what delimiter you have, but try this:
Code:
awk '{print $9,word[$9]++}' yourfile
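
Why this works: word[$9]++ is a post-increment, so it returns the current count before adding 1, and an array element that has never been used evaluates to 0. The first tub therefore prints 0, the second 1 and the third 2. A minimal sketch of the same idea wrapped in a shell function, assuming tab-separated input (number_dups is a hypothetical name):

```shell
# number_dups (hypothetical name): print column 9 of each tab-separated
# line, followed by how many times that value was seen before it.
number_dups() {
  # seen[$9]++ yields the old count (0 for a new value), then adds 1
  awk -F '\t' -v OFS='\t' '{ print $9, seen[$9]++ }' "$@"
}
```

Usage: number_dups yourfile > numbered.txt (with no file argument it reads standard input).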

# 4  
08-02-2009
malcomex999 read the question more carefully than I did; use that one.
# 5  
08-02-2009
Hi, it's tab-delimited.

Thanks, but I'm not sure that does what I want. It counted how many are unique and how many are replicates. Basically, what I want it to do is this:

Before...

yan
tar
tar
man
ban
tan
tub
tub
tub

After...

yan unique
tar unique
tar duplicate
man unique
ban unique
tan unique
tub unique
tub duplicate
tub triplicate

thanks

# 6  
08-02-2009
Assuming you're using the 9th column:

Code:
awk '{print $9, (a[$9]++ ? "duplicate" : "unique")}' file
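
The one-liner above only tells "unique" from "duplicate". To get the labels from the question, including "triplicate", one option is to look the running count up in a small list of names. A sketch, assuming tab-separated input; label_dups is a hypothetical name, and the "occurrence N" label for a fourth or later copy is my own assumption, since the question doesn't cover that case:

```shell
# label_dups (hypothetical name): tag each column-9 value by how many
# times it has appeared so far, using the labels from the question.
label_dups() {
  awk -F '\t' -v OFS='\t' '
    BEGIN { split("unique duplicate triplicate", name, " ") }
    {
      n = ++seen[$9]     # 1 on first sight, 2 on second, and so on
      # "occurrence N" for 4+ copies is an assumed fallback label
      label = (n in name) ? name[n] : "occurrence " n
      print $9, label
    }
  ' "$@"
}
```

Usage: label_dups yourfile. Keeping the names in an array means adding a label for a fourth copy is a one-word change to the split() string.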

# 7  
08-02-2009
Did you try malcomex999's version?
It gives this result:
Code:
yan 0
tar 0
tar 1
man 0
ban 0
tan 0
tub 0
tub 1
tub 2

That is exactly what you described in your first post. Your field delimiter is tab, which is one of awk's default delimiters. If your data also contains spaces, you need to set the FS value:
Code:
awk -F "\t" '{print $9,word[$9]++}' yourfile
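
Since the file has nine columns, you may want to keep them all and just append the number as a tenth column. A minimal sketch, assuming tab-separated input (append_dup_index is a hypothetical name); $0 is printed unchanged because no field is modified:

```shell
# append_dup_index (hypothetical name): keep every original column and
# add the 0-based occurrence number of column 9 as a new last column.
append_dup_index() {
  awk -F '\t' -v OFS='\t' '{ print $0, seen[$9]++ }' "$@"
}
```

Usage: append_dup_index yourfile > yourfile.numbered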
