Find duplicates in the first column of text file


 
# 1  
Old 06-27-2010
Find duplicates in the first column of text file

Hello,

My text file has input of the form

Code:
abc dft45.xml
ert  rt653.xml
abc ert57.xml

I need to write a Perl or shell script to find the duplicates in the first column and write them into a text file of the form:

Code:
abc dft45.xml
abc ert57.xml

Can someone help me, please?

Last edited by Scott; 06-27-2010 at 06:23 AM.. Reason: Code tags, please...
# 2  
Old 06-27-2010
Hi

Code:
awk 'NR==FNR{a[$1]++;next;}{ if (a[$1] > 1)print;}' file file

You need to give the filename twice as shown above.

Guru.
# 3  
Old 06-27-2010
Explain

Can you please explain what the awk command is doing, and why you have given "file" two times?
# 4  
Old 06-27-2010
Hi
The first time the file is processed, awk counts the occurrences of each first-column value. The second time it is processed, awk prints those lines whose first-column value has a count greater than 1.

btw, did it work?

Guru.
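[Editor's note: the two-pass behaviour can be traced on the sample data from post #1. This is an illustrative sketch; the file name sample.txt is made up here.]

```shell
# Write the sample input from post #1 to a file (sample.txt is a made-up name).
printf 'abc dft45.xml\nert  rt653.xml\nabc ert57.xml\n' > sample.txt

# Pass 1 (NR==FNR holds only while the first copy of the file is read):
# count each first-column value in a[].
# Pass 2: print the lines whose first-column count exceeds 1.
awk 'NR==FNR{a[$1]++;next}{if (a[$1] > 1) print}' sample.txt sample.txt
# prints:
# abc dft45.xml
# abc ert57.xml
```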
# 5  
Old 06-27-2010
A single-pass version (increased RAM requirement, since all lines of the file are stored for use in the END block):
Code:
 awk '{a[NR]=$0; a[NR,"k"]=$1; k[$1]++} END {for (i=1; i<=NR; i++) if (k[a[i,"k"]] > 1) print a[i]}' data

Regards,
Alister
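[Editor's note: a commented expansion of the one-liner above, with the same logic only reformatted for readability; the input file name data is from the original command.]

```shell
awk '
{
    a[NR] = $0        # store the whole line, indexed by input line number
    a[NR,"k"] = $1    # store its key (the first column) alongside it
    k[$1]++           # count how often each key occurs
}
END {
    # Replay the stored lines in order; print those whose key occurred
    # more than once anywhere in the file.
    for (i = 1; i <= NR; i++)
        if (k[a[i,"k"]] > 1)
            print a[i]
}' data
```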
# 6  
Old 06-28-2010
What is meant by "first time process" and "second time process"?

I will try it out and comment here as soon as possible, because there is a problem with my machine.

---------- Post updated at 08:05 PM ---------- Previous update was at 07:52 PM ----------

Quote:
Originally Posted by alister
A single-pass version (increased RAM requirement, since all lines of the file are stored for use in the END block):
Code:
 awk '{a[NR]=$0; a[NR,"k"]=$1; k[$1]++} END {for (i=1; i<=NR; i++) if (k[a[i,"k"]] > 1) print a[i]}' data

Regards,
Alister

Can you explain what the code is doing?

---------- Post updated 06-28-10 at 02:16 PM ---------- Previous update was 06-27-10 at 08:05 PM ----------

Quote:
Originally Posted by guruprasadpr
Hi
The first time the file is processed, awk counts the occurrences of each first-column value. The second time it is processed, awk prints those lines whose first-column value has a count greater than 1.

btw, did it work?

Guru.

It worked! Thanks... But I also need to find the count of each value's occurrences:

Code:
awk ' { per[$1] += 1}
END { for (i in per)
print i, per[i] } ' dupli.txt > dupli_count.txt

In the above code I need to print the total count as "Sum=????" (i.e., the sum of the 2nd column of the output).
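[Editor's note: one possible extension of the command above. This is a sketch, not an answer from the thread; it assumes the per-key counts in per[] should simply be summed and printed on a final Sum= line, and reuses the file names dupli.txt and dupli_count.txt from the post.]

```shell
# Same counting loop as above, plus a running total printed as "Sum=...".
awk '{ per[$1] += 1 }
END {
    for (i in per) {
        print i, per[i]
        sum += per[i]      # accumulate the per-key counts
    }
    print "Sum=" sum       # total number of input lines
}' dupli.txt > dupli_count.txt
```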

Last edited by Scott; 06-28-2010 at 05:47 AM.. Reason: Code tags, please...