Get duplicate rows from a csv file


 
# 1  
Old 07-27-2017

How can I get the duplicate rows from a file using Unix tools? For example, I have data like:

Code:
a,1
b,2
c,3
d,4
a,1
c,3
e,5

I want the output to be like:

Code:
a,1
c,3

# 2  
Old 07-27-2017
Try:
Code:
awk '++A[$0]==2' file
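
For readers unfamiliar with the idiom, here is a rough sketch of how the one-liner behaves. `A[$0]` counts how many times each whole line has been seen, and the pattern is true only when that count reaches 2, so each duplicated line is printed once, at its second occurrence. The temporary file and the variable names `tmp` and `dups` are assumptions made for illustration; the data is the sample from post #1.

```shell
# Hypothetical temporary file holding the sample data from post #1.
tmp=$(mktemp)
printf '%s\n' a,1 b,2 c,3 d,4 a,1 c,3 e,5 > "$tmp"

# ++A[$0] increments the per-line counter; the line is printed
# only at the moment its counter becomes exactly 2.
dups=$(awk '++A[$0]==2' "$tmp")
printf '%s\n' "$dups"

rm -f "$tmp"
```

Because the test is `==2` rather than `>1`, a line that occurs three or more times is still reported only once.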

# 3  
Old 07-27-2017
Hi.

Alternatively, with the data in file z5:
Code:
$ sort z5 | uniq -d
a,1
c,3

on a system like:
Code:
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.8 (jessie) 
sort (GNU coreutils) 8.23
uniq (GNU coreutils) 8.23
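
The `sort` step matters because `uniq -d` only compares adjacent lines. A small sketch of the difference, using the sample data from post #1 in a hypothetical temporary file (the names `tmp`, `unsorted`, and `sorted` are illustrative, and `LC_ALL=C` is an added assumption to keep the sort order predictable):

```shell
# Hypothetical temporary file holding the sample data from post #1.
tmp=$(mktemp)
printf '%s\n' a,1 b,2 c,3 d,4 a,1 c,3 e,5 > "$tmp"

# Without sorting, no two identical lines are adjacent,
# so uniq -d finds nothing to report.
unsorted=$(uniq -d "$tmp")

# After sorting, identical lines sit next to each other
# and uniq -d prints one copy of each repeated line.
sorted=$(LC_ALL=C sort "$tmp" | uniq -d)

printf 'without sort: [%s]\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"

rm -f "$tmp"
```

Unlike the awk approach, this one returns the duplicates in sorted order rather than in order of appearance.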

Best wishes ... cheers, drl
# 4  
Old 07-27-2017
@Scrutinizer, thanks, it really worked. Could you please help me with a similar case? I need to find the rows in a CSV that are duplicates of each other except that column 2 differs, for example:

Code:
a,1,2
a,3,2
b,2,4

My output should be like:

Code:
a,1,2
a,3,2

because the 2nd column differs while the rest are the same.

Last edited by Scrutinizer; 07-27-2017 at 01:09 PM.. Reason: code tags
# 5  
Old 07-30-2017
Try something like:
Code:
awk -F, 'NR==FNR {A[$1,$3]++; next} A[$1,$3]>1'  file file

or
Code:
awk -F, 'NR==FNR {A[$1,$3]++; next} A[$1,$3]>1 && !B[$0]++' file file

if there can be multiple occurrences of the same record and they only need to be printed once.


--
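Note that the input file needs to be specified twice, because awk reads it in two passes: the first pass counts each key, and the second pass prints the matching rows.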
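
A minimal sketch of both variants side by side. During the first read of the file `NR==FNR` holds, so the block only counts each `$1,$3` key; during the second read, rows whose key was counted more than once are printed. The sample is the data from post #4 with one extra repeated record (`a,1,2`) added as an assumption to show where the two commands differ; `tmp`, `all`, and `once` are illustrative names.

```shell
# Hypothetical temporary file: the data from post #4 plus a repeated record.
tmp=$(mktemp)
printf '%s\n' a,1,2 a,3,2 b,2,4 a,1,2 > "$tmp"

# Variant 1: print every row whose (column 1, column 3) key occurs more than once.
all=$(awk -F, 'NR==FNR {A[$1,$3]++; next} A[$1,$3]>1' "$tmp" "$tmp")

# Variant 2: same test, but B[$0] suppresses rows that were already printed.
once=$(awk -F, 'NR==FNR {A[$1,$3]++; next} A[$1,$3]>1 && !B[$0]++' "$tmp" "$tmp")

printf 'variant 1:\n%s\n' "$all"
printf 'variant 2:\n%s\n' "$once"

rm -f "$tmp"
```

Here the first variant prints `a,1,2` twice, while the second prints each distinct record once.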