Getting Data Count by Removing Duplicates

# 1  
Old 04-10-2012

Hi Experts,

I have many CSV data files in the following format (example):
Code:
Doc Number,Line Number,Condition Number
111,10,ABC
111,10,PQR
111,10,XYZ
222,20,DEF
222,20,EFG
222,20,HIJ
333,30,CCC
333,30,TCP

Now, for the above data I want the row count based on the Doc Number and Line Number combination, excluding the Condition Number. That is, each distinct Doc Number/Line Number pair should count as one record, so for the above example the count should come out as 3: (111,10), (222,20) and (333,30).
Can anyone please tell me which UNIX command can be used for this?

Thanks

# 2  
Old 04-10-2012
Code:
$ nawk -F, '{a[$1","$2]++} END{for(i in a) print i "---->" a[i]}' input.txt
333,30---->2
222,20---->3
111,10---->3
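Note that this prints how many rows each Doc Number/Line Number pair has rather than the number of distinct pairs. If it is only the distinct-pair count you need (3 for the sample), a minimal non-awk sketch, assuming the header line shown above is present:
Code:
# drop the header, keep the first two fields, count unique pairs
tail -n +2 input.txt | cut -d, -f1,2 | sort -u | wc -l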

# 3  
Old 04-10-2012
Or is this what you're looking for:
Code:
awk -F, '{a[$1$2]} END{for(i in a) s++; print s}' file

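For the record: a[$1$2] creates one (empty) array element per distinct key, and the END loop counts the elements. With the sample data above the header row becomes a key too, so this prints 4 rather than 3:
Code:
$ awk -F, '{a[$1$2]} END{for(i in a) s++; print s}' file
4

The suggestion in post # 5 below skips the header.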
# 4  
Old 04-10-2012
Hi Franklin,

Yes, this is what I was looking for, thanks a lot for your help. I will let you know in case I need any more help.

Hi Kamraj,

Thanks for your help as well.
# 5  
Old 04-10-2012
Perhaps also skip the header, and use a proper separator in the array index (awk's SUBSEP, which a[$1,$2] inserts automatically) to prevent unintended blending of field 1 and field 2?
Code:
awk -F, 'NR>1{a[$1,$2]} END{for(i in a) s++; print s}' infile

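To see why the separator matters, here is a throwaway two-row file (blend.csv is just a made-up name) in which two different Doc/Line pairs concatenate to the same string:
Code:
$ printf 'Doc,Line,Cond\n111,10,A\n11,110,B\n' > blend.csv
$ awk -F, 'NR>1{a[$1$2]} END{for(i in a) s++; print s}' blend.csv
1
$ awk -F, 'NR>1{a[$1,$2]} END{for(i in a) s++; print s}' blend.csv
2

With $1$2 both rows collapse to the key 11110, whereas $1,$2 keeps them apart.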
# 6  
Old 04-10-2012
Hi Experts,

Thanks!
Just one more query: is it possible to run this awk command on a number of files in one go?
I mean, I have around 1000 files and I want to get the data count for all 1000 files in one go rather than running the command 1000 times.
# 7  
Old 04-10-2012
You can pass multiple files if you replace NR with FNR (assuming all these files have a header). But with 1000 files you may hit the system's argument-length limit when you supply that many names on the command line; letting find hand the file names to awk avoids that, as in the sketch below.

Do you only want the grand total or the total per file?
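A sketch of both variants, assuming every file has a header line and matches a *.csv glob (both are assumptions, adjust to your names):
Code:
# distinct Doc/Line pairs per file; FNR==1 marks the start of each new file
awk -F, '
    FNR == 1 {                          # header line of the next file
        if (NR > 1) print fname ": " s  # flush the previous file''s count
        for (k in a) delete a[k]        # reset the seen-pairs array
        s = 0; fname = FILENAME
        next
    }
    !(($1,$2) in a) { a[$1,$2]; s++ }   # count each new pair once
    END { if (fname != "") print fname ": " s }
' *.csv

# with ~1000 files, let find hand the names to awk instead of a shell glob;
# -exec ... {} + may start awk more than once, which is harmless for per-file counts
find . -name '*.csv' -exec awk -F, '...same program as above...' {} +

For one grand total over all files, key the array on the pair alone, as in the earlier posts but with FNR>1 instead of NR>1, and print a single count in END.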