Getting Data Count by Removing Duplicates

# 1  
Old 04-10-2012

Hi Experts,

I have many CSV data files in the following format (example):
Code:
Doc Number,Line Number,Condition Number
111,10,ABC
111,10,PQR
111,10,XYZ
222,20,DEF
222,20,EFG
222,20,HIJ
333,30,CCC
333,30,TCP

Now, for the above data I want the row count based on the Doc Number and Line Number combination, excluding the Condition Number. That is, each distinct Doc Number/Line Number pair should count as one record, so for the above example the count should come out as 3: (111,10), (222,20) and (333,30).
Can anyone please tell me which UNIX command can be used for this?

Thanks

# 2  
Old 04-10-2012
Code:
$ nawk -F, '{a[$1","$2]++} END{for(i in a) print i "---->" a[i]}' input.txt
333,30---->2
222,20---->3
111,10---->3
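Note that this prints how many rows each Doc Number/Line Number pair has rather than the number of distinct pairs. If it is only the distinct-pair count you need (3 for the sample), a minimal non-awk sketch, assuming the header line shown above is present:
Code:
# drop the header, keep the first two fields, count unique pairs
tail -n +2 input.txt | cut -d, -f1,2 | sort -u | wc -l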

# 3  
Old 04-10-2012
Or is this what you're looking for:
Code:
awk -F, '{a[$1$2]} END{for(i in a) s++; print s}' file

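For the record: a[$1$2] creates one (empty) array element per distinct key, and the END loop counts the elements. With the sample data above the header row becomes a key too, so this prints 4 rather than 3:
Code:
$ awk -F, '{a[$1$2]} END{for(i in a) s++; print s}' file
4

The suggestion in post # 5 below skips the header.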
# 4  
Old 04-10-2012
Hi Franklin,

Yes, this is what I was looking for, thanks a lot for your help. I will let you know in case I need any more help.

Hi Kamraj,

Thanks for your help as well.
# 5  
Old 04-10-2012
Perhaps also skip the header, and use a proper separator in the array index (awk's SUBSEP, which a[$1,$2] inserts automatically) to prevent unintended blending of field 1 and field 2?
Code:
awk -F, 'NR>1{a[$1,$2]} END{for(i in a) s++; print s}' infile

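To see why the separator matters, here is a throwaway two-row file (blend.csv is just a made-up name) in which two different Doc/Line pairs concatenate to the same string:
Code:
$ printf 'Doc,Line,Cond\n111,10,A\n11,110,B\n' > blend.csv
$ awk -F, 'NR>1{a[$1$2]} END{for(i in a) s++; print s}' blend.csv
1
$ awk -F, 'NR>1{a[$1,$2]} END{for(i in a) s++; print s}' blend.csv
2

With $1$2 both rows collapse to the key 11110, whereas $1,$2 keeps them apart.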
# 6  
Old 04-10-2012
Hi Experts,

Thanks!
Just one more query: is it possible to run this awk command on a number of files in one go?
I mean, I have around 1000 files and I want to get the data count for all 1000 files in one go rather than running the command 1000 times.
# 7  
Old 04-10-2012
You can pass multiple files if you replace NR with FNR (assuming all these files have a header). But with 1000 files you may hit the system's argument-length limit when you supply that many names on the command line; letting find hand the file names to awk avoids that, as in the sketch below.

Do you only want the grand total or the total per file?
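A sketch of both variants, assuming every file has a header line and matches a *.csv glob (both are assumptions, adjust to your names):
Code:
# distinct Doc/Line pairs per file; FNR==1 marks the start of each new file
awk -F, '
    FNR == 1 {                          # header line of the next file
        if (NR > 1) print fname ": " s  # flush the previous file''s count
        for (k in a) delete a[k]        # reset the seen-pairs array
        s = 0; fname = FILENAME
        next
    }
    !(($1,$2) in a) { a[$1,$2]; s++ }   # count each new pair once
    END { if (fname != "") print fname ": " s }
' *.csv

# with ~1000 files, let find hand the names to awk instead of a shell glob;
# -exec ... {} + may start awk more than once, which is harmless for per-file counts
find . -name '*.csv' -exec awk -F, '...same program as above...' {} +

For one grand total over all files, key the array on the pair alone, as in the earlier posts but with FNR>1 instead of NR>1, and print a single count in END.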