Count and keep duplicates in Column


 
# 1  
Old 03-23-2016

Hi folks,

I've got a csv file called test.csv

Code:
Column A  Column B
Apples    1900
Apples    1901
Pears     1902
Pears     1903

I want to count the duplicates in the first column but keep every row. Desired output:

Code:
Column A  Column B  Column C
Apples    2         1900
Apples    2         1901
Pears     2         1902
Pears     2         1903

I have tried sort and uniq, but to no avail: uniq -c collapses the duplicate rows into one, and I need to keep them.
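For example, something like this (assuming the file is whitespace-separated) gives the right counts but only one row per key:

Code:
# counts are right, but the duplicate rows are collapsed
$ tail -n +2 test.csv | awk '{print $1}' | sort | uniq -c
      2 Apples
      2 Pears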

Any help would be great.

Thanks.

Last edited by pshields1984; 03-23-2016 at 02:59 PM..
# 2  
Old 03-23-2016
Please use code tags as required by forum rules!

I guess the new count column should get a header of its own, no? And having the field separator inside the header names ("Column A") doesn't really help processing. Try:
Code:
awk 'NR == FNR {T[$1]++; next} FNR == 1 {print $1, $2, "CNT", $3, $4; next} {print $1, T[$1], $2}' file file
Column A CNT Column B
Apples 2 1900
Apples 2 1901
Pears 2 1902
Pears 2 1903


Last edited by RudiC; 03-24-2016 at 08:11 AM.. Reason: typo
# 3  
Old 03-23-2016
Thank you so much, I am almost there. Long-time lurker, first-time poster; apologies for not quoting the code correctly. Can you explain the command? I don't really need the column headers, if that makes things more straightforward.
# 4  
Old 03-23-2016
It's two passes across the same file: the first pass counts the occurrences, the second prints the fields plus the count. A single-pass alternative is sketched below.
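If you'd rather read the file only once (at the cost of buffering it in memory), a single-pass variant would look roughly like this (an untested sketch along the same lines):

Code:
awk '
FNR == 1 {next}                # skip the header line
{cnt[$1]++; line[++n] = $0}    # count each key and buffer the data lines
END {
    for (i = 1; i <= n; i++) { # replay the buffer with the final counts
        split(line[i], f)
        print f[1], cnt[f[1]], f[2]
    }
}' file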
# 5  
Old 03-23-2016
If you do not need the header:
Code:
awk 'NR == FNR {T[$1]++; next} FNR > 1 {print $1, T[$1], $2}' pshields1984.input pshields1984.input

Code:
NR == FNR {T[$1]++; next}      # first pass: count the occurrences of each key in column 1
FNR > 1 {print $1, T[$1], $2}  # second pass: skip the header line and insert the tally after the first column


Here is some Perl code that could be more flexible:
Code:
#!/usr/bin/perl

use strict;
use warnings;

my $filename = shift or die "Usage: $0 FILENAME\n";
my %tally;

open my $fh, '<', $filename or die "Could not open $filename: $!\n";

<$fh>;                          # consume the header line
my $data_position = tell $fh;   # remember where the data starts

# first pass: count the occurrences of each first-column key
while (my $entry = <$fh>) {
    my ($id) = split /\s+/, $entry;
    $tally{$id}++;
}

# second pass: rewind to the data and insert the count after column 1
seek $fh, $data_position, 0;
while (my $entry = <$fh>) {
    my @fields = split /\s+/, $entry;
    splice @fields, 1, 0, $tally{$fields[0]};
    print "@fields\n";
}
close $fh;

Save as tally.pl and run as:

Code:
perl tally.pl pshields1984.input
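With the sample data above, both the header-less awk and this Perl script should print:

Code:
Apples 2 1900
Apples 2 1901
Pears 2 1902
Pears 2 1903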
# 6  
Old 03-23-2016
Thank you, RudiC. It worked a charm.

---------- Post updated at 06:34 PM ---------- Previous update was at 02:18 PM ----------

Thanks, Aia, I'll give the Perl suggestion a go.