Sponsored Content
Top Forums Shell Programming and Scripting Count and keep duplicates in Column Post 302969519 by Aia on Wednesday 23rd of March 2016 02:39:32 PM
Old 03-23-2016
If you do not need the header:
Code:
awk 'NR == FNR {T[$1]++; next} FNR > 1{print $1, T[$1], $2}' pshields1984.input  pshields1984.input

Code:
NR == FNR {T[$1]++; next} # execute only in the first pass reading input
FNR > 1{print $1, T[$1], $2}  # skip first line and insert the tally in from previous read after the first column


Some Perl code that could be more flexible.
Code:
#!/usr/bin/perl

use strict;
use warnings;

my $filename = shift or die "Usage: $0 FILENAME\n";
my %tally;

open my $fh, '<', $filename or die "Could not open $filename: $!\n";

<$fh>;
my $data_position = tell $fh;

while (my $entry = <$fh>) {
    my ($id) = split '\s+', $entry;
    $tally{$id}++;
}
seek $fh, $data_position, 0;
while (my $entry = <$fh>) {
    my @fields = split '\s+', $entry;
    splice @fields, 1,0, $tally{$fields[0]};
    print "@fields\n";
}
close $fh;

Save as tally.pl
Run as perl tally.pl pshields1984.input
This User Gave Thanks to Aia For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

duplicates lines with one column different

Hi I have the following lines in a file SANDI108085FRANKLIN WRAP 7285 SANDI109514ZIPLOC STRETCH N SEAL 7285 SANDI110198CHOICE DM 0911 SANDI111144RANDOM WEIGHT BRAND 0704 SANDI111144RANDOM WEIGHT BRAND 0738... (10 Replies)
Discussion started by: dhanamurthy
10 Replies

2. Shell Programming and Scripting

Delete Duplicates on the basis of two column values.

Hi All, i need ti delete two duplicate processss which are running on the same device type (column 1) and port ID (column 2). here is the sample data p1sc1m1 15517 11325 0 01:00:24 ? 0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@ p1sc1m1 15519 11325 0 01:00:24 ? ... (5 Replies)
Discussion started by: neeraj617
5 Replies

3. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ... (13 Replies)
Discussion started by: script_op2a
13 Replies

4. Shell Programming and Scripting

Getting Data Count by Removing Duplicates

Hi Experts, I have many CSV data files in the below format (Example) :- Doc Number,Line Number,Condition Number 111,10,ABC 111,10,PQR 111,10,XYZ 222,20,DEF 222,20,EFG 222,20,HIJ 333,30,CCC 333,30,TCP Now, for the above data i want to get the row count based on the Doc Number & Line... (9 Replies)
Discussion started by: naikamit
9 Replies

5. UNIX for Dummies Questions & Answers

Grep and Count Duplicates

I have a delimited file (by |), and the second field is made out of Surnames. Is it possible to list the surnames together with their count of occurances. For example, image the first two lines are the following: Joe | Doe | 30 Jane | Doe | 28 Peter | Smith | 25 John | Jones | 26 I... (2 Replies)
Discussion started by: mouthpiec
2 Replies

6. Shell Programming and Scripting

Count total duplicates

Hi all, I have found another post threads talking about count duplicate lines, but I am interested in obtain the total number of duplicates. For example: #file.txt a1 a2 a1 a3 a1 a2 a4 a5 #out 3 (lines are duplicates) Thank you! (12 Replies)
Discussion started by: mikloz
12 Replies

7. Shell Programming and Scripting

Remove duplicates according to their frequency in column

Hi all, I have huge a tab-delimited file with the following format and I want to remove the duplicates according to their frequency based on Column2 and Column3. Column1 Column2 Column3 Column4 Column5 Column6 Column7 1 user1 access1 word word 3 2 2 user2 access2 ... (10 Replies)
Discussion started by: corfuitl
10 Replies

8. Shell Programming and Scripting

Read first column and count lines in second column using awk

Hello all, I would like to ask your help here: I've a huge file that has 2 columns. A part of it is: sorted.txt: kss23 rml.67lkj kss23 zhh.6gf kss23 nhd.09.fdd kss23 hp.767.88.89 fl67 nmdsfs.56.df.67 fl67 kk.fgf.98.56.n fl67 bgdgdfg.hjj.879.d fl66 kl..hfh.76.ghg fl66... (5 Replies)
Discussion started by: Padavan
5 Replies

9. Shell Programming and Scripting

Filter first column duplicates

Dear All, I really enjoy your help or suggestion for resolving an issue. Briefly, I have a file like this: a b c a d e f g h k g h x y z If the first column has the same ID, for example a, just remove it. The output should be this: f g h k g h x y z I was thinking to do it... (11 Replies)
Discussion started by: giuliangiuseppe
11 Replies

10. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, i need help I am learning AWk and stuck up in one issue. First point : I want to sum up column value for column 7, 9, 11,13 and column15 if rows in column 5 are duplicates.No action to be taken for rows where value in column 5 is unique. Second point : For... (1 Reply)
Discussion started by: as7951
1 Replies
csv(n)								  CSV processing							    csv(n)

NAME
csv - Procedures to handle CSV data. SYNOPSIS
package require Tcl 8.3 package require csv ?0.3? ::csv::join values {sepChar ,} ::csv::joinlist values {sepChar ,} ::csv::read2matrix chan m {sepChar ,} {expand none} ::csv::read2queue chan q {sepChar ,} ::csv::report cmd matrix ?chan? ::csv::split line {sepChar ,} ::csv::split2matrix m line {sepChar ,} {expand none} ::csv::split2queue q line {sepChar ,} ::csv::writematrix m chan {sepChar ,} ::csv::writequeue q chan {sepChar ,} DESCRIPTION
The csv package provides commands to manipulate information in CSV FORMAT (CSV = Comma Separated Values). COMMANDS
The following commands are available: ::csv::join values {sepChar ,} Takes a list of values and returns a string in CSV format containing these values. The separator character can be defined by the caller, but this is optional. The default is ",". ::csv::joinlist values {sepChar ,} Takes a list of lists of values and returns a string in CSV format containing these values. The separator character can be defined by the caller, but this is optional. The default is ",". Each element of the outer list is considered a record, these are separated by newlines in the result. The elements of each record are formatted as usual (via ::csv::join). ::csv::read2matrix chan m {sepChar ,} {expand none} A wrapper around ::csv::split2matrix (see below) reading CSV-formatted lines from the specified channel (until EOF) and adding them to the given matrix. For an explanation of the expand argument see ::csv::split2matrix. ::csv::read2queue chan q {sepChar ,} A wrapper around ::csv::split2queue (see below) reading CSV-formatted lines from the specified channel (until EOF) and adding them to the given queue. ::csv::report cmd matrix ?chan? A report command which can be used by the matrix methods format 2string and format 2chan. For the latter this command delegates the work to ::csv::writematrix. cmd is expected to be either printmatrix or printmatrix2channel. The channel argument, chan, has to be present for the latter and must not be present for the first. ::csv::split line {sepChar ,} converts a line in CSV format into a list of the values contained in the line. The character used to separate the values from each other can be defined by the caller, via sepChar, but this is optional. The default is ",". ::csv::split2matrix m line {sepChar ,} {expand none} The same as ::csv::split, but appends the resulting list as a new row to the matrix m, using the method add row. The expansion mode specified via expand determines how the command handles a matrix with less columns than contained in line. The allowed modes are: none This is the default mode. In this mode it is the responsibility of the caller to ensure that the matrix has enough columns to contain the full line. If there are not enough columns the list of values is silently truncated at the end to fit. empty In this mode the command expands an empty matrix to hold all columns of the specified line, but goes no further. The overall effect is that the first of a series of lines determines the number of columns in the matrix and all following lines are truncated to that size, as if mode none was set. auto In this mode the command expands the matrix as needed to hold all columns contained in line. The overall effect is that after adding a series of lines the matrix will have enough columns to hold all columns of the longest line encountered so far. ::csv::split2queue q line {sepChar ,} The same as ::csv::split, but appending the resulting list as a single item to the queue q, using the method put. ::csv::writematrix m chan {sepChar ,} A wrapper around ::csv::join taking all rows in the matrix m and writing them CSV formatted into the channel chan. ::csv::writequeue q chan {sepChar ,} A wrapper around ::csv::join taking all items in the queue q (assumes that they are lists) and writing them CSV formatted into the channel chan. FORMAT
Each record of a csv file (comma-separated values, as exported e.g. by Excel) is a set of ASCII values separated by ",". For other lan- guages it may be ";" however, although this is not important for this case (The functions provided here allow any separator character). If a value contains itself the separator ",", then it (the value) is put between "". If a value contains ", it is replaced by "". EXAMPLE
The record 123,"123,521.2","Mary says ""Hello, I am Mary""" is parsed as follows: a) 123 b) 123,521.2 c) Mary says "Hello, I am Mary" SEE ALSO
matrix, queue KEYWORDS
csv, matrix, queue, package, tcllib csv 0.3 csv(n)
All times are GMT -4. The time now is 06:37 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy