awk script to find duplicate values
Posted by aramacha on 27 July 2014

The data below consists of items with Class, Sub Class and Property values. I would like to find cases where the same value has been captured for different Property values under the same Class/Sub Class combination (within an item and across items). For example, 123 is captured for PAD1, PAD2 and PAD4 under ABC-DEF; 456 is captured for PXM1 and PXM4; and 234 is captured for PAD2 and PAD1. (Note: a cell can sometimes hold several values separated by commas, e.g. 488,456.)


Column Separator = Pipe (|)

Input data
Code:
ID|Class|SubClass|Prop|Value
1|ABC|DEF|PAD1|123|
1|ABC|DEF|PAD2|234|
1|ABC|DEF|PAD3|476|
1|ABC|DEF|PAD4|123|
2|XYZ|MNF|PXM1|456|
2|XYZ|MNF|PXM2|289|
2|XYZ|MNF|PXM3|279|
2|XYZ|MNF|PXM4|488,456|
2|XYZ|MNF|PXM5|284|
3|ABC|DEF|PAD1|234|
3|ABC|DEF|PAD2|777,123|
3|ABC|DEF|PAD3|567|
3|ABC|DEF|PAD4|556|

Output data
Code:
ID|Class|SubClass|Prop|Value|
1|ABC|DEF|PAD1|123|
1|ABC|DEF|PAD4|123|
3|ABC|DEF|PAD2|123|
3|ABC|DEF|PAD1|234|
1|ABC|DEF|PAD2|234|
2|XYZ|MNF|PXM1|456|
2|XYZ|MNF|PXM4|456|

Thanks
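
Not a tested solution, just a sketch of one possible two-pass approach in awk. It assumes the value is always in field 5, may contain comma-separated entries, and that a "duplicate" means the same value appearing under more than one Prop for the same Class/SubClass combination:

Code:
# find_dups.awk -- sketch only; field positions and rules as assumed above
BEGIN { FS = OFS = "|" }

FNR == 1 { next }                       # skip the header line in both passes

# Pass 1: for every Class|SubClass combination, count how many distinct
# Prop fields each individual value was captured under.
NR == FNR {
    n = split($5, v, ",")
    for (i = 1; i <= n; i++) {
        key = $2 OFS $3 OFS v[i]
        if (!((key, $4) in seen)) {     # count each Prop only once per value
            seen[key, $4] = 1
            props[key]++
        }
    }
    next
}

# Pass 2: print one line per occurrence of a value seen under 2+ Props.
{
    n = split($5, v, ",")
    for (i = 1; i <= n; i++) {
        key = $2 OFS $3 OFS v[i]
        if (props[key] > 1)
            print $1, $2, $3, $4, v[i] OFS
    }
}

Run it over the file twice, e.g. awk -f find_dups.awk input.txt input.txt. The rows come out in input order rather than grouped by value as in the sample output, so pipe through sort if the grouping matters.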

Last edited by Scott; 07-27-2014 at 03:28 PM. Reason: Please use code tags for code and data
 
