Filter duplicate records from csv file with condition on one column

12-28-2017


I have a csv file with 30 to 40 columns.
I am pasting just three columns to describe the problem.
I want to check each record whose column 1 matches CN or DN, as follows:
if column 2 contains 1235, 1235 in consecutive rows, then column 3 must contain the sequence 2345, 2345
and if column 2 contains 6789, 6789 in consecutive rows, then column 3 must contain the sequence 7890, 7890
or if column 2 contains a duplicate value (1234, 1234) in rows 1-4 of a bundle, then column 3 must also contain a duplicate value (4567, 4567) in rows 1-4
or if column 2 contains a duplicate value (5678, 5678) in rows 5-8 of a bundle, then column 3 must also contain a duplicate value (4321, 4321) in rows 5-8.
If the combination explained above is not present, an entry with an error code and the line number must be written to a log in another file.

Sample file.

Code:

CN	1234	4567
CN	1234	4567
CN	1234	4567
CN	1234	4567
CN	5678	4321
CN	5678	4321
CN	5678	4321
CN	5678	4321

Last edited by jim mcnamara; 12-28-2017 at 12:25 PM..

as7951


12-28-2017


This is the kind of question that needs to have:
Sample good input that will not be "filtered"
Sample bad input -> expected output

Without this start we cannot help.

What code have you tried? Please show us where you are in your attempt.

jim mcnamara


12-28-2017


Hi Jim,

In this problem I want to look into the csv file and print an error if the combination above does not exist in any row. (No changes are to be made to the csv file.)
I tried the code below, but I am not sure what to do next:

Code:

awk '{if (x[$2$3]) { x_count[$2$3]++; print $0; if (x_count[$2$3] == 1) { print x[$2$3] } } x[$2$3] = $0}'

As you asked, good input looks like this:

Code:

DT	DN	ON
CN	1234	4567
CN	1234	4567
CN	1234	4567
CN	1234	4567
CN	5678	4321
CN	5678	4321
CN	5678	4321
CN	5678	4321

Bad input looks like this (the bad rows were marked in red):

Code:

DT	DN    ON
CN	1234	4567
CN	1234	4567
CN	1234	4567
CN	5678	4564
CN	5678	4321
CN	5678	4564
CN	7890	7654
CN	7890	7654
CN	7890	3243

Last edited by Don Cragun; 12-29-2017 at 03:35 AM.. Reason: Add CODE tags again.

as7951


12-28-2017


And what should the output look like?
Do you always need 4 rows of identical values?
I can't recognize a pattern. How do we tell correct from wrong numbers?
Will it always be those exact numbers given in post#1?

RudiC


12-28-2017


Taking a bit of a guess at the error message format but hopefully this is close enough for the OP to modify to their liking:

Code:

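awk '
# V remembers the column 3 value stored for each column 2 value.
# A row whose column 3 differs from the stored value is reported and skipped.
$2 in V && V[$2] != $3 {
    print "Line " NR " " $3 " <> " V[$2]
    next }
{ V[$2] = $3 }' inputfile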

Output for testing data:

Code:

Line 5 4321 <> 4564
Line 9 3243 <> 7654
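
A note on the line numbers: the output above corresponds to the bad sample without its DT DN ON header row. If the header is left in the file, the same script reports the lines one higher. The sketch below is a minimal variation, assuming the header is always the first line of the file; it skips that line and still reports the physical line number. The file name sample.txt and the function name check_pairs are hypothetical, and the sample uses single spaces because awk splits on any run of blanks or tabs by default.

```shell
# Recreate the bad sample from the thread, header included
# (sample.txt is a hypothetical file name).
cat > sample.txt <<'EOF'
DT DN ON
CN 1234 4567
CN 1234 4567
CN 1234 4567
CN 5678 4564
CN 5678 4321
CN 5678 4564
CN 7890 7654
CN 7890 7654
CN 7890 3243
EOF

# check_pairs is a hypothetical wrapper around the script posted above.
check_pairs() {
  awk '
  NR == 1 { next }               # assumption: line 1 is the header row
  $2 in V && V[$2] != $3 {       # key already seen with a different value
      print "Line " NR " " $3 " <> " V[$2]
      next                       # a mismatch never overwrites the stored value
  }
  { V[$2] = $3 }                 # store the value for this key
  ' "$1"
}

check_pairs sample.txt
```

With the header in place this prints Line 6 and Line 10 rather than Line 5 and Line 9.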

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

Edit:

or another similar solution:

Code:

awk '
{ 
  # V[$2] is tested for truth here, so a stored value of 0 would count as unseen.
  if (V[$2] && V[$2] != $3 )
    print "Line " NR " " $3 " <> " V[$2]
  else V[$2] = $3
}' inputfile

Last edited by Chubler_XL; 12-28-2017 at 08:21 PM..

Chubler_XL


12-30-2017


Hi RudiC,

I don't want to modify the input data in the csv file, and I don't want the output in a different file.
I just want to print an error for the rows in the csv file where the condition is not met.

The file should contain data in two columns in the format given below,
and the numbers in the rows and columns may vary.
In short, the condition is true if column 2 contains a duplicate value (1234, 1234) in rows 1-2 and column 3 also contains a duplicate value (4567, 4567) in rows 1-2.
The condition is false when column 2 contains a duplicate value (0808, 0808, 0808) in rows 1-3 but column 3 does not (4567, 4567, 1234): the 1234 in row 3 of column 3 makes the condition false.

I hope I'm clear now.
Good condition:

Code:

DT	DN	ON
CN	1234	4567
CN	1234	4567
CN	9876	6543
CN	9876	6543
CN	5678	4321
CN	5678	4321
CN	0909	3089
CN	0909	3089

False condition in "red"

Code:

DT   DN     ON
CN   0808  4567
CN   0808  4567
CN   0808  1234
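
One possible reading of the rule above, sketched under stated assumptions: a "bundle" is taken to be a run of consecutive rows with the same column 2 value, the first line is a header, only rows whose column 1 is CN or DN are checked, and every row in a bundle must repeat the column 3 value of the bundle's first row. The error code E001, the file name bad.txt and the function name check_bundles are hypothetical. Values are stored as strings so that keys with leading zeros such as 0808 are compared as text rather than as numbers.

```shell
# The false-condition sample from the post above (bad.txt is a hypothetical name).
cat > bad.txt <<'EOF'
DT DN ON
CN 0808 4567
CN 0808 4567
CN 0808 1234
EOF

# check_bundles is a hypothetical helper; it prints one error per bad row
# and exits non-zero if any row failed the check.
check_bundles() {
  awk '
  FNR == 1 { next }                            # assumption: header on line 1
  $1 != "CN" && $1 != "DN" { have = 0; next }  # other rows end the bundle
  have && $2 == key && $3 != val {             # same bundle, different column 3
      printf "E001 line %d: column 3 is %s, expected %s\n", FNR, $3, val
      bad = 1
      next                                     # keep the bundle reference value
  }
  { key = $2 ""; val = $3 ""; have = 1 }       # start or continue a bundle
  END { exit bad }                             # exit status 1 if any error
  ' "$1"
}

check_bundles bad.txt || echo "validation failed"
```

The input file is only read, never modified. Unlike the array-based scripts earlier in the thread, this version compares each row only with the row before it, so a column 2 value that reappears later in the file starts a new bundle.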

---------- Post updated at 03:31 AM ---------- Previous update was at 02:24 AM ----------

Hi Chubler,

Could you please help me with how to execute these scripts?
When I put the code in a .sh file, no output appears,
and when I run it from the command line I get a syntax error at the "next" command.
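
One way to run a multi-line awk program without depending on how the shell handles the quoting is to save the program in its own file and pass it with -f. This is a minimal sketch assuming a POSIX shell; check.awk and inputfile are hypothetical names, and the test rows are the false-condition sample from this post without its header. The cause of the syntax error is not established in the thread; the awk variant in use (see the Solaris note above) is one possibility.

```shell
# Save the awk program on its own, without the surrounding quotes
# (check.awk is a hypothetical file name).
cat > check.awk <<'EOF'
$2 in V && V[$2] != $3 {
    print "Line " NR " " $3 " <> " V[$2]
    next
}
{ V[$2] = $3 }
EOF

# A small test input; inputfile stands in for the real csv file.
cat > inputfile <<'EOF'
CN 0808 4567
CN 0808 4567
CN 0808 1234
EOF

# -f reads the program from the file, so nothing has to be quoted for the shell.
awk -f check.awk inputfile
```

For the three rows above this prints `Line 3 1234 <> 4567`. The same awk command line can also be placed in a .sh file and run with sh, as long as the input file name is given at the end of the command.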

---------- Post updated 12-30-17 at 12:41 AM ---------- Previous update was 12-29-17 at 03:31 AM ----------

Hi chubler,

Thank you for the code. I will run and test it,
and will let you know if there are any issues.

thanks

Last edited by Don Cragun; 12-29-2017 at 03:38 AM.. Reason: Add CODE tags again.

as7951
