CSV File:Filter duplicate records from column1 & another column having unique record

12-28-2017

Hi Experts,

I have a CSV file with 30 to 40 columns;
I am pasting just 2 columns to describe the problem.
I need to print an error if the combination below is not present in the file:
check column 1 (DocumentNumber) and group the rows where the DocumentNumber value is the same.
For all such rows, the LineNumber field (column 2) must be unique for each row.
If column 1 contains a duplicate value (2345, 2345) in rows 1-2, then column 2 must contain any unique values, such as (1, 2), in rows 1-2.
Similarly, for column 1 rows 3-4 with the duplicate value (6789, 6789), column 2 must contain unique values, such as 5 and 6 below.
If the combination explained above is not present, the errors must be logged to another file with an error code and the line number.

Sample file.

Code:

DocumentNumber LineNumber
2345	         1
2345	         2
6789	         5
6789	         6
4321             2
4321             3

Last edited by Don Cragun; 12-29-2017 at 04:53 AM.. Reason: Add CODE tags again. Fix Bold tags.

as7951
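A minimal sketch of the check described in the first post, assuming whitespace-separated columns as in the sample, exactly one header line, and DocumentNumber and LineNumber in columns 1 and 2. The function name check_linenumbers and the error code DUP_LINENUMBER are hypothetical placeholders, since the post does not define an error-code format. The input file is only read, never modified.

```shell
# check_linenumbers FILE
# Hypothetical helper: prints one line per data row whose LineNumber
# (column 2) already occurred earlier for the same DocumentNumber (column 1).
# Exits 1 if any such row was found, 0 otherwise. FILE is only read.
check_linenumbers() {
    awk '
    # Skip the header (NR == 1) and blank or short lines.
    NR > 1 && NF >= 2 {
        key = $1 SUBSEP $2          # composite key: DocumentNumber + LineNumber
        if (key in first) {
            # DUP_LINENUMBER is a placeholder error code.
            printf "DUP_LINENUMBER line %d: DocumentNumber %s, LineNumber %s (first seen on line %d)\n", NR, $1, $2, first[key]
            bad = 1
        } else {
            first[key] = NR         # remember where this pair first occurred
        }
    }
    END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, check_linenumbers file > errors.log writes the report to a separate file, and the exit status can drive an if in a larger script. For a pipe- or comma-delimited file, awk would also need a matching -F option.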

12-28-2017

More details, please. What should the output look like? Will there always be exactly two lines per document number? What is the criterion for field #3: just non-identical numbers per document number? Any limits on those numbers?

Last edited by RudiC; 12-28-2017 at 08:37 PM..

RudiC

12-29-2017

Is this not the same as "Filter duplicate records from csv file with condition on one column"? If it is the same discussion, let me know and I will close this thread so that all the comments go to a single place for clarity.

Kind regards,
Robin

rbatte1

12-29-2017

Hi Robin,

This is a separate query and thread, not the same as the one in "Filter duplicate records from csv file with condition on one column".

---------- Post updated at 03:38 AM ---------- Previous update was at 03:34 AM ----------

Hi Robin,

I don't want to modify the input file, and I don't want a separate output;
I just want to print the line number with an error code if the above conditions are not met.

as7951

01-02-2018

Hi Experts,

Apologies in case I am disturbing you with my posts.
I am not very good with awk scripting, but I do shell scripting and try to learn more from the issues I come across.
But I sincerely need a workaround for this query.

I tried the code below, but it is not working as I expect.
It works when column 2 contains a unique value in every row, but if row 2 and row 5 contain the same value, it prints "error".

Code:

awk -F"|" '
{ ++CNT[$1] }
{ ++ABC[$2] }
(CNT[$1] && ABC[$2] > 1) { print "error" }
'

Could you please help me improve it?

I need to check a file that contains duplicate values in column 1: against those duplicate values, column 2 must contain unique values,
as in the sample file above.
There won't always be 2 lines per document number; there can be any number of duplicate values, more than 5 or even 50.
Yes, column 2 (LineNumber) must contain non-identical numbers per DocumentNumber (column 1), and there is no limit on the numbers; they just must not be duplicated.
If column 1 contains duplicate values, then the column 2 values in those rows must not be duplicated either.

Moderator's Comments:

Please use CODE tags as required by forum rules!

Last edited by RudiC; 01-02-2018 at 07:32 AM.. Reason: Added CODE tags.

as7951
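A sketch of why the attempt above flags rows 2 and 5 of the sample: ABC[$2] counts each LineNumber across the whole file rather than per DocumentNumber, and CNT[$1] is always at least 1 after its increment, so the condition reduces to ABC[$2] > 1. LineNumber 2 occurs under both 2345 and 4321, so the second occurrence is reported. Keying the array on the pair of columns removes that false positive. The function names below are hypothetical, and both assume whitespace-separated columns (the attempt's -F"|" would be needed again for a pipe-delimited file).

```shell
# Hypothetical helper: keyed on column 2 alone, as in the attempt above.
# A LineNumber reused under a different DocumentNumber is wrongly flagged.
dups_by_column2() {
    awk '++seen[$2] > 1 { print "error line", NR }' "$1"
}

# Hypothetical helper: keyed on the (column 1, column 2) pair, so only a
# LineNumber repeated within the same DocumentNumber is flagged.
dups_by_pair() {
    awk '++seen[$1, $2] > 1 { print "error line", NR }' "$1"
}
```

On the sample file (header on line 1), dups_by_column2 reports "error line 6" (the 4321 2 row), while dups_by_pair reports nothing.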

01-02-2018

No apologies needed, as people in these fora are here to help. Posts don't disturb anybody; what IS disturbing is when people don't learn, be it to comply with forum rules, how to reasonably specify a problem, or how to apply / adapt coding hints to actual problems.

Your code sample doesn't work with the sample in post #1, as the field separator in the data is a <TAB> followed by multiple spaces (matched by the default awk FS), while the code uses | . Try

Code:

awk 'C[$1,$2]++ {print "error line", NR}' file

and report back the results.

RudiC
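The one-liner above relies on the default FS, which splits on runs of blanks and tabs, as in the posted sample. If the real file is pipe-delimited (as the -F"|" in the earlier attempt suggests) or comma-separated, the separator has to be passed explicitly. A minimal sketch, with the hypothetical wrapper name find_dup_pairs; note that a plain comma separator does not understand quoted CSV fields that themselves contain commas.

```shell
# find_dup_pairs SEP FILE
# Hypothetical wrapper around the one-liner above with an explicit field
# separator, e.g. '|' for pipe-delimited or ',' for simple comma-separated
# files. A single-character FS such as '|' is taken literally by awk.
find_dup_pairs() {
    awk -F "$1" 'C[$1,$2]++ { print "error line", NR }' "$2"
}
```

Usage would be find_dup_pairs '|' file or find_dup_pairs ',' file; the output format is the same as the original one-liner's.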

01-02-2018

Hi RudiC,

Thank you, it worked!
You saved my life.

I salute you.

Also, could you please explain how this code performs the required task?
What does C stand for?

as7951
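On the question of what C stands for: it is not a keyword, just an arbitrary name for an awk associative array (any other name would work). C[$1,$2] is indexed by fields 1 and 2 joined with the built-in separator SUBSEP, and C[$1,$2]++ is a post-increment, so as a pattern it yields the count before the current line: 0 (false) the first time a pair is seen, 1 or more (true) on every repeat, which triggers the print. The longhand sketch below spells out the same logic; the function name is hypothetical.

```shell
# dup_check_longhand FILE
# Hypothetical longhand equivalent of: awk 'C[$1,$2]++ {print "error line", NR}' FILE
dup_check_longhand() {
    awk '{
        key = $1 SUBSEP $2       # the same key that C[$1,$2] uses
        if (C[key] > 0)          # this pair already appeared on an earlier line
            print "error line", NR
        C[key] = C[key] + 1      # count this occurrence
    }' "$1"
}
```

Both forms print one "error line N" per repeated (DocumentNumber, LineNumber) pair; the first occurrence of each pair is never reported.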
