finding duplicates in csv based on key columns

Hi team,

I have a CSV file with 20 columns. I want to find the duplicates in that file based on column1, column10, column4, column6, column8, and column2: if those columns have the same values, the record should be treated as a duplicate.

Can anyone help me find the duplicates?

Thanks in advance.

I sorted the file on the key columns first and viewed the data, but it doesn't display properly.

thanks,
Baski
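
A minimal awk sketch for this kind of check, assuming a plain comma-delimited file with no commas inside quoted fields ("file.csv" is a placeholder name). The first pass counts each key built from the six columns; the second pass prints every record whose key occurs more than once:

  # pass 1 counts keys; pass 2 prints all records sharing a duplicated key
  awk -F, '
      { key = $1 SUBSEP $2 SUBSEP $4 SUBSEP $6 SUBSEP $8 SUBSEP $10 }
      NR == FNR { count[key]++; next }
      count[key] > 1
  ' file.csv file.csv

If only the second and later occurrences are wanted (keeping the first copy of each record), a single pass with the pattern seen[key]++ does it.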
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

removing duplicates based on key

Hi, I have a file like this:
  1234
  12345678
  1234567890123
  4321
  43215678
  432156789028433435
I want to get the output as:
  1234567890123
  432156789028433435
based on key position 1-4. I am using ksh. Can anyone give me an idea? Thanks, pukars (1 Reply)
Discussion started by: pukars4u
1 Reply
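
One possible sketch for this, assuming the record to keep for each 4-character key is the last (here, the longest) one seen:

  # remember the last record per 4-character key prefix
  awk '{ keep[substr($0, 1, 4)] = $0 } END { for (k in keep) print keep[k] }' file

The END loop prints in arbitrary order; pipe the result through sort if order matters.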

2. Shell Programming and Scripting

finding duplicates in columns and removing lines

I am trying to figure out how to scan a file like so:
  1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
  2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
  3 kims office","555-555-5555","kims@mail.com","www.ralph.com
  4 tims... (17 Replies)
Discussion started by: totus
17 Replies
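
The preview is cut off, but if the goal is to drop lines whose e-mail address repeats, a sketch (assuming the string "," separates the fields, which makes the e-mail field 3):

  # keep a line only the first time its e-mail field is seen
  awk -F'","' '!seen[$3]++' file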

3. Shell Programming and Scripting

Remove duplicates based on the two key columns

Hi All, I need to fetch unique records based on a key column (i.e., column1) and also get, in sorted order, the records that have the max value in column2; the duplicates have to be stored in another output file. Input: Input.txt
  1234,0,x
  1234,1,y
  5678,10,z
  9999,10,k... (7 Replies)
Discussion started by: kmsekhar
7 Replies
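
A sketch for this task: sort by key ascending and column 2 descending, so the first line per key holds the max and everything after it is a duplicate (file names are placeholders):

  sort -t, -k1,1 -k2,2nr Input.txt |
  awk -F, '$1 == prev { print > "dups.txt"; next } { prev = $1; print }' > output.txt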

4. Shell Programming and Scripting

Search based on 1,2,4,5 columns and remove duplicates in the same file.

Hi, I am unable to search for duplicates in a file based on the 1st, 2nd, 4th, and 5th columns and also remove the duplicates in the same file. Source filename: Filename.csv
  "1","ccc","information","5000","temp","concept","new"
  "1","ddd","information","6000","temp","concept","new"... (2 Replies)
Discussion started by: onesuri
2 Replies
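
A sketch, assuming every field is quoted so the string "," can serve as the separator; it keeps the first line per (1,2,4,5) key and rewrites the file through a temporary copy:

  awk -F'","' '!seen[$1, $2, $4, $5]++' Filename.csv > Filename.tmp &&
  mv Filename.tmp Filename.csv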

5. UNIX for Dummies Questions & Answers

Removing duplicates based on key

Hi, I have an input file with the below data:
  12345|12|34
  12345|13|23
  3456|12|90
  15670|12|13
  12345|10|14
  3456|12|13
I need to remove the duplicates based on the first field only. I need the output like:
  12345|12|34
  3456|12|90
  15670|12|13
The first field needs to be unique. (4 Replies)
Discussion started by: pandeesh
4 Replies
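
This is the classic awk idiom for the job: keep a line only the first time its first field appears.

  awk -F'|' '!seen[$1]++' file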

6. Shell Programming and Scripting

CSV with commas in field values, remove duplicates, cut columns

Hi, description of the input file I have:
  1) CSV with double quotes for string fields.
  2) Some string fields have a comma as part of the field value.
  3) Has duplicate lines.
  4) Has 200 columns/fields.
  5) File size is more than 10GB.
Description of output file I need:... (4 Replies)
Discussion started by: krishnix
4 Replies
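
A sketch using GNU awk's FPAT, which describes what a field looks like rather than what separates fields, so quoted fields may contain commas. Caveats: the simple pattern below does not handle empty fields, the seen[] array holds one entry per distinct record (memory-hungry at 10GB), and printing fields 1-5 is just an example cut:

  gawk -v FPAT='([^,]+)|("[^"]*")' -v OFS=',' '
      !seen[$0]++ { print $1, $2, $3, $4, $5 }
  ' bigfile.csv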

7. Shell Programming and Scripting

Removing duplicates in fixed width file which has multiple key columns

Hi All, I have a requirement where I need to remove duplicates from a fixed-width file which has multiple key columns, and I also need to capture the duplicate records in another file. The file has 8 columns. The key columns are col1 and col2: col1 has a length of 8 and col2 has a length of 3. ... (5 Replies)
Discussion started by: saj
5 Replies
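
A single-pass sketch: the key is the first 11 bytes of each record (col1, length 8, plus col2, length 3); the first occurrence of each key goes to one file and every later occurrence goes to the duplicates file:

  awk '{ key = substr($0, 1, 11) }
       seen[key]++ { print > "dups.txt"; next }
       { print > "unique.txt" }' fixedwidth.txt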

8. Shell Programming and Scripting

Remove Duplicates on multiple Key Columns and get the Latest Record from Date/Time Column

Hi Experts, we have a CDC file where we need to get the latest record per key. The key columns are CDC_FLAG and SRC_PMTN_I, and the latest record is determined by CDC_PRCS_TS. Can we do it with a single awk command? Please help.... (3 Replies)
Discussion started by: vijaykodukula
3 Replies
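
A single awk pass can do it; in the sketch below the column positions (CDC_FLAG in $1, SRC_PMTN_I in $2, CDC_PRCS_TS in $3) and the comma delimiter are assumptions, and the timestamp must compare correctly as a string (e.g. YYYYMMDDHHMMSS):

  awk -F, '
      { key = $1 SUBSEP $2 }
      $3 > ts[key] { ts[key] = $3; rec[key] = $0 }
      END { for (k in rec) print rec[k] }
  ' cdc_file.csv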

9. Shell Programming and Scripting

UNIX scripting for finding duplicates and null records in pk columns

Hi, I have a requirement. For example, I have a text file with the pipe symbol (|) as the delimiter and 4 columns: a, b, c, d. Here a and b are the primary key columns. I want to process that file to find the duplicates and null values in the primary key columns (a, b). I want to write the unique records in which... (5 Replies)
Discussion started by: praveenraj.1991
5 Replies
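
A sketch for the pipe-delimited case with primary key columns $1 and $2: rows with an empty key column go to one file, repeated keys to another, and clean unique rows to a third:

  awk -F'|' '
      $1 == "" || $2 == "" { print > "nulls.txt"; next }
      seen[$1, $2]++       { print > "dups.txt";  next }
      { print > "unique.txt" }
  ' input.txt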

10. UNIX for Beginners Questions & Answers

Sort and remove duplicates in directory based on first 5 columns:

I have a /tmp dir with file names such as:
  010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
  010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker
  010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
  010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker
  010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker... (4 Replies)
Discussion started by: gnnsprapa
4 Replies
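
A sketch, assuming "first 5 columns" means the first five fields of the file name when both '_' and '-' are treated as delimiters; it selects every duplicate after the first and echoes the rm commands (drop echo to actually delete; xargs -r is a GNU extension):

  ls /tmp/*.marker | sort | awk -F'[_-]' '
      { key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 SUBSEP $5 }
      seen[key]++
  ' | xargs -r echo rm --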
DBIx::Class::Helper::Schema::LintContents(3pm) - User Contributed Perl Documentation

NAME
DBIx::Class::Helper::Schema::LintContents - Check that the data in your database matches your constraints

VERSION
version 2.013002

SYNOPSIS
  package MyApp::Schema;

  use parent 'DBIx::Class::Schema';

  __PACKAGE__->load_components('Helper::Schema::LintContents');

  1;

And later, somewhere else:

  say "Incorrectly Null Users:";
  for ($schema->null_check_source_auto('User')->all) {
     say '* ' . $_->id;
  }

  say "Duplicate Users:";
  my $duplicates = $schema->dup_check_source_auto('User');
  for (keys %$duplicates) {
     say "Constraint: $_";
     for ($duplicates->{$_}->all) {
        say '* ' . $_->id;
     }
  }

  say "Users with invalid FK's:";
  my $invalid_fks = $schema->fk_check_source_auto('User');
  for (keys %$invalid_fks) {
     say "Rel: $_";
     for ($invalid_fks->{$_}->all) {
        say '* ' . $_->id;
     }
  }

DESCRIPTION
Some people think that constraints make their databases slower. As silly as that is, I have been in a similar situation! I'm here to help you, dear developers! Basically this is a suite of methods that allow you to find violated "constraints." To be clear, the constraints I mean are the ones you tell DBIx::Class about; real constraints are fairly sure to be followed.

METHODS
fk_check_source

  my $busted = $schema->fk_check_source(
     'User', 'Group', { group_id => 'id' },
  );

"fk_check_source" takes three arguments: the first is the from source moniker of a relationship; the second is the to source or source moniker of a relationship; the final argument is a hash reference representing the columns of the relationship. The return value is a resultset of rows from the from source that do not have a corresponding to row. To be clear, the example given above would return a resultset of "User" rows that have a "group_id" that points to a "Group" that does not exist.

fk_check_source_auto

  my $broken = $schema->fk_check_source_auto('User');

"fk_check_source_auto" takes a single argument: the source to check. It will check all the foreign key (that is, "belongs_to") relationships for missing "foreign" rows. The return value will be a hashref where the keys are the relationship names and the values are resultsets of the respective violated relationship.

dup_check_source

  my $smashed = $schema->dup_check_source(
     'Group', ['id'],
  );

"dup_check_source" takes two arguments: the first is the source moniker to be checked; the second is an arrayref of columns that should be unique. The return value is a resultset of rows from the source that duplicate the passed columns. So with the example above, the resultset would return all groups that are "duplicates" of other groups based on "id".

dup_check_source_auto

  my $ruined = $schema->dup_check_source_auto('Group');

"dup_check_source_auto" takes a single argument, which is the name of the result source in which to check for duplicates. It will return a hashref where the keys are the names of the unique constraints to be checked. The values will be resultsets of the respective duplicate rows.

null_check_source

  my $blarg = $schema->null_check_source('Group', ['id']);

"null_check_source" takes two arguments: the first is the name of the source to check; the second is an arrayref of columns that should contain no nulls. The return value is simply a resultset of rows that contain nulls where they shouldn't be.

null_check_source_auto

  my $wrecked = $schema->null_check_source_auto('Group');

"null_check_source_auto" takes a single argument, which is the name of the result source in which to check for nulls. The return value is simply a resultset of rows that contain nulls where they shouldn't be. This method automatically uses the configured columns that have "is_nullable" set to false.

AUTHOR
Arthur Axel "fREW" Schmidt <frioux+cpan@gmail.com>

COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Arthur Axel "fREW" Schmidt. This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.