Sponsored Content
Top Forums Shell Programming and Scripting Removing duplicates from delimited file based on 2 columns Post 302912938 by kevinprood on Tuesday 12th of August 2014 10:33:18 PM
Old 08-12-2014
Removing duplicates from delimited file based on 2 columns

Hi guys,Got a bit of a bind I'm in. I'm looking to remove duplicates from a pipe delimited file, but do so based on 2 columns. Sounds easy enough, but here's the kicker...
Column #1 is a simple ID, which is used to identify the duplicate.
Once dups are identified, I need to only keep the one with the latest date, which is column #4, in mm/dd/yyyy format. Of course, rows that don't have dup's would remain as-is.
Example input.txt:
Code:
9300617000372|Skittles|Candy|5/1/2013|12
4381472200131|M&Ms|Chocolate|9/20/2013|39
9414789515104|Jif|Peanut Butter|11/8/2013|14
4381472200131|Reese's|Peanut Butter|5/20/2014|61
4381472200131|Reese's|Chocolate|2/20/2014|36


In that scenario, the output would be rows 1, 3, and 4, since rows 2 and 5 are duplicates based on the ID and are older than the one in row 4 based on date.
The other kicker is...
The file I'm doing this with is 400,000 rows. So, I need the method to be extremely efficient and as quick as possible. I can't afford for this to take hours.


This is running on a Windows machine with GnuWin utils, as one last note.
I am definitely not enough of an expert to make this work, especially efficiently, so I'm hoping someone can help. Many thanks in advance.

Last edited by Don Cragun; 08-13-2014 at 12:29 AM.. Reason: Remove FONT tags; add CODE and ICODE tags.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

removing duplicates based on key

HI I am having a file like this 1234 12345678 1234567890123 4321 43215678 432156789028433435 I want to get ouput as 1234567890123 432156789028433435 based on key position 1-4 I am using ksh can anyone give me an idea Thanks pukars (1 Reply)
Discussion started by: pukars4u
1 Replies

2. Shell Programming and Scripting

finding duplicates in columns and removing lines

I am trying to figure out how to scan a file like so: 1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com 2 margies office","555-555-5555","ralph@mail.com","www.ralph.com 3 kims office","555-555-5555","kims@mail.com","www.ralph.com 4 tims... (17 Replies)
Discussion started by: totus
17 Replies

3. Shell Programming and Scripting

Remove duplicates based on the two key columns

Hi All, I needs to fetch unique records based on a keycolumn(ie., first column1) and also I needs to get the records which are having max value on column2 in sorted manner... and duplicates have to store in another output file. Input : Input.txt 1234,0,x 1234,1,y 5678,10,z 9999,10,k... (7 Replies)
Discussion started by: kmsekhar
7 Replies

4. Shell Programming and Scripting

Search based on 1,2,4,5 columns and remove duplicates in the same file.

Hi, I am unable to search the duplicates in a file based on the 1st,2nd,4th,5th columns in a file and also remove the duplicates in the same file. Source filename: Filename.csv "1","ccc","information","5000","temp","concept","new" "1","ddd","information","6000","temp","concept","new"... (2 Replies)
Discussion started by: onesuri
2 Replies

5. UNIX for Dummies Questions & Answers

Removing duplicates based on key

Hi, I have the input file with the below data: 12345|12|34 12345|13|23 3456|12|90 15670|12|13 12345|10|14 3456|12|13 I need to remove the duplicates based on the first field only. I need the output like: 12345|12|34 3456|12|90 15670|12|13 The first field needs to be unique . (4 Replies)
Discussion started by: pandeesh
4 Replies

6. Shell Programming and Scripting

finding duplicates in csv based on key columns

Hi team, I have 20 columns csv files. i want to find the duplicates in that file based on the column1 column10 column4 column6 coulnn8 coulunm2 . if those columns have same values . then it should be a duplicate record. can one help me on finding the duplicates, Thanks in advance. ... (2 Replies)
Discussion started by: baskivs
2 Replies

7. Shell Programming and Scripting

Removing duplicates in fixed width file which has multiple key columns

Hi All , I have a requirement where I need to remove duplicates from a fixed width file which has multiple key columns .Also , need to capture the duplicate records into another file . File has 8 columns. Key columns are col1 and col2. Col1 has the length of 8 col 2 has the length of 3. ... (5 Replies)
Discussion started by: saj
5 Replies

8. Shell Programming and Scripting

To remove duplicates from pipe delimited file

Hi some one please help me to remove duplicates from a pipe delimited file based on first two columns. 123|asdf|sfsd|qwrer 431|yui|qwer|opws 123|asdf|pol|njio Here My first record and last record are duplicates.As per my requirement I want all the latest records into one file. I want the... (12 Replies)
Discussion started by: ginrkf
12 Replies

9. Shell Programming and Scripting

Removing duplicates on a single "column" (delimited file)

Hello ! I'm quite new to linux but haven't found a script to do this task, unfortunately my knowledge is quite limited on shellscripts... Could you guys help me removing the duplicate lines of a file, based only on a single "column"? For example: M202034357;01/2008;J30RJ021;Ciclo 01... (4 Replies)
Discussion started by: Rufinofr
4 Replies

10. UNIX for Beginners Questions & Answers

Sort and remove duplicates in directory based on first 5 columns:

I have /tmp dir with filename as: 010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker 010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker 010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker... (4 Replies)
Discussion started by: gnnsprapa
4 Replies
pods::SDL::GFX::ImageFilter(3pm)			User Contributed Perl Documentation			  pods::SDL::GFX::ImageFilter(3pm)

NAME
SDL::GFX::ImageFilter - image filtering functions CATEGORY
TODO, GFX METHODS
MMX_detect int gfx_image_MMX_detect() CODE: SDL_imageFilterMMXdetect(); MMX_off void gfx_image_MMX_off() CODE: SDL_imageFilterMMXoff(); MMX_on void gfx_image_MMX_on() CODE: SDL_imageFilterMMXon(); add int gfx_image_add(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterAdd(Src1, Src2, Dest, length); OUTPUT: RETVAL mean int gfx_image_mean(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterMean(Src1, Src2, Dest, length); OUTPUT: RETVAL sub int gfx_image_sub(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterSub(Src1, Src2, Dest, length); OUTPUT: RETVAL abs_diff int gfx_image_abs_diff(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterAbsDiff(Src1, Src2, Dest, length); OUTPUT: RETVAL mult int gfx_image_mult(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterMult(Src1, Src2, Dest, length); OUTPUT: RETVAL mult_nor int gfx_image_mult_nor(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterMultNor(Src1, Src2, Dest, length); OUTPUT: RETVAL mult_div_by_2 int gfx_image_mult_div_by_2(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterMultDivby2(Src1, Src2, Dest, length); OUTPUT: RETVAL mult_div_by_4 int gfx_image_mult_div_by_4(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterMultDivby4(Src1, Src2, Dest, length); OUTPUT: RETVAL bit_and int gfx_image_bit_and(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterBitAnd(Src1, Src2, Dest, length); OUTPUT: RETVAL bit_or int gfx_image_bit_or(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterBitOr(Src1, Src2, Dest, length); OUTPUT: RETVAL div int gfx_image_div(Src1, Src2, Dest, length) unsigned char *Src1 unsigned char *Src2 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterDiv(Src1, Src2, Dest, length); OUTPUT: RETVAL bit_negation int gfx_image_bit_negation(Src1, Dest, length) unsigned char *Src1 unsigned char *Dest int length CODE: RETVAL = SDL_imageFilterBitNegation(Src1, Dest, length); OUTPUT: RETVAL add_byte int gfx_image_add_byte(Src1, Dest, length, C) unsigned char *Src1 unsigned char *Dest int length unsigned char C CODE: RETVAL = SDL_imageFilterAddByte(Src1, Dest, length, C); OUTPUT: RETVAL add_uint int gfx_image_add_uint(Src1, Dest, length, C) unsigned char *Src1 unsigned char *Dest int length unsigned int C CODE: RETVAL = SDL_imageFilterAddUint(Src1, Dest, length, C); OUTPUT: RETVAL add_byte_to_half int gfx_image_add_byte_to_half(Src1, Dest, length, C) unsigned char *Src1 unsigned char *Dest int length unsigned char C CODE: RETVAL = SDL_imageFilterAddByteToHalf(Src1, Dest, length, C); OUTPUT: RETVAL sub_byte int gfx_image_sub_byte(Src1, Dest, length, C) unsigned char *Src1 unsigned char *Dest int length unsigned char C CODE: RETVAL = SDL_imageFilterSubByte(Src1, Dest, length, C); OUTPUT: RETVAL sub_uint int gfx_image_sub_uint(Src1, Dest, length, C) unsigned char *Src1 unsigned char *Dest int length unsigned int C CODE: RETVAL = SDL_imageFilterSubUint(Src1, Dest, length, C); OUTPUT: RETVAL shift_right int gfx_image_shift_right(Src1, Dest, length, N) unsigned char *Src1 unsigned char *Dest int length unsigned char N CODE: RETVAL = SDL_imageFilterShiftRight(Src1, Dest, length, N); OUTPUT: RETVAL shift_right_uint int gfx_image_shift_right_uint(Src1, Dest, length, N) unsigned char *Src1 unsigned char *Dest int length unsigned char N CODE: RETVAL = SDL_imageFilterShiftRightUint(Src1, Dest, length, N); OUTPUT: RETVAL mult_by_byte int gfx_image_mult_by_byte(Src1, Dest, length, C) unsigned char *Src1 unsigned char *Dest int length unsigned char C CODE: RETVAL = SDL_imageFilterMultByByte(Src1, Dest, length, C); OUTPUT: RETVAL shift_right_and_mult_by_byte int gfx_image_shift_right_and_mult_by_byte(Src1, Dest, length, N, C) unsigned char *Src1 unsigned char *Dest int length unsigned char N unsigned char C CODE: RETVAL = SDL_imageFilterShiftRightAndMultByByte(Src1, Dest, length, N, C); OUTPUT: RETVAL shift_left_byte int gfx_image_shift_left_byte(Src1, Dest, length, N) unsigned char *Src1 unsigned char *Dest int length unsigned char N CODE: RETVAL = SDL_imageFilterShiftLeftByte(Src1, Dest, length, N); OUTPUT: RETVAL shift_left_uint int gfx_image_shift_left_uint(Src1, Dest, length, N) unsigned char *Src1 unsigned char *Dest int length unsigned char N CODE: RETVAL = SDL_imageFilterShiftLeftUint(Src1, Dest, length, N); OUTPUT: RETVAL shift_left int gfx_image_shift_left(Src1, Dest, length, N) unsigned char *Src1 unsigned char *Dest int length unsigned char N CODE: RETVAL = SDL_imageFilterShiftLeft(Src1, Dest, length, N); OUTPUT: RETVAL binarize_using_threshold int gfx_image_binarize_using_threshold(Src1, Dest, length, T) unsigned char *Src1 unsigned char *Dest int length unsigned char T CODE: RETVAL = SDL_imageFilterBinarizeUsingThreshold(Src1, Dest, length, T); OUTPUT: RETVAL clip_to_range int gfx_image_clip_to_range(Src1, Dest, length, Tmin, Tmax) unsigned char *Src1 unsigned char *Dest int length unsigned char Tmin unsigned char Tmax CODE: RETVAL = SDL_imageFilterClipToRange(Src1, Dest, length, Tmin, Tmax); OUTPUT: RETVAL normalize_linear int gfx_image_normalize_linear(Src1, Dest, length, Cmin, Cmax, Nmin, Nmax) unsigned char *Src1 unsigned char *Dest int length int Cmin int Cmax int Nmin int Nmax CODE: RETVAL = SDL_imageFilterNormalizeLinear(Src1, Dest, length, Cmin, Cmax, Nmin, Nmax); OUTPUT: RETVAL convolve_kernel_3x3_divide int gfx_image_convolve_kernel_3x3_divide(Src, Dest, rows, columns, Kernel, Divisor) unsigned char *Src unsigned char *Dest int rows int columns Sint16 *Kernel unsigned char Divisor CODE: RETVAL = SDL_imageFilterConvolveKernel3x3Divide(Src, Dest, rows, columns, Kernel, Divisor); OUTPUT: RETVAL convolve_kernel_5x5_divide int gfx_image_convolve_kernel_5x5_divide(Src, Dest, rows, columns, Kernel, Divisor) unsigned char *Src unsigned char *Dest int rows int columns Sint16 *Kernel unsigned char Divisor CODE: RETVAL = SDL_imageFilterConvolveKernel5x5Divide(Src, Dest, rows, columns, Kernel, Divisor); OUTPUT: RETVAL convolve_kernel_7x7_divide int gfx_image_convolve_kernel_7x7_divide(Src, Dest, rows, columns, Kernel, Divisor) unsigned char *Src unsigned char *Dest int rows int columns Sint16 *Kernel unsigned char Divisor CODE: RETVAL = SDL_imageFilterConvolveKernel7x7Divide(Src, Dest, rows, columns, Kernel, Divisor); OUTPUT: RETVAL convolve_kernel_9x9_divide int gfx_image_convolve_kernel_9x9_divide(Src, Dest, rows, columns, Kernel, Divisor) unsigned char *Src unsigned char *Dest int rows int columns Sint16 *Kernel unsigned char Divisor CODE: RETVAL = SDL_imageFilterConvolveKernel9x9Divide(Src, Dest, rows, columns, Kernel, Divisor); OUTPUT: RETVAL convolve_kernel_3x3_shift_right int gfx_image_convolve_kernel_3x3_shift_right(Src, Dest, rows, columns, Kernel, NRightShift) unsigned char *Src unsigned char *Dest int rows int columns Sint16 *Kernel unsigned char NRightShift CODE: RETVAL = SDL_imageFilterConvolveKernel3x3ShiftRight(Src, Dest, rows, columns, Kernel, NRightShift); OUTPUT: RETVAL convolve_kernel_5x5_shift_right int gfx_image_convolve_kernel_5x5_shift_right(Src, Dest, rows, columns, Kernel, NRightShift) unsigned char *Src unsigned char *Dest int rows int columns Sint16 *Kernel unsigned char NRightShift CODE: RETVAL = SDL_imageFilterConvolveKernel5x5ShiftRight(Src, Dest, rows, columns, Kernel, NRightShift); OUTPUT: RETVAL convolve_kernel_7x7_shift_right int gfx_image_convolve_kernel_7x7_shift_right(Src, Dest, rows, columns, Kernel, NRightShift) unsigned char *Src unsigned char *Dest int rows int columns Sint16 *Kernel unsigned char NRightShift CODE: RETVAL = SDL_imageFilterConvolveKernel7x7ShiftRight(Src, Dest, rows, columns, Kernel, NRightShift); OUTPUT: RETVAL convolve_kernel_9x9_shift_right int gfx_image_convolve_kernel_9x9_shift_right(Src, Dest, rows, columns, Kernel, NRightShift) unsigned char *Src unsigned char *Dest int rows int columns Sint16 *Kernel unsigned char NRightShift CODE: RETVAL = SDL_imageFilterConvolveKernel9x9ShiftRight(Src, Dest, rows, columns, Kernel, NRightShift); OUTPUT: RETVAL sobel_x int gfx_image_sobel_x(Src, Dest, rows, columns) unsigned char *Src unsigned char *Dest int rows int columns CODE: RETVAL = SDL_imageFilterSobelX(Src, Dest, rows, columns); OUTPUT: RETVAL sobel_x_shift_right int gfx_image_sobel_x_shift_right(Src, Dest, rows, columns, NRightShift) unsigned char *Src unsigned char *Dest int rows int columns unsigned char NRightShift CODE: RETVAL = SDL_imageFilterSobelXShiftRight(Src, Dest, rows, columns, NRightShift); OUTPUT: RETVAL align_stack void gfx_image_align_stack() CODE: SDL_imageFilterAlignStack(); restore_stack void gfx_image_restore_stack() CODE: SDL_imageFilterRestoreStack(); AUTHORS
See "AUTHORS" in SDL. perl v5.14.2 2012-05-28 pods::SDL::GFX::ImageFilter(3pm)
All times are GMT -4. The time now is 01:08 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy