Sponsored Content
Top Forums Shell Programming and Scripting Duplicate line removal matching some columns only Post 302782929 by Michael Stora on Tuesday 19th of March 2013 02:52:36 PM
Old 03-19-2013
Duplicate line removal matching some columns only

I'm looking to remove duplicate rows from a CSV file with a twist.

The first row is a header.
There are 31 columns. I want to remove duplicates when the first 29 rows are identical ignoring row 30 and 31 BUT the duplicate that is kept should have the shortest total character length in rows 30 and 31 combined.
Col 31 can contain commas that are not delimiters (quoted)

Example
col1,col2,col2,...,col29,col30,"col31, may have some commas in it, but they are within quotes"

Mike
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Sort, duplicate removal - Query

Hi All, I have a problem with the sort and duplicate filter command I am using in one of my scripts. I have a '|' delimited file and want to sort and remove duplicates on the 1,2,15 fields. These fields constitute the primary key of the table I will be loading the data into. But I see that some... (4 Replies)
Discussion started by: novice1324
4 Replies

2. UNIX for Dummies Questions & Answers

exclude columns with a matching line pattern

Hi , I have 5 columns total and am wanting to search lines in columns 3-5 and basically grep -v patterns that match 'BBB_0123' 'BVG_0895' 'BSD_0987' Does anyone know how to do this? I tried combining grep -v with grep -e but, it didn't work. Thanks! (5 Replies)
Discussion started by: greptastic
5 Replies

3. Shell Programming and Scripting

Removal of Duplicate Entries from the file

I have a file which consists of 1000 entries. Out of 1000 entries i have 500 Duplicate Entires. I want to remove the first Duplicate Entry (i,e entire Line) in the File. The example of the File is shown below: 8244100010143276|MARISOL CARO||MORALES|HSD768|CARR 430 KM 1.7 ... (1 Reply)
Discussion started by: ravi_rn
1 Replies

4. Shell Programming and Scripting

using command line arguments as columns for pattern matching using awk

Hi, I wish to use a column, as inputted by a user from command line, for pattern matching. awk file: { if($1 ~ /^8/) { print $0> "temp2.csv" } } something like this, but i want '$1' to be any column as selected by the user from command line. ... (1 Reply)
Discussion started by: invinclible0009
1 Replies

5. Shell Programming and Scripting

Column Search and Line Removal

Hello Gurus, I need to remove lines within a file if it contains specific criteria. Here is what I am trying to resolve: Users of AppRuntime: (Total of 10 licenses issued; Total of 6 licenses in use) buih02 dsktp501 AppGui 1 (compute_lic/27006 3122), start Mon 2/22 7:58 dingj1... (3 Replies)
Discussion started by: leepet01
3 Replies

6. Shell Programming and Scripting

Remove duplicate lines (the first matching line by field criteria)

Hello to all, I have this file 2002 1 23 0 0 2435.60 131.70 5.60 20.99 0.89 0.00 285.80 2303.90 2002 1 23 15 0 2436.60 132.90 6.45 21.19 1.03 0.00 285.80 2303.70 2002 1 23 ... (6 Replies)
Discussion started by: joggdial3000
6 Replies

7. UNIX for Advanced & Expert Users

Duplicate removal

I have an input file of 5GB which contains duplicate records and have to remove duplicate records by retaing first instance of that record . Based on 5 fields the duplicates has to be removed . Kindly request to help me in writing a Unix Script. Thanks Asim (11 Replies)
Discussion started by: duplicate
11 Replies

8. Shell Programming and Scripting

awk to copy previous line matching a particular columns

Hello Help, 2356798 7689867 999 000 123678 20385907 9797 666 17978975 87468976 968978 98798 I am trying to have out put which actually look for the third column value of 9797 and then it insert line there after with first, second column value exactly as the previous line and replace the third... (3 Replies)
Discussion started by: Indra2011
3 Replies

9. Shell Programming and Scripting

Honey, I broke awk! (duplicate line removal in 30M line 3.7GB csv file)

I have a script that builds a database ~30 million lines, ~3.7 GB .cvs file. After multiple optimzations It takes about 62 min to bring in and parse all the files and used to take 10 min to remove duplicates until I was requested to add another column. I am using the highly optimized awk code: awk... (34 Replies)
Discussion started by: Michael Stora
34 Replies

10. UNIX for Dummies Questions & Answers

Print only the duplicate line only with matching columns

Hi There, I have an I/P which looks like -- 1 2 3 4 5 1 2 3 4 6 4 7 8 9 9 5 6 7 8 9 I would like O/P to be --- 1 2 3 4 5 1 2 3 4 6 So, printing only the consecutive lines where $1,$2,$3,$4 are matching. Is there any command to do this or small awk script? Thanks, (12 Replies)
Discussion started by: Indra2011
12 Replies
funtablerowput(3)						SAORD Documentation						 funtablerowput(3)

NAME
FunTableRowPut - put Funtools rows SYNOPSIS
int FunTableRowPut(Fun fun, void *rows, int nev, int idx, char *plist) DESCRIPTION
The FunTableRowPut() routine writes rows to a FITS binary table, taking its input from an array of user structs that contain column values selected by a previous call to FunColumnSelect(). Selected column values are automatically converted from native data format to FITS data format as necessary. The first argument is the Fun handle associated with this row data. The second rows argument is the array of user structs to output. The third nrow argument specifies the number number of rows to write. The routine will write nrow records, starting from the location speci- fied by rows. The fourth idx argument is the index of the first raw input row to write, in the case where rows from the user buffer are being merged with their raw input row counterparts (see below). Note that this idx value is has nothing to do with the row buffer specified in argument 1. It merely matches the row being written with its corresponding (hidden) raw row. Thus, if you read a number of rows, process them, and then write them out all at once starting from the first user row, the value of idx should be 0: Ev ebuf, ev; /* get rows -- let routine allocate the row array */ while( (ebuf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){ /* process all rows */ for(i=0; i<got; i++){ /* point to the i'th row */ ev = ebuf+i; ... } /* write out this batch of rows, starting with the first */ FunTableRowPut(fun2, (char *)ebuf, got, 0, NULL); /* free row data */ if( ebuf ) free(ebuf); } On the other hand, if you write out the rows one at a time (possibly skipping rows), then, when writing the i'th row from the input array of rows, set idx to the value of i: Ev ebuf, ev; /* get rows -- let routine allocate the row array */ while( (ebuf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){ /* process all rows */ for(i=0; i<got; i++){ /* point to the i'th row */ ev = ebuf+i; ... /* write out the current (i.e., i'th) row */ FunTableRowPut(fun2, (char *)ev, 1, i, NULL); } /* free row data */ if( ebuf ) free(ebuf); } The final argument is a param list string that is not currently used. The routine returns the number of rows output. This should be equal to the value passed in the third nrow</B argument. When FunTableRowPut() is first called for a given binary table, Funtools checks to see of the primary header has already been written (either by writing a previous row table or by writing an image.) If not, a dummy primary header is written to the file specifying that an extension should be expected. After this, a binary table header is automatically written containing information about the columns that will populate this table. In addition, if a Funtools reference handle was specified when this table was opened, the parameters from this Funtools reference handle are merged into the new binary table header. In a typical Funtools row loop, you read rows using FunTableRowGet()() and write rows using FunTableRowPut(). The columns written by FunT- ableRowPut()() are those defined as writable by a previous call to FunColumnSelect(). If that call to FunColumnSelect also specified merge=[update|replace|append], then the entire corresponding raw input row record will be merged with the output row according to the merge specification (see FunColumnSelect() above). A call to write rows can either be done once, after all rows in the input batch have been processed, or it can be done (slightly less effi- ciently) one row at a time (or anything in between). We do recommend that you write all rows associated with a given batch of input rows before reading new rows. This is required if you are merging the output rows with the raw input rows (since the raw rows are destroyed with each successive call to get new rows). For example: Ev buf, ev; ... /* get rows -- let routine allocate the row array */ while( (buf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){ /* point to the i'th row */ ev = buf + i; .... process } /* write out this batch of rows */ FunTableRowPut(fun2, buf, got, 0, NULL); /* free row data */ if( buf ) free(buf); } or Ev buf, ev; ... /* get rows -- let routine allocate the row array */ while( (buf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){ /* process all rows */ for(i=0; i<got; i++){ /* point to the i'th row */ ev = buf + i; ... process /* write out this batch of rows with the new column */ if( dowrite ) FunTableRowPut(fun2, buf, 1, i, NULL); } /* free row data */ if( buf ) free(buf); } Note that the difference between these calls is that the first one outputs got rows all at once and therefore passes idx=0 in argument four, so that merging starts at the first raw input row. In the second case, a check it made on each row to see if it needs to be output. If so, the value of idx is passed as the value of the i variable which points to the current row being processed in the batch of input rows. As shown above, successive calls to FunTableRowPut() will write rows sequentially. When you are finished writing all rows in a table, you should call FunFlush() to write out the FITS binary table padding. However, this is not necessary if you subsequently call FunClose() with- out doing any other I/O to the FITS file. Note that FunTableRowPut() also can be called as FunEventsPut(), for backward compatibility. SEE ALSO
See funtools(7) for a list of Funtools help pages version 1.4.2 January 2, 2008 funtablerowput(3)
All times are GMT -4. The time now is 02:00 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy