CSV file: Find duplicates, save original and duplicate records in a new file
Posted by arvindosu on Tuesday, 5 July 2011, 02:59:37 PM

Hi Unix gurus,

Maybe this is too much to ask, but please take a moment to help me out. It is a very humble request to you gurus. I'm new to Unix and have only just started learning it, and I have a project that is way too advanced for me.

File format: CSV
The file has four columns and no header row.
File size: 120 GB

Here are a few sample rows:

Code:
72426459560          2010-06-2 ABC                           LC11100619758
95327GNFA4S          2010-06-2 XYZ                           97BCX3AMD10G
95327GNFA4S          2010-06-2 XYZ                           97BCX3AMKLMO
900278VGA4T          2010-06-2 KLM                            QVA697C8LAYMACBF
900278VG567          2010-06-2 LUF                            QVA697C8LAYMACBF

There are duplicates in columns 1 and 4 (I know this for a fact).
I would like to find every row whose column 1 value or column 4 value also appears in another row. In the example above, I want rows 2 and 3 (they share the same value in column 1) and also rows 4 and 5 (they share the same value in column 4).
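One straightforward approach is two passes: first count how often each column 1 value and each column 4 value occurs, then keep every row whose column 1 or column 4 value occurs more than once. A minimal Python sketch, assuming each line splits into four whitespace-separated fields as in the sample (for a comma-separated file, split on "," instead); `find_duplicate_rows` is a hypothetical name:

```python
from collections import Counter


def find_duplicate_rows(lines):
    """Return the rows whose column 1 or column 4 value appears more than once.

    Assumes every non-blank line holds four whitespace-separated fields.
    """
    rows = [line.split() for line in lines if line.strip()]  # skip blank lines

    # Pass 1: count how often each value occurs in column 1 and in column 4.
    col1_counts = Counter(row[0] for row in rows)
    col4_counts = Counter(row[3] for row in rows)

    # Pass 2: keep a row if either of its key values is shared with another row.
    return [
        row for row in rows
        if col1_counts[row[0]] > 1 or col4_counts[row[3]] > 1
    ]


sample = [
    "72426459560          2010-06-2 ABC                           LC11100619758",
    "95327GNFA4S          2010-06-2 XYZ                           97BCX3AMD10G",
    "95327GNFA4S          2010-06-2 XYZ                           97BCX3AMKLMO",
    "900278VGA4T          2010-06-2 KLM                            QVA697C8LAYMACBF",
    "900278VG567          2010-06-2 LUF                            QVA697C8LAYMACBF",
]

for row in find_duplicate_rows(sample):
    print(" ".join(row))
```

Run on the five sample rows, it prints the four rows listed in the expected results further down, in their original order. It keeps every row in memory, so it only suits small inputs.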

If this is too complicated, maybe I can look for duplicates in column 1 first, save them to a new file, and then look for duplicates in column 4. (Since I am new to Unix, maybe that's the way to go.)
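Splitting the job by column works, with one caveat: each pass has to run over the original data, not over the output of the previous pass. Rows 4 and 5 do not share a column 1 value, so a column 4 pass over the column 1 output would never see them. A sketch under the same whitespace-separated assumption; `duplicate_line_numbers` is a hypothetical helper, and its indices are 0-based, so rows 2 to 5 of the post are indices 1 to 4:

```python
from collections import Counter


def duplicate_line_numbers(rows, col):
    """Return the indices of rows whose value in `col` (0-based) occurs more than once."""
    counts = Counter(row[col] for row in rows)
    return {i for i, row in enumerate(rows) if counts[row[col]] > 1}


rows = [line.split() for line in [
    "72426459560 2010-06-2 ABC LC11100619758",
    "95327GNFA4S 2010-06-2 XYZ 97BCX3AMD10G",
    "95327GNFA4S 2010-06-2 XYZ 97BCX3AMKLMO",
    "900278VGA4T 2010-06-2 KLM QVA697C8LAYMACBF",
    "900278VG567 2010-06-2 LUF QVA697C8LAYMACBF",
]]

# Run each single-column pass on the ORIGINAL rows, then merge the results.
by_col1 = duplicate_line_numbers(rows, 0)   # {1, 2}
by_col4 = duplicate_line_numbers(rows, 3)   # {3, 4}
merged = sorted(by_col1 | by_col4)          # union, in original file order

# Chaining instead (column 4 pass over the column 1 output) finds nothing.
col1_only = [rows[i] for i in sorted(by_col1)]
chained = duplicate_line_numbers(col1_only, 3)

print(merged)   # [1, 2, 3, 4]
print(chained)  # set()
```

Taking the union of the per-column results is what makes the step-by-step route give the same answer as checking both columns at once.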

I want to save all the duplicates together with their original records (as in the example above) in a new CSV file.
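For the real 120 GB file, holding every row in memory is not an option. The sketch below streams the input twice and writes matching rows straight to a new CSV file. It assumes the file is genuinely comma-separated (the sample above shows spaces, possibly from how it was pasted) and that every non-blank line has at least four fields. Only counters of the distinct key values stay in memory; if there are billions of distinct keys even those may not fit, and an external-sort approach would be needed instead. The function name and its `key_cols` parameter are hypothetical.

```python
import csv
from collections import Counter


def write_duplicates(src_path, dst_path, key_cols=(0, 3)):
    """Copy every row whose value in any of `key_cols` (0-based) is duplicated.

    The input is read twice from disk; only the per-column counters are held
    in memory. Returns the number of rows written.
    """
    counters = [Counter() for _ in key_cols]

    # Pass 1: count how often each value occurs in each key column.
    with open(src_path, newline="") as src:
        for row in csv.reader(src):
            if not row:  # skip blank lines
                continue
            for counter, col in zip(counters, key_cols):
                counter[row[col]] += 1

    # Pass 2: copy, in original order, every row that shares a key value.
    written = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                continue
            if any(counter[row[col]] > 1 for counter, col in zip(counters, key_cols)):
                writer.writerow(row)
                written += 1
    return written
```

Called as `write_duplicates("input.csv", "duplicates.csv")` (placeholder file names) on a comma-separated copy of the five sample rows, it should write the four expected rows and return 4.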

---------- Post updated at 01:59 PM ---------- Previous update was at 01:56 PM ----------

For more clarity, my results would look like this:

Code:
95327GNFA4S 2010-06-2 XYZ 97BCX3AMD10G
95327GNFA4S 2010-06-2 XYZ 97BCX3AMKLMO
900278VGA4T 2010-06-2 KLM QVA697C8LAYMACBF
900278VG567 2010-06-2 LUF QVA697C8LAYMACBF

 

INSERT(7)							   SQL Commands 							 INSERT(7)

NAME
       INSERT - create new rows in a table

SYNOPSIS
       INSERT INTO table [ ( column [, ...] ) ]
	   { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
	   [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]

DESCRIPTION
       INSERT inserts new rows into a table. One can insert one or more rows specified by value expressions, or zero or more rows resulting from
       a query.

       The target column names can be listed in any order. If no list of column names is given at all, the default is all the columns of the table
       in their declared order; or the first N column names, if there are only N columns supplied by the VALUES clause or query. The values
       supplied by the VALUES clause or query are associated with the explicit or implicit column list left-to-right.

       Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or
       null if there is none.
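This rule can be tried without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in; SQLite fills omitted columns the same way, though it differs from PostgreSQL elsewhere (for example, it rejects a VALUES list with fewer values than the table has columns when no column list is given). The table `t` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column b declares a default; columns a and c have none.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER DEFAULT 7, c TEXT)")

# Only column a is listed: b gets its declared default, c gets NULL.
conn.execute("INSERT INTO t (a) VALUES (1)")
# No columns listed at all: every column gets its default or NULL.
conn.execute("INSERT INTO t DEFAULT VALUES")

rows = conn.execute("SELECT a, b, c FROM t ORDER BY rowid").fetchall()
print(rows)  # [(1, 7, None), (None, 7, None)]
```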

       If the expression for any column is not of the correct data type, automatic type conversion will be attempted.

       The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted. This is primarily useful
       for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is
       allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT.

       You must have INSERT privilege on a table in order to insert into it. If a column list is specified, you only need INSERT privilege on the
       listed columns. Use of the RETURNING clause requires SELECT privilege on all columns mentioned in RETURNING. If you use the query clause
       to insert rows from a query, you of course need to have SELECT privilege on any table or column used in the query.

PARAMETERS
       table  The name (optionally schema-qualified) of an existing table.

       column The name of a column in table.  The column name can be qualified with a subfield name or array subscript, if needed. (Inserting into
	      only some fields of a composite column leaves the other fields null.)

       DEFAULT VALUES
	      All columns will be filled with their default values.

       expression
	      An expression or value to assign to the corresponding column.

       DEFAULT
	      The corresponding column will be filled with its default value.

       query  A query (SELECT statement) that supplies the rows to be inserted. Refer to the SELECT [select(7)] statement for a description of the
	      syntax.

       output_expression
	      An expression to be computed and returned by the INSERT command after each row is inserted. The expression can use any column names
	      of the table. Write * to return all columns of the inserted row(s).

       output_name
	      A name to use for a returned column.

OUTPUTS
       On successful completion, an INSERT command returns a command tag of the form

       INSERT oid count

       The count is the number of rows inserted. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the
       inserted row. Otherwise oid is zero.

       If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and
       values defined in the RETURNING list, computed over the row(s) inserted by the command.
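Client code sometimes has to read this tag itself; psycopg2, for instance, exposes it as `cursor.statusmessage`. The helper below is a hypothetical sketch, not part of any driver API, that splits a tag of the form described above into its OID and row count:

```python
def parse_insert_tag(tag):
    """Split a command tag of the form 'INSERT oid count' into (oid, count)."""
    command, oid, count = tag.split()
    if command != "INSERT":
        raise ValueError(f"not an INSERT tag: {tag!r}")
    return int(oid), int(count)


print(parse_insert_tag("INSERT 0 3"))      # (0, 3): three rows, oid is zero
print(parse_insert_tag("INSERT 16385 1"))  # (16385, 1): one row into a table with OIDs
```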

EXAMPLES
       Insert a single row into table films:

       INSERT INTO films VALUES
	   ('UA502', 'Bananas', 105, '1971-07-13', 'Comedy', '82 minutes');

       In this example, the len column is omitted and therefore it will have the default value:

       INSERT INTO films (code, title, did, date_prod, kind)
	   VALUES ('T_601', 'Yojimbo', 106, '1961-06-16', 'Drama');

       This example uses the DEFAULT clause for the date columns rather than specifying a value:

       INSERT INTO films VALUES
	   ('UA502', 'Bananas', 105, DEFAULT, 'Comedy', '82 minutes');
       INSERT INTO films (code, title, did, date_prod, kind)
	   VALUES ('T_601', 'Yojimbo', 106, DEFAULT, 'Drama');

       To insert a row consisting entirely of default values:

       INSERT INTO films DEFAULT VALUES;

       To insert multiple rows using the multirow VALUES syntax:

       INSERT INTO films (code, title, did, date_prod, kind) VALUES
	   ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
	   ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');
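A rough SQLite equivalent through Python's `sqlite3` module, offered as a stand-in rather than PostgreSQL itself. The `films` schema is a hypothetical minimal one, and because SQLite rejects the `DEFAULT` keyword inside VALUES, `None` (NULL) takes its place; that matches the column default only because `date_prod` declares none here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical films schema; date_prod declares no default, so its default is NULL.
conn.execute(
    "CREATE TABLE films (code TEXT, title TEXT, did INTEGER,"
    " date_prod TEXT, kind TEXT, len TEXT)"
)

# One statement, two rows, bound through ? placeholders.
cur = conn.execute(
    "INSERT INTO films (code, title, did, date_prod, kind) VALUES"
    " (?, ?, ?, ?, ?), (?, ?, ?, ?, ?)",
    ("B6717", "Tampopo", 110, "1985-02-10", "Comedy",
     "HG120", "The Dinner Game", 140, None, "Comedy"),
)
print(cur.rowcount)  # 2
```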

       This example inserts some rows into table films from a table tmp_films with the same column layout as films:

       INSERT INTO films SELECT * FROM tmp_films WHERE date_prod < '2004-05-07';
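The same statement also runs unchanged in SQLite. In this sketch, using Python's `sqlite3` module, the schema and the two rows in `tmp_films` are invented for illustration, and only the row dated before the cutoff is copied:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schema = "(code TEXT, title TEXT, did INTEGER, date_prod TEXT, kind TEXT, len TEXT)"
conn.execute("CREATE TABLE films " + schema)
conn.execute("CREATE TABLE tmp_films " + schema)

# Hypothetical sample data: one row before the cutoff, one after.
conn.executemany(
    "INSERT INTO tmp_films (code, title, did, date_prod, kind) VALUES (?, ?, ?, ?, ?)",
    [
        ("B6717", "Tampopo", 110, "1985-02-10", "Comedy"),
        ("X0001", "Newer Film", 120, "2005-01-01", "Drama"),
    ],
)

# Zero-padded ISO-8601 date strings compare correctly as plain text.
conn.execute("INSERT INTO films SELECT * FROM tmp_films WHERE date_prod < '2004-05-07'")
copied = conn.execute("SELECT code FROM films").fetchall()
print(copied)  # [('B6717',)]
```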

       This example inserts into array columns:

       -- Create an empty 3x3 gameboard for noughts-and-crosses
       INSERT INTO tictactoe (game, board[1:3][1:3])
	   VALUES (1, '{{" "," "," "},{" "," "," "},{" "," "," "}}');
       -- The subscripts in the above example aren't really needed
       INSERT INTO tictactoe (game, board)
	   VALUES (2, '{{X," "," "},{" ",O," "},{" ",X," "}}');

       Insert a single row into table distributors, returning the sequence number generated by the DEFAULT clause:

       INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
	  RETURNING did;
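The same pattern can be tried against SQLite through Python's `sqlite3` module, as a stand-in rather than PostgreSQL itself. SQLite does not accept `DEFAULT` inside a VALUES list, so the sketch omits `did` from the column list instead, and an `INTEGER PRIMARY KEY` column plays the role of the serial sequence. `RETURNING` needs SQLite 3.35 or later; on older builds the sketch falls back to `cursor.lastrowid`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is filled automatically, much like a serial column.
conn.execute("CREATE TABLE distributors (did INTEGER PRIMARY KEY, dname TEXT)")

if sqlite3.sqlite_version_info >= (3, 35, 0):
    # Newer SQLite understands RETURNING, as PostgreSQL does.
    did = conn.execute(
        "INSERT INTO distributors (dname) VALUES ('XYZ Widgets') RETURNING did"
    ).fetchone()[0]
else:
    # Older SQLite: read the rowid assigned to the inserted row instead.
    did = conn.execute(
        "INSERT INTO distributors (dname) VALUES ('XYZ Widgets')"
    ).lastrowid

print(did)  # 1
```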

COMPATIBILITY
       INSERT conforms to the SQL standard, except that the RETURNING clause is a PostgreSQL extension. Also, the case in which a column name list
       is omitted, but not all the columns are filled from the VALUES clause or query, is disallowed by the standard.

       Possible limitations of the query clause are documented under SELECT [select(7)].

SQL - Language Statements					    2010-05-14								 INSERT(7)