UNIX for Advanced & Expert Users: Performance problem with removing duplicates in a huge file (50+ GB)
Post 302752709 by achenle on Monday 7th of January 2013 12:01:29 PM
Any DBAs around with some spare disk space?

Use a DB server. Create a single-column table with a unique index on that column. Insert each line as a row into the table, ignoring duplicate-entry failures. Then export the data.

Might not be super fast, but it'll be faster than any script. And it's easy.
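A minimal sketch of the approach above, assuming SQLite through Python's built-in sqlite3 module stands in for the DB server (the post does not name one). SQLite's INSERT OR IGNORE plays the role of "ignoring duplicate entry failures"; on PostgreSQL 9.5 and later, INSERT ... ON CONFLICT DO NOTHING serves the same purpose. The table name `lines`, column `txt`, and function `dedup_file` are hypothetical.

```python
import sqlite3


def dedup_file(in_path, out_path, db_path=":memory:"):
    """Write the distinct lines of in_path to out_path, keeping first occurrences.

    Hypothetical helper: a single-column table with a UNIQUE constraint
    (which SQLite backs with a unique index) rejects duplicate lines, and
    INSERT OR IGNORE skips those rejections instead of raising an error.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Start from an empty table so repeated runs on a disk DB don't merge.
        conn.execute("DROP TABLE IF EXISTS lines")
        conn.execute("CREATE TABLE lines (txt TEXT UNIQUE)")
        with open(in_path, encoding="utf-8") as src:
            # Stream 1-tuples so the whole file is never held in memory.
            rows = ((line.rstrip("\n"),) for line in src)
            conn.executemany("INSERT OR IGNORE INTO lines (txt) VALUES (?)", rows)
        conn.commit()  # all inserts above run in one implicit transaction
        with open(out_path, "w", encoding="utf-8") as dst:
            # rowid order is the order in which surviving lines were inserted.
            for (txt,) in conn.execute("SELECT txt FROM lines ORDER BY rowid"):
                dst.write(txt + "\n")
    finally:
        conn.close()
```

For a 50+ GB input, pass an on-disk `db_path` on a filesystem with plenty of free space instead of the in-memory default, since the unique index alone will be roughly as large as the distinct data.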

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

removing duplicates from a file

I have a file with some 1000 entries. It contains entries like 1000,ram 2000,pankaj 1001,rahim 1000,ram 2532,govind 2000,pankaj 3000,venkat 2532,govind. I want to extract only the distinct rows from this file, so my output should contain only 1000,ram... (2 Replies)
Discussion started by: trichyselva

2. UNIX for Dummies Questions & Answers

removing duplicates of a pattern from a file

Hey all, I need some help. I have a text file with names in it. If a particular pattern occurs in that file more than once, I want to rename all the occurrences of that pattern with alternate patterns. For example, if PATTERN occurs 5 times then I want to... (3 Replies)
Discussion started by: ashisharora

3. Shell Programming and Scripting

Removing duplicates from log file?

I have a log file with posts looking like this: -- Messages can be delivered by different systems at different times. The id number is used to sort out duplicate messages. What I need is to strip the arrival time from each post, sort posts by id number, and reattach arrival time to respective... (2 Replies)
Discussion started by: Ilja

4. Shell Programming and Scripting

Removing Duplicates from file

Hi Experts, Please check the following new requirement. I got data like the following in a file. FILE_HEADER 01cbbfde7898410| 3477945| home| 1 01cbc275d2c122| 3478234| WORK| 1 01cbbe4362743da| 3496386| Rich Spare| 1 01cbc275d2c122| 3478234| WORK| 1 This is pipe separated file with... (3 Replies)
Discussion started by: tinufarid

5. Shell Programming and Scripting

formatting a file and removing duplicates

Hi, I have a file whose format I want to change. It is a large file in rows, but I want it to be comma-separated (a comma, then a space). The current file looks like this: HI, Joe, Bob, Jack, Jack After that I want to remove any duplicates so it would look like this: HI, Joe,... (2 Replies)
Discussion started by: kylle345

6. HP-UX

Performance issue with 'grep' command for huge file size

I have 2 files; one file (say, details.txt) contains the details of employees and another file (say, emp.txt) has some selected employee names. I am extracting employee details from details.txt by using emp.txt and the corresponding code is: while read line do emp_name=`echo $line` grep -e... (7 Replies)
Discussion started by: arb_1984

7. UNIX for Dummies Questions & Answers

Removing duplicates from a file

Hi all, I am merging files coming from 2 different systems, and while doing that I am getting duplicate entries in the merged file: I,01,000131,764,2,4.00 I,01,000131,765,2,4.00 I,01,000131,772,2,4.00 I,01,000131,773,2,4.00 I,01,000168,762,2,2.00 I,01,000168,763,2,2.00... (5 Replies)
Discussion started by: Sri3001

8. Shell Programming and Scripting

Removing duplicates from new file

I have two files, and I want to remove/delete all the duplicate lines in file2, viz. unix,unix2,unix3 (2 Replies)
Discussion started by: sagar_1986

9. Shell Programming and Scripting

Removing duplicates from new file

I have two files, and I want to remove/delete all the duplicate lines in file2, viz. unix,unix2,unix3. I have tried a previous post as well, but there the complete line must match. In this case I have to compare the first column only, regardless of the content of the succeeding columns. (3 Replies)
Discussion started by: sagar_1986

10. Shell Programming and Scripting

Removing White spaces from a huge file

I am trying to remove whitespaces from a file containing sample data as: 457 <EOFD> Mar 1 2007 12:00:00:000AM <EOFD> Mar 31 2007 12:00:00:000AM <EOFD> system <EORD> 458 <EOFD> Mar 1 2007 12:00:00:000AM<EOFD>agf <EOFD> Apr 20 2007 9:10:56:036PM <EOFD> prodiws<EORD> . Basically these... (11 Replies)
Discussion started by: amvip
INSERT(7)							   SQL Commands 							 INSERT(7)

NAME
       INSERT - create new rows in a table

SYNOPSIS
       INSERT INTO table [ ( column [, ...] ) ]
	   { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
	   [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]

DESCRIPTION
       INSERT inserts new rows into a table. One can insert one or more rows specified by value expressions, or zero or more rows resulting from
       a query.

       The target column names can be listed in any order. If no list of column names is given at all, the default is all the columns of the table
       in their declared order; or the first N column names, if there are only N columns supplied by the VALUES clause or query. The values
       supplied by the VALUES clause or query are associated with the explicit or implicit column list left-to-right.

       Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or
       null if there is none.
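       A minimal sketch of the column-list and default rules above, assuming SQLite through Python's sqlite3 module as a stand-in for
       PostgreSQL. The films layout below (including the 'unknown' default for len) is hypothetical. Only the explicit-list case is shown,
       because SQLite, unlike PostgreSQL, rejects a VALUES list shorter than the table when no column list is given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: len has a declared default, date_prod and kind do not.
conn.execute(
    """
    CREATE TABLE films (
        code      TEXT,
        title     TEXT,
        did       INTEGER,
        date_prod TEXT,
        kind      TEXT,
        len       TEXT DEFAULT 'unknown'
    )
    """
)
# The column list is not in declared order; values are matched to it
# left-to-right, and unlisted columns get their default (or NULL).
conn.execute(
    "INSERT INTO films (title, code, did) VALUES (?, ?, ?)",
    ("Yojimbo", "T_601", 106),
)
row = conn.execute(
    "SELECT code, title, did, date_prod, kind, len FROM films"
).fetchone()
print(row)  # ('T_601', 'Yojimbo', 106, None, None, 'unknown')
conn.close()
```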

       If the expression for any column is not of the correct data type, automatic type conversion will be attempted.

       The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted. This is primarily useful
       for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is
       allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT.

       You must have INSERT privilege on a table in order to insert into it. If a column list is specified, you only need INSERT privilege on the
       listed columns. Use of the RETURNING clause requires SELECT privilege on all columns mentioned in RETURNING. If you use the query clause
       to insert rows from a query, you of course need to have SELECT privilege on any table or column used in the query.

PARAMETERS
       table  The name (optionally schema-qualified) of an existing table.

       column The name of a column in table.  The column name can be qualified with a subfield name or array subscript, if needed. (Inserting into
	      only some fields of a composite column leaves the other fields null.)

       DEFAULT VALUES
	      All columns will be filled with their default values.

       expression
	      An expression or value to assign to the corresponding column.

       DEFAULT
	      The corresponding column will be filled with its default value.

       query  A query (SELECT statement) that supplies the rows to be inserted. Refer to the SELECT [select(7)] statement for a description of the
	      syntax.

       output_expression
	      An expression to be computed and returned by the INSERT command after each row is inserted. The expression can use any column names
	      of the table. Write * to return all columns of the inserted row(s).

       output_name
	      A name to use for a returned column.

OUTPUTS
       On successful completion, an INSERT command returns a command tag of the form

       INSERT oid count

       The count is the number of rows inserted. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the
       inserted row. Otherwise oid is zero.
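       Client libraries expose this count in their own way. As a sketch, assuming Python's sqlite3 module (not a PostgreSQL driver), the
       number of rows an INSERT added is available as Cursor.rowcount; the three-column films table here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (code TEXT, title TEXT, did INTEGER)")
# Multirow VALUES insert with bound parameters.
cur = conn.execute(
    "INSERT INTO films (code, title, did) VALUES (?, ?, ?), (?, ?, ?)",
    ("B6717", "Tampopo", 110, "HG120", "The Dinner Game", 140),
)
# rowcount corresponds to "count" in the tag "INSERT oid count".
inserted = cur.rowcount
print(inserted)  # 2
conn.close()
```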

       If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and
       values defined in the RETURNING list, computed over the row(s) inserted by the command.

EXAMPLES
       Insert a single row into table films:

       INSERT INTO films VALUES
	   ('UA502', 'Bananas', 105, '1971-07-13', 'Comedy', '82 minutes');

       In this example, the len column is omitted and therefore it will have the default value:

       INSERT INTO films (code, title, did, date_prod, kind)
	   VALUES ('T_601', 'Yojimbo', 106, '1961-06-16', 'Drama');

       This example uses the DEFAULT clause for the date columns rather than specifying a value:

       INSERT INTO films VALUES
	   ('UA502', 'Bananas', 105, DEFAULT, 'Comedy', '82 minutes');
       INSERT INTO films (code, title, did, date_prod, kind)
	   VALUES ('T_601', 'Yojimbo', 106, DEFAULT, 'Drama');

       To insert a row consisting entirely of default values:

       INSERT INTO films DEFAULT VALUES;

       To insert multiple rows using the multirow VALUES syntax:

       INSERT INTO films (code, title, did, date_prod, kind) VALUES
	   ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
	   ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

       This example inserts some rows into table films from a table tmp_films with the same column layout as films:

       INSERT INTO films SELECT * FROM tmp_films WHERE date_prod < '2004-05-07';
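       The same INSERT ... SELECT pattern can be tried locally. This sketch assumes SQLite through Python's sqlite3 module; the two-column
       layout and the "Later Film" row are hypothetical sample data, and dates are stored as ISO-8601 text so the comparison is chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both tables share the same (hypothetical) column layout.
for name in ("films", "tmp_films"):
    conn.execute(f"CREATE TABLE {name} (title TEXT, date_prod TEXT)")
conn.executemany(
    "INSERT INTO tmp_films VALUES (?, ?)",
    [
        ("Bananas", "1971-07-13"),
        ("Tampopo", "1985-02-10"),
        ("Later Film", "2010-01-01"),  # hypothetical row, filtered out below
    ],
)
# ISO-8601 date strings compare chronologically as plain text.
cur = conn.execute(
    "INSERT INTO films SELECT * FROM tmp_films WHERE date_prod < '2004-05-07'"
)
copied = cur.rowcount
titles = [t for (t,) in conn.execute("SELECT title FROM films ORDER BY date_prod")]
print(copied, titles)  # 2 ['Bananas', 'Tampopo']
conn.close()
```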

       This example inserts into array columns:

       -- Create an empty 3x3 gameboard for noughts-and-crosses
       INSERT INTO tictactoe (game, board[1:3][1:3])
	   VALUES (1, '{{" "," "," "},{" "," "," "},{" "," "," "}}');
       -- The subscripts in the above example aren't really needed
       INSERT INTO tictactoe (game, board)
	   VALUES (2, '{{X," "," "},{" ",O," "},{" ",X," "}}');

       Insert a single row into table distributors, returning the sequence number generated by the DEFAULT clause:

       INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
	  RETURNING did;

COMPATIBILITY
       INSERT conforms to the SQL standard, except that the RETURNING clause is a PostgreSQL extension. Also, the case in which a column name list
       is omitted, but not all the columns are filled from the VALUES clause or query, is disallowed by the standard.

       Possible limitations of the query clause are documented under SELECT [select(7)].

SQL - Language Statements					    2010-05-14								 INSERT(7)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.