UNIX for Advanced & Expert Users: Performance problem with removing duplicates in a huge file (50+ GB)
Post 302752709 by achenle on Monday, 7 January 2013, 12:01 PM
Any DBAs around with some spare disk space?

Use a DB server. Create a single-column table with a unique index on that column, insert each line as a row into the table, ignoring duplicate-entry failures, and then export the data.

It might not be super fast, but it will be faster than any script, and it's easy.
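A minimal sketch of the idea, using sqlite3 in place of a real DB server (same unique-index trick, no server needed). big.txt and deduped.txt are placeholder names, and the input is assumed to contain no '|' characters (sqlite3's default .import field separator; adjust with .separator otherwise):

Code:
sqlite3 dedup.db <<'SQL'
-- staging holds the raw rows, duplicates and all
CREATE TABLE staging (line TEXT);
-- the unique index on this column does the actual dedup work
CREATE TABLE uniq (line TEXT UNIQUE);
.import big.txt staging
-- duplicate-key inserts are silently skipped, as described above
INSERT OR IGNORE INTO uniq SELECT line FROM staging;
.output deduped.txt
SELECT line FROM uniq;
SQL

The point of the unique index is that the database does the sorting and hashing on disk, so nothing has to hold 50+ GB of lines in memory. With a real server you could also skip the staging table if its bulk loader can ignore duplicate-key errors.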
These 2 users gave thanks to achenle for this post.
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

removing duplicates from a file

I have a file with about 1000 entries. It contains entries like: 1000,ram 2000,pankaj 1001,rahim 1000,ram 2532,govind 2000,pankaj 3000,venkat 2532,govind. What I want is to extract only the distinct rows from this file, so my output should contain only 1000,ram... (2 Replies)
Discussion started by: trichyselva
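For a file this small, a one-liner does it. This sketch (file.txt is a placeholder name) keeps the first occurrence of each line without changing the order:

Code:
# keep only the first occurrence of each line, preserving input order
awk '!seen[$0]++' file.txt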

2. UNIX for Dummies Questions & Answers

removing duplicates of a pattern from a file

Hey all, I need some help. I have a text file with names in it. My goal is that if a particular pattern occurs in that file more than once, I want to rename all occurrences of that pattern with alternate patterns. For example, if I have PATTERN occurring 5 times, then I want to... (3 Replies)
Discussion started by: ashisharora
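One possible sketch in awk, assuming PATTERN is a literal string and the sentinel character '@' never occurs in the data (names.txt is a placeholder name):

Code:
# number each occurrence of PATTERN as PATTERN_1, PATTERN_2, ...
# '@' is a sentinel assumed absent from the input
awk '{
    while (sub(/PATTERN/, "@" (++n))) { }  # replace one match at a time
    gsub(/@/, "PATTERN_")                  # expand sentinels to the final names
    print
}' names.txt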

3. Shell Programming and Scripting

Removing duplicates from log file?

I have a log file with posts that look like this: -- Messages can be delivered by different systems at different times. The id number is used to sort out duplicate messages. What I need is to strip the arrival time from each post, sort the posts by id number, and reattach the arrival time to the respective... (2 Replies)
Discussion started by: Ilja
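The excerpt cuts off before the exact log format, so this is only a guess: assuming one post per line with the arrival time in field 1 and the id in field 2 (posts.log is a placeholder name), a stable sort groups duplicates by id and awk keeps the first copy, with its original arrival time:

Code:
# -s keeps the original (arrival) order within each id group;
# awk then drops every line whose id (field 2) was already seen
sort -s -k2,2n posts.log | awk '!seen[$2]++'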

4. Shell Programming and Scripting

Removing Duplicates from file

Hi experts, please check the following new requirement. I have data like the following in a file: FILE_HEADER 01cbbfde7898410| 3477945| home| 1 01cbc275d2c122| 3478234| WORK| 1 01cbbe4362743da| 3496386| Rich Spare| 1 01cbc275d2c122| 3478234| WORK| 1. This is a pipe-separated file with... (3 Replies)
Discussion started by: tinufarid
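A short awk sketch for this layout (data.txt is a placeholder name): print the header line unconditionally and dedupe everything after it:

Code:
# NR == 1 passes FILE_HEADER through; later lines print only on first sight
awk 'NR == 1 || !seen[$0]++' data.txt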

5. Shell Programming and Scripting

formatting a file and removing duplicates

Hi, I have a file whose format I want to change. It is a large file with one entry per row, but I want it comma separated (a comma followed by a space). The current file looks like this: HI, Joe, Bob, Jack, Jack. Afterwards, I would want to remove any duplicates, so it would look like this: HI, Joe,... (2 Replies)
Discussion started by: kylle345
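A sketch for the format as described (names.txt is a placeholder name): drop the repeats first, then join what's left with a comma and a space:

Code:
# print each distinct entry once, separated by ", ", on a single line
awk '!seen[$0]++ { printf "%s%s", (n++ ? ", " : ""), $0 } END { print "" }' names.txt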

6. HP-UX

Performance issue with 'grep' command for huge file size

I have two files; one file (say, details.txt) contains the details of employees, and another file (say, emp.txt) has some selected employee names. I am extracting employee details from details.txt using emp.txt, and the corresponding code is: while read line do emp_name=`echo $line` grep -e... (7 Replies)
Discussion started by: arb_1984
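The usual cure (hedged, since the post is truncated) is to drop the shell loop entirely and let grep read all the names at once. Each loop iteration forks a fresh grep and rescans details.txt, so the loop costs one full pass per employee; this is a single pass:

Code:
# -F treats each line of emp.txt as a fixed string, -f reads patterns from it
grep -F -f emp.txt details.txt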

7. UNIX for Dummies Questions & Answers

Removing duplicates from a file

Hi all, I am merging files coming from 2 different systems, and while doing that I am getting duplicate entries in the merged file: I,01,000131,764,2,4.00 I,01,000131,765,2,4.00 I,01,000131,772,2,4.00 I,01,000131,773,2,4.00 I,01,000168,762,2,2.00 I,01,000168,763,2,2.00... (5 Replies)
Discussion started by: Sri3001
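If the order of the merged file doesn't matter, sort -u is the compact answer (merged.txt is a placeholder name); otherwise the order-preserving awk '!seen[$0]++' shown under discussion 1 applies here too:

Code:
# sort the lines and emit each distinct line once (output is reordered)
sort -u merged.txt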

8. Shell Programming and Scripting

Removing duplicates from new file

I have two files. I want to remove/delete all the duplicate lines in file2, viz. unix, unix2, unix3. (2 Replies)
Discussion started by: sagar_1986

9. Shell Programming and Scripting

Removing duplicates from new file

I have two files. I want to remove/delete all the duplicate lines in file2, viz. unix, unix2, unix3. I have tried the previous post as well, but there the complete line must match. In this case I have to check the first column only, regardless of the content of the succeeding columns. (3 Replies)
Discussion started by: sagar_1986
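For this and the previous discussion, a hedged awk sketch assuming comma-separated columns (file1 and file2 as named in the posts): remember the first column of every file1 line, then print only the file2 lines whose first column never appeared there:

Code:
# first pass (NR == FNR) records column 1 of file1;
# second pass prints file2 lines whose column 1 is not in that set
awk -F, 'NR == FNR { a[$1]; next } !($1 in a)' file1 file2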

10. Shell Programming and Scripting

Removing White spaces from a huge file

I am trying to remove whitespace from a file containing sample data such as: 457 <EOFD> Mar 1 2007 12:00:00:000AM <EOFD> Mar 31 2007 12:00:00:000AM <EOFD> system <EORD> 458 <EOFD> Mar 1 2007 12:00:00:000AM<EOFD>agf <EOFD> Apr 20 2007 9:10:56:036PM <EOFD> prodiws<EORD>. Basically these... (11 Replies)
Discussion started by: amvip
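Assuming the goal is just to trim the blanks around the <EOFD> and <EORD> markers (the post is truncated), one sed sketch over a placeholder huge.txt:

Code:
# strip spaces/tabs immediately before and after each field/record marker
sed 's/[[:blank:]]*<EOFD>[[:blank:]]*/<EOFD>/g; s/[[:blank:]]*<EORD>[[:blank:]]*/<EORD>/g' huge.txt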