Full Discussion: Dedup a large file(30M rows)
Post 302705787 by ran123 on Tuesday 25th of September 2012 01:55:05 PM
Dedup a large file(30M rows)

Hi, I have a large file with a number of records in it. I need some help keeping only the first row for each key and ignoring the other rows with the same key. I tried a few things, but the file is huge (30 million rows), so I need a solution that is very efficient.

e.g.
Code:
Junk|Apple|7|Random|data|here...
Junk|Apple|1|Random|data|here...
Junk|Apple|5|Random|data|here...
Junk|Orange|1|Random|data|here...
Junk|Orange|9|Random|data|here...

Here the second field is the key. So I want only the first record with 'Apple' and then the first record with the next key (in this case Orange). So the output should be:
Code:
Junk|Apple|7|Random|data|here...
Junk|Orange|1|Random|data|here...
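
One common way to get this "first row per key" result is awk's seen-array idiom. The sketch below is only an illustration and not a solution taken from the thread: the function name `first_per_key` is hypothetical, and it assumes the key is always field 2 of a pipe-delimited line. awk remembers every distinct key, so memory grows with the number of distinct keys, not with the number of rows.

```shell
# Hypothetical helper: print only the first row seen for each key (field 2).
# seen[$2]++ is 0 (false) the first time a key appears, so !seen[$2]++ is
# true exactly once per key, and awk's default action prints the line.
# Assumption: fields are separated by '|' and the key is the second field.
first_per_key() {
    awk -F'|' '!seen[$2]++' "$@"
}

# Usage (file names are placeholders):
#   first_per_key input.txt > output.txt
```

This version does not need the input to be sorted or grouped, at the cost of keeping one array entry per distinct key.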

Since the file is large, I need help with a solution that does not run out of memory.
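
If rows with the same key are always adjacent, as they are in the sample (this is an assumption, since the post does not say so), the memory concern can be sidestepped by comparing each key only with the previous one. The following is a minimal sketch under that assumption; the function name `first_of_each_run` is hypothetical.

```shell
# Hypothetical helper: print the first row of each run of identical keys.
# Assumption: all rows sharing a key (field 2, '|' delimited) are contiguous.
# Only the previous key is kept, so memory use stays constant regardless of
# how many rows or distinct keys the file has.
first_of_each_run() {
    awk -F'|' 'NR == 1 || $2 != prev { print } { prev = $2 }' "$@"
}

# Usage (file names are placeholders):
#   first_of_each_run input.txt > output.txt
```

If a key can reappear later in the file after other keys, this sketch prints it again at the start of each new run, so it is only suitable for grouped input.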

Thank you...

Last edited by Corona688; 09-25-2012 at 04:05 PM.
 
