Huge files manipulation Post: 302255895

Top Forums UNIX for Advanced & Expert Users Huge files manipulation Post 302255895 by Klashxx on Friday 7th of November 2008 10:20:24 AM

11-07-2008

Many thanks for your ideas.

Quote:

I am thinking here...
based on the first position character, copy all lines with "^[aA]" to file_a by using grep (for instance)
repeat for bB and cC and so on

then do your dup check (sort -u maybe) on each of 26 files

finally, recombine the 26 files

Nice one, but I want to use the split tools only as a last option.
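For reference, the split-and-recombine idea quoted above can be sketched roughly as follows. This is only an illustration in Python, not code from the thread. It departs from the quoted suggestion in one labeled way: instead of the first character of the line, it picks the bucket from a checksum of the key (the first 7 fields), an assumption made because the sample lines later in this post all begin with the same characters. The names `key_of`, `dedupe_in_buckets` and `n_buckets` are hypothetical.

```python
import os
import tempfile
import zlib


def key_of(line, n_fields=7, sep="|"):
    """Return the first n_fields fields of a line, joined back with sep."""
    return sep.join(line.rstrip("\n").split(sep)[:n_fields])


def dedupe_in_buckets(src_path, dst_path, n_buckets=26):
    """Write the first line seen for each key to dst_path.

    Pass 1 scatters the lines into n_buckets temporary files by key checksum,
    so all lines sharing a key land in the same bucket. Pass 2 dedupes each
    bucket on its own, so only one bucket's keys are held in memory at a time.
    Output order follows the buckets, not the original file.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, "bucket_%02d" % i) for i in range(n_buckets)]
        buckets = [open(p, "w") for p in paths]
        try:
            with open(src_path) as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"  # normalize a final line without newline
                    idx = zlib.crc32(key_of(line).encode()) % n_buckets
                    buckets[idx].write(line)
        finally:
            for b in buckets:
                b.close()

        with open(dst_path, "w") as dst:
            for p in paths:
                seen = set()  # keys of the current bucket only
                with open(p) as bucket:
                    for line in bucket:
                        k = key_of(line)
                        if k not in seen:
                            seen.add(k)
                            dst.write(line)
```

The trade-off is the one already mentioned: the data is written to disk a second time, and the original line order is lost unless it is restored afterwards.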

Quote:

If your key first character is not highly redundant you can try this with awk.
The idea is predicated on your original code blowing the limits for a hash

Definitely a good trick, but the file content is a little messy.

Quote:

I wonder, will the duplicated lines always follow each other, or are they spread around in the file?

Unfortunately it is unsorted, and the duplicated keys are in a random order.
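The ordering matters because, if the lines were sorted on the key, all lines sharing a key would sit next to each other and duplicates could be found while holding only one group in memory. A minimal Python sketch of that adjacent comparison, assuming the input has already been sorted by some other means (the sort itself is not shown); `key_of` and `duplicate_groups` are hypothetical names:

```python
from itertools import groupby


def key_of(line, n_fields=7, sep="|"):
    """Return the first n_fields fields of a line, joined back with sep."""
    return sep.join(line.rstrip("\n").split(sep)[:n_fields])


def duplicate_groups(sorted_lines):
    """Yield (key, lines) for every key that occurs more than once.

    Assumes sorted_lines is ordered so that equal keys are adjacent;
    on unsorted input, separated duplicates would go unnoticed.
    """
    for key, group in groupby(sorted_lines, key=key_of):
        group = list(group)  # only the current group is held in memory
        if len(group) > 1:
            yield key, group
```

Since `sorted_lines` can be any iterable, an open file object can be passed directly and the file is read line by line.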

chatwizrd, it doesn't work: out of memory.

To clarify, this is the structure of the file:

Code:

30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0005655,00|||+0000000000000,00
30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0000000000000,00|||+0000000000000,00
30xx|4150010003502043|CARDS|20081031|MP415001|00000024265698|01|F|1804|00|||+0000000000000,00|||+0000000000000,00

With a key formed by the first 7 fields, I want to print or delete only the duplicates.
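To make the requirement concrete, here is a small Python sketch run against the three sample lines above. It assumes that "duplicate" means every line after the first one with a given key; `key_of` and `split_duplicates` are hypothetical names. It keeps every key in memory, so it only illustrates the expected result and is not a solution for the huge file.

```python
SAMPLE = [
    "30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0005655,00|||+0000000000000,00",
    "30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0000000000000,00|||+0000000000000,00",
    "30xx|4150010003502043|CARDS|20081031|MP415001|00000024265698|01|F|1804|00|||+0000000000000,00|||+0000000000000,00",
]


def key_of(line, n_fields=7, sep="|"):
    """Return the first n_fields fields of a line, joined back with sep."""
    return sep.join(line.rstrip("\n").split(sep)[:n_fields])


def split_duplicates(lines):
    """Return (first_occurrences, duplicates), both in input order."""
    seen = set()
    firsts, dups = [], []
    for line in lines:
        k = key_of(line)
        if k in seen:
            dups.append(line)   # key already seen: this line is a duplicate
        else:
            seen.add(k)
            firsts.append(line)
    return firsts, dups


firsts, dups = split_duplicates(SAMPLE)
print(len(firsts), len(dups))  # prints "2 1"
```

The first two sample lines share the same 7-field key and differ only in a later amount field, so the second one is reported as the duplicate.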

I'm very new to Perl, but I read somewhere that the Tie::File module can handle very large files. I tried it but cannot get the code right.
Any ideas?

Thank you in advance.

Regards

Klashxx
