Huge files manipulation


 
# 8  
Old 11-11-2008
Quote:
Originally Posted by Klashxx
...
I tried all the usual methods (awk / sort / uniq / sed / grep ...) but it always ended with the same result (memory core dump).

I'm using large HP-UX servers.
...
IMHO the definitive answer is to install gawk; at least that's the routine advice given over at the HP forums to people facing the same situation. It really helps resolve this issue, and it will also come in very handy in the future. In my experience, sed is able to manipulate huge files on HP-UX 11.00.
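For reference, once gawk is installed, the usual awk idiom should run as-is on large files; a minimal sketch of the deduplication on the first seven pipe-delimited fields (file names are hypothetical):

Code:
# Keep only the first occurrence of each combination of the
# first seven pipe-delimited fields.
gawk -F'|' '!seen[$1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7]++' huge_file.txt > deduped.txt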

If that's not an option, then Perl will be your friend. In the meantime you can use the awk-to-Perl converter (a2p code.awk) that comes with the Perl installation to get an equivalent Perl solution from awk code. The following script might be helpful for your problem, and I'm sure it can be optimized further, but it'll get the job done:

Code:
#!/usr/bin/perl
# Print only the first occurrence of each line, keyed on the
# first seven pipe-delimited fields -- the Perl equivalent of
# awk -F'|' '!a[$1 FS $2 ... FS $7]++'.

$\  = "\n";   # terminate every print with a newline
$FS = "|";

while (<>) {
    chomp;
    @Fld = split(/\|/, $_, -1);        # -1 keeps trailing empty fields
    $key = join($FS, @Fld[0 .. 6]);    # first seven fields form the key
    print $_ if $a{$key}++ == 0;       # print only on first sighting
}

[ Tested on Cygwin; not near an HP box right now. ]
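A quick sketch of the a2p workflow mentioned above, in case it helps (file names are hypothetical):

Code:
# Convert an existing awk program to Perl, then run the result
# on the big file.
a2p dedup.awk > dedup.pl
chmod +x dedup.pl
./dedup.pl huge_file.txt > deduped.txt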
______________________________________

EDIT: It worked fine on HP-UX, though it took a while: about 4 minutes on a file with ~200,000 lines (~13 MB).

Last edited by rubin; 11-11-2008 at 09:49 PM.. Reason: Tested on HP-UX
# 9  
Old 12-01-2008
Hi, I know this is old, but couldn't you do this?

Code:
#!/usr/bin/ksh
# Read the input file one line at a time, so the whole file never
# has to be held in memory.
while read name
do
    something_to "$name"
done < "$data_to_read"

This processes the file line by line; it's from "Mastering Shell Scripting".
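If the input is the same pipe-delimited data as above, read can also split the fields itself via IFS; a minimal sketch, with a hypothetical file name:

Code:
#!/usr/bin/ksh
# Hypothetical sketch: read pipe-delimited records field by field.
data_to_read=/path/to/huge_file.txt

while IFS='|' read -r f1 f2 f3 rest
do
    # work with the individual fields here
    print "first field: $f1"
done < "$data_to_read"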

Last edited by nj78; 12-01-2008 at 06:16 PM.. Reason: early submit