a problem with large files


 
# 1  
Old 07-10-2010

hello all,

Kindly, I need your help. I made a script to print specific lines from a huge file of about 3 million lines; the output of the script should be about 700,000 lines. The problem is that the script is far too slow: it kept running for 5 days and the output was only 200,000 lines!

The script is very simple:

Code:
for i in `cat file`               # file holds the line numbers to be printed from file1
do
    sed "${i}q;d" file1 >> file2  # file1 is the huge file (3 million lines); file2 is the output (~700,000 lines)
done

So please, could anyone tell me how I can decrease the processing time of that script, and why it is taking all that time?

Thanks in advance
# 2  
Old 07-10-2010
Do you know the from and to line numbers of the file which you want to print?

Because cat file will be really slow if the file has 3 million records, please avoid that.
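If the wanted lines happened to form one contiguous range, a single sed pass that quits at the end of the range would do the job without rereading the whole file (the range below is only an example):

Code:
# print lines 100000 through 170000 of file1, then quit so the rest of the file is never read
sed -n -e '100000,170000p' -e '170000q' file1 > file2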
# 3  
Old 07-10-2010
From my experience, running a pile of small(er) files takes much less time than working with a single "fatty"; thus, you may want to split both source files into chunks, process them (possibly in parallel), and finally concatenate the results, roughly as sketched below.
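Something along these lines, where the chunk size and the extract_lines.sh script are only placeholders (the list of line numbers would also need to be split and re-based to each chunk):

Code:
# cut file1 into 300000-line pieces: chunk_aa, chunk_ab, ...
split -l 300000 file1 chunk_
# run a (placeholder) per-chunk extraction in the background, then merge the results
for c in chunk_*
do
    ./extract_lines.sh "$c" > "$c.out" &
done
wait
cat chunk_*.out > file2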
# 4  
Old 07-10-2010
I tried it without the cat in the for loop and used head -<line no.> | tail -1 instead; that also worked, but it is still too slow. I can't print a specific range because the line numbers aren't known in advance: I run a script to get the line numbers, and they differ from file to file.
I really don't know what the problem is!

---------- Post updated at 09:11 PM ---------- Previous update was at 09:09 PM ----------

dr house, how many lines per file do you think would be good for splitting the fatty file?
# 5  
Old 07-10-2010
Quote:
Originally Posted by m_wassal
how many lines per file do you think would be good for splitting the fatty file?
As this is very machine-dependent, I'd start with a ten-percent split, process one (!) file, time this, then either process the remaining nine splits - or split, process and time the "guinea pig" again (thus going down to e.g. five-percent splits). You get the idea ... (a rough sketch of the timing step is below).
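Something like this, where the 300000-line sample is the ten-percent example and process.sh stands in for whatever you actually run per file:

Code:
# take a ~10% sample (the first 300000 lines) and time one run against it
head -n 300000 file1 > sample_10pct
time ./process.sh sample_10pct > sample_10pct.out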
# 6  
Old 07-10-2010
Can you provide a couple of sample lines from file?
The sed command simply prints the appropriate line, right?
# 7  
Old 07-10-2010
Code:
awk 'FNR==NR{n[$0];next}FNR in n'  file file1 > file2
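In case an explanation helps: on the first pass (FNR==NR, i.e. while reading file, the list of line numbers) every line number is stored as a key of the array n; on the second pass, a line of file1 is printed whenever its line number (FNR) is a key in n. Each file is read exactly once, which is why this should be far faster than calling sed once per wanted line.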
