a problem with large files


 
# 8  
Old 07-10-2010
Or, since 2.3 million lines will have to be deleted, this may be necessary:
Code:
awk 'NR==1{getline x<f}NR==x{print;getline x<f}' f=file file1 > file2

This assumes that "file" is sorted numerically; otherwise, run sort -n on it first.
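For readers who (like the O/P later in this thread) want the one-liner spelled out, here is the same program reformatted with comments; the behaviour is unchanged:
Code:
# x holds the next wanted line number, read from "file" (passed in as f).
# NR is the current line number of file1; whenever they match, print the
# line and fetch the next wanted number.
awk 'NR==1 { getline x < f }          # prime x with the first line number
     NR==x { print; getline x < f }   # print the match, advance to the next
' f=file file1 > file2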
# 9  
Old 07-11-2010
@vidyahar85
Quote:
because cat file will be really slow if file has 3 million records please avoid that.
This statement is ridiculous and has no basis in fact whatsoever.


Back on topic.

My impression is that the O/P is reading "file" and searching through "file1" once for every line in "file" to produce the output in "file2".
It would appear that "file" contains 700,000 line numbers and that "file1" contains 3,000,000 records.
Therefore the number of reads is:
700,000 times 3,000,000 = 2,100,000,000,000
We are clearly on a powerful computer or it would have got nowhere in two days.
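If that impression is correct, the script is presumably something along these lines (a guess on my part — the actual command was never posted):
Code:
# Hypothetical reconstruction of the O/P's approach: one full sed pass
# over the 3,000,000-line file1 for each of the 700,000 line numbers.
while read n
do
    sed -n "${n}p" file1 >> file2
done < file
Hence the enormous read count above.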

To my mind the issue is how to do ONE PASS through "file1" and select the record numbers contained in "file".
We need the following facts from the O/P.
1) Is "file" in numerical order? Is each record unique? Are there leading zeros in the record numbers? Is there a delimiter?
2) Does the record layout of "file1" include the record number? If so, where exactly in the record? Is there a delimiter?
3) Is there a Database and database language available which would make this task easier?
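For what it's worth, if "file" simply holds one line number per line, a one-pass version could look like the sketch below (load the wanted numbers into a hash, then test each record of "file1" for membership) — essentially the approach binlib posted, quoted in post #13 below:
Code:
# Pass 1 (FNR==NR is true only while reading "file"): record every wanted
# line number as a key of the array n.
# Pass 2: print each line of file1 whose line number is a key of n.
awk 'FNR==NR { n[$0]; next } FNR in n' file file1 > file2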
# 10  
Old 07-11-2010
Thanks a lot for your replies...
and here are the answers to your questions:
1) Is "file" in numerical order? Is each record unique? NO. Are there leading zeros in the record numbers? Is there a delimiter? NO.

2) Does the record layout of "file1" include the record number? YES. If so, where exactly in the record? Is there a delimiter? They are in one column; you can consider Enter (the newline) the delimiter.
3) Is there a Database and database language available which would make this task easier? No, I'm just trying to reformat it for a specific application.

---------- Post updated at 12:59 AM ---------- Previous update was at 12:58 AM ----------

I will try it and get back to you.
Thanks a lot.

---------- Post updated at 01:01 AM ---------- Previous update was at 12:59 AM ----------

It is just lines,
and the sed is used to print the line numbers saved in a file.
# 11  
Old 07-12-2010
As suggested in post #6, can we see a sample portion of "file" and "file1", making it clear which field is the record number?
Please confirm whether "file" can contain duplicate record numbers. If so, that needs cleaning up first.
# 12  
Old 07-12-2010
Quote:
Originally Posted by Scrutinizer
Or, since 2.3 million lines will have to be deleted, this may be necessary:
Code:
awk 'NR==1{getline x<f}NR==x{print;getline x<f}' f=file file1 > file2

This assumes that "file" is sorted numerically; otherwise, run sort -n on it first.

# 13  
Old 07-13-2010
Quote:
Originally Posted by binlib
Code:
awk 'FNR==NR{n[$0];next}FNR in n'  file file1 > file2

It didn't work... syntax error. Could you please advise?

---------- Post updated at 08:33 PM ---------- Previous update was at 08:32 PM ----------

Quote:
Originally Posted by Scrutinizer
Or, since 2.3 million lines will have to be deleted, this may be necessary:
Code:
awk 'NR==1{getline x<f}NR==x{print;getline x<f}' f=file file1 > file2

This assumes that "file" is sorted numerically; otherwise, run sort -n on it first.
It didn't work either... syntax error. Could you please explain and advise? I really need your help...
# 14  
Old 07-13-2010
Are you on Solaris? If so, use nawk or /usr/xpg4/bin/awk instead of the silly awk that is the default.
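The programs themselves are unchanged; only the interpreter is. For example (standard Solaris paths):
Code:
# The old default /usr/bin/awk on Solaris rejects these programs; invoke
# nawk or the POSIX awk instead:
nawk 'FNR==NR { n[$0]; next } FNR in n' file file1 > file2
# or
/usr/xpg4/bin/awk 'FNR==NR { n[$0]; next } FNR in n' file file1 > file2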
