deleting multiple records from a huge file at one time


 
# 1  
Old 02-05-2008

I have a very big file, 5 GB in size, with about 50 million records in it. I have to delete records based on record numbers that I know from outside, without opening the file. The record numbers are quite random, like 5000678, 7890005, etc.


Can somebody let me know how I can remove records based on record number all at one time, and not one record at a time, please?
# 2  
Old 02-05-2008
What do you mean by "without opening the file"? You cannot delete records without opening the file.

Is this file a structured file or a database file? Are the records in the file sorted or random? You seem to indicate that the records are random, but I am unclear whether you are referring to the file or to the list of records to be deleted.

You need to provide more precise information if you want somebody to help you.
# 3  
Old 02-05-2008
The reason I said I want to delete without opening it is that the file is too large to open. The file is a regular ASCII file with data in it. I just need to delete some records from it in a single pass, without having to make one pass for every record I want to delete.
# 4  
Old 02-05-2008
Hi.

See post #6 in https://www.unix.com/shell-programmin...#post302154933 -- I think you should be able to adapt that procedure for creating a sed script that deletes specific lines in a single pass over the file. It was used to print lines (sed's "p" command), but a delete is a similar operation. You would also need to omit the "-n" option on the final execution of sed.

It still will not be cheap -- every program that processes a file will "open" the file in the sense that it tells the system that it will be dealing with the content of that file. The program will need to read every line in order to create the new copy minus the lines you delete. Afterwards, you could rename the new file to the old name.
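
To make that concrete, here is a minimal sketch, assuming the record numbers to be deleted sit one per line in a file called delete_list.txt (a hypothetical name):

Code:
# Turn each record number into a sed delete command:
# the line "5000678" becomes "5000678d".
sed 's/$/d/' delete_list.txt > delete.sed

# One pass over the big file; with no -n, every line that is not
# deleted gets printed. Rename the result over the original afterwards.
sed -f delete.sed bigfile > bigfile.new && mv bigfile.new bigfile

The same single pass can also be done in awk, loading the numbers into an array and printing every line whose number is not in it:

Code:
awk 'NR==FNR { skip[$1]; next } !(FNR in skip)' delete_list.txt bigfile > bigfile.new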

I suggest you try the procedure on small sample files first ... cheers, drl
# 5  
Old 02-06-2008
Dsarvan,
Split & process; we push files of a few GB through awk this way.
1. Split the file based on line counts (an approximate number of lines per piece).
2. Process the pieces in parallel (if your server has decent RAM and CPUs); a sketch follows below.
Remember not to reuse the same name if you use any temporary files; one way is to append a random number or the process ID to each.
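
A rough sketch of the split-and-parallel idea, using an illustrative chunk size and the hypothetical delete_list.txt from the previous post. One thing to watch: after a split, line numbers restart at 1 in every piece, so each awk job needs an offset to map its local line numbers back to global record numbers.

Code:
# Split the big file into pieces of 10 million lines each
# (produces chunk_aa, chunk_ab, ...; the size is illustrative).
split -l 10000000 bigfile chunk_

# Run one awk per piece in parallel. $$ (the shell's process ID)
# keeps the temporary output names unique, as suggested above.
offset=0
for f in chunk_*; do
    awk -v off="$offset" '
        NR == FNR { skip[$1]; next }   # first file: load record numbers
        !((FNR + off) in skip)         # second file: keep unlisted lines
    ' delete_list.txt "$f" > "$f.out.$$" &
    offset=$((offset + 10000000))
done
wait

# Reassemble the surviving records in their original order.
cat chunk_*.out.$$ > bigfile.new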
# 6  
Old 02-06-2008
Thank you very much, drl. The post you pointed me to helped.