Need to remove certain records off a file.


 
# 1  
Old 09-04-2012

I'm new to Unix. I have a couple of files of 5 million records, and each record has a key field. I have about 300 keys that I need to remove from the file, and I don't want to write a program to do it. I have used grep -v in the past, and that works great for a few records, but I can't see myself doing that 300 times per file.

Is there an easier way, using grep, egrep, sed, awk, etc., to remove these records quickly? The file layout is simple:

H00012345

The key starts in position 5 of each record. In this example, I would need to remove 12345 from the file of 5 million records. My problem is that I have 300 different keys/records that need to be removed. I know what each of the key values is, but I don't want to have to remove them one at a time.

Example

H00011111
H00012345
H00022222
H00033333
H00044444

I need to remove H00012345 from the file and have the following result.

H00011111
H00022222
H00033333
H00044444

Thanks.
# 2  
Old 09-04-2012
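Put the three hundred items in file1, one per line. The code reads each key from position 5 of the file1 lines, so the entries need the same layout as the records in the big file. Assume the big file is called file2.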
Code:
awk 'FILENAME=="file1" {arr[substr($0,5,5)]++; next}    # file1: store each key (columns 5-9)
     FILENAME=="file2" && !(substr($0,5,5) in arr)      # file2: print records whose key is not stored
    ' file1 file2 > newfile

The order of "file1 file2" on the command line is very important; it has to be as given.
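If the list of keys is easier to keep as bare values (12345 rather than H00012345), the same idea can be sketched as below. This is only an illustration, not the poster's code: the function name remove_by_keys and the file names keys.txt, master.txt and newfile.txt are made up for the example, and the key file is assumed to hold one five-character key per line.

```shell
#!/bin/sh
# remove_by_keys KEYFILE DATAFILE   (hypothetical helper, not from the thread)
# Prints every line of DATAFILE whose columns 5-9 do not match a key in KEYFILE.
# Assumes KEYFILE holds one bare key per line, e.g. 12345.
remove_by_keys() {
    awk 'NR == FNR { if (NF) keys[$1]; next }   # first file: store each non-blank key
         !(substr($0, 5, 5) in keys)            # second file: print records not listed
        ' "$1" "$2"
}

# Demo with the sample records from post #1.
workdir=$(mktemp -d)
printf '%s\n' 12345 > "$workdir/keys.txt"
printf '%s\n' H00011111 H00012345 H00022222 H00033333 H00044444 > "$workdir/master.txt"
remove_by_keys "$workdir/keys.txt" "$workdir/master.txt" > "$workdir/newfile.txt"
cat "$workdir/newfile.txt"
```

NR == FNR is true only while awk reads the first file named on the command line, so the key file still has to come first. One caveat: if the key file is completely empty, NR == FNR stays true for the data file as well and nothing is printed.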
# 3  
Old 09-04-2012
Here is a solution using grep. It uses the -f option to read patterns stored in a file (key_file), one pattern per line, and a regular expression that matches the key starting at the 5th position.

Code:
cat key_file
^....12345

Code:
grep -vf key_file master_file > new_file
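With 300 keys, typing the ^.... prefix by hand gets tedious, so key_file could be generated from a plain list of keys. The following is a rough sketch under some assumptions: the helper name make_patterns and the input file keys.txt are hypothetical, keys.txt holds one bare key per line, and the keys contain no regular-expression metacharacters (they would need escaping otherwise).

```shell
#!/bin/sh
# make_patterns KEYLIST   (hypothetical helper, not from the thread)
# Turns each bare key into a pattern anchored at column 5: 12345 -> ^....12345
# Blank lines are dropped so they cannot become a pattern that matches everything.
make_patterns() {
    sed -e '/^$/d' -e 's/^/^..../' "$1"
}

# Demo with the sample records from post #1.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 12345 > keys.txt
printf '%s\n' H00011111 H00012345 H00022222 H00033333 H00044444 > master_file

make_patterns keys.txt > key_file
grep -v -f key_file master_file > new_file
cat new_file
```

The four dots match any four characters, so the key is only compared against columns 5 onward, the same as in the hand-written key_file above.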


Last edited by mjf; 09-04-2012 at 12:00 PM..
 