Use records from one file to delete records in another file


 
# 1  
Old 06-18-2008

file_in_1:

1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
17 18 19 20
21 22 23 24
25 26 27 28
29 30 31 32

file_in_2:

9 10 11 12
21 22 23 24
1 2 3 4
17 18 19 20

file_out:

5 6 7 8
13 14 15 16
25 26 27 28
29 30 31 32

How can I use each record in file_in_2 to delete the
corresponding record in file_in_1 and produce
file_out?

Note:
My files can have thousands, even millions, of records,
and they tend not to be sorted.


Thank you,
Kenny.
# 2  
Old 06-18-2008
Code:
fgrep -vxf file_in_2 file_in_1 >file_out

--- if your fgrep has the -f option. Otherwise, perhaps something along the lines of

Code:
sed -e 's/[][\\.*$^]/\\&/g' -e 's!.*!/^&$/d!' file_in_2 |
sed -f - file_in_1 >file_out

If your sed doesn't grok -f - either, you need to put the output from the first sed script in a temporary file.

Code:
sed -e 's/[][\\.*$^]/\\&/g' -e 's!.*!/^&$/d!' file_in_2 >temporary
sed -f temporary file_in_1 >file_out
rm temporary

Different versions of sed understand slightly different variants of regular expression syntax, so there may be a need to adjust the first sed script slightly. But if your input file is just numbers and spaces, that is absolutely nothing to worry about.
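
To see what the first sed command hands to the second, it can help to print the generated script. A minimal sketch, assuming the sample file_in_2 from the question; the scratch directory and the file name generated.sed are only there to keep the example self-contained:

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data from the question.
printf '%s\n' '9 10 11 12' '21 22 23 24' '1 2 3 4' '17 18 19 20' >file_in_2

# First expression: backslash-escape regex metacharacters.
# Second expression: wrap each line as /^line$/d, a delete command
# that matches only that exact line.
sed -e 's/[][\\.*$^]/\\&/g' -e 's!.*!/^&$/d!' file_in_2 >generated.sed
cat generated.sed
# /^9 10 11 12$/d
# /^21 22 23 24$/d
# /^1 2 3 4$/d
# /^17 18 19 20$/d
```

Note that the generated script uses / as its address delimiter, so a record containing a slash would need that character escaped as well.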

As a last resort, maybe Perl could be useful:

Code:
perl -nle 'if ($. == ++$l) { $r{$_} = 1; close ARGV if eof; next}
print unless $r{$_}' file_in_2 file_in_1 >file_out

If you really have millions of lines in both sets, it might be fruitful to import them into a database or something if your regular line-oriented tools choke on really large files.

Last edited by era; 06-18-2008 at 01:59 PM.. Reason: As an afterthought, maybe use a database ...? And a Perl version!
# 3  
Old 06-18-2008
Thanks era,

The fgrep doesn't produce the desired result.
I think it might require the input files to be sorted first.

Your first sed code works.
In my example files you can see that file 2 is unsorted.

Thanks again,
Kenny.
# 4  
Old 06-18-2008
fgrep doesn't care about sort order. The examples you posted work here with fgrep. Perhaps there is some minor whitespace difference or control character somewhere ...?
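
One common source of such invisible differences is a carriage return at the end of each line in a file that came from Windows; whether that is what happened here is only a guess. A sketch for checking, using made-up two-line files rather than the real data: od -c shows every byte, and tr -d '\r' strips the carriage returns before matching. grep -F is the spelling of fgrep that newer grep versions prefer.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical data: file_in_2 has DOS line endings (\r\n), file_in_1 does not.
printf '1 2 3 4\n5 6 7 8\n' >file_in_1
printf '1 2 3 4\r\n' >file_in_2

# Make the hidden bytes visible; the carriage return shows up as \r.
od -c file_in_2

# With the stray \r, no whole-line match is found and nothing is deleted.
grep -F -v -x -f file_in_2 file_in_1 >file_out_broken

# Strip carriage returns first, then match.
tr -d '\r' <file_in_2 >file_in_2.clean
grep -F -v -x -f file_in_2.clean file_in_1 >file_out
cat file_out
# 5 6 7 8
```

Trailing spaces or tabs would cause the same symptom, and od -c would reveal those too.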
# 5  
Old 06-18-2008
Code:
awk >file_out 'NR==FNR{_[$0];next}!($0 in _)' file_in_2 file_in_1

Use nawk or /usr/xpg4/bin/awk on Solaris.
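
For readers new to the idiom, here is the same one-liner spread out with comments. It is meant to behave the same as the command above; the array name seen is just a more readable stand-in for _, and the scratch directory is only there to make the example self-contained.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data from the question.
printf '%s\n' '1 2 3 4' '5 6 7 8' '9 10 11 12' '13 14 15 16' \
              '17 18 19 20' '21 22 23 24' '25 26 27 28' '29 30 31 32' >file_in_1
printf '%s\n' '9 10 11 12' '21 22 23 24' '1 2 3 4' '17 18 19 20' >file_in_2

awk '
    # NR==FNR holds only while the first file (file_in_2) is being read:
    # remember each of its lines as an array key and skip to the next line.
    NR == FNR { seen[$0]; next }
    # For file_in_1, print only the lines that were not remembered.
    !($0 in seen)
' file_in_2 file_in_1 >file_out
cat file_out
# 5 6 7 8
# 13 14 15 16
# 25 26 27 28
# 29 30 31 32
```

The output keeps the original order of file_in_1, and only the lines of file_in_2 are held in memory.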
# 6  
Old 06-18-2008
Not sure if this is practical, given the file sizes etc., but...

What about the following?
Yes, it could take a while; yes, it could all be done in one command (it is split up here to show what is happening); yes, it does reorder the data.

Code:
> cat ifile1 ifile2 | sort -n >ifile3
> cat ifile3
1 2 3 4
1 2 3 4
5 6 7 8
9 10 11 12
9 10 11 12
13 14 15 16
17 18 19 20
17 18 19 20
21 22 23 24
21 22 23 24
25 26 27 28
29 30 31 32

Then run uniq -u, which prints only the lines that are not repeated:
Code:
> cat ifile3 | uniq -u 
5 6 7 8
13 14 15 16
25 26 27 28
29 30 31 32
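
As a single pipeline, the same idea might look like the sketch below. It rests on two assumptions that hold for the sample data but are not guaranteed in general: no line is repeated within either file, and every line of ifile2 also occurs in ifile1. Otherwise uniq -u would drop repeated lines of ifile1 or let unmatched lines of ifile2 through.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data from the question, under the file names used in this post.
printf '%s\n' '1 2 3 4' '5 6 7 8' '9 10 11 12' '13 14 15 16' \
              '17 18 19 20' '21 22 23 24' '25 26 27 28' '29 30 31 32' >ifile1
printf '%s\n' '9 10 11 12' '21 22 23 24' '1 2 3 4' '17 18 19 20' >ifile2

# sort reads both files directly, so no cat is needed;
# uniq -u keeps only the lines that occur exactly once.
sort -n ifile1 ifile2 | uniq -u >file_out
cat file_out
# 5 6 7 8
# 13 14 15 16
# 25 26 27 28
# 29 30 31 32
```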


Last edited by joeyg; 06-18-2008 at 04:14 PM.. Reason: eliminated a redundant sort command
 