Compare lines in two files (Memory getting exhausted)


 
# 1  
Old 07-23-2009

Hi,

Could someone please help me with the best approach to compare lines from one file against another? Here is what my entries look like:

File 1
Code:
a1
a2
a3
a4
a9
a10
a15

File2
Code:
a5
a6
a15
a7
a9

Expected output is:
a15
a9
(order doesn't matter)


File1 is a huge file. I am trying to execute the following command:

Code:
grep -f File1 File2

When I do that, I get this error after a minute: grep: Memory exhausted.

Strangely, when I run grep -f File2 File1, it doesn't give that error, but the output it generates is wrong: it prints some lines that are not in File2!

Please help: what is a better way to compare the files and print the output?
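One likely reason for the unexpected lines: without extra options, grep treats each line of the pattern file as a regular expression and matches it anywhere in the input line, so a pattern such as a1 also matches a10 and a15. A minimal sketch of the difference, using hypothetical files named patterns and data in a temporary directory (the sample values are made up for illustration):

```shell
# Hypothetical sample files, created only for this illustration.
tmp=$(mktemp -d)
printf '%s\n' a1 > "$tmp/patterns"
printf '%s\n' a1 a10 a15 a2 > "$tmp/data"

# Default behaviour: each pattern is a regex matched anywhere in the line,
# so "a1" also hits "a10" and "a15".
loose=$(grep -f "$tmp/patterns" "$tmp/data")

# -F treats patterns as fixed strings, -x requires the whole line to match.
strict=$(grep -Fxf "$tmp/patterns" "$tmp/data")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -r "$tmp"
```

With -F and -x only exact whole-line matches are printed. Fixed-string matching may also need less memory than compiling every line as a regex, although that depends on the grep implementation and is not guaranteed to avoid the error.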
# 2  
Old 07-23-2009
What do you mean by compare? You could use "diff", but that will only tell you the differences between the files. What are you ultimately trying to do?
# 3  
Old 07-23-2009
How huge is File1? If grep can't load all the strings into memory, then it just won't go. You may have to process it in smaller batches and then merge the results.
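The batch-and-merge idea could be sketched roughly as follows. This is only an illustration under assumptions: the chunk_ prefix, the 3-line chunk size and the temporary directory are made up, and the sample data is taken from the first post.

```shell
# Sample data from the first post, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' a1 a2 a3 a4 a9 a10 a15 > "$tmp/File1"
printf '%s\n' a5 a6 a15 a7 a9 > "$tmp/File2"

# Break the big pattern file into small batches (3 lines each here;
# a real run would use a much larger number).
split -l 3 "$tmp/File1" "$tmp/chunk_"

# Run one grep per batch, then merge the partial results.
# -F: fixed strings, -x: whole-line match, -f: read patterns from a file.
common=$(
    for chunk in "$tmp"/chunk_*; do
        grep -Fxf "$chunk" "$tmp/File2"
    done | sort -u
)

printf '%s\n' "$common"
rm -r "$tmp"
```

Each grep only has to hold one batch of patterns at a time, and sort -u removes duplicates when the same line is found through more than one batch.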
# 4  
Old 07-23-2009
I can't use diff. I need to find those entries in File2 that are also present in File1. They may not be in the same order. Please check the expected output and the input files I have provided.

Thanks for your help.

---------- Post updated at 01:12 PM ---------- Previous update was at 01:11 PM ----------

Corona,

That is also true. Would you know any command that splits a file into multiple chunks?

Thanks,
# 5  
Old 07-23-2009
Yes, 'split < File1' would produce files xaa, xab, ... in chunks of 1000 lines. The parameter '-l 10000' would make it split every 10000 lines instead. Be warned: this could produce hundreds or thousands of files if File1 is large enough.
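A small sketch of those split options, assuming a made-up 25-line input and a hypothetical part_ prefix. The -a option sets the suffix length; with the default two-letter suffix only 676 output names (aa to zz) are available, so a longer suffix is worth considering when many chunks are expected.

```shell
# Build a hypothetical 25-line input file in a temporary directory.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > "$tmp/File1"

# -l 10: 10 lines per chunk; -a 3: three-letter suffixes (part_aaa, part_aab, ...).
split -l 10 -a 3 "$tmp/File1" "$tmp/part_"

# Count the chunks and the lines in the last one.
set -- "$tmp"/part_*
count=$#
last_lines=$(awk 'END { print NR }' "$tmp/part_aac")

echo "chunks: $count, lines in last chunk: $last_lines"
rm -r "$tmp"
```

The last chunk simply receives whatever lines are left over.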
# 6  
Old 07-23-2009
Without knowing the sizes of the files involved, it's not possible to give a precise answer.

This should work if the file sizes are not too extreme. What are the approximate file sizes?

Code:
# comm requires both inputs to be sorted first
sort File1 > File1_sorted
sort File2 > File2_sorted

# -1 and -2 suppress the lines unique to each file, leaving only the common lines
comm -1 -2 File1_sorted File2_sorted

For the limited example given, this works. If the files are too large, they will need to be split first, as stated above.

The link below references a script that would break up your files into chunks of whatever size you specify.
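For reference, here is the same approach run end to end on the sample data from the first post. It is a sketch with two assumptions that are not in the thread: the files live in a temporary directory, and LC_ALL=C is set so that sort and comm agree on the collation order.

```shell
# Sample data from the first post.
tmp=$(mktemp -d)
printf '%s\n' a1 a2 a3 a4 a9 a10 a15 > "$tmp/File1"
printf '%s\n' a5 a6 a15 a7 a9 > "$tmp/File2"

# Sort both files with the same collation order that comm will use.
LC_ALL=C sort "$tmp/File1" > "$tmp/File1_sorted"
LC_ALL=C sort "$tmp/File2" > "$tmp/File2_sorted"

# -12 is shorthand for -1 -2: print only the lines present in both files.
common=$(LC_ALL=C comm -12 "$tmp/File1_sorted" "$tmp/File2_sorted")

printf '%s\n' "$common"
rm -r "$tmp"
```

This prints a15 and a9, matching the expected output. Because comm reads both sorted inputs sequentially, it does not need to hold either file in memory, which is what makes it attractive for large inputs.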
HTML Code:
https://www.unix.com/unix-dummies-questions-answers/45814-split-files-using-csplit.html#post302148039