File comparisons for huge data files (around 60G) - need the best, most optimized way to do this


# 8  
Quote:
Originally Posted by kartikirans
grep -F -x -v -f file2 file1 ?? or any other optimization command
Sounds about right.
Just remember: whatever you do, comparing 60G files will be slow...
Test this on smaller chunks first to see whether you're getting the desired results.
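A minimal sketch of such a dry run, assuming two tiny hand-made files (the names under `$workdir` and their contents are made up for illustration, not taken from the real data):

```shell
# Hypothetical sample files standing in for the real 60G inputs.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$workdir/file1"
printf 'beta\ndelta\nepsilon\n' > "$workdir/file2"

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: invert the match, -f: read the patterns from file2.
# Result: the lines of file1 that do not occur anywhere in file2.
grep -F -x -v -f "$workdir/file2" "$workdir/file1" > "$workdir/only_in_file1"

cat "$workdir/only_in_file1"
```

Keep in mind that grep has to hold every pattern from the -f file in memory, so a 60G pattern file may well exceed the available RAM even if the small test passes.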
# 9  
Hi kartikirans,

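I'd be tempted to look at comm -3 ${file1} ${file2}; this suppresses the lines common to ${file1} and ${file2} and prints only the lines unique to either file. Note that comm expects both files to be sorted. GNU comm has a --nocheck-order option, but that only disables the sort check, so the output is only reliable on sorted input.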
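Since comm is specified to work on sorted input, a rough sketch would sort both files first. The sample files below are invented for illustration, and LC_ALL=C is my own assumption to keep sort and comm on the same collation order:

```shell
# Hypothetical unsorted sample files standing in for the real inputs.
workdir=$(mktemp -d)
printf 'gamma\nalpha\ndelta\nbeta\n' > "$workdir/file1"
printf 'epsilon\nbeta\ndelta\n' > "$workdir/file2"

# Sort both inputs with the same collation that comm will use.
LC_ALL=C sort "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort "$workdir/file2" > "$workdir/file2.sorted"

# -3 suppresses column 3 (lines common to both files).
# Column 1 holds lines only in file1; column 2 (tab-indented) lines only in file2.
LC_ALL=C comm -3 "$workdir/file1.sorted" "$workdir/file2.sorted" > "$workdir/diff_both"

cat "$workdir/diff_both"
```

For files of this size the sort is the expensive step. It spills to temporary files, so make sure the temporary directory has enough free space (GNU sort accepts -T to point it elsewhere).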

Regards

Gull04
# 10  
One additional question: what does "non-matching lines" mean?

- only lines in file1 which are not in file2? or
- plus lines in file2 which are not in file1?
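The answer matters because the two readings lead to different commands. A small sketch, assuming already-sorted sample files with made-up names and contents:

```shell
# Hypothetical, already sorted sample files.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ndelta\ngamma\n' > "$workdir/file1.sorted"
printf 'beta\ndelta\nepsilon\n' > "$workdir/file2.sorted"

# Reading 1: only lines in file1 which are not in file2
# (-23 suppresses column 2 and column 3, leaving column 1).
only_in_file1=$(LC_ALL=C comm -23 "$workdir/file1.sorted" "$workdir/file2.sorted")

# Reading 2 additionally needs the lines in file2 which are not in file1
# (-13 suppresses column 1 and column 3, leaving column 2).
only_in_file2=$(LC_ALL=C comm -13 "$workdir/file1.sorted" "$workdir/file2.sorted")

printf 'only in file1:\n%s\n' "$only_in_file1"
printf 'only in file2:\n%s\n' "$only_in_file2"
```

Running the two directions separately keeps the results in separate outputs, which is easier to post-process than the tab-indented columns of a single comm -3 run.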

bakunin