Comparing files exceeding 1.7GB


 
# 1  
Old 06-05-2008

Hi,

I have a few files with the same names in two folders, each exceeding 2GB, and I need to compare them. The files are in this format:

File1 in first folder
1|20080430|IA001|TREND DYNAMICS INC
2|20080430|IP001|AMERITAS LIFE INSURANCE CO
3|20080430|IP002|TRANSAMERICA LIFE INSURANCE CO

File1 in second folder
1|20080430|IA45|TREND DYNAMICS INC
2|20080430|IP001|AMERITAS LIFE INSURANCE CO


The files may be pipe or tab separated.

What I need to do is sort both files and then compare them. The problem is that, because the files exceed 2GB, the sort command won't work and neither will the diff command. The comparison has to be line by line and field by field. The output should be in this format:

For lines from files in the first folder, I need to mark each mismatching line by prepending "From Test1" to it, like this:

From Test1 - 1|20080430|IA001|TREND DYNAMICS INC

For lines from files in the second folder, I need to mark each mismatching line by prepending "From Test2" to it, like this:

From Test2 - 1|20080430|IA45|TREND DYNAMICS INC
And if a line found in File1 of the first folder is not found in File1 of the second folder, then only that line should be printed to my output file.


Hence my final output should look like this:

From Test1 - 1|20080430|IA001|TREND DYNAMICS INC
From Test2 - 1|20080430|IA45|TREND DYNAMICS INC

From Test1 - 3|20080430|IP002|TRANSAMERICA LIFE INSURANCE CO

Is there a way to do it?

Last edited by ragavhere; 06-05-2008 at 05:18 PM..
# 2  
Old 06-05-2008
i) Get yourself a better sort command (e.g. GNU sort should work, and it runs on nearly every Unix-like system).
ii) Write yourself a short sort program in a suitable high-level language such as Perl.
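
For option i), a minimal sketch of what the sort step might look like, assuming GNU sort is installed as `sort` and a scratch directory with enough free space is available. The function name `sort_big` and the paths in the usage line are made up for illustration; they do not come from the thread.

```shell
#!/bin/sh
# sort_big: hypothetical helper name, not from the thread.
# Usage: sort_big INPUT OUTPUT TMPDIR
sort_big() {
    # LC_ALL=C -> compare lines byte by byte, independent of the user's locale
    # -T "$3"  -> keep sort's temporary merge files in the given directory
    # -o "$2"  -> write the sorted result to the given output file
    LC_ALL=C sort -T "$3" -o "$2" "$1"
}
```

It could be called as `sort_big /folder1/File1 /scratch/File1.sorted /scratch` (placeholder paths). The byte-wise ordering is lexical, so an ID of 10 sorts before 2; that does not matter as long as both files are sorted the same way before they are compared.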
# 3  
Old 06-05-2008
Question

Sorry, I am new to UNIX and don't know Perl.
Can you please help me?
# 4  
Old 06-05-2008
Then you had better take option i); it is much easier. Get the GNU coreutils package for your OS (older GNU releases were split into fileutils, textutils and sh-utils, with sort living in textutils).
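
Once a sort that handles large files is available, the comparison asked for in post #1 could be sketched roughly as follows. This is only one possible approach, not something posted in the thread: it compares whole lines with `comm` rather than individual fields, so a line that differs in any field is reported from both sides. The function name `compare_big` and the temporary file names are invented for the example, and it assumes no data line starts with a tab.

```shell
#!/bin/sh
# compare_big: hypothetical helper name, not from the thread.
# Usage: compare_big FILE_FROM_FOLDER1 FILE_FROM_FOLDER2 TMPDIR
# Assumption: no data line starts with a tab, because comm marks
# lines unique to the second file with one leading tab.
compare_big() {
    a="$3/first.sorted"
    b="$3/second.sorted"
    # Sort both inputs the same way (byte order) so comm can merge them.
    LC_ALL=C sort -T "$3" -o "$a" "$1" || return 1
    LC_ALL=C sort -T "$3" -o "$b" "$2" || return 1
    # comm -3 suppresses lines common to both files; what remains is
    # column 1 (only in the first file) and column 2 (only in the second).
    LC_ALL=C comm -3 "$a" "$b" |
        awk '{
            if (sub(/^\t/, "")) print "From Test2 - " $0
            else                print "From Test1 - " $0
        }'
    rm -f "$a" "$b"
}
```

Run on the two sample files from post #1, this prints the two differing `1|...` lines followed by the `3|...` line that exists only in the first folder, each with its "From Test1 - " or "From Test2 - " prefix. It does not insert the blank line between groups shown in the requested output.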