06-05-2008
Comparing files exceeding 1.7GB
Hi,
I have a few files with the same names in two folders, each exceeding 2GB. I need to compare these files. The files are in the following format:
File1 in the first folder:
1|20080430|IA001|TREND DYNAMICS INC
2|20080430|IP001|AMERITAS LIFE INSURANCE CO
3|20080430|IP002|TRANSAMERICA LIFE INSURANCE CO
File1 in the second folder:
1|20080430|IA45|TREND DYNAMICS INC
2|20080430|IP001|AMERITAS LIFE INSURANCE CO
The files may be pipe- or tab-separated.
What I need to do is sort both files and then compare them. The problem is that, because the files exceed 2GB, the sort and diff commands won't work. The comparison has to be line by line and field by field. The output should be in this format:
For lines from files in the first folder, I need to mark each mismatching line by prepending "From Test1" to it, like this:
From Test1 - 1|20080430|IA001|TREND DYNAMICS INC
For lines from files in the second folder, I need to mark each mismatching line by prepending "From Test2" to it, like this:
From Test2 - 1|20080430|IA45|TREND DYNAMICS INC
And if a line found in File1 of the first folder is not found in File1 of the second folder, then print that line alone to my output file.
Hence my final output should look like this:
From Test1 - 1|20080430|IA001|TREND DYNAMICS INC
From Test2 - 1|20080430|IA45|TREND DYNAMICS INC
From Test1 - 3|20080430|IP002|TRANSAMERICA LIFE INSURANCE CO
Is there a way to do it?
Last edited by ragavhere; 06-05-2008 at 05:18 PM.
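
One possible approach, sketched in Python under some assumptions: each file is sorted in chunks that fit in memory (an external merge sort), and the two sorted streams are then walked in step, printing only the lines that appear on one side. Lines are compared as whole strings, so a difference in any field counts as a mismatch whether the separator is a pipe or a tab. The function names, the `chunk_lines` default, and the file paths in the usage note are illustrative choices, not something from the original post.

```python
import heapq
import itertools
import tempfile


def sorted_lines(path, chunk_lines=1_000_000):
    """Yield the lines of `path` in sorted order via an external merge sort."""
    chunk_files = []
    with open(path, "r") as src:
        while True:
            # Read at most chunk_lines lines so only one chunk is in memory.
            chunk = list(itertools.islice(src, chunk_lines))
            if not chunk:
                break
            # Normalise line endings so a missing final newline does not matter.
            chunk = [line.rstrip("\n") + "\n" for line in chunk]
            chunk.sort()
            tmp = tempfile.TemporaryFile("w+")
            tmp.writelines(chunk)
            tmp.seek(0)
            chunk_files.append(tmp)
    try:
        # heapq.merge lazily merges the already-sorted chunk files.
        for line in heapq.merge(*chunk_files):
            yield line.rstrip("\n")
    finally:
        for tmp in chunk_files:
            tmp.close()


def compare(path1, path2, out, label1="From Test1", label2="From Test2"):
    """Write lines found in only one of the two files to `out`, with a label."""
    a = sorted_lines(path1)
    b = sorted_lines(path2)
    x = next(a, None)
    y = next(b, None)
    while x is not None or y is not None:
        if y is None or (x is not None and x < y):
            # x has no partner in the second file.
            out.write(f"{label1} - {x}\n")
            x = next(a, None)
        elif x is None or y < x:
            # y has no partner in the first file.
            out.write(f"{label2} - {y}\n")
            y = next(b, None)
        else:
            # Identical lines on both sides: skip them.
            x = next(a, None)
            y = next(b, None)
```

A call might look like `compare("Test1/File1", "Test2/File1", open("diff.out", "w"))`, where the paths are placeholders. The sort order is plain string order on the whole line, so the first field is not treated as a number; if numeric ordering of the output matters, the sort key would need to change in both the chunk sort and the merge.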