Basically to run it: hash2files.pl inputfile1 inputfile2 outputfile1 outputfile2
Inputfile1 contains numeric IDs:
Code:
1233
2345
3456
4444
7777
These are compared against inputfile2, which also contains IDs:
Code:
1244
2345
3456
9898
9999
Outputfile1 will contain all the IDs in inputfile1 that are not found in inputfile2.
In this case the result would be:
Code:
1233
4444
7777
Outputfile2 will contain all the IDs in inputfile2 that are not found in inputfile1. In this case:
Code:
1244
9898
9999
It works really well with average-size files. But it cannot handle loading two huge files (inputfile1 and inputfile2) into hash memory: it stops after a while without any error message, other than that it does not produce the results. It basically just terminates.
How can I make this work for huge files? Inputfile1 has about 204 million records, and inputfile2 has almost the same number. I know it needs to be modified to load only one of them, such as inputfile2, into the hash rather than both, and then do the comparison by reading inputfile1 one line at a time: if the ID is found in the hash, just delete it from the hash, since we do not care about the matched ones at this point. What should remain in the hash is all the IDs that were not found, and those get written to a file. But I do not know how to do that!
I hope this helps explain my issue.
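The single-hash idea described above could be sketched roughly as follows. This is only a minimal sketch, not the original hash2files.pl (which is not shown in the post): the helper name compare_ids is hypothetical, and the sketch assumes one ID per line with no surrounding whitespace. Instead of deleting matched IDs it marks them, so that an ID repeated in inputfile1 is not mistaken for an unmatched one on its second occurrence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# compare_ids is a hypothetical helper name used for illustration only.
# Arguments: inputfile1, inputfile2, outputfile1, outputfile2.
sub compare_ids {
    my ($in1, $in2, $out1, $out2) = @_;

    # Load only inputfile2 into the hash. A value of 0 means
    # "not (yet) seen in inputfile1".
    my %seen;
    open(my $fh2, '<', $in2) or die "Cannot open $in2: $!";
    while (my $id = <$fh2>) {
        chomp $id;
        next if $id eq '';
        $seen{$id} = 0;
    }
    close $fh2;

    # Stream inputfile1 one line at a time; it is never held in memory.
    open(my $fh1, '<', $in1)  or die "Cannot open $in1: $!";
    open(my $o1,  '>', $out1) or die "Cannot open $out1: $!";
    while (my $id = <$fh1>) {
        chomp $id;
        next if $id eq '';
        if (exists $seen{$id}) {
            $seen{$id} = 1;          # matched: mark it, do not write it
        } else {
            print {$o1} "$id\n";     # only in inputfile1
        }
    }
    close $fh1;
    close $o1 or die "Cannot close $out1: $!";

    # Whatever is still unmarked exists only in inputfile2.
    # Note: hash iteration order is arbitrary, so this output is unsorted.
    open(my $o2, '>', $out2) or die "Cannot open $out2: $!";
    while (my ($id, $matched) = each %seen) {
        print {$o2} "$id\n" unless $matched;
    }
    close $o2 or die "Cannot close $out2: $!";
    return;
}

# Run only when called with the four file names.
compare_ids(@ARGV) if @ARGV == 4;
```

It would be run the same way as before: hash2files.pl inputfile1 inputfile2 outputfile1 outputfile2. Be aware that a Perl hash with around 200 million keys may still need many gigabytes of memory, so even one hash might not fit on a given machine. If it does not, sorting both files on disk first and then comparing them line by line is a common alternative that keeps memory use small.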
Last edited by Scott; 07-13-2012 at 02:00 PM..
Discussion started by: sags007_99
2 Replies