Perl: Need help comparing huge files Post: 302670921

Post 302670921 by mrn6430 on Thursday 12th of July 2012 04:12:38 PM

Basically to run it: hash2files.pl inputfile1 inputfile2 outputfile1 outputfile2

Inputfile1 contains numeric IDs:

Code:

These are compared against inputfile2, which also contains IDs:

Code:

Outputfile1 will contain all the IDs in inputfile1 that are not found in inputfile2. In this case the result would be:

Code:

1233
4444
7777

Outputfile2 will contain all the IDs in inputfile2 that are not found in inputfile1. In this case:

Code:

9898
9999

It works really well with average-size files. But it cannot handle loading two huge files (inputfile1 and inputfile2) into hash memory: it stops after a while without any error messages, and it does not produce the results. It basically just terminates.

How can I make this work for huge files? Inputfile1 has about 204 million records, and inputfile2 has almost the same number. I know the script needs to be modified to load only one of them, such as inputfile2, into the hash instead of both. It should then read inputfile1 one line at a time and compare on the ID: if the ID is found in the hash, delete it from the hash, since we do not care about the matched ones at this point. What remains in the hash is all the IDs that were not found, and those get written to a file. But I do not know how to do that!
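The approach described above could be sketched roughly as follows. This is only a minimal sketch, not the original hash2files.pl: the sub name `compare_ids` is hypothetical, and it assumes one numeric ID per line and no repeated IDs within inputfile1 (a repeated ID would be reported as unmatched on its second appearance, because the first match deletes it from the hash).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): writes the IDs found
# only in $in1 to $out1 and the IDs found only in $in2 to $out2.
# Assumes one numeric ID per line and no repeated IDs within $in1.
sub compare_ids {
    my ($in1, $in2, $out1, $out2) = @_;

    # Load only the second file into the hash.
    my %ids;
    open(my $fh2, '<', $in2) or die "Cannot open $in2: $!";
    while (my $id = <$fh2>) {
        chomp $id;
        next if $id eq '';
        $ids{$id} = undef;    # only the key matters, so store no value
    }
    close $fh2;

    # Stream the first file one line at a time instead of loading it.
    open(my $fh1, '<', $in1) or die "Cannot open $in1: $!";
    open(my $o1,  '>', $out1) or die "Cannot open $out1: $!";
    while (my $id = <$fh1>) {
        chomp $id;
        next if $id eq '';
        if (exists $ids{$id}) {
            delete $ids{$id};       # matched in both files: drop it
        }
        else {
            print {$o1} "$id\n";    # present in inputfile1 only
        }
    }
    close $fh1;
    close $o1 or die "Cannot close $out1: $!";

    # Whatever is left in the hash never appeared in the first file.
    open(my $o2, '>', $out2) or die "Cannot open $out2: $!";
    print {$o2} "$_\n" for sort { $a <=> $b } keys %ids;
    close $o2 or die "Cannot close $out2: $!";
    return;
}

# Run only when called with the four file names used in the post.
compare_ids(@ARGV) if @ARGV == 4;
```

Note that this still keeps every ID from inputfile2 in memory, so it roughly halves the memory needed rather than removing the limit. If one file's worth of IDs is still too much for the machine, another option is to sort both files first and compare them in a single merge pass, which needs very little memory.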

I hope this helps explain my issue.

Last edited by Scott; 07-13-2012 at 02:00 PM.. Reason: Blah blah blah blah and blah blah. Thanks.
