File comparisons for huge data files (around 60G) - Need the best, optimized way to do this Post: 303025358

LEARN ABOUT V7

diff

DIFF(1) 						      General Commands Manual							   DIFF(1)

NAME

       diff - differential file comparator

SYNOPSIS

       diff [ -efbh ] file1 file2

DESCRIPTION

       Diff tells what lines must be changed in two files to bring them into agreement. If file1 (file2) is `-', the standard input is used. If
       file1 (file2) is a directory, then a file in that directory whose file-name is the same as the file-name of file2 (file1) is used. The
       normal output contains lines of these forms:

	    n1 a n3,n4
	    n1,n2 d n3
	    n1,n2 c n3,n4

       These lines resemble ed commands to convert file1 into file2. The numbers after the letters pertain to file2. In fact, by exchanging `a'
       for `d' and reading backward one may ascertain equally how to convert file2 into file1. As in ed, identical pairs where n1 = n2 or n3 = n4
       are abbreviated as a single number.

       Following each of these lines come all the lines that are affected in the first file flagged by `<', then all the lines that are affected
       in the second file flagged by `>'.
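
       As a minimal sketch of the normal output format described above, the following creates two small files and prints their
       differences. The file names and contents are made up for illustration, and a current POSIX-style diff is assumed. Such a diff
       also prints a `---' line between the `<' and `>' groups of a change, which this page does not mention.

```shell
#!/bin/sh
# Hypothetical sample files; names and contents are for illustration only.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/file1"
printf 'alpha\nBETA\ngamma\ndelta\n' > "$workdir/file2"

# diff exits with status 1 when the files differ, so the status is ignored here.
normal_out=$(diff "$workdir/file1" "$workdir/file2" || true)
printf '%s\n' "$normal_out"
# Prints:
#   2c2
#   < beta
#   ---
#   > BETA
#   3a4
#   > delta

rm -rf "$workdir"
```

       Here `2c2' says that line 2 of file1 must be changed to match line 2 of file2, and `3a4' says that line 4 of file2 must be
       added after line 3 of file1.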

       The -b option causes trailing blanks (spaces and tabs) to be ignored and other strings of blanks to compare equal.
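
       A small sketch of the effect of -b, assuming a current POSIX-style diff. The two sample files are hypothetical and differ
       only in their blanks.

```shell
#!/bin/sh
# Hypothetical sample files that differ only in blanks.
workdir=$(mktemp -d)
printf 'key  value\n' > "$workdir/file1"      # two spaces, no trailing blanks
printf 'key\tvalue   \n' > "$workdir/file2"   # one tab, three trailing spaces

# Without -b the lines differ; with -b they compare equal.
if diff "$workdir/file1" "$workdir/file2" >/dev/null; then plain=same; else plain=differ; fi
if diff -b "$workdir/file1" "$workdir/file2" >/dev/null; then blank=same; else blank=differ; fi

echo "plain: $plain, -b: $blank"
# Prints: plain: differ, -b: same

rm -rf "$workdir"
```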

       The -e option produces a script of a, c and d commands for the editor ed, which will recreate file2 from file1. The -f option produces a
       similar script, not useful with ed, in the opposite order. In connection with -e, the following shell program may help maintain multiple
       versions of a file. Only an ancestral file ($1) and a chain of version-to-version ed scripts ($2,$3,...) made by diff need be on hand. A
       `latest version' appears on the standard output.

	    (shift; cat $*; echo '1,$p') | ed - $1
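
       The following sketch shows one link of such a chain: it makes an ed script with diff -e and, if ed is installed, replays
       it against the ancestral file. The file names are hypothetical. It assumes a current diff and ed, uses `ed -s' as the
       present-day spelling of `ed -', and appends a `Q' command so that ed leaves without complaining about the unsaved buffer.

```shell
#!/bin/sh
# Hypothetical ancestral file (v1) and newer version (v2).
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/v1"
printf 'alpha\nBETA\ngamma\ndelta\n' > "$workdir/v2"

# ed script that turns v1 into v2; diff exits 1 because the files differ.
ed_script=$(diff -e "$workdir/v1" "$workdir/v2" || true)
printf '%s\n' "$ed_script"
# Prints (edits are listed from the end of the file backward):
#   3a
#   delta
#   .
#   2c
#   BETA
#   .

# Replay the script against v1 and print the result, leaving v1 untouched.
if command -v ed >/dev/null 2>&1; then
    printf '%s\n1,$p\nQ\n' "$ed_script" | ed -s "$workdir/v1"
    # Prints the four lines of v2: alpha, BETA, gamma, delta
fi

rm -rf "$workdir"
```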

       Except in rare circumstances, diff finds a smallest sufficient set of file differences.

       Option -h does a fast, half-hearted job. It works only when changed stretches are short and well separated, but does work on files of
       unlimited length. Options -e and -f are unavailable with -h.

FILES

       /tmp/d?????
       /usr/lib/diffh for -h

SEE ALSO

       cmp(1), comm(1), ed(1)
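
       For very large files, the two comparison tools listed above are often cheaper than a full diff. The sketch below is one
       possible approach and not a recommendation from this page: cmp -s only answers whether the files are byte-identical, and
       comm lists lines unique to each file once both are sorted. It assumes line-oriented data in which line order does not
       matter; the file names and contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data standing in for two large line-oriented files.
workdir=$(mktemp -d)
printf 'id3\nid1\nid2\n' > "$workdir/old.txt"
printf 'id2\nid4\nid1\n' > "$workdir/new.txt"

# cmp -s prints nothing and stops at the first differing byte.
if cmp -s "$workdir/old.txt" "$workdir/new.txt"; then
    verdict=identical
else
    verdict=different
fi

# comm requires sorted input; LC_ALL=C keeps sort and comm in the same byte order.
LC_ALL=C sort "$workdir/old.txt" > "$workdir/old.sorted"
LC_ALL=C sort "$workdir/new.txt" > "$workdir/new.sorted"
only_old=$(LC_ALL=C comm -23 "$workdir/old.sorted" "$workdir/new.sorted")
only_new=$(LC_ALL=C comm -13 "$workdir/old.sorted" "$workdir/new.sorted")

echo "verdict: $verdict"
echo "only in old: $only_old"
echo "only in new: $only_new"
# Prints:
#   verdict: different
#   only in old: id3
#   only in new: id4

rm -rf "$workdir"
```

       Because comm works on the sorted copies, it reports which lines are missing from either side but not where they were.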

DIAGNOSTICS

       Exit status is 0 for no differences, 1 for some, 2 for trouble.
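
       A script can branch on these three values. The sketch below assumes a current POSIX-style diff; describe_diff is a
       hypothetical helper name, and the sample files are made up.

```shell
#!/bin/sh
# describe_diff is a hypothetical helper: it maps diff's exit status to a message.
describe_diff() {
    status=0
    diff "$1" "$2" >/dev/null 2>&1 || status=$?
    case $status in
        0) echo "no differences" ;;
        1) echo "files differ" ;;
        *) echo "trouble" ;;
    esac
}

workdir=$(mktemp -d)
printf 'x\n' > "$workdir/a"
printf 'x\n' > "$workdir/b"
printf 'y\n' > "$workdir/c"

describe_diff "$workdir/a" "$workdir/b"         # prints: no differences
describe_diff "$workdir/a" "$workdir/c"         # prints: files differ
describe_diff "$workdir/a" "$workdir/missing"   # prints: trouble

rm -rf "$workdir"
```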

BUGS

       Editing scripts produced under the -e or -f option are naive about creating lines consisting of a single `.'.

																	   DIFF(1)