File comparisons for huge data files (around 60G) - Need an optimized, best way to do this Post: 303025152

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

search and grab data from a huge file

Folks, in my working directory there are multiple large files which each contain only one line. The line is too long to use "grep", so any help? For example, if I want to find out whether these files contain a string like "93849", what command should I use? Also, there is oder_id number...

2. Shell Programming and Scripting

How to extract data from a huge file?

Hi, I have a huge file of bibliographic records in some standard format.I need a script to do some repeatable task as follows: 1. Needs to create folders as the strings starts with "item_*" from the input file 2. Create a file "contents" in each folders having "license.txt(tab...

3. Shell Programming and Scripting

insert a header in a huge data file without using an intermediate file

I have a file with data extracted, and need to insert a header with a constant string, say: H|PayerDataExtract If I use sed, I have to redirect the output to a separate file like sed ' sed commands' ExtractDataFile.dat > ExtractDataFileWithHeader.dat the same is true for awk and...

4. Shell Programming and Scripting

Split a huge data into few different files?!

Input file data contents: >seq_1 MSNQSPPQSQRPGHSHSHSHSHAGLASSTSSHSNPSANASYNLNGPRTGGDQRYRASVDA >seq_2 AGAAGRGWGRDVTAAASPNPRNGGGRPASDLLSVGNAGGQASFASPETIDRWFEDLQHYE >seq_3 ATLEEMAAASLDANFKEELSAIEQWFRVLSEAERTAALYSLLQSSTQVQMRFFVTVLQQM ARADPITALLSPANPGQASMEAQMDAKLAAMGLKSPASPAVRQYARQSLSGDTYLSPHSA...

5. Shell Programming and Scripting

Splitting the Huge file into several files...

Hi I have to write a script to split the huge file into several pieces. The file columns is | pipe delimited. The data sample is as: 6625060|1420215|07308806|N|20100120|5572477081|+0002.79|+0000.00|0004|0001|.........

6. Shell Programming and Scripting

Problem running Perl Script with huge data files

Hello Everyone, I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this : foreach my $t (@text) { open TEXT, $t or die "Cannot open $t for reading: $!\n"; while(my $line=<TEXT>){ ...

7. Shell Programming and Scripting

Three Difference File Huge Data Comparison Problem.

I got three different file: Part of File 1 ARTPHDFGAA . . Part of File 2 ARTGHHYESA . . Part of File 3 ARTPOLYWEA . .

8. Shell Programming and Scripting

Help- counting delimiter in a huge file and split data into 2 files

I’m new to Linux script and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semi colon “;” Here is the sample of 5 lines in the file: Name1;phone1;address1;city1;state1;zipcode1 Name2;phone2;address2;city2;state2;zipcode2;comment...

9. UNIX for Dummies Questions & Answers

File comparison of huge files

Hi all, I hope you are well. I am very happy to see your contribution. I am eager to become part of it. I have the following question. I have two huge files to compare (almost 3GB each). The files are simulation outputs. The format of the files are as below For clear picture, please see...

10. UNIX for Advanced & Expert Users

Need Optimization shell/awk script to aggregate (sum) for all the columns of Huge data file

Optimization shell/awk script to aggregate (sum) for all the columns of Huge data file File delimiter "|" Need to have Sum of all columns, with column number : aggregation (summation) for each column File not having the header Like below - Column 1 "Total Column 2 : "Total ... ......

LEARN ABOUT CENTOS

perf-diff

PERF-DIFF(1)							    perf Manual 						      PERF-DIFF(1)

NAME

       perf-diff - Read perf.data files and display the differential profile

SYNOPSIS

       perf diff [baseline file] [data file1] [[data file2] ... ]

DESCRIPTION

       This command displays the performance difference amongst two or more perf.data files captured via perf record.

       If no parameters are passed it will assume perf.data.old and perf.data.

       The differential profile is displayed only for events matching both specified perf.data files.

OPTIONS

       -D, --dump-raw-trace
	   Dump raw trace in ASCII.

       -m, --modules
	   Load module symbols. WARNING: use only with -k and LIVE kernel

       -d, --dsos=
	   Only consider symbols in these dsos. CSV that understands file://filename entries.

       -C, --comms=
	   Only consider symbols in these comms. CSV that understands file://filename entries.

       -S, --symbols=
	   Only consider these symbols. CSV that understands file://filename entries.

       -s, --sort=
	   Sort by key(s): pid, comm, dso, symbol.

       -t, --field-separator=
	   Use a special separator character and don't pad with spaces, replacing all occurrences of this separator in symbol names (and other
	   output) with a '.' character, which is thus the only invalid separator.

       -v, --verbose
	   Be verbose, for instance, show the raw counts in addition to the diff.

       -f, --force
	   Don't complain, do it.

       --symfs=<directory>
	   Look for files with symbols relative to this directory.

       -b, --baseline-only
	   Show only items with match in baseline.

       -c, --compute
	   Differential computation selection - delta,ratio,wdiff (default is delta). See COMPARISON METHODS section for more info.

       -p, --period
	   Show period values for both compared hist entries.

       -F, --formula
	   Show formula for given computation.

       -o, --order
	   Specify compute sorting column number.

COMPARISON

       The comparison is governed by the baseline file. The baseline perf.data file is iterated for samples. All other perf.data files specified
       on the command line are searched for the baseline sample pair. If the pair is found, specified computation is made and result is displayed.

       All samples from non-baseline perf.data files, that do not match any baseline entry, are displayed with empty space within baseline column
       and possible computation results (delta) in their related column.

       Example files samples:

       o   file A with samples f1, f2, f3, f4, f6

       o   file B with samples f2, f4, f5

       o   file C with samples f1, f2, f5

       Example output:

       o   x - computation takes place for pair

       o   b - baseline sample percentage

       o   perf diff A B C

	       baseline/A compute/B compute/C  samples
	       ---------------------------------------
	       b		    x	       f1
	       b	  x	    x	       f2
	       b			       f3
	       b	  x		       f4
	       b			       f6
			  x	    x	       f5

       o   perf diff B A C

	       baseline/B compute/A compute/C  samples
	       ---------------------------------------
	       b	  x	    x	       f2
	       b	  x		       f4
	       b		    x	       f5
			  x	    x	       f1
			  x		       f3
			  x		       f6

       o   perf diff C B A

	       baseline/C compute/B compute/A  samples
	       ---------------------------------------
	       b		    x	       f1
	       b	  x	    x	       f2
	       b	  x		       f5
				    x	       f3
			  x	    x	       f4
				    x	       f6
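
The pairing rules described in this section can be sketched in a few lines of Python. This is only an illustration run on the example sample sets A, B and C, not perf's implementation: the function name `diff_layout` and the set-based representation are assumptions made for the example, and rows are sorted by sample name purely to mirror the tables above.

```python
def diff_layout(baseline, others):
    """Return rows of (sample, in_baseline, [pair found per other file]).

    baseline: set of sample names in the baseline file.
    others:   list of sets, one per non-baseline file, in command-line order.
    """
    rows = []
    # Baseline samples come first; a computation happens wherever a pair exists.
    for s in sorted(baseline):
        rows.append((s, True, [s in o for o in others]))
    # Samples matching no baseline entry follow, with an empty baseline column.
    for s in sorted(set().union(*others) - baseline):
        rows.append((s, False, [s in o for o in others]))
    return rows


# Sample sets taken from the example in this section.
A = {"f1", "f2", "f3", "f4", "f6"}
B = {"f2", "f4", "f5"}
C = {"f1", "f2", "f5"}

# Equivalent of the "perf diff A B C" table: b = baseline, x = computation.
for sample, in_base, found in diff_layout(A, [B, C]):
    marks = ["b" if in_base else " "] + ["x" if f else " " for f in found]
    print("  ".join(marks), sample)
```

Swapping the arguments, for example `diff_layout(B, [A, C])`, gives the layout of the second table.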

COMPARISON METHODS

   delta
       If specified the Delta column is displayed with value d computed as:

	   d = A->period_percent - B->period_percent

       with:

       o   A/B being matching hist entry from data/baseline file specified (or perf.data/perf.data.old) respectively.

       o   period_percent being the % of the hist entry period value within single data file
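
As a rough illustration of the delta formula, the following Python sketch computes period_percent per file and subtracts the baseline value from the data-file value. The function names and the period numbers are invented for the example and are not taken from perf; it also assumes each file has a non-zero total period.

```python
def period_percent(periods):
    """Map each hist entry to its share (in %) of the file's total period."""
    total = sum(periods.values())  # assumed to be non-zero
    return {name: 100.0 * p / total for name, p in periods.items()}


def delta(data, baseline):
    """d = A->period_percent - B->period_percent for entries in both files."""
    a = period_percent(data)      # A: the data file
    b = period_percent(baseline)  # B: the baseline file
    return {name: a[name] - b[name] for name in a.keys() & b.keys()}


# Hypothetical period values, invented for this example.
baseline = {"f2": 600, "f4": 400}
data = {"f2": 300, "f4": 700}

for name, d in sorted(delta(data, baseline).items()):
    print(f"{name}: {d:+.2f}%")
```

Here f2 drops from 60% of the baseline period to 30% of the data-file period, so its delta is negative, while f4 rises from 40% to 70%.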

   ratio
       If specified the Ratio column is displayed with value r computed as:

	   r = A->period / B->period

       with:

       o   A/B being matching hist entry from data/baseline file specified (or perf.data/perf.data.old) respectively.

       o   period being the hist entry period value

   wdiff:WEIGHT-B,WEIGHT-A
       If specified the Weighted diff column is displayed with value d computed as:

	   d = B->period * WEIGHT-A - A->period * WEIGHT-B

       o   A/B being matching hist entry from data/baseline file specified (or perf.data/perf.data.old) respectively.

       o   period being the hist entry period value

       o   WEIGHT-A/WEIGHT-B being user-supplied weights, given in the -c option after the ':' separator, like -c wdiff:1,2.

       o   WEIGHT-A being the weight of the data file

       o   WEIGHT-B being the weight of the baseline data file
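
The ratio and wdiff formulas can be checked by hand with a small Python sketch. The formulas are transcribed as written on this page; the function names and the period values are assumptions made for the example, not part of perf.

```python
def ratio(a_period, b_period):
    """r = A->period / B->period."""
    return a_period / b_period


def wdiff(a_period, b_period, weight_b, weight_a):
    """d = B->period * WEIGHT-A - A->period * WEIGHT-B.

    The weights are passed in the same order as in 'wdiff:WEIGHT-B,WEIGHT-A'.
    """
    return b_period * weight_a - a_period * weight_b


# Hypothetical periods for one matching hist entry, invented for this example.
a_period = 300  # A: entry period in the data file
b_period = 600  # B: entry period in the baseline file

print("ratio:", ratio(a_period, b_period))

# Weights as they would be given with -c wdiff:1,2 (WEIGHT-B=1, WEIGHT-A=2).
weight_b, weight_a = 1, 2
print("wdiff:", wdiff(a_period, b_period, weight_b, weight_a))
```

With these numbers the ratio is 300 / 600 and the weighted diff is 600 * 2 - 300 * 1.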

SEE ALSO

       perf-record(1)

perf								    06/30/2014							      PERF-DIFF(1)