Moose::Cookbook::Basics::Company_Subtypes(3pm)		User Contributed Perl Documentation	    Moose::Cookbook::Basics::Company_Subtypes(3pm)

NAME

       Moose::Cookbook::Basics::Company_Subtypes - Demonstrates the use of subtypes and how to model classes related to companies, people,
       employees, etc.

VERSION

       version 2.0603

SYNOPSIS

	 package Address;
	 use Moose;
	 use Moose::Util::TypeConstraints;

	 use Locale::US;
	 use Regexp::Common 'zip';

	 my $STATES = Locale::US->new;
	 subtype 'USState'
	     => as Str
	     => where {
		    (	 exists $STATES->{code2state}{ uc($_) }
		      || exists $STATES->{state2code}{ uc($_) } );
		};

	 subtype 'USZipCode'
	     => as Value
	     => where {
		    /^$RE{zip}{US}{-extended => 'allow'}$/;
		};

	 has 'street'	=> ( is => 'rw', isa => 'Str' );
	 has 'city'	=> ( is => 'rw', isa => 'Str' );
	 has 'state'	=> ( is => 'rw', isa => 'USState' );
	 has 'zip_code' => ( is => 'rw', isa => 'USZipCode' );

	 package Company;
	 use Moose;
	 use Moose::Util::TypeConstraints;

	 has 'name' => ( is => 'rw', isa => 'Str', required => 1 );
	 has 'address'	 => ( is => 'rw', isa => 'Address' );
	 has 'employees' => (
	     is      => 'rw',
	     isa     => 'ArrayRef[Employee]',
	     default => sub { [] },
	 );

	 sub BUILD {
	     my ( $self, $params ) = @_;
	     foreach my $employee ( @{ $self->employees } ) {
		 $employee->employer($self);
	     }
	 }

	 after 'employees' => sub {
	     my ( $self, $employees ) = @_;
	     return unless $employees;
	     foreach my $employee ( @$employees ) {
		 $employee->employer($self);
	     }
	 };

	 package Person;
	 use Moose;

	 has 'first_name' => ( is => 'rw', isa => 'Str', required => 1 );
	 has 'last_name'  => ( is => 'rw', isa => 'Str', required => 1 );
	 has 'middle_initial' => (
	     is        => 'rw', isa => 'Str',
	     predicate => 'has_middle_initial'
	 );
	 has 'address' => ( is => 'rw', isa => 'Address' );

	 sub full_name {
	     my $self = shift;
	     return $self->first_name
		 . (
		 $self->has_middle_initial
		 ? ' ' . $self->middle_initial . '. '
		 : ' '
		 ) . $self->last_name;
	 }

	 package Employee;
	 use Moose;

	 extends 'Person';

	 has 'title'	=> ( is => 'rw', isa => 'Str',	   required => 1 );
	 has 'employer' => ( is => 'rw', isa => 'Company', weak_ref => 1 );

	 override 'full_name' => sub {
	     my $self = shift;
	     super() . ', ' . $self->title;
	 };

DESCRIPTION

       This recipe introduces the "subtype" sugar function from Moose::Util::TypeConstraints. The "subtype" function lets you declaratively create
       type constraints without building an entire class.

       In the recipe we also make use of Locale::US and Regexp::Common to build constraints, showing how constraints can make use of existing CPAN
       tools for data validation.

       Finally, we introduce the "required" attribute option.

       In the "Address" class we define two subtypes. The first uses the Locale::US module to check the validity of a state. It accepts either a
       state abbreviation or a full name.

       A state will be passed in as a string, so we make our "USState" type a subtype of Moose's builtin "Str" type. This is done using the "as"
       sugar. The actual constraint is defined using "where". This function accepts a single subroutine reference. That subroutine will be called
       with the value to be checked in $_ (1). It is expected to return a true or false value indicating whether the value is valid for the type.

       We can now use the "USState" type just like Moose's builtin types:

	 has 'state'	=> ( is => 'rw', isa => 'USState' );

       When the "state" attribute is set, the value is checked against the "USState" constraint. If the value is not valid, an exception will be
       thrown.
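
       As a minimal sketch of this behaviour, the example below defines a small subtype and shows a setter rejecting an invalid value. The
       names "MyApp::Example::Region" and "MyApp::Example::TwoLetterCode" are hypothetical and are not part of the recipe; a plain regex
       stands in for Locale::US so that the sketch has no dependencies beyond Moose.

```perl
package MyApp::Example::Region;
use Moose;
use Moose::Util::TypeConstraints;

# Hypothetical type (not from the recipe): exactly two uppercase ASCII letters.
subtype 'MyApp::Example::TwoLetterCode'
    => as 'Str'
    => where { /^[A-Z]{2}$/ };

has 'code' => ( is => 'rw', isa => 'MyApp::Example::TwoLetterCode' );

package main;

my $region = MyApp::Example::Region->new( code => 'IL' );

# Setting an invalid value throws an exception; eval traps it.
my $ok    = eval { $region->code('Illinois'); 1 };
my $error = $@;

print "invalid value accepted: ", ( $ok ? "yes" : "no" ), "\n";
print "code is still: ", $region->code, "\n";
```

       Because the constraint is checked before the value is stored, the attribute keeps its previous value when the setter fails.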

       The next "subtype", "USZipCode", uses Regexp::Common. Regexp::Common includes a regex for validating US zip codes. We use this constraint
       for the "zip_code" attribute.

	 subtype 'USZipCode'
	     => as Value
	     => where {
		    /^$RE{zip}{US}{-extended => 'allow'}$/;
		};

       Using a subtype instead of requiring a class for each type greatly simplifies the code. We don't really need a class for these types, as
       they're just strings, but we do want to ensure that they're valid.

       The type constraints we created are reusable. Type constraints are stored by name in a global registry, which means that we can refer to
       them in other classes. Because the registry is global, we do recommend that you use some sort of namespacing in real applications, like
       "MyApp::Type::USState" (just as you would do with class names).
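
       The sketch below illustrates the global registry under assumed names: a type is declared in one package, used by an attribute in
       another, and looked up by name with "find_type_constraint" from Moose::Util::TypeConstraints. "MyApp::Types",
       "MyApp::Type::PositiveInt" and "MyApp::Order" are hypothetical names chosen for illustration.

```perl
package MyApp::Types;
use Moose::Util::TypeConstraints;

# Hypothetical namespaced type, registered globally by name.
subtype 'MyApp::Type::PositiveInt'
    => as 'Int'
    => where { $_ > 0 };

package MyApp::Order;
use Moose;

# A different package refers to the type purely by its registered name.
has 'quantity' => ( is => 'rw', isa => 'MyApp::Type::PositiveInt' );

package main;

my $constraint
    = Moose::Util::TypeConstraints::find_type_constraint('MyApp::Type::PositiveInt');

print $constraint->check(3)  ? "3 accepted\n"  : "3 rejected\n";
print $constraint->check(-1) ? "-1 accepted\n" : "-1 rejected\n";

my $order = MyApp::Order->new( quantity => 2 );
print "quantity: ", $order->quantity, "\n";
```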

       These two subtypes allow us to define a simple "Address" class.

       Then we define our "Company" class, which has an address. As we saw in earlier recipes, Moose automatically creates a type constraint for
       each of our classes, so we can use that for the "Company" class's "address" attribute:

	 has 'address'	 => ( is => 'rw', isa => 'Address' );

       A company also needs a name:

	 has 'name' => ( is => 'rw', isa => 'Str', required => 1 );

       This introduces a new attribute option, "required". If an attribute is required, then it must be passed to the class's constructor, or an
       exception will be thrown. It's important to understand that a "required" attribute can still be false or "undef", if its type constraint
       allows that.
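
       A small sketch of both points, using a hypothetical "MyApp::Example::Shop" class: omitting a required attribute makes the constructor
       throw, while passing a false value or "undef" is accepted as long as the type constraint permits it. The "Maybe[Str]" type is used here
       as one way of allowing "undef".

```perl
package MyApp::Example::Shop;
use Moose;

# Both attributes must be passed to the constructor.
has 'name'    => ( is => 'rw', isa => 'Str',        required => 1 );
has 'tagline' => ( is => 'rw', isa => 'Maybe[Str]', required => 1 );

package main;

# "name" is missing, so the constructor throws.
my $missing_ok = eval { MyApp::Example::Shop->new( tagline => undef ); 1 };
print "constructed without name: ", ( $missing_ok ? "yes" : "no" ), "\n";

# An empty string satisfies Str and undef satisfies Maybe[Str]:
# "required" only demands that the attribute is supplied.
my $shop = MyApp::Example::Shop->new( name => '', tagline => undef );
print "tagline defined: ", ( defined $shop->tagline ? "yes" : "no" ), "\n";
```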

       The next attribute, "employees", uses a parameterized type constraint:

	 has 'employees' => (
	     is      => 'rw',
	     isa     => 'ArrayRef[Employee]',
	     default => sub { [] },
	 );

       This constraint says that "employees" must be an array reference where each element of the array is an "Employee" object. It's worth noting
       that an empty array reference also satisfies this constraint, such as the value given as the default here.

       Parameterizable type constraints (or "container types"), such as "ArrayRef[`a]", can be made more specific with a type parameter. In fact,
       we can arbitrarily nest these types, producing something like "HashRef[ArrayRef[Int]]". However, you can also just use the type by itself,
       so "ArrayRef" is legal.(2)
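
       To see nesting in action outside of an attribute declaration, the following sketch looks up constraints by name with
       "find_or_parse_type_constraint" from Moose::Util::TypeConstraints and checks a few values against them. The variable names and sample
       data are illustrative only.

```perl
use strict;
use warnings;
use Moose::Util::TypeConstraints ();

# Parse a nested container type and a bare container type by name.
my $nested
    = Moose::Util::TypeConstraints::find_or_parse_type_constraint('HashRef[ArrayRef[Int]]');
my $plain
    = Moose::Util::TypeConstraints::find_or_parse_type_constraint('ArrayRef');

# Every hash value must be an array reference of integers.
my $good = $nested->check( { primes => [ 2, 3, 5 ] } );
my $bad  = $nested->check( { primes => [ 2, 'three' ] } );

# A bare ArrayRef places no constraint on the elements.
my $any = $plain->check( [ 1, 'two', {} ] );

print "nested, all ints:  ", ( $good ? "valid" : "invalid" ), "\n";
print "nested, one str:   ", ( $bad  ? "valid" : "invalid" ), "\n";
print "plain, mixed data: ", ( $any  ? "valid" : "invalid" ), "\n";
```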

       If you jump down to the definition of the "Employee" class, you will see that it has an "employer" attribute.

       When we set the "employees" for a "Company" we want to make sure that each of these employee objects refers back to the right "Company" in
       its "employer" attribute.

       To do that, we need to hook into object construction. Moose lets us do this by writing a "BUILD" method in our class. When your class
       defines a "BUILD" method, it will be called by the constructor immediately after object construction, but before the object is returned to
       the caller. Note that all "BUILD" methods in your class hierarchy will be called automatically; there is no need to (and you should not)
       call the superclass "BUILD" method.
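
       The automatic chaining can be sketched with two hypothetical classes, "MyApp::Example::Base" and "MyApp::Example::Derived", each
       defining its own "BUILD". Neither calls the other, yet both run, parent first:

```perl
package MyApp::Example::Base;
use Moose;

# Records which BUILD methods ran, in order.
has 'calls' => ( is => 'ro', isa => 'ArrayRef[Str]', default => sub { [] } );

sub BUILD {
    my ( $self, $params ) = @_;
    push @{ $self->calls }, 'Base::BUILD';
}

package MyApp::Example::Derived;
use Moose;

extends 'MyApp::Example::Base';

# No call to the superclass BUILD here; Moose invokes it for us.
sub BUILD {
    my ( $self, $params ) = @_;
    push @{ $self->calls }, 'Derived::BUILD';
}

package main;

my $object = MyApp::Example::Derived->new;
print join( ', ', @{ $object->calls } ), "\n";    # Base::BUILD, Derived::BUILD
```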

       The "Company" class uses the "BUILD" method to ensure that each employee of a company has the proper "Company" object in its "employer"
       attribute:

	 sub BUILD {
	     my ( $self, $params ) = @_;
	     foreach my $employee ( @{ $self->employees } ) {
		 $employee->employer($self);
	     }
	 }

       The "BUILD" method is executed after type constraints are checked, so it is safe to assume that if "$self->employees" has a value, it will
       be an array reference, and that the elements of that array reference will be "Employee" objects.

       We also want to make sure that whenever the "employees" attribute for a "Company" is changed, we also update the "employer" for each
       employee.

       To do this we can use an "after" modifier:

	 after 'employees' => sub {
	     my ( $self, $employees ) = @_;
	     return unless $employees;
	     foreach my $employee ( @$employees ) {
		 $employee->employer($self);
	     }
	 };

       Again, as with the "BUILD" method, we know that the type constraint check has already happened, so we know that if $employees is defined it
       will contain an array reference of "Employee" objects.

       Note that "employees" is a read/write accessor, so we must return early if it's called as a reader.
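
       The reader/writer distinction can be sketched with a hypothetical "MyApp::Example::Team" class whose "after" modifier counts writes.
       The "updates" counter is an illustrative addition and is not part of the recipe.

```perl
package MyApp::Example::Team;
use Moose;

has 'members' => ( is => 'rw', isa => 'ArrayRef[Str]', default => sub { [] } );
has 'updates' => ( is => 'rw', isa => 'Int',           default => 0 );

after 'members' => sub {
    my ( $self, $members ) = @_;
    return unless $members;    # called as a reader: nothing was set
    $self->updates( $self->updates + 1 );
};

package main;

my $team = MyApp::Example::Team->new;
$team->members;                        # reader call, modifier returns early
$team->members( [ 'Ann', 'Bob' ] );    # writer call, counter is incremented
$team->members;                        # reader call again

print "updates: ", $team->updates, "\n";    # updates: 1
```

       The constructor does not go through the accessor, so the modifier does not fire during "new". This is why the recipe pairs the "after"
       modifier with a "BUILD" method.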

       The Person class does not really demonstrate anything new. It has several "required" attributes. It also has a "predicate" method, which we
       first used in Moose::Cookbook::Basics::BinaryTree_AttributeFeatures.

       The only new feature in the "Employee" class is the "override" method modifier:

	 override 'full_name' => sub {
	     my $self = shift;
	     super() . ', ' . $self->title;
	 };

       This is just a sugary alternative to Perl's built-in "SUPER::" feature. However, there is one difference. You cannot pass any arguments to
       "super". Instead, Moose simply passes the same parameters that were passed to the method.
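
       A short sketch of the argument-passing behaviour, using hypothetical "MyApp::Example::Greeter" and "MyApp::Example::LoudGreeter"
       classes: the overriding method calls "super()" with no arguments, and the parent method still receives the original $name.

```perl
package MyApp::Example::Greeter;
use Moose;

sub greet {
    my ( $self, $name ) = @_;
    return "Hello, $name";
}

package MyApp::Example::LoudGreeter;
use Moose;

extends 'MyApp::Example::Greeter';

override 'greet' => sub {
    my ( $self, $name ) = @_;

    # super() is called without arguments; the parent method is invoked
    # with the same parameters that this method received.
    return uc( super() ) . '!';
};

package main;

my $greeting = MyApp::Example::LoudGreeter->new->greet('world');
print "$greeting\n";    # HELLO, WORLD!
```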

       A more detailed example of usage can be found in t/recipes/moose_cookbook_basics_recipe4.t.

CONCLUSION

       This recipe was intentionally longer and more complex. It illustrates how Moose classes can be used together with type constraints, as well
       as the density of information that you can get out of a small amount of typing when using Moose.

       This recipe also introduced the "subtype" function, the "required" attribute, and the "override" method modifier.

       We will revisit type constraints in future recipes, and cover type coercion as well.

FOOTNOTES

       (1) The value being checked is also passed as the first argument to the "where" block, so it can be accessed as $_[0].

       (2) Note that "ArrayRef[]" will not work. Moose will not parse this as a container type, and instead you will have a new type named
	   "ArrayRef[]", which doesn't make any sense.

AUTHOR

       Moose is maintained by the Moose Cabal, along with the help of many contributors. See "CABAL" in Moose and "CONTRIBUTORS" in Moose for
       details.

COPYRIGHT AND LICENSE

       This software is copyright (c) 2012 by Infinity Interactive, Inc.

       This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

perl v5.14.2							    2012-06-28			    Moose::Cookbook::Basics::Company_Subtypes(3pm)