Speeding/Optimizing GREP search on CSV files
Post 302450972 by matrixmadhan on Sunday 5th of September 2010 05:57:41 AM
Quote:
Originally Posted by bartus11
You can try all of these, and see which one is the fastest:
Code:
awk -F, '$15$16$17$18~"book34"' /home/data/books/*

Code:
 perl -F, -anle 'print $_ if ($F[14].$F[15].$F[16].$F[17])=~/book34/' /home/data/books/*

I am not really sure how this would make the search faster than the usual grep. There is no bypass or pruning to make it faster.

I could not think of a better approach apart from pruning the literal search within a record once a match is found, or ruled out, at the expected field.
For example:
When searching the 10th field of a record with 20 fields, don't keep scanning the pattern space past the 10th field; prune the search there and continue with the next record.
I agree this is not a great way, but it will definitely make the search a bit faster.
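
A minimal sketch of that pruning idea, assuming Perl is available and the CSV has no quoted commas. It relies on the LIMIT argument of Perl's split, so each record is split only up to the last field of interest (field 18 in the quoted one-liners) and the tail is left alone. The helper names make_record and pruned_search, the PATTERN environment variable and the sample file are made up for illustration. Unlike the quoted one-liners, the fields are joined with a comma, so a match cannot straddle two fields.

```shell
#!/bin/sh
# Sketch only: make_record, pruned_search and the sample file are
# hypothetical; the field positions (15-18) follow the quoted one-liners.

# Build one 20-field record with VAL in field POS and "x" elsewhere.
make_record() {
  awk -v id="$1" -v pos="$2" -v val="$3" 'BEGIN {
    line = id
    for (i = 2; i <= 20; i++) line = line "," (i == pos ? val : "x")
    print line
  }'
}

# Print records whose fields 15-18 contain the literal string PATTERN.
# The LIMIT argument (19) makes split stop after the 18th field, so the
# tail of each record is never split into fields or searched.
pruned_search() {
  pattern=$1
  shift
  PATTERN="$pattern" perl -ne '
    my @f = split /,/, $_, 19;
    print if join(",", @f[14 .. 17]) =~ /\Q$ENV{PATTERN}\E/;
  ' "$@"
}

csv=$(mktemp)
{
  make_record r1 16 book34   # match inside the searched fields
  make_record r2 20 book34   # match only in the unsplit tail
  make_record r3 5 other     # no match at all
} > "$csv"

pruned_search book34 "$csv"
```

Whether this actually beats a plain grep depends on the data and the tools involved, so it is worth timing both (for example with time) on a real file before relying on it.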
 

Mail::Box::Search::Grep(3pm)				User Contributed Perl Documentation			      Mail::Box::Search::Grep(3pm)

NAME
    Mail::Box::Search::Grep - select messages within a mail box like grep does

INHERITANCE
    Mail::Box::Search::Grep
      is a Mail::Box::Search
      is a Mail::Reporter

SYNOPSIS
    use Mail::Box::Manager;
    my $mgr    = Mail::Box::Manager->new;
    my $folder = $mgr->open('Inbox');

    my $filter = Mail::Box::Search::Grep->new
      ( label => 'selected'
      , in => 'BODY', match => qr/abc?d*e/
      );

    my @msgs   = $filter->search($folder);

    my $filter = Mail::Box::Search::Grep->new
      ( field => 'To'
      , match => $my_email
      );

    if($filter->search($message)) {...}

DESCRIPTION
    Try to find some text strings in the header and body of messages. Various ways to limit the search are provided: to certain header fields, the whole header, only the body, the whole message, and even binary multiparts.

    The name grep is derived from the UNIX tool grep, whose name comes from the ed command g/re/p: "globally search for a regular expression and print". Although you can search using regular expressions (the Perl flavor of them), you do not have to print the matches as a result.

METHODS
  Constructors
    Mail::Box::Search::Grep->new(OPTIONS)
        Create a UNIX-grep like search filter.

         -Option      --Defined in         --Default
          binaries      Mail::Box::Search    <false>
          decode        Mail::Box::Search    <true>
          delayed       Mail::Box::Search    <true>
          deleted       Mail::Box::Search    <false>
          deliver                            undef
          field                              undef
          in            Mail::Box::Search    <$field ? 'HEAD' : 'BODY'>
          label         Mail::Box::Search    undef
          limit         Mail::Box::Search    0
          log           Mail::Reporter       'WARNINGS'
          logical       Mail::Box::Search    'REPLACE'
          match                              <required>
          multiparts    Mail::Box::Search    <true>
          trace         Mail::Reporter       'WARNINGS'

        binaries => BOOLEAN
        decode => BOOLEAN
        delayed => BOOLEAN
        deleted => BOOLEAN
        deliver => undef|CODE|'DELETE'|LABEL|'PRINT'|REF-ARRAY
            Store the details about where the match was found. The search may take much longer when this feature is enabled.

            When an ARRAY is specified it will contain a list of references to hashes. Each hash contains the information of one match. A match in a header line will result in a hash with fields "message", "part", and "field", where the field is a Mail::Message::Field object. When the match is in the body the hash will contain "message", "part", "linenr", and "line".

            In case of a CODE reference, that routine is called for each match. The first argument is this search object and the second a reference to the same hash as would be stored in the array.

            "PRINT" will call printMatchedHead() or printMatchedBody() when a matching header or body line, respectively, was found. The output is minimized by not reprinting the message info on multiple matches in the same message.

            "DELETE" will flag the message to be deleted in case of a match. When a multipart's part is matched, the whole message will be flagged for deletion.

        field => undef|STRING|REGEX|CODE
            Not valid in combination with "in" set to "BODY". The STRING is one full field name (case-insensitive). Use a REGEX to select more than one header line to be scanned. CODE is a routine which is called for each field in the header.

            The CODE is called with the header as first, and the field as second argument. If the CODE returns true, the message is selected.

        in => 'HEAD'|'BODY'|'MESSAGE'
        label => STRING
        limit => NUMBER
        log => LEVEL
        logical => 'REPLACE'|'AND'|'OR'|'NOT'|'AND NOT'|'OR NOT'
        match => STRING|REGEX|CODE
            The pattern to be searched for can be a REGular EXpression, or a STRING. In both cases, the match succeeds if it is found anywhere within the selected fields.

            With a CODE reference, that function will be called for each field or body line. When the result is true, the details are delivered. The call formats are

              $code->($head, $field);          # for HEAD searches
              $code->($body, $linenr, $line);  # for BODY searches

            The $head and $body are one message's head and body object, respectively. The $field is a header line which matches. The $line and $linenr tell the matching line in the body.

            Be warned that when you search in "MESSAGE" the code must accept both formats.

        multiparts => BOOLEAN
        trace => LEVEL

  Searching
    $obj->inBody(PART, BODY)
        See "Searching" in Mail::Box::Search

    $obj->inHead(PART, HEAD)
        See "Searching" in Mail::Box::Search

    $obj->search(FOLDER|THREAD|MESSAGE|ARRAY-OF-MESSAGES)
        See "Searching" in Mail::Box::Search

    $obj->searchPart(PART)
        See "Searching" in Mail::Box::Search

  The Results
    $obj->printMatch([FILEHANDLE], MATCH)
    $obj->printMatchedBody(FILEHANDLE, MATCH)
    $obj->printMatchedHead(FILEHANDLE, MATCH)

  Error handling
    $obj->AUTOLOAD()
        See "Error handling" in Mail::Reporter

    $obj->addReport(OBJECT)
        See "Error handling" in Mail::Reporter

    $obj->defaultTrace([LEVEL]|[LOGLEVEL, TRACELEVEL]|[LEVEL, CALLBACK])
    Mail::Box::Search::Grep->defaultTrace([LEVEL]|[LOGLEVEL, TRACELEVEL]|[LEVEL, CALLBACK])
        See "Error handling" in Mail::Reporter

    $obj->errors()
        See "Error handling" in Mail::Reporter

    $obj->log([LEVEL [,STRINGS]])
    Mail::Box::Search::Grep->log([LEVEL [,STRINGS]])
        See "Error handling" in Mail::Reporter

    $obj->logPriority(LEVEL)
    Mail::Box::Search::Grep->logPriority(LEVEL)
        See "Error handling" in Mail::Reporter

    $obj->logSettings()
        See "Error handling" in Mail::Reporter

    $obj->notImplemented()
        See "Error handling" in Mail::Reporter

    $obj->report([LEVEL])
        See "Error handling" in Mail::Reporter

    $obj->reportAll([LEVEL])
        See "Error handling" in Mail::Reporter

    $obj->trace([LEVEL])
        See "Error handling" in Mail::Reporter

    $obj->warnings()
        See "Error handling" in Mail::Reporter

  Cleanup
    $obj->DESTROY()
        See "Cleanup" in Mail::Reporter

    $obj->inGlobalDestruction()
        See "Cleanup" in Mail::Reporter

DIAGNOSTICS
    Error: Package $package does not implement $method.
        Fatal error: the specific package (or one of its superclasses) does not implement this method where it should. This message means that some other related classes do implement this method however the class at hand does not. Probably you should investigate this and probably inform the author of the package.

SEE ALSO
    This module is part of Mail-Box distribution version 2.105, built on May 07, 2012. Website: http://perl.overmeer.net/mailbox/

LICENSE
    Copyrights 2001-2012 by [Mark Overmeer]. For other contributors see ChangeLog.

    This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://www.perl.com/perl/misc/Artistic.html

perl v5.14.2                              2012-05-07                      Mail::Box::Search::Grep(3pm)