Speeding/Optimizing GREP search on CSV files


 
# 1  
Old 09-05-2010

Hi all,

I have a problem searching hundreds of CSV files: the searches are taking too long (over 5 minutes).

The CSV files are "," (comma) delimited and have 30 fields per line, but I always grep the same 4 fields, so is there a way to grep only those 4 fields to speed up the search?

Example:
1,22,11,44,55,7,45,55,55,555,55,66,6,66,book12horror,book34horror,book24horror,book45horror,22,44,55 ..etc.

grep -h "book34" /home/data/books/*

I would also like to know how to optimize grep on Solaris 10: which options speed up searches, and what tweaks can I make on Solaris? Maybe awk can help me with this one?
# 2  
Old 09-05-2010
You can try all of these, and see which one is the fastest:
Code:
awk -F, '$15$16$17$18~"book34"' /home/data/books/*

Code:
perl -F, -anle 'print $_ if ($F[14].$F[15].$F[16].$F[17])=~/book34/' /home/data/books/*
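
One more variant that might be worth timing (a sketch, assuming the search pattern really is a fixed literal string): force the C locale and use a fixed-string search, since locale handling and the regular-expression engine are often a large part of grep's cost:
Code:
# fixed-string, byte-oriented search; on Solaris 10 use fgrep or /usr/xpg4/bin/grep -F
LC_ALL=C fgrep -h "book34" /home/data/books/*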

# 3  
Old 09-05-2010
Quote:
Originally Posted by bartus11
You can try all of these, and see which one is the fastest:
Code:
awk -F, '$15$16$17$18~"book34"' /home/data/books/*

Code:
perl -F, -anle 'print $_ if ($F[14].$F[15].$F[16].$F[17])=~/book34/' /home/data/books/*

I am not really sure how this would make the search faster than the usual grep; there is no bypass or pruning to make it faster.

I could not think of a better approach apart from pruning the literal search in the record once a match is found (or not found) at the expected field.
For example: when searching the 10th field of a record with 20 fields, don't keep scanning the pattern space beyond the 10th field; prune the search there and continue with the next record.
I agree this is not a great approach, but it should make the search a bit faster.
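
A rough sketch of that pruning idea in awk (assuming, as in the sample line above, that the four fields of interest are 15-18): test only those fields, and use a literal index() lookup instead of a regular expression, so the match never touches the rest of the record:
Code:
# print the record if the literal appears in the concatenation of fields 15-18
awk -F, 'index($15 $16 $17 $18, "book34")' /home/data/books/*

awk still splits every record into fields, so the gain over the earlier awk one-liner is probably modest.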
# 4  
Old 09-05-2010
Yes, you are right, the above example is not much faster (if at all) :/

@matrixmadhan, how do I tell perl or grep to stop searching after the 13th field and skip to the next line? Any example? (a one-liner, of course)
# 5  
Old 09-05-2010
Quote:
Originally Posted by Whit3H0rse
Yes, you are right, the above example is not much faster (if at all) :/

@matrixmadhan, how do I tell perl or grep to stop searching after the 13th field and skip to the next line? Any example? (a one-liner, of course)
Frankly, I don't know whether such tools or programs exist. The best thing would be to modify the grep source to suit your needs; it shouldn't be that difficult.
# 6  
Old 09-05-2010
Do you know of any tutorial where I can learn to write custom search code?
What I need is normal grep plus a modification so that grep skips to the next line once a single match is found in the line, or once it has reached the 13th field of the CSV.
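
One way to approximate that without touching the grep source (a sketch; it assumes the fields of interest end by field 18 as in the sample line, so adjust the limit and indices if you really only need up to field 13): have Perl split each line a limited number of times, so nothing past that point is ever split, and test only the wanted fields:
Code:
# split each line into at most 19 pieces (fields 1-18 plus the untouched remainder),
# then match only fields 15-18; the -n loop moves on to the next line afterwards
perl -nle 'my @f = split /,/, $_, 19; print if "@f[14..17]" =~ /book34/' /home/data/books/*

As far as I know, grep already stops scanning a line as soon as it finds one match when it only has to print matching lines, so the bigger saving is in limiting how much of each line gets examined at all.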
# 7  
Old 09-05-2010
Try this
Cross Reference: /onnv/onnv-gate/usr/src/cmd/grep_xpg4/grep.c

---------- Post updated at 03:58 PM ---------- Previous update was at 03:54 PM ----------

Before you start writing your own version, try this option of grep:

Code:
grep -iP
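
A caveat on that (as far as I know): -P selects Perl-compatible regular expressions and is a GNU grep extension, so it may not exist in the stock Solaris 10 /usr/bin/grep or /usr/xpg4/bin/grep; check the man page first. Also, -i makes the match case-insensitive, which normally costs time rather than saving it. If GNU grep is installed (often as ggrep on Solaris), a literal, case-sensitive search would look something like:
Code:
# GNU grep, fixed-string match, no case folding
ggrep -hF "book34" /home/data/books/*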

 