Shell Programming and Scripting: Please suggest alternative to grep
Post 302652569 by drl on Thursday 7th of June 2012 09:56:44 AM
Hi.

The first question: does this absolutely need to be faster? How many times are you going to run it? If it's a single-shot job, then perhaps just letting it run to completion is the best solution.

Secondly, the first file looks like it is a sequence. If so, then perhaps a regular expression could be used rather than using some 5 GB of memory. If not a regular expression, then possibly code that determines whether a piece of the line matches the base + the sequence -- an arithmetic operation, which might be faster than string comparisons (for example, some mainframes & supercomputers had multiple units for arithmetic).
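
To make the arithmetic idea concrete, here is a minimal sketch in Python. It assumes each key is a fixed prefix followed by a decimal number from a contiguous range; the names `make_matcher` and `filter_lines`, the prefix `"ID"`, and the range used in the usage note are all made up for illustration, since the real layout of the first file is not shown here.

```python
def make_matcher(prefix, first, last):
    """Return a predicate: is the key `prefix` + an integer in [first, last]?

    Assumption: the keys form one contiguous numeric range after a fixed
    prefix. Leading zeros are accepted ("ID007" counts as 7).
    """
    def matches(key):
        if not key.startswith(prefix):
            return False
        digits = key[len(prefix):]
        # Require plain ASCII digits so int() cannot be surprised.
        if not (digits.isascii() and digits.isdigit()):
            return False
        # One integer comparison replaces a lookup in a huge pattern list.
        return first <= int(digits) <= last
    return matches


def filter_lines(lines, matches, field=0):
    """Yield the lines whose whitespace-separated `field` satisfies `matches`."""
    for line in lines:
        parts = line.split()
        if len(parts) > field and matches(parts[field]):
            yield line
```

Usage would be something like `filter_lines(open("main.txt"), make_matcher("ID", 100, 200))`, where `main.txt` is a placeholder name. Nothing but the prefix and the two range bounds has to be held in memory.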

Thirdly, if you have sufficient IO throughput as well as multiple cores, then one could write a program that internally divides the main file into pieces by keeping track of start-stop line positions, and then uses processes or threads to process one segment each. A less elegant solution along the same lines would be to split the main file into n sections, each in its own file, and then run n instances of grep.
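
A rough sketch of the start-stop approach, again in Python and under stated assumptions: the search is for a single fixed byte string (fgrep-style), segments are byte ranges nudged forward to the next line boundary, and the function names (`segment_offsets`, `scan_segment`, `parallel_grep`) are invented for this example. Whether it actually beats a single grep depends on the IO throughput mentioned above.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def segment_offsets(path, n):
    """Return up to n (start, stop) byte ranges, each ending on a line boundary."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n):
            target = size * i // n
            if target <= bounds[-1]:
                continue
            f.seek(target - 1)
            f.readline()              # move to the start of the next line
            pos = f.tell()
            if bounds[-1] < pos < size:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def scan_segment(job):
    """Collect the lines in one byte range that contain the fixed string."""
    path, start, stop, needle = job
    hits = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < stop:
            line = f.readline()
            if not line:
                break
            if needle in line:
                hits.append(line)
    return hits


def parallel_grep(path, needle, workers=4, executor_cls=ProcessPoolExecutor):
    """Scan each segment in its own worker; results stay in file order."""
    jobs = [(path, a, b, needle) for a, b in segment_offsets(path, workers)]
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(scan_segment, jobs))
    return [line for hits in results for line in hits]
```

`needle` must be bytes, for example `parallel_grep("main.txt", b"some key")` with a placeholder file name. On platforms that spawn rather than fork worker processes, the call belongs under an `if __name__ == "__main__":` guard. The `executor_cls` parameter is only there so a thread pool can be swapped in for comparison.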

Fourthly, you could split the task up among a network of machines that might share the disk. Then there is the easiest (but not cheapest) solution: get a faster box.

Best wishes ... cheers, drl
 

GREP(1)                     General Commands Manual                     GREP(1)

NAME
       grep, egrep, fgrep - search a file for a pattern

SYNOPSIS
       grep [ option ] ... expression [ file ] ...
       egrep [ option ] ... [ expression ] [ file ] ...
       fgrep [ option ] ... [ strings ] [ file ]

DESCRIPTION
       Commands of the grep family search the input files (standard input
       default) for lines matching a pattern. Normally, each line found is
       copied to the standard output; unless the -h flag is used, the file
       name is shown if there is more than one input file.

       Grep patterns are limited regular expressions in the style of ed(1);
       it uses a compact nondeterministic algorithm. Egrep patterns are full
       regular expressions; it uses a fast deterministic algorithm that
       sometimes needs exponential space. Fgrep patterns are fixed strings;
       it is fast and compact.

       The following options are recognized.

       -v     All lines but those matching are printed.

       -c     Only a count of matching lines is printed.

       -l     The names of files with matching lines are listed (once)
              separated by newlines.

       -n     Each line is preceded by its line number in the file.

       -b     Each line is preceded by the block number on which it was
              found. This is sometimes useful in locating disk block numbers
              by context.

       -s     No output is produced, only status.

       -h     Do not print filename headers with output lines.

       -y     Lower case letters in the pattern will also match upper case
              letters in the input (grep only).

       -e expression
              Same as a simple expression argument, but useful when the
              expression begins with a -.

       -f file
              The regular expression (egrep) or string list (fgrep) is taken
              from the file.

       -x     (Exact) only lines matched in their entirety are printed
              (fgrep only).

       Care should be taken when using the characters $ * [ ^ | ? ' " ( )
       and \ in the expression as they are also meaningful to the Shell. It
       is safest to enclose the entire expression argument in single quotes
       ' '.

       Fgrep searches for lines that contain one of the (newline-separated)
       strings.

       Egrep accepts extended regular expressions. In the following
       description `character' excludes newline:

              A \ followed by a single character matches that character.

              The character ^ ($) matches the beginning (end) of a line.

              A . matches any character.

              A single character not otherwise endowed with special meaning
              matches that character.

              A string enclosed in brackets [] matches any single character
              from the string. Ranges of ASCII character codes may be
              abbreviated as in `a-z0-9'. A ] may occur only as the first
              character of the string. A literal - must be placed where it
              can't be mistaken as a range indicator.

              A regular expression followed by * (+, ?) matches a sequence
              of 0 or more (1 or more, 0 or 1) matches of the regular
              expression.

              Two regular expressions concatenated match a match of the
              first followed by a match of the second.

              Two regular expressions separated by | or newline match either
              a match for the first or a match for the second.

              A regular expression enclosed in parentheses matches a match
              for the regular expression.

       The order of precedence of operators at the same parenthesis level is
       [] then *+? then concatenation then | and newline.

SEE ALSO
       ed(1), sed(1), sh(1)

DIAGNOSTICS
       Exit status is 0 if any matches are found, 1 if none, 2 for syntax
       errors or inaccessible files.

BUGS
       Ideally there should be only one grep, but we don't know a single
       algorithm that spans a wide enough range of space-time tradeoffs.

       Lines are limited to 256 characters; longer lines are truncated.