Sponsored Content
Top Forums Shell Programming and Scripting Please suggest alternative to grep Post 302652569 by drl on Thursday 7th of June 2012 09:56:44 AM
Old 06-07-2012
Hi.

First question is does this absolutely need to be faster? How many times are you going to run it? If it's a single-shot, then perhaps just letting it run to completion is the best solution.

Secondly, the first file looks like it is a sequence. If so, then perhaps a regular expression could be used rather than a volume of 5 GB of memory. If not a regular expression, then possibly a code that determines if a piece of the line matches the base + the sequence -- an arithmetic operation, which might be faster than string comparisons (for example, some mainframes & supercomputers had multiple units for arithmetic).

Thirdly, if you have sufficient IO throughput as well as multiple cores, then one could write a program that internally divides the main file into pieces by keeping track of start-stop line positions, and then uses processes or threads to process one segment each. A less elegant solution along the same lines would be to spilt the files into n sections, each in a file, and then run n instances of grep.

Fourthly, splitting the task up among a network of machines that might share the disk; as well as the easiest (but not cheapest) solution: get a faster box.

Best wishes ... cheers, drl
 

8 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Can you suggest a more efficient way for this?

Hi I have the following at the end of a service shutdown script used in part of an active-passive failover setup: ### # Shutdown all primary Network Interfaces # associated with failover ### # get interface names based on IP's # and shut them down to simulate loss of # heartbeatd ... (1 Reply)
Discussion started by: mikie
1 Replies

2. UNIX for Advanced & Expert Users

suggest book

Hi I am new to Unix/Linux I know commands and shell scripts which are useful for my project. But i need to know the basics and commands and shell scripts in detail and easy guide. Please refer a book. Thanks Haripatn (6 Replies)
Discussion started by: haripatn
6 Replies

3. UNIX for Dummies Questions & Answers

Grep alternative to handle large numbers of files

I am looking for a file with 'MCR0000000716214' in it. I tried the following command: grep MCR0000000716214 * The problem is that the folder I am searching in has over 87000 files and I am getting the following: bash: /bin/grep: Arg list too long Is there any command I can use that can... (6 Replies)
Discussion started by: runnerpaul
6 Replies

4. Shell Programming and Scripting

Alternative to grep

How to find a particular line in a file without using grep? (3 Replies)
Discussion started by: proactiveaditya
3 Replies

5. Shell Programming and Scripting

Need best grep option or alternative

Hello, I am processing a text file which contains only words with few combination of characters (it is a dictionary file). example: havana have haven haven't havilland havoc Is there a way to exclude only 1 to 8 character long words which not include space or special characters : '-`~.. so... (5 Replies)
Discussion started by: alekkz
5 Replies

6. UNIX for Dummies Questions & Answers

alternative to the grep trick

Hi, We used to use the below commands often. ps -ef|grep bc ps -ef|grep abc|grep -v grep Both fairly returns the same result. For example, the process name is dynamic and we are having the process name in a variable, how we can apply the above trick. For example "a" is the... (11 Replies)
Discussion started by: pandeesh
11 Replies

7. Shell Programming and Scripting

Alternative command to grep -w option

Hi All, We have few scripts where we are using grep -w option to do exact matching of the pattern. This works fine on most of our servers. But I have encounter a very old HP-UX System(HP-UX B.11.00) where grep -w option is not available. This is causing my scripts to fail. I need to change... (7 Replies)
Discussion started by: veeresh_15
7 Replies

8. Shell Programming and Scripting

Help with grep, or alternative

say I have a big list of something like: sdg2000 weghre10 fewg53 gwg99 jwegwejjwej43 afg10293 I want to remove the numbers of any line that has letters + 1 to 4 numbers output: sdg weghre fewg gwg jwegwejjwej afg10293 (7 Replies)
Discussion started by: Siwon
7 Replies
grep(1) 						      General Commands Manual							   grep(1)

Name
       grep, egrep, fgrep - search file for regular expression

Syntax
       grep [option...] expression [file...]

       egrep [option...] [expression] [file...]

       fgrep [option...] [strings] [file]

Description
       Commands  of  the family search the input files (standard input default) for lines matching a pattern.  Normally, each line found is copied
       to the standard output.

       The command patterns are limited regular expressions in the style of which uses a compact nondeterministic algorithm.  The command patterns
       are  full  regular  expressions.  The command uses a fast deterministic algorithm that sometimes needs exponential space.  The command pat-
       terns are fixed strings.  The command is fast and compact.

       In all cases the file name is shown if there is more than one input file.  Take care when using the characters $ * [ ^ | ( ) and   in  the
       expression because they are also meaningful to the Shell.  It is safest to enclose the entire expression argument in single quotes ' '.

       The command searches for lines that contain one of the (new line-separated) strings.

       The command accepts extended regular expressions.  In the following description `character' excludes new line:

	      A  followed by a single character other than new line matches that character.

	      The character ^ matches the beginning of a line.

	      The character $ matches the end of a line.

	      A .  (dot) matches any character.

	      A single character not otherwise endowed with special meaning matches that character.

	      A  string  enclosed in brackets [] matches any single character from the string.	Ranges of ASCII character codes may be abbreviated
	      as in `a-z0-9'.  A ] may occur only as the first character of the string.  A literal - must be placed where it can't be mistaken	as
	      a range indicator.

	      A  regular  expression  followed	by  an	* (asterisk) matches a sequence of 0 or more matches of the regular expression.  A regular
	      expression followed by a + (plus) matches a sequence of 1 or more matches of the regular expression.  A regular expression  followed
	      by a ? (question mark) matches a sequence of 0 or 1 matches of the regular expression.

	      Two regular expressions concatenated match a match of the first followed by a match of the second.

	      Two regular expressions separated by | or new line match either a match for the first or a match for the second.

	      A regular expression enclosed in parentheses matches a match for the regular expression.

       The  order  of  precedence  of  operators at the same parenthesis level is the following:  [], then *+?, then concatenation, then | and new
       line.

Options
       -b	   Precedes each output line with its block number.  This is sometimes useful in locating disk block numbers by context.

       -c	   Produces count of matching lines only.

       -e expression
		   Uses next argument as expression that begins with a minus (-).

       -f file	   Takes regular expression (egrep) or string list (fgrep) from file.

       -i	   Considers upper and lowercase letter identical in making comparisons and only).

       -l	   Lists files with matching lines only once, separated by a new line.

       -n	   Precedes each matching line with its line number.

       -s	   Silent mode and nothing is printed (except error messages).	This is useful for checking the error status (see DIAGNOSTICS).

       -v	   Displays all lines that do not match specified expression.

       -w	   Searches for an expression as for a word (as if surrounded by `<' and `>').  For further information, see only.

       -x	   Prints exact lines matched in their entirety only).

Restrictions
       Lines are limited to 256 characters; longer lines are truncated.

Diagnostics
       Exit status is 0 if any matches are found, 1 if none, 2 for syntax errors or inaccessible files.

See Also
       ex(1), sed(1), sh(1)

																	   grep(1)
All times are GMT -4. The time now is 07:42 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy