06-07-2012
Hi.
First question: does this absolutely need to be faster? How many times are you going to run it? If it's a one-shot job, then perhaps just letting it run to completion is the best solution.
Secondly, the first file looks like it is a sequence. If so, then perhaps a regular expression could be used instead of an approach that needs 5 GB of memory. If not a regular expression, then possibly code that checks whether a piece of the line matches the base plus a number in the sequence. That is an arithmetic test, which might be faster than string comparisons (for example, some mainframes and supercomputers had multiple arithmetic units).
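A minimal sketch of the arithmetic test, assuming (hypothetically) that the first file lists IDs made of the prefix `ABC` followed by every number from 1 to 5000000. The prefix, the range and the function name `in_sequence_ids` are illustrative only, not taken from the original question. It uses POSIX awk's `match()` to locate the ID and then compares the numeric part against the range, instead of holding millions of fixed strings in memory:

```shell
# Hypothetical: IDs are "ABC" + a number in [lo, hi]; adjust to the real data.
# Usage: in_sequence_ids [FILE...]   (reads standard input if no files given)
in_sequence_ids() {
    awk -v lo=1 -v hi=5000000 '
        match($0, /ABC[0-9]+/) {
            # Digits that follow the 3-character prefix "ABC"
            n = substr($0, RSTART + 3, RLENGTH - 3) + 0
            # Arithmetic range check replaces millions of string comparisons
            if (n >= lo && n <= hi) print
        }' "$@"
}
```

For example, `in_sequence_ids bigfile > hits` prints every line whose first `ABC` number falls in the range. Only the first `ABC` match on each line is checked.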
Thirdly, if you have sufficient I/O throughput as well as multiple cores, then one could write a program that internally divides the main file into pieces by keeping track of start and stop line positions, and then uses processes or threads to handle one segment each. A less elegant solution along the same lines would be to split the file into n sections, each in its own file, and then run n instances of grep.
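The less elegant variant can be sketched in a few lines of shell. This assumes GNU `split` (its `-n l/N` mode cuts a file into N pieces without breaking lines) and a file of fixed-string patterns; `parallel_grep` is a hypothetical name:

```shell
# Usage: parallel_grep PATTERNS BIGFILE N
# Assumes GNU split; patterns are treated as fixed strings (grep -F).
parallel_grep() {
    patterns=$1 bigfile=$2 n=$3
    tmp=$(mktemp -d) || return 2
    # l/N: N chunks, each ending on a line boundary
    split -n "l/$n" "$bigfile" "$tmp/part_"
    for part in "$tmp"/part_*; do
        # One background grep per chunk
        grep -F -f "$patterns" "$part" > "$part.out" &
    done
    wait                          # block until every background grep finishes
    cat "$tmp"/part_*.out         # glob sorts aa, ab, ... so input order is kept
    rm -rf "$tmp"
}
```

`parallel_grep patterns.txt bigfile 4` runs four greps at once. Whether this helps depends on the I/O throughput mentioned above; on one slow disk the processes may simply contend for reads.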
Fourthly, one could split the task among a network of machines that share the disk. And there is always the easiest (but not cheapest) solution: get a faster box.
Best wishes ... cheers, drl
GREP(1) General Commands Manual GREP(1)
NAME
grep, egrep, fgrep - search a file for a pattern
SYNOPSIS
grep [ option ] ... expression [ file ] ...
egrep [ option ] ... [ expression ] [ file ] ...
fgrep [ option ] ... [ strings ] [ file ]
DESCRIPTION
Commands of the grep family search the input files (standard input default) for lines matching a pattern. Normally, each line found is
copied to the standard output; unless the -h flag is used, the file name is shown if there is more than one input file.
Grep patterns are limited regular expressions in the style of ed(1); it uses a compact nondeterministic algorithm. Egrep patterns are full
regular expressions; it uses a fast deterministic algorithm that sometimes needs exponential space. Fgrep patterns are fixed strings; it
is fast and compact.
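To see the three pattern dialects side by side, the sketch below runs the same made-up input through each. It uses the modern spellings `grep -E` and `grep -F` for egrep and fgrep (POSIX standardizes those options), so the behaviour shown is that of a current grep rather than exactly the one described in this manual:

```shell
# Made-up input lines.
input='a.c
abc
ab+c
abbc'

basic=$(printf '%s\n' "$input" | grep 'a.c')      # grep: . is any char -> a.c, abc
fixed=$(printf '%s\n' "$input" | grep -F 'a.c')   # fgrep: literal text -> a.c
ext=$(printf '%s\n' "$input" | grep -E 'ab+c')    # egrep: + repeats b -> abc, abbc
lit=$(printf '%s\n' "$input" | grep 'ab+c')       # grep: + is ordinary -> ab+c

printf '%s\n---\n' "$basic" "$fixed" "$ext" "$lit"
```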
The following options are recognized.
-v All lines but those matching are printed.
-c Only a count of matching lines is printed.
-l The names of files with matching lines are listed (once) separated by newlines.
-n Each line is preceded by its line number in the file.
-b Each line is preceded by the block number on which it was found. This is sometimes useful in locating disk block numbers by context.
-s No output is produced, only status.
-h Do not print filename headers with output lines.
-y Lower case letters in the pattern will also match upper case letters in the input (grep only).
-e expression
Same as a simple expression argument, but useful when the expression begins with a -.
-f file
The regular expression (egrep) or string list (fgrep) is taken from the file.
-x (Exact) only lines matched in their entirety are printed (fgrep only).
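A sketch exercising several of these options on made-up files. It sticks to options that still behave as described here in modern GNU and BSD grep; today `-b` reports a byte offset rather than a block number, `-s` only suppresses error messages (`-q` gives status-only output), and `-x` works with every pattern type, not just fixed strings:

```shell
# Scratch directory with small, made-up sample files.
dir=$(mktemp -d)
printf 'alpha\nbeta\n-gamma\n' > "$dir/a.txt"
printf 'delta\nomega\n' > "$dir/b.txt"
printf 'alpha\ndelta\n' > "$dir/list"        # one fixed string per line

count=$(grep -c a "$dir/a.txt")              # -c: 3 (every line has an a)
others=$(grep -v beta "$dir/a.txt")          # -v: alpha, -gamma
numbered=$(grep -n beta "$dir/a.txt")        # -n: 2:beta
files=$(grep -l omega "$dir"/*.txt)          # -l: only the path of b.txt
dash=$(grep -e -gamma "$dir/a.txt")          # -e: pattern that starts with -
exact=$(grep -x -F alph "$dir/a.txt")        # -x: no line is exactly alph, so empty
listed=$(grep -h -F -f "$dir/list" "$dir"/*.txt)  # -f list, -h no names: alpha, delta

printf '%s\n' "$count" "$others" "$numbered" "$files" "$dash" "$listed"
rm -rf "$dir"
```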
Care should be taken when using the characters $ * [ ^ | ? ' " ( ) and \ in the expression as they are also meaningful to the Shell. It is
safest to enclose the entire expression argument in single quotes ' '.
Fgrep searches for lines that contain one of the (newline-separated) strings.
Egrep accepts extended regular expressions. In the following description `character' excludes newline:
A \ followed by a single character matches that character.
The character ^ ($) matches the beginning (end) of a line.
A . matches any character.
A single character not otherwise endowed with special meaning matches that character.
A string enclosed in brackets [] matches any single character from the string. Ranges of ASCII character codes may be abbreviated
as in `a-z0-9'. A ] may occur only as the first character of the string. A literal - must be placed where it can't be mistaken as
a range indicator.
A regular expression followed by * (+, ?) matches a sequence of 0 or more (1 or more, 0 or 1) matches of the regular expression.
Two regular expressions concatenated match a match of the first followed by a match of the second.
Two regular expressions separated by | or newline match either a match for the first or a match for the second.
A regular expression enclosed in parentheses matches a match for the regular expression.
The order of precedence of operators at the same parenthesis level is [] then *+? then concatenation then | and newline.
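The operators and the precedence rule can be checked with a few one-liners. This is a sketch using `grep -E` (the modern form of egrep) on an invented word list; the comments give the lines each pattern should select:

```shell
# Made-up word list, one word per line.
words='color
colour
colouur
cat
dog
catalog'

opt=$(printf '%s\n' "$words" | grep -E '^colou?r$')       # ? : color, colour
plus=$(printf '%s\n' "$words" | grep -E '^colou+r$')      # + : colour, colouur
alt=$(printf '%s\n' "$words" | grep -E '^cat|dog$')       # | binds loosest: (^cat) or (dog$)
grouped=$(printf '%s\n' "$words" | grep -E '^(cat|dog)$') # ( ) regroup: cat, dog only
close=$(printf '%s\n' 'a]b' 'ab' | grep -E '[]x]')        # ] first inside [ ]: a]b

printf '%s\n---\n' "$opt" "$plus" "$alt" "$grouped" "$close"
```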
SEE ALSO
ed(1), sed(1), sh(1)
DIAGNOSTICS
Exit status is 0 if any matches are found, 1 if none, 2 for syntax errors or inaccessible files.
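Scripts usually branch on these exit codes. A minimal sketch, where `match_status` is a hypothetical helper; it uses `-q`, which is how current grep spells this manual's status-only `-s`:

```shell
# Hypothetical helper: report what grep's exit status means for PATTERN in FILE.
# -q suppresses output; 2>/dev/null hides the message for an unreadable file.
match_status() {
    grep -q -e "$1" "$2" 2>/dev/null
    case $? in
        0) echo found ;;     # at least one matching line
        1) echo absent ;;    # file was read, nothing matched
        *) echo error ;;     # 2: syntax error or inaccessible file
    esac
}
```

Calling `match_status hello notes.txt` prints `found`, `absent` or `error`.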
BUGS
Ideally there should be only one grep, but we don't know a single algorithm that spans a wide enough range of space-time tradeoffs.
Lines are limited to 256 characters; longer lines are truncated.