06-07-2012
Hi.
First question: does this absolutely need to be faster? How many times are you going to run it? If it's a one-off, then just letting it run to completion may be the best solution.
Secondly, the first file looks like it is a sequence. If so, then perhaps a regular expression could be used rather than 5 GB of memory. If not a regular expression, then possibly some code that checks whether a piece of the line falls within the base plus the sequence range. That is an arithmetic operation, which might be faster than string comparisons (for example, some mainframes and supercomputers had multiple arithmetic units).
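To illustrate the arithmetic idea, here is a minimal sketch in Python. Everything specific in it is an assumption on my part, since I have not seen the real data: the prefix "ID", the bounds FIRST and LAST, and the premise that the first file covers every number in that range with no gaps. If the sequence has gaps, this approach does not apply as written.

```python
import re

# Hypothetical layout: keys look like "ID" followed by digits, and the
# pattern file covers every number from FIRST to LAST without gaps.
PREFIX = "ID"
FIRST = 1000000
LAST = 1999999

# Capture the full run of digits that follows the prefix.
KEY_RE = re.compile(re.escape(PREFIX) + r"(\d+)")

def line_matches(line):
    """Return True if the line holds a key whose number lies in the sequence."""
    for m in KEY_RE.finditer(line):
        # One integer range check replaces a lookup against millions of strings.
        if FIRST <= int(m.group(1)) <= LAST:
            return True
    return False
```

The point of the design is that nothing from the first file needs to be loaded at all; only the two endpoints of the sequence are kept.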
Thirdly, if you have sufficient IO throughput as well as multiple cores, then one could write a program that internally divides the main file into pieces by keeping track of start-stop line positions, and then uses processes or threads to process one segment each. A less elegant solution along the same lines would be to split the file into n sections, each in its own file, and then run n instances of grep.
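A rough sketch of the segment idea in Python, under some assumptions: the function names are my own, the match is a plain substring test standing in for whatever the real search is, and threads are used only to keep the example short. For CPU-bound matching in CPython, swapping in ProcessPoolExecutor would be the more likely way to use several cores.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def segment_offsets(path, n):
    """Split the file into up to n byte ranges, each ending on a line boundary."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, n):
            pos = size * i // n
            if pos <= cuts[-1]:
                continue
            f.seek(pos)
            f.readline()              # advance to the start of the next line
            pos = f.tell()
            if cuts[-1] < pos < size:
                cuts.append(pos)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))

def search_segment(path, start, end, needle):
    """Return the lines starting in [start, end) that contain needle (bytes)."""
    hits = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            if needle in line:
                hits.append(line.rstrip(b"\n"))
    return hits

def parallel_search(path, needle, workers=4):
    """Search each segment in its own worker; results keep file order."""
    segments = segment_offsets(path, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda seg: search_segment(path, seg[0], seg[1], needle), segments
        )
        return [line for part in parts for line in part]
```

Each worker opens the file itself and seeks to its own start offset, so no data is copied and no temporary files are needed, unlike the split-then-grep variant.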
Fourthly, you could split the task up among a network of machines that share the disk. And then there is the easiest (but not cheapest) solution: get a faster box.
Best wishes ... cheers, drl