Thanks, drl. Your command is an improvement: all the words in "obama|primaries|water" are searched for at the same time, which should reduce the number of iterations needed.
I can also improve upon your script a little bit by doing this:
I also found another way using GNU Parallel. I have tried several variations, but the one I am interested in is this (as given for GNU Parallel):
Code:
parallel --pipepart --block 100M -a big_file.txt --fifo cat words_file.txt | parallel --pipe -L1000 --round-robin grep -f - {}
But the above code does not do what I really want, so I tried modifying it to this:
Code:
parallel --pipepart --block 100M -a big_file.txt --fifo cat words_file.txt | parallel --pipe -L1000 --round-robin grep -fowP "(?:\w+\s){0,2}$name(?:\s\w+){0,2}" - {}
There are still some issues with how to include the word patterns from words_file.txt in this command.
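One possible way around this, sketched here under assumptions rather than as a tested fix for the parallel pipeline above: grep cannot take the patterns from -f and an inline pattern together, and GNU grep's -P mode accepts only a single pattern, so the words can instead be joined into one alternation and substituted into the context regex. The helper names build_context_pattern and context_grep are hypothetical, and the sketch assumes GNU grep with PCRE support and a words_file.txt holding one plain word per line, with no regex metacharacters.

```shell
# Hypothetical helper: join the words (one per line) into a single PCRE
# alternation and wrap it with up to two words of context on each side.
build_context_pattern() (
    words_file=$1
    # Drop blank lines, then join the remaining words with "|".
    alternation=$(grep -v '^[[:space:]]*$' "$words_file" | paste -sd'|' -)
    # "\\" in the printf format prints one backslash, so this emits
    # (?:\w+\s){0,2}\b(?:word1|word2)\b(?:\s\w+){0,2}
    printf '(?:\\w+\\s){0,2}\\b(?:%s)\\b(?:\\s\\w+){0,2}' "$alternation"
)

# Hypothetical helper: print every match together with its context words.
# Assumes GNU grep built with -P (PCRE) support.
context_grep() (
    words_file=$1
    text_file=$2
    grep -oP -- "$(build_context_pattern "$words_file")" "$text_file"
)
```

Calling `context_grep words_file.txt big_file.txt` scans the file once for all words. Because grep -o reports non-overlapping matches, two search words that sit close together may end up in one output line.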
Hi,
I'm developing a data processing pipeline with multiple stages, with data being moved between the stages using shared memory segments. The size of the data is typically of the order of hundreds of megabytes, and there are typically a few tens of main shared memory segments each of size... (2 Replies)
As in the title:
How can I get the memory used, or anything else that shows memory, from a sar file?
Example on Solaris:
we can use sar with an option to show the memory used at the time the sar crontab ran.
On HP-UX, it has no option to see memory used. But I think it may have some parameter or some... (1 Reply)
I am looking for a file with 'MCR0000000716214' in it. I tried the following command:
grep MCR0000000716214 *
The problem is that the folder I am searching in has over 87000 files and I am getting the following:
bash: /bin/grep: Arg list too long
Is there any command I can use that can... (6 Replies)
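A common way to avoid the "Arg list too long" error is to let find hand the file names to grep in batches instead of having the shell expand `*`. The following is a minimal sketch; the helper name find_files_containing is hypothetical, and -maxdepth is a GNU/BSD find extension that may be missing on other systems.

```shell
# Hypothetical helper: list the regular files directly inside a directory
# that contain a fixed string, without expanding a huge glob in the shell.
find_files_containing() (
    needle=$1
    dir=$2
    # -exec ... {} + batches file names so each grep call stays under the
    # argument-length limit; -F treats the needle as a literal string and
    # -l prints only the names of matching files.
    find "$dir" -maxdepth 1 -type f -exec grep -lF -- "$needle" {} +
)
```

For the case above this would be called as `find_files_containing MCR0000000716214 .`.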
We just set up a system to use large pages. I want to know if there is a command to see how much of the memory is being used for large pages. For example if we have a system with 8GB of RAM assigned and it has been set to use 4GB for large pages is there a command to show that 4GB of the 8GB is... (1 Reply)
Hi, my problem:
gzgrep "^.\{376\}8301685001120" filename /dev/null
###ERROR ###
grep: RE error 11: Range endpoint too large.
What's my mistake?
Is the position 376 too large for grep?
Thanks. (2 Replies)
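This error most likely means that the repeat count 376 exceeds the interval limit of that grep (POSIX only guarantees RE_DUP_MAX to be at least 255). One workaround, sketched here as an assumption rather than the thread's accepted answer, is to compare a fixed-position substring with awk instead of using a large interval. The helper name match_at_offset is hypothetical.

```shell
# Hypothetical helper: print the lines from standard input in which the
# given string starts right after the first <skip> characters.
match_at_offset() (
    skip=$1
    needle=$2
    # substr() is 1-based, so the string begins at position skip + 1.
    # Appending "" forces a string comparison even for all-digit needles.
    awk -v skip="$skip" -v needle="$needle" \
        'substr($0, skip + 1, length(needle)) == (needle "")'
)
```

For a compressed file the input can be decompressed first, for example `gzip -dc filename | match_at_offset 376 8301685001120`.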
All,
I have a problem with grep/fgrep/egrep. Basically I am building a 200 times 200 correlation matrix. The entries of this matrix need to be retrieved from another very large matrix (~100G). I tried to use the grep/fgrep/egrep to locate each entry and put them into one file. It looks very... (1 Reply)
I was running a program and it stopped and showed "Out of Memory!". At that time, the RAM used by this process was around 4G and the free memory size of the machine was around 30G. Does anybody know what the reason may be? This program is written in Perl. The OS of the machine is Solaris U8. And I... (1 Reply)
Background
-------------
The Unix flavor can be any amongst Solaris, AIX, HP-UX and Linux. I have below 2 flat files.
File-1
------
Contains 50,000 rows with 2 fields in each row, separated by pipe.
Row structure is Object_Id|Object_Name, as follows:
111|XXX
222|YYY
333|ZZZ
... (6 Replies)
I have one big file of size 9GB (big_file.txt). This big file has sentences and paragraphs like any usual English document. I have another file consisting of replacement strings for sed to use. The file name is replace.sed and each entry in one line looks like this:
s/\<shout\>/shout/g
s/\<b is... (2 Replies)
Discussion started by: shoaibjameel123
2 Replies
LEARN ABOUT DEBIAN
parallel-slurp
PARALLEL-SLURP(1)                                            PARALLEL-SLURP(1)
NAME
parallel-slurp - copy files from listed hosts
SYNOPSIS
parallel-slurp [OPTIONS] -h hosts.txt -L destdir remote local
DESCRIPTION
pssh provides a number of commands for executing against a group of computers, using SSH. It's most useful for operating on clusters of
homogeneously-configured hosts.
parallel-slurp gathers specified files from hosts you listed.
OPTIONS
-r --recursive
recursively copy directories (OPTIONAL)
-L --localdir
output directory for remote file copies
-h --hosts
hosts file (each line "host[:port] [user]")
-l --user
username (OPTIONAL)
-p --par
max number of parallel threads (OPTIONAL)
-o --outdir
output directory for stdout files (OPTIONAL)
-e --errdir
output directory for stderr files (OPTIONAL)
-t --timeout
timeout (secs) (-1 = no timeout) per host (OPTIONAL)
-O --options
SSH options (OPTIONAL)
-v --verbose
turn on warning and diagnostic messages (OPTIONAL)
EXAMPLE
An example to copy /home/irb2/foo.txt from each host. Files gathered will be stored in /tmp/outdir/hostname/foo.txt.
# parallel-slurp -h hosts.txt -L /tmp/outdir -l irb2 /home/irb2/foo.txt foo.txt
ENVIRONMENT
All four programs take similar sets of options. All of these options can be set using the following environment variables:
o PSSH_HOSTS
o PSSH_USER
o PSSH_PAR
o PSSH_OUTDIR
o PSSH_VERBOSE
o PSSH_OPTIONS
SEE ALSO
parallel-ssh(1), parallel-scp(1), parallel-nuke(1), parallel-rsync(1), ssh(1)
AUTHOR
Brent N. Chun <bnc@theether.org>
COPYING
Copyright: 2003, 2004, 2005, 2006, 2007 Brent N. Chun
03/30/2009 PARALLEL-SLURP(1)