In memory grep of a large file. Post 302960523 by shoaibjameel123 on Monday 16th of November 2015 01:10:07 PM
Yes, I used the split command to split big_file.txt into smaller chunks of 350MB each. I then tried your command, and others too, on one of these smaller chunks. I noticed that your suggestion takes some time to read and then run. I can post the memory and CPU statistics for these smaller chunks, if you want. In fact, I am thinking of using these smaller chunks so that I can run many greps in parallel instead of using GNU parallel.

But the reason I brought up GNU parallel is that the GNU parallel web page I pointed to above has the following statement below this command:

Code:
cat regexp.txt | parallel --pipe -L1000 --round-robin grep -f - bigfile

Code:
If a line matches multiple regexps, the line may be duplicated. The command will start one grep per CPU and read bigfile one time per CPU, but as that is done in parallel, all reads except the first will be cached in RAM.
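
To make the chunk idea concrete, here is a rough sketch of what I have in mind: split the big file on line boundaries, start one background grep per chunk, wait for all of them, and concatenate the results. The function name chunk_grep, the demo file contents, and the chunk size of 2 lines are made up for illustration; my real chunks are about 350MB each. Since every line is seen by exactly one grep that holds all the patterns, a line matching several regexps should be printed only once, unlike the pattern-splitting command quoted above.

```shell
#!/bin/sh
# Sketch only: chunk_grep is a hypothetical helper, not a standard tool.
# Usage: chunk_grep PATTERN_FILE INPUT_FILE LINES_PER_CHUNK
chunk_grep() {
    patterns=$1
    input=$2
    lines=$3
    tmp=$(mktemp -d) || return 1

    # Split on line boundaries so no line is cut in half.
    split -l "$lines" "$input" "$tmp/chunk_"

    # One grep per chunk, each in the background with its own output file.
    # The glob is expanded once, before any .out file exists.
    for chunk in "$tmp"/chunk_*; do
        grep -f "$patterns" "$chunk" > "$chunk.out" &
    done
    wait    # block until every background grep has finished

    # split names chunks in sorted order, so this keeps the input line order.
    cat "$tmp"/chunk_*.out
    rm -rf "$tmp"
}

# Small demonstration with made-up data standing in for the real files.
demo=$(mktemp -d)
printf '%s\n' 'alpha one' 'beta two' 'gamma three' 'delta four' 'alpha five' > "$demo/big_file.txt"
printf '%s\n' 'alpha' 'gamma' > "$demo/regexp.txt"
chunk_grep "$demo/regexp.txt" "$demo/big_file.txt" 2
rm -rf "$demo"
```

This starts as many greps as there are chunks, so with many chunks it would probably need some limit on the number of simultaneous jobs, which is the part GNU parallel handles automatically.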

 

PREZIP-BIN(1)                 Aspell Abbreviated User's Manual                 PREZIP-BIN(1)

NAME
       prezip-bin - prefix zip delta word list compressor/decompressor

SYNOPSIS
       prezip-bin [ -V | -d | -z ]

DESCRIPTION
       prezip-bin compresses/decompresses sorted word lists from standard input to
       standard output. Prezip-bin is similar to word-list-compress(1) but it allows a
       larger character set of {0x00...0x09, 0x0B, 0x0C, 0x0E...0xFF} and multi-words
       larger than 255 characters in length. It can also decompress
       word-list-compress(1) compatible files.

COMMANDS
       Prezip-bin accepts only one of these commands.

       -V     Display prezip-bin version number to standard output.

       -d     Read a compressed word list from standard input and decompress it to
              standard output. This can be a word-list-compress(1) or a prezip-bin
              compressed file.

       -z     Read a binary word list from standard input and compress it to standard
              output.

EXAMPLES
       prezip-bin -d <wordlist.cwl >wordlist.txt
              Decompress file wordlist.cwl to text file wordlist.txt

       prezip-bin -z <wordlist.txt >wordlist.pz 2>errors.txt
              Compress wordlist.txt to binary file wordlist.pz and send any error
              messages to a text file named errors.txt

       LC_COLLATE=C sort -u <wordlist.txt | prezip-bin -z >wordlist.pz
              Sort a word list, then pipe it to prezip-bin to create a compressed
              binary wordlist.pz file.

       prezip-bin -d <words.pz | aspell create master ./words.rws
              Decompress a wordlist, then pipe it to aspell(1) to create a spelling
              list. Please check the aspell(1) info manual for proper usage and
              options.

TIPS
       Prezip-bin is best used with sorted word list type files. It is not a general
       purpose compression program since resulting files may actually increase in
       size.

       Unlike word-list-compress(1) if your word list has leading or trailing blank
       spaces for formatting purposes, you should remove them first before you
       compress your list using prezip-bin -z, otherwise those spaces will be included
       in the compressed binary output.

DIAGNOSTICS
       Prezip-bin normally exits with a return code of 0. If it encounters an error, a
       message is sent to standard error output (stderr), and prezip-bin exits with a
       non-zero return value. Error messages are listed below:

       (display help/usage message)
              Unknown command given on the command line so prezip-bin displays a usage
              message to standard error output.

       unknown format
              The input file appears not to be an expected format, or may possibly be
              a more advanced format. The output file will be empty.

       corrupt input
              This is only for the decompression command -d. The input file appeared
              to be of a correct format, but something appears wrong now. There may be
              some valid data in output, but due to input corruption, the rest of the
              file can not be completed.

       unexpected EOF
              The input file appeared okay but ended sooner than expected, therefore
              the output file is not complete.

SEE ALSO
       aspell(1), aspell-import(1), run-with-aspell(1), word-list-compress(1)

       Aspell is fully documented in its Texinfo manual. See the `aspell' entry in
       info for more complete documentation.

REPORTING BUGS
       For help, see the Aspell homepage at <http://aspell.net>. Send bug
       reports/comments to the Aspell user list at the above address.

AUTHOR
       This info page was written by Jose Da Silva <digital@joescat.com>.

prezip-bin-0.1.2                          2005-09-30                          PREZIP-BIN(1)