Large search replace using sed results in memory problem.


 
# 1  
Old 12-11-2015

I have one big 9GB file (big_file.txt). It contains sentences and paragraphs like any ordinary English document. I have another file, replace.sed, containing the substitution commands for sed to use, one per line. The entries look like this:
Code:
s/\<shout\>/shout/g
s/\<b is for\>/b_is_for/g
s/\<blue petrossa \>/blue_petrossa_/g
s/\<crocodile dundee\>/crocodile_dundee/g

There are 6 million such replacement commands, each on its own line.

As the entries above show, my objective is to replace the spaces between the words of those phrases with underscores wherever the phrases occur in big_file.txt.
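To make the objective concrete, here is a minimal sketch on a two-line sample. The sample sentences and the temporary file names are made up for illustration; the two rules are taken from the entries above. Note that `\<` and `\>` are GNU sed word-boundary extensions, so this assumes GNU sed.

```shell
# Hypothetical miniature of the setup: a tiny input file and a tiny rule file.
tmpdir=$(mktemp -d)
printf '%s\n' 'I saw crocodile dundee yesterday.' 'b is for blue.' > "$tmpdir/sample.txt"
printf '%s\n' \
    's/\<b is for\>/b_is_for/g' \
    's/\<crocodile dundee\>/crocodile_dundee/g' > "$tmpdir/replace.sed"

# sed applies every rule in the file, in order, to each input line.
result=$(sed -f "$tmpdir/replace.sed" "$tmpdir/sample.txt")
printf '%s\n' "$result"
# prints:
#   I saw crocodile_dundee yesterday.
#   b_is_for blue.

rm -rf "$tmpdir"
```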

I am running the code using this command:
Code:
cat big_file.txt | sed -f replace.sed > outfile

The command runs without any errors. I understand that such a large search and replace will be demanding in both time and space. What I cannot understand is why the memory usage of the above command keeps increasing over time until it takes up all of the computer's primary memory.

Then I used the split command to split big_file.txt into smaller chunks of 500MB each. Running the same sed one-liner on just one of these smaller chunks at a time also keeps consuming more and more memory.
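One detail worth checking when splitting: a plain byte split (`split -b`) can cut a line in the middle, which could break a phrase across two chunks. A minimal sketch, assuming GNU coreutils, where `split -C` caps the chunk size but keeps whole lines together. The sample lines, the `chunk_` prefix, and the 40-byte limit are illustrative stand-ins for the real file and a 500M limit.

```shell
# Hypothetical small stand-in for big_file.txt.
tmpdir=$(mktemp -d)
printf '%s\n' 'crocodile dundee one' 'b is for two' \
    'shout three' 'line four here' > "$tmpdir/big_file.txt"

# -C SIZE: put at most SIZE bytes of complete lines into each chunk.
# For the real file this would be something like: split -C 500M big_file.txt chunk_
split -C 40 "$tmpdir/big_file.txt" "$tmpdir/chunk_"

# Check that every chunk ends on a line boundary.
all_whole=yes
for chunk in "$tmpdir"/chunk_*; do
    # tail -c 1 yields the last byte; a newline becomes empty after $(...)
    [ -z "$(tail -c 1 "$chunk")" ] || all_whole=no
done

# Concatenating the chunks in glob order restores the original content.
original=$(cat "$tmpdir/big_file.txt")
rejoined=$(cat "$tmpdir"/chunk_*)

rm -rf "$tmpdir"
```

This only affects where the chunks are cut; it does not by itself change how much memory sed needs for the rule file.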

I even tried GNU parallel to speed things up, on both the large file and the smaller chunks:
Code:
cat big_file.txt | parallel --pipe sed -f replace.sed > outfile

The above command chokes the entire computer and results in disk thrashing. Any idea why the command takes up this ever-increasing amount of memory? I am using Bash on Slackware.
# 2  
Old 12-11-2015
Parallelizing the process won't help; it will only increase memory congestion, as every process allocates its own memory for the same operations.

Don't cat big_file.txt; have sed read the file directly, which might reduce the system memory consumed by piping and buffering. Also, try splitting the replace file into parts and passing the result of the first part through each of the remaining parts in turn.

Like this (untested):
Code:
sed -f part_rep1 big_file | sed -f part_rep2 | sed -f part_rep3   # etc.

Try this on smaller subsets of both data and script files.
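A minimal sketch of that idea on a small subset, under stated assumptions: the sample data, the `part_rep_` prefix, and the part size of 2 rules are made up for illustration, and the loop is a sequential variation on the pipeline above rather than the pipeline itself. In a pipeline all the sed processes run at the same time, each holding its own part of the rules; the loop below runs one sed at a time, at the cost of one full pass over the data and one temporary copy per part.

```shell
# Hypothetical small stand-ins for big_file and replace.sed.
tmpdir=$(mktemp -d)
printf '%s\n' 'crocodile dundee said b is for blue' 'shout it out' > "$tmpdir/big_file"
printf '%s\n' \
    's/\<shout\>/shout/g' \
    's/\<b is for\>/b_is_for/g' \
    's/\<crocodile dundee\>/crocodile_dundee/g' > "$tmpdir/replace.sed"

# Split the rule file into parts of 2 rules each: part_rep_aa, part_rep_ab, ...
split -l 2 "$tmpdir/replace.sed" "$tmpdir/part_rep_"

# Apply the parts one after another, in the original rule order.
cp "$tmpdir/big_file" "$tmpdir/current"
for part in "$tmpdir"/part_rep_*; do
    sed -f "$part" "$tmpdir/current" > "$tmpdir/next" &&
        mv "$tmpdir/next" "$tmpdir/current"
done
result=$(cat "$tmpdir/current")

# For comparison: the same rules applied in a single sed run.
single=$(sed -f "$tmpdir/replace.sed" "$tmpdir/big_file")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the parts are applied in the same order as the rules appear in replace.sed, the output should match what a single run with the whole rule file would produce for plain `s` commands like these.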
This User Gave Thanks to RudiC For This Post:
# 3  
Old 12-11-2015
Thanks. Running the script now. I will post my observations afterwards.