Optimizing find with many replacements


 
# 1  
Old 04-16-2018

Hello,

I'm looking for advice on how to optimize this bash script. Currently I use the shotgun approach to avoid the file I/O and buffering problems of forked processes trying to write to the same file simultaneously. I'd like to keep this as a fairly portable bash script rather than writing a C routine.

In a nutshell, there are many strings I want to replace in each file. Any particular file may contain some, none, or all of the strings to be replaced.

Currently:

Code:
Longstring='lots of stuff'
spushd $HOME/somepath

gfind . -depth -name "somefile" -type f -writable -exec gsed -i '1{/^#./! s/.*/'"$Longstring"'/}' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/ts=4/ts=2/g' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/sw=4/sw=2/g' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/tab-width: 4/tab-width: 2/g' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/mode: tcl/mode: _tcl/g' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/c-basic-offset: 4/c-basic-offset: 2/g' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/^\s*(size.*)$/\1/g' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/^\s*(md.*)$/\1/g' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/^\s*(rmd.*)$/\1/g' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/^\s*(sha.*)$/\1/g' {} \;
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r 's/^(python.versions.*)$/python.versions 27 36/g' {} \;
spopd

As you can see, these operations run sequentially, which can take quite a while.

Should I modify the find to do depth first?

Can I fork the find and avoid file I/O problems?

Spawn different processes?

Thanks
# 2  
Old 04-16-2018
Just to be clear, is it true that you want the output of the sed from the 1st find to be written to standard output (and not be included in the changes made to the updated files), while all of the other finds run seds that update the files and write nothing to standard output?

Why not run all of the sed commands from the last 10 invocations in a single sed invoked by one find -exec?

And why not use two -execs in a single invocation of find instead of invoking find eleven times?
# 3  
Old 04-16-2018
Quote:
Originally Posted by Don Cragun
Just to be clear, is it true that you want the output of the sed from the 1st find to be written to standard output (and not be included in the changes made to the updated files), while all of the other finds run seds that update the files and write nothing to standard output?

Why not run all of the sed commands from the last 10 invocations in a single sed invoked by one find -exec?

And why not use two -execs in a single invocation of find instead of invoking find eleven times?
Thanks for the help.

All of the sed commands together are basically one large boolean 'OR'.

Any particular file could match any of the sed conditions, from one up to all of them, so sed needs to check every condition in a file before moving on to the next file.

Basically, the script has expanded over time and now it's getting to the point where I'd like to refactor it.

That is sort of the question: is it more efficient to let find walk a massive number of files and let sed chew on one condition at a time? That is what it does now, which is basically unrolling the loops in your suggestion about concatenating the sed commands into two -exec clauses.

Or, as you suggest, should find pause its search while sed grinds through one file, checking all the conditions at once?

Say the average number of files to search is ~100,000, with an average size of ~40k/~100k.

Thanks for the thoughts.

Last edited by f77hack; 04-16-2018 at 05:47 PM.. Reason: typos
# 4  
Old 04-16-2018
I think what Don is trying to tell you is: this command

Code:
gfind . -depth -name "somefile" -type f -writable

will find some list of files. Since it is repeated eleven times, it will find (and hence process) the same list of files eleven times.

So you could put all the changes in the different sed-scripts into one sed-script and write something like:

Code:
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -f /some/where/script {} \;

where /some/where/script would contain
Code:
1 {
     /^#./! s/.*/"$Longstring"/
   }
s/ts=4/ts=2/g
s/sw=4/sw=2/g
s/tab-width: 4/tab-width: 2/g
....

I have to admit you would have to work a bit to get the variable "$Longstring" passed properly, but this minor issue aside you should be a lot faster: you recurse the filesystem only once (instead of eleven times) and you call sed only once instead of eleven times for each file.
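One way to get "$Longstring" into the script file is to let the shell write the file through an unquoted here-document, so the variable is expanded before sed ever reads it. This is only a rough sketch under a few assumptions: GNU find and sed are reachable as `find`/`sed` (set `FIND=gfind SED=gsed` otherwise), the value of `Longstring` is a single line, and `escape_repl`, the sample header text and the scratch directory are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch only: FIND/SED default to find/sed; set FIND=gfind SED=gsed where
# the GNU tools are installed under those names.
FIND=${FIND:-find}
SED=${SED:-sed}

Longstring='# header: a/b & c'   # made-up sample value

# Hypothetical helper: escape backslash, slash and ampersand, which are
# special in the replacement half of s/.../.../ (single-line values only).
escape_repl() {
  printf '%s\n' "$1" | "$SED" 's|[\\/&]|\\&|g'
}

# Scratch tree so the sketch can be run without touching real files.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
printf 'first line\nset ts=4 sw=4\n' > "$demo/a/b/somefile"

# Unquoted here-document: the shell expands $(...) while writing the script.
script="$demo/edits.sed"
cat > "$script" <<EOF
1{/^#./! s/.*/$(escape_repl "$Longstring")/}
s/ts=4/ts=2/g
s/sw=4/sw=2/g
s/tab-width: 4/tab-width: 2/g
EOF

# One tree walk, one sed process per file, all edits applied together.
"$FIND" "$demo" -depth -name "somefile" -type f -writable \
  -exec "$SED" -i -f "$script" {} \;

result=$(cat "$demo/a/b/somefile")
printf '%s\n' "$result"
rm -rf "$demo"
```

Because the here-document is unquoted, any sed line that contains a backslash followed by `$`, a backtick or another backslash would need extra care; the lines used here contain none.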

I hope this helps.

bakunin
# 5  
Old 04-16-2018
@bakunin yes, this is exactly what i was looking to do.

Thank you.

P.S. How would you expand the script if, instead of "somefile", there were an array "somefiles=()"? Would you spawn off gfind?

Last edited by f77hack; 04-16-2018 at 08:42 PM.. Reason: add PS
# 6  
Old 04-16-2018
Using bash, and depending on how big somefiles[] is (you don't want to exceed the command-line length limit):

Code:
 gfind . \( -false ${somefiles[@]/#/-o -name } \) -type f ...
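The one-liner above relies on unquoted word splitting, so a name containing spaces or glob characters can get mangled. A possible alternative, sketched here with made-up sample names and a scratch directory, builds the `-name` tests as a separate bash array with one word per element. It assumes `somefiles` is not empty (an empty `\( \)` is an error for find) and that GNU find is reachable as `find` (set `FIND=gfind` otherwise).

```shell
#!/usr/bin/env bash
# Sketch only: the names and the scratch tree below are made up.
FIND=${FIND:-find}

somefiles=(Portfile "my file" '*.tcl')

# Build "-name A -o -name B ..." with exactly one array element per word,
# so spaces and glob characters in the names reach find untouched.
name_tests=()
for f in "${somefiles[@]}"; do
  if [ "${#name_tests[@]}" -gt 0 ]; then
    name_tests+=(-o)
  fi
  name_tests+=(-name "$f")
done

# Scratch tree for the demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/Portfile" "$demo/sub/my file" "$demo/sub/x.tcl" "$demo/other"

# The parentheses keep the -o chain from swallowing the later -type f test.
found=$(cd "$demo" && "$FIND" . \( "${name_tests[@]}" \) -type f | LC_ALL=C sort)
printf '%s\n' "$found"
rm -rf "$demo"
```

The rest of the command (`-writable -exec gsed ...`) would follow `-type f` exactly as in the single-name version.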

# 7  
Old 04-17-2018
You can still have an embedded sed script.
All shells except (t)csh allow multiline strings:
Code:
echo 'two
lines'

So the following should work
Code:
gfind . -depth -name "somefile" -type f -writable -exec gsed -i -r '
    1{/^#./! s/.*/'"$Longstring"'/}
    s/ts=4/ts=2/g
... 
    s/^(python.versions.*)$/python.versions 27 36/g
' {} \;

@Don, not true: -i is given in all of the sed invocations, including the first, so every one of them writes to the file rather than to standard output.
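On the earlier questions about speed and forking, two further variations may be worth trying; this is a sketch, not a measured result. Ending `-exec` with `+` instead of `\;` hands many files to each sed process, and GNU sed's -i treats every file separately, so the `1` address still means the first line of each file. Piping into `xargs -P` runs several sed processes at once, and since each file is passed to exactly one of them, no two processes write to the same file. The sketch assumes GNU find, sed and xargs (`-print0`, `-0` and `-P` are extensions, not POSIX); `make_tree`, the header text and the batch sizes are made up.

```shell
#!/usr/bin/env bash
# Sketch only: FIND/SED default to find/sed; set FIND=gfind SED=gsed where
# the GNU tools are installed under those names.
FIND=${FIND:-find}
SED=${SED:-sed}

script='
1{/^#./! s/.*/# new header/}
s/ts=4/ts=2/g
s/sw=4/sw=2/g
'

demo=$(mktemp -d)
make_tree() {            # hypothetical helper: three sample files under $1
  for i in 1 2 3; do
    mkdir -p "$1/d$i"
    printf 'header\nts=4 sw=4\n' > "$1/d$i/somefile"
  done
}
make_tree "$demo/batch"
make_tree "$demo/parallel"

# Variation A: "+" batches many files into each sed invocation.
"$FIND" "$demo/batch" -name "somefile" -type f -writable \
  -exec "$SED" -i -r -e "$script" {} +

# Variation B: up to 3 sed processes in parallel, 2 files per process.
"$FIND" "$demo/parallel" -name "somefile" -type f -writable -print0 |
  xargs -0 -n 2 -P 3 "$SED" -i -r -e "$script"

# Count the files that received each edit.
changed=$(grep -rl 'ts=2 sw=2' "$demo" | wc -l | tr -d ' ')
headers=$(grep -rlx '# new header' "$demo" | wc -l | tr -d ' ')
printf 'edited: %s, new headers: %s\n' "$changed" "$headers"
rm -rf "$demo"
```

Whether the parallel variation helps on ~100,000 files depends on the disk and would need to be timed on the real tree.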