Shells, forks, subprocesses... oh my

03-10-2017

All,
I've been reading to try to get an abstract idea of the process efficiency of commands such as sed, bash, perl, awk, find, grep, etc.

Which of them will spawn, fork, or launch a subshell, and under what conditions?
How do you know which commands have the faster and better stdio implementation?

So I am looking for some guru advice instead of running thousands of use cases for different configurations.

Example: finding a specific line in multiple files spanning a volume.

I can use something like this:

Code:

# LINENO stands for the wanted line number, e.g. '5q;d' for line 5
sed 'LINENOq;d' "$dir/$filename"

which seems very fast for searching many (60,000+) files of under 10 KB each (ASCII, UTF-8), but one could also use

Code:

tail -n +LINENO "$dir/$filename" | head -n 1

which seems fairly fast as well. One could probably also come up with a few one-liners in perl.
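
As a rough sanity check that the two commands above pick out the same line, here is a minimal sketch run against a scratch file. The file contents, the temp path, and the variable n are made up for illustration; they are not from any real data set.

```shell
# Build a small scratch file (hypothetical data) with ten numbered lines.
tmp=$(mktemp)
i=1
while [ "$i" -le 10 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$tmp"

n=7   # hypothetical target line number

# sed: delete every line before n, then quit at line n (which auto-prints it).
with_sed=$(sed "${n}q;d" "$tmp")

# tail/head: start output at line n, then keep only the first line of that.
with_tail=$(tail -n "+$n" "$tmp" | head -n 1)

echo "sed:  $with_sed"
echo "tail: $with_tail"
rm -f "$tmp"
```

Both variables should hold the same text. Note the sed form is one process per file, while the tail | head form is two plus a pipe, which may matter across tens of thousands of files.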

f77hack

03-10-2017

sed, bash, perl, awk, find, and grep are all separate programs, so every invocation of one is a new process. A subshell is a process too. A fork is a fork is a fork.

Whether any of these is faster or slower than other ways to solve your problem really depends on your problem and the algorithm you use to solve it. So "one solution to solve everything, forever" may be out the window.
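
On "a subshell is a process": one visible consequence is that variable assignments made inside a subshell never reach the parent shell. A minimal sketch, with x as a throwaway variable name chosen only for illustration (most shells implement the subshell with a fork, though that is an implementation detail):

```shell
x=outer
( x=inner )                 # parentheses: runs in a subshell, change is lost
after_subshell=$x

{ x=inner; }                # braces: runs in the current shell, change sticks
after_group=$x

echo "after ( ): $after_subshell"
echo "after { }: $after_group"
```

So when grouping commands purely for redirection or readability, braces avoid the extra subshell that parentheses ask for.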

There are some cardinal sins to avoid:

Don't reprocess the same file repeatedly. You can almost always do in one pass whatever you could do in two.
Don't launch whole processes to process tiny amounts of data. echo "a b c" | awk '{ print $1 }' is a tragic waste; this is where shell builtins can be thousands of times more efficient.
Don't run your innermost loop in the shell; it will be slow. A while read loop going line by line over a file will be slower than awk '{ something }' filename. Shell is for the high-level things, not the nitty-gritty bulk work. This is where externals can be thousands of times more efficient.
If you're doing cat | awk | sed | cut | tr | kitchen | sink, put it all in one awk. awk is a programming language capable of replacing all of these with some near-trivial code, and one awk call will be faster than a chain of ten other commands.
Avoid Useless Use of Cat. Nothing needs a cat | in front of it to read a file.
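
A few of the points above can be sketched roughly like this. The scratch file, its name:shell layout, and all variable names are hypothetical, invented only to have something to run against.

```shell
# Scratch input (hypothetical data): name:shell pairs, one per line.
tmp=$(mktemp)
printf '%s\n' 'alice:bash' 'bob:ksh' 'carol:bash' > "$tmp"

# Tiny data: a parameter expansion stands in for
#   echo "a b c" | awk '{ print $1 }'
# and needs no extra process at all.
line="a b c"
first=${line%% *}          # strip everything from the first space onward

# One awk instead of cat | grep | cut | tr: filter rows, pick a field,
# upper-case it. awk opens the file itself, so no cat is needed.
bash_users=$(awk -F: '$2 == "bash" { print toupper($1) }' "$tmp")

# Bulk work: let awk run the per-line loop rather than a shell while read.
count=$(awk 'END { print NR }' "$tmp")

echo "$first"
echo "$bash_users"
echo "$count"
rm -f "$tmp"
```

The parameter expansion only covers the simple single-space case; anything fancier is still cheaper in the shell than in a fresh awk as long as the data stays tiny.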

Corona688

03-15-2017

Many thanks. This is what I was looking for: general rules of thumb.

f77hack
