Avoiding external utilities


 
# 1  
01-07-2018

In the past, I'd have been fine running commands like this:

Code:
case 1:

printf '%s\n' "${MASSIVETEXT}" | egrep "i am on the first line"

case 2:

printf '%s\n' "${MASSIVETEXT}" | egrep -v "i am on the first line"

This works fine, but it calls the external utilities "egrep" and "printf". Yes, printf is actually a utility; I thought it was a function.

Nevertheless, here's what I'm trying to do to avoid external commands:

Code:
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"

mea="i am on the first line"
meb="i am on the second line"

case "${MASSIVETEXT}" in
*"${mea}"*)
        echo "${MASSIVETEXT}"
        ;;
*"${meb}"*)
        echo "${MASSIVETEXT}"
        ;;
esac

The problem here is that when the string in variable "mea" or "meb" is found, the code outputs everything in the $MASSIVETEXT variable.

I only want it to output the line containing the string in either the "mea" or the "meb" variable.
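Because case matches its pattern against the whole value of $MASSIVETEXT, a pattern like *"${mea}"* succeeds for the entire multi-line string, so there is no single line to print. One way to cut out just the matching line with shell builtins only is POSIX parameter expansion: strip everything from the first match onward and keep what follows the last newline, then do the same in the other direction. This is a sketch, not code from the thread; first_line_containing is a hypothetical name, and it uses prefixed global helper variables because POSIX sh has no "local".

```shell
# Hypothetical helper: print the first line of TEXT that contains the fixed
# string NEEDLE, using only POSIX parameter expansion (no loop, no external
# utility). Returns 1 if NEEDLE does not occur in TEXT.
# Usage: first_line_containing NEEDLE TEXT
first_line_containing() {
    _flc_needle=$1
    _flc_text=$2
    case $_flc_text in
        *"$_flc_needle"*) ;;          # quoted: NEEDLE is matched literally
        *) return 1 ;;
    esac
    _flc_nl='
'
    _flc_before=${_flc_text%%"$_flc_needle"*}  # text before the first match
    _flc_after=${_flc_text#*"$_flc_needle"}    # text after the first match
    _flc_before=${_flc_before##*"$_flc_nl"}    # ...from the start of that line
    _flc_after=${_flc_after%%"$_flc_nl"*}      # ...up to the end of that line
    printf '%s\n' "$_flc_before$_flc_needle$_flc_after"
}

MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"
mea="i am on the first line"
meb="i am on the second line"

# Prefer a line containing $mea; fall back to one containing $meb.
first_line_containing "$mea" "$MASSIVETEXT" ||
    first_line_containing "$meb" "$MASSIVETEXT"
```

The string is still scanned by the shell, but no process is started.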
# 2  
01-07-2018
Quote:
Originally Posted by SkySmart
Nevertheless, here's what I'm trying to do to avoid external commands:
It is a good idea to avoid external utilities when possible, but that doesn't mean you should never use them. In fact, they are quite good at what they are supposed to do, and as long as you do not misuse them you are perfectly entitled to use them in your code.

In your case, where you search for a fixed string, you do not need egrep at all. egrep uses a very powerful (and therefore resource-intensive) regexp engine, but you don't need that to search for a fixed string. Use fgrep ("fast grep") instead, which searches for fixed strings only and therefore does not need a regexp engine at all.

What you want to do cannot be done in shell. Actually it can be done, but not economically. It would look similar to this:

Code:
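# quoted printf keeps the newlines; an unquoted "echo ${var}" joins all lines into one
printf '%s\n' "${MASSIVETEXT}" | while IFS= read -r line ; do
     case $line in
          *"$mea"*)
               echo "$line"
               ;;

          *"$meb"*)
               echo "$line"
               ;;

     esac
done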

I expect that to be slower than using (any) grep, though. By the way, echo is a shell builtin if you use bash.

I hope this helps.

bakunin
# 3  
01-08-2018
Well, first to the usage of fgrep: I don't think you will notice any difference in run time compared to egrep. The main advantage of fgrep is not speed but convenience: with fgrep, you don't have to escape characters that have a special meaning inside a regexp.

Also, note that egrep and fgrep are obsolete; the recommended forms are grep -E and grep -F. Of course, especially when typing at the command line, fgrep is faster to type than grep -F.
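To illustrate the convenience argument: with grep -F every character of the pattern is literal, whereas with grep -E characters such as "." and "*" are regexp operators. In this small sketch (the sample text is made up), the fixed string 3.50* matches only the line that contains it literally, while the same string read as an extended regexp also matches "3x50".

```shell
# Sample input, made up for illustration.
text='price: 3.50*2
price: 3x50 2'

# grep -c prints the number of matching lines.
# Fixed string: "3.50*" must appear literally, so only line 1 matches.
printf '%s\n' "$text" | grep -c -F '3.50*'    # prints 1
# Extended regexp: "." matches any character and "0*" means zero or more
# zeros, so "3x50" on line 2 matches as well.
printf '%s\n' "$text" | grep -c -E '3.50*'    # prints 2
```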

Now to the problem at hand: while I don't see how you can avoid an external utility (unless you write your own shell version of grep, which is possible but not necessarily faster for large input), you can, if you are using bash or zsh, at least get rid of the pipe:

Code:
grep -F 'i am on the first line' <<<"$MASSIVETEXT"
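Here-strings are an extension found in bash, ksh93 and zsh, not part of POSIX. In a plain POSIX sh, a here-document does the same job, also without a pipe. A minimal sketch; grep_in_text is a hypothetical helper name:

```shell
# Hypothetical helper: grep_in_text FIXED_STRING TEXT
# POSIX replacement for:  grep -F -e FIXED_STRING <<<"$TEXT"
# The here-document hands TEXT to grep on stdin without a pipe.
grep_in_text() {
    grep -F -e "$1" <<EOF
$2
EOF
}

MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"

grep_in_text 'i am on the first line' "$MASSIVETEXT"
```

Note that the here-document body and its closing EOF must start in column 1 (or use <<- with leading tabs).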

# 4  
01-08-2018
Hi SkySmart,
To avoid using any external utilities with most shells written since 1985 (including bash and ksh), I would use something more like:
Code:
#!/bin/ksh
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"

mea="i am on the first line"
meb="i am on the second line"

found=

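# AT&T ksh runs the last command of a pipeline in the current shell, so "exit"
# ends the script and "found" survives the loop; bash runs it in a subshell.
printf '%s\n' "$MASSIVETEXT" | while IFS= read -r line
do
	case "$line" in
	(*"$mea"*)
		echo "$line"
		exit;;
	(*"$meb"*)
		# keep only the first line that matches $meb
		[ -z "$found" ] && found="$line";;
	esac
done
[ -n "$found" ] && printf '%s\n' "$found"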

to do what I think you're trying to do, i.e., to search the entire variable contents for a match for $mea before looking for any match for $meb, and to print only the first line in $MASSIVETEXT that matches the appropriate pattern.
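The script above relies on AT&T ksh running the last command of a pipeline in the current shell. In bash (unless "shopt -s lastpipe" is in effect) the while loop runs in a subshell, so "exit" only leaves that subshell and the value assigned to found is lost. A variant that should behave the same in any POSIX shell feeds the loop from a here-document instead of a pipe and wraps it in a function so it can "return" instead of "exit". This is a sketch; pick_line is a hypothetical name.

```shell
# Hypothetical function: print the first line of $MASSIVETEXT containing $mea;
# failing that, the first line containing $meb. Uses shell builtins only.
pick_line() {
    found=
    while IFS= read -r line
    do
        case $line in
        (*"$mea"*)
            printf '%s\n' "$line"
            return 0;;
        (*"$meb"*)
            # remember only the first line that matches $meb
            [ -z "$found" ] && found=$line;;
        esac
    done <<EOF
$MASSIVETEXT
EOF
    # exit status is 1 if neither string was found
    [ -n "$found" ] && printf '%s\n' "$found"
}

MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"
mea="i am on the first line"
meb="i am on the second line"
pick_line
```

A redirection on a loop does not create a subshell in POSIX shells, which is why found and return work here.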

Although echo and printf are almost always provided as built-ins in recently developed (i.e., since the 1970s) shells, the standards also require them to be available as stand-alone utilities in one of the directories listed in the POSIX-conforming default setting of the PATH environment variable. This is true for all utilities defined by the standards except the special built-in utilities: break, :, continue, ., eval, exec, exit, export, readonly, return, set, shift, times, trap, and unset. (Beware, however, that not all systems conform to the POSIX requirements. You may find some systems that don't have every required utility available as a stand-alone utility; read is an example of a utility that is missing on some non-conforming systems.)

To determine whether a given utility is built into your shell, the standards say you can use:
Code:
type utility_name...

With ksh (version 93u+) on macOS 10.13.2, the command:
Code:
type echo printf set cat

produces the output:
Code:
echo is a shell builtin
printf is a shell builtin
set is a special shell builtin
cat is a tracked alias for /bin/cat

while with bash (version 3.2.57) on the same system, the output produced is:
Code:
echo is a shell builtin
printf is a shell builtin
set is a shell builtin
cat is /bin/cat

Note that bash doesn't distinguish between regular built-ins and special built-ins like ksh does. Both meet the required specifications in the standards for the output produced by type.
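Besides type, POSIX also specifies "command -v", whose output is easier to test in a script: a utility found through a PATH search is reported as an absolute pathname, while in the common shells a built-in, function or reserved word is reported by its bare name (an alias by its definition). The helper below is a sketch built on that rule; is_builtin is a hypothetical name, it treats functions and aliases like built-ins, and unlike ksh's type it does not distinguish regular from special built-ins.

```shell
# Hypothetical helper: succeed if NAME is not resolved to a file on PATH,
# i.e. the shell itself provides it (built-in, function, alias or keyword).
# Return 1 for an external utility, 2 if NAME is not found at all.
# Usage: is_builtin NAME
is_builtin() {
    case $(command -v "$1") in
        '') return 2 ;;    # not found at all
        /*) return 1 ;;    # external utility, e.g. /bin/cat
        *)  return 0 ;;    # provided by the shell itself
    esac
}

for u in echo printf set cat
do
    if is_builtin "$u"
    then printf '%s: provided by the shell\n' "$u"
    else printf '%s: external (or not found)\n' "$u"
    fi
done
```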


Hi bakunin,
Note that SkySmart seems to only want to print the 1st line in $MASSIVETEXT that is matched by $mea and, only if no match for that fixed string is found, then print the 1st line in $MASSIVETEXT that matches $meb. The code you suggested will print every line matching either fixed string. Using only standard grep options, the above code may well be faster than fgrep even for medium-sized contents of the variable MASSIVETEXT, especially if neither fixed string is present in $MASSIVETEXT or if the 1st fixed string appears early in it.

Since SkySmart hasn't told us what OS and shell are being used, we need to stick to standard options, leading to something like:
Code:
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"

mea="i am on the first line"
meb="i am on the second line"

{	printf '%s\n' "$MASSIVETEXT" | grep -F "$mea" ||
	printf '%s\n' "$MASSIVETEXT" | grep -F "$meb"
} | {	read -r line
	[ -n "$line" ] && printf '%s\n' "$line"
}

which should work with any POSIX-conforming shell and grep utility. On many systems, the following would frequently be much faster, but it depends on the non-standard -m max_count option being supported by the user's grep utility:
Code:
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"

mea="i am on the first line"
meb="i am on the second line"

printf '%s\n' "$MASSIVETEXT" | grep -F -m 1 "$mea" ||
    printf '%s\n' "$MASSIVETEXT" | grep -F -m 1 "$meb"

Note that if neither string is present in $MASSIVETEXT, you have to invoke grep twice and read the entire "massive" text twice.

The shell being used hasn't been specified either. With a recent bash or ksh, the above scripts could avoid most of the printf invocations by using here-strings:
Code:
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"

mea="i am on the first line"
meb="i am on the second line"

found=

while IFS= read -r line
do
	case "$line" in
	(*"$mea"*)
		echo "$line"
		exit;;
	(*"$meb"*)
		# keep only the first line that matches $meb
		[ -z "$found" ] && found="$line";;
	esac
done <<<"$MASSIVETEXT"
[ -n "$found" ] && printf '%s\n' "$found"

Code:
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"

mea="i am on the first line"
meb="i am on the second line"

{	grep -F "$mea" <<<"$MASSIVETEXT" ||
	grep -F "$meb" <<<"$MASSIVETEXT"
} | {	read -r line
	[ -n "$line" ] && printf '%s\n' "$line"
}

Code:
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"

mea="i am on the first line"
meb="i am on the second line"

grep -Fm1 "$mea" <<<"$MASSIVETEXT" || grep -Fm1 "$meb" <<<"$MASSIVETEXT"

Hi rovf,
On every system I've seen, for a large amount of data (which one might assume from a variable named MASSIVETEXT), there is a noticeable difference in performance between grep -F (fastest), grep without -E and without -F (slower), and grep -E (slower still). However, with fixed strings as REs, I don't usually see much difference between plain grep and grep -E. I don't have any experience with where grep -P fits into the speed spectrum on systems that include support for perl's RE extensions in grep.
# 5  
01-08-2018
Quote:
Originally Posted by Don Cragun
Hi bakunin,
Note that SkySmart seems to only want to print the 1st line in $MASSIVETEXT that is matched by $mea and, only if no match for that fixed string is found, then print the 1st line in $MASSIVETEXT that matches $meb. The code you suggested will print every line matching either fixed string.
You are right. Only now do I notice that I misread that part of his question. My bad. But the point I wanted to make was to use grep (or any other utility, for that matter) where it makes sense. The code was only added for illustration.

bakunin
# 6  
01-08-2018
Quote:
Originally Posted by Don Cragun
Hi rovf,
On every system I've seen, for a large amount of data (which one might assume from a variable named MASSIVETEXT), there is a noticeable difference in performance between grep -F (fastest), grep without -E and without -F (slower), and grep -E (slower still).
Interesting point. I tried it with a 300 MB file, using Cygwin grep and searching for a fixed string with various settings. I repeated each run 3 times. Here are the times:

with -F:

Code:
0.27s user 0.08s system 91% cpu 0.372 total

0.30s user 0.05s system 93% cpu 0.367 total

0.28s user 0.06s system 95% cpu 0.358 total

without option:

Code:
0.22s user 0.14s system 98% cpu 0.362 total

0.31s user 0.05s system 98% cpu 0.363 total

0.22s user 0.12s system 95% cpu 0.358 total

with -E:

Code:
0.25s user 0.09s system 93% cpu 0.367 total

0.27s user 0.08s system 95% cpu 0.359 total

0.25s user 0.11s system 98% cpu 0.365 total

Of course, it could be that the amount of data is not massive enough to show a significant difference, or that the I/O of the Cygwin layer is so heavy that it masks performance differences. Or that GNU grep is clever enough to recognize that the pattern does not contain any regexp characters and always does an internal '-F' in that case.

To evaluate the last hypothesis, I replaced the pattern with one that really contained an extended regexp instead of a fixed string (using a .+ regexp operation). In this case, the times were systematically higher, but not by much:

Code:
0.37s user 0.11s system 99% cpu 0.486 total

0.36s user 0.09s system 93% cpu 0.481 total

0.30s user 0.14s system 89% cpu 0.486 total

Perhaps such a test should really be repeated with a file which is several GB in size.

Interestingly, in the OP's problem, the whole content of the file was stored in a shell variable, and this makes me wonder where the practical limit is for the content of a variable in, say, bash or ksh...

Last edited by rbatte1; 01-08-2018 at 10:49 AM.. Reason: Added CODE tags for output