How to form a correct syntax to sift out according to complementary patterns with 'find'?


 
# 1  
Old 01-13-2018
Apple

I need to find all files and folders whose path contains keyword, from the topmost directory down through the whole tree, while omitting all references to keyword in web-search logs and entries, i.e. excluding search and browsing history made with web-browser1, web-browser2, web-browser3 (skipping all entries of the type "/Users/myuser/Library/web-browser1, 2, 3/History/keyword/blahblahblah" etc.).

I use

Code:
echo MYPASSWORD | sudo -S find -E / -regex '.*/(keyword|KEYWORD)/.*' -and -not -path '.*/(web-browser1|web-browser2|web-browser3)/.*'

So far it only gets the first half of this expression right (the part ending with keyword) but fails to apply the second half. I also tried piping the output of the first half to grep -v with the same argument, and I changed the second half to
Code:
\! -path '.*/(web-browser1|web-browser2|web-browser3)/.*'

only to end up with the same result (all entries with the paths, each containing keyword). What am I doing wrong? Is it possible to do a complementary match with find, or did I fail to arrange the regular expression correctly?

Last edited by scrutinizerix; 01-17-2018 at 04:51 PM..
# 2  
Old 01-13-2018
Hi,

I think this might be the time to break out the -prune flag to find. It can be used to exclude from your results everything that matches particular criteria.

So, an example. I created a sort-of-similar directory tree to yours, with one single test file containing the text "FOO" copied to all directories.

So a simple find with an appropriate -exec returns this:

Code:
$ find . -type f -exec grep -l FOO \{\} \;
./Users/myuser/Library/browser3/test
./Users/myuser/Library/Stuff/test
./Users/myuser/Library/Things/test
./Users/myuser/Library/browser2/test
./Users/myuser/Library/browser1/test
$
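As a side note on the grep -l used above: -l lists only the names of files in which the pattern was found, while -L lists the names of files in which it was not found. A minimal sketch, assuming two made-up files (has_foo and no_foo) in a scratch directory:

```shell
# Scratch directory with one file that contains FOO and one that does not.
# The file names are hypothetical and exist only for this illustration.
demo=$(mktemp -d)
printf 'FOO\n' > "$demo/has_foo"
printf 'bar\n' > "$demo/no_foo"

# -l: names of files WITH at least one matching line
with_match=$(cd "$demo" && grep -l FOO has_foo no_foo)
# -L: names of files WITHOUT any matching line
without_match=$(cd "$demo" && grep -L FOO has_foo no_foo)

printf 'with match: %s\nwithout match: %s\n' "$with_match" "$without_match"
rm -rf "$demo"
```

In both cases grep can stop reading a file at its first matching line, because one match is enough to decide whether the file name belongs in the list.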

So far, so normal. But now here's what happens if I use prune to exclude directories that match the pattern browser?:

Code:
$ find . -type d -name 'browser?' -prune -o -type f -exec grep -l FOO \{\} \;
./Users/myuser/Library/Stuff/test
./Users/myuser/Library/Things/test
$

The files underneath browser1, browser2 and browser3 were ignored, because even though they themselves matched we'd removed all directories whose name matched the pattern 'browser?' from consideration.

The basic syntax for prune is you set up your conditions first for what you want to exclude (so -type d -name 'browser?' in this case), and then after the -prune -o you put the conditions for what you actually want to match once what's being pruned is excluded.
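The same pattern extends to several directory names and to matching on names rather than on file contents. Below is a minimal sketch, assuming a made-up tree; the directory names Safari and Firefox and the keyword globs only stand in for whatever you actually need. Since -name takes shell globs, the alternatives are joined with -o inside escaped parentheses:

```shell
# Throwaway tree; every name in it is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/Library/Safari/History/keyword" \
         "$demo/Library/Firefox/KEYWORD" \
         "$demo/Library/Caches/keyword" \
         "$demo/Library/Preferences"
touch "$demo/Library/Preferences/com.example.KEYWORD.plist"

# Left of -o: directories to skip. Right of -o: what to report.
# The explicit -print matters: with no action at all, find would apply an
# implicit -print to the whole expression and list the pruned directories too.
pruned=$(cd "$demo" && find . \
    -type d \( -name Safari -o -name Firefox \) -prune \
    -o \( -name '*keyword*' -o -name '*KEYWORD*' \) -print | LC_ALL=C sort)

printf '%s\n' "$pruned"
rm -rf "$demo"
```

Only the entries under Caches and Preferences should be listed; everything below Safari and Firefox is never visited.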

Hope this helps.
# 3  
Old 01-17-2018
Apple

Hi,
Thanks for your reply. I tried your syntax but it did not do what I wanted. Actually, the names I gave in my previous post as browser1, browser2, browser3 stand for quite different names: Safari, Opera, Firefox, so the regex in this case should be
Code:
(Safari|Opera|Firefox)

(is it correct, btw?) and since these are just parts of paths, the question is how to arrange my regexes around them. I'd like directories to be skipped (in which case I composed the regex as
Code:
'.*/(Safari|[Oo]pera|Firefox|[Mm]ozilla)/.*'

, though I'm uncertain of its correctness; with this notation I got either no output or the opposite of what I wanted) or all types of files containing one of those names (
Code:
'.*(Safari|Opera|Firefox).*'

?). I'm not sure how to handle that.

In my case I tried
Code:
'.*(Safari|Opera|Firefox).*'

and in this scenario I added the -E option to find and put -regex in front of
Code:
'.*(Safari|Opera|Firefox).*'

so the most common variant, to which the other combinations I used can be reduced, is

Code:
echo MYPASSWORD | sudo -S find -E / -regex '.*(Safari|[Oo]pera|Firefox|[Mm]ozilla).*' -prune -o -name '.*(keyword|KEYWORD).*'

I've tried so many variants I can barely remember which option produced what. I tried appending -exec (
Code:
echo MYPASSWORD | sudo -S find -E / -regex '.*(Safari|[Oo]pera|Firefox|[Mm]ozilla).*' -prune -o -exec grep '.*(keyword|KEYWORD).*' {} ';'

- it printed many lines of the kind "grep: .*(Safari|[Oo]pera|Firefox|[Mm]ozilla).*: no such file or directory"), I tried piping to grep, with the latter used as an argument to xargs (
Code:
echo MYPASSWORD | sudo -S find -E / -regex '.*(keyword|KEYWORD).*' | xargs -I {} grep -RLE {} '.*(Safari|[Oo]pera|Firefox|[Mm]ozilla).*'

and got the message "no termination character" or something like that).
Right now it hangs with no output or error message at all when executing
Code:
echo MYPASSWORD | sudo -S find -E / -name  '.*(Safari|[Oo]pera|Firefox|[Mm]ozilla).*' -prune -o -type f -exec grep -il *keyword*  {} ';'

. I tried using {} + at the end of this line instead of {} ';', and I added and omitted the -type option, with no difference either. Interesting that when I used operators -and -not -path (or -name in place of -path) '.*(Safari|[Oo]pera|Firefox|[Mm]ozilla).*' it would return results containing one of those names/paths. That's just weird.

The shell is bash 3.2.48
# 4  
Old 01-17-2018
Quote:
Originally Posted by scrutinizerix
[....]
Interesting that when I used operators -and -not -path (or -name in place of -path) '.*(Safari|[Oo]pera|Firefox|[Mm]ozilla).*' it would return results containing one of those names/paths. That's just weird.
I don't see why you think that is weird. -and instead of -a and -not instead of ! are not an issue. They are, respectively, synonyms. But -name and -path are completely different. The -name primary is not affected by the -E option and the pattern used for filename (not pathname) matches is a shell pathname matching pattern. So:
Code:
find -E -name '.*(Safari|[Oo]pera|Firefox|[Mm]ozilla).*'

is looking for a file with a name that starts with a <period> followed by any string of zero or more characters followed by an <open-parenthesis> followed by the string Safari followed by a <vertical-bar> followed by an upper-case or lower-case o followed by the string pera|Firefox| followed by an upper-case or lower-case m followed by the string ozilla followed by a <close-parenthesis> followed by a <period> followed by a string of zero or more characters. My guess would be that you don't have any directories that are matched by that filename matching pattern so no directories are pruned from your search.
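To see the difference in practice, here is a small sketch, assuming a scratch directory with made-up subdirectories. It feeds the regex-looking string to -name (where it is read as a shell pattern) and then spells the same alternatives as separate -name patterns joined with -o:

```shell
# Scratch tree; the directory names are only for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/Safari" "$demo/opera" "$demo/Mozilla" "$demo/Other"

# Read as a shell pattern, the parentheses, bars and dots are literal
# characters, so none of the directories above can match.
as_glob=$(cd "$demo" && find . -type d \
    -name '.*(Safari|[Oo]pera|Firefox|[Mm]ozilla).*' | LC_ALL=C sort)

# Alternation for -name is written with -o between separate patterns.
with_or=$(cd "$demo" && find . -type d \
    \( -name Safari -o -name '[Oo]pera' -o -name Firefox -o -name '[Mm]ozilla' \) \
    | LC_ALL=C sort)

printf 'as glob: [%s]\n' "$as_glob"
printf 'with -o:\n%s\n' "$with_or"
rm -rf "$demo"
```

The first find prints nothing, while the second lists Mozilla, Safari and opera but not Other.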

As you have been told before, if you don't tell us what operating system and shell you're using, questions like this waste a lot of our time and yours guessing at what might or might not be failing on your end because you're using options that are only available on some systems and are using some options that may behave differently on different operating systems.

Furthermore, your specification is not at all clear. Sometimes you're trying to exclude pathnames that contain a directory name that is a case-insensitive spelling of "keyword". Other times you're trying to exclude regular files (no matter what the file's name is) if the file contains a case-insensitive spelling of keyword.

You haven't shown us any sample filenames or pathnames, nor the contents of files that should and should not have their pathnames printed, nor what should be printed in addition to their pathnames (if anything), ...

Please give us a clear specification of:
  1. what you are trying to do,
  2. what operating system you're using,
  3. what shell you're using,
  4. what the file hierarchy you're searching looks like,
  5. what files look like that should be searched,
  6. what files look like that should not be searched, and
  7. the output you are trying to produce from that sample file hierarchy.
# 5  
Old 01-17-2018
Apple

I thought it was obvious that my system is OS X from the Apple icon I stuck at the top of my message, and besides, I indicated at the bottom that my shell is bash 3.2.48.

I have an app whose bundle id contains the word keyword. This word is also a significant part of the names of all the files and folders that were installed or created together with the main bundle when I installed and ran the app. On OS X these files can be spread across the entire system (HFS+), beginning with /private/var/db or /private/var/folders, the /Library folders and, of course, those of ~/Library (plists, cache files, application support folders etc. residing in different places). Unfortunately, when searching with the most basic form of find (find / -name keyword) I got a bunch of pathnames that are references to web entries containing the keyword, whose practical value is negligible. I need to filter those out and keep only the files and folders that belong to the items created by the app proper. They contain either keyword or KEYWORD. That's why I used the alternation operator | with the items grouped inside parentheses. Since the web-search history contains the same keyword as part of pathnames that each also contain the name of one of these web browsers, I wanted to skip every pathname containing the name of any of the browsers. As a browser's name appears both in the pathnames of some of the browser's folders (like /Users/myuser/Library/Safari, /Users/myuser/Library/Cache/Safari, /Users/myuser/Library/Preferences/com.apple.Safari.plist etc.; the same is true for Firefox, which sometimes has "mozilla" or "Mozilla" in its folder names too) and in the names of its smaller regular files, I tried to form a regular expression meeting these criteria, again with the alternation operator and with regard to case sensitivity (find all patterns matching the keyword irrespective of case; ignore all pathnames containing the name of any of the browsers listed in the parentheses).

The thing is I'm confused about the purpose and meaning of syntax that looks similar: do I need to use -exec find, -exec grep (but then in the latter case I need the option -E because | belongs to the extended set)? Or maybe pipe to grep instead?
I noticed two options to grep: -L, --files-without-match
Quote:
Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.
What does "from which no output would normally have been printed." mean?

"The scanning will stop on the first match" - Match to what? And if it stops, how do I get the output?

Furthermore, -l, --files-with-matches
Quote:
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.
"from which output would normally have been printed"?

I used --directories=skip because
Quote:
-d ACTION, --directories=ACTION
If an input file is a directory, use ACTION to process it. By default, ACTION is read, which means that directories are read just as if they were ordinary files. If ACTION is skip, directories are silently skipped. If ACTION is recurse, grep reads all files under each directory, recursively; this is equivalent to the -r option.
so I thought the directories whose pathnames contain the names of these browsers would be skipped.

Let's say I write
Code:
find -E /" -regex '.*/(Safari|[Oo]pera|Firefox|[Mm]ozilla)/.*' -prune -o -exec grep -iE './keyword/.*' {} ;

1. IF the syntax itself is correct then I have no clue what to expect. I cannot be sure which option to pick since I don't understand this:

from man find on -exec utility [argument ...]

Quote:
The expression must be terminated by a semicolon (``;''). If you invoke find from a shell you may need to quote the semicolon if the shell would otherwise treat it as a control operator. When command runs, the argument {} is replaced with the name of the current file.
What is the current file? What's the deal with ; as a control operator? What does it control?
On the other hand

from man find on -exec utility [argument ...] {} +
Quote:
Same as -exec, except that ``{}'' is replaced with as many pathnames as possible for each invocation of utility.
"``{}'' is replaced with as many pathnames as possible for each invocation of utility"?

How would you compose the line to achieve the task? All the explanations are crystal clear as long as they deal with simple examples. This one is more advanced, I dare say.


Last edited by rbatte1; 01-18-2018 at 06:40 AM.. Reason: Formatting changes for CODE, ICODE, QUOTE and removing excessive text colouring
# 6  
Old 01-18-2018
Quote:
Originally Posted by scrutinizerix
1. IF the syntax itself is correct then I have no clue what to expect. I cannot be sure which option to pick since I don't understand this:

from man find on -exec utility [argument ...]
[....]
What the current file is? What's the deal with ";" as a control operator? What does it control?
find is not only a utility to find files - that is, to produce a list of filenames to be printed - but a "programmable commandline filemanager", so to say.

How is that done? The basic operation is this: find collects all files and directories and prepares an initial "result set". Then you have one or more clauses, each of which returns a logical value, TRUE or FALSE. Each file/directory in the result set is presented to the first clause. If it returns TRUE the file/directory is kept in the result set, otherwise it is dropped. If it is kept, it is presented to the second clause, and so on.

An example:

Code:
find /some/path -name "foo" -print

The initial result set is all files/directories in /some/path. This list is presented to the clause -name foo and if the name of the file/directory is "foo" it is kept, otherwise dropped. What still is in the result set is then presented to the -print clause, which just prints it, without modifying the result set further.

So far, so basic, but it is necessary to understand this mechanism of presenting one filename/directory name after the other to each of these clauses successively.

Remember i said "file manager" up there? Up to now we only produce - more or less tailored - lists of file-/directorynames. Now we want to actually do something with the files/directories found that way. For this there is a special clause: -exec.

-exec takes a "template commandline" and executes this template commandline with every file/directory in the result set. An example:

Code:
find /some/path -name "foo" -type f -print
/some/path/dir1/foo
/some/path/dir2/foo
/some/path/dir2/subdir/foo

Now we replace the -print with -exec in this command:
Code:
find /some/path -name "foo" -type f -exec echo file found: {} \;
file found: /some/path/dir1/foo
file found: /some/path/dir2/foo
file found: /some/path/dir2/subdir/foo

What has happened? First, the {} is the placeholder for the filename, which is presented to the clause. That is, -exec executed these commands:

Code:
echo file found: /some/path/dir1/foo
echo file found: /some/path/dir2/foo
echo file found: /some/path/dir2/subdir/foo

Second, you need a way to tell find where the "template commandline" for -exec ends and the normal commandline resumes. This is done with a semicolon, and because the shell into which you type the whole find command would otherwise treat a bare ; as its own command separator, the semicolon has to be escaped, hence the \; at the end. Here is a more complex example I have annotated:

Code:
                                          normal commandline resumes
                                                                   |
                                  template commandline ends here   |
                                                               |   |
<--------normal commandline-------------> <--template cmdl-->  V <---->
find /some/path -name "foo" -type f -exec echo file found: {} \; -print
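On the question of what the semicolon "controls": an unquoted ; ends a command as far as the shell is concerned, so find would never see it. Escaping it as \; or quoting it as ';' both hand a literal semicolon to find. A minimal sketch, assuming a scratch directory with a single made-up file named foo:

```shell
# Scratch directory with one file; the name foo is only for illustration.
demo=$(mktemp -d)
touch "$demo/foo"

# Both spellings pass the same single character ";" to find as the
# terminator of the -exec template.
escaped=$(find "$demo" -name foo -exec echo file found: {} \;)
quoted=$(find "$demo" -name foo -exec echo file found: {} ';')

printf '%s\n%s\n' "$escaped" "$quoted"
rm -rf "$demo"
```

Both commands print the same line, which is why the two spellings are interchangeable in the examples in this thread.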

Notice that you can use the -exec clause even to select from the result set: if the template command returns TRUE (that is, it exits with status 0) when executed with the file/dir, the file/dir will stay in the result set, otherwise it will be dropped. You can even have more than one -exec clause, where some will only help shape the result set and the final one will actually do the work.
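A small sketch of -exec used purely as a filter, assuming a scratch directory with two made-up files, only one of which contains the text FOO. grep -q prints nothing and merely reports through its exit status whether it found a match:

```shell
# Scratch directory; file names and contents are hypothetical.
demo=$(mktemp -d)
printf 'FOO\n' > "$demo/a.txt"
printf 'bar\n' > "$demo/b.txt"

# The -exec clause is TRUE only when grep -q finds FOO in the file,
# so the -print that follows is reached for a.txt but not for b.txt.
matched=$(cd "$demo" && find . -type f -exec grep -q FOO {} \; -print)

printf '%s\n' "$matched"
rm -rf "$demo"
```

Only ./a.txt is printed; b.txt was dropped by the -exec clause before -print was reached.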

Finally, some performance considerations: consider the following example:

Code:
find . -name "*txt" -exec cat {} \;

This will produce a (potentially long) list of commands cat foo.txt, cat bar.txt, etc.. As this list could grow very long there will be many processes started which might tax the system (starting a process is actually "expensive" resource-wise). But cat could be called this way:

Code:
cat file1 file2 file3 [...]

and this way one cat process would be started for a whole group of files and not for each one. This is what the + is for. Use this:

Code:
find . -name "*txt" -exec cat {} +

to do exactly that: group the files in the result set and call cat with each group instead of with each file individually.
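The difference can be made visible by counting arguments. This is only a sketch, assuming three made-up .txt files in a scratch directory; the inline sh -c 'echo "$#"' helper is not part of find, it just prints how many file names each invocation received:

```shell
# Scratch directory with three files; the names are only for illustration.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/c.txt"

# With \; the helper runs once per file, so it prints one line per file.
per_file=$(cd "$demo" && find . -name '*.txt' \
    -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' ')

# With + the file names are grouped; three short names fit into one run,
# so the helper prints the argument count of that single invocation.
grouped=$(cd "$demo" && find . -name '*.txt' \
    -exec sh -c 'echo "$#"' sh {} +)

printf 'invocations with \\; : %s\narguments in the one + invocation: %s\n' \
    "$per_file" "$grouped"
rm -rf "$demo"
```

With very many files, find may still split the list into several groups so that no single commandline gets too long, but the number of processes stays far below one per file.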

I hope this helps.

bakunin
# 7  
Old 01-18-2018
Thanks, it was a nice explanation. What about
Code:
grep -l

or
Code:
grep -L

man entries I highlighted? Confusing as hell.

Also, what about
Code:
-prune

? How does it work exactly?


Would you mind finding some time to explain it in your comprehensible way, please? It would help a lot.

Last edited by scrutinizerix; 01-18-2018 at 08:05 AM..