Problems with "grep -vf", and exclusion files


 
# 8  
Old 12-01-2013
With -F, exclude contains literal strings. Without it, regular expressions.

Next time, save everyone (yourself included) some time and post the contents of all relevant files.

Regards,
Alister
# 9  
Old 12-01-2013
Quote:
Originally Posted by Doug Lassiter
My apologies for not using code tags before.

But hey,
Code:
grep -Fvf exclude log.txt > out

worked *perfectly* for anything I put in my "exclude" file. That's the solution. Bingo.

But why? What does that "F" (that I wasn't originally using) do??? "Fixed strings"? That's a handy option.
Using fixed strings (the -F option) instead of basic regular expressions (the default) or extended regular expressions (the -E option) is faster when the patterns contain no characters that are special in a regular expression. In this case, though, the results should be the same with or without -F, apart from how long the command takes, as long as the exclude file and all files being processed are proper text files. Either one of your files isn't a text file, there is a bug in the version of grep you're using, or some hidden characters in your exclude file are affecting regular expression parsing. It would still be interesting to see the output of:
Code:
od -bc exclude

for a version of exclude that causes:
Code:
grep -vf exclude log.txt > out

to go into never-never land.
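
For reference, here is a minimal sketch of a case where the two modes do give different results, namely when a pattern contains a character that is special in a regular expression. The file contents are made up for illustration (they are not the poster's files), and mktemp -d is assumed to be available, as it is on OS X and most Linux systems.

```shell
# Hypothetical demo data in a scratch directory; not the files from this thread.
tmp=$(mktemp -d)
printf '%s\n' 'GET /a.mp3 200' 'GET /b.txt 404' 'GET /campus 200' > "$tmp/log.txt"
printf '%s\n' '.mp' > "$tmp/exclude"

# Regular-expression mode: "." matches any character, so the "amp" in
# "campus" matches the pattern ".mp" and that line is excluded as well.
regex_out=$(grep -vf "$tmp/exclude" "$tmp/log.txt")

# Fixed-string mode: only a literal ".mp" matches.
fixed_out=$(grep -Fvf "$tmp/exclude" "$tmp/log.txt")

printf 'regex mode keeps:\n%s\n\nfixed mode keeps:\n%s\n' "$regex_out" "$fixed_out"
rm -rf "$tmp"
```

With patterns made only of letters and digits, like the ones in the exclude file in this thread, both commands should select the same lines.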
# 10  
Old 12-01-2013
P.S.

Code:
od -bc exclude
0000000   062 060 066 012 063 060 064 012 064 060 063 012 064 060 064 012
           2   0   6  \n   3   0   4  \n   4   0   3  \n   4   0   4  \n
0000020   064 060 065 012 065 060 060 012 151 156 166 151 164 145 012 155
           4   0   5  \n   5   0   0  \n   i   n   v   i   t   e  \n   m
0000040   160 063 012 144 141 156 012 163 157 146 151 141                
           p   3  \n   d   a   n  \n   s   o   f   i   a                
0000054

and

Code:
tail -5 log.txt|od -bc
0000000   061 070 060 056 067 066 056 065 056 062 060 040 055 040 055 040
           1   8   0   .   7   6   .   5   .   2   0       -       -    
0000020   133 063 060 057 116 157 166 057 062 060 061 063 072 060 064 072
           [   3   0   /   N   o   v   /   2   0   1   3   :   0   4   :
0000040   061 066 072 062 071 040 055 060 066 060 060 135 040 042 107 105
           1   6   :   2   9       -   0   6   0   0   ]       "   G   E
0000060   124 040 057 176 146 151 163 157 057 164 145 154 145 143 157 156
           T       /   ~   f   i   s   o   /   t   e   l   e   c   o   n
0000100   057 102 141 147 144 151 147 151 141 156 055 103 141 162 162 141
           /   B   a   g   d   i   g   i   a   n   -   C   a   r   r   a
0000120   163 161 165 151 154 154 157 137 065 055 062 062 055 061 063 170
           s   q   u   i   l   l   o   _   5   -   2   2   -   1   3   x
0000140   057 040 110 124 124 120 057 061 056 061 042 040 064 060 063 040
           /       H   T   T   P   /   1   .   1   "       4   0   3    
0000160   062 065 066 012 064 056 062 066 056 061 063 062 056 067 060 040
           2   5   6  \n   4   .   2   6   .   1   3   2   .   7   0    
0000200   055 040 055 040 133 063 060 057 116 157 166 057 062 060 061 063
           -       -       [   3   0   /   N   o   v   /   2   0   1   3
0000220   072 060 064 072 061 070 072 062 066 040 055 060 066 060 060 135
           :   0   4   :   1   8   :   2   6       -   0   6   0   0   ]
0000240   040 042 120 117 123 124 040 057 045 067 060 045 066 070 045 067
               "   P   O   S   T       /   %   7   0   %   6   8   %   7
0000260   060 045 067 060 045 066 061 045 067 064 045 066 070 057 045 067
           0   %   7   0   %   6   1   %   7   4   %   6   8   /   %   7
0000300   060 045 066 070 045 067 060 077 045 062 104 045 066 064 053 045
           0   %   6   8   %   7   0   ?   %   2   D   %   6   4   +   %
0000320   066 061 045 066 103 045 066 103 045 066 106 045 067 067 045 065
           6   1   %   6   C   %   6   C   %   6   F   %   7   7   %   5
0000340   106 045 067 065 045 067 062 045 066 103 045 065 106 045 066 071
           F   %   7   5   %   7   2   %   6   C   %   5   F   %   6   9
0000360   045 066 105 045 066 063 045 066 103 045 067 065 045 066 064 045
           %   6   E   %   6   3   %   6   C   %   7   5   %   6   4   %
0000400   066 065 045 063 104 045 066 106 045 066 105 053 045 062 104 045
           6   5   %   3   D   %   6   F   %   6   E   +   %   2   D   %
0000420   066 064 053 045 067 063 045 066 061 045 066 066 045 066 065 045
           6   4   +   %   7   3   %   6   1   %   6   6   %   6   5   %
0000440   065 106 045 066 104 045 066 106 045 066 064 045 066 065 045 063
           5   F   %   6   D   %   6   F   %   6   4   %   6   5   %   3
0000460   104 045 066 106 045 066 066 045 066 066 053 045 062 104 045 066
           D   %   6   F   %   6   6   %   6   6   +   %   2   D   %   6
0000500   064 053 045 067 063 045 067 065 045 066 070 045 066 106 045 067
           4   +   %   7   3   %   7   5   %   6   8   %   6   F   %   7
0000520   063 045 066 071 045 066 105 045 062 105 045 067 063 045 066 071
           3   %   6   9   %   6   E   %   2   E   %   7   3   %   6   9
0000540   045 066 104 045 067 065 045 066 103 045 066 061 045 067 064 045
           %   6   D   %   7   5   %   6   C   %   6   1   %   7   4   %
0000560   066 071 045 066 106 045 066 105 045 063 104 045 066 106 045 066
           6   9   %   6   F   %   6   E   %   3   D   %   6   F   %   6
0000600   105 053 045 062 104 045 066 064 053 045 066 064 045 066 071 045
           E   +   %   2   D   %   6   4   +   %   6   4   %   6   9   %
0000620   067 063 045 066 061 045 066 062 045 066 103 045 066 065 045 065
           7   3   %   6   1   %   6   2   %   6   C   %   6   5   %   5
0000640   106 045 066 066 045 067 065 045 066 105 045 066 063 045 067 064
           F   %   6   6   %   7   5   %   6   E   %   6   3   %   7   4
0000660   045 066 071 045 066 106 045 066 105 045 067 063 045 063 104 045
           %   6   9   %   6   F   %   6   E   %   7   3   %   3   D   %
0000700   062 062 045 062 062 053 045 062 104 045 066 064 053 045 066 106
           2   2   %   2   2   +   %   2   D   %   6   4   +   %   6   F
0000720   045 067 060 045 066 065 045 066 105 045 065 106 045 066 062 045
           %   7   0   %   6   5   %   6   E   %   5   F   %   6   2   %
0000740   066 061 045 067 063 045 066 065 045 066 064 045 066 071 045 067
           6   1   %   7   3   %   6   5   %   6   4   %   6   9   %   7
0000760   062 045 063 104 045 066 105 045 066 106 045 066 105 045 066 065
           2   %   3   D   %   6   E   %   6   F   %   6   E   %   6   5
0001000   053 045 062 104 045 066 064 053 045 066 061 045 067 065 045 067
           +   %   2   D   %   6   4   +   %   6   1   %   7   5   %   7
0001020   064 045 066 106 045 065 106 045 067 060 045 067 062 045 066 065
           4   %   6   F   %   5   F   %   7   0   %   7   2   %   6   5
0001040   045 067 060 045 066 065 045 066 105 045 066 064 045 065 106 045
           %   7   0   %   6   5   %   6   E   %   6   4   %   5   F   %
0001060   066 066 045 066 071 045 066 103 045 066 065 045 063 104 045 067
           6   6   %   6   9   %   6   C   %   6   5   %   3   D   %   7
0001100   060 045 066 070 045 067 060 045 063 101 045 062 106 045 062 106
           0   %   6   8   %   7   0   %   3   A   %   2   F   %   2   F
0001120   045 066 071 045 066 105 045 067 060 045 067 065 045 067 064 053
           %   6   9   %   6   E   %   7   0   %   7   5   %   7   4   +
0001140   045 062 104 045 066 105 040 110 124 124 120 057 061 056 061 042
           %   2   D   %   6   E       H   T   T   P   /   1   .   1   "
0001160   040 064 060 064 040 062 061 067 012 062 061 062 056 064 060 056
               4   0   4       2   1   7  \n   2   1   2   .   4   0   .
0001200   061 063 066 056 062 065 040 055 040 055 040 133 063 060 057 116
           1   3   6   .   2   5       -       -       [   3   0   /   N
0001220   157 166 057 062 060 061 063 072 060 064 072 062 067 072 064 071
           o   v   /   2   0   1   3   :   0   4   :   2   7   :   4   9
0001240   040 055 060 066 060 060 135 040 042 107 105 124 040 057 176 146
               -   0   6   0   0   ]       "   G   E   T       /   ~   f
0001260   151 163 157 057 164 145 154 145 143 157 156 057 103 157 156 144
           i   s   o   /   t   e   l   e   c   o   n   /   C   o   n   d
0001300   157 156 137 071 055 061 070 055 061 063 057 040 110 124 124 120
           o   n   _   9   -   1   8   -   1   3   /       H   T   T   P
0001320   057 061 056 061 042 040 062 060 060 040 066 060 066 071 012 070
           /   1   .   1   "       2   0   0       6   0   6   9  \n   8
0001340   065 056 063 061 056 062 061 071 056 066 064 040 055 040 055 040
           5   .   3   1   .   2   1   9   .   6   4       -       -    
0001360   133 063 060 057 116 157 166 057 062 060 061 063 072 060 064 072
           [   3   0   /   N   o   v   /   2   0   1   3   :   0   4   :
0001400   062 071 072 062 061 040 055 060 066 060 060 135 040 042 107 105
           2   9   :   2   1       -   0   6   0   0   ]       "   G   E
0001420   124 040 057 176 144 141 156 057 114 145 163 164 145 162 137 111
           T       /   ~   d   a   n   /   L   e   s   t   e   r   _   I
0001440   123 104 103 062 060 061 063 056 160 160 164 040 110 124 124 120
           S   D   C   2   0   1   3   .   p   p   t       H   T   T   P
0001460   057 061 056 061 042 040 062 060 060 040 064 070 064 065 065 066
           /   1   .   1   "       2   0   0       4   8   4   5   5   6
0001500   070 012 066 066 056 062 064 071 056 067 063 056 061 060 070 040
           8  \n   6   6   .   2   4   9   .   7   3   .   1   0   8    
0001520   055 040 055 040 133 063 060 057 116 157 166 057 062 060 061 063
           -       -       [   3   0   /   N   o   v   /   2   0   1   3
0001540   072 060 064 072 062 071 072 064 061 040 055 060 066 060 060 135
           :   0   4   :   2   9   :   4   1       -   0   6   0   0   ]
0001560   040 042 107 105 124 040 057 176 146 151 163 157 057 164 145 154
               "   G   E   T       /   ~   f   i   s   o   /   t   e   l
0001600   145 143 157 156 057 124 150 162 157 156 163 157 156 055 113 165
           e   c   o   n   /   T   h   r   o   n   s   o   n   -   K   u
0001620   164 164 145 162 137 061 062 055 061 065 055 061 060 057 040 110
           t   t   e   r   _   1   2   -   1   5   -   1   0   /       H
0001640   124 124 120 057 061 056 061 042 040 062 060 060 040 071 067 061
           T   T   P   /   1   .   1   "       2   0   0       9   7   1
0001660   012                                                            
          \n                                                            
0001661

That's for an "exclude" file that sends grep -vf into never-never land, but one that works with grep -Fvf.

Also, FWIW, I wrote those "exclude" files both with the Mac text editor *and* with vi. No difference. So I can't see why there might be hidden characters.

---------- Post updated at 05:54 PM ---------- Previous update was at 05:44 PM ----------

Quote:
Originally Posted by alister
With -F, exclude contains literal strings. Without it, regular expressions.
I have to assume that was what was special about the alphabetic (as opposed to numeric) strings in the "exclude" file. With grep -vf, grep wanted to interpret those alphabetic strings as expressions. The numbers were obviously not expressions, so it didn't get confused by those.

That's what seemed so screwy. grep -vf could exclude numbers successfully, but not alphabetic strings. Now, in my book, both are alphanumerics, so if one worked, the other should as well.
# 11  
Old 12-01-2013
Quote:
Originally Posted by Doug Lassiter
P.S.

Code:
od -bc exclude
0000000   062 060 066 012 063 060 064 012 064 060 063 012 064 060 064 012
           2   0   6  \n   3   0   4  \n   4   0   3  \n   4   0   4  \n
0000020   064 060 065 012 065 060 060 012 151 156 166 151 164 145 012 155
           4   0   5  \n   5   0   0  \n   i   n   v   i   t   e  \n   m
0000040   160 063 012 144 141 156 012 163 157 146 151 141                
           p   3  \n   d   a   n  \n   s   o   f   i   a                
0000054

... ... ...
That's for an "exclude" file that sends grep -vf into never-never land, but one that works with grep -Fvf.

Also, FWIW, I wrote those "exclude" files both with the Mac text editor *and* with vi. No difference. So I can't see why there might be hidden characters.
Wrong. There is a HUGE difference.

You didn't write the above exclude file with vi. It is not a text file. (It doesn't have a <newline> character at the end of the last line.) The behavior of grep is undefined when the input files you give it are not text files.

If you add the missing <newline> character to that file, I would be very surprised if grep -vf doesn't start working as you expected. There are several ways to fix that file. Among them are:
Code:
echo >> exclude

but don't do that if there is already a <newline> at the end of the file, or it will add an empty line to the end of the file; and
Code:
vi exclude

which, when it opens the file, will probably show a note in the status line something like:
Code:
"exclude" [noeol] 10L, 44C

where the noeol means that there is no end-of-line marker at the end of the last line. Using the vi command :wq will add the missing <newline> character to the buffer, rewrite the file with the missing <newline>, and quit. You can repeat this as many times as you want and it won't add empty lines to the end of the file.

Note that the standards don't require vi to work when given a file that is not a text file either; but the vi (or vim) on OS X will do what you need in this case.
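
As a sketch of how the echo fix could be made safe to repeat, the function below appends a <newline> only when the last byte of the file is not already one. The name ensure_trailing_newline and the demo file are hypothetical; the sketch relies only on tail -c, wc -l and echo.

```shell
# Hypothetical helper: append a <newline> to a file only if its last
# byte is not already a <newline>.
ensure_trailing_newline() {
    file=$1
    # An empty file has no incomplete last line; leave it alone.
    [ -s "$file" ] || return 0
    # wc -l counts <newline> characters, so the count for the last byte
    # is 0 exactly when the final <newline> is missing.
    if [ $(tail -c 1 "$file" | wc -l) -eq 0 ]; then
        echo >> "$file"
    fi
}

# Demo on a made-up exclude file whose last line lacks its <newline>.
tmp=$(mktemp -d)
printf 'dan\nsofia' > "$tmp/exclude"
ensure_trailing_newline "$tmp/exclude"
ensure_trailing_newline "$tmp/exclude"   # second call changes nothing
size=$(( $(wc -c < "$tmp/exclude") ))
echo "size after fix: $size bytes"
rm -rf "$tmp"
```

Unlike a bare echo >> exclude, this can be run any number of times without adding empty lines, which is the same property described above for :wq in vi.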
# 12  
Old 12-01-2013
"The behavior of grep is undefined when the input files you give it are not text files."

So, um, here's a dumb question -- how is it that a file produced by Mac "TextEdit" is not a "text file"? But indeed, if the filename thus produced doesn't have a .txt on the end, it doesn't seem to have a <newline> at the end. In fact, if I open such a file with vi, it says at the bottom "[noeol]", like you said it would. I save it with vi, and from then on, "[noeol]" isn't reported. vi inserts that <newline> when it saves it and makes it into a real live text file, I guess. I can also just change the file name from "exclude" to "exclude.txt", and the OS sticks a <newline> on, it seems. Wow.

So a real "text file" has to have a <newline> character at the end, and Mac TextEdit doesn't put it there, if you don't specify a .txt suffix. I never knew that. I naively thought that, well, text is text.

Now, having done that, grep -vf still doesn't work on that file, once it has a <newline> on it.
# 13  
Old 12-01-2013
Quote:
Originally Posted by Doug Lassiter
"The behavior of grep is undefined when the input files you give it are not text files."

So, um, here's a dumb question -- how is it that a file produced by Mac "TextEdit" is not a "text file"? But indeed, if the filename thus produced doesn't have a .txt on the end, it doesn't seem to have a <newline> at the end. In fact, if I open such a file with vi, it says at the bottom "[noeol]", like you said it would. I save it with vi, and from then on, "[noeol]" isn't reported. vi inserts that <newline> when it saves it and makes it into a real live text file, I guess. I can also just change the file name from "exclude" to "exclude.txt", and the OS sticks a <newline> on, it seems. Wow.

So a real "text file" has to have a <newline> character at the end, and Mac TextEdit doesn't put it there, if you don't specify a .txt suffix. I never knew that. I naively thought that, well, text is text.

Now, having done that, grep -vf still doesn't work on that file, once it has a <newline> on it.
The Mac OS X TextEdit application processes several file formats that are text files and several file formats that are not text files. If the name of a file opened (or created) by TextEdit ends with ".txt", it will treat it as a text file; if it ends with ".rtf", it will treat it as a rich text file; if it ends with ".doc", it will handle some of the text formatting done by Microsoft Word (and note that most Microsoft Word files ARE NOT text files). If there is no filename extension on the file, the preferences you have set in TextEdit will determine how it treats that file.

If you have a file (say xxx) that is not a text file and you rename the file xxx.txt, that doesn't change the format or contents of the file. (Although TextEdit might try to turn it into a text file if you use it to edit that file after you rename it.) Most UNIX utilities that take a filename as an operand couldn't care less what the name of the file is. The filename extensions like .txt, .sh, .mp3, .rtf, et cetera provide a useful convention to help humans (and a few applications) make good guesses about what should be inside that file.

If you have turned exclude into a real text file and:
Code:
grep -vf exclude log.txt

still goes to never-never land, I would assume that log.txt (even though its name ends in .txt and it has a <newline> at the end of the file) is not a text file as defined by the standards. The most likely problems would be that one or more "lines" in log.txt are longer than LINE_MAX bytes (2048 on recent Mac OS X systems) or that it contains one or more null bytes (i.e., bytes with all bits set to 0).
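
A rough way to test for those two problems is sketched below. The function name check_text_file and the demo files are hypothetical, and the sketch assumes that awk on your system copes with over-long lines (strictly speaking, awk is not required to handle non-text input either). It prints two counts: the number of lines longer than the limit, and the number of null bytes.

```shell
# Hypothetical helper: print "<over-long lines> <NUL bytes>" for a file.
# The limit is the optional second argument; otherwise it comes from
# getconf LINE_MAX, with 2048 as a fallback.
check_text_file() {
    file=$1
    max=${2:-$(getconf LINE_MAX 2>/dev/null || echo 2048)}
    # LINE_MAX includes the terminating <newline>, hence the "+ 1".
    long=$(LC_ALL=C awk -v max="$max" \
        'length($0) + 1 > max { n++ } END { print n + 0 }' "$file")
    # NUL count = byte count minus byte count with NUL bytes deleted.
    nul=$(( $(wc -c < "$file") - $(tr -d '\000' < "$file" | wc -c) ))
    echo "$long $nul"
}

# Demo on made-up files: one with a 3000-character line, one with a NUL byte.
tmp=$(mktemp -d)
{ echo short; printf '%3000s\n' x; } > "$tmp/long.txt"
printf 'a\000b\n' > "$tmp/nul.txt"
long_report=$(check_text_file "$tmp/long.txt" 2048)
nul_report=$(check_text_file "$tmp/nul.txt" 2048)
echo "long.txt: $long_report"
echo "nul.txt:  $nul_report"
rm -rf "$tmp"
```

For the real log, something like check_text_file log.txt would be the call to try; two zeros would suggest the cause lies elsewhere.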
# 14  
Old 12-01-2013
Yes, that is, of course, very true about different kinds of text files. But it is fascinating that a <newline> character, which isn't displayed even in vi, determines whether grep thinks of the file as truly text. That octal dump command is handy in that regard, as I suppose is the status line in vi.

And yes, that's right about just changing the suffix. I thought that worked, but it really doesn't. My mistake.

That's interesting speculation about why a file that looks like a real text file isn't doing the job with grep -vf. But at least with the F option, it's all fine and I can make it work. So I'll think about that and, in the meantime, let me thank everyone here for their very prompt and helpful comments.
 