Problems with "grep -vf", and exclusion files


 
# 1  
11-30-2013

OK, this really has me bollixed.

I'm using Mac OS X 10.6.8, and I'm trying to do some simple analysis of my Apache logs in Unix. Let's call that text file "log".

Now I want to remove from that file all lines with incomplete GETs, so I do

Code:
grep -vf exclude log > out

where the file "exclude" has one pattern per line: " 206 ", " 304 ", " 403 ", " 404 ", " 405 ", and " 500 " (spaces included). Those are the status codes whose lines I want to remove from the log file. That works great.
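For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the setup above. The four log lines are made-up samples in Apache's common log format; only the file names log, exclude, and out come from this post. printf writes each status code with a leading and a trailing space.

Code:
```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical sample lines (Apache common log format).
cat > log <<'EOF'
1.2.3.4 - - [30/Nov/2013:10:00:00 -0500] "GET /index.html HTTP/1.1" 200 5120
1.2.3.4 - - [30/Nov/2013:10:00:01 -0500] "GET /song.mp3 HTTP/1.1" 206 40960
1.2.3.4 - - [30/Nov/2013:10:00:02 -0500] "GET /missing.html HTTP/1.1" 404 209
1.2.3.4 - - [30/Nov/2013:10:00:03 -0500] "GET /logo.png HTTP/1.1" 304 0
EOF

# One pattern per line, each padded with spaces. Without the spaces, "500"
# would also match the "-0500" time zone on every line, and "404" would
# match a byte count such as 14040.
printf ' %s \n' 206 304 403 404 405 500 > exclude

# -f reads the patterns from the file; -v keeps only lines matching none of them.
grep -vf exclude log > out
cat out    # only the /index.html line is left
```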

Oh, I'll get rid of requests for mp3s too. So I add a line with "mp3" in it to that exclude file. I run the command, and it goes off into never-never land.

If I use an exclude file with just "mp3" in it, it works fine. Lines containing "mp3" are removed.

If I use an exclude file with "mp3" and "invitation" it works fine. Those lines are removed.

But if I use an exclude file with three or more lines of alphabetic characters, like "mp3", "invitation", and "pdf", it goes off into never-never land.

The rule seems to be that the "exclude" file works great with a list of numbers, OR with two or fewer words (that is, not numbers). This seems just nuts. What's going on??? I have basic Unix expertise. Is this a Mac peculiarity, or is there something else going on?

Last edited by radoulov; 12-01-2013 at 05:11 AM.
# 2  
11-30-2013
When I try this on Mac OS X 10.7.5 with exclude containing the numbers and strings you listed, with exclude containing just the numbers you listed (with leading and trailing spaces), and with exclude containing just the three strings you listed, grep -vf exclude works as expected.

Please show us the exact contents of the exclude file and the exact command line you're using when grep goes off in never-never land.
# 3  
12-01-2013
Thanks. Let me be more specific. My log file "log.txt" is pretty big, about 30 MB. My command line is

Code:
grep -vf exclude log.txt > out

If my "exclude" file is a text file with

Code:
206
304
403
404
405
500

It works fine. Almost instantaneously, the file "out" is created, in which lines with these numbers have been properly removed.

But if my exclude file is

Code:
206
304
403
404
405
500
invite

A blank "out" file is quickly created, but the command never reports that it has finished. It just hangs. I am careful, by the way, not to have blank lines in my "exclude" file. top shows that the grep process is still working hard (100% of CPU!), but the "out" file remains blank. That's what I call "never-never land".

Again, the general rule seems to be that a list of *numbers* in my exclude file works fine, as do one or two alphabetic strings. Three alphabetic strings do not work. What's nuts about this is that a number *is* an alphameric string (though not an alphabetic string).

Last edited by Don Cragun; 12-01-2013 at 02:12 PM. Reason: Add CODE tags.
# 4  
12-01-2013
Quote:
Originally Posted by Doug Lassiter
A blank "out" file is quickly created, but the command never reports that it has finished. It just hangs. I am careful, by the way, not to have blank lines in my "exclude" file. top shows that the grep process is still working hard (100% of CPU!), but the "out" file remains blank. That's what I call "never-never land".
The output file is not "blank"; it's a zero-length file. Any time you use the redirection operator > on a command line, the shell immediately truncates the file to zero length, regardless of its size. This happens before grep is executed.

I am not sure why the CPU is pegged with your test case, but I wanted to clarify that behaviour.
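A quick way to see the truncation for yourself, using made-up file contents: the command on the right-hand side reads out, and by the time it runs the shell has already emptied it.

Code:
```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'line one\nline two\n' > out    # out now holds 18 bytes

# The shell sets up "> out" (truncating the file) before it starts wc,
# so wc finds an empty file and writes its count, 0, back into out.
wc -c < out > out
cat out    # prints 0 (BSD wc pads the number with spaces)
```

As a general point, not something tested in this thread: programs like grep normally buffer output written to a file, so out can also stay empty for a while after grep has found lines to keep.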
# 5  
12-01-2013
Please use code tags as required by forum rules!

What happens if you try that with a short log file?
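One way to try that, sketched with a synthetic log standing in for the real 30 MB log.txt. The line format, the 2000-line size, and the names exclude_numbers, exclude_words, and sample.txt are all made up for this example. If a short slice finishes quickly with one exclude file but hangs with the other, that points away from the log's size as the cause.

Code:
```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

# 2000 made-up log lines: status 200 for odd i, 404 for even i.
awk 'BEGIN {
  for (i = 1; i <= 2000; i++)
    printf "1.2.3.4 - - [01/Dec/2013] \"GET /file%d.html HTTP/1.1\" %d %d\n", i, (i % 2 ? 200 : 404), i
}' > log.txt

# The two exclude files from the thread: numbers only, then numbers plus "invite".
printf '206\n304\n403\n404\n405\n500\n' > exclude_numbers
{ cat exclude_numbers; echo invite; } > exclude_words

# Work on a short slice of the log first.
head -n 100 log.txt > sample.txt

for f in exclude_numbers exclude_words; do
  # In an interactive shell, put "time" in front of grep to compare run times.
  grep -vf "$f" sample.txt > "out_$f"
  echo "$f: $(grep -c . "out_$f") lines kept"
done
```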
# 6  
12-01-2013
I'm not seeing that behavior on OS X 10.7.5. And, I can't imagine a reason why alphabetic characters rather than numeric characters in the exclude file would matter. Please try the following commands and let us know if any of them complete successfully with the 2nd sample exclude file contents shown above:
Code:
grep -Fvf exclude log.txt > out
grep -vf exclude /usr/include/sysexits.h > out
grep -Fvf exclude /usr/include/sysexits.h > out

(Note: I chose /usr/include/sysexits.h as a sample text file for this test because it contains the string "invite" on OS X 10.7.5.)
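If the -F version finishes, it is worth checking that it drops the same lines the plain grep -vf would. Patterns like these (digits and plain letters) contain no regular-expression metacharacters, so the two should agree. A sketch on three made-up log lines; out_regex and out_fixed are hypothetical names:

Code:
```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > log.txt <<'EOF'
1.2.3.4 - - [01/Dec/2013:09:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120
1.2.3.4 - - [01/Dec/2013:09:00:01 +0000] "GET /invite.pdf HTTP/1.1" 200 8192
1.2.3.4 - - [01/Dec/2013:09:00:02 +0000] "GET /gone.html HTTP/1.1" 404 209
EOF

# The second exclude file from this thread.
printf '206\n304\n403\n404\n405\n500\ninvite\n' > exclude

grep -vf exclude log.txt > out_regex     # patterns read as regular expressions
grep -Fvf exclude log.txt > out_fixed    # patterns read as literal strings

if cmp -s out_regex out_fixed; then
  echo "same output"
else
  echo "outputs differ"
  diff out_regex out_fixed
fi
```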

Several OS X text-processing utilities behave strangely when asked to process a file that is not a text file, that is, a file that contains a line longer than 2048 bytes (including the terminating <newline> character), or a file that is not empty but does not end with a <newline> character. Please show us the output from the commands:
Code:
od -bc exclude
tail -5 log.txt|od -bc

(and use CODE tags to show us that output).
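The two conditions above can also be checked directly rather than by reading od output. This is a sketch: check_text_file is a hypothetical helper, the sample files are made up, and the 2048-byte limit is the figure given above, not something the script confirms about OS X.

Code:
```shell
#!/bin/sh
# Hypothetical helper: report the two "not a text file" conditions for one file.
check_text_file() {
  f=$1
  # Command substitution strips trailing newlines, so a non-empty result
  # means the last byte of a non-empty file is not a newline.
  if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
    echo "$f: does not end with a newline"
  fi
  # Lines longer than 2047 bytes are over 2048 once the newline is counted.
  # LC_ALL=C makes awk's length() count bytes rather than characters.
  long=$(LC_ALL=C awk 'length($0) > 2047 { n++ } END { print n + 0 }' "$f")
  if [ "$long" -gt 0 ]; then
    echo "$f: $long line(s) longer than 2048 bytes"
  fi
}

dir=$(mktemp -d)
cd "$dir" || exit 1
printf '206\n304\ninvite' > exclude    # last line deliberately has no newline
printf 'short line\n' > log.txt

check_text_file exclude    # prints: exclude: does not end with a newline
check_text_file log.txt    # prints nothing
```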
# 7  
12-01-2013
My apologies for not using code tags before.

But hey,
Code:
grep -Fvf exclude log.txt > out

worked *perfectly* for anything I put in my "exclude" file. That's the solution. Bingo.

But why? What does that "F" (that I wasn't originally using) do??? "Fixed strings"? That's a handy option.
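For reference: yes, -F is the POSIX "fixed strings" option (the same thing the old fgrep command does). Each line of the exclude file is matched as a literal string instead of being compiled as a basic regular expression. For patterns like 404 or invite the result is the same either way; the difference shows up when a pattern contains a metacharacter such as a dot. Why the regex version hung is not settled in this thread; a plausible guess, not verified here, is that this grep build handles a list of regular expressions far less efficiently than a list of fixed strings. A sketch with made-up request lines:

Code:
```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'GET /song.mp3\nGET /songXmp3\nGET /index.html\n' > log.txt
printf '.mp3\n' > exclude

# Without -F, "." matches any character, so ".mp3" also matches "Xmp3"
# and -v removes both song lines.
grep -vf exclude log.txt > out_regex
# With -F, ".mp3" is a literal string, so only /song.mp3 is removed.
grep -Fvf exclude log.txt > out_fixed

echo "regex: $(grep -c . out_regex) line(s) kept"    # regex: 1 line(s) kept
echo "fixed: $(grep -c . out_fixed) line(s) kept"    # fixed: 2 line(s) kept
```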
 