Need some help with shell content scanner


 
# 1  
05-19-2009

I just started writing my own small content scanner that searches all the visible files on my server, but now I'm stuck. It should scan the files for phrases like the ones in the example below.



What I tried is the following code:
Code:
#!/bin/bash
find /home/userid*/public_html/ -size -307200k -exec grep -H -n -i -l 'www.exampleurl1.com/favicon.ico\|www.exampleurl2.com/v/' {} \; > /home/mypath/scan_content.php

This first finds all the files under every public_html folder that are smaller than 307200k (300 MB), then scans the content of each of those files.
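A minimal sketch of the same scan written to cope better with large trees, assuming GNU or POSIX find and grep. The function name `scan_content` is hypothetical; the 307200k limit and the phrases come from the post. `-type f` restricts the scan to regular files, `-F` treats each phrase as a fixed string so the dots in the URLs are not regex wildcards, and `-exec ... {} +` hands many file names to each grep call instead of starting one grep per file.

```bash
#!/bin/bash
# Hypothetical helper: print the regular files under DIR (smaller than
# 307200 KiB) that contain at least one of the given phrases.
# Usage: scan_content DIR PHRASE...
scan_content() {
  if [ "$#" -lt 2 ]; then
    echo "usage: scan_content DIR PHRASE..." >&2
    return 2
  fi
  local root=$1
  shift
  local -a patterns=()
  local phrase
  for phrase in "$@"; do
    patterns+=(-e "$phrase")   # one -e per phrase: a line may match any of them
  done
  # -type f: skip directories, FIFOs and device files
  # -F: fixed strings, -i: ignore case, -l: print only file names
  # {} +: batch many file names into each grep invocation
  find "$root" -type f -size -307200k -exec grep -F -i -l "${patterns[@]}" {} +
}
```

To cover every account as the original glob does, one option is a loop such as `for d in /home/userid*/public_html/; do scan_content "$d" 'www.exampleurl1.com/favicon.ico' 'www.exampleurl2.com/v/'; done > /home/mypath/scan_content.php`.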


That worked fine for the first few thousand files, but then it stopped working. I think there are too many files for grep to read them all, or something else is wrong. There is no error or anything; the process just stays alive at 0% CPU and memory usage, forever.
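The thread never pins down why the scan stalled. One possible cause, offered only as an assumption: without `-type f`, find also hands named pipes (FIFOs) and device files to grep, and opening a FIFO for reading blocks until another process opens it for writing, which looks exactly like a process sitting at 0% CPU forever. A quick way to check whether a tree contains such entries (the helper name is hypothetical):

```bash
#!/bin/bash
# Hypothetical helper: list entries under DIR that are neither regular files
# nor directories and on which grep could block or misbehave:
# p = named pipe (FIFO), c/b = character/block device, s = socket.
list_special_files() {
  find "$1" \( -type p -o -type c -o -type b -o -type s \) -print
}
```

If `list_special_files /home/userid1/public_html` prints anything, adding `-type f` to the find command keeps those entries away from grep.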


It would be great if someone had an idea of how to write the scanner so that it also works with a few hundred thousand files.


Thanks
# 2  
05-19-2009
I just got the tip to use find with xargs and grep to solve the problem, but my combinations just won't work. Hopefully someone can help, because I have never tried anything like this before.

Code:
#!/bin/bash
find /home/userid*/public_html/ -size -2048k | xargs grep -H -n -i -l 'phrase1\|phrase2' > /home/filepath/public_html/path/scans/scan_result.php {} \;

I need a pro to help me with this problem, because I am still a beginner with bash.
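Why this attempt fails: the `{} \;` at the end belongs to `find -exec`, and xargs does not treat it specially. xargs simply appends the names it reads after the command it is given, so `{}` and `;` end up as two extra "file names" for grep (the `> file` redirection is handled by the shell and never reaches xargs). Replacing grep with echo makes the built command line visible; the file names here are made up.

```bash
#!/bin/bash
# echo stands in for grep so the arguments xargs builds are printed.
printf '%s\n' one.html two.html | xargs echo grep -l 'phrase1' {} \;
# prints: grep -l phrase1 {} ; one.html two.html
```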
# 3  
05-19-2009
It's not clear whether you need just the filenames or the matching lines as well. I'm also assuming that you're using the pipe | as an alternation operator (regular expression), not as a literal part of the files' records. Modify if needed.

Code:
find /home/userid*/public_html/ -size -2048k | xargs grep -Eil 'phrase1|phrase2' > /home/filepath/public_html/path/scans/scan_result.php

If you're using GNU find/xargs (most Linux distributions), use find's -print0 together with xargs -0 to handle problematic filenames.
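The filenames-versus-lines question, shown on a throwaway file (its name and contents are made up): `-l` prints only the names of matching files, while `-H -n` prints the file name, line number and each matching line. Combining `-l` with `-n`, as the first post does, leaves `-n` with nothing to do, since `-l` suppresses the lines.

```bash
#!/bin/bash
# Throwaway test file (contents are made up).
f=$(mktemp)
printf 'intro\nsee PHRASE1 here\nphrase2 again\n' > "$f"

# -l: only the name of each matching file
grep -E -i -l 'phrase1|phrase2' "$f"
# -H -n: file name, line number and every matching line
grep -E -i -H -n 'phrase1|phrase2' "$f"

rm -f "$f"
```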
# 4  
05-20-2009
Thanks for the reply. I tried your code but ran into some problems.

First I just tried:

Code:
find /home/userid*/public_html/ -size -2048k | xargs grep -Eil 'phrase1|phrase2' > /home/filepath/public_html/path/scans/scan_result.php

That gave me a lot of error messages from grep saying that the files or folders don't exist.

I also tried it with -0 in the following way:

Code:
find /home/userid*/public_html/ -size -2048k | xargs -0 grep -Eil 'phrase1|phrase2' > /home/filepath/public_html/path/scans/scan_result.php

This time xargs fails with an error saying that the argument is too long.
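Both errors come from how xargs splits its input, which a small experiment shows. The file names are made up, and `sh -c 'echo $#' sh` just counts the arguments xargs passes. By default xargs splits on blanks and newlines, so a name such as `my page.html` becomes two nonexistent files. With `-0`, xargs splits only on NUL bytes; fed find's ordinary newline-separated output, it sees the whole list as a single argument, which is very likely what produces the "argument too long" error. `-0` only works when find emits NUL-separated names with `-print0`.

```bash
#!/bin/bash
# Default splitting: one file name containing a space becomes 2 arguments.
printf '%s\n' 'my page.html' | xargs sh -c 'echo $#' sh       # 2

# -0 on newline-separated input: the whole list is 1 argument.
printf '%s\n' a.html b.html | xargs -0 sh -c 'echo $#' sh     # 1

# -0 on NUL-separated input (what find -print0 emits): 2 arguments.
printf '%s\0' a.html b.html | xargs -0 sh -c 'echo $#' sh     # 2
```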
# 5  
05-20-2009
Quote:
Originally Posted by medic
...
there I got a lot of error messages from grep saying that the files or folders don't exist.
...

It works fine for me. Try testing the commands separately, starting with find:

(I guess you know that, as written, find will match both files and directories.)

Code:
find /home/userid*/public_html/ -size -2048k

Then the grep command on some test files (plural):

Code:
grep -Eil 'phrase1|phrase2' test_files*

Then put it all together with xargs. (I'm guessing you're on Linux; with GNU find/xargs the syntax is a bit different, and find's -print0 has to be spelled out explicitly.)

Code:
find /home/userid*/public_html/ -size -2048k -print0 | xargs -0 grep -Eil 'phrase1|phrase2' > output_file
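A sketch of that final pipeline wrapped in a function, with two additions that are assumptions rather than part of the reply: `-type f`, so grep is never handed directories, and GNU xargs' `-r`, so grep is not run at all when find prints nothing. The function name is hypothetical. The test data deliberately uses names with spaces, which the NUL-separated pipeline handles.

```bash
#!/bin/bash
# Hypothetical helper: the -print0 | xargs -0 pipeline from the reply,
# restricted to regular files. PATTERN is an extended regex (grep -E).
# Usage: scan_files DIR PATTERN
scan_files() {
  find "$1" -type f -size -2048k -print0 |
    xargs -0 -r grep -E -i -l -- "$2"   # -r (GNU): skip grep if there is no input
}
```

For example, `scan_files /home/userid1/public_html 'phrase1|phrase2' > output_file`.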

# 6  
05-20-2009
I just found the problem. The first part of the command works fine, but grep is causing problems.

The problem occurs when I scan for more than one phrase.

Code:
find /home/userid*/public_html/ -size -2048k -print0 | xargs -0 grep -Eil 'phrase1|phrase2' > output_file

I tried it like this:

Code:
'www.steampowered.com\|www.icq.com'

With that pattern there is no output. Using just one of the phrases works.

Am I missing something?
# 7  
05-20-2009
Okay, I just found my mistake: it was the copy-and-paste of the grep parameters.
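The thread doesn't say exactly what went wrong in the pasted parameters. One explanation consistent with the posts (a reading, not the poster's own words) is that the basic-regex `\|` from the earlier attempts was combined with `-E`. In an extended regex, `|` is the alternation operator and `\|` matches a literal pipe character, so the pattern silently matches nothing; plain `grep` with `\|` relies on a GNU extension to basic regexes. A quick comparison on a made-up line:

```bash
#!/bin/bash
line='visit www.icq.com today'

# -E with \| : looks for a literal "|", so nothing is printed
printf '%s\n' "$line" | grep -E 'www.steampowered.com\|www.icq.com'

# -E with | : alternation, so the line is printed
printf '%s\n' "$line" | grep -E 'www.steampowered.com|www.icq.com'

# Fixed strings, one -e per phrase: no regex syntax to get wrong,
# and the dots are matched literally.
printf '%s\n' "$line" | grep -F -e 'www.steampowered.com' -e 'www.icq.com'
```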