Full text searching for multiple items


 
# 1  
Old 04-29-2010

I am trying to find a solution to a request here at work. I have been asked to do a full text search of around 300,000 files for multiple content items.

The following words need to appear in the file:
(april and/or may) and pie and (red and/or white).

So a file with the words april, pie and white would be valid, but a file with only the words april and white would not.

A file with all of the words is also OK. I have tried egrep with a regex, but it's not doing what I need, so I figured I needed to write something in shell or Perl.

Any help appreciated.
Thanks
# 2  
Old 04-29-2010
Code:
 egrep -l "april|may" * | xargs grep -l "pie" | xargs egrep -l "red|white"
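
One caveat with this pipeline: xargs splits its input on blanks, so a file name such as "test 1.txt" is handed to the next grep as two separate arguments. A minimal sketch that avoids xargs, assuming a POSIX grep that supports -E and -q; the function names match_all and list_matches are made up for illustration, and file names containing newlines would still not be handled:

```shell
# match_all FILE: succeed only if FILE has a hit in every term group.
match_all() {
    grep -Eq 'april|may' "$1" &&
    grep -q  'pie'       "$1" &&
    grep -Eq 'red|white' "$1"
}

# list_matches DIR: print every regular file under DIR that passes match_all.
# Each name is read as one whole line, so embedded spaces are preserved.
list_matches() {
    find "$1" -type f | while IFS= read -r f; do
        match_all "$f" && printf '%s\n' "$f"
    done
}
```

Calling list_matches /path/to/files prints one matching file name per line. Each file may be read up to three times, which is the price of keeping the three conditions as separate grep calls.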

# 3  
Old 04-29-2010
Quote:
Originally Posted by anbu23
Code:
 egrep -l "april|may" * | xargs grep -l "pie" | xargs egrep -l "red|white"

Thank you for the quick reply. When I try to do this, it gives me the following errors. The first egrep works, but the | xargs is where it fails. Any ideas on why I might be getting this? If it matters, this is AIX 6.1. I have full read/write access to these files. Those are the only 2 files in my test directory that I am using to see if I can make it work.

grep: 0652-033 Cannot open test
grep: 0652-033 Cannot open 1.txt

Thanks
# 4  
Old 04-29-2010
Code:
#!/bin/ksh
# ok FILE: exit status 0 when FILE contains all three term groups.
ok()
{
    awk '/pie/       { pie = 1 }
         /april|may/ { month = 1 }
         /red|white/ { colour = 1 }
         pie && month && colour { found = 1; exit }   # stop reading early
         END { exit(found ? 0 : 1) }' "$1"
}
find /path/to/files -type f |
while IFS= read -r fname
do
    ok "$fname" && echo "$fname"
done

If the files are large and those keywords are scarce, then any solution has the potential to take a very, very long time. Your "requirements" make the request seem more like homework, which I hope it is not. We have rules for homework.
# 5  
Old 04-29-2010
The requirements are for lawyers; we need the full text search for some litigation. I changed the keywords because, well, no one needs to know what we are searching for. Thank you for your code, I appreciate the help; I am not good at scripting. If you would like confirmation that this is for a professional purpose, I would be more than happy to provide it.
# 6  
Old 04-29-2010
No, that's okay. However, for 300,000 files you are going to be in for a long wait. I hope the lawyers pay you the way they bill us :)
# 7  
Old 04-29-2010
Even better, I get to perform this search with 40 different combinations of search terms on all the same file sets. Gotta love lawyers.
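
Since the same check has to be repeated with different sets of terms, the term groups could be passed as arguments instead of being hard-coded. A rough sketch, assuming a POSIX awk; has_all is a hypothetical helper name, each argument after the file is an awk extended regular expression standing for one group, and a file name of the form name=value would be misread by awk as a variable assignment:

```shell
# has_all FILE REGEX...: succeed if every REGEX matches somewhere in FILE.
# FILE is read once, and reading stops as soon as all groups have matched.
has_all() {
    file=$1; shift
    awk 'BEGIN {
             # All arguments except the last are patterns; the last is the file.
             n = ARGC - 2
             for (i = 1; i <= n; i++) { pat[i] = ARGV[i]; ARGV[i] = "" }
             left = n
         }
         {
             for (i = 1; i <= n; i++)
                 if (!(i in seen) && $0 ~ pat[i]) { seen[i] = 1; left-- }
             if (left == 0) { ok = 1; exit }
         }
         END { exit(ok ? 0 : 1) }' "$@" "$file"
}
```

For the example terms in this thread the call would be has_all "$fname" 'april|may' 'pie' 'red|white', and another combination only needs a different argument list.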