Post 302468623 by Bashingaway on Wednesday 3rd of November 2010 10:59:21 AM
Is there a 'fuzzy search' facility in Linux?

I have over 10 million documents that I want to search against a list of known keywords. However, the documents were produced using a technique that isn't perfect, so a keyword may not appear in the text exactly as it is spelled.

Is there a fuzzy keyword search available in Linux or can anyone think of a way of doing it that isn't horrendously time expensive?

Example Keyword

Banana

The search would therefore be, case-insensitively, for:

Banana
Banan*
Bana*a
Ban*na
Ba*ana
B*nana
*anana

Bana**
Ban*n*
Ba*an*
B*nan*
*anan*
Ban**a
Ba*a*a
B*na*a
*ana*a

and so on.....
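As a rough sketch of how the list above could be generated rather than typed by hand, the following Python enumerates every variant of a keyword with up to a given number of `*` placeholders and turns each variant into a case-insensitive regular expression. It assumes each `*` stands for exactly one arbitrary character, which is how the list above reads; the function names `wildcard_variants` and `variant_to_regex` are made up for illustration.

```python
import re
from itertools import combinations


def wildcard_variants(keyword, max_wildcards=2):
    """Yield the keyword with 0..max_wildcards positions replaced by '*'."""
    chars = list(keyword)
    for k in range(max_wildcards + 1):
        # Choose which k positions become wildcards.
        for positions in combinations(range(len(chars)), k):
            variant = chars[:]
            for p in positions:
                variant[p] = "*"
            yield "".join(variant)


def variant_to_regex(variant):
    """Compile a variant into a case-insensitive regex.

    Assumption: each '*' matches exactly one character (regex '.').
    """
    pattern = "".join("." if c == "*" else re.escape(c) for c in variant)
    return re.compile(pattern, re.IGNORECASE)


if __name__ == "__main__":
    for v in wildcard_variants("Banana"):
        print(v)
```

Running one compiled pattern per variant is exactly the brute-force approach described in this post, so it only illustrates the pattern set; it does nothing to reduce the cost.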

With 500 keywords averaging 10 characters each, that's tens of thousands of 'fuzzy searches' per page to cover all the permutations. For words longer than 9 characters you'd probably want to allow more than two * characters per word, which ramps up the number of searches even more.
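To make the scale concrete, here is a small counting sketch. It assumes a pattern is the keyword with some positions replaced by a single-character wildcard, so a word of length n with exactly k wildcards has C(n, k) variants; the helper names are hypothetical. Under that assumption a 10-character word has 1 + 10 + 45 = 56 patterns with up to two wildcards, giving 28,000 patterns for 500 keywords, and allowing a third wildcard raises that to 88,000.

```python
from math import comb


def pattern_count(word_length, max_wildcards):
    """Number of variants with 0..max_wildcards single-character wildcards."""
    return sum(comb(word_length, k) for k in range(max_wildcards + 1))


def total_searches(num_keywords, word_length, max_wildcards):
    """Total patterns per page, assuming every keyword has the same length."""
    return num_keywords * pattern_count(word_length, max_wildcards)


if __name__ == "__main__":
    for wildcards in (1, 2, 3):
        print(wildcards, total_searches(500, 10, wildcards))
```

The exact total depends on how the variants are counted and on the real distribution of keyword lengths, so these figures are only an order-of-magnitude guide.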

Ideas please?
 
