Wildcard Pattern Matching In C


 
# 1  
Old 07-01-2016

I've been having problems lately trying to do pattern matching in C while implementing wildcards. Take for instance the following code:

Code:
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Count the lines of the log that contain the literal string b. */
void grepwc(const char *b) {
    FILE *fp = fopen("/var/log/apache2/other_vhosts_access.log", "r");
    char line[100];
    unsigned int i = 0;

    if (fp == NULL) {
        perror("fopen");
        return;
    }

    while (fgets(line, sizeof(line), fp)) {
        if (strstr(line, b) != NULL) {
            i++;
        }
    }

    printf("%s %u\n", b, i);
    fclose(fp);
}

int main(void) {
    time_t current = time(NULL);
    char date_time[10];
    char newhold[34] = "";   /* must start empty: strncat() appends to the existing string */

    /* strftime(date_time, sizeof(date_time), "%d", localtime(&current));
       strncat(newhold, date_time, 10);
       strncat(newhold, "?", 2);
       strftime(date_time, sizeof(date_time), "%b", localtime(&current));
       strncat(newhold, date_time, 10);
       strncat(newhold, "*", 2); */
    strncat(newhold, "pattern", 10);

    grepwc(newhold);

    return 0;
}

With those lines commented out, it looks for the word "pattern" in a log file and counts the lines that match:
Code:
$ ./test
pattern 2

I should note that the output may look a little broken depending on your processor architecture if newhold is not empty before the first strncat() call. This is the file I am gathering this from right now:

Code:
$ cat /var/log/apache2/other_vhosts_access.log
01 Jul pattern
pattern

However, if we uncomment these lines:
Code:
strftime(date_time, sizeof(date_time), "%d", localtime(&current));
strncat(newhold, date_time, 10);
strncat(newhold, "?", 2);
strftime(date_time, sizeof(date_time), "%b", localtime(&current));
strncat(newhold, date_time, 10);
strncat(newhold, "*", 2);

The date is not matched:
Code:
$ ./test 
01?Jul*pattern 0

I've searched through lots of pattern matching tutorials for C and even tried applying some regex. The best I managed was to get this working with only one wildcard, but I need it to work with two or more. Any suggestions are greatly appreciated.

Last edited by Azrael; 07-01-2016 at 12:51 PM.. Reason: Formatting, Grammar
# 2  
Old 07-01-2016
Looking at strstr's man page, I can't see that it would accept any wildcard characters or regexes. So you might need to build your own grep routine?
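
If you did go the build-your-own route, a matcher for just ? and * is fairly small. What follows is only a sketch and is not taken from any library: wildcard_match is a made-up name, it has no character classes or escaping, and it matches the whole string, so a pattern needs a leading and trailing * to find something inside a line.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): returns true if `text`
 * matches `pattern` in full. '?' matches any one character and '*'
 * matches any run of characters, including an empty one. */
bool wildcard_match(const char *pattern, const char *text)
{
    const char *star = NULL;   /* most recent '*' seen in the pattern */
    const char *retry = NULL;  /* text position that '*' currently covers up to */

    while (*text != '\0') {
        if (*pattern == '*') {
            /* Remember the star and first try letting it match nothing. */
            star = pattern++;
            retry = text;
        } else if (*pattern == '?' || *pattern == *text) {
            pattern++;
            text++;
        } else if (star != NULL) {
            /* Mismatch: let the last '*' swallow one more character. */
            pattern = star + 1;
            text = ++retry;
        } else {
            return false;
        }
    }

    /* Text is used up; only trailing stars may remain in the pattern. */
    while (*pattern == '*')
        pattern++;
    return *pattern == '\0';
}
```

With the sample log from the first post, wildcard_match("*01?Jul*pattern*", line) would accept the "01 Jul pattern" line and reject the plain "pattern" line. The outer stars also absorb the trailing newline that fgets() leaves in the buffer.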
# 3  
Old 07-01-2016
I looked at this man page just now. I did not see anything saying it would or would not accept wildcards or regex on my OS. I also noticed these:

Code:
SEE ALSO
       index(3), memchr(3), rindex(3), strcasecmp(3),  strchr(3),  string(3),
       strpbrk(3), strsep(3), strspn(3), strtok(3), wcsstr(3)

I looked at the man pages for most of these too, but found no mention of wildcards or regex. Am I not seeing something in these man pages? Or does anyone else know a function that would work with this purpose?
# 4  
Old 07-01-2016
You won't find anything among those functions. Look into regex.h instead.
# 5  
Old 07-01-2016
There are two basic kinds of pattern matching: file name patterns and strings.

fnmatch() is used to match wildcards like ? and * in file name patterns.
regcomp(), regexec(), and regfree() are called in that order to build, then execute, then release the resources for grep- and egrep-like pattern matching.

Generally you are better off using these library calls than rolling your own. If you are already comfortable with the wildcard patterns you give to ls, the fnmatch() call is easy to use.
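
As a rough illustration of the fnmatch() call: its prototype is int fnmatch(const char *pattern, const char *string, int flags). The third argument is a set of FNM_* flags (0 for none), not a length, and the pattern has to match the entire string. line_contains below is a hypothetical helper name and the 256-byte buffer is an arbitrary size for this sketch. It wraps the pattern in * so that it can match anywhere in a line.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `line` contains text matching the
 * shell-style wildcard `pattern`, 0 otherwise. */
int line_contains(const char *pattern, const char *line)
{
    char anywhere[256];   /* arbitrary size chosen for this sketch */
    int n = snprintf(anywhere, sizeof anywhere, "*%s*", pattern);

    if (n < 0 || (size_t)n >= sizeof anywhere)
        return 0;         /* pattern too long for the buffer */

    /* flags = 0: no special handling of '/' or of a leading '.' */
    return fnmatch(anywhere, line, 0) == 0;
}
```

Called as line_contains("01?Jul*pattern", line) on each line read by fgets(), this should count the "01 Jul pattern" line from the sample log and skip the plain "pattern" line. Without the surrounding stars, the trailing newline that fgets() keeps is enough to make the match fail.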

The code structure for emulating what the grep command does is a little more complex.
If you remember, grep and egrep have a lot of options. Since they are implemented with the regex family of calls, those calls are more complex too. regcomp() (regular expression compile) supports several of the options when it builds the pattern; regexec() supports the others when it runs the match.
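
A minimal sketch of the build, execute, release sequence, assuming POSIX <regex.h>. regex_search is a made-up helper name. REG_EXTENDED selects egrep-style syntax and REG_NOSUB says only a yes/no answer is wanted, so no match offsets are requested.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` contains a match for the
 * POSIX extended regex `pattern`, 0 if it does not, and -1 if the
 * pattern fails to compile. */
int regex_search(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* Build: compile the pattern into `re`. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* Execute: search `text`; no match offsets are needed. */
    rc = regexec(&re, text, 0, NULL, 0);

    /* Release: free whatever regcomp() allocated. */
    regfree(&re);

    return rc == 0;
}
```

In regex syntax the wildcard ? becomes . and * becomes .*, so the pattern from the first post would be written 01.Jul.*pattern. Unlike fnmatch(), regexec() searches anywhere in the string unless the pattern is anchored with ^ or $.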

There is also the PCRE library, which implements Perl-compatible regular expressions. If you are a Perl user, consider that library.

Don't try to roll your own if you've never gotten fully acquainted with a regex library. If you must, read Russ Cox to get an idea how to proceed.

Implementing Regular Expressions

Site has howtos
# 6  
Old 07-02-2016
Don't get me wrong, regex is a great and wonderful tool. However, from what I understand, a regex in C is re-compiled every time it runs. That would not be so bad if I were not planning to call this code many times later, from threads, on a system where resources are very tight. For that reason I'm trying to stay away from regexes if possible.

I tried looking at fnmatch earlier. I changed my while loop to this, but the pattern was not matched:

Code:
    while(fgets(line, sizeof(line), fp)) {
        if (fnmatch(newhold, line, 100) == 0) {
            i++;
        }
    }

I looked at the man page and some examples online, and it appears the second parameter to fnmatch() needs to be a constant or a struct value. Either that, or I'm doing something wrong that I can't see.
# 7  
Old 07-02-2016
If you don't need too many different regexes but could repeatedly (re)use a handful of them on many different strings, compile each of them once into a new pattern buffer, and run all the comparisons with these few pattern buffers.
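
A sketch of that idea, assuming POSIX <regex.h>: the caller runs regcomp() once, and the compiled regex_t is then handed to a matching routine as often as needed. count_matches is a hypothetical name, and taking the lines as an array of strings is just a convenience for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the `n` strings in `lines`
 * match the already-compiled pattern buffer `re`. Nothing is compiled
 * here, so the call can be repeated cheaply with the same `re`. */
size_t count_matches(const regex_t *re, const char *const *lines, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (regexec(re, lines[i], 0, NULL, 0) == 0)
            count++;
    }
    return count;
}
```

The caller would run regcomp(&re, "01.Jul.*pattern", REG_EXTENDED | REG_NOSUB) once at startup, call count_matches(&re, ...) for each batch of lines, and call regfree(&re) once when the pattern is no longer needed.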