Wildcard Pattern Matching In C


 
# 8  
Old 07-03-2016
You can "statically" pre-compile to a text file: use the regcomp command, not the C call. It writes the compiled buffer you need to a file. I typically use maybe a dozen pre-compiled patterns in a simple app for validating input. You put them in an include file.

And Rudic is correct - you can compile once and then run the compiled buffer multiple times internally -- as well as pre-compile.
# 9  
Old 07-03-2016
Well, after no luck with fnmatch() and other pattern matching functions I found online I decided to give regex a try:

Code:
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <time.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <regex.h>

int main() {

   char newhold[40];

        regex_t re;
        time_t current = time(NULL);
        char day[10];
        char mon[10];
        int retval = 0;        

        strftime(day, sizeof(day), "%d", localtime(&current));
        strncat( newhold, day, 10 );
        strncat( newhold, "?", 2 );
        strftime(mon, sizeof(mon), "%b", localtime(&current));
        strncat( newhold, mon, 10 );
        strncat( newhold, "*", 2 );
        strncat( newhold, "pattern", 10 );

     if(regcomp(&re , newhold, REG_EXTENDED) != 0 ){
         return;
     }

    FILE *fp;
    fp = fopen("/var/log/apache2/other_vhosts_access.log", "r");
    char line[100];
    unsigned int i = 0;

    while(fgets(line, sizeof(line), fp)) {
        if ((retval = regexec(&re, line, 0, NULL, 0)) == 0){
        i++;
      }
    }
  printf("%s %d\n", newhold, i );
  fclose(fp);

  return 0;
}

Obviously this doesn't work. I know that in the regcomp() call "newhold" would normally be a constant instead of a char array. I could do that with the "pattern" section of this regex and with the "?" and "*" wildcards. However, the variables "day" and "mon" are read from the system clock every time the code runs, so a constant wouldn't work in this case.

Perhaps I'm going wrong in other aspects as well, but that's the biggest problem I see at the moment. Anyone know any tricks to pass variables into regex for this? I tried searching that online as well with no success. Maybe my Google-fu is just lacking?

Also, I'm not finding much on creating a file with the regcomp command either. I'd love to see that if anyone can provide an example.

Last edited by Azrael; 07-03-2016 at 05:05 AM..
# 10  
Old 07-03-2016
What is in /var/log/apache2/other_vhosts_access.log?

Please explain in English what you are hoping the extended regular expression you have created in newhold[] will match.

Which of the lines in /var/log/apache2/other_vhosts_access.log do you hope will be matched by your ERE?
# 11  
Old 07-03-2016
Take the contents of the string variable (newhold) you constructed, print it, and try it as a pattern for grep as a console command. You also need to check the return code from regexec in case something else biffed. It should be: REG_NOMATCH or zero. Anything else is a fatal error. regerror() is your friend.
# 12  
Old 07-03-2016
Sorry, I thought I had provided the contents of the file earlier. Here it is again with just the following for testing:

Code:
$ cat /var/log/apache2/other_vhosts_access.log
03/Jul blah pattern

I did not know regex in C and grep/egrep were so closely related. I did as Jim suggested and saw nothing was matched:

Code:
$ egrep "03?Jul*pattern" /var/log/apache2/other_vhosts_access.log | wc -l
0

I changed this to the following and it was matched:

Code:
egrep "03.Jul.*pattern" /var/log/apache2/other_vhosts_access.log | wc -l
1

Still not getting a match when I try applying this to C though:
Code:
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <time.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <regex.h>

int main() {

   char newhold[40];

        regex_t re;
        time_t current = time(NULL);
        char day[10];
        char mon[10];
        int retval = 0;

        strftime(day, sizeof(day), "%d", localtime(&current));
        strncat( newhold, day, 10 );
        strncat( newhold, ".", 2 );
        strftime(mon, sizeof(mon), "%b", localtime(&current));
        strncat( newhold, mon, 10 );
        strncat( newhold, ".*", 4 );
        strncat( newhold, "pattern", 10 );

     if(regcomp(&re , newhold, REG_NOMATCH) != 0 ){
         return 1;
     }

    FILE *fp;
    fp = fopen("/var/log/apache2/other_vhosts_access.log", "r");
    char line[100];
    unsigned int i = 0;

    while(fgets(line, sizeof(line), fp)) {
        if ((retval = regexec(&re, line, 0, NULL, 0)) == 0){
        i++;
      }
    }
  printf("%s %d\n", newhold, i );
  fclose(fp);
  return 0;
}

I also tried working with regerror(), but had trouble applying the few examples I was able to find.
# 13  
Old 07-03-2016
You're very close... In the line:
Code:
     if(regcomp(&re , newhold, REG_NOMATCH) != 0 ){

REG_NOMATCH is defined to be a return code from regexec() indicating that it did not find a match; it is not defined to be a flag to be passed to regcomp(). If you change the above line in your code to:
Code:
     if(regcomp(&re , newhold, 0) != 0 ){

and rebuild your code, running it produces the output:
Code:
03.Jul.*pattern 1

I would, however, suggest changing:
Code:
        strftime(day, sizeof(day), "%d", localtime(&current));
        strncat( newhold, day, 10 );
        strncat( newhold, ".", 2 );
        strftime(mon, sizeof(mon), "%b", localtime(&current));
        strncat( newhold, mon, 10 );
        strncat( newhold, ".*", 4 );
        strncat( newhold, "pattern", 10 );

to:
Code:
	strftime(newhold, sizeof(newhold), "%d/%b.*pattern",
	    localtime(&current));

which gets rid of several chances to overflow the size of newhold[] and chances to unnecessarily truncate text being added in intermediate strncat() calls. If you do this and rebuild your code again, it will produce the output:
Code:
03/Jul.*pattern 1

And, then, of course, you can also get rid of the day[] and mon[] arrays.

Last edited by Don Cragun; 07-03-2016 at 07:38 PM.. Reason: Fix typo: s/get rids/gets rid/
# 14  
Old 07-03-2016
Thank you Don Cragun! That does work correctly and is a lot cleaner!

I'm still interested in the static pre-compilation of a regex to a text file that jim mcnamara mentioned. If anyone has any examples of that, let me know.