Segfault When Parsing Delimiters In C


 
# 1  
Old 07-19-2016

Another project, another bump in the road and another chance to learn. I've been trying to open gzipped files and parse data from them, and I hit a snag. The files contain lines with a place name followed by an IP address or IP range, like this:

Code:
Some place:x.x.x.x-x.x.x.x

I adapted some code I found, and it works fine for parsing the data to show only the IPs:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (void) {
  char str[128];
  char *ptr;

  strcpy (str, "Some place:x.x.x.x-x.x.x.x");
  strtok_r (str, ":", &ptr);

  printf ("%s\n", ptr);
  return 0;
}

Result:
Code:
$ ./test       
x.x.x.x-x.x.x.x

However, when I add it to the code I have for opening the gzips and reading them I get a segmentation fault. Here is the code I am trying to work from now:

Code:
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(int argc, char *argv[])
{
  const char prefix[] = "zcat ";
  const char *arg;
  char *strip;
  char *range;
  char *cmd;
  FILE *in;
  char buf[4096];

  if (argc != 2) {
    fprintf(stderr, "Usage: %s file\n", argv[0]);
    return 1;
  }

  arg = argv[1];
  cmd = malloc(sizeof(prefix) + strlen(arg) + 1);
  if (!cmd) {
    fprintf(stderr, "%s: malloc: %s\n", argv[0], strerror(errno));
    return 1;
  }

  sprintf(cmd, "%s%s", prefix, arg);

  in = popen(cmd, "r");
  if (!in) {
    fprintf(stderr, "%s: popen: %s\n", argv[0], strerror(errno));
    return 1;
  }

  while (fscanf(in, "%*s %99[^\n]", buf) == 1){
    strcpy (strip, buf);
    strtok_r (strip, ":", &range);
    printf("%s\n", range);
  }

  if (ferror(in)) {
    fprintf(stderr, "%s: fread: %s\n", argv[0], strerror(errno));
    return 1;
  }
  else if (!feof(in)) {
    fprintf(stderr, "%s: %s: unconsumed input\n", argv[0], argv[1]);
    return 1;
  }

  return 0;
}

I tried to look at this with strace and it seems to die directly after reading the first line. Any thoughts appreciated.
# 2  
Old 07-19-2016
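You forgot to include stdio.h for printf, etc. Without the prototypes, the compiler assumes functions such as popen return int, which truncates the returned pointer on 64-bit systems and can cause a crash.

The biggest problem: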

Code:
char *strip;

...

strcpy(strip,buf);

You use strip without ever pointing it at valid memory, either by allocating it with malloc as you did for cmd, or by declaring it as an array from the start, like buf. strcpy then writes through an uninitialized pointer.

There's no point copying it either, use buf directly.

There's also no point making your program more complicated with strtok_r for a program this simple.

You also forgot error checking after calling strtok_r, which can crash on any line that does not contain a ':'.
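
A minimal sketch of that check, using plain strtok. The helper name second_field is hypothetical and only for illustration; it returns NULL instead of a bad pointer when the line has no second field.

```c
#include <string.h>

/* Hypothetical helper: returns the second ':'-separated token of line,
   or NULL if there is none. strtok modifies line in place. */
char *second_field(char *line)
{
    char *tok = strtok(line, ":");   /* first token, the place name */

    if (tok == NULL)
        return NULL;                 /* empty line or only delimiters */

    return strtok(NULL, ":");        /* second token, or NULL if absent */
}
```

Note that strtok skips leading delimiters, so a line such as ":x" yields only one token and this helper returns NULL for it.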

Also, while that's a clever use of scanf, there's a built-in function called fgets that does this faster and more simply.

Also, you forgot to call pclose when the program's done, which can cause zombie processes.
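
As a rough sketch of the popen/pclose pairing, assuming a POSIX system: the helper name count_output_lines is made up for this example. It always calls pclose once popen has succeeded, so the child process is reaped.

```c
#define _POSIX_C_SOURCE 200809L /* exposes popen/pclose under strict -std modes */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: runs cmd through the shell and counts the
   newline-terminated lines it prints. Returns the count, or -1 if
   popen failed or the command did not exit with status 0. */
long count_output_lines(const char *cmd)
{
    char buf[4096];
    long lines = 0;
    FILE *in = popen(cmd, "r");

    if (in == NULL)
        return -1;

    while (fgets(buf, sizeof buf, in) != NULL) {
        /* A long line may arrive in several chunks; only the chunk
           that holds the newline is counted. */
        if (strchr(buf, '\n') != NULL)
            lines++;
    }

    /* pclose waits for the child to finish. Skipping it leaves a
       zombie behind until this process exits. */
    if (pclose(in) != 0)
        return -1;

    return lines;
}
```

The value returned by pclose is the command's wait status, so a nonzero value here is treated as failure.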

Also, prefix should be a #define, not a variable. (Declaring it as a local array copies the string constant into that array at runtime. Using a #define, or just a "" string, uses the original directly.)

Also, never use sizeof() to determine the lengths of strings. It only worked here because prefix is an array whose size is set by its initializer (and sizeof counts the terminating NUL as well), but it will surprise you in other contexts. sizeof(buf) would always be 4096. sizeof(cmd) would be either 4 (32-bit systems) or 8 (64-bit systems), because cmd is a pointer. strlen() avoids that ambiguity.
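
To illustrate sizing with strlen, here is a small sketch. The helper name build_command is an assumption for the example, not part of the original program. For the string "zcat ", strlen gives 5, while sizeof on the array gives 6 because it includes the terminating NUL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins prefix and arg into a newly allocated
   string. The caller must free the result. Returns NULL if malloc fails. */
char *build_command(const char *prefix, const char *arg)
{
    /* strlen works for arrays and pointers alike; +1 is for the NUL. */
    size_t len = strlen(prefix) + strlen(arg) + 1;
    char *cmd = malloc(len);

    if (cmd == NULL)
        return NULL;

    snprintf(cmd, len, "%s%s", prefix, arg);
    return cmd;
}
```

Because prefix arrives here as a pointer, sizeof(prefix) inside this function would give the pointer size rather than the string length, which is exactly the trap described above.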

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#define PREFIX "zcat "

int main(int argc, char *argv[])
{
  const char *arg;
  char *cmd, buf[4096];
  FILE *in;

  if (argc != 2) {
    fprintf(stderr, "Usage: %s file\n", argv[0]);
    return 1;
  }

  arg = argv[1];
  cmd = malloc(strlen(PREFIX) + strlen(arg) + 1);
  if (!cmd) {
    fprintf(stderr, "%s: malloc: %s\n", argv[0], strerror(errno));
    return 1;
  }

  sprintf(cmd, "%s%s", PREFIX, arg);

  in = popen(cmd, "r");
  if (!in) {
    fprintf(stderr, "%s: popen: %s\n", argv[0], strerror(errno));
    return 1;
  }

  while(fgets(buf, 4096, in)) {
    char *tok=strtok(buf, ":"); // First token, 'place'
    // Second token, 'xxx-xxx'
    if(tok != NULL) tok=strtok(NULL, ":");
    // If anything was found, print it
    if(tok != NULL) printf("%s", tok);
  }

  if (ferror(in)) {
    fprintf(stderr, "%s: fread: %s\n", argv[0], strerror(errno));
    return 1;
  }
  else if (!feof(in)) {
    fprintf(stderr, "%s: %s: unconsumed input\n", argv[0], argv[1]);
    return 1;
  }

  pclose(in);

  return 0;
}

This is a much easier shell script than a C program, by the way.

Code:
#!/bin/sh

zcat "$@" | awk -F: '$2 { print $2 }'

These 2 Users Gave Thanks to Corona688 For This Post:
# 3  
Old 07-20-2016
Thank you Corona688. I actually had stdio.h in my code, but did not copy it correctly when pasting to this thread. Most of the examples I found for handling delimiters in C used strtok. I chose strtok_r so the code would be thread-safe later.

There's no doubt this could be more easily done with Bash, but when you are parsing multiple lists that are millions of lines in length, C seemed like a better option. Plus I have more to do that would be better used with C in this project.

I appreciate all your suggestions and will review all this tonight in hopes of making this better and cleaner. As with most of my projects, I think being sleep deprived got to me.

Thanks
# 4  
Old 07-27-2016
Quote:
Originally Posted by Azrael
There's no doubt this could be more easily done with Bash, but when you are parsing multiple lists that are millions of lines in length, C seemed like a better option.
The more complicated you make it, the better off you'd be using a text processing language for text processing.

That was not a BASH script, that was an awk one. awk is quite efficient.
# 5  
Old 09-30-2016
It is dead easy to read gz files if you use this:

http://www.zlib.net/manual.html#Utility

It works on plain files too, so you don't need to worry about whether the file is gzipped or not.
# 6  
Old 09-30-2016
If you can find a suitable scanf-format, you might be able to do it all with one fscanf. If you have to change things later, this could be useful, or not. Probably not, but you are the judge.

Code:
while (fscanf(fp, "%*[^:]:%4095[^\n]\n", buf) == 1)
  ...

Juha
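
The format above can be tried on a single string with sscanf. This is only a sketch: the helper name field_after_colon and the 128-byte buffer are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: copies the text after the first ':' in line
   into out. Returns 1 on success, 0 if there is no ':' or nothing
   on either side of it. */
int field_after_colon(const char *line, char out[128])
{
    /* %*[^:]     skip one or more characters that are not ':'
       :          match the delimiter itself
       %127[^\n]  store up to 127 characters, stopping at a newline */
    return sscanf(line, "%*[^:]:%127[^\n]", out) == 1;
}
```

Comparing the result against 1 also covers lines where sscanf returns 0 or EOF, so a line without a colon is simply reported as a failure.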
# 7  
Old 10-07-2016
Once again, there is a built-in function to do that, fgets.