Building an argc/argv style structure from a string (char*)


 
# 1  
Old 12-16-2009

Hello All,

First post. I've been struggling with the following:
Given a char* string, I need to construct an "int argc, char *argv[]" style structure. What I'm struggling with most is handling escaped-whitespace and quotes.

e.g. the string:
char *s = "hello world 'my name is simon' foo\\ bar";
should produce an argc/argv of:
argc = 4
argv[0] = "hello"
argv[1] = "world"
argv[2] = "my name is simon"
argv[3] = "foo bar"

Make sense?

Anyway, after struggling for a while, I thought I would look at the source for bash/csh to see how they do it, but their setup is way more complicated (since they handle their own shell languages in those strings).

Anyone have any clear/simple example of how to do this, or know of a place/application where I could find one?

Many Thanks
# 2  
Old 12-16-2009
You need to parse your line of text into fields.

strtok() allows the use of multiple field delimiters. It has drawbacks (it modifies the string it parses and keeps hidden static state), but you can call it on a temporary string several times to get the field breakout you need.

You can also use lex/yacc to parse fields. An intermediate approach is to use the regex engine. See man regcmp and/or man regex.
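To make the strtok() drawback concrete for this particular problem, here is a minimal sketch. The helper name split_ws is made up for illustration. It splits on whitespace only, so quotes and backslashes are treated as ordinary characters:

```c
#include <string.h>

/* Hypothetical helper: split buf in place on whitespace using strtok().
 * Stores up to max pointers in argv and returns how many were stored.
 * Quotes and backslashes get no special treatment. */
int split_ws(char *buf, char **argv, int max)
{
    int argc = 0;
    char *tok;

    for (tok = strtok(buf, " \t\n");
         tok != NULL && argc < max;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;

    return argc;
}
```

Given the buffer w 'foo bar', this yields three fields (w, 'foo and bar'), which is exactly the case the original poster is trying to avoid.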

Now - the obvious question - why can you not simply use argc, argv? i.e., write a child process that is "called" with your string:

Code:
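/* mychild.c */
#include <stdio.h>
int main(int argc, char **argv)
{
      int i;
      /* i is incremented in the loop header: using i and i++ in the
         same printf() call is undefined behavior */
      for (i = 0; i < argc; i++)
      {
           printf("argv[%d]=%s\n", i, argv[i]);
      }
      printf("argc=%d\n", argc);
      return 0;
}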

call your child code with popen -
Code:
#include <stdio.h>

void foo(char *stringtoparse)
{
char tmp[256]={0x0};
char cmdstr[256]={0x0};
FILE *cmd=NULL;

/* snprintf keeps a long input from overflowing cmdstr */
snprintf(cmdstr, sizeof(cmdstr), "./mychild %s", stringtoparse);
cmd=popen(cmdstr, "r");
if (cmd==NULL)
   return;
while(fgets(tmp, sizeof(tmp), cmd)!=NULL)
   printf("%s", tmp);
pclose(cmd);
}


Last edited by jim mcnamara; 12-16-2009 at 01:32 PM..
# 3  
Old 12-16-2009
Hi Jim,

Thanks for your reply.
I had considered using a separate program popen'd from the parent, but decided against it (at least for now) since it would require installing two programs instead of one ... although it is an easy solution.

I've been trying to figure out a way to popen the binary that's running (or accomplish something similar) so that I could handle all of this in one program. Do you, or anyone else, happen to know if there is a reliable way to get the absolute pathname of the currently running binary? If I could get that, this might work.

Many Thanks again,
-Ryan
# 4  
Old 12-16-2009
Are you looking at using /proc?
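For reference, the /proc route might look like the sketch below. It assumes Linux, where /proc/self/exe is a symbolic link to the running binary; other systems expose this differently or not at all. The function name self_exe_path is made up for illustration.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, Linux only: copy the absolute path of the running
 * binary into buf. Returns 0 on success, -1 on failure or if the path
 * might have been truncated. */
int self_exe_path(char *buf, size_t size)
{
    ssize_t n;

    if (size == 0)
        return -1;

    /* readlink() does not NUL-terminate, so leave room for the terminator */
    n = readlink("/proc/self/exe", buf, size - 1);
    if (n < 0 || (size_t)n >= size - 1)
        return -1;

    buf[n] = '\0';
    return 0;
}
```

A caller would pass a buffer of a few kilobytes and check the return value before handing the path to popen().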

.... you "own" the binary in question. This whole thing is very confusing, as currently explained. As a rule of thumb, this usually happens when somebody decides how to solve a problem and is missing something important. Not that what you seem to ask is impossible. Just messy.

Please explain -
What EXACTLY are you trying to do -- not how you think it should be done.
An example answer might be - I'm trying to get the command line arguments for a process I do not own, from binary code I cannot change.
# 5  
Old 12-16-2009
Hi Jim,

Ok, I'll start from scratch with what I'm doing and perhaps there's an easier way you might see.

I'm writing an application that uses vi-like keybindings and has a command-mode similar to vi. Specifically, in command mode one can do a "write" just like in vi
:w
-or- to save the current playlist (this is a music application), one can do
:w filename

They might also specify a filename such as
:w foo\ bar
or,
:w "foo bar"

In these cases, I'd obviously like to be able to parse the parameters correctly (i.e. recognize that "foo\ bar" is one string, not two).

There are a few other commands I have (and I'm currently working on a few more) that take multiple parameters, and I'd like to be able to handle spaces and quoting correctly for them.

About my current setup:
All of the command functions take two params, "int argc" and "char *argv[]", just like a regular "main" function. I then have an array of strings and function pointers (to these functions) that behaves essentially like a path.
When a user is in command mode and enters a string, I parse that string into a (bad) argc/argv structure, and then search the path for a matching named record, and if found, execute the function with the argc/argv that I built.
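A setup like the one described above could be sketched roughly as follows. All names here (struct command, cmd_write, dispatch) are invented for illustration and are not taken from the actual application:

```c
#include <stddef.h>
#include <string.h>

/* Every command handler has the same shape as main(). */
typedef int (*cmd_fn)(int argc, char *argv[]);

struct command {
    const char *name;
    cmd_fn      fn;
};

/* Hypothetical handler for ":w [filename]": reports how many
 * filenames were given. A real handler would save the playlist. */
static int cmd_write(int argc, char *argv[])
{
    (void)argv;
    return argc - 1;
}

/* The "path": a table mapping command names to handlers. */
static const struct command commands[] = {
    { "w", cmd_write },
};

/* Look up argv[0] in the table and run the matching handler.
 * Returns the handler's result, or -1 if no command matches. */
int dispatch(int argc, char *argv[])
{
    size_t i;

    if (argc < 1)
        return -1;

    for (i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(argv[0], commands[i].name) == 0)
            return commands[i].fn(argc, argv);

    return -1;
}
```

With this shape, the only piece still missing is the step that turns the typed string into a correct argc/argv pair, which is what the rest of the thread is about.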

Does this help make clear at least my setup and what I'm asking about? I can point you to code if you would like.

Thanks again,
-Ryan
# 6  
Old 12-16-2009
I got it. You really do need a parser. But the shell has one - modern shells that is.
Code:
#include <stdio.h>
#include <string.h>
/* using the default shell parser - no extra code required */
/* each argv[i] must already point to a buffer of at least 256 bytes */
void make_args(char **argv, int *argc, char *string)
{
		char tmp[256]={0x0};
		FILE *cmd=NULL;
		int i=0;
		char *p=NULL;

		*argc=0;
		/* snprintf cannot overflow tmp; quoting "$i" keeps embedded spaces */
		snprintf(tmp, sizeof(tmp),
		        "set -- %s && for i in \"$@\"; do printf '%%s\\n' \"$i\"; done",
		        string);
		cmd=popen(tmp, "r");
		if (cmd==NULL)
		    return;
		while (fgets(tmp, sizeof(tmp), cmd)!=NULL)
		{
		    p=strchr(tmp, '\n');
		    if (p!=NULL) *p=0x0;
		    strcpy(argv[i++], tmp);
		}
		pclose(cmd);
		*argc=i;
}
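
A related option that avoids popen() is the POSIX wordexp() function, which performs shell-style word splitting in-process. The sketch below is only a thin wrapper, and the name shell_split is made up. Note that wordexp() also expands variables, tildes and wildcards, and rejects unquoted characters such as | & ; < > and newline, which may or may not be wanted for a vi-style command line.

```c
#include <wordexp.h>

/* Hypothetical wrapper: split s the way the shell would.
 * WRDE_NOCMD makes command substitution an error instead of running it.
 * Returns 0 on success, or a nonzero WRDE_* error code. */
int shell_split(const char *s, wordexp_t *we)
{
    return wordexp(s, we, WRDE_NOCMD);
}
```

On success, we->we_wordc plays the role of argc and we->we_wordv the role of argv; call wordfree() on the structure when finished with it.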

# 7  
Old 12-16-2009
Jim (or others interested),

I've been working on a parser, and believe I may have it. Though if you (or others) have a suggestion for an easier or more obvious solution, I'd love a good smack of the clue-stick!

If you're interested, and wouldn't mind, I'd love comments.

The code below is a bit rough, but it includes a driver program that reads strings from standard input and runs each one through my parser. Afterwards, it prints the resulting argc/argv structure.

(Right now it just builds a global argc/argv structure. Once I'm convinced this proof of concept works, it will obviously be updated.)

Many Thanks again Jim for your comments,
-Ryan

Code:
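#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <string.h>    /* strlen */
#include <strings.h>   /* bzero */
#include <err.h>       /* err, errx (BSD and glibc extension) */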

/* for debugging; do/while(0) makes the macro behave as one statement */
#define STATUS(...) \
   do { printf("here: %d. ", __LINE__); printf(__VA_ARGS__); printf("\n"); fflush(stdout); } while (0)


/* currently building the argc/argv stuff in a global context */
#define ARGV_MAX  255
#define ARGV_TOKEN_MAX  255
int    _argc;
char  *_argv[ARGV_MAX];
char  *_argv_token;

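/* initialize empty argc/argv struct */
void
argv_init(void)
{
   int i;

   /* free tokens left over from a previous parse; free(NULL) is a no-op */
   for (i = 0; i < _argc; i++)
      free(_argv[i]);
   free(_argv_token);

   _argc = 0;
   /* calloc already zero-fills the buffer */
   if ((_argv_token = calloc(ARGV_TOKEN_MAX, sizeof(char))) == NULL)
      err(1, "argv_init: failed to calloc");
}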

/* add a character to the current token */
void
argv_token_addch(int c)
{
   int n;

   n = strlen(_argv_token);
   if (n == ARGV_TOKEN_MAX - 1)
      errx(1, "argv_token_addch: reached max token length (%d)", ARGV_TOKEN_MAX);

   _argv_token[n] = c;
}

/* finish the current token: copy it into _argv and setup next token */
void
argv_token_finish()
{
   if (_argc == ARGV_MAX)
      errx(1, "argv_token_finish: reached max argv length (%d)", ARGV_MAX);

/*STATUS("finishing token: '%s'\n", _argv_token);*/
   _argv[_argc++] = _argv_token;
   if ((_argv_token = calloc(ARGV_TOKEN_MAX, sizeof(char))) == NULL)
      err(1, "argv_token_finish: failed to calloc");
   bzero(_argv_token, ARGV_TOKEN_MAX * sizeof(char));
}

/* main parser */
void
str2argv(char *s)
{
   bool in_token;
   bool in_container;
   bool escaped;
   char container_start;
   char c;
   int  len;
   int  i;

   container_start = 0;
   in_token = false;
   in_container = false;
   escaped = false;

   len = strlen(s);

   argv_init();
   for (i = 0; i < len; i++) {
      c = s[i];

      switch (c) {
         /* handle whitespace */
         case ' ':
         case '\t':
         case '\n':
            if (!in_token)
               continue;

            if (in_container) {
               argv_token_addch(c);
               continue;
            }

            if (escaped) {
               escaped = false;
               argv_token_addch(c);
               continue;
            }

            /* if reached here, we're at end of token */
            in_token = false;
            argv_token_finish();
            break;

         /* handle quotes */
         case '\'':
         case '\"':

            if (escaped) {
               argv_token_addch(c);
               escaped = false;
               continue;
            }

            if (!in_token) {
               in_token = true;
               in_container = true;
               container_start = c;
               continue;
            }

            if (in_container) {
               if (c == container_start) {
                  in_container = false;
                  in_token = false;
                  argv_token_finish();
                  continue;
               } else {
                  argv_token_addch(c);
                  continue;
               }
            }

            /* XXX in this case, we:
             *    1. have a quote
             *    2. are in a token
             *    3. and not in a container
             * e.g.
             *    hell"o
             *
             * what's done here appears shell-dependent,
             * but overall, it's an error.... i *think*
             */
            printf("Parse Error! Bad quotes\n");
            break;

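         case '\\':

            if (in_container && s[i+1] != container_start) {
               argv_token_addch(c);
               continue;
            }

            if (escaped) {
               /* "\\" yields one literal backslash */
               argv_token_addch(c);
               escaped = false;
               continue;
            }

            /* an escape always starts or continues a token */
            in_token = true;
            escaped = true;
            break;

         default:
            if (!in_token) {
               in_token = true;
            }

            /* an escaped ordinary character is taken literally */
            escaped = false;
            argv_token_addch(c);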
      }
   }

   if (in_container)
      printf("Parse Error! Still in container\n");
   else if (in_token)
      argv_token_finish();   /* input ended without trailing whitespace */

   if (escaped)
      printf("Parse Error! Unused escape (\\)\n");
}

/* simple driver */
int
main(int argc, char *argv[])
{
   char  s[255];
   int   i;

   while (fgets(s, sizeof(s), stdin) != NULL) {

      printf("parsing...\n");
      fflush(stdout);

      str2argv(s);

      for (i = 0; i < _argc; i++)
         printf("\t_argv[%d] = '%s'\n", i, _argv[i]);
      fflush(stdout);
   }

   return 0;
}
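
For comparison, here is a more compact sketch that avoids the globals and the per-token allocations by splitting the string in place. It is not the code above reworked, just one possible alternative; the name split_args is made up. It also differs from the parser above in one respect: a quoted section adjacent to other text is joined into the same argument (as the shell does) instead of ending the token.

```c
#include <stddef.h>

/* Hypothetical in-place splitter. Rewrites s so that each argument is a
 * NUL-terminated string inside s, and stores up to max pointers in argv.
 * Returns the argument count, or -1 on an unterminated quote, a trailing
 * backslash, or more than max arguments. */
int split_args(char *s, char **argv, int max)
{
    int argc = 0;
    char *r = s;   /* read cursor */
    char *w = s;   /* write cursor, never ahead of the read cursor */

    for (;;) {
        char quote = 0;   /* the open quote character, or 0 */

        while (*r == ' ' || *r == '\t' || *r == '\n')
            r++;
        if (*r == '\0')
            break;
        if (argc == max)
            return -1;
        argv[argc++] = w;

        while (*r != '\0') {
            char c = *r++;

            if (quote) {
                if (c == quote)
                    quote = 0;              /* closing quote */
                else if (c == '\\' && *r == quote)
                    *w++ = *r++;            /* escaped quote inside quotes */
                else
                    *w++ = c;
            } else if (c == '\'' || c == '"') {
                quote = c;                  /* opening quote */
            } else if (c == '\\') {
                if (*r == '\0')
                    return -1;              /* nothing left to escape */
                *w++ = *r++;                /* take next char literally */
            } else if (c == ' ' || c == '\t' || c == '\n') {
                break;                      /* end of this argument */
            } else {
                *w++ = c;
            }
        }
        if (quote)
            return -1;
        *w++ = '\0';
    }
    return argc;
}
```

Because the result points into the caller's buffer, there is nothing to free afterwards, but the input string is destroyed, so pass a copy if the original is still needed.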


Last edited by Scott; 12-16-2009 at 07:34 PM.. Reason: Removed link. Copied code from link. Adjusted text to this effect.