I'm trying to create a string parser that breaks a string up into an array of strings using a delimiter. I'm using the function "strtok", which I understand how to use. What I don't understand is how to get my parsing function to return an array of the different words in my string. It's been quite some time since I've used C, so this is proving to be quite difficult.

char ** parseString(char * inString){

    char **ptr; //What I want to return, an array of strings
    int count = 0;

    ptr[count] = strtok(inString, " ");
    count++;
    while(inString!=NULL){
         ptr[count]=strtok(NULL, " ");
         count++;
    }
    return ptr;
}

I know that this above code won't work. I vaguely remember how to use malloc, and I know that my above code will result in a seg fault since I haven't malloced, but above is essentially what I want to have happen. How do I appropriately malloc if I don't even know how many words I need to have in my array of strings?

  • Option 1) Take a guess. If the array fills up, realloc to make it bigger. If you made it too big, realloc to make it smaller at the end. Commented Sep 8, 2019 at 22:20
  • Option 2) Parse the string twice. The first time, you just count how big the array needs to be. Do the actual work the second time. Commented Sep 8, 2019 at 22:21
  • When done, does the data pointed to by inString need to be as it was originally? Commented Sep 9, 2019 at 1:30

1 Answer

Since you don't know the size of the array (vector) up front, there are a few strategies for allocating it:

  • you can first parse inString to get the number of words, allocate then parse again.
  • you can reallocate for each word.
  • you can reallocate with a geometric growth.

In the first variant you parse the string twice, which in my view makes it the worst of the three.
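For comparison, here is a minimal sketch of that two-pass variant. The names countWords and parseStringTwoPass are hypothetical, not from the question. It assumes a single space as the delimiter. Because strtok overwrites delimiters as it goes, the counting pass uses strspn and strcspn, which leave the string untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts space-separated words without modifying s. */
static size_t countWords(const char *s)
{
    size_t n = 0;

    s += strspn(s, " ");          /* skip leading delimiters */
    while (*s != '\0')
    {
        ++n;
        s += strcspn(s, " ");     /* skip over the word itself */
        s += strspn(s, " ");      /* skip the delimiters that follow it */
    }
    return n;
}

/* Hypothetical two-pass variant: count first, allocate once, then tokenize.
   Returns NULL only if the allocation fails. */
char **parseStringTwoPass(char *inString, size_t *tokensLen)
{
    size_t n = countWords(inString);                   /* pass 1: count */

    /* One extra slot for a NULL terminator; this also avoids malloc(0). */
    char **tokens = malloc((n + 1) * sizeof *tokens);
    if (tokens == NULL)
        return NULL;

    size_t i = 0;                                      /* pass 2: tokenize */
    for (char *tok = strtok(inString, " "); tok != NULL && i < n;
         tok = strtok(NULL, " "))
    {
        tokens[i++] = tok;
    }
    tokens[i] = NULL;

    *tokensLen = i;
    return tokens;
}
```

The caller releases the result with a single free(tokens). The pointers refer into inString, so that buffer must outlive the array.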

The second has the disadvantage of one reallocation per word, but that can be acceptable for small strings or when the function is not performance-critical.

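The third one is the best in general: the number of reallocations grows only logarithmically with the number of words, and it is the strategy std::vector in C++ uses.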
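A minimal sketch of the geometric-growth variant, under a few assumptions: the name parseStringGrow, the initial capacity of 4, and the growth factor of 2 are all illustrative choices, not something the question prescribes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical geometric-growth variant: the capacity doubles whenever the
   array is full. Returns NULL only if an allocation fails. */
char **parseStringGrow(char *inString, size_t *tokensLen)
{
    size_t len = 0;                     /* pointers currently in use */
    size_t cap = 4;                     /* pointers currently allocated (assumed start) */
    char **tokens = malloc(cap * sizeof *tokens);
    if (tokens == NULL)
        return NULL;

    for (char *tok = strtok(inString, " "); tok != NULL;
         tok = strtok(NULL, " "))
    {
        if (len == cap)
        {
            /* Grow through a temporary so the old block is not leaked
               if realloc fails. */
            size_t newCap = cap * 2;
            char **tmp = realloc(tokens, newCap * sizeof *tokens);
            if (tmp == NULL)
            {
                free(tokens);
                return NULL;
            }
            tokens = tmp;
            cap = newCap;
        }
        tokens[len++] = tok;
    }

    *tokensLen = len;
    return tokens;
}
```

If the leftover capacity matters, a final realloc down to len elements can trim it, as the first comment on the question suggests.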

The code below implements the second variant.

You also need to hand back the size of the array in a separate variable, because the caller has no other way to know how many tokens were found.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char** parseString(char* inString, size_t* tokensLen)
{
    char **tokens = NULL; // realloc on NULL acts like malloc so we simplify the code
    size_t i = 0;

    // parse the first word (strtok writes '\0' over delimiters in inString)
    char* inToken = strtok(inString, " ");

    while (inToken)
    {
        // allocate for one more pointer
        tokens = realloc(tokens, (i + 1) * sizeof *tokens);
        assert(tokens); // note: compiled out when NDEBUG is defined

        // point to word (no copy: the pointer refers into inString)
        tokens[i] = inToken;

        // parse the next word
        inToken = strtok(NULL, " ");
        ++i;
    }

    // set size
    *tokensLen = i;

    return tokens;
}

int main(void)
{
    char str[] = "here we go";
    size_t len;
    char** tokens = parseString(str, &len);

    for (size_t i = 0; i < len; ++i)
    {
        printf("%s\n", tokens[i]);
    }

    free(tokens); // only the pointer array was allocated
    return 0;
}

4 Comments

Thanks so much, that makes a lot of sense!
size_t* tokensLen and int i = 0; make more sense using the same type. Suggest size_t.
@mattgallant24 you are welcome. If you think this helped you consider upvoting it. If it answers your question consider marking it as accepted.
This answer modifies the original string. That is (IMO) a poor practice. Suggest using strlen() and malloc() to copy the string to a local dynamic array and performing all the parsing there.
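One possible way to follow that last suggestion is sketched below. The name parseStringCopy is hypothetical, and packing the pointer array and the copied text into one malloc block is a design choice of this sketch, not something the comment asks for. It relies on the fact that k single-space-separated tokens need at least 2k - 1 characters, so a string of length len holds at most (len + 1) / 2 tokens.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical non-destructive variant: tokenizes a private copy, so the
   caller's string is left untouched. Returns NULL only if malloc fails. */
char **parseStringCopy(const char *inString, size_t *tokensLen)
{
    size_t len = strlen(inString);
    size_t maxTokens = (len + 1) / 2;   /* upper bound on the token count */

    /* One block: the pointer array first, then the copy of the text. */
    char **tokens = malloc(maxTokens * sizeof *tokens + len + 1);
    if (tokens == NULL)
        return NULL;

    char *copy = (char *)(tokens + maxTokens);
    memcpy(copy, inString, len + 1);    /* includes the terminating '\0' */

    size_t i = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " "))
        tokens[i++] = tok;

    *tokensLen = i;
    return tokens;
}
```

Because the tokens point into the same block as the array, a single free(tokens) releases everything, and the function can be called on a string literal.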
