I have a string that I tokenize using the double quote (") as the delimiter.

When I enter the string "program", the method finds the beginning and end of the string from the delimiters and returns program, which I put in a vector.

However, if I enter the string "program"123"", it returns three substrings: program, 123, 123".

The result I want is program"123", which is a valid string for my use case. It contains " as part of the string, and this is where searching by delimiter fails: it cannot distinguish the quotes that mark the beginning and end of the string from the quotes inside it.

Can someone suggest the logic to handle this?

The following is the method I am using.

public static PVector tokenizeInput(final String sCmd) throws ExceptionOpenQuotedString
{
    if (sCmd == null)
    {
        return null;
    }

    PVector rc = new PVector();

    if (sCmd.length() == 0)
    {
        rc.add(StringTable.STRING_EMPTY);
        return rc;
    }

    char chCurrent = '\0';
    boolean bInWhitespace = true;
    boolean bInQuotedToken = false;
    boolean bDelim;
    int start = 0;
    int nLength = sCmd.length();

    for (int i = 0; i < nLength; i++)
    {
        chCurrent = sCmd.charAt(i); // e.g. "abcd "ef"" returns abcd, ef, ef"
        bDelim = -1 != APIParseConstants.CMD_LINE_DELIMS.indexOf(chCurrent);

        if (bInWhitespace) // true
        {
            // In whitespace
            if (bDelim)
            {
                if ('\"' == chCurrent)
                {
                    start = i + 1;
                    bInQuotedToken = true;
                    bInWhitespace = false;
                } // if ('\"' == chCurrent)
            }
            else
            {
                start = i;
                bInWhitespace = false;
            } // else - if (bDelim)
        }
        else
        {
            // Not in whitespace
            boolean bAtEnd = i + 1 == nLength;
            if (!bDelim)
            {
                continue;
            }
            else
            {
                if ('\"' == chCurrent)
                {
                    if (!bInQuotedToken)
                    {
                        // ending current token due to '"'
                        if (bAtEnd)
                        {
                            // non terminated quoted string at end...
                            throw new ExceptionOpenQuotedString(
                                    sCmd.substring(start));
                        }
                        else
                        {
                            rc.add(sCmd.substring(start, i)); // include quote
                            bInQuotedToken = true;
                            bInWhitespace = false;
                        } // if (bAtEnd)
                    }
                    else
                    {
                        // ending quoted string
                        //if (!bAtEnd)
                        {
                            rc.add(sCmd.substring(start, i)); // don't include quote
                            bInQuotedToken = false;
                            bInWhitespace = true;
                        } // if (bAtEnd)
                    } // else - if (!bInQuotedToken)
                }
                else
                {
                    // got delim (not '"')
                    if (!bAtEnd && !bInQuotedToken)
                    {
                        rc.add(sCmd.substring(start, i));
                        bInWhitespace = true;
                    } // if (bAtEnd)
                } // else - if ('\"' == chCurrent)
            } // else - if (!bDelim)
        } // else - if (bInWhitespace)
    } // for (int i = 0; i < nLength; i++)

    if (!bInWhitespace && start < nLength)
    {
        if (!bInQuotedToken || chCurrent == '"')
        {
            rc.add(sCmd.substring(start));
        }
        else
        {
            throw new ExceptionOpenQuotedString(sCmd.substring(start));
        } // else - if (!bInQuotedToken)
    } // if (!bInWhitespace && start < nLength)
    return rc;
}
  • How do you distinguish between double quotes as delimiters and as part of the string? I'm not asking for your code, but about the rules in plain English. Commented Jun 26, 2012 at 10:59
  • You really ought to show us the code where you read in the string and tokenize it. Commented Jun 26, 2012 at 11:01
  • Regular Expressions are your friend. Commented Jun 26, 2012 at 11:07
  • In this code you can give a single name as a string, which may or may not include a double quote. But if it does, then all the double quotes except the first and last will be part of the string. Commented Jun 26, 2012 at 11:08
  • Do you have any control over the format of the input or is it unchangeable? Commented Jun 26, 2012 at 11:28

2 Answers


You should escape the internal " characters. Otherwise, you could find the positions of the first and last " characters and cut the string at those positions, treating only those two quotes as delimiters.
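A minimal Java sketch of the second suggestion, assuming each input holds exactly one quoted value and that only the first and last " are delimiters (the rule described in the comments on the question). `OuterQuotes` and `stripOuterQuotes` are hypothetical names, not part of the question's code:

```java
public class OuterQuotes {

    // Hypothetical helper: treats only the first and last '"' in the input as
    // delimiters; any quotes in between are kept as part of the value.
    static String stripOuterQuotes(String input) {
        int first = input.indexOf('"');
        int last = input.lastIndexOf('"');
        if (first < 0 || first == last) {
            // No quotes at all, or only one: there is no complete quoted value.
            throw new IllegalArgumentException("No closing quote in: " + input);
        }
        return input.substring(first + 1, last);
    }

    public static void main(String[] args) {
        System.out.println(stripOuterQuotes("\"program\""));        // prints program
        System.out.println(stripOuterQuotes("\"program\"123\"\"")); // prints program"123"
    }
}
```

This only works when an input holds a single value. With several quoted tokens on one line, the first and last quotes no longer bracket one value, which is why escaping is the more robust option.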

Whenever you embed one encoding (all possible strings) inside another (quoted strings), there are only a few basic techniques that let you parse them unambiguously:

  1. Disallow certain inputs. For instance, don't allow quote characters. Now you know they are always delimiters. In your case, you could choose a delimiter other than the quote and disallow that character in your input. This is rarely desirable, because you often end up wanting to allow the input you previously thought you didn't need.

  2. Include the length of the input in the encoding. For example, instead of quotes you could precede each string with the number of characters in it.

  3. Escaping. Some inputs cannot be represented directly. Instead, at least one character is reserved as the escape character, indicating that whatever follows it should be interpreted differently. In Java string literals, the backslash is the escape character. If you only need the escape character for a single purpose, you may want to follow the example of SQL and double it: in SQL, the single quote delimits string literals, so to include a literal single quote in a string, you type two of them ('').
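A sketch of technique 3 in Java, using the SQL-style doubled quote as the escape: inside a quoted token, "" stands for one literal ". It assumes whitespace-separated tokens and throws IllegalArgumentException for an unterminated quote. `DoubledQuoteTokenizer` and `tokenize` are hypothetical names, not the question's PVector/ExceptionOpenQuotedString API:

```java
import java.util.ArrayList;
import java.util.List;

public class DoubledQuoteTokenizer {

    // Hypothetical tokenizer: splits on whitespace; inside a quoted token,
    // a doubled quote ("") is read as one literal '"'.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // currently between an opening and closing quote
        boolean inToken = false;  // a token has started (even an empty "" one)
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < input.length() && input.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote: "" -> "
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // a lone quote closes the quoted section
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) { // whitespace outside quotes ends the token
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
            i++;
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted string: " + input);
        }
        if (inToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The value program"123" is written as "program""123""" in this encoding.
        System.out.println(tokenize("\"program\"\"123\"\"\" next")); // prints [program"123", next]
    }
}
```

With this convention, the input itself says which quotes are delimiters, so the tokenizer never has to guess.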
