Variable Expression Library Peter Simons
simons@cryp.to
Introduction

Purpose of this library

&purpose;

Downloading the latest version

The latest version of libvarexp available for download is varexp-1.2.tar.gz.

Supported Expressions

Variable Expressions

&expressions;

Operations on Variables

&operations;

Quoted Pairs

&quoted-pairs;

Arrays of Variables

&arrays;

Looping

&looping;

The Complete EBNF Grammar

&ebnf;

The varexp::expand Function

The heart of libvarexp is the varexp::expand function, which is defined as follows:

    void varexp::expand(std::string const& input,
                        std::string& result,
                        varexp::callback_t& lookup,
                        varexp::config_t* config = 0);

The parameters have the following meaning. input is a reference to the input buffer in which variable expressions should be expanded. result is a reference to the target buffer in which the expanded result will be stored. The contents of result will be overwritten by varexp::expand. It is legal to provide the same string instance for both input and result if the original template is no longer required after the expansion.

The lookup parameter is a reference to a user-supplied object that serves as the lookup callback for accessing the variables' contents. The class of such a callback must be derived from varexp::callback_t. More details on this topic can be found in "Writing Lookup Callbacks" below.

The last parameter, config, can be used to change the lexical tokens of the parser's grammar. If you omit this parameter, and thus pass 0 through the default value, the default configuration will be used. The default configuration is what has been used in the examples throughout this manual; changing it will hardly ever be necessary. If you do want to change it, for example to disable looping or to use variables of the form $(NAME) rather than ${NAME}, please refer to "Configuring the Parser" for a detailed discussion.

In case of success, varexp::expand simply returns; otherwise, one of the exceptions listed in "Exceptions Thrown by libvarexp" is thrown.
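The calling convention can be tried out without the library at hand. The following is only a sketch under stated assumptions: the namespace toy, its cut-down callback_t (scalar lookups only), the map_lookup class, and the toy expand itself are all hypothetical stand-ins, not libvarexp code. The toy version understands nothing but $NAME and ${NAME}, and it reports errors with standard exceptions rather than the varexp ones. What it does show is the shape of a call: input is read, result is overwritten, the callback is asked once per variable, and the same string may be passed as both input and result.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

namespace toy   // hypothetical stand-in, NOT the varexp namespace
{
    // Cut-down callback interface: scalar lookups only.
    struct callback_t
    {
        virtual ~callback_t() {}
        virtual void operator()(const std::string& name, std::string& data) = 0;
    };

    // Expands $NAME and ${NAME} only; no operations, arrays, loops or escapes.
    void expand(const std::string& input, std::string& result, callback_t& lookup)
    {
        std::string out;   // built separately, so input and result may be the same string
        std::string::size_type i = 0;
        while (i < input.size())
        {
            if (input[i] != '$') { out += input[i++]; continue; }
            ++i;                                            // skip '$'
            const bool braced = (i < input.size() && input[i] == '{');
            if (braced) ++i;                                // skip '{'
            const std::string::size_type start = i;
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) || input[i] == '_'))
                ++i;
            if (i == start || (braced && (i == input.size() || input[i] != '}')))
                throw std::runtime_error("incomplete variable specification");
            const std::string name(input, start, i - start);
            if (braced) ++i;                                // skip '}'
            std::string value;
            lookup(name, value);                            // may throw for unknown names
            out += value;
        }
        assert(i == input.size());                          // the whole input was consumed
        result.swap(out);                                   // overwrite result only on success
    }

    // Callback backed by a std::map; unknown names raise std::out_of_range.
    struct map_lookup : public callback_t
    {
        std::map<std::string, std::string> vars;
        virtual void operator()(const std::string& name, std::string& data)
        {
            std::map<std::string, std::string>::const_iterator it = vars.find(name);
            if (it == vars.end())
                throw std::out_of_range("undefined variable: " + name);
            data = it->second;
        }
    };
}
```

The toy builds its output in a temporary string and swaps it into result at the very end, which is one simple way to make in-place expansion safe and to leave result untouched when an exception is thrown. Whether the library works the same way internally is not stated in this manual.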
Writing Lookup Callbacks

libvarexp's header file, varexp.hh, defines the abstract base class varexp::callback_t, which serves as an interface to user-supplied variable-lookup callbacks. The class is defined like this:

    class varexp::callback_t
    {
        virtual void operator()(const std::string& name, std::string& data);
        virtual void operator()(const std::string& name, int idx, std::string& data);
    };

The first operator() is called by varexp::expand to resolve a normal variable such as $NAME. The parameter name will contain the name NAME in this case, and data is a reference to a string in which the callback function should place the variable's contents. The second operator() is called to resolve an array variable, such as ${NAME[i]}. The two parameters name and data have the same meaning in this case, but an additional parameter, idx, is provided, which will contain the index i.

Either callback function may throw any exception it sees fit in case of an error, but there are two exceptions that have a special meaning: varexp::undefined_variable should be thrown by either function in case the requested variable is not defined, and the array version of the callback should throw varexp::array_lookups_are_unsupported when it has been called but should not have been.

Throwing varexp::undefined_variable in case of an undefined variable is very important, because in some cases this exception will be caught by varexp::expand (for example, during the looping construct) and changes the course of action in the routine. Any other exception thrown by these callbacks will leave varexp::expand and abort processing. Make sure your application catches them!

Sometimes it is useful to be able to determine the size of an array in the template. libvarexp does not provide any construct that would do that, mostly because most of the array's behavior is implementation-defined anyway, but a good convention is to have the array callback return the size of the array in case a negative index is looked up.
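The size convention just described might be implemented along the following lines. This is a minimal sketch, not code from the library: the namespace sketch holds stand-ins for varexp::callback_t and varexp::undefined_variable so that the example compiles on its own, and array_lookup is a hypothetical name. Two choices are assumptions of the sketch rather than requirements of libvarexp: a plain $NAME yields the first element of the array, and an index past the end is reported as an undefined variable.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-ins mirroring the declarations quoted above. With the library
// installed, include varexp.hh and use the varexp:: types instead.
namespace sketch
{
    struct undefined_variable : public std::runtime_error
    {
        undefined_variable() : std::runtime_error("undefined variable") {}
    };

    struct callback_t
    {
        virtual ~callback_t() {}
        virtual void operator()(const std::string& name, std::string& data) = 0;
        virtual void operator()(const std::string& name, int idx, std::string& data) = 0;
    };
}

// Hypothetical callback that serves arrays of strings from a std::map.
struct array_lookup : public sketch::callback_t
{
    typedef std::vector<std::string> array_t;
    std::map<std::string, array_t> arrays;

    virtual void operator()(const std::string& name, std::string& data)
    {
        // Assumption of this sketch: $NAME means the first element.
        (*this)(name, 0, data);
    }

    virtual void operator()(const std::string& name, int idx, std::string& data)
    {
        std::map<std::string, array_t>::const_iterator it = arrays.find(name);
        if (it == arrays.end())
            throw sketch::undefined_variable();
        const array_t& a = it->second;

        if (idx < 0)
        {
            // Convention from the text: a negative index yields the array size.
            std::ostringstream os;
            os << a.size();
            data = os.str();
            return;
        }
        assert(idx >= 0);   // the cast below is safe from here on
        if (static_cast<std::size_t>(idx) >= a.size())
            throw sketch::undefined_variable();   // assumption: past the end means undefined
        data = a[static_cast<std::size_t>(idx)];
    }
};
```

Because the size comes back through the same data string as any other value, a template can use it wherever a variable's contents are accepted.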
In order to illustrate how to write a callback of your own, here is a short example callback that returns variables from the Unix environment. The source code has been taken from the test program regression-tests/expand3.cc, so you might want to look there for further examples of more complex callbacks.

    #include <cstdlib>
    #include <string>
    #include "varexp.hh"
    using namespace std;
    using namespace varexp;

    struct env_lookup : public callback_t
    {
        // Scalar lookup: fetch the variable from the process environment.
        virtual void operator()(const string& name, string& data)
        {
            const char* p = getenv(name.c_str());
            if (p == NULL)
                throw undefined_variable();
            else
                data = p;
        }

        // The environment has no arrays, so array lookups are rejected.
        virtual void operator()(const string& name, int idx, string& data)
        {
            throw array_lookups_are_unsupported();
        }
    };

Configuring the Parser

One of the parameters passed to varexp::expand is a pointer to a data structure of type varexp::config_t. This structure defines the elementary tokens used by the parser to determine what is a variable expression and what is not. The structure is defined as follows:

    struct varexp::config_t
    {
        char  varinit;
        char  startdelim;
        char  enddelim;
        char  startindex;
        char  endindex;
        char  current_index;
        char  escape;
        char* namechars;
        config_t();
    };

The structure has a default constructor that will initialize the members of the instance to the default values used throughout this documentation:

    varexp::config_t::config_t()
    {
        varinit       = '$';
        startdelim    = '{';
        enddelim      = '}';
        startindex    = '[';
        endindex      = ']';
        current_index = '#';
        escape        = '\\';
        namechars     = "a-zA-Z0-9_";
    }

If you want to use this default configuration, don't bother with a varexp::config_t structure at all; passing 0 to varexp::expand or leaving config out entirely will use exactly this configuration. If you want to parse a syntax different from the default, though, create a local instance of the varexp::config_t class, modify those values, and pass a pointer to the instance into varexp::expand.

The members of the structure have the following meaning:

varinit
This character starts a variable in the input text.

startdelim, enddelim
These characters must be used to delimit a complex variable expression.

startindex, endindex
These characters delimit both an index specification to an array variable and the start and end of the looping construct. You may set these entries to 0 in order to disable array support and looping altogether.

current_index
This character is replaced by the current loop counter in an index specification.

escape
This character escapes a varinit or startindex character in the input text so that varexp::expand interprets it literally and not as a special character.

namechars
This string defines the set of characters that are legal in a variable name. The specification may contain character ranges.

Please note that it is possible to shoot yourself in the foot with an incorrect parser configuration. The namechars entry, for example, must not contain any of the special characters defined above, or the parser will no longer be able to determine the end of a variable expression. varexp::expand runs a set of consistency checks and throws a varexp::invalid_configuration exception in case the configuration is erroneous, but these checks will probably not catch every configuration that doesn't make sense. So be careful when defining your own configuration for the parser.

The varexp::unescape Function

The missing piece in libvarexp is the varexp::unescape function. It expands the quoted pairs described in "Quoted Pairs". Its prototype, as defined in varexp.hh, is:

    void varexp::unescape(std::string const& input,
                          std::string& result,
                          bool unescape_all);

The parameters input and result are references to the input and output buffer, respectively. It is legal to pass the same std::string instance as input and output if the original buffer isn't required anymore.

The third parameter, unescape_all, determines whether varexp::unescape should expand only the known quoted pairs or all quoted pairs. If this parameter is set to false, only the quoted pairs described in "Quoted Pairs" are expanded; all other quoted pairs (the unknown ones) are left untouched. If unescape_all is set to true, though, any quoted pair \a, where a is an arbitrary character, will be expanded to a.

You will need this parameter if you want to combine varexp::unescape with varexp::expand, because an input buffer might contain unknown quoted pairs that have a special meaning to variable constructs! One example is the quoted pair \1, which is used in regular expression search-and-replace. Another example is the string \${Not an variable}. These quoted pairs must be preserved for varexp::expand, so the usual approach for combining varexp::unescape and varexp::expand is to call the functions in the following order:

1. Call varexp::unescape with unescape_all set to false.
2. Call varexp::expand on the resulting buffer.
3. Call varexp::unescape on the resulting buffer with unescape_all set to true.

This approach is illustrated in the example program shown in "Example Program".

varexp::unescape returns normally if no error occurred. If the input buffer contained syntax errors, the appropriate exception, as described in "Exceptions Thrown by libvarexp", will be thrown.

Exceptions Thrown by libvarexp

libvarexp throws various exceptions in case of a syntax error or when required system resources (memory) cannot be reserved. The complete list is found below. In addition to these, libvarexp may throw practically any of the exceptions thrown by the STL's containers.

All of the following exceptions are derived from the abstract base class varexp::error, so by catching this exception, you can catch all of them. The varexp::error exception provides the following interface:

    class varexp::error : public std::runtime_error
    {
        error(std::string const& what_msg);
        virtual const char* what();
        size_t current_position;
    };

As you can see, varexp::error is derived from std::runtime_error.
This inheritance relationship also defines the what member function, which returns a short, clear-text description of the error that caused the actual exception instance to be thrown. In addition to this member function, the member variable current_position is available, which contains the offset in the input buffer that was being parsed when the error occurred.

Here is the complete list of all libvarexp-specific exceptions:

varexp::incomplete_hex
The input buffer ended before a hexadecimal \xaa quoted pair was complete.

varexp::invalid_hex
One of the a characters in an \xaa quoted pair was not a valid hexadecimal character.

varexp::octal_too_large
The first digit of an octal \abb quoted pair was not in the range from 0 to 3.

varexp::invalid_octal
A digit of an octal \abb expression was not in the range from 0 to 7.

varexp::incomplete_octal
The input buffer ended before an octal \abb quoted pair was complete.

varexp::incomplete_grouped_hex
A hexadecimal \x{} expression contained an odd number of characters in the parameter.

varexp::incorrect_class_spec
In a character range specification a-b, the start of the range a was bigger (in terms of the ASCII code) than the end of the range b.

varexp::invalid_configuration
The configuration passed to varexp::expand is inconsistent.

varexp::incomplete_variable_spec
Either the input buffer ended right after a variable initializer token ($) was found, or a complex variable expression was not correctly terminated, meaning that the closing } bracket was missing.

varexp::undefined_variable
This exception is supposed to be thrown by the user-provided callback when an unknown variable is requested.

varexp::input_isnt_text_nor_variable
This exception is thrown in the rather unlikely case that the parser could not process the complete buffer, yet no error occurred. When should this happen? Well, not at all. But since the error is theoretically possible, I defined it.

varexp::unknown_command_char
In an ${NAME:c} expression, c was none of the supported operations.

varexp::malformatted_replace
In an ${NAME:s…} expression, one of the required parameters was missing.

varexp::unknown_replace_flag
An unsupported flag was provided in an ${NAME:s…} expression.

varexp::invalid_regex_in_replace
The regular expression given as the pattern in an ${NAME:s…} expression failed to compile.

varexp::missing_parameter_in_command
The required word parameter was missing in an ${NAME:-word}, ${NAME:+word}, or ${NAME:*word} expression.

varexp::empty_search_string
In an ${NAME:s…} expression, the search parameter was empty.

varexp::missing_start_offset
The start parameter was missing in an ${NAME:ostart,end} expression.

varexp::invalid_offset_delimiter
In an ${NAME:ostart,end} or ${NAME:ostart-end} expression, the delimiter between start and end was neither a comma (,) nor a hyphen (-).

varexp::range_out_of_bounds
The end parameter in an ${NAME:ostart,end} or ${NAME:ostart-end} expression exceeded the actual length of the string.

varexp::offset_out_of_bounds
The start parameter in an ${NAME:ostart,end} or ${NAME:ostart-end} expression exceeded the actual length of the string.

varexp::offset_logic
In an ${NAME:ostart,end} expression, start was larger than end.

varexp::malformatted_transpose
In an ${NAME:y…} expression, one of the required parameters was missing.

varexp::transpose_classes_mismatch
The ochars range does not have the same number of characters as the nchars range in an ${NAME:y…} expression.

varexp::empty_transpose_class
In an ${NAME:y…} expression, either the ochars or the nchars range was empty.

varexp::incorrect_transpose_class_spec
In a character range given in an ${NAME:y…} expression, the start of the range was larger (in terms of the ASCII code) than the end character.

varexp::malformatted_padding
In an ${NAME:p…} expression, one of the required parameters was missing.

varexp::missing_padding_width
The width parameter in an ${NAME:p…} expression was empty.

varexp::empty_padding_fill_string
The fill parameter in an ${NAME:p…} expression was empty.

varexp::unknown_quoted_pair_in_replace
In the replace parameter of an ${NAME:s…} expression, an invalid quoted pair was specified. Only quoted pairs of the form \digit are valid.

varexp::submatch_out_of_range
In the replace parameter of an ${NAME:s…} expression, a submatch was accessed whose number is greater than the number of submatches defined in the search parameter.

varexp::incomplete_quoted_pair
The input buffer ended right after a backslash character.

varexp::array_lookups_are_unsupported
This exception is supposed to be thrown by the user-supplied callback when the array lookup function is called even though arrays should not occur. If you don't intend to support arrays, though, you should disable them via the parser's configuration instead.

varexp::invalid_char_in_index_spec
The index specification of an array variable contained an invalid character, that is, a character that is not part of a num-exp.

varexp::incomplete_index_spec
The input buffer ended in an open variable index specification, meaning that the terminating ] delimiter was missing.

varexp::unclosed_bracket_in_index
An arithmetic group in an index specification was not closed properly with a ) bracket.

varexp::division_by_zero_in_index
A division by zero occurred in an index specification.

varexp::unterminated_loop_construct
The buffer ended in the midst of an open looping construct.

varexp::invalid_char_in_loop_limits
The limits specification of a looping construct contained invalid characters.

Example Program

The following source code may be found in regression-tests/expand5.cc. You might want to check the other test programs there for more complex examples, especially expand6.cc, which also makes use of arrays and loops.

    #include <cstdio>
    #include <cstdlib>
    #include <cerrno>
    #include <cstring>
    #include <stdexcept>
    #include <string>
    #include "../varexp.hh"
    using namespace varexp;
    using namespace std;

    // Lookup callback that resolves variables from the process environment.
    struct env_lookup : public callback_t
    {
        virtual void operator()(const string& name, string& data)
        {
            const char* p = getenv(name.c_str());
            if (p == NULL)
                throw undefined_variable();
            else
                data = p;
        }

        virtual void operator()(const string& name, int idx, string& data)
        {
            throw runtime_error("Not implemented.");
        }
    };

    int main(int argc, char** argv)
    {
        // Template with quoted pairs (\$ and \n) and a nested variable name.
        const char* input =
            "\\$HOME = '${HOME}'\\n"
            "\\$OSTYPE = '${$FOO${BAR}}'\\n"
            "\\$TERM = '${TERM}'\\n";
        // Text expected after expanding and unescaping the template.
        const char* output =
            "$HOME = '/home/regression-tests'\n"
            "$OSTYPE = 'regression-os'\n"
            "$TERM = 'regression-term'\n";
        string tmp;
        env_lookup lookup;

        // Put the environment into a known state.
        if (setenv("HOME", "/home/regression-tests", 1) != 0 ||
            setenv("OSTYPE", "regression-os", 1) != 0 ||
            setenv("TERM", "regression-term", 1) != 0 ||
            setenv("FOO", "OS", 1) != 0 ||
            setenv("BAR", "TYPE", 1) != 0)
        {
            printf("Failed to set the environment: %s.\n", strerror(errno));
            return 1;
        }
        unsetenv("UNDEFINED");

        // Expand the variables first, then resolve all remaining quoted pairs.
        expand(input, tmp, lookup);
        unescape(tmp, tmp, true);

        if (tmp != output)
        {
            printf("The buffer returned by var_expand() "
                   "is not what we expected.\n");
            return 1;
        }

        return 0;
    }

License

&license;