Finding all alphanumerical words in a text line using std::string

Preliminaries

I am working on software that controls a CAEN digitizer. The current step is to add the ability to save a configuration to, and load it from, an external source (hereafter the "config file"). I decided that the config file should be a simple text file consisting of lines like this one:

PARAMETER VALUE

For example, this is how the trigger threshold might be set:

CH_THRESHOLD 100

To build a digitizer configuration from a config file, my program should process the file line by line and extract a parameter-value pair from each line where possible.

Some notes

Here are some principles I was following when writing the code:

  • First of all, performance is not the point. The parsing does not need to be fast because it is a one-off operation, i.e. it is performed only once over a relatively long time interval. I mention this mainly because of the find_first_* calls below (I believe it is expensive to search for a character in that const string).

  • Parsing is the first step in loading a config file. It is about syntax, not semantics; that is why at this stage all the values are strings (though semantically they could be numbers).
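To illustrate the split between syntax and semantics, a later stage could turn a value string into a number. The sketch below is only an assumption about how that stage might look; ToUnsigned is a hypothetical helper and is not part of the program in this post.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper for the later, semantic stage: interprets a value
// string as a non-negative decimal integer.
// Returns true and stores the number in 'result' on success;
// returns false and leaves 'result' untouched otherwise.
bool ToUnsigned( const std::string &value, unsigned long &result )
{
    // Reject empty strings and anything that is not purely decimal digits
    if( value.empty() || value.find_first_not_of( "0123456789" ) != std::string::npos )
    {
        return false;
    }

    try
    {
        result = std::stoul( value );
    }
    catch( const std::out_of_range & )
    {
        // The digits do not fit into unsigned long
        return false;
    }
    return true;
}
```

With this, the value "100" from the line CH_THRESHOLD 100 would become the number 100, while a value such as "12a" would be rejected.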

Program

struct ConfigUnit

This structure represents a single parameter-value pair, a so-called config unit.

struct ConfigUnit
{
    std::string parameter;
    std::string value;

    void Print() const
    {
        std::cout << "Parameter: '" << parameter << "'" "\tValue: '" << value << "'" << std::endl;
    }
};

Algorithm

This algorithm is actually slightly more generic than I need, because it searches for all words in a line.

const std::string alphanumeric = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

ConfigUnit GetConfigUnit( const std::string &line )
{
    ConfigUnit unit;

    std::vector< std::string > words;
    //Find all alphanumerical words in the line
    for( size_t startWord = line.find_first_of( alphanumeric ); startWord != std::string::npos; )
    {
        std::size_t endWord = line.find_first_not_of( alphanumeric, startWord );
        words.push_back( line.substr( startWord, (endWord - startWord) ) );
        startWord = line.find_first_of( alphanumeric, endWord );
    }

    if( words.size() == 2 )
    {
        unit.parameter = words.at(0);
        unit.value = words.at(1);
    }

    return unit;
}
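Because the word-finding loop is generic, it can be factored out into its own function. The sketch below does that under a hypothetical name, FindWords, reusing the same character set. One thing it makes easy to check: the underscore is not in the character set, so a line such as CH_THRESHOLD 100 splits into three words (CH, THRESHOLD, 100) rather than two.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same character set as in the algorithm above
const std::string alphanumeric = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Hypothetical helper: returns every maximal run of alphanumeric
// characters in 'line', in order of appearance.
std::vector< std::string > FindWords( const std::string &line )
{
    std::vector< std::string > words;

    std::size_t startWord = line.find_first_of( alphanumeric );
    while( startWord != std::string::npos )
    {
        std::size_t endWord = line.find_first_not_of( alphanumeric, startWord );
        // If the word runs to the end of the line, endWord is npos and
        // substr clamps the count to the remaining characters
        words.push_back( line.substr( startWord, endWord - startWord ) );
        // Searching from npos simply yields npos, which ends the loop
        startWord = line.find_first_of( alphanumeric, endWord );
    }

    return words;
}
```

For the line THRESHOLD 100 this returns the two words THRESHOLD and 100; for a line with no alphanumeric characters it returns an empty vector.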

Test

#include <iostream>
#include <vector>
#include <string>
#include <fstream>


struct ConfigUnit
{
    std::string parameter;
    std::string value;

    void Print() const
    {
        std::cout << "Parameter: '" << parameter << "'" "\tValue: '" << value << "'" << std::endl;
    }
};


const std::string alphanumeric = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

ConfigUnit GetConfigUnit( const std::string &line )
{
    ConfigUnit unit;

    std::vector< std::string > words;
    //Find all alphanumerical words in the line
    for( size_t startWord = line.find_first_of( alphanumeric ); startWord != std::string::npos; )
    {
        std::size_t endWord = line.find_first_not_of( alphanumeric, startWord );
        words.push_back( line.substr( startWord, (endWord - startWord) ) );
        startWord = line.find_first_of( alphanumeric, endWord );
    }

    if( words.size() == 2 )
    {
        unit.parameter = words.at(0);
        unit.value = words.at(1);
    }

    return unit;
}


int main()
{
    std::ifstream file; 
    file.exceptions( std::fstream::failbit | std::fstream::badbit );
    try
    {
        file.open( "dummy.txt" );
        std::string line;
        while( std::getline( file, line ) )
        {
            ConfigUnit unit = GetConfigUnit( line );
            unit.Print();
        }
        file.close();
    }
    catch( const std::ifstream::failure &e )
    {
        if( file.eof() )
        {
            std::cout << "SUCCESS!\n";
        }
        else
        {
            std::cerr << "FAILURE! " << e.what() << std::endl;
        }
    }

    return 0;
}

As always, please critique, suggest, correct.

LRDPRDX