DEV Community

Patrick Wendo


Using regex named capture groups to process lines in a CSV file in Ruby

Imagine you are processing a CSV file with information from weather stations. A single line might look like this:

Hamburg;22.0,Berlin;18.45,Tokyo;11.23,New York;4.20\n

Regular expressions in Ruby support named captures. This means you can give a capture group a name, and the text it matches becomes available under that name. For instance:

/(?<city>[\w\s]+)/

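The ?<city> part names the capture group. The group's pattern, [\w\s]+, matches one or more word or whitespace characters, so multi-word names such as New York are captured whole.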

If we run the code:

"Hamburg;22.0".match(/(?<city>[\w\s]+)/).named_captures

it returns a Ruby Hash that looks like this:

{"city"=>"Hamburg"}
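A match can also be read one name at a time, which is handy once a pattern has more than one named group. The following is a minimal sketch: the READING_PATTERN constant and the first_reading helper are hypothetical names chosen for illustration, and the pattern assumes a city name, a semicolon, then a decimal temperature.

```ruby
# Assumed pattern: city name, semicolon, decimal temperature.
# (?:...) keeps the fractional part from becoming a capture of its own.
READING_PATTERN = /(?<city>[\w\s]+);(?<temp>\d+(?:\.\d+)?)/

# Hypothetical helper: returns the first reading found in the text
# as a Hash, or nil when nothing matches.
def first_reading(text)
  m = READING_PATTERN.match(text)
  return nil unless m

  # MatchData#[] accepts a group name as a Symbol or a String.
  { city: m[:city], temp: m[:temp].to_f }
end

p first_reading("Hamburg;22.0")
```

Because match stops at the first match, this only ever sees one reading per call.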

A pattern can contain multiple named captures. However, match stops at the first match in the string. If we want every match, we use the .scan method instead. It returns an array of arrays, with each inner array holding the captured values of one match. Note that scan returns only the values, not the group names.

For instance:

"Hamburg;22.0,Berlin;18.45,Tokyo;11.23,New York;4.20\n".scan(/(?<city>[\w\s]+);(?<temp>\d+(\.\d+)?)/)

will return

[["Hamburg", "22.0"], ["Berlin", "18.45"], ["Tokyo", "11.23"], ["New York", "4.20"]]
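Since scan drops the group names, one way to get them back is to pair each inner array with Regexp#names. This is only a sketch under that assumption: STATION_PATTERN and named_readings are hypothetical names, not part of the original code.

```ruby
# Same shape of pattern as above, stored in a constant so its names can be reused.
STATION_PATTERN = /(?<city>[\w\s]+);(?<temp>\d+(?:\.\d+)?)/

# Hypothetical helper: turns every match on the line into a Hash
# keyed by the capture group names ("city" and "temp").
def named_readings(line)
  names = STATION_PATTERN.names # => ["city", "temp"]
  line.scan(STATION_PATTERN).map { |values| names.zip(values).to_h }
end

line = "Hamburg;22.0,Berlin;18.45,Tokyo;11.23,New York;4.20\n"
named_readings(line).each do |reading|
  puts "#{reading['city']}: #{reading['temp']}"
end
```

The keys are Strings here because Regexp#names returns Strings; the values stay Strings until you convert them.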

This makes it markedly easier to process the data.
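As a rough illustration of that processing step, the pairs returned by scan can be turned into a city-to-temperature Hash and queried. The names WEATHER_PATTERN and temperatures_by_city are hypothetical, and the sketch assumes each city appears once per line (a repeated city would overwrite the earlier value).

```ruby
# Assumed pattern: city name, semicolon, decimal temperature.
WEATHER_PATTERN = /(?<city>[\w\s]+);(?<temp>\d+(?:\.\d+)?)/

# Hypothetical helper: maps each city on the line to its temperature as a Float.
def temperatures_by_city(line)
  line.scan(WEATHER_PATTERN).to_h { |city, temp| [city, temp.to_f] }
end

temps = temperatures_by_city("Hamburg;22.0,Berlin;18.45,Tokyo;11.23,New York;4.20\n")

# Pick the pair with the highest temperature.
warmest_city, warmest_temp = temps.max_by { |_city, temp| temp }
puts "Warmest: #{warmest_city} (#{warmest_temp})"
```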

Named captures are pretty cool.

Reference:
Regex Named Captures
Rubular (Regex playground for Ruby)
