I am working on a DSL for text processing. Its core is searching for regular expressions, combined through a few operators. These searches are embedded in a more procedural program, which allows for selective execution and setting of metadata. Hopefully, a little snippet gives you an idea of what it could look like:
xml_module = import('xml') // various extensions are available to deal with different data formats
// matches when ghi is found at most 5 words away from a match of either abc or def (simplified)
r1 = add_rule('my_rule', [
    distance([
        or([regex('abc'), regex('def')]),
        regex('ghi')
    ])
])
if r1.matched()
    xml_module.add_tag('root/tag', 'my_value')
endif
The syntax of the DSL is loosely inspired by Python. It features statements and expressions and a few selected built-in data types. Functions and methods cannot be defined by the user.
I am unsure, however, about how I should write rules. A rule takes a name and an array of subrules as parameters. Those subrules could be stored in variables so they can be reused. It's a tradeoff between losing readability because every subrule level is stored in a variable, and losing readability because everything is written in place.
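To make the tradeoff concrete, here is a rough Python mock-up of the two styles. It is only a sketch: the classes `Regex`, `Or` and `Distance`, the `positions` method and the `max_words` parameter are hypothetical stand-ins for the DSL builtins, not the actual implementation, and matching is simplified to whole words split on whitespace.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-ins for the DSL builtins (names are assumptions).
@dataclass(frozen=True)
class Regex:
    pattern: str

    def positions(self, words):
        # Indices of the words in which the pattern matches somewhere.
        return {i for i, w in enumerate(words) if re.search(self.pattern, w)}


@dataclass(frozen=True)
class Or:
    subrules: tuple

    def positions(self, words):
        # Union of the positions matched by any subrule.
        found = set()
        for rule in self.subrules:
            found |= rule.positions(words)
        return found


@dataclass(frozen=True)
class Distance:
    left: object
    right: object
    max_words: int = 5  # assumed default, taken from the "5 words" comment

    def positions(self, words):
        # Left-hand positions that have a right-hand match within max_words.
        left = self.left.positions(words)
        right = self.right.positions(words)
        return {i for i in left for j in right if abs(i - j) <= self.max_words}


# Style 1: everything written in place.
inline_rule = Distance(Or((Regex('abc'), Regex('def'))), Regex('ghi'))

# Style 2: each subrule level stored in a named variable for reuse.
abc_or_def = Or((Regex('abc'), Regex('def')))
ghi = Regex('ghi')
named_rule = Distance(abc_or_def, ghi)

near = 'abc one two ghi'.split()
far = 'abc a b c d e f ghi'.split()
print(bool(named_rule.positions(near)), bool(named_rule.positions(far)))
```

Both styles build the same rule tree, so the choice is purely about readability: the named variant gives each level a label and allows `abc_or_def` to be reused in other rules, while the inline variant keeps the whole rule visible in one place.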
Maybe you have some feedback?