7

I have a shapefile of data for Japan, and I need to extract the English names from my "name" field in order to label the layer. Most of the data is in Japanese only, but some of the names have an alternate English name inside a set of parentheses. I'd like to extract this text to another field and use it for labeling. I'm using QGIS.

Example:

I need to turn

野川サイクリング道路 (Nogawa Cycling Road)  

Into

Nogawa Cycling Road

I'm attempting to use the regexp_substr function in the Field Calculator:

regexp_substr( "name", '\((.*?)\)')

When I try to run this, I get an error:

An error occured while evaluating the calculation string: Invalid regular expression '?(.*?)?': bad repetition syntax.

Oddly, in testing, I tried this same regex in PostGIS and it worked great. Does anyone know of a regex that will do this task in the QGIS field calculator?

1
  • The expression provided is a valid regular expression and actually works. The error message is caused by not formatting the expression correctly for QGIS. For QGIS regexp functions, including regexp_substr, the QGIS Documentation states: "Backslash characters must be double escaped." Properly escaping the backslashes allows the original expression to work in QGIS: regexp_substr( "name", '\\((.*?)\\)') Commented Jan 27, 2023 at 14:02
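To illustrate the double-escaping rule quoted in the comment above, here is a minimal Python sketch that turns an ordinary regex into the text you would paste into the Field Calculator. The helper name `qgis_regexp_substr` is hypothetical (it is not part of QGIS or PyQGIS), and it assumes the pattern contains no single quotes.

```python
def qgis_regexp_substr(field: str, pattern: str) -> str:
    """Build a regexp_substr() expression string for the QGIS Field Calculator.

    Hypothetical helper: doubles every backslash, as the QGIS documentation
    requires, and assumes the pattern contains no single quotes.
    """
    escaped = pattern.replace("\\", "\\\\")  # one backslash becomes two
    return f"regexp_substr(\"{field}\", '{escaped}')"


# The regex from the question, written the usual way with single backslashes.
expr = qgis_regexp_substr("name", r"\((.*?)\)")
print(expr)
```

The printed text is the same expression given in the comment, with `\\(` and `\\)` in place of `\(` and `\)`.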

2 Answers 2

9

I tried this, and it seems to work:

regexp_substr( "name", '[(](.*)?[)]')

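The .* is greedy by default; the ? after the group only makes the group optional and can be left out. The important part is to put square brackets around the round brackets, so that they are matched as literal characters.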
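As a rough check outside QGIS, the same pattern can be tried in Python. This is only a sketch: Python's `re` module is not the engine QGIS uses, but character classes, capture groups, and greedy versus lazy quantifiers behave the same way for these simple patterns. The second sample string is made up for illustration.

```python
import re

name = "野川サイクリング道路 (Nogawa Cycling Road)"

# Pattern from the answer: parentheses made literal with character classes.
bracket_class = re.search(r"[(](.*)?[)]", name).group(1)
# Same idea with backslash-escaped parentheses and a lazy quantifier.
escaped_lazy = re.search(r"\((.*?)\)", name).group(1)

print(bracket_class)  # Nogawa Cycling Road
print(escaped_lazy)   # Nogawa Cycling Road

# Greedy and lazy only differ when there is more than one closing parenthesis.
two_groups = "A (first) B (second)"  # made-up example value
greedy = re.search(r"\((.*)\)", two_groups).group(1)
lazy = re.search(r"\((.*?)\)", two_groups).group(1)

print(greedy)  # first) B (second
print(lazy)    # first
```

For names that contain a single pair of parentheses, as in the question, either form returns the same text.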

4
  • 1
    You can also escape the parens with \\( and \\) - I thought it might only need one backslash, and so be shorter than [)] but you need two (there's some deeper string interpolation going on....) Commented Oct 12, 2015 at 21:56
  • +1, don't think I've seen that many brackets in a single statement! Commented Oct 13, 2015 at 10:32
  • Thank you all! regexp_substr( "name", '[(](.*)?[)]') did the trick! Commented Oct 14, 2015 at 15:16
  • A question mark by itself (not combined with another quantifier) is greedy not because it is a question mark but because all quantifiers are greedy by default. Adding a question mark to another quantifier makes the original quantifier lazy. Commented Jan 27, 2023 at 14:01
2

Use this regular expression with the regexp_replace() function to remove every character that is neither a Latin letter nor whitespace (this also removes non-letter characters such as brackets):

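trim(regexp_replace("name", '[^\\p{Latin}\\s]', ''))

Explanation:

  • \\p{Latin} matches Latin-script characters, see https://www.regular-expressions.info/unicode.html
  • \\s matches any whitespace character
  • inside square brackets [] the listed items are already alternatives, so no | is needed (a | placed there would be treated as a literal pipe character and kept in the output)
  • ^ at the start of the square brackets [] negates the class, so it matches everything except Latin characters and whitespace
  • trim() is optional and removes whitespace at the beginning and end of the output.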
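The filtering idea above can be sketched in Python for testing on sample values. Python's built-in `re` module has no `\p{Latin}` class, so this sketch swaps in a different technique: it keeps a character if its Unicode name contains "LATIN". That is only an approximation of the Latin script property, and the function name `keep_latin_and_spaces` is hypothetical.

```python
import unicodedata


def keep_latin_and_spaces(text: str) -> str:
    """Drop every character that is neither whitespace nor (roughly) Latin.

    Approximation: a character counts as Latin when its Unicode name
    contains the word "LATIN"; this is not identical to \\p{Latin}.
    """
    kept = [
        ch for ch in text
        if ch.isspace() or "LATIN" in unicodedata.name(ch, "")
    ]
    # Equivalent of trim(): remove leading and trailing whitespace.
    return "".join(kept).strip()


result = keep_latin_and_spaces("野川サイクリング道路 (Nogawa Cycling Road)")
print(result)  # Nogawa Cycling Road
```

Unlike the parenthesis-based approach, this also returns Latin text that is not inside parentheses, and it returns an empty string for names that are entirely in Japanese.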


2
  • Can't we separate only the other languages into a separate column? Commented Feb 28, 2023 at 5:21
  • Languages? Like separating Paris (French) from London (English)? No way; how should the software recognize the language of place names? Commented Feb 28, 2023 at 6:02
