Extracting text within parenthesis in a QGIS attribute table

Question

I have a shapefile of data for Japan, and I need to extract the English names from my "name" field in order to label the layer. Much of the data is entirely in Japanese, but some of the names have an alternate English name inside a set of parentheses. I'd like to extract this text to another field and use it for labeling. I'm using QGIS.

Example:

I need to turn

野川サイクリング道路 (Nogawa Cycling Road)

Into

Nogawa Cycling Road

I'm attempting to use the regexp_substr function in the Field Calculator:

regexp_substr( "name", '\((.*?)\)')

When I try to run this, I get an error:

An error occured while evaluating the calculation string: Invalid regular expression '?(.*?)?': bad repetition syntax.

Oddly, in testing, I tried this same regex in PostGIS and it worked great. Does anyone know of a regex that will do this task in the QGIS field calculator?

The expression provided is a valid regular expression and does work. The error message is caused by the expression not being formatted correctly for QGIS. For the QGIS regexp functions, including regexp_substr, the QGIS documentation states: "Backslash characters must be double escaped." Properly escaping the backslashes allows the original expression to work in QGIS: regexp_substr( "name", '\\((.*?)\\)')
– bixb0012, Commented Jan 27, 2023 at 14:02
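
The double-escaping rule in that comment can be illustrated outside QGIS. What follows is a minimal Python sketch, not QGIS code. Python's ordinary (non-raw) string literals also consume one layer of backslashes before the regex engine sees the pattern, which is assumed here to be analogous to how the QGIS expression parser treats quoted strings. The variable names are illustrative only.

```python
import re

# What is typed inside an ordinary (non-raw) string literal: doubled backslashes.
typed = '\\((.*?)\\)'

# What the regex engine receives after the literal is parsed: single backslashes.
seen_by_engine = r'\((.*?)\)'
same_pattern = (typed == seen_by_engine)

name = '野川サイクリング道路 (Nogawa Cycling Road)'
match = re.search(typed, name)

# Group 1 is the text between the escaped parentheses.
extracted = match.group(1) if match else ''
print(same_pattern, extracted)
```

If only a single backslash is typed in a context that strips one layer, the engine receives bare parentheses, which are grouping syntax rather than literal characters.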

Steven Kay · Accepted Answer · 2015-10-12 20:38:23Z

9

I tried this, and it seems to work:

regexp_substr( "name", '[(](.*)?[)]')

You need to use greedy matching (hence the ?), but also need to put square brackets around the round brackets.
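
To check the pattern before running the Field Calculator on a whole layer, the same regular expression can be tried in plain Python. This is only a sketch: the helper name extract_parenthetical is hypothetical, and returning an empty string for rows without parentheses is an assumption about the behaviour you want, not a statement of what QGIS returns.

```python
import re

# Same pattern as in the expression above: a character class for each
# literal parenthesis, so no backslashes are needed.
PATTERN = re.compile(r'[(](.*)?[)]')

def extract_parenthetical(name):
    """Return the text inside the parentheses, or '' when there is none."""
    if name is None:  # stands in for a NULL attribute value
        return ''
    match = PATTERN.search(name)
    # Defensive check: an optional group can in principle be unset.
    if match is None or match.group(1) is None:
        return ''
    return match.group(1)

print(extract_parenthetical('野川サイクリング道路 (Nogawa Cycling Road)'))
```

Rows that hold only a Japanese name simply yield an empty string here, so they can be filtered out when labeling.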


You can also escape the parens with \\( and \\) - I thought it might only need one backslash, and so be shorter than [)] but you need two (there's some deeper string interpolation going on...)
– Spacedman, Commented Oct 12, 2015 at 21:56

+1, don't think I've seen that many brackets in a single statement!
– Joseph, Commented Oct 13, 2015 at 10:32

Thank you all! regexp_substr( "name", '[(](.*)?[)]') did the trick!
– Christy Heaton, Commented Oct 14, 2015 at 15:16

A question mark by itself (not combined with another quantifier) is greedy not because it is a question mark but because all quantifiers are greedy by default. Adding a question mark to another quantifier makes the original quantifier lazy.
– bixb0012, Commented Jan 27, 2023 at 14:01
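
The difference between greedy and lazy quantifiers only shows up when a value contains more than one pair of parentheses. The following Python sketch uses a made-up test string (not from the original data) to compare the three forms that appear in this thread.

```python
import re

# Hypothetical value with two parenthesized parts.
text = 'A (one) B (two)'

# .* is greedy by default: it runs to the LAST closing parenthesis.
greedy = re.search(r'\((.*)\)', text).group(1)

# .*? is lazy: it stops at the FIRST closing parenthesis.
lazy = re.search(r'\((.*?)\)', text).group(1)

# (.*)? only makes the group optional; the .* inside is still greedy.
optional_group = re.search(r'[(](.*)?[)]', text).group(1)

print(greedy, lazy, optional_group, sep=' | ')
```

For names with a single parenthesized part, as in the question's example, all three forms return the same text.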


Babel · Accepted Answer · 2023-01-29 21:19:30Z

2

Use this regular expression with the regexp_replace() function to remove all characters that are not in the Latin alphabet (including non-letter characters such as brackets), except for whitespace:

trim (regexp_replace (name,'([^\\p{Latin}|\\s])',''))

Explanation:

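\\p{Latin} matches characters in the Latin script, see https://www.regular-expressions.info/unicode.html
\\s matches any whitespace character
| is intended as a logical OR; inside square brackets it is actually a literal pipe character, and because a character class already matches any one of its members, it can be omitted
^ as the first character inside square brackets [] negates the class, so it matches everything that is not listed (Latin characters or whitespace)
trim() is optional and deletes whitespace at the beginning and end of the output.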
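
The same idea can be sketched in Python to see what the replacement leaves behind. Python's built-in re module does not support \p{Latin}, so this sketch swaps in a different technique: it checks whether the Unicode character name contains "LATIN", which is only an approximation of the Latin script property. The function names are hypothetical.

```python
import re
import unicodedata

def is_latin_letter(ch):
    # Approximation of \p{Latin}: Latin-script letters have "LATIN" in
    # their Unicode character name (e.g. "LATIN CAPITAL LETTER N").
    return 'LATIN' in unicodedata.name(ch, '')

def keep_latin(text):
    # Keep Latin letters and whitespace, drop everything else, then trim.
    kept = ''.join(ch for ch in text if ch.isspace() or is_latin_letter(ch))
    return kept.strip()

print(keep_latin('野川サイクリング道路 (Nogawa Cycling Road)'))

# Inside a character class, | is a literal pipe and is therefore kept.
pipe_demo = re.sub(r'[^a-z|\s]', '', 'a|b c!')
print(pipe_demo)
```

Unlike the parenthesis-based extraction, this approach also keeps any Latin letters that sit outside the parentheses, so which one fits depends on the data.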

edited Jan 29, 2023 at 21:19

answered Jan 27, 2023 at 13:26

Can't we separate only the other languages into a separate column?
– Bruno B, Commented Feb 28, 2023 at 5:21

Languages? Like separating Paris (French) from London (English)? No way. How should the software recognize the language of place names?
– Babel, Commented Feb 28, 2023 at 6:02
