
I have a list of Strings in Scala. Each String has a key/value format like this:

<row Id="25780063" PostTypeId="2" ParentId="25774527" CreationDate="2014-09-11T05:56:29.900" />

Each String may have some extra key/value pairs. I'd like to extract the values of a few keys from each String. Here are the patterns I've defined, but they are not working properly:

val idPattern = "Id=(.*)".r
val typePattern = "PostTypeId=(.*)".r

How can I correctly extract the value for 'Id' and 'PostTypeId'?

  • The scaladoc for scala.util.matching.Regex is pretty good. Commented Feb 21, 2016 at 3:58

2 Answers


Making the regex unanchored tells it to find a match anywhere in the input instead of requiring the whole input to match.

scala> val id = """Id="([^"]*)"""".r.unanchored
id: scala.util.matching.UnanchoredRegex = Id="([^"]*)"

scala> """stuff Id="something" more""" match { case id(x) => x }
res7: String = something

scala> id.findFirstIn("""stuff Id="something" more""")
res8: Option[String] = Some(Id="something")
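
As a rough sketch of how this could apply to the row format from the question: the object name RowAttributes, the helper attr, and the \b word boundary are assumptions added here for illustration, not part of the answer above. The boundary keeps Id= from matching inside PostTypeId= or ParentId=, and the case _ branch covers rows that lack the attribute.

```scala
import scala.util.matching.Regex

// Hypothetical helper object, named here only for illustration.
object RowAttributes {
  // \b is an added assumption: it stops Id= from matching inside PostTypeId= or ParentId=.
  val IdAttr: Regex = """\bId="([^"]*)"""".r.unanchored
  val TypeAttr: Regex = """\bPostTypeId="([^"]*)"""".r.unanchored

  // Returns the captured attribute value, or None when the row has no such attribute.
  def attr(pattern: Regex, row: String): Option[String] = row match {
    case pattern(value) => Some(value)
    case _              => None
  }

  // The sample row from the question.
  val sample: String =
    """<row Id="25780063" PostTypeId="2" ParentId="25774527" CreationDate="2014-09-11T05:56:29.900" />"""
}
```

Calling RowAttributes.attr(RowAttributes.IdAttr, RowAttributes.sample) should give Some("25780063"), and a row without an Id attribute gives None rather than a MatchError.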

5 Comments

Thanks. Could you tell me what 'unanchored' means here?
What if the string does not match the pattern? Currently it returns an error message. What should I do?
In regex, the ^ and $ that match the start and end of the input are called anchors. Normally a pattern match requires the whole input to match, so the regex behaves as though it were anchored; unanchored means the pattern can match anywhere in the input. To supply a default case in a Scala match, use case _ =>.
If I have two different patterns and would like to return only the rows/records that match both, how can I do that? I mean, if a row does not match a pattern, skip it.
Use rows collect { case r(x) => x } to extract the values, or filter to keep only matching rows: rows filter { case r(_*) => true case _ => false }.
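
One possible way to keep only the rows that match both patterns, sketched with collect as suggested in the comment above. The names RowFilter and idsAndTypes, the \b word boundary, and the trick of pairing each row with itself are assumptions made for this illustration.

```scala
// Hypothetical helper object, named here only for illustration.
object RowFilter {
  // \b is an added assumption so that Id= does not match inside PostTypeId=.
  private val IdAttr = """\bId="([^"]*)"""".r.unanchored
  private val TypeAttr = """\bPostTypeId="([^"]*)"""".r.unanchored

  // Pairs each row with itself so one case can apply both extractors.
  // collect drops every row where either pattern fails to match.
  def idsAndTypes(rows: List[String]): List[(String, String)] =
    rows.map(row => (row, row)).collect {
      case (IdAttr(id), TypeAttr(postType)) => (id, postType)
    }
}
```

Rows that miss either attribute are skipped silently, so no default case is needed.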

First, define the regexes as valid stable identifiers.

val IdPattern = "Id=(.*)".r
val TypePattern = "PostTypeId=(.*)".r

Note the initial uppercase, required for pattern matching (or use backquotes if you really want it lowercased).

Then,

aString match {
  case IdPattern(group) => println(s"id=$group")
  case TypePattern(group) => println(s"type=$group")
}

2 Comments

Your comment about uppercase is not true for extractor patterns.
I don't think my regex is correct, because the PostTypeId pattern selects everything to the end of the string.
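
To illustrate the greediness mentioned in the last comment, here is a small sketch comparing the question's (.*) with the quoted, negated character class used in the other answer. The object name GreedyVsBounded and the shortened sample row are assumptions made for the example.

```scala
// Hypothetical object, named here only for illustration.
object GreedyVsBounded {
  // Shortened version of the question's sample row.
  val row: String = """<row Id="25780063" PostTypeId="2" ParentId="25774527" />"""

  // Greedy: (.*) keeps going to the end of the line.
  val greedy = "PostTypeId=(.*)".r
  // Bounded: [^"]* cannot cross the closing quote of the value.
  val bounded = """PostTypeId="([^"]*)"""".r

  // findFirstMatchIn searches anywhere in the input, so no unanchored is needed here.
  val greedyValue: Option[String] = greedy.findFirstMatchIn(row).map(_.group(1))
  val boundedValue: Option[String] = bounded.findFirstMatchIn(row).map(_.group(1))
}
```

Here greedyValue captures the quoted 2 plus the rest of the row, while boundedValue is Some("2").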
