1

I would like to extract a part of the string. Here is an example dataset.

df <- data.frame(id = c(1,2),
                 string = c('<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value>',
                            '<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value>'))

> df
  id                                                                       string
1  1 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value>
2  2 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value>

I would like to extract ETC_CHOICE_2 and ETC_CHOICE_4 from the long string. My desired output would be:

> df
  id                                                                       string      extract
1  1 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value> ETC_CHOICE_2
2  2 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value> ETC_CHOICE_4

Does anyone have any idea?

Thanks!

2 Answers

1

One option is to parse the strings with htmlParse from the XML package and pull out the value nodes with an XPath query:

library(XML)
library(dplyr)
df %>% 
  mutate(extract = htmlParse(string) %>%
                    getNodeSet("//value") %>%
                    xmlValue)

Output:

#  id                                                                       string      extract
#1  1 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value> ETC_CHOICE_2
#2  2 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value> ETC_CHOICE_4
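
If some rows might not contain a <value> node, parsing each string separately keeps the result aligned with the rows. The following is only a sketch under that assumption; extract_value_node is a hypothetical helper name, not part of the XML package, and it returns NA when no node is found.

```r
library(XML)

# Same example data as in the question
df <- data.frame(id = c(1, 2),
                 string = c('<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value>',
                            '<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value>'),
                 stringsAsFactors = FALSE)

# Hypothetical helper: parse one string and return the text of its first
# <value> node, or NA if the string has no such node
extract_value_node <- function(s) {
  doc <- htmlParse(s, asText = TRUE)
  nodes <- getNodeSet(doc, "//value")
  if (length(nodes) == 0) NA_character_ else xmlValue(nodes[[1]])
}

# One parse per row, so the result always has one entry per row
df$extract <- vapply(df$string, extract_value_node, character(1), USE.NAMES = FALSE)
```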

1

You can use regex to extract everything between <value> and </value>.

df$extract <- sub('.*<value>(.*)</value>', '\\1', df$string)
df

#  id                                                                       string      extract
#1  1 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value> ETC_CHOICE_2
#2  2 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value> ETC_CHOICE_4
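
Note that sub returns its input unchanged when the pattern does not match, so a row without a <value> element would keep the whole original string. A possible variant that returns NA in that case is sketched below; extract_value is a hypothetical helper name, and the lazy quantifier is an assumption about wanting only the first closing tag after the last <value>.

```r
# Same example data as in the question
df <- data.frame(id = c(1, 2),
                 string = c('<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value>',
                            '<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value>'),
                 stringsAsFactors = FALSE)

# Hypothetical helper: return the text between <value> and </value>,
# or NA when the string contains no such pair
extract_value <- function(x) {
  # The trailing .* drops anything after </value>;
  # (.*?) stops at the first closing tag after <value>
  pattern <- '.*<value>(.*?)</value>.*'
  ifelse(grepl(pattern, x, perl = TRUE),
         sub(pattern, '\\1', x, perl = TRUE),
         NA_character_)
}

df$extract <- extract_value(df$string)
```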

1 Comment

I appreciate your time. This solution can indeed be used in different ways, since it specifies the location of the part to extract. Thanks!
