Remove part of multiline string after pattern in R

Question

I am trying to remove everything from ROW FORMAT SERDE onward using the gsub function, but it does not work. Any suggestions?

x <- c("CREATE TABLE `cld_ml_bi_eng.iris`(", "  `sepal_length` double, ", 
  "  `sepal_width` double, ", "  `petal_length` double, ", "  `petal_width` double, ", 
  "  `species` string)", "ROW FORMAT SERDE ", "  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' ", 
  "STORED AS INPUTFORMAT ", "  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' ", 
  "OUTPUTFORMAT ", "  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'", 
  "LOCATION", "  'hdfs://haprod/warehouse/tablespace/managed/hive/cld_ml_bi_eng.db/iris'", 
  "TBLPROPERTIES (", "  'bucketing_version'='2', ", "  'transactional'='true', ", 
  "  'transactional_properties'='default', ", "  'transient_lastDdlTime'='1636686825')")

Here I use gsub

gsub(pattern = "(ROW FORMAT SERDE).*", replacement = "\\1", x = x)

My expected output

c("CREATE TABLE `cld_ml_bi_eng.iris`(", "  `sepal_length` double, ", 
  "  `sepal_width` double, ", "  `petal_length` double, ", "  `petal_width` double, ", 
  "  `species` string)")

Each of your lines is a separate object - you either need to paste it together first for gsub to work, or just select the chunk - head(x, grep("ROW FORMAT SERDE\\s+", x)-1)
– thelatemail, Commented Nov 12, 2021 at 3:35
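
To illustrate the comment above, here is a minimal sketch on a shortened, made-up vector (x_demo is a hypothetical stand-in for the x in the question, not the original data). It shows that gsub works on each element separately, so only the element containing the pattern changes, while head with grep drops that element and everything after it.

```r
# Hypothetical, shortened stand-in for the DDL vector in the question
x_demo <- c("CREATE TABLE t(", "  `a` double)", "ROW FORMAT SERDE ", "  'some.Serde' ")

# gsub() is applied to each element independently: ".*" can only reach the
# end of the element that contains the match, so the vector keeps its length
res <- gsub("(ROW FORMAT SERDE).*", "\\1", x_demo)

# The approach from the comment: keep only the elements before the match
kept <- head(x_demo, grep("ROW FORMAT SERDE\\s+", x_demo) - 1)
```

Here res still has four elements, with only the third one trimmed, whereas kept holds just the first two.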

Tim Biegeleisen · Accepted Answer · 2021-11-12 03:41:54Z


One approach would be to use grep to find the index of the string in your input vector which starts with the text ROW FORMAT SERDE. Then, subset the input vector and paste into a single string:

paste0(x[1:(grep("^ROW FORMAT SERDE", x)-1)], collapse="")

[1] "CREATE TABLE `cld_ml_bi_eng.iris`(  `sepal_length` double,   `sepal_width` double,   `petal_length` double,   `petal_width` double,   `species` string)"
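
Note that the result above is a single string, while the expected output in the question is a character vector. A rough sketch of a variant that keeps the vector and also handles the case where the pattern is not found might look like the following. The helper keep_before and the sample vector ddl are hypothetical names introduced for illustration and are not part of the answer.

```r
# Hypothetical helper: return the elements that come before the first match
keep_before <- function(lines, pattern) {
  idx <- grep(pattern, lines)
  # No match: 1:(integer(0) - 1) would fail, so return the input unchanged
  if (length(idx) == 0) return(lines)
  head(lines, idx[1] - 1)
}

# Hypothetical, shortened stand-in for the DDL vector in the question
ddl <- c("CREATE TABLE t(", "  `a` double)", "ROW FORMAT SERDE ", "  'some.Serde' ")

out_vec <- keep_before(ddl, "^ROW FORMAT SERDE")  # character vector of kept lines
out_str <- paste0(out_vec, collapse = "")         # single string, as in the answer
```

Using head instead of 1:(n-1) also avoids a surprise when the match is the very first element, since 1:0 would count backwards rather than select nothing.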
