3

I have a character vector

words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

I'm trying to remove span and any punctuation from every word in the vector, so that I end up with:

> something thank great to hear your

The thing is, there's no rule for whether span appears before or after the word I'm interested in. Also, span can be glued to: (i) characters only (e.g. yourspan), (ii) punctuation only (e.g. ..span?), or (iii) characters and punctuation (e.g. somethingspan.).

I searched SO for an answer, but I mostly see requests to remove whole words (like here) or parts of a string before/after a letter or punctuation mark (like here).

Any help will be appreciated.

  • Please share the code that fails. Commented Dec 14, 2017 at 10:12
  • @A5C1D2H2I1M1N2O1R2T1 That gsub("span", "", words) will only remove span, but will keep the . in somethingspan.. The question is unclear. Commented Dec 14, 2017 at 10:17
  • gsub("span[[:punct:]]*", "", words) Commented Dec 14, 2017 at 10:18
  • @AvinashRaj combination of the two. Commented Dec 14, 2017 at 10:19
  • Try paste(gsub("[[:punct:]]*span[[:punct:]]*", "", words), collapse=" ") Commented Dec 14, 2017 at 10:32

3 Answers

2

You may use

[[:punct:]]*span[[:punct:]]*

See the regex demo.

Details

  • [[:punct:]]* - zero or more punctuation characters
  • span - the literal substring span
  • [[:punct:]]* - zero or more punctuation characters

R Demo:

words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
words <- gsub("[[:punct:]]*span[[:punct:]]*", "", words) # Remove spans
words <- words[words != ""] # Discard empty elements
paste(words, collapse=" ")  # Concat the elements
## => [1] "something thank great to hear your"

If whitespace-only elements remain after removing the unwanted strings, replace the second step with words <- words[trimws(words) != ""] (instead of words[words != ""]).
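A minimal sketch of that whitespace case, assuming a hypothetical input in which span is surrounded by spaces (the element " span " is not in the original vector and is only there for illustration):

```r
# Hypothetical input: the second element is "span" padded with spaces
words2 <- c("somethingspan.", " span ", "great to hear")

# Same pattern as above: span plus any adjacent punctuation
cleaned <- gsub("[[:punct:]]*span[[:punct:]]*", "", words2)

# Comparing against "" keeps the whitespace-only element
kept_naive <- cleaned[cleaned != ""]

# Trimming before the comparison discards it as well
kept_strict <- cleaned[trimws(cleaned) != ""]

print(length(kept_naive))   # the whitespace-only element is still there
print(kept_strict)
```

Only the filter condition changes; the elements that survive are not themselves trimmed.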

2

You can test regular expressions at https://regex101.com/.

# remove the literal substring "span" from every element
clean_words <- gsub(pattern = "span", replacement = "", words, perl = TRUE)
# if you want the sentence, join the elements with single spaces
sentence <- paste(clean_words, collapse = " ")

# to remove punctuation: this regex keeps only ASCII letters (a-z, A-Z) and spaces
clean_sentence <- gsub(pattern = "[^a-zA-Z ]", replacement = "", sentence, perl = TRUE)
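One thing to watch for with this approach: an element such as "..span?" becomes empty, so the joined sentence ends up with two consecutive spaces. A possible follow-up step, sketched here with the question's vector (the name tidy_sentence is made up for illustration), collapses runs of spaces and trims the ends:

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

# Steps from the answer above
clean_words <- gsub("span", "", words)
sentence <- paste(clean_words, collapse = " ")
clean_sentence <- gsub("[^a-zA-Z ]", "", sentence)

# Extra step (an assumption, not part of the answer):
# squeeze repeated spaces into one and trim both ends
tidy_sentence <- trimws(gsub(" +", " ", clean_sentence))

print(clean_sentence)  # contains a double space where "..span?" used to be
print(tidy_sentence)
```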

0

Use sub to remove span. To turn the result into a sentence, use paste with collapse.

library(magrittr)

sub("^[[:punct:]]{,2}span|span[[:punct:]]{,2}$", "", words) %>% paste(collapse = " ")

This only removes a span at the beginning or at the end of each element, together with up to two punctuation characters on the outer side.

Output

[1] "something ? thank great to hear your"
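The stray ? in that output comes from "..span?": each alternative strips punctuation on only one side of span. The sketch below shows the per-element result next to a variant that allows punctuation on both sides while staying anchored. The variant and the names one_sided and two_sided are assumptions for illustration, and {0,2} is written out in place of {,2}:

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

# Pattern from the answer: punctuation is removed on one side of span only
one_sided <- sub("^[[:punct:]]{0,2}span|span[[:punct:]]{0,2}$", "", words)

# Hypothetical variant: punctuation allowed on both sides, still anchored
two_sided <- sub("^[[:punct:]]*span[[:punct:]]*|[[:punct:]]*span[[:punct:]]*$", "", words)

print(one_sided)  # the second element is left as "?"
print(two_sided)  # the second element becomes an empty string

# Drop empty elements before building the sentence
paste(two_sided[two_sided != ""], collapse = " ")
```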

6 Comments

"^span|span$" will not handle "somethingspan.", there is a . at the end. See OP: it can be followed by characters, punctuation, combination of the two, etc.. So, even [[:punct:]]? before $ won't help. The question is unclear.
Andre, the question is too unclear, but have a look at "it can be followed by characters, punctuation, combination of the two, etc." Just [[:punct:]]? won't help.
@Wiktor, what's unclear about the question? I'll clarify it
Yes, it's unclear. I guess the code provided by all of us should lead @Kasia to her goal.
@Kasia, please include ALL possibilities that can occur in your rep. code.