0

I have a question about extracting letters from a string. For example, I have a vector in R like:

America, Asia, Europe

I want to extract all of the uppercase letters, in a format like:

AAE or A, A, E

How can I do this with regmatches and regexpr?

1
  • If you are sure that there are no capital letters within a word, just use gsub("[a-z]+","",x). Commented Jun 30, 2020 at 17:11

2 Answers

1

A simple gsub that removes every character that is not an uppercase letter:

x <- "America, Asia, Europe"
gsub("[^A-Z]","",x)
[1] "AAE"
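
Since the question asks about regmatches specifically, here is a minimal sketch along those lines. It assumes gregexpr is acceptable in place of regexpr, because regexpr only reports the first match in each string, while gregexpr reports all of them. The variable names caps, collapsed and csv are only illustrative.

```r
x <- "America, Asia, Europe"

# gregexpr() locates every uppercase letter; regmatches() extracts the
# matched substrings as a list with one element per input string.
caps <- regmatches(x, gregexpr("[A-Z]", x))[[1]]
caps
# [1] "A" "A" "E"

# Collapse the matches into either of the two requested formats.
collapsed <- paste(caps, collapse = "")   # "AAE"
csv <- paste(caps, collapse = ", ")       # "A, A, E"
```

Like the gsub above, this keeps every capital letter, so "South America" would contribute both S and A.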

4 Comments

Your suggestion is overly simplified. What happens if a given word happens to have two capital letters, e.g. "South America"? Your suggestion would leave behind SA, which might not be what the OP wants here. Always consider giving a flexible solution.
Is it also possible to check whether, for example, the letter A appears in the new string, e.g. AAE?
@Abcxyz yes, you can use grepl("A", x) to check.
@TimBiegeleisen, the OP asked to extract all capital letters. If that is really what the OP wants, then "South America" would return SA, which is exactly what was asked for.
0

You could use gsub here:

x <- "America, Asia, Europe"
output <- gsub("\\b([A-Z])[a-z]+(?:,\\s*)?", "\\1", x)
output

[1] "AAE"

If you want a CSV string output of capital letters, then consider:

x <- "America, Asia, Europe"
output <- gsub("\\b([A-Z])[a-z]+(?:,\\s*)?", "\\1, ", x)
output <- sub(", $", "", output)
output

[1] "A, A, E"

4 Comments

Thanks a lot! Is it also possible to check whether, for example, the letter A appears in the new string, e.g. AAE?
I don't understand your follow-up question.
For example, I want to check whether the letter A appears in AAE, or whether the letter E appears in AAE.
Use: grepl("A", "AAE", fixed=TRUE)
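
The grepl check above can be extended to several letters at once. This is only a sketch: the vector letters_to_check and the names caps and found are hypothetical, chosen for illustration.

```r
caps <- "AAE"

# Hypothetical set of letters to look for in the extracted string.
letters_to_check <- c("A", "E", "S")

# fixed = TRUE treats each letter as a literal string, not a regex.
# vapply() returns a logical vector named after the letters checked.
found <- vapply(
  letters_to_check,
  function(l) grepl(l, caps, fixed = TRUE),
  logical(1)
)
found
#     A     E     S
#  TRUE  TRUE FALSE
```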
