I have these 3 example strings:

x <- "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer(0.989)More Information | Similar Motifs Found"
y <- "NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer(0.828)More Information | Similar Motifs Found"
z <- "SPIB/MA0081.1/Jaspar(0.753)More Information | Similar Motifs Found"

I want to remove everything that comes after the first word following the last / delimiter, resulting in:

AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer
SPIB/MA0081.1/Jaspar

I tried this but it doesn't give what I want:

> sub("\\(.*?\\)More Information | Similar Motifs Found","",x)
[1] "AP-1| Similar Motifs Found"

What's the right way to do it?

1 Answer


You can use the greedy pattern (.*/\\w+).* to match up to the word that follows the last /, then keep only the captured group with the backreference \\1:

v <- c(
  "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer(0.989)More Information | Similar Motifs Found",
  "NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer(0.828)More Information | Similar Motifs Found",
  "SPIB/MA0081.1/Jaspar(0.753)More Information | Similar Motifs Found"
)

sub("(.*/\\w+).*", "\\1", v)
# [1] "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer"          "NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer"
# [3] "SPIB/MA0081.1/Jaspar" 

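In (.*/\\w+).*, the first .* is greedy and matches as many characters as possible, backtracking only as far as needed for it to be followed by a / and a word (matched by \\w+). Because \\w does not match (, the captured group ends at Homer or Jaspar. The second .* matches the rest of the string, so replacing the whole match with \\1 leaves only the captured group.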
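
To make the role of greediness concrete, here is a small sketch that contrasts the greedy pattern with a lazy variant on the first string from the question. The helper name trim_after_last_word and the hyphenated input are hypothetical, introduced only for illustration. The lazy variant uses perl = TRUE so that .*? follows PCRE semantics.

```r
# Hypothetical helper (not from the answer): wraps the same sub() call.
trim_after_last_word <- function(s) sub("(.*/\\w+).*", "\\1", s)

x <- "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer(0.989)More Information | Similar Motifs Found"

# Greedy: the first .* extends to the LAST "/" that is followed by a word.
greedy <- trim_after_last_word(x)
print(greedy)
# [1] "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer"

# Lazy: .*? stops at the FIRST "/" followed by a word, which is too early.
lazy <- sub("(.*?/\\w+).*", "\\1", x, perl = TRUE)
print(lazy)
# [1] "AP-1(bZIP)/ThioMac"

# Assumption for illustration: a made-up input whose last name contains "-".
# \\w+ stops at the hyphen, so the name is cut short.
hyphenated <- "A/B/Foo-Bar(0.5)More Information"
cut_short <- trim_after_last_word(hyphenated)
print(cut_short)
# [1] "A/B/Foo"

# One possible variant, assuming the name is always followed by "(":
# capture everything after the last "/" up to the next "(".
kept_whole <- sub("(.*/[^(]+).*", "\\1", hyphenated)
print(kept_whole)
# [1] "A/B/Foo-Bar"
```

The [^(]+ variant is only worth considering if the final name can contain characters outside letters, digits, and underscores. For the three strings in the question, \\w+ is sufficient.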
