Having a dataframe like this:

df_in <- data.frame(x = c('x1','x2','x3','x4'),
                     col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                     col2 = c('https://google.com', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
                     col3 = c('http://www.bbcnews.com?id=321', 'http://google.com?id=1234','NA','https://bbcnews.com/search'),
                     col4 = c('NA', 'https://www.youtube/com','NA', 'www.youtube.com/searcht'))

Example of dataframe input as printed in the console:

 x                          col1                           col2                          col3                    col4
1 x1  http://youtube.com/something             https://google.com http://www.bbcnews.com?id=321                      NA
2 x2                            NA http://www.bbcnews2.com?id=321     http://google.com?id=1234 https://www.youtube/com
3 x3  https://www.yahooexample.com                             NA                            NA                      NA
4 x4 https://www.yahooexample2.com        https://google.com/text    https://bbcnews.com/search www.youtube.com/searcht

I would like to create a dataframe from a subset that meets specific conditions. For example, I would like to keep only the values that contain "google", "youtube" or "bbc" in their string. Example of expected output:

df_out <- data.frame(x = c('x1','x2','x4'),
                     col1new = c('http://youtube.com/something', 'http://www.bbcnews2.com?id=321', 'https://google.com/text'),
                     col2new = c('https://google.com', 'http://google.com?id=1234', 'https://bbcnews.com/search'),
                     col3new = c('http://www.bbcnews.com?id=321', 'https://www.youtube/com', 'www.youtube.com/searcht'))

Example of dataframe output as printed in the console:

 x                        col1new                    col2new                       col3new
1 x1   http://youtube.com/something         https://google.com http://www.bbcnews.com?id=321
2 x2 http://www.bbcnews2.com?id=321  http://google.com?id=1234       https://www.youtube/com
3 x4        https://google.com/text https://bbcnews.com/search       www.youtube.com/searcht
  • What did you try? Commented Feb 12, 2018 at 9:09
  • What happens when your search term is in your condition? For example, bbc in youtube? https://www.youtube.com/results?search_query=bbc Commented Feb 12, 2018 at 9:15
  • You may need i1 <- Reduce('|', lapply(df_in[-1], grepl, pattern= "google|youtube|bbc")); cbind(df_in[i1, 1, drop = FALSE], t(apply(df_in[i1,-1], 1, function(x) x[grepl("google|youtube|bbc", x)]))) Commented Feb 12, 2018 at 9:45
  • What should be the output for : df_in <- data.frame(x = c('x1','x2','x3','x4' ,'x5'), col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com', 'www.youtube.com/searcht'), col2 = c('https://google.com', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text', 'www.youtube.com/searcht'), col3 = c('http://www.bbcnews.com?id=321', 'http://google.com?id=1234','NA','https://bbcnews.com/search', 'www.youtube.com/searcht'), col4 = c('NA', 'https://www.youtube/com','NA', 'www.youtube.com/searcht', 'www.youtube.com/searcht') ) ? Commented Feb 12, 2018 at 10:23

1 Answer

We could create a logical index with grepl to filter the rows, keeping those in which at least one element has one of the patterns immediately after http:// or https://

i1 <- Reduce('|', lapply(df_in[-1], grepl, pattern= "https?://(google|youtube|bbc)"))
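To see what this index contains, here is a minimal self-contained sketch that rebuilds df_in from the question and splits the step in two. The intermediate name hits and the stringsAsFactors = FALSE argument are assumptions added for illustration; they are not part of the original answer.

```r
# Input data from the question; 'NA' entries are plain strings here
df_in <- data.frame(
  x = c('x1', 'x2', 'x3', 'x4'),
  col1 = c('http://youtube.com/something', 'NA', 'https://www.yahooexample.com', 'https://www.yahooexample2.com'),
  col2 = c('https://google.com', 'http://www.bbcnews2.com?id=321', 'NA', 'https://google.com/text'),
  col3 = c('http://www.bbcnews.com?id=321', 'http://google.com?id=1234', 'NA', 'https://bbcnews.com/search'),
  col4 = c('NA', 'https://www.youtube/com', 'NA', 'www.youtube.com/searcht'),
  stringsAsFactors = FALSE
)

# One logical vector per URL column: TRUE where the pattern matches
hits <- lapply(df_in[-1], grepl, pattern = "https?://(google|youtube|bbc)")

# Combine the columns element-wise with OR: TRUE if any column matched in that row
i1 <- Reduce('|', hits)
i1  # rows x1, x2 and x4 are TRUE; x3 (only yahoo and 'NA') is FALSE
```

Note that this strict pattern skips URLs with a www. prefix, so a row is kept only if at least one of its URLs has the term directly after the scheme.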

Then, loop through the rows of the subset data and get the links that match google/youtube/bbc

tmp <- t(apply(df_in[i1,-1], 1, function(x) x[grepl("(google|youtube|bbc)", x)]))
colnames(tmp) <- paste0('col', seq_len(ncol(tmp)), "new")

and cbind it with the first column of the subset

cbind(df_in[i1, 1, drop = FALSE], tmp)
#   x                        col1new                    col2new                       col3new
#1 x1   http://youtube.com/something         https://google.com http://www.bbcnews.com?id=321
#2 x2 http://www.bbcnews2.com?id=321  http://google.com?id=1234       https://www.youtube/com
#4 x4        https://google.com/text https://bbcnews.com/search       www.youtube.com/searcht
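The apply() step above returns a rectangular matrix only when every kept row has the same number of matching links, which is the case for the question's data. For input like the five-row example raised in the comments, where rows have different match counts, one possible sketch is to collect the matches per row and pad shorter rows with NA. The helper name keep_matching_urls, its arguments, and the choice of NA padding are hypothetical, and the loose pattern "google|youtube|bbc" is used for both filtering and extraction, so it also matches a term anywhere in the URL (for example in a query string).

```r
# Hypothetical helper: keep rows with at least one matching URL and
# pad rows that have fewer matches with NA
keep_matching_urls <- function(df, pattern, id_col = 1) {
  urls <- as.matrix(df[-id_col])  # character matrix, also for factor columns
  # One character vector of matching URLs per row
  matches <- lapply(seq_len(nrow(urls)), function(i) {
    row <- unname(urls[i, ])
    row[grepl(pattern, row)]
  })
  keep <- lengths(matches) > 0
  if (!any(keep)) return(df[0, id_col, drop = FALSE])
  matches <- matches[keep]
  width <- max(lengths(matches))
  # Pad each row to the same width so the rows form a rectangular matrix
  padded <- lapply(matches, function(m) c(m, rep(NA_character_, width - length(m))))
  mat <- matrix(unlist(padded), ncol = width, byrow = TRUE)
  colnames(mat) <- paste0('col', seq_len(width), 'new')
  data.frame(df[keep, id_col, drop = FALSE], mat, stringsAsFactors = FALSE)
}

# Five-row input from the comments: row x5 has four matching links
df_in <- data.frame(
  x = c('x1', 'x2', 'x3', 'x4', 'x5'),
  col1 = c('http://youtube.com/something', 'NA', 'https://www.yahooexample.com', 'https://www.yahooexample2.com', 'www.youtube.com/searcht'),
  col2 = c('https://google.com', 'http://www.bbcnews2.com?id=321', 'NA', 'https://google.com/text', 'www.youtube.com/searcht'),
  col3 = c('http://www.bbcnews.com?id=321', 'http://google.com?id=1234', 'NA', 'https://bbcnews.com/search', 'www.youtube.com/searcht'),
  col4 = c('NA', 'https://www.youtube/com', 'NA', 'www.youtube.com/searcht', 'www.youtube.com/searcht'),
  stringsAsFactors = FALSE
)

out <- keep_matching_urls(df_in, "google|youtube|bbc")
out
```

With this input the result has four rows (x1, x2, x4, x5) and four link columns; rows with only three matches get NA in col4new.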