Is there a common algorithm to cut URLs out of a string?

For example:

 string1 = "bla bla bla http://bla.domain.com more blah blah nohttp.domain.with.no.protocol more text bla bla"
 (string2, urls) = wild_magic_appears(string1)
 string2 = "bla bla bla  more blah blah  more text bla bla"
 urls = ["http://bla.domain.com", "nohttp.domain.with.no.protocol"]

I know that a regex is the best solution for this, but I'm interested in a non-regex solution.

1
  • 2
    You could split the string into words (split at ` `) and consider each word separately. How wild the magic will be depends on what you want to match, e.g. the simplest requirement would be "any word starting with http://, https:// or containing more than one dot". Commented Dec 17, 2013 at 8:22
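The word-by-word idea from this comment can be sketched in Python as follows. This is only a minimal sketch under the comment's simplest rule (a word counts as a URL if it starts with http:// or https://, or contains more than one dot); the names `cut_urls` and `looks_like_url` are made up for illustration and are not from the original post.

```python
def looks_like_url(word):
    # Simplest rule from the comment: a known scheme prefix,
    # or more than one dot anywhere in the word.
    return word.startswith(("http://", "https://")) or word.count(".") > 1


def cut_urls(text):
    # Split on single spaces so the original spacing can be rebuilt exactly.
    kept = []
    urls = []
    for word in text.split(" "):
        if looks_like_url(word):
            urls.append(word)
            kept.append("")  # leave an empty slot where the URL was
        else:
            kept.append(word)
    return " ".join(kept), urls


string1 = ("bla bla bla http://bla.domain.com more blah blah "
           "nohttp.domain.with.no.protocol more text bla bla")
string2, urls = cut_urls(string1)
print(string2)  # bla bla bla  more blah blah  more text bla bla
print(urls)     # ['http://bla.domain.com', 'nohttp.domain.with.no.protocol']
```

Note that the "more than one dot" rule is crude: a word such as `1.2.3` or an ellipsis `...` would also be treated as a URL, so a stricter test is probably needed for real user text.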

3 Answers

1

In C# you can do this for URLs that start with "http://":

using System.Collections.Generic;

string str1 = "bla bla bla http://bla.domain.com more blah blah nohttp.domain.with.no.protocol";
string[] array = str1.Split(' '); // split the text into words
List<string> urls = new List<string>();

foreach (var s in array)
{
    if (s.StartsWith("http://")) // you can add other conditions that match a URL here
        urls.Add(s);
}

1 Comment

Pretty simple. For anyone searching for a solution to this question, I suggest detecting URLs by protocol names, dots, and a list of top-level domains (as I did).
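The commenter's own code is not shown, so the following Python sketch is only one guess at how such a check might look. The `SCHEMES` and `TLDS` lists are deliberately tiny, hypothetical samples, and `looks_like_url` is an illustrative name; a real checker would presumably load a complete list of top-level domains.

```python
# Hypothetical sample lists, not a complete set of schemes or TLDs.
SCHEMES = ("http://", "https://", "ftp://")
TLDS = {"com", "org", "net", "io"}


def looks_like_url(word):
    # Drop trailing sentence punctuation and compare case-insensitively.
    candidate = word.rstrip(".,;:!?)").lower()
    if candidate.startswith(SCHEMES):
        return True
    # No scheme: look at the host part (everything before the first slash).
    host = candidate.split("/", 1)[0]
    labels = host.split(".")
    # Need at least "name.tld", no empty labels, and a known top-level domain.
    return len(labels) >= 2 and all(labels) and labels[-1] in TLDS


print(looks_like_url("http://bla.domain.com"))           # True
print(looks_like_url("example.org/path"))                # True
print(looks_like_url("nohttp.domain.with.no.protocol"))  # False
```

With a TLD list, the question's `nohttp.domain.with.no.protocol` is rejected because `protocol` is not in the sample list, which is the trade-off of this stricter rule compared with simply counting dots.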
0

In Ruby, split on colons and spaces.

This works only for URLs that start with http://, and only if the surrounding text contains no colon.

>a = "bla bla bla http://bla.domain.com more blah blah nohttp.domain.with.no.protocol more text bla bla"
>a.split(":")[0].to_s[-4..-1] + ":" + a.split(":")[1].split()[0].to_s
=> "http://bla.domain.com"

For URLs that are marked only by dots, I can't think of a good solution.

1 Comment

This is quite a narrow solution. It does not work well for user text containing ':'.
0

I thought of a new solution: just split on "http://" or "https://". This one copes better with colons in the user's text.

>a = "bla bla bla http://bla.domain.com more blah blah nohttp.domain.with.no.protocol more text bla bla"
>("http://"+a.split("http://")[1].to_s).split()[0]
=>"http://bla.domain.com"
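The Ruby line above handles a single http:// URL. As a rough sketch of the same idea extended to both schemes and to several URLs in one string, the Python function below searches for each scheme with `str.find` and reads up to the next whitespace. The name `urls_with_scheme` is hypothetical, and the assumption that a URL ends at the first whitespace character is mine, not the answer's.

```python
SCHEMES = ("http://", "https://")


def urls_with_scheme(text):
    urls = []
    pos = 0
    while True:
        # Earliest occurrence of any scheme at or after pos.
        starts = [i for i in (text.find(s, pos) for s in SCHEMES) if i != -1]
        if not starts:
            return urls
        start = min(starts)
        # Assume the URL runs until the next whitespace (or end of text).
        end = start
        while end < len(text) and not text[end].isspace():
            end += 1
        urls.append(text[start:end])
        pos = end


print(urls_with_scheme("note: see http://bla.domain.com and https://y.org/z now"))
# ['http://bla.domain.com', 'https://y.org/z']
```

Because the search looks for the full scheme prefix rather than a bare colon, a stray ':' elsewhere in the text does not affect the result.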
