
I have a large string and I am looking for the best way to replace a certain piece of text in it without changing any text inside a URL.

For instance, let's say I want to replace the word "google" with "hello". The string contains multiple instances of "google" as well as a URL, "https://www.google.com" (this is just an example). What is the best route for replacing these values: splitting the string, a regex, or a plain Replace?

At the moment I have something like this:

var data = "<h1>google this is a sample text</h1><p> more text will go here so, google. <a href='https://google.com'> Link here </a>";
// Start from the original so that both replacements accumulate in test.
var test = data;
if(test.Contains("google")){
   test = test.Replace("google", "hello");
}
// for case sensitivity
if(test.Contains("Google")){
   test = test.Replace("Google", "hello");
}

Is there a better alternative to this, and is there a way to avoid replacing the text inside a URL?

  • Regex would probably do the trick here. Commented Apr 13, 2021 at 10:06
  • What does your input string look like? Are there spaces or any other separators between the "words" and the URLs? Commented Apr 13, 2021 at 10:06
  • Could you give an example of the different kinds of input strings? Commented Apr 13, 2021 at 10:07
  • I'll add more detail about the input string now, but effectively it will look like HTML. Commented Apr 13, 2021 at 10:08
  • Are you always going to be parsing HTML? If so, I recommend using something like HtmlAgilityPack: use this Q&A to get the text of the body, then use this Q&A to replace the string you want, ignoring case. Commented Apr 13, 2021 at 10:18

1 Answer


In your particular case, I would first try some basic splitting, provided that the 'a' tag is always used, and only used, to insert the URLs:

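    // Replaces substringsToReplace[k] with newStrings[k] everywhere except inside the
    // opening <a ...> tag, so href values stay untouched.
    // Assumes anchors are well formed, not nested, and written as "<a" ... "</a>".
    // Note that "<a" also matches tags such as <abbr>, which would break the reassembly.
    // Requires: using System;
    private string ReplaceNonUrl_Split(string bigString, string[] substringsToReplace, string[] newStrings)
    {
        if (substringsToReplace.Length != newStrings.Length)
            throw new ArgumentException("Both arrays must have the same length.");

        // Splitting on both separators yields: text, anchor, text, anchor, ...
        string[] parts = bigString.Split(new string[] { "<a", "</a>" }, StringSplitOptions.None);

        for (int i = 0; i < parts.Length; i++)
        {
            if (parts[i].Contains("href="))
            {
                // subParts[0] holds the tag attributes (including the URL), so skip it.
                string[] subParts = parts[i].Split(new string[] { ">" }, StringSplitOptions.None);
                for (int j = 1; j < subParts.Length; j++)
                {
                    for (int k = 0; k < newStrings.Length; k++)
                        subParts[j] = subParts[j].Replace(substringsToReplace[k], newStrings[k]);
                }

                parts[i] = string.Join(">", subParts);
            }
            else
            {
                // Plain text outside any link: replace everywhere.
                for (int k = 0; k < newStrings.Length; k++)
                    parts[i] = parts[i].Replace(substringsToReplace[k], newStrings[k]);
            }
        }

        // Reassemble, alternating the two separators that Split removed.
        string replacedString = parts[0];
        bool startingUrl = true;
        for (int i = 1; i < parts.Length; i++)
        {
            if (startingUrl)
                replacedString += "<a" + parts[i];
            else
                replacedString += "</a>" + parts[i];

            startingUrl = !startingUrl;
        }

        return replacedString;
    }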

Then call:

   string replacedString = ReplaceNonUrl_Split(data, new string[] { "google", "Google" }, new string[] { "hello", "Hello" });

DISCLAIMER: This is just a very manual option. There are surely existing libraries that do this more cleanly and efficiently, so I recommend first looking at existing HTML parsers that might fit your needs.
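As a middle ground between the manual split above and a full HTML parser, the regex route suggested in the comments can be sketched as follows. This is only a sketch, not a robust HTML solution: the helper name ReplaceOutsideUrls is made up for illustration, and it assumes that anything between "<" and ">" is a tag and that bare URLs in the text start with http:// or https://. The pattern matches tags and URLs first and returns them unchanged, so only the remaining occurrences of the search word are replaced, ignoring case.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the answer above): replaces `search` with
// `replacement`, ignoring case, but leaves HTML tags and URL-like tokens alone.
static string ReplaceOutsideUrls(string html, string search, string replacement)
{
    // Group 1 captures what must be preserved: a whole tag (<...>) or a bare
    // http(s) URL in the text. The other alternative is the word to replace.
    string pattern = @"(<[^>]*>|https?://[^\s<>'""]+)|" + Regex.Escape(search);

    return Regex.Replace(
        html,
        pattern,
        // If group 1 matched, return the match unchanged; otherwise substitute.
        m => m.Groups[1].Success ? m.Value : replacement,
        RegexOptions.IgnoreCase);
}

string sample = "<h1>google this is a sample text</h1><p> more text will go here so, " +
                "Google. <a href='https://google.com'> Link here </a>";
Console.WriteLine(ReplaceOutsideUrls(sample, "google", "hello"));
```

With the sample string, both "google" and "Google" in the text become "hello" while the href value stays as it is. Because tag attributes are skipped entirely, this also protects things like class names or ids; if you need different replacements per casing ("hello" vs "Hello"), call the helper once per variant without relying on IgnoreCase, or inspect m.Value in the evaluator.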
