1

So I fetch a string from a website using code from another question I posted here. This works really well when I put it into a rich text box, but now I need to split the string into separate sentences in a list/array (I suppose a list will be easier, since you don't need to determine how long the input is going to be).

Yesterday I found the following code at another question (didn't note the question, sorry):

List<string> list = new List<string>(Regex.Split(lyrics, Environment.NewLine));

But the input is now split into only two parts: the first three sentences and the rest.

I retrieve the text from musixmatch.com with the following code (added fixed url for simplicity):

var source = "https://www.musixmatch.com/lyrics/Krewella/Alive";
var htmlWeb = new HtmlWeb();
var documentNode = htmlWeb.Load(source).DocumentNode;

var findclasses = documentNode
    .Descendants("p")
    .Where(d => d.Attributes["class"]?.Value.Contains("mxm-lyrics__content") == true);

var text = string.Join(Environment.NewLine, findclasses.Select(x => x.InnerText));

More information about this code can be found here. In a nutshell, it retrieves the specific HTML that contains the lyrics. I need to split the lyrics line by line for a synchronization process that I'm building (just like the one that was built into Spotify a while ago). I need something (preferably a list/array) that I can index, because that would make the database that stores all this data a bit smaller. What am I supposed to use for this process?

Edit: Answer to the mark of a possible duplicate: C# Splitting retrieved string to list/array

16
  • 1
    I've just looked at the sample, it's gonna be tough as they seem to split it into almost random tags in the HTML. Commented Dec 15, 2016 at 12:59
  • 1
    @MagicLegend The problem is that there are two p tags with the lyrics and you just concatenate them by a new line, so when you split it there are only two parts. You need to split on whatever is breaking the actual lines. Commented Dec 15, 2016 at 13:03
  • 1
    @MagicLegend the example has lyrics split into two divs, with the same <p> mxm-lyrics__content Commented Dec 15, 2016 at 13:03
  • 1
    That's indeed something that I haven't realized myself. You gentlemen are correct in that. Commented Dec 15, 2016 at 13:04
  • 2
    @MagicLegend your text uses \n, not \r\n. On Windows, Environment.NewLine is \r\n. Split by \n instead. Commented Dec 15, 2016 at 13:05

2 Answers

4

You can split by both:

var lines = lyrics.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
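If empty lines need to be preserved (for example, as gaps between verses), splitting on individual characters produces an extra empty entry for every "\r\n" pair. A minimal sketch of one way around that, assuming the lyrics are already in a string: pass string separators with "\r\n" listed first, so a Windows line ending counts as a single separator. The class name LyricsSplitter, the method name SplitLines and the keepEmptyLines parameter are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LyricsSplitter
{
    // Splits on "\r\n", "\n" or "\r". "\r\n" comes first in the separator
    // array so that it is matched as one line break rather than two.
    public static List<string> SplitLines(string lyrics, bool keepEmptyLines)
    {
        var separators = new[] { "\r\n", "\n", "\r" };
        var options = keepEmptyLines
            ? StringSplitOptions.None
            : StringSplitOptions.RemoveEmptyEntries;
        return new List<string>(lyrics.Split(separators, options));
    }
}
```

Calling LyricsSplitter.SplitLines(lyrics, true) keeps blank lines as empty strings in the list, while passing false drops them.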

1 Comment

Sorry, forgot to add a comment here. I tweaked the line a little so that it puts the output into a list, so it became List<string> list = new List<string>(lyrics.Split(new char[] { '\r', '\n' }));. I also needed the empty entries, so I removed that option. Thank you for the help :)
0

What I would do is ensure that there is a single notion of "new line" in the code. Line endings could be \r, \n or \r\n. Since this text uses \n (possibly preceded by \r), simply remove every '\r' by replacing it with "". (Edited this one)

Now, all you have to do is

var lyricLines = lyricsWithCommonNewLine.Split('\n');

13 Comments

Why \r when it's \n that's guaranteed to appear in all cases? Removing \r makes sense, replacing \n with \r though, not so much
@PanagiotisKanavos What would you suggest then?
I did, both here and in the comments. Just split by \n. In general, you can use Replace("\r", ""), or a Regex that matches both cases, or a StringReader. The regex and StringReader will be the fastest.
Environment.NewLine will always return the adequate newline symbol.
@PanagiotisKanavos, you are right, that was just a brain fart. Edited the answer.
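The StringReader and Regex options mentioned in the comments can be sketched roughly as follows. This is only an illustration, not code from either answer; the names LyricsLines, ReadLines and SplitWithRegex are hypothetical. Both variants treat "\r\n", "\r" and "\n" as a line break and keep empty lines.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LyricsLines
{
    // StringReader.ReadLine recognizes "\n", "\r" and "\r\n" as line endings
    // and returns null once the end of the string is reached.
    public static List<string> ReadLines(string lyrics)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(lyrics))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // The alternation tries "\r\n" first so it is consumed as one separator.
    public static List<string> SplitWithRegex(string lyrics)
    {
        return new List<string>(Regex.Split(lyrics, @"\r\n|\r|\n"));
    }
}
```

One difference to be aware of: for input that ends with a line break, ReadLines does not add a trailing empty entry, whereas SplitWithRegex does.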