8 events
when what by license comment
Oct 31 at 15:22 comment added Ben Voigt Another piece of evidence in favor of "cannot be done correctly without using an actual HTML parser" is what the code currently does when there are multiple, non-contiguous script or style tags.
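To illustrate the point about non-contiguous script and style tags, here is a minimal sketch using the standard library's html.parser. It is not the answer's code (which is not shown in this timeline); the names VisibleTextExtractor and visible_text are hypothetical, and the sketch assumes the goal is to keep only text outside script and style elements.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside <script> or <style> (hypothetical helper)."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    # Join the chunks with spaces and normalise runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())
```

Because the parser tracks each open and close tag as it goes, any number of script or style blocks is skipped, wherever they appear in the document.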
Oct 30 at 17:38 history edited Booboo CC BY-SA 4.0
deleted 2 characters in body
Oct 30 at 9:00 comment added Stef Although I don't know how much this actually saves: it still cuts the text into n+1 strings, and the last string, which we don't need, will presumably be very long. I don't know whether it is copied or whether Python is smart enough to reuse the same underlying char array as the original text.
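On the copying question: as far as I know, CPython strings do not share a buffer with their substrings, so the long remainder would be a fresh copy. One way to sidestep the issue is to scan lazily and stop after n words. The sketch below is an alternative technique, not the answer's code; first_words_lazy is a hypothetical name, and it assumes that matching runs of non-whitespace with \S+ is an acceptable stand-in for str.split() on ordinary text.

```python
import re
from itertools import islice


def first_words_lazy(text: str, n: int) -> list[str]:
    # re.finditer yields matches one at a time; islice stops after n of them,
    # so the rest of the text is neither scanned nor copied.
    # Only the n words that are kept get lowercased.
    return [m.group().lower() for m in islice(re.finditer(r"\S+", text), n)]
```

This also avoids lowercasing the whole text up front, at the cost of a regular-expression scan over the first n words.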
Oct 29 at 14:57 history edited Booboo CC BY-SA 4.0
More efficient splitting of the text.
Oct 29 at 14:54 comment added Booboo @Stef Good point! Thanks.
Oct 29 at 13:08 comment added Stef Regarding test_fingerprint: words = text.lower().split()[:n] can be replaced with words = text.lower().split(maxsplit=n)[:n] to avoid splitting the whole text when only the first few words are wanted.
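A small sketch of the suggested change, wrapped in a hypothetical helper named first_words (the surrounding test_fingerprint code is not shown in this timeline, so the wrapper is only for illustration):

```python
def first_words(text: str, n: int) -> list[str]:
    # maxsplit=n performs at most n splits, so the result has at most n + 1
    # items; the last one is the unsplit remainder, which [:n] discards.
    return text.lower().split(maxsplit=n)[:n]
```

The result should match text.lower().split()[:n]. Note that text.lower() still processes the whole text, so the saving is in the splitting only.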
Oct 29 at 11:55 history edited Booboo CC BY-SA 4.0
added 112 characters in body
Oct 29 at 11:49 history answered Booboo CC BY-SA 4.0