  • 4
    Nice, but some English words truly contain trailing punctuation. For example, the trailing dots in e.g. and Mrs., and the trailing apostrophe in the possessive frogs' (as in frogs' legs) are part of the word, but will be stripped by this algorithm. Handling abbreviations correctly can be roughly achieved by detecting dot-separated initialisms plus using a dictionary of special cases (like Mr., Mrs.). Distinguishing possessive apostrophes from single quotes is dramatically harder, since it requires parsing the grammar of the sentence in which the word is contained. Commented Jan 29, 2016 at 0:02
  • 2
    @MarkAmery You're right. It's also since occurred to me that some punctuation marks—such as the em dash—can separate words without spaces. Commented Sep 30, 2016 at 8:57
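
To illustrate the abbreviation handling suggested in the first comment, here is a minimal Python sketch. It is not the answer's algorithm: the function name `strip_trailing_punctuation`, the `KNOWN_ABBREVIATIONS` set, and the `INITIALISM` pattern are all hypothetical names made up for this example, and the special-case list is deliberately tiny. It keeps one trailing dot when the token looks like a dot-separated initialism or appears in the special-case list, and it makes no attempt to tell possessive apostrophes from closing single quotes.

```python
import re
import string

# Hypothetical special-case list; a real one would need to be far longer.
KNOWN_ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "etc."}

# Two or more single letters, each followed by a dot: "e.g.", "i.e.", "U.S.A."
INITIALISM = re.compile(r"(?:[A-Za-z]\.){2,}")


def strip_trailing_punctuation(token):
    """Strip trailing ASCII punctuation, keeping the dot of an abbreviation."""
    stripped = token.rstrip(string.punctuation)
    removed = token[len(stripped):]
    if removed.startswith("."):
        # Put one dot back and check whether the result is an abbreviation.
        candidate = stripped + "."
        if candidate.lower() in KNOWN_ABBREVIATIONS or INITIALISM.fullmatch(candidate):
            return candidate
    return stripped


for token in ["e.g.,", "Mrs.", "U.S.A.", "word.", "hello!", "frogs'"]:
    print(repr(token), "->", repr(strip_trailing_punctuation(token)))
```

The possessive `frogs'` still loses its apostrophe here, which is the harder case the comment describes. Words joined by an em dash with no surrounding spaces are also out of scope, since this sketch only looks at the end of a single token.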