I can’t believe nobody has done this list yet. I mean, there is one about names, one about time and many others on other topics, but not one about languages yet (except one honorable mention that comes close). So, here’s my attempt to list all the misconceptions and prejudices I’ve come across in the course of my long and illustrious career in software localisation and language technology. Enjoy – and send me your own!

      •  TehPers (@TehPers@beehaw.org) · 9 months ago

        Periods, question marks, and exclamation points are end-of-sentence punctuation, but splitting on them would also split a decimal number such as 8.9 in two.
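To make the 8.9 problem concrete, here is a minimal sketch of the kind of splitter being warned against. The function name `naive_split` and the sample sentence are made up for illustration; only the standard `re` module is used.

```python
import re

def naive_split(text):
    # Treat every '.', '?' or '!' as a sentence boundary
    # (hypothetical helper, not from any library).
    parts = re.split(r"[.?!]", text)
    # Drop empty pieces and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]

pieces = naive_split("The score was 8.9. Not bad!")
print(pieces)  # the decimal point in 8.9 is treated as a sentence end
```

The input has two sentences, but the function returns three pieces because the number is cut at its decimal point.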

        Even if you wrote a magical split function that always split at the ends of sentences, you can quote multiple sentences in the middle of a sentence:

        John told me “Are you crazy? That’s dangerous!” and I reassured him that it would be safe.
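A slightly smarter rule does not help with that sentence either. The sketch below assumes a hypothetical rule, "split only where terminal punctuation is followed by whitespace", which avoids the decimal-point problem but still cuts the quoted speech apart.

```python
import re

def split_on_terminator_and_space(text):
    # Hypothetical "improved" rule: split only where '.', '?' or '!'
    # is followed by whitespace, so "8.9" would survive intact.
    return re.split(r"(?<=[.?!])\s+", text)

sentence = "John told me “Are you crazy? That’s dangerous!” and I reassured him that it would be safe."
pieces = split_on_terminator_and_space(sentence)
for piece in pieces:
    print(repr(piece))
```

This is one sentence, yet it comes back in two pieces, with the break falling inside the quotation after "crazy?".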

        Basically, programmers should not try to write their own sentence-splitting function (unless that is the thing they are building, such as a library for it or a natural language parser).