Definition of “word”

See also words from string (Z13402). Tokenization by whitespace could be generalized to tokenization by delimiter(s). If punctuation is suppressed by whitespace substitution or inclusion within delimiters, we converge on a common function.
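As a rough illustration of that convergence, here is a minimal Python sketch. The function names `split_on_delimiters` and `split_after_substitution` are hypothetical and are not existing Wikifunctions objects; the sample string and the delimiter set are likewise only assumptions chosen for the example.

```python
import re


def split_on_delimiters(text, delimiters):
    """Split text on any run of the given delimiter characters.

    Hypothetical helper: punctuation is handled by listing it among the delimiters.
    """
    if not delimiters:
        return [text] if text else []
    # Build a character class from the delimiters, escaping regex metacharacters.
    pattern = "[" + re.escape(delimiters) + "]+"
    # Drop the empty strings that re.split yields at the edges.
    return [token for token in re.split(pattern, text) if token]


def split_after_substitution(text, punctuation):
    """Replace each punctuation character with a space, then split on whitespace.

    Hypothetical helper: punctuation is suppressed by whitespace substitution.
    """
    table = str.maketrans({ch: " " for ch in punctuation})
    return text.translate(table).split()


sample = "Hello, world; this is a test."
by_delimiters = split_on_delimiters(sample, " \t\n,;.")
by_substitution = split_after_substitution(sample, ",;.")
```

For this sample both approaches give the same six tokens. They can differ on unusual whitespace, because `str.split()` with no argument splits on every Unicode whitespace character, whereas the first function only splits on the characters it is given.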

In the domain of lexical forms, conventions vary by language. In English we have a particular difficulty with hyphens and apostrophes (occasionally described by the misnomer “interpunction”).

  • The string “don’t” is generally regarded as equivalent to “do not”, which is two words, not one.
  • The string “can’t” is generally regarded as equivalent to “cannot”, which might be considered a single word.
  • Contraction of “is” to “’s” may be indistinguishable from a possessive, so a whitespace-delimited string ending ’s may be considered either one word or two (whereas such a string ending s’ is always a single word, if correct).
  • Compound words are typically hyphenated in some contexts and left as separate words in others. A “well-known” distinction is one that is well known. Sometimes a form with neither hyphens nor spaces may be used (see, for example, https://books.google.com/ngrams/graph?content=wellknown%2Cwell-known%2Cwell+known&year_start=1800&year_end=2000&corpus=en-2019&smoothing=3).
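
To make the effect of these conventions concrete, here is a small Python sketch in which the word count of the same string changes with the convention chosen. Everything in it is an assumption for illustration: the name `count_words`, its two flags, and the tiny `EXPANSIONS` table are hypothetical, and the ambiguous ’s case is deliberately left unresolved (it always counts as part of one word here).

```python
# Hypothetical lookup table covering only the two contractions discussed above.
EXPANSIONS = {"don't": ["do", "not"], "can't": ["cannot"]}


def count_words(text, expand_contractions=False, split_hyphens=False):
    """Count words in text under a chosen convention (illustrative only)."""
    # Treat the typographic apostrophe like the plain ASCII one.
    tokens = text.replace("\u2019", "'").split()
    count = 0
    for token in tokens:
        # Strip surrounding punctuation but keep internal apostrophes and hyphens.
        token = token.strip(".,;:!?\"()").lower()
        if not token:
            continue
        parts = token.split("-") if split_hyphens else [token]
        for part in parts:
            if not part:
                continue
            if expand_contractions and part in EXPANSIONS:
                count += len(EXPANSIONS[part])
            else:
                count += 1
    return count
```

With these assumptions, “I don’t know” counts as three words by default and four with `expand_contractions=True`, while “a well-known fact” counts as three words by default and four with `split_hyphens=True`.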

GrounderUK (talk) 13:40, 30 March 2024 (UTC)
