Talk:Z13402
Latest comment: … months ago by GrounderUK in topic Definition of “word”
Definition of “word”
See also words from string (Z13402). Tokenization by whitespace could be generalized to tokenization by delimiter(s). If punctuation is suppressed, either by substituting whitespace for it or by including it among the delimiters, we converge on a common function.
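As a rough illustration of that convergence, here is a minimal Python sketch. The function names and the sample punctuation set are my own assumptions for the example; none of this is the actual implementation of Z13402.

```python
import string


def words_by_whitespace(text):
    """Split on runs of whitespace (str.split with no argument drops empty tokens)."""
    return text.split()


def words_by_delimiters(text, delimiters):
    """Split on runs of any character in `delimiters`, dropping empty tokens."""
    tokens = []
    current = []
    for ch in text:
        if ch in delimiters:
            # A delimiter closes the token in progress, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def words_with_punctuation_substituted(text, punctuation):
    """Replace each punctuation character with a space, then split on whitespace."""
    table = str.maketrans({ch: " " for ch in punctuation})
    return text.translate(table).split()


# Hypothetical sample input and punctuation set, chosen only for illustration.
sample = "Hello, world! It is well known."
punctuation = ",.!"

by_substitution = words_with_punctuation_substituted(sample, punctuation)
by_delimiters = words_by_delimiters(sample, string.whitespace + punctuation)
```

With ASCII input like this, splitting on `string.whitespace` alone reproduces the whitespace tokenizer, and the substitution route and the extended-delimiter route return the same list, so the delimiter-based function could plausibly serve as the common one.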
In the domain of lexical forms, conventions vary by language. In English we have a particular difficulty with hyphens and apostrophes (occasionally described by the misnomer “interpunction”).
- The string “don’t” is generally regarded as equivalent to “do not”, which is two words, not one.
- The string “can’t” is generally regarded as equivalent to “cannot”, which might be considered a single word.
- Contraction of “is” to “’s” may be indistinguishable from a possessive, so a whitespace-delimited string ending ’s may be considered either one word or two (whereas such a string ending s’ is always a single word, if correct).
- Compound words are typically hyphenated in some contexts and left as separate words in others. A “well-known” distinction is one that is well known. Sometimes a form with neither hyphens nor spaces may be used (see, for example, https://books.google.com/ngrams/graph?content=wellknown%2Cwell-known%2Cwell+known&year_start=1800&year_end=2000&corpus=en-2019&smoothing=3).
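To make the effect of these conventions concrete, here is a small Python sketch in which the word count depends on two switches. The `EXPANSIONS` table, the function name, and both flags are hypothetical and deliberately tiny; the ambiguous ’s case is left out because it cannot be resolved by lookup alone.

```python
# Hypothetical expansion table, for illustration only.
EXPANSIONS = {"don't": "do not", "can't": "cannot"}


def count_words(text, expand_contractions=False, split_hyphens=False):
    """Count words in `text` under a chosen convention (a sketch, not a standard)."""
    count = 0
    for token in text.split():
        # Trim surrounding punctuation, normalise case and typographic apostrophes.
        token = token.strip(".,;:!?()\"").lower().replace("\u2019", "'")
        if not token:
            continue
        if expand_contractions:
            # "don't" becomes two words; "can't" becomes the single word "cannot".
            token = EXPANSIONS.get(token, token)
        if split_hyphens:
            # Treat "well-known" as "well known".
            token = token.replace("-", " ")
        count += len(token.split())
    return count


# Hypothetical sample sentence.
sentence = "I don't know a well-known word."
default_count = count_words(sentence)
expanded_count = count_words(sentence, expand_contractions=True)
fully_split_count = count_words(sentence, expand_contractions=True, split_hyphens=True)
```

The same sentence yields three different counts depending on the switches, which is the practical reason a definition of “word” needs to be stated before the function's output can be called right or wrong.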