Wikifunctions:Type proposals/Alphabet

This would be a list of Code point (Z86) objects associated with one Natural language (Z60). A language may have multiple alphabets associated with it for different purposes.

Uses

Sorting

The most obvious use case would be language-aware sorting, as even Latin-based alphabets disagree on the order of letters.
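As a rough illustration of alphabet-driven sorting, here is a minimal Python sketch. It assumes an alphabet is an ordered list of single-character strings standing in for Code point (Z86) objects. The names SWEDISH and alphabet_sort_key are hypothetical and do not refer to existing Wikifunctions objects.

```python
# Hypothetical stand-in for an Alphabet object: an ordered list of letters.
# Swedish places å, ä, ö after z, in that order.
SWEDISH = list("abcdefghijklmnopqrstuvwxyzåäö")


def alphabet_sort_key(word, alphabet):
    """Build a sort key that ranks each character by its alphabet position.

    Characters missing from the alphabet sort after all known letters,
    ordered among themselves by code point.
    """
    rank = {letter: index for index, letter in enumerate(alphabet)}
    unknown = len(alphabet)
    return [
        (rank[ch], 0) if ch in rank else (unknown, ord(ch))
        for ch in word.lower()
    ]


words = ["zebra", "ärm", "ål", "apa"]
by_alphabet = sorted(words, key=lambda w: alphabet_sort_key(w, SWEDISH))
by_code_point = sorted(words)
```

Here by_alphabet puts "ål" before "ärm", following the alphabet's order, while by_code_point reverses the two because ä (U+00E4) precedes å (U+00E5) in Unicode.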

Language dependent string evaluation/manipulation

This covers miscellaneous cases where an alphabet is passed as one argument to a function. Some existing functions where this could be useful:

string only has characters from alphabet (Z11693)
is pangram of alphabet (Z13119)
- Language implementations: Bengali and Latin base alphabet
Caesar ciphers: the general case Caesar cipher (Latin alphabet) (Z12812), as well as ROT1 (Latin alphabet) (Z10846), ROT13 (Latin alphabet) (Z10627) and ROT25 (Latin alphabet) (Z10851)
is a palindrome (Z10096) has many issues, but one raised on its discussion page is the handling of multi-character letters. Breton is used as an example, and this also applies to many other languages, such as the Dutch IJ and the Welsh CH, DD, FF, NG, LL, PH, RH and TH, all of which still use "the Latin alphabet".
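To make the multi-character letter problem concrete, here is a minimal Python sketch of a letter-aware palindrome check. It assumes the alphabet is a list of short strings rather than single code points, which is only one of the representations under discussion. The names WELSH, split_letters and is_letter_palindrome are hypothetical, and "chach" is a made-up test string, not a Welsh word.

```python
# Hypothetical stand-in for a Welsh Alphabet object, with digraphs as letters.
WELSH = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th",
    "u", "w", "y",
]


def split_letters(text, alphabet):
    """Split text into letters, greedily preferring the longest match."""
    # Try longer letters first so that "ch" wins over "c" followed by "h".
    letters = sorted((l for l in alphabet if l), key=len, reverse=True)
    result = []
    i = 0
    while i < len(text):
        for letter in letters:
            if text.startswith(letter, i):
                result.append(letter)
                i += len(letter)
                break
        else:
            # Not in the alphabet: keep the character as its own unit.
            result.append(text[i])
            i += 1
    return result


def is_letter_palindrome(text, alphabet):
    """Check whether the sequence of letters reads the same both ways."""
    letters = split_letters(text.lower(), alphabet)
    return letters == letters[::-1]


letter_wise = is_letter_palindrome("chach", WELSH)
character_wise = "chach" == "chach"[::-1]
```

With these definitions, letter_wise is True because the string splits into ch, a, ch, while character_wise is False. Greedy matching cannot tell a digraph apart from two adjacent single letters, so a real implementation would still need per-language handling for such cases.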

Comments

This still leaves some sorting-related issues unresolved, like the transliteration of foreign orthography. In Swedish, the Danish Ø and ø are treated like the native Ö and ö in sorting, like in this Wikipedia category. But those could be handled using language-specific replacement maps, since an alphabet passed to the function would identify which natural language to use. --Autom (talk) 01:33, 30 March 2024 (UTC)

Do you think this should be a String, rather than an (ordered) list of code points? Jdforrester (WMF) (talk) 18:17, 1 April 2024 (UTC)

@Jdforrester (WMF): I wrote it like that because some languages treat double letters differently for sorting (like how Aa is sorted under Å in w:da:Kategori:Købstæder). Using single code points would be more elegant and intuitive, but a small string can do all the same things and more. --Autom (talk) 11:42, 10 May 2024 (UTC)

Sorry, this wouldn't solve my example, as they are treated as equivalent. The Dutch IJ is already in its alphabetical position, but I'm certain there are other exceptions I haven't thought about. I have only limited knowledge of European languages, after all.

You have me convinced that it might be best to use code points and solve edge cases on a per-language basis instead. --Autom (talk) 11:51, 10 May 2024 (UTC)