Saturday 28 July 2012

Custom sorting for conlangs

Sorting a list of words into alphabetical order is usually trivial. But if your words are in a conlang, the alphabetical order may differ from the usual one. Here's a Python class that, when instantiated with a list of strings (alphabet) giving the letters in sort order, creates a callable object that can be passed as the cmp argument of Python 2's list.sort() method. alphabet can contain digraphs, in which case matching is greedy (the longest matching letter wins). CustomSorter ignores any characters not found in alphabet, which is useful for separating pairs of characters that might otherwise be read as a digraph. In Python 3, where sort() no longer accepts a cmp argument, wrap the CustomSorter in functools.cmp_to_key and pass the result as the key argument.
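The greedy digraph matching and the skipping of unknown characters can be seen in isolation with a small tokenizer. This is a minimal sketch, not part of the original class: the function name tokenize and the sample alphabet (with "ng" as a single letter sorted after "n") are hypothetical. An apostrophe, which is not in the alphabet, splits "n'ga" into "n" and "g" instead of the digraph "ng".

```python
def tokenize(word, alphabet):
    """Split word into letters of alphabet, longest match first.

    Hypothetical helper for illustration. Characters that begin no letter
    in alphabet are skipped. Assumes alphabet contains no empty strings.
    """
    letters = []
    i = 0
    while i < len(word):
        # All letters (including digraphs) that match at position i.
        matches = [letter for letter in alphabet if word.startswith(letter, i)]
        if matches:
            head = max(matches, key=len)  # greedy: the longest match wins
            letters.append(head)
            i += len(head)
        else:
            i += 1  # not in the alphabet: ignore this character
    return letters


# Hypothetical alphabet: "ng" is one letter, sorted after "n".
ALPHABET = ['a', 'e', 'g', 'i', 'k', 'n', 'ng', 'o', 'u']

print(tokenize("nganga", ALPHABET))  # ['ng', 'a', 'ng', 'a']
print(tokenize("n'ga", ALPHABET))    # ['n', 'g', 'a']
```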
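import functools


class CustomSorter(object):
    """cmp-style comparison for words written in a custom alphabet.

    alphabet is a list of letters (strings, possibly digraphs) in sort order.
    Characters not found in alphabet are ignored.
    """

    def __init__(self, alphabet):
        self.alphabet = alphabet

    def __call__(self, word1, word2):
        # Split off the first letter (greedy, longest match) of each word.
        head1, tail1 = self.separate(word1)
        head2, tail2 = self.separate(word2)
        if head1 is None and head2 is None:
            return 0   # both words used up: they sort as equal
        if head1 is None:
            return -1  # word1 ran out first, so it sorts first
        if head2 is None:
            return 1
        if head1 == head2:
            return self(tail1, tail2)  # same letter: compare the rest
        return self.alphabet.index(head1) - self.alphabet.index(head2)

    def separate(self, word):
        """Return (first letter, rest of word), skipping unknown characters.

        Returns (None, '') if word contains no letter from alphabet.
        """
        candidates = self.candidates(word)
        while not candidates:
            if not word:
                return None, ''
            word = word[1:]  # drop a character that is not in the alphabet
            candidates = self.candidates(word)
        head = max(candidates, key=len)  # greedy: prefer digraphs
        return head, word[len(head):]

    def candidates(self, word):
        """All letters of the alphabet that word starts with."""
        return [letter for letter in self.alphabet if word.startswith(letter)]


# Python 2: words.sort(cmp=CustomSorter(alphabet))
# Python 3: words.sort(key=functools.cmp_to_key(CustomSorter(alphabet)))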
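On Python 3, an alternative to wrapping the comparison in cmp_to_key is a key function that converts each word into a tuple of letter positions once, instead of re-splitting both words on every comparison. This is a swapped-in technique rather than the class above; make_sort_key and the sample alphabet are hypothetical, and the sketch assumes the alphabet contains no empty strings.

```python
def make_sort_key(alphabet):
    """Return a key function for sorted()/list.sort() (hypothetical helper).

    Each word maps to a tuple of letter positions in alphabet, using greedy
    digraph matching and skipping characters not in alphabet.
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}

    def key(word):
        positions = []
        i = 0
        while i < len(word):
            matches = [letter for letter in alphabet if word.startswith(letter, i)]
            if matches:
                head = max(matches, key=len)  # greedy: prefer digraphs
                positions.append(rank[head])
                i += len(head)
            else:
                i += 1  # skip characters not in the alphabet
        return tuple(positions)

    return key


# Hypothetical alphabet: "ng" is one letter, sorted after "n".
ALPHABET = ['a', 'e', 'g', 'i', 'k', 'n', 'ng', 'o', 'u']
words = ["nganga", "naka", "n'ga", "ka", "ngo"]
print(sorted(words, key=make_sort_key(ALPHABET)))
# ['ka', 'naka', "n'ga", 'nganga', 'ngo']
```

Tuples compare element by element, and a tuple that is a prefix of another sorts first, so this should order words the same way as the comparison function: by the position of the first differing letter, with the shorter word first when one is a prefix of the other.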