Jan 21, 2010
minpair generates a complete list of minimal pairs (words differing in exactly one segment) from a list of words. The input should consist of one entry per line, encoded in UTF-8 Unicode. By default, each entry consists of two fields separated by a tab: the first field is the word and the second is an identifier, typically a gloss or record number.

The output lists the two segments that contrast in the minimal pair, then the two words, each followed by its identifier if one was supplied, and finally the context of the difference, in which a difference-site marker (by default an underscore) marks the position where the words differ. The contrasting segments are always listed in a fixed order (that of their character codes) so that all tokens of the same segment pair sort together.

By default minpair searches only for pairs of words of the same length that differ in exactly one segment. Command-line options add single insertions/deletions and single transpositions.

To find all minimal pairs, the input notation normally needs to use one character per segment. Even in IPA transcription this is often not the case. minpair handles this situation by accepting definitions of multigraphs. For instance, if you put the sequences p', t', and k', representing glottalized /p/, /t/, and /k/, in the multigraph definition file, minpair will treat each of them as a single segment.
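The substitution search and multigraph handling described above can be illustrated with a small Python sketch. This is not minpair's actual implementation: the brute-force comparison of every pair of words, the greedy longest-match compression, and the use of code points from the Supplementary Private Use Area (starting at U+F0000) to stand in for multigraphs are all assumptions made for illustration, and the function names are hypothetical. Entries are (word, identifier) tuples, as they would be after splitting each input line at the tab.

```python
from itertools import combinations

# Assumption: multigraphs are mapped to code points in the Supplementary
# Private Use Area-A, which ordinary transcriptions do not use.
PUA_START = 0xF0000


def build_multigraph_maps(multigraphs):
    """Map each multigraph to its own private-use code point and back."""
    encode = {seq: chr(PUA_START + i)
              for i, seq in enumerate(dict.fromkeys(multigraphs))}
    decode = {code: seq for seq, code in encode.items()}
    return encode, decode


def compress(word, encode):
    """Replace multigraphs with single code points (greedy, longest match first)."""
    keys = sorted(encode, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for seq in keys:
            if word.startswith(seq, i):
                out.append(encode[seq])
                i += len(seq)
                break
        else:  # no multigraph starts here: keep the single character
            out.append(word[i])
            i += 1
    return "".join(out)


def decompress(text, decode):
    """Expand private-use code points back into their multigraphs."""
    return "".join(decode.get(ch, ch) for ch in text)


def minimal_pairs(entries, multigraphs=(), marker="_"):
    """Find same-length pairs differing in exactly one segment (substitutions only).

    entries: iterable of (word, identifier) tuples.
    Returns sorted tuples: (seg1, seg2, word1, id1, word2, id2, context).
    """
    encode, decode = build_multigraph_maps(multigraphs)
    coded = [(compress(w, encode), w, ident) for w, ident in entries]
    results = []
    for (a, wa, ia), (b, wb, ib) in combinations(coded, 2):
        if len(a) != len(b):
            continue
        diffs = [k for k in range(len(a)) if a[k] != b[k]]
        if len(diffs) != 1:
            continue
        k = diffs[0]
        # Put the contrasting segments in code-point order so that every
        # token of the same segment pair yields the same (seg1, seg2).
        if a[k] > b[k]:
            (a, wa, ia), (b, wb, ib) = (b, wb, ib), (a, wa, ia)
        # The words are identical outside position k, so either supplies the context.
        context = decompress(a[:k], decode) + marker + decompress(a[k + 1:], decode)
        results.append((decompress(a[k], decode), decompress(b[k], decode),
                        wa, ia, wb, ib, context))
    return sorted(results)


if __name__ == "__main__":
    words = [("pat", "1"), ("bat", "2"), ("p'at", "3"), ("pit", "4")]
    for row in minimal_pairs(words, multigraphs=["p'"]):
        print("\t".join(row))
```

With p' declared as a multigraph, the demo reports four pairs, including the p/p' contrast between pat and p'at; without the declaration, those two words have different lengths in characters and a substitution-only search would skip them. Comparing every pair is quadratic in the number of words, which is tolerable for lists of a few thousand entries; a faster approach would group words by the string left after replacing each position with the marker, so that only words sharing such a key are compared.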
The multigraph definition file should list the character sequences that are to be treated as single segments, one per line. Like all other input, this file should be encoded in UTF-8 Unicode. Sequences declared as multigraphs are compressed to a single UTF-32 code point so that they compare as single segments, then decompressed on output.

The basic program has a command-line interface. mpg provides an optional graphical interface. mpg will also arrange for the output of minpair to be sorted if a suitable sort utility is available. Standard sort utilities like Unix sort will do, but if the data contains multigraphs, the best results will be obtained with msort, since it can read and use the same multigraph definitions as minpair.

It is also possible to use mpg without minpair. mpg can find minimal pairs involving substitutions but currently cannot handle indels and transpositions. mpg is slower than minpair but fast enough to be usable with lists of a few thousand words. mpg can also find pairs of words that differ in two positions, which minpair cannot do; this is useful when looking for phonological rules. The maximum distance between the two positions may be specified.

Requirements: