lingpy.data package

Submodules

lingpy.data.derive module

Module for the derivation of sound class models.

The module provides functions for the customized compilation of sound-class models. All models are defined in simple text files. In order to guarantee their quick access when loading the library, the models are compiled and stored in binary files.

lingpy.data.derive.compile_dvt(path='')

Function compiles diacritics, vowels, and tones.

Notes

Diacritics, vowels, and tones are defined in the data/models/dv/ directory of the LingPy package and automatically loaded when loading the LingPy library. The values are defined as the constants rcParams['vowels'], rcParams['diacritics'], and rcParams['tones']. Their core purpose is to guide the tokenization of IPA strings (cf. ipa2tokens()). In order to change the variables, one simply has to change the text files diacritics, tones, and vowels in the data/models/dv directory. The structure of these files is fairly simple: Each line contains a vowel or a diacritic character, whereas diacritics are preceded by a dash.
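
The line format described above can be read with a few lines of Python. The following is only a sketch: the helper name parse_dv_lines is hypothetical, the sample lines are made up, and it assumes that the "dash" is the ASCII hyphen-minus. It is not LingPy's own loader.

```python
def parse_dv_lines(lines):
    """Split lines into plain characters and dash-prefixed diacritics."""
    plain, diacritics = [], []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines
        if entry.startswith('-') and len(entry) > 1:
            # assumption: a leading ASCII dash marks a diacritic
            diacritics.append(entry[1:])
        else:
            plain.append(entry)
    return plain, diacritics


# Made-up sample lines, not the content of the real files.
sample = ['a', 'e', '-ʰ', '', '-ː']
vowels, diacritics = parse_dv_lines(sample)
print(vowels, diacritics)
```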

lingpy.data.derive.compile_model(model, path=None)

Function compiles customized sound-class models.

Parameters:

model : str

A string indicating the name of the model which shall be created.

path : str

A string indicating the path where the model folder is stored.

Notes

A model is defined by a folder placed in the data/models directory of the LingPy package. The name of the folder reflects the name of the model. It contains three files: the file converter, the file INFO, and the optional file scorer. The format requirements for these files are as follows:

INFO

The INFO-file serves as a reference for a given sound-class model. It can contain arbitrary information (and also be empty). If one wants to define specific characteristics, like the source, the compiler, the date, or a description of a given model, this can be done by employing a key-value structure in which the key is preceded by an @ and followed by a colon and the value is written right next to the key in the same line, e.g.:

@source: Dolgopolsky (1986)

This information will then be read from the INFO file and rendered when printing the model to screen with help of the print() function.
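
As an illustration of the key-value structure, the sketch below collects all @key: value lines into a dictionary and ignores everything else. The function name parse_info and the @description line are hypothetical; LingPy's actual parsing may differ.

```python
def parse_info(text):
    """Collect '@key: value' lines into a dict; ignore all other lines."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('@') and ':' in line:
            # split only on the first colon so values may contain colons
            key, value = line[1:].split(':', 1)
            info[key.strip()] = value.strip()
    return info


# Sample text: the @source line is from the example above,
# the other lines are invented for illustration.
sample = """Free-form notes may appear anywhere.
@source: Dolgopolsky (1986)
@description: toy example
"""
info = parse_info(sample)
print(info['source'])
```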

converter

The converter file contains all sound classes, each matched with its respective sound values. Each line is reserved for one class and is preceded by the key (preferably an ASCII letter) representing the class:

B : ɸ, β, f, p͡f, p͜f, ƀ
E : ɛ, æ, ɜ, ɐ, ʌ, e, , ə, ɘ, ɤ, è, é, ē, ě, ê, ɚ
D : θ, ð, ŧ, þ, đ
G : x, ɣ, χ
...
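
A minimal sketch of how lines in this format could be turned into a dictionary that maps each sound to its class, assuming the "key : value, value, ..." layout shown above. The helper name parse_converter is hypothetical and this is not LingPy's compiler.

```python
def parse_converter(lines):
    """Map every listed sound to the class key at the start of its line."""
    converter = {}
    for line in lines:
        if ':' not in line:
            continue  # skip lines without a class key, e.g. '...'
        key, values = line.split(':', 1)
        for sound in values.split(','):
            sound = sound.strip()
            if sound:  # ignore empty entries
                converter[sound] = key.strip()
    return converter


sample = [
    'B : ɸ, β, f, p͡f, p͜f, ƀ',
    'D : θ, ð, ŧ, þ, đ',
    'G : x, ɣ, χ',
]
converter = parse_converter(sample)
print(converter['f'], converter['ð'], converter['x'])
```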

matrix

A scoring matrix indicating the alignment scores of all sound-class characters defined by the model. The scoring is structured as a simple tab-delimited text file. The first cell contains the character names, the following cells contain the scores in redundant form (with both triangles being filled):

B  10.0 -10.0   5.0 ...
E -10.0   5.0 -10.0 ...
F   5.0 -10.0  10.0 ...
...
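
The sketch below reads such a matrix into a dictionary keyed by pairs of class characters. It assumes that the columns follow the same order as the rows, which the description implies but does not state outright. The name parse_matrix and the 3×3 sample are illustrative only.

```python
def parse_matrix(lines):
    """Build {(row_char, col_char): score} from a square score matrix."""
    rows = [line.split() for line in lines if line.strip()]
    chars = [row[0] for row in rows]  # first cell of each row names the class
    scorer = {}
    for row in rows:
        # assumption: column order equals row order
        for other, cell in zip(chars, row[1:]):
            scorer[row[0], other] = float(cell)
    return scorer


# Toy 3x3 matrix modeled on the example above (tab-delimited).
sample = [
    'B\t10.0\t-10.0\t5.0',
    'E\t-10.0\t5.0\t-10.0',
    'F\t5.0\t-10.0\t10.0',
]
scorer = parse_matrix(sample)
print(scorer['B', 'F'], scorer['F', 'B'])
```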

scorer

The scorer file (which is optional) contains the graph of class transitions which is used for the calculation of the scoring dictionary. Each class is listed on a separate line, followed by the symbol v, c, or t (indicating whether the class represents vowels, consonants, or tones), and by the classes it is directly connected to. The strength of this connection is indicated by digits (the smaller the value, the shorter the path between the classes):

A : v, E:1, O:1
C : c, S:2
B : c, W:2
E : v, A:1, I:1
D : c, S:2
...

The information in such a file is automatically converted into a scoring dictionary (see List2012b for details).
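
To make the file layout concrete, the following sketch parses the graph lines into a plain dictionary. It only reads the format; the conversion of the graph into scores is not reproduced here. The function name parse_scorer_graph is hypothetical.

```python
def parse_scorer_graph(lines):
    """Return {class: (kind, {neighbor: weight})} for each graph line."""
    graph = {}
    for line in lines:
        if ':' not in line:
            continue  # skip lines such as '...'
        name, rest = line.split(':', 1)
        fields = [field.strip() for field in rest.split(',')]
        kind, edges = fields[0], {}  # kind is 'v', 'c', or 't'
        for field in fields[1:]:
            if field:
                target, weight = field.split(':')
                edges[target.strip()] = int(weight)
        graph[name.strip()] = (kind, edges)
    return graph


sample = ['A : v, E:1, O:1', 'C : c, S:2', 'E : v, A:1, I:1']
graph = parse_scorer_graph(sample)
print(graph['A'])
```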

Based on the information provided by the files, a dictionary for the conversion of IPA-characters to sound classes and a scoring dictionary are created and stored as a binary. The model can be loaded with help of the Model class and used in the various classes and functions provided by the library.

lingpy.data.model module

Module for handling sequence models.

class lingpy.data.model.Model(model, path=None)

Bases: object

Class for the handling of sound-class models.

Parameters:

model : { ‘sca’, ‘dolgo’, ‘asjp’, ‘art’, ‘_color’ }

A string indicating the name of the model which shall be loaded. Select between:

  • ‘sca’ - the SCA sound-class model (see List2012a),
  • ‘dolgo’ - the DOLGO sound-class model (see Dolgopolsky1986),
  • ‘asjp’ - the ASJP sound-class model (see Brown2008 and Brown2011),
  • ‘art’ - the sound-class model which is used for the calculation of sonority profiles and prosodic strings (see List2012), and
  • ‘_color’ - the sound-class model which is used for the coloring of sound-tokens when creating html-output.

Notes

Models are loaded from binary files which can be found in the data/models/ folder of the LingPy package. A model has two essential attributes:

  • converter – a dictionary with IPA-tokens as keys and sound-class characters as values, and
  • scorer – a scoring dictionary with tuples of sound-class characters as keys and scores (integers or floats) as values.
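
The shape of these two attributes can be pictured with plain dictionaries. The sketch below uses toy stand-ins with invented class symbols and scores; it does not load a real LingPy model, and the helper names to_classes and score_pairs are hypothetical.

```python
# Toy stand-ins for the two attributes; all symbols and scores are invented.
converter = {'p': 'P', 'b': 'P', 'a': 'A', 't': 'T'}
scorer = {
    ('P', 'P'): 10, ('A', 'A'): 5, ('T', 'T'): 10,
    ('P', 'T'): 2, ('T', 'P'): 2,
}


def to_classes(tokens, converter):
    """Replace each token by its sound-class character."""
    return [converter[token] for token in tokens]


def score_pairs(classes_a, classes_b, scorer):
    """Sum the scores of position-wise pairs of two equally long sequences."""
    return sum(scorer[a, b] for a, b in zip(classes_a, classes_b))


first = to_classes(['p', 'a', 't'], converter)
second = to_classes(['b', 'a', 't'], converter)
total = score_pairs(first, second, scorer)
print(first, second, total)
```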

Examples

When loading LingPy, the models sca, asjp, dolgo, and art are automatically loaded, and they are accessible via the rc() function for global settings:

>>> from lingpy import *
>>> rc('asjp')
<sca-model "asjp">

Define variables for the standard models for convenience:

>>> asjp = rc('asjp')
>>> sca = rc('sca')
>>> dolgo = rc('dolgo')
>>> art = rc('art')

Check how the letter a is converted in the various models:

>>> for m in [asjp,sca,dolgo,art]:
...     print('{0} > {1} ({2})'.format('a',m.converter['a'],m.name))
...
a > a (asjp)
a > A (sca)
a > V (dolgo)
a > 7 (art)

Retrieve basic information of a given model:

>>> print(sca)
Model:    sca
Info:     Extended sound class model based on Dolgopolsky (1986)
Source:   List (2012)
Compiler: Johann-Mattis List
Date:     2012-03

Attributes

converter : dict

A dictionary with IPA tokens as keys and sound-class characters as values.

scorer : dict

A scoring dictionary with tuples of sound-class characters as keys and similarity scores as values.

info : dict

A dictionary storing the key-value pairs defined in the INFO file.

name : str

The name of the model, which is identical to the name of the folder from which the model is loaded.

lingpy.data.model.load_dvt(path='')

Function loads the default characters for IPA diacritics and IPA vowels of LingPy.

Module contents

LingPy comes along with many different kinds of predefined data. When loading the library, the following dictionary is automatically loaded and employed by all LingPy modules:

rcParams : dict

This dictionary serves as an alternative to separate global variables: it contains all of them, plus additional ones. It is used internally and stores parameters that are set globally (unless the user defines them otherwise), such as

  • specific debugging messages (warnings, messages, errors)
  • default values, such as “gop” (gap opening penalty), “scale” (scaling factor by which extended gaps are penalized), or “figsize” (the default size of figures if data is plotted using matplotlib).

These default values can be changed with help of the rc function, which takes any keyword and any value as input and adds or modifies the corresponding key of the rcParams dictionary. It also provides more complex operations that change whole sets of variables, such as the following statement:

>>> rc(schema="asjp")

which switches the variables “asjp”, “dolgo”, etc. to the ASCII-based transcription system of the ASJP project.

If you want to change the content of rcParams directly, you need to import the dictionary explicitly:

>>> from lingpy.settings import rcParams

However, changing the values in the dictionary arbitrarily can produce unexpected behavior, and we recommend using the regular rc function for this purpose.

lingpy.settings.rc(rval=None, **keywords)

Function changes parameters globally set for LingPy sessions.

Parameters:

rval : string (default=None)

Use this keyword to specify a return-value for the rc-function.

schema : {“ipa”, “asjp”}

Change the basic schema for sequence comparison. When switching to “asjp”, sequences are treated as sequences in ASJP code; otherwise, they are treated as sequences written in basic IPA.

Notes

This function is the standard way to communicate with the rcParams dictionary, which is not imported by default. If you want to see which parameters there are, you can load the rcParams dictionary directly:

>>> from lingpy.settings import rcParams

However, be careful when changing the values. They might produce some unexpected behavior.
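
The get-and-set pattern described here can be pictured with a toy stand-in. The sketch below is not LingPy's implementation: toy_params and rc_sketch are hypothetical names, the stored values are invented, and the schema switching is left out entirely.

```python
# Toy stand-in for the global settings dictionary; the keys are taken
# from the text, the values are purely illustrative.
toy_params = {'gop': -2, 'scale': 0.5}


def rc_sketch(rval=None, **keywords):
    """Add or modify keys, or return the value stored under `rval`."""
    toy_params.update(keywords)
    if rval is not None:
        return toy_params[rval]


rc_sketch(gop=-5)              # modify an existing key
rc_sketch(figsize=(10, 10))    # add a new key
print(rc_sketch('gop'))
```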

Examples

Import LingPy:

>>> from lingpy import *

Switch from IPA transcriptions to ASJP transcriptions:

>>> rc(schema="asjp")

You can check which “basic orthography” is currently loaded:

>>> rc('basic_orthography')
'asjp'
>>> rc(schema='ipa')
>>> rc('basic_orthography')
'fuzzy'