lingpy.meaning package

Submodules

lingpy.meaning.basvoc module

Class for the handling of basic vocabulary lists.

class lingpy.meaning.basvoc.BasVoc(infile=None, col='list', row='key', conf=None)

Bases: lingpy.basic.parser.QLCParserWithRowsAndCols

Load a comparative collection of Swadesh lists (concepticon).

Notes

This collection may be useful for retrieving a subset of a given dataset, or for converting between conceptual items.

Examples

Load a BasVoc object without arguments in order to get the default object:

>>> from lingpy.meaning import BasVoc
>>> concepticon = BasVoc()

Alternatively, load a pre-compiled object from LingPy:

>>> from lingpy.meaning import concepticon

Retrieve all original words in Jachontov's concept list:

>>> concepticon.get_list('jachontov','number','item')
[['94', 'water'],
 ['25', 'eye'],
 ['45', 'know'],
 ['86', 'this'],
 ['84', 'tail'],
 ['87', 'thou'],
 ['28', 'fire'],
 ['89', 'tooth'],
 ['63', 'one'],
 ['32', 'full'],
 ['59', 'new'],
 ['42', 'I'],
 ['96', 'what'],
 ['82', 'sun'],
 ['61', 'nose'],
 ['37', 'hand'],
 ['18', 'dog'],
 ['24', 'egg'],
 ['81', 'stone'],
 ['88', 'tongue'],
 ['54', 'moon'],
 ['108', 'wind'],
 ['98', 'who'],
 ['104', 'salt'],
 ['50', 'louse'],
 ['91', 'two'],
 ['29', 'fish'],
 ['21', 'ear'],
 ['41', 'horn'],
 ['9', 'blood'],
 ['17', 'die'],
 ['110', 'year'],
 ['57', 'name'],
 ['10', 'bone'],
 ['33', 'give']]
get_dict(col='', row='', entry='', **keywords)

Return a dictionary representation for a given concept list.

Parameters:

col : str

The concept list.

row : str

The concept (referenced by its unique ID).

entry : str

The entry that shall serve as value in the dictionary.

Returns:

d : dict

A dictionary with the unique IDs as key and the specified entry as value.
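
A minimal usage sketch, assuming the default concepticon object and the Jachontov list from the examples above (the concrete keys and values depend on the data shipped with LingPy, so no output is shown):

>>> from lingpy.meaning import concepticon
>>> items = concepticon.get_dict(col='jachontov', entry='item')
>>> n_concepts = len(items)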

get_list(swadlist, *entries)

Return a given concept list with the specified entries.

Parameters:

swadlist : str

The concept list that shall be selected.

*entries : str

The entries that shall be included in the list.

Returns:

l : list

A list that contains the entries as specified.

Examples

>>> from lingpy.meaning import concepticon
>>> lst = concepticon.get_list('jachontov', 'item')[:5]
>>> lst
['water', 'eye', 'know', 'this', 'tail']
get_sublist(sublist, baselist, *entries)

Return the entries of one list that also occur in another list.

Parameters:

sublist : str

The sublist whose entries shall be selected.

baselist : str

The name of the base list from which the sublist shall be taken.

*entries : str

The entries (“item”, “number”, etc.) which shall be selected from the lists.

Returns:

l : list

A list containing the entries as specified.

Examples

>>> from lingpy.meaning import concepticon
>>> concepticon.get_sublist('dolgopolsky','jachontov','item')
['water',
 'eye',
 'thou',
 'tooth',
 'I',
 'what',
 'tongue',
 'who',
 'louse',
 'two',
 'name']
class lingpy.meaning.basvoc.Concepticon

Bases: object

compare(*lists, **keywords)

Compare multiple concept lists with each other.
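
The signature suggests that concept lists are passed by name, as in the BasVoc examples above; the following is a hypothetical sketch only, since the accepted keywords and the return value are not documented here:

>>> from lingpy.meaning.basvoc import Concepticon
>>> conc = Concepticon()
>>> comparison = conc.compare('dolgopolsky', 'jachontov')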

lingpy.meaning.colexification module

Module offers methods to handle colexification patterns in wordlist objects.

lingpy.meaning.colexification.colexification_network(wordlist, entry='ipa', concept='concept', output='', filename='network', bipartite=False, **keywords)

Calculate a colexification network from a given wordlist object.

Parameters:

wordlist : lingpy.basic.wordlist.Wordlist

The wordlist object containing the data.

entry : str (default=”ipa”)

The reference point for the language entry. We use “ipa” as a default.

concept : str (default=”concept”)

The reference point for the name of the row containing the concepts. We use “concept” as a default.

output : str (default="")

If output is set to "gml", the resulting network will be written to a text file in GML format.

Returns:

G : networkx.Graph

A networkx.Graph object.
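
A usage sketch, assuming a wordlist file with "ipa" and "concept" columns (the file name is hypothetical):

>>> from lingpy import Wordlist
>>> from lingpy.meaning.colexification import colexification_network
>>> wl = Wordlist('polynesian.tsv')  # hypothetical input file
>>> G = colexification_network(wl, entry='ipa', concept='concept')
>>> n_nodes, n_edges = len(G.nodes()), len(G.edges())

Passing output="gml" instead writes the resulting network to a GML file named after the filename argument.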

lingpy.meaning.colexification.compare_colexifications(wordlist, entry='ipa', concept='concept')

Compare colexification patterns for a given wordlist.

lingpy.meaning.colexification.evaluate_colexifications(G, weight='wordWeight', outfile=None)

Calculate the most frequent colexifications in a wordlist.

lingpy.meaning.glosses module

Module provides functions for the handling of concept glosses in linguistic datasets.

lingpy.meaning.glosses.compare_conceptlists(list1, list2, output='', match=None, filename='matches', **keywords)

Compare two concept lists and output suggestions for mapping.

Notes

The idea is to take one concept list as the base list and then to search for a plausible mapping of the concepts in the second list to those in the first list. All suggestions can then be output in various forms, with multiple matches either excluded or included, and either as text or in other formats.

Importantly, the output contains all matches, including non-matched items which occur in the second list but not in the first list. Non-matched items which occur in the first list but not in the second list are ignored.

The syntax for matching types is organized as follows:

  • 1 indicates a full match between glosses, including information on part of speech and the like
  • 2 indicates a very good match between a full gloss and the main part of a gloss, or between the two main parts of a gloss
  • 3 indicates a very good match between the main parts of two glosses with non-matching information regarding part of speech
  • 4 indicates that the longest part of two glosses matches along with the part-of-speech information
  • 5 indicates that the longest part of two glosses matches with non-matching part-of-speech information
  • 6 indicates that the longest part of the first list is matched by one of the parts in the second list
  • 7 indicates that the longest part of the second list is matched by one of the parts in the first list
  • 8 indicates that no match could be found
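
A hypothetical usage sketch, assuming that list1 and list2 are paths to concept list files (the file names, and the assumption that lists are passed as file paths, are not part of the documentation above):

>>> from lingpy.meaning.glosses import compare_conceptlists
>>> matches = compare_conceptlists('listA.tsv', 'listB.tsv')  # hypothetical files
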
lingpy.meaning.glosses.compare_concepts(c1, c2)

Debug-function for concept comparison.

lingpy.meaning.glosses.parse_gloss(gloss, output='list')

Parse a gloss into its constituents by applying some general logic.

Parameters:

gloss : str

The gloss as found in various sources (we assume that we are dealing with English glosses here).

output : str (default=”list”)

Determine the output of the parsing routine. Select between "list", which will return a list of tuples, and "dict", which will return a list of dictionaries.

Returns:

constituents : {list, dict}

A list of tuples, or a list of dictionaries, with each entry consisting of the following items:

  • the main part ("main")
  • the start character indicating a potential comment ("comment_start")
  • the comment, that is, everything occurring in brackets in the input string ("comment")
  • the end character indicating the end of a potential comment ("comment_end")
  • the part of speech, in case this was specified by a preceding "the" or a preceding "to" in the main part of the string ("pos")
  • the prefix, that is, words like "be" or "in" which may precede the main gloss in concept lists, as in "be quiet" ("prefix")
  • the longest constituent, which is identical with the main part if there is no whitespace in the main part, and otherwise the longest part of the main gloss split by whitespace ("longest_part")
  • the parts of a gloss, if the constituent contains multiple words ("parts")
  • the original gloss, kept for testing purposes ("gloss")

If “dict” is chosen as output, this returns a list of dictionaries with the keys as specified in brackets above.

Notes

The basic purpose of this function is to provide a means to make it easier to compare meanings across different resources. Often, linguists will annotate their resources quite differently, and for one and the same concept, we may find very different glosses. The concept "kill [verb]", for example, may be glossed as "to kill", "kill", "kill (v.)", "kill (somebody)", etc. In order to guarantee comparability, this function tries to use basic knowledge of glossing tendencies to disentangle the variety of glossing styles which can be found in the literature. Thus, in the case of "kill [verb]", the function will analyze the different strings as follows:

>>> from lingpy.meaning.glosses import parse_gloss
>>> glosses = ["to kill", "kill", "kill (v.)", "kill (somebody)"]
>>> for gloss in glosses:
...     parsed_gloss = parse_gloss(gloss, output='dict')
...     print(parsed_gloss[0]['main'], parsed_gloss[0]['pos'])
kill verb
kill
kill verb
kill

As can be seen, the function seeks to extract the most important part of the gloss and may thus help to compare different glosses across different resources.
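
The remaining fields described under Returns can be accessed by key; a sketch along these lines (the resulting values depend on the parsing heuristics):

>>> from lingpy.meaning.glosses import parse_gloss
>>> entry = parse_gloss("to kill (somebody)", output='dict')[0]
>>> main, prefix, comment = entry['main'], entry['prefix'], entry['comment']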

Module contents