arsenal.nlp package

Subpackages

Submodules

arsenal.nlp.annotation module

exception arsenal.nlp.annotation.ParseError

Bases: Exception

Custom exception class used by this module.

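class arsenal.nlp.annotation.Span(label, begins, ends)

Bases: object

begins: index of the first item in the span (inclusive).

ends: index one past the last item in the span (exclusive).

label: the label shared by every item in the span.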

arsenal.nlp.annotation.bio2span(seq, tagger=None, include_O=True)

arsenal.nlp.annotation.bracket2bio(x)

Generate (BIO tag, token) pairs from bracket-style annotation. Note: the text is split on spaces, so word splitting (tokenization) should already be done.

>>> x = bracket2bio("[TITLE Cat in the Hat][AUTHOR Dr. Seuss]")
>>> list(x)                                  #doctest:+NORMALIZE_WHITESPACE
[('B-TITLE', 'Cat'), ('I-TITLE', 'in'), ('I-TITLE', 'the'),
 ('I-TITLE', 'Hat'), ('B-AUTHOR', 'Dr.'), ('I-AUTHOR', 'Seuss')]
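The bracket-to-BIO mapping can be sketched as follows. This is a minimal illustration, not arsenal's source: the function name `bracket_to_bio` and the regex are assumptions, the label is taken to be the first whitespace-delimited word after `[`, and any text outside brackets is simply ignored (the real function may treat it differently).

```python
import re

# Hypothetical sketch, not arsenal's implementation.
# Matches "[LABEL tok tok ...]"; group 1 is the label, group 2 the body.
BRACKET = re.compile(r'\[(\S+)\s+([^\]]*)\]')

def bracket_to_bio(text):
    """Yield (tag, token) pairs in BIO encoding from bracket annotation."""
    for m in BRACKET.finditer(text):
        label, body = m.group(1), m.group(2)
        # The body is split on whitespace, so text must already be tokenized.
        for i, token in enumerate(body.split()):
            prefix = 'B' if i == 0 else 'I'
            yield (f'{prefix}-{label}', token)

print(list(bracket_to_bio("[TITLE Cat in the Hat][AUTHOR Dr. Seuss]")))
```

The first token of each bracketed group gets `B-`, the rest `I-`, which reproduces the doctest output above.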

arsenal.nlp.annotation.extract_contiguous(s, labeler=None)

>>> list(extract_contiguous(""))
[]

>>> list(extract_contiguous("AAAA"))
[Span(label='A', begins=0, ends=4)]

>>> list(extract_contiguous("AABBC"))
[Span(label='A', begins=0, ends=2), Span(label='B', begins=2, ends=4), Span(label='C', begins=4, ends=5)]

>>> list(extract_contiguous("AABBB"))
[Span(label='A', begins=0, ends=2), Span(label='B', begins=2, ends=5)]
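The run-length logic behind these examples can be sketched with `itertools.groupby`. The sketch defines its own `Span` namedtuple with the documented field names and omits the `labeler` argument, whose behavior is not described here; treat it as an illustration of the technique, not arsenal's implementation.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for arsenal's Span, using the documented field names.
Span = namedtuple('Span', 'label begins ends')

def contiguous_spans(seq):
    """Yield a Span for each maximal run of equal adjacent items in seq.

    begins is inclusive and ends is exclusive, matching the doctests above.
    """
    begins = 0
    for label, run in groupby(seq):
        length = sum(1 for _ in run)  # size of this run
        yield Span(label, begins, begins + length)
        begins += length

print(list(contiguous_spans("AABBC")))
```

Because `groupby` only merges adjacent equal items, a label that reappears later (e.g. `"AABA"`) starts a new span.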

arsenal.nlp.annotation.fromSGML(f, linegrouper='\n', bioencoding=False)

arsenal.nlp.annotation.line_groups(text, pattern)

Very simple function for breaking up text into groups based on a single pattern.

>>> list(line_groups("a BB c d BB", "BB"))
['a', 'c d']
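One plausible reading of this doctest: split on the pattern, strip surrounding whitespace, and drop empty groups. The name `split_groups` is hypothetical, and treating `pattern` as a regular expression (via `re.split`) is an assumption; the documentation does not say whether it is a regex or a literal string.

```python
import re

def split_groups(text, pattern):
    """Hypothetical sketch: split text on a regex, strip, drop empty groups."""
    for group in re.split(pattern, text):
        group = group.strip()
        if group:  # skip empty pieces, e.g. after a trailing separator
            yield group

print(list(split_groups("a BB c d BB", "BB")))
```

With a blank-line pattern such as `r"\n\s*\n"`, the same sketch splits text into paragraph-like groups: `split_groups("x\n\ny\nz", r"\n\s*\n")` yields `'x'` and `'y\nz'`.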

arsenal.nlp.annotation.sgml2bio(x)

>>> sgml2bio('<title>Cat in the Hat</title><author>Dr. Seuss</author>')
[('B-title', 'Cat'), ('I-title', 'in'), ('I-title', 'the'), ('I-title', 'Hat'), ('B-author', 'Dr.'), ('I-author', 'Seuss')]

arsenal.nlp.annotation.sgml2segmentation(x, lexer=re.compile('\\S+'))

>>> sgml2segmentation('<title>Cat in the Hat</title><author>Dr. Seuss</author>')
[('title', ['Cat', 'in', 'the', 'Hat']), ('author', ['Dr.', 'Seuss'])]

arsenal.nlp.annotation.sgml2seq(x)

>>> sgml2seq('<title>Cat in the Hat</title><author>Dr. Seuss</author>')
[('title', 'Cat'), ('title', 'in'), ('title', 'the'), ('title', 'Hat'), ('author', 'Dr.'), ('author', 'Seuss')]
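The three SGML helpers above differ only in output shape, so a single parse can produce all three. This is a hedged sketch, not arsenal's code: it assumes flat, well-formed tags without attributes or nesting, ignores text outside tags, and uses hypothetical function names (`segmentation`, `seq`, `bio`). The `lexer` parameter mirrors the documented default of `sgml2segmentation`.

```python
import re

# Assumed tag syntax: <label>body</label>, flat and well-formed.
# \1 is a backreference, so the closing tag must match the opening one.
TAG = re.compile(r'<(\w+)>(.*?)</\1>', re.DOTALL)

def segmentation(x, lexer=re.compile(r'\S+')):
    """Return [(label, [token, ...]), ...] in document order."""
    return [(label, lexer.findall(body)) for label, body in TAG.findall(x)]

def seq(x):
    """Return one (label, token) pair per token."""
    return [(label, tok) for label, toks in segmentation(x) for tok in toks]

def bio(x):
    """Like seq, but tag the first token of each segment B- and the rest I-."""
    return [(('B-' if i == 0 else 'I-') + label, tok)
            for label, toks in segmentation(x)
            for i, tok in enumerate(toks)]

doc = '<title>Cat in the Hat</title><author>Dr. Seuss</author>'
print(segmentation(doc))
print(bio(doc))
```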

arsenal.nlp.evaluation module

Evaluation methods common in NLP and information extraction.

TODO: Have a look at https://github.com/nschneid/pyutil/blob/master/chunkeval.py, which appears to offer richer evaluation methods.

class arsenal.nlp.evaluation.F1(confusion_matrix=False)

Bases: object

add_relevant(label, instance)

add_retrieved(label, instance)

confusion()

latex()

report(instance, prediction, target)

scores(verbose=True)
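The method names `add_relevant` and `add_retrieved` suggest the information-retrieval formulation of F1, where each label has a set of relevant (gold) and retrieved (predicted) instances. Under that assumption, per-label scores follow from precision P = |relevant ∩ retrieved| / |retrieved|, recall R = |relevant ∩ retrieved| / |relevant|, and F1 = 2PR / (P + R). The class below (`SetF1`, a hypothetical name) sketches only this bookkeeping; the real `F1` class's `scores`, `report`, `latex`, and `confusion` outputs are not modeled.

```python
from collections import defaultdict

class SetF1:
    """Hypothetical sketch: per-label precision/recall/F1 over instance sets."""

    def __init__(self):
        self.relevant = defaultdict(set)   # label -> gold instances
        self.retrieved = defaultdict(set)  # label -> predicted instances

    def add_relevant(self, label, instance):
        self.relevant[label].add(instance)

    def add_retrieved(self, label, instance):
        self.retrieved[label].add(instance)

    def scores(self):
        """Return {label: (precision, recall, f1)}; empty denominators give 0."""
        result = {}
        for label in set(self.relevant) | set(self.retrieved):
            rel, ret = self.relevant[label], self.retrieved[label]
            hits = len(rel & ret)
            p = hits / len(ret) if ret else 0.0
            r = hits / len(rel) if rel else 0.0
            f = 2 * p * r / (p + r) if p + r else 0.0
            result[label] = (p, r, f)
        return result

gold = ['A', 'A', 'B', 'B']
pred = ['A', 'B', 'B', 'B']
f1 = SetF1()
for i, (g, y) in enumerate(zip(gold, pred)):
    f1.add_relevant(g, i)   # instance i truly has label g
    f1.add_retrieved(y, i)  # instance i was predicted as label y
for label, (p, r, f) in sorted(f1.scores().items()):
    print(f'{label}: P={p:.3f} R={r:.3f} F1={f:.3f}')
# A: P=1.000 R=0.500 F1=0.667
# B: P=0.667 R=1.000 F1=0.800
```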

arsenal.nlp.evaluation.plot_confusion(y_true, y_pred, alphabet, normalized=False)

Draw a confusion matrix.

Options:

normalized: Normalize the confusion matrix by row (i.e., by the number of samples in each class).
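The effect of `normalized` can be shown without any plotting. The sketch below is an illustration under assumptions, not `plot_confusion` itself: it builds the count matrix with rows indexed by true label and columns by predicted label (the same convention as scikit-learn's `sklearn.metrics.confusion_matrix`), orders both axes by `alphabet`, and, when `normalized=True`, divides each row by its total. The function name `confusion_counts` is hypothetical.

```python
def confusion_counts(y_true, y_pred, alphabet, normalized=False):
    """Hypothetical sketch: rows are true labels, columns are predictions.

    Labels not in alphabet raise KeyError.
    """
    index = {label: i for i, label in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    if not normalized:
        return counts
    # Row normalization: divide by the number of samples whose true class is
    # that row; rows with no samples stay all zeros.
    result = []
    for row in counts:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result

y_true = ['a', 'a', 'b', 'b', 'b']
y_pred = ['a', 'b', 'b', 'b', 'a']
print(confusion_counts(y_true, y_pred, ['a', 'b']))  # [[1, 1], [1, 2]]
print(confusion_counts(y_true, y_pred, ['a', 'b'], normalized=True))
```

After row normalization each nonempty row sums to 1, so the diagonal reads as per-class recall.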
