gatenlp.lib_stanza module

Support for using Stanford Stanza (see https://stanfordnlp.github.io/stanza/): converts Stanza output to gatenlp documents and annotations.

gatenlp.lib_stanza.apply_stanza(nlp, gatenlpdoc, setname='')[source]

Run the Stanford Stanza pipeline on the text of the gatenlp document and transfer the resulting annotations to it. This modifies the gatenlp document in place.

Parameters
  • nlp – Stanford Stanza pipeline

  • gatenlpdoc – gatenlp document

  • setname – name of the annotation set to which the annotations are added; empty string for the default annotation set

Returns

gatenlp.lib_stanza.stanza2gatenlp(stanzadoc, gatenlpdoc=None, setname='', token_type='Token', sentence_type='Sentence', add_entities=True, ent_prefix=None)[source]

Convert a Stanford Stanza document to a gatenlp document. If a gatenlp document is provided, the annotations from the Stanza document are added to it; in that case the original gatenlpdoc is used and modified.

Parameters
  • stanzadoc – a Stanford Stanza document

  • gatenlpdoc – if None, a new gatenlp document is created; otherwise the annotations are added to this document

  • setname – the name of the annotation set to which the annotations are added; empty string for the default annotation set

  • token_type – the annotation type to use for tokens, if needed

  • sentence_type – the annotation type to use for sentence annotations

  • add_entities – if True, add any entities as well

  • ent_prefix – if None, use the original entity type as the annotation type; otherwise add the given string to the annotation type as a prefix

Returns
  the new or modified gatenlp document
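The effect of ent_prefix on the annotation type can be illustrated with a minimal sketch. The helper name entity_annotation_type is hypothetical and not part of gatenlp, and joining the prefix by plain string concatenation is an assumption; the library may join the two parts differently.

```python
from typing import Optional


def entity_annotation_type(ent_type: str, ent_prefix: Optional[str] = None) -> str:
    """Hypothetical helper (not part of gatenlp): choose the annotation type
    for an entity, following the documented ent_prefix rule."""
    if ent_prefix is None:
        # No prefix given: keep the original entity type as the annotation type.
        return ent_type
    # Assumption: the prefix is prepended by plain string concatenation.
    return ent_prefix + ent_type


print(entity_annotation_type("PERSON"))
print(entity_annotation_type("PERSON", ent_prefix="Stanza_"))
```

A prefix of this kind can help keep entity annotations apart from other annotations that share the same type name in the same annotation set.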

gatenlp.lib_stanza.tok2tok(tok)[source]

Create a copy of a Stanza token, prepared for creating an annotation. The result is a dict that has start, end and id keys, with everything else in a nested dict “fm”.

Parameters
  • tok – the original Stanza token

Returns
  the dict used to create a Token annotation
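The shape of the returned dict might look like the following sketch. This is not the tok2tok implementation: the function name token_to_annotation_dict is hypothetical, and the input is assumed to be a plain dict that already carries start, end and id keys, which may not match how a real Stanza token exposes its offsets.

```python
from typing import Any, Dict

# Keys kept at the top level; everything else goes into the nested "fm" dict.
TOP_LEVEL_KEYS = ("start", "end", "id")


def token_to_annotation_dict(tok: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for tok2tok, operating on a plain dict."""
    # Build a new dict so that the original token data is left untouched.
    result: Dict[str, Any] = {key: tok[key] for key in TOP_LEVEL_KEYS}
    result["fm"] = {k: v for k, v in tok.items() if k not in TOP_LEVEL_KEYS}
    return result


# Assumed example token; the field names besides start, end and id are illustrative.
example = {"id": 1, "start": 0, "end": 5, "text": "Hello", "upos": "INTJ"}
result = token_to_annotation_dict(example)
print(result)
```

Keeping the offsets and id at the top level and the remaining fields under “fm” separates what locates the annotation from what describes it.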