gatenlp.spacy module

Support for using spacy: convert from spacy to gatenlp documents and annotations.

gatenlp.spacy.apply_spacy(nlp, gatenlpdoc, setname='')

Run the spacy nlp pipeline on the gatenlp document and transfer the annotations. This modifies the gatenlp document in place.
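
A minimal sketch of how this call might be wrapped, using only the signature shown above. The helper name annotate_in_place and the set name "Spacy" are illustrative assumptions, not part of gatenlp. Since the return value of apply_spacy is not documented here, the sketch ignores it and hands back the document that was modified in place.

```python
def annotate_in_place(nlp, gatenlpdoc, setname="Spacy"):
    """Run a loaded spacy pipeline over a gatenlp document (hypothetical helper)."""
    # Imported lazily so the helper can be defined even where gatenlp is absent.
    from gatenlp.spacy import apply_spacy

    # apply_spacy modifies gatenlpdoc in place; its return value is undocumented
    # in this reference, so it is deliberately not used.
    apply_spacy(nlp, gatenlpdoc, setname=setname)
    return gatenlpdoc


# Possible usage, assuming spacy, gatenlp and the en_core_web_sm model are installed:
#   import spacy
#   from gatenlp import Document
#   nlp = spacy.load("en_core_web_sm")
#   doc = annotate_in_place(nlp, Document("Barack Obama visited Paris."))
```

Passing a non-empty setname keeps the spacy annotations apart from anything already in the default annotation set.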

Parameters
  • nlp – spacy pipeline

  • gatenlpdoc – gatenlp document

  • setname – annotation set to receive the annotations

  • tokens – an annotation set containing already known token annotations

Returns

gatenlp.spacy.spacy2gatenlp(spacydoc, gatenlpdoc=None, setname='', token_type='Token', spacetoken_type='SpaceToken', sentence_type='Sentence', nounchunk_type='NounChunk', add_tokens=True, add_spacetokens=True, add_ents=True, add_sents=True, add_nounchunks=True, add_dep=True)

Convert a spacy document to a gatenlp document. If a gatenlp document is provided, the annotations from the spacy document are added to it; in that case the original gatenlpdoc is modified in place.

Parameters
  • spacydoc – a spacy document

  • gatenlpdoc – if None, a new gatenlp document is created; otherwise the annotations are added to this document

  • setname – name of the annotation set to which the annotations are added; the empty string selects the default annotation set

  • token_type – the annotation type to use for tokens

  • spacetoken_type – the annotation type to use for space tokens

  • sentence_type – the annotation type to use for sentence annotations

  • nounchunk_type – the annotation type to use for noun chunk annotations

  • add_tokens – whether to add annotations for tokens; if False, dependency parser information cannot be added either

  • add_spacetokens – whether to add annotations for space tokens

  • add_ents – whether to add annotations for entities

  • add_sents – whether to add sentence annotations

  • add_nounchunks – whether to add noun chunk annotations

  • add_dep – whether to add dependency parser information

Returns

  the new or modified gatenlp document
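
The add_* flags can be combined to transfer only part of the spacy analysis. The following is a hedged sketch based solely on the signature above; the helper name convert_entities_and_sentences and the set name "Spacy" are illustrative assumptions. It keeps the defaults for entities and sentences and switches everything else off.

```python
def convert_entities_and_sentences(spacydoc, setname="Spacy"):
    """Create a new gatenlp document holding only entity and sentence annotations
    (hypothetical helper built on the documented keyword arguments)."""
    # Imported lazily so the helper can be defined even where gatenlp is absent.
    from gatenlp.spacy import spacy2gatenlp

    return spacy2gatenlp(
        spacydoc,
        gatenlpdoc=None,        # None asks for a new gatenlp document
        setname=setname,
        add_tokens=False,
        add_spacetokens=False,
        add_nounchunks=False,
        add_dep=False,          # dependency info requires token annotations
        # add_ents and add_sents keep their default value of True
    )


# Possible usage, assuming spacy and the en_core_web_sm model are installed:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   gdoc = convert_entities_and_sentences(nlp("Barack Obama visited Paris."))
```

Setting add_dep=False together with add_tokens=False makes the dependency between the two flags explicit instead of relying on how the function handles the combination.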