gatenlp.annotation_set module

class gatenlp.annotation_set.AnnotationSet(name: str = '', owner_doc: Document = None)

Bases: object

Create a new annotation set.

Parameters
  • name – the name of the annotation set. This is mainly needed when the changelog is used.

  • changelog – if a changelog is used, all changes to the set and its annotations are logged.

  • owner_doc – if this is set, the set and all sets created from it can be queried for the owning document, and offsets are checked against the text of the owning document, if it has text.

add(start: int, end: int, anntype: str, features: Dict[str, Any] = None, annid: int = None)

Add an annotation to the set. Once an annotation has been added, the start and end offsets, the type, and the annotation id are immutable.

Parameters
  • start – start offset

  • end – end offset

  • anntype – the annotation type

  • features – a map, an iterable of tuples or an existing feature map. In any case, the features are used to create a new feature map for this annotation. If the map is empty or this parameter is None, the annotation does not store any map at all.

  • annid – the annotation id; if not specified, the next free id for this set is used. NOTE: the id should normally be left unspecified so that it is assigned automatically.

Returns

the annotation id of the added annotation
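A minimal sketch of the id bookkeeping described above, assuming ids are plain integers, that the first automatically assigned id is 0, and that the next free id is one more than the largest id used so far. MiniAnnSet is a hypothetical stand-in written for illustration, not the gatenlp implementation; rejecting an already used id is also an assumption of this sketch.

```python
from typing import Any, Dict, Optional, Tuple


class MiniAnnSet:
    """Hypothetical stand-in that mimics how add() assigns annotation ids."""

    def __init__(self) -> None:
        self._anns: Dict[int, Tuple[int, int, str, Optional[Dict[str, Any]]]] = {}
        self._next_id = 0  # assumption: the first automatically assigned id is 0

    def add(self, start: int, end: int, anntype: str,
            features: Optional[Dict[str, Any]] = None,
            annid: Optional[int] = None) -> int:
        if annid is None:
            annid = self._next_id  # use the next free id
        elif annid in self._anns:
            # assumption: this sketch rejects ids that are already taken
            raise ValueError(f"annotation id {annid} is already in use")
        # copy the features into a new dict; store None when there are none
        feats = dict(features) if features else None
        self._anns[annid] = (start, end, anntype, feats)
        # keep the next free id above every id used so far
        self._next_id = max(self._next_id, annid + 1)
        return annid


anns = MiniAnnSet()
print(anns.add(0, 5, "Token"))                     # 0
print(anns.add(6, 11, "Token", {"kind": "word"}))  # 1
print(anns.add(0, 11, "Sentence", annid=10))       # 10
print(anns.add(12, 13, "Token"))                   # 11
```

Passing an explicit annid pushes the automatic counter past it, which is why the last call returns 11 rather than 2.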

add_ann(ann, annid: int = None)

Add a copy of the given ann to the annotation set, either with a new annotation id or with the one given.

Parameters
  • ann – the annotation to copy into this set

  • annid – the annotation id; if not specified, the next free id for this set is used. NOTE: the id should normally be left unspecified so that it is assigned automatically.

Returns

the annotation id of the added annotation

property changelog
clear() → None

Remove all annotations from the set.

coextensive(start: int, end: int) → gatenlp.annotation_set.AnnotationSet

Return an immutable annotation set with all annotations that start and end at the given offsets.

Parameters
  • start – start offset of the span

  • end – end offset of the span

Returns

an immutable annotation set with all annotations that start and end exactly at the given offsets.
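The selection rule can be sketched in plain Python. Ann is a hypothetical minimal record (id, start, end, type) used only for illustration; a list of Ann values stands in for the annotation set, and the function returns a list rather than an immutable AnnotationSet.

```python
from typing import List, NamedTuple


class Ann(NamedTuple):
    """Hypothetical minimal annotation record used for illustration."""
    id: int
    start: int
    end: int
    type: str


def coextensive(anns: List[Ann], start: int, end: int) -> List[Ann]:
    # keep the annotations whose offsets match the span exactly
    return [a for a in anns if a.start == start and a.end == end]


anns = [Ann(0, 0, 5, "Token"), Ann(1, 0, 5, "Lookup"), Ann(2, 0, 11, "Sentence")]
print([a.id for a in coextensive(anns, 0, 5)])  # [0, 1]
```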

contains(annorannid: Union[int, gatenlp.annotation.Annotation]) → bool

Provides ‘annotation in annotation_set’ functionality

Parameters

annorannid – the annotation instance or annotation id to check

Returns

True if the annotation exists in the set, False otherwise

covering(start: int, end: int) → gatenlp.annotation_set.AnnotationSet

Gets the annotations that contain the given offset range. Instead of the start and end offsets, also accepts an annotation or annotation set.

Parameters
  • start – the start offset of the span

  • end – the end offset of the span

Returns

an immutable annotation set with the matching annotations, if any
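A plain-Python sketch of the covering test, using a hypothetical Ann record (id, start, end, type) and a list in place of the annotation set. An annotation covers the span when it starts at or before the span start and ends at or after the span end.

```python
from typing import List, NamedTuple


class Ann(NamedTuple):
    """Hypothetical minimal annotation record used for illustration."""
    id: int
    start: int
    end: int
    type: str


def covering(anns: List[Ann], start: int, end: int) -> List[Ann]:
    # the annotation must enclose the whole span [start, end]
    return [a for a in anns if a.start <= start and a.end >= end]


anns = [Ann(0, 0, 5, "Token"), Ann(1, 6, 11, "Token"), Ann(2, 0, 11, "Sentence")]
print([a.id for a in covering(anns, 6, 11)])  # [1, 2]
print([a.id for a in covering(anns, 3, 8)])   # [2]
```

The second query straddles both tokens, so only the sentence encloses it.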

end()

Returns the end offset of the annotation set, i.e. the biggest end offset of any annotation.

Returns

largest end offset

fast_iter() → Generator

Returns a generator for fast iteration over all annotations in arbitrary order.

Returns

a generator over all annotations in the set

first()

Return the first annotation in the set, or raise an exception if the set is empty.

Returns

the first annotation

static from_dict(dictrepr, owner_doc=None, **kwargs)
get(annid: int, default=None) → Optional[gatenlp.annotation.Annotation]

Gets the annotation with the given annotation id or returns the given default.

Parameters
  • annid – the annotation id of the annotation to retrieve.

  • default – what to return if an annotation with the given id is not found.

Returns

the annotation or the default value.

get_doc() → Optional[Document]

Get the owning document, if known. If the owning document was not set, return None.

Returns

the document this annotation set belongs to or None if unknown.

immutable(restrict_to=None) → gatenlp.annotation_set.AnnotationSet

Create an immutable copy of this set, optionally restricted to the given annotation ids.

Parameters

restrict_to – an iterable of annotation ids

Returns

an immutable annotation set with all the annotations of this set or restricted to the ids in restrict_to

immutable_from(anns: collections.abc.Iterable) → gatenlp.annotation_set.AnnotationSet

Create an immutable annotation set from the annotations in anns, which can be any iterable. The owning document is the same as for this set. The next annotation id for the created set is the highest annotation id seen in anns plus one.

Parameters

anns – an iterable of annotations

Returns

an immutable annotation set containing the annotations from anns
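A sketch of the next-id rule under stated assumptions: Ann is a hypothetical record, a read-only types.MappingProxyType view stands in for the immutable set, and returning 0 as the next id for empty input is an assumption, since the behaviour for an empty iterable is not described above.

```python
from types import MappingProxyType
from typing import Iterable, Mapping, NamedTuple, Tuple


class Ann(NamedTuple):
    """Hypothetical minimal annotation record used for illustration."""
    id: int
    start: int
    end: int
    type: str


def immutable_from(anns: Iterable[Ann]) -> Tuple[Mapping[int, Ann], int]:
    by_id = {a.id: a for a in anns}  # accepts any iterable, including generators
    # next id: highest id seen plus one (assumption: 0 for empty input)
    next_id = max(by_id) + 1 if by_id else 0
    # read-only view; no other reference to by_id escapes, so it cannot change
    return MappingProxyType(by_id), next_id


frozen, next_id = immutable_from(a for a in [Ann(3, 0, 5, "Token"), Ann(7, 6, 11, "Token")])
print(sorted(frozen), next_id)  # [3, 7] 8
```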

iter(start_ge: Optional[int] = None, start_lt: Union[None, int] = None, with_type: str = None, reverse: bool = False) → Generator

Returns a generator for going through annotations in document order. If an iterable of annotations is given, those annotations, optionally limited by the other parameters, are returned in document order; otherwise, all annotations in the set are returned, optionally limited by the other parameters.

Parameters
  • annotations – an iterable of annotations from this annotation set.

  • start_ge – only include annotations that start at or after this offset

  • start_lt – only include annotations that start before this offset

  • with_type – only annotations of this type

  • reverse – process in reverse document order

Returns

generator for annotations in document order
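The filtering logic can be sketched as a plain generator that, like the annotations parameter above, takes the annotations explicitly. Ann is a hypothetical record, and ordering by start offset with ties broken by annotation id is an assumption about what "document order" means here.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Ann(NamedTuple):
    """Hypothetical minimal annotation record used for illustration."""
    id: int
    start: int
    end: int
    type: str


def iter_anns(anns: Iterable[Ann], start_ge: Optional[int] = None,
              start_lt: Optional[int] = None, with_type: Optional[str] = None,
              reverse: bool = False) -> Iterator[Ann]:
    # assumption: document order = by start offset, then by annotation id
    ordered = sorted(anns, key=lambda a: (a.start, a.id), reverse=reverse)
    for a in ordered:
        if start_ge is not None and a.start < start_ge:
            continue  # starts too early
        if start_lt is not None and a.start >= start_lt:
            continue  # starts too late
        if with_type is not None and a.type != with_type:
            continue  # wrong type
        yield a


anns = [Ann(2, 6, 11, "Token"), Ann(0, 0, 5, "Token"), Ann(1, 0, 11, "Sentence")]
print([a.id for a in iter_anns(anns)])                                # [0, 1, 2]
print([a.id for a in iter_anns(anns, with_type="Token", reverse=True)])  # [2, 0]
print([a.id for a in iter_anns(anns, start_ge=1)])                    # [2]
```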

last()

Return the last annotation in the set, or raise an exception if the set is empty.

Returns

the last annotation

overlapping(start: int, end: int) → gatenlp.annotation_set.AnnotationSet

Gets annotations overlapping with the given span. Instead of the start and end offsets, also accepts an annotation or annotation set.

Parameters
  • start – start offset of the span

  • end – end offset of the span

Returns

an immutable annotation set with the matching annotations
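A plain-Python sketch of the overlap test, assuming end offsets are exclusive (as with Python slices) and using a hypothetical Ann record. Zero-length annotations get no special treatment here; the library may handle them differently.

```python
from typing import List, NamedTuple


class Ann(NamedTuple):
    """Hypothetical minimal annotation record used for illustration."""
    id: int
    start: int
    end: int
    type: str


def overlapping(anns: List[Ann], start: int, end: int) -> List[Ann]:
    # with exclusive ends, two spans overlap if each starts before the other ends
    return [a for a in anns if a.start < end and start < a.end]


anns = [Ann(0, 0, 5, "Token"), Ann(1, 6, 11, "Token"), Ann(2, 0, 11, "Sentence")]
print([a.id for a in overlapping(anns, 4, 7)])  # [0, 1, 2]
print([a.id for a in overlapping(anns, 5, 6)])  # [2]
```

The span 5..6 is the gap between the two tokens, so only the sentence overlaps it.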

remove(annotation: Union[int, gatenlp.annotation.Annotation]) → None

Remove the given annotation, which is either the annotation id or the annotation instance.

Parameters

annotation – either the id (int) or the annotation instance (Annotation)

reverse_iter(**kwargs)

Same as iter(), but with the reverse parameter set to True.

Parameters

kwargs – same as for iter(), with reverse=True fixed.

Returns

same result as iter()

size() → int

Return number of annotations in the set.

Returns

number of annotations

span() → Tuple[int, int]

Returns a tuple with the start and end offsets that correspond to the smallest start offset of any annotation and the largest end offset of any annotation. (Builds the offset index.)

Returns

tuple of minimum start offset and maximum end offset
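The computation amounts to a minimum and a maximum taken independently, sketched here over hypothetical (start, end) pairs rather than real annotations. With an empty input, min() raises ValueError in this sketch; what the library does for an empty set is not stated above.

```python
from typing import Iterable, Tuple


def span(offsets: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    pairs = list(offsets)  # allow single-pass iterables
    # the smallest start and the largest end need not come from the same annotation
    return min(s for s, _ in pairs), max(e for _, e in pairs)


print(span([(3, 8), (0, 5), (6, 11)]))  # (0, 11)
```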

start()

Return the start offset of the annotation set, i.e. the smallest start offset of any annotation. This needs the offset index.

Returns

smallest start offset of any annotation

start_eq(start: int, ignored: Any = None) → gatenlp.annotation_set.AnnotationSet

Gets all annotations starting at the given offset (empty if none) and returns them in an immutable annotation set.

Parameters
  • start – the offset where annotations should start

  • ignored – dummy parameter to allow the use of annotations and annotation sets

Returns

annotation set of matching annotations
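The ignored parameter makes sense if a wrapper turns an annotation argument into its (start, end) offsets before calling the method: the end offset then lands in the second slot, where start-only queries discard it. The sketch below shows that pattern with a hypothetical accepts_annotation decorator, Ann record and MiniSet class; it is an assumption about the mechanism, not the gatenlp source, and it only handles single annotations, not annotation sets.

```python
import functools
from typing import Any, Callable, List, NamedTuple


class Ann(NamedTuple):
    """Hypothetical minimal annotation record used for illustration."""
    id: int
    start: int
    end: int
    type: str


def accepts_annotation(method: Callable) -> Callable:
    """Hypothetical decorator: lets f(self, start, end) also be called as f(self, ann)."""
    @functools.wraps(method)
    def wrapper(self, start: Any, end: Any = None):
        if isinstance(start, Ann):  # an annotation was passed instead of offsets
            start, end = start.start, start.end
        return method(self, start, end)
    return wrapper


class MiniSet:
    def __init__(self, anns: List[Ann]) -> None:
        self.anns = anns

    @accepts_annotation
    def start_eq(self, start: int, ignored: Any = None) -> List[Ann]:
        # the second slot receives the end offset, which this query does not need
        return [a for a in self.anns if a.start == start]


s = MiniSet([Ann(0, 0, 5, "Token"), Ann(1, 0, 11, "Sentence"), Ann(2, 6, 11, "Token")])
print([a.id for a in s.start_eq(0)])                       # [0, 1]
print([a.id for a in s.start_eq(Ann(9, 6, 8, "Lookup"))])  # [2]
```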

start_ge(start: int, ignored: Any = None) → gatenlp.annotation_set.AnnotationSet

Return the annotations that start at or after the given start offset.

Parameters
  • start – Start offset

  • ignored – dummy parameter to allow the use of annotations and annotation sets

Returns

an immutable annotation set of the matching annotations

start_lt(offset: int, ignored: Any = None) → gatenlp.annotation_set.AnnotationSet

Return the annotations that start before the given offset. Instead of an offset, also accepts an annotation or annotation set.

Parameters
  • offset – offset before which the annotations should start

  • ignored – dummy parameter to allow the use of annotations and annotation sets

Returns

an immutable annotation set of the matching annotations

start_min_ge(offset: int, ignored: Any = None) → gatenlp.annotation_set.AnnotationSet

Gets all annotations starting at the first possible offset at or after the given offset and returns them in an immutable annotation set.

Parameters
  • offset – The offset

  • ignored – dummy parameter to allow the use of annotations and annotation sets

Returns

annotation set of matching annotations
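A sketch of start_min_ge using a binary search (Python's bisect module) over the sorted distinct start offsets, with a hypothetical Ann record. The library keeps an offset index for this purpose; rebuilding the sorted offsets on every call, as done here, is a simplification.

```python
import bisect
from typing import List, NamedTuple


class Ann(NamedTuple):
    """Hypothetical minimal annotation record used for illustration."""
    id: int
    start: int
    end: int
    type: str


def start_min_ge(anns: List[Ann], offset: int) -> List[Ann]:
    starts = sorted({a.start for a in anns})  # distinct start offsets, ascending
    i = bisect.bisect_left(starts, offset)    # index of first start offset >= offset
    if i == len(starts):
        return []                             # nothing starts at or after offset
    first = starts[i]
    return [a for a in anns if a.start == first]


anns = [Ann(0, 0, 5, "Token"), Ann(1, 6, 11, "Token"),
        Ann(2, 6, 8, "Lookup"), Ann(3, 0, 11, "Sentence")]
print([a.id for a in start_min_ge(anns, 1)])  # [1, 2]
print([a.id for a in start_min_ge(anns, 0)])  # [0, 3]
print(start_min_ge(anns, 12))                 # []
```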

to_dict(**kwargs)
type_names() → KeysView[str]

Gets the names of all types in this set. Creates the type index if necessary.

Returns

the set of known annotation type names.

with_type(*anntype: collections.abc.Iterable) → gatenlp.annotation_set.AnnotationSet

Gets annotations of the specified type(s). Creates the type index if necessary.

Parameters

anntype – one or more types or type lists. The union of all types specified that way is used to filter the annotations. If no type is specified, all annotations are selected.

Returns

an immutable annotation set with the matching annotations.
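The union-of-types rule can be sketched as follows: each argument is either a single type name or an iterable of names, and all of them are merged into one set before filtering. Ann is a hypothetical record and a list stands in for the returned annotation set.

```python
from typing import Iterable, List, NamedTuple, Set, Union


class Ann(NamedTuple):
    """Hypothetical minimal annotation record used for illustration."""
    id: int
    start: int
    end: int
    type: str


def with_type(anns: List[Ann], *anntype: Union[str, Iterable[str]]) -> List[Ann]:
    types: Set[str] = set()
    for t in anntype:
        if isinstance(t, str):
            types.add(t)     # a single type name
        else:
            types.update(t)  # a list (or other iterable) of type names
    if not types:
        return list(anns)    # no type given: select all annotations
    return [a for a in anns if a.type in types]


anns = [Ann(0, 0, 5, "Token"), Ann(1, 0, 5, "Lookup"), Ann(2, 0, 11, "Sentence")]
print([a.id for a in with_type(anns, "Token")])                # [0]
print([a.id for a in with_type(anns, "Token", ["Sentence"])])  # [0, 2]
print(len(with_type(anns)))                                    # 3
```

The str check comes first because a string is itself iterable and would otherwise be split into single characters.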

within(start: int, end: int) → gatenlp.annotation_set.AnnotationSet

Gets annotations that fall completely within the given offset range.

Parameters
  • start – start offset of the range

  • end – end offset of the range

Returns

an immutable annotation set with the matching annotations
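A plain-Python sketch of the containment test, using a hypothetical Ann record and a list in place of the annotation set: an annotation falls within the range when it starts at or after the range start and ends at or before the range end.

```python
from typing import List, NamedTuple


class Ann(NamedTuple):
    """Hypothetical minimal annotation record used for illustration."""
    id: int
    start: int
    end: int
    type: str


def within(anns: List[Ann], start: int, end: int) -> List[Ann]:
    # the annotation must lie entirely inside [start, end]
    return [a for a in anns if a.start >= start and a.end <= end]


anns = [Ann(0, 0, 5, "Token"), Ann(1, 6, 11, "Token"), Ann(2, 0, 11, "Sentence")]
print([a.id for a in within(anns, 0, 5)])   # [0]
print([a.id for a in within(anns, 0, 11)])  # [0, 1, 2]
print([a.id for a in within(anns, 4, 11)])  # [1]
```

within() is the mirror image of covering(): the range encloses the annotation instead of the annotation enclosing the range.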