gatenlp.annotation_set module¶
-
class
gatenlp.annotation_set.
AnnotationSet
(name: str = '', owner_doc: Document = None)[source]¶ Bases:
object
Create a new annotation set.
- Parameters
name – the name of the annotation set. This is only really needed if the changelog is used.
changelog – if a changelog is used, then all changes to the set and its annotations are logged
owner_doc – if this is set, the set and all sets created from it can be queried for the owning document and offsets get checked against the text of the owning document, if it has text.
-
add
(start: int, end: int, anntype: str, features: Dict[str, Any] = None, annid: int = None)[source]¶ Add an annotation to the set. Once an annotation has been added, the start and end offsets, the type, and the annotation id are immutable.
- Parameters
start – start offset
end – end offset
anntype – the annotation type
features – a map, an iterable of tuples or an existing feature map. In any case, the features are used to create a new feature map for this annotation. If the map is empty or this parameter is None, the annotation does not store any map at all.
annid – the annotation id; if not specified, the next free one for this set is used. NOTE: the id should normally be left unspecified so that it is assigned automatically.
- Returns
the annotation id of the added annotation
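As a rough illustration of the id and feature handling described above, the following stand-alone sketch mimics "use the next free id unless one is given" and "store no map for empty features". The class `TinyAnnSet` and its fields are hypothetical stand-ins, not gatenlp code. Starting ids at 0 and raising an error for an id that is already taken are assumptions that the text above does not confirm.

```python
from typing import Any, Dict, Optional, Tuple


class TinyAnnSet:
    """Hypothetical stand-in for an annotation set (not the gatenlp class)."""

    def __init__(self) -> None:
        # annid -> (start, end, anntype, features or None)
        self._anns: Dict[int, Tuple[int, int, str, Optional[Dict[str, Any]]]] = {}
        self._next_id = 0  # assumption: ids start at 0

    def add(self, start, end, anntype, features=None, annid=None) -> int:
        if annid is None:
            annid = self._next_id  # next free id for this set
        elif annid in self._anns:
            # assumption: a duplicate explicit id is treated as an error
            raise ValueError(f"annotation id {annid} already in use")
        # dict() accepts a mapping or an iterable of (key, value) tuples;
        # empty or missing features mean that no map is stored at all
        fmap = dict(features) if features else None
        self._anns[annid] = (start, end, anntype, fmap)
        self._next_id = max(self._next_id, annid + 1)
        return annid


anns = TinyAnnSet()
first_id = anns.add(0, 5, "Token")
second_id = anns.add(6, 11, "Token", features=[("kind", "word")])
```

Leaving `annid` unset, as the note above recommends, keeps ids unique without the caller having to track them.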
-
add_ann
(ann, annid: int = None)[source]¶ Add a copy of the given ann to the annotation set, either with a new annotation id or with the one given.
- Parameters
annid – the annotation id; if not specified, the next free one for this set is used. NOTE: the id should normally be left unspecified so that it is assigned automatically.
- Returns
the annotation id of the added annotation
-
property
changelog
¶
-
coextensive
(start: int, end: int) → gatenlp.annotation_set.AnnotationSet[source]¶ Return an immutable annotation set with all annotations that start and end at the given offsets.
- Parameters
start – start offset of the span
end – end offset of the span
- Returns
annotation set with all annotations that have the same start and end offsets.
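The selection rule can be sketched with plain tuples. Here `(start, end, type)` tuples are a hypothetical stand-in for annotations, and the function is an illustration of the rule only, not the library's indexed implementation.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, type): stand-in for an annotation


def coextensive(anns: List[Span], start: int, end: int) -> List[Span]:
    """Keep only annotations whose start AND end match the given offsets."""
    return [a for a in anns if a[0] == start and a[1] == end]


anns = [(0, 5, "Token"), (0, 5, "Lookup"), (0, 11, "Sentence"), (6, 11, "Token")]
same_span = coextensive(anns, 0, 5)
```

In this sketch `same_span` holds the `Token` and `Lookup` entries at offsets 0 to 5; the `Sentence` entry shares only the start offset and is left out.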
-
contains
(annorannid: Union[int, gatenlp.annotation.Annotation]) → bool¶ Provides ‘annotation in annotation_set’ functionality
- Parameters
annorannid – the annotation instance or annotation id to check
- Returns
true if the annotation exists in the set, false otherwise
-
covering
(start: int, end: int) → gatenlp.annotation_set.AnnotationSet[source]¶ Get the annotations which contain the given offset range (or annotation/annotation set)
- Parameters
start – the start offset of the span
end – the end offset of the span
- Returns
an immutable annotation set with the matching annotations, if any
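A minimal sketch of the "covering" relation, using hypothetical `(start, end, type)` tuples in place of annotations. It assumes that an annotation with exactly the given offsets also counts as covering the range; the text above does not say whether the real method treats that case the same way.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, type): stand-in for an annotation


def covering(anns: List[Span], start: int, end: int) -> List[Span]:
    """Keep annotations that start at or before `start` and end at or after `end`."""
    return [a for a in anns if a[0] <= start and a[1] >= end]


anns = [(0, 11, "Sentence"), (0, 5, "Token"), (6, 11, "Token")]
covers_second_token = covering(anns, 6, 11)
```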
-
end
()[source]¶ Returns the end offset of the annotation set, i.e. the biggest end offset of any annotation.
- Returns
largest end offset
-
fast_iter
() → Generator[source]¶ Returns a generator for fast iteration over all annotations in arbitrary order.
- Returns
-
first
()[source]¶ Return the first annotation in the set or raise an exception if the set is empty.
- Returns
first annotation
-
get
(annid: int, default=None) → Optional[gatenlp.annotation.Annotation][source]¶ Gets the annotation with the given annotation id or returns the given default.
- Parameters
annid – the annotation id of the annotation to retrieve.
default – what to return if an annotation with the given id is not found.
- Returns
the annotation or the default value.
-
get_doc
() → Optional[Document][source]¶ Get the owning document, if known. If the owning document was not set, return None.
- Returns
the document this annotation set belongs to or None if unknown.
-
immutable
(restrict_to=None) → gatenlp.annotation_set.AnnotationSet[source]¶ Create an immutable copy of this set, optionally restricted to the given annotation ids.
- Parameters
restrict_to – an iterable of annotation ids
- Returns
an immutable annotation set with all the annotations of this set or restricted to the ids in restrict_to
-
immutable_from
(anns: collections.abc.Iterable) → gatenlp.annotation_set.AnnotationSet[source]¶ Create an immutable annotation set from the annotations in anns, which could be anything that can be iterated over. The owning document is the same as for this set. The next annotation id for the created set is the highest seen annotation id from anns plus one.
- Parameters
anns – an iterable of annotations
- Returns
an immutable annotation set containing the annotations from anns
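The "highest seen id plus one" rule can be sketched as follows. The tuple layout `(annid, start, end, type)` and the use of `types.MappingProxyType` as a read-only view are choices made for this illustration, not details taken from gatenlp. Returning 0 as the next id for an empty input is likewise an assumption.

```python
from types import MappingProxyType
from typing import Dict, Iterable, Tuple

Ann = Tuple[int, int, int, str]  # (annid, start, end, type): stand-in for an annotation


def immutable_from(anns: Iterable[Ann]):
    """Return a read-only id -> annotation view and the next free id."""
    by_id: Dict[int, Ann] = {}
    highest = -1  # assumption: an empty input yields next id 0
    for ann in anns:  # a single pass, so any iterable (even a generator) works
        by_id[ann[0]] = ann
        highest = max(highest, ann[0])
    return MappingProxyType(by_id), highest + 1


source = [(3, 0, 5, "Token"), (7, 6, 11, "Token")]
frozen, next_id = immutable_from(a for a in source)
```

Attempting `frozen[8] = ...` raises a `TypeError`, which loosely mirrors the immutability of the returned set.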
-
iter
(start_ge: Optional[int] = None, start_lt: Union[None, int] = None, with_type: str = None, reverse: bool = False) → Generator[source]¶ Returns a generator for going through annotations in document order. If an iterable of annotations is given, then those annotations, optionally limited by the other parameters, are returned in document order; otherwise, all annotations in the set are returned, optionally limited by the other parameters.
- Parameters
annotations – an iterable of annotations from this annotation set.
start_ge – the offset from where to start including annotations
start_lt – the last offset to use as the starting offset of an annotation
with_type – only annotations of this type
reverse – process in reverse document order
- Returns
generator for annotations in document order
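How the filters combine can be sketched with a small generator over hypothetical `(annid, start, end, type)` tuples. Two points are assumptions of this sketch rather than documented behaviour: "document order" is taken to mean ascending start offset with the annotation id as tie-breaker, and `start_lt` is treated as an exclusive upper bound on the start offset.

```python
from typing import Iterable, Iterator, Optional, Tuple

Ann = Tuple[int, int, int, str]  # (annid, start, end, type): stand-in for an annotation


def iter_anns(
    anns: Iterable[Ann],
    start_ge: Optional[int] = None,
    start_lt: Optional[int] = None,
    with_type: Optional[str] = None,
    reverse: bool = False,
) -> Iterator[Ann]:
    # assumption: document order = ascending start offset, then annotation id
    ordered = sorted(anns, key=lambda a: (a[1], a[0]), reverse=reverse)
    for ann in ordered:
        _, start, _, anntype = ann
        if start_ge is not None and start < start_ge:
            continue
        if start_lt is not None and start >= start_lt:  # assumption: exclusive bound
            continue
        if with_type is not None and anntype != with_type:
            continue
        yield ann


anns = [(2, 6, 11, "Token"), (0, 0, 11, "Sentence"), (1, 0, 5, "Token")]
in_order = list(iter_anns(anns))
tokens_reversed = list(iter_anns(anns, with_type="Token", reverse=True))
```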
-
last
()[source]¶ Return the last annotation in the set or raise an exception if the set is empty.
- Returns
last annotation
-
overlapping
(start: int, end: int) → gatenlp.annotation_set.AnnotationSet[source]¶ Gets annotations overlapping with the given span. Instead of the start and end offsets, also accepts an annotation or annotation set.
- Parameters
start – start offset of the span
end – end offset of the span
- Returns
an immutable annotation set with the matching annotations
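A minimal sketch of the overlap test on hypothetical `(start, end, type)` tuples. It assumes half-open spans, where the end offset is one past the last covered character, so spans that merely touch do not overlap. Whether the real method handles touching spans the same way is not stated above.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, type): stand-in for an annotation


def overlapping(anns: List[Span], start: int, end: int) -> List[Span]:
    """Keep annotations that share at least one position with [start, end)."""
    return [a for a in anns if a[0] < end and a[1] > start]


anns = [(0, 5, "Token"), (6, 11, "Token"), (3, 8, "Lookup")]
hits_wide = overlapping(anns, 4, 7)
hits_gap = overlapping(anns, 5, 6)
```

In this sketch the range 4 to 7 touches all three spans, while the range 5 to 6 falls between the two tokens and matches only the `Lookup` span.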
-
remove
(annotation: Union[int, gatenlp.annotation.Annotation]) → None[source]¶ Remove the given annotation which is either the id or the annotation instance.
- Parameters
annotation – either the id (int) or the annotation instance (Annotation)
- Returns
-
reverse_iter
(**kwargs)[source]¶ Same as iter, but with the reverse parameter set to true.
- Parameters
kwargs – same as for iter(), with reverse=True fixed.
- Returns
same result as iter()
-
span
() → Tuple[int, int][source]¶ Returns a tuple with the start and end offsets that correspond to the smallest start offset of any annotation and the largest end offset of any annotation. (Builds the offset index)
- Returns
tuple of minimum start offset and maximum end offset
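The computation amounts to a minimum over start offsets and a maximum over end offsets. The sketch below uses hypothetical `(start, end)` tuples and a linear scan, whereas the real method relies on the offset index. Raising `ValueError` for an empty input is an assumption of this sketch.

```python
from typing import List, Tuple


def span(anns: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (smallest start offset, largest end offset) over all annotations."""
    if not anns:
        # assumption: an empty set has no meaningful span
        raise ValueError("empty annotation set has no span")
    return min(a[0] for a in anns), max(a[1] for a in anns)


anns = [(6, 11), (0, 5), (3, 8)]
total_span = span(anns)
```

The two components of `total_span` correspond to what `start()` and `end()` return individually.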
-
start
()[source]¶ Return the start offset of the annotation set, i.e. the smallest offset of any annotation. This needs the index.
- Returns
smallest annotation offset
-
start_eq
(start: int, ignored: Any = None) → gatenlp.annotation_set.AnnotationSet[source]¶ Gets all annotations starting at the given offset (empty if none) and returns them in an immutable annotation set.
- Parameters
start – the offset where annotations should start
ignored – dummy parameter to allow the use of annotations and annotation sets
- Returns
annotation set of matching annotations
-
start_ge
(start: int, ignored: Any = None) → gatenlp.annotation_set.AnnotationSet[source]¶ Return the annotations that start at or after the given start offset.
- Parameters
start – Start offset
ignored – dummy parameter to allow the use of annotations and annotation sets
- Returns
an immutable annotation set of the matching annotations
-
start_lt
(offset: int, ignored: Any = None) → gatenlp.annotation_set.AnnotationSet[source]¶ Return the annotations that start before the given offset (or annotation). This also accepts an annotation or set.
- Parameters
offset – offset before which the annotations should start
ignored – dummy parameter to allow the use of annotations and annotation sets
- Returns
an immutable annotation set of the matching annotations
-
start_min_ge
(offset: int, ignored: Any = None) → gatenlp.annotation_set.AnnotationSet[source]¶ Gets all annotations starting at the first possible offset at or after the given offset and returns them in an immutable annotation set.
- Parameters
offset – The offset
ignored – dummy parameter to allow the use of annotations and annotation sets
- Returns
annotation set of matching annotations
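The difference from `start_eq` and `start_ge` is easiest to see in a small sketch: first find the smallest start offset that is at or after the given offset, then keep only the annotations starting exactly there. The tuples and the `bisect`-based lookup are illustrative stand-ins, not the library's index structure.

```python
import bisect
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, type): stand-in for an annotation


def start_min_ge(anns: List[Span], offset: int) -> List[Span]:
    """Annotations starting at the first start offset that is >= offset."""
    starts = sorted({a[0] for a in anns})  # distinct start offsets, ascending
    i = bisect.bisect_left(starts, offset)  # first position with starts[i] >= offset
    if i == len(starts):
        return []  # nothing starts at or after the offset
    first = starts[i]
    return [a for a in anns if a[0] == first]


anns = [(0, 5, "Token"), (6, 11, "Token"), (6, 8, "Lookup"), (12, 13, "Token")]
next_group = start_min_ge(anns, 1)
```

For offset 1, a `start_eq`-style query would match nothing and a `start_ge`-style query would match the last three entries, while `next_group` in this sketch contains only the two annotations that start at offset 6.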
-
type_names
() → KeysView[str][source]¶ Gets the names of all types in this set. Creates the type index if necessary.
- Returns
the set of known annotation type names.
-
with_type
(*anntype: collections.abc.Iterable) → gatenlp.annotation_set.AnnotationSet[source]¶ Gets annotations of the specified type(s). Creates the type index if necessary.
- Parameters
anntype – one or more types or type lists. The union of all types specified that way is used to filter the annotations. If no type is specified, all annotations are selected.
- Returns
an immutable annotation set with the matching annotations.
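The way single types and type lists are merged into one filter can be sketched as follows, again with hypothetical `(start, end, type)` tuples. Treating an empty list argument like "no type specified" is an assumption of this sketch.

```python
from typing import Iterable, List, Set, Tuple, Union

Span = Tuple[int, int, str]  # (start, end, type): stand-in for an annotation


def with_type(anns: List[Span], *anntype: Union[str, Iterable[str]]) -> List[Span]:
    """Filter by the union of all given type names and type lists."""
    wanted: Set[str] = set()
    for t in anntype:
        if isinstance(t, str):
            wanted.add(t)  # a single type name
        else:
            wanted.update(t)  # a list (or other iterable) of type names
    if not wanted:
        return list(anns)  # no type specified: select everything
    return [a for a in anns if a[2] in wanted]


anns = [(0, 5, "Token"), (0, 11, "Sentence"), (0, 5, "Lookup")]
selected = with_type(anns, "Token", ["Lookup", "Person"])
everything = with_type(anns)
```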
-
within
(start: int, end: int) → gatenlp.annotation_set.AnnotationSet[source]¶ Gets annotations that fall completely within the given offset range
- Parameters
start – start offset of the range
end – end offset of the range
- Returns
an immutable annotation set with the matching annotations
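A minimal sketch of the containment test on hypothetical `(start, end, type)` tuples. It assumes that annotations whose offsets coincide with the range boundaries count as falling within the range.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, type): stand-in for an annotation


def within(anns: List[Span], start: int, end: int) -> List[Span]:
    """Keep annotations lying completely inside the range start..end."""
    return [a for a in anns if a[0] >= start and a[1] <= end]


anns = [(0, 5, "Token"), (6, 11, "Token"), (3, 8, "Lookup")]
inside = within(anns, 0, 8)
```

This is the mirror image of `covering`: here the range must enclose the annotation, whereas there the annotation must enclose the range.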