gatenlp.utils module

Various utilities that could be useful in several modules.

gatenlp.utils.match_substrings(text, items, getstr=None, cmp=None, unmatched=False)

Matches each item from the items sequence with some substring of the text in a greedy fashion. An item is either already a string, or getstr is used to retrieve a string from it. The text and substrings are normally compared using string equality, but cmp can be set to a two-argument function that does the comparison instead. This function expects all items to be present in the text, in order and without overlapping. If this is not the case, an exception is raised.

Parameters
  • text – the text to use for matching

  • items – items that are or contain the substrings to match

  • getstr – a function that retrieves the text from an item

  • cmp – a function that compares two strings and returns a boolean indicating whether they should be considered equal

  • unmatched – if true, returns two lists of tuples, where the second list contains the offsets of text not matched by the items

Returns

a list of tuples (start, end, item) where start and end are the start and end offsets of a substring in the text and item is the item for that substring.
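The greedy, in-order matching described above can be illustrated with a small standalone sketch. This is not the gatenlp implementation: the name match_substrings_sketch, the ValueError exception type, and the (start, end) shape of the unmatched tuples are assumptions made for illustration only.

```python
def match_substrings_sketch(text, items, getstr=None, cmp=None, unmatched=False):
    """Illustrative greedy matcher; NOT the gatenlp implementation."""
    if getstr is None:
        getstr = lambda item: item       # items are assumed to be plain strings
    if cmp is None:
        cmp = lambda a, b: a == b        # default: plain string equality

    matched = []   # (start, end, item) tuples
    gaps = []      # (start, end) offsets of text not covered (assumed shape)
    pos = 0        # everything before pos has already been consumed
    for item in items:
        s = getstr(item)
        n = len(s)
        start = pos
        # Greedy: take the first offset at or after pos where the slice matches.
        while start + n <= len(text) and not cmp(text[start:start + n], s):
            start += 1
        if start + n > len(text):
            # Assumed exception type; the source only says "an exception".
            raise ValueError(f"item {s!r} not found at or after offset {pos}")
        if start > pos:
            gaps.append((pos, start))
        matched.append((start, start + n, item))
        pos = start + n
    if pos < len(text):
        gaps.append((pos, len(text)))
    return (matched, gaps) if unmatched else matched


text = "the quick brown fox"
spans, rest = match_substrings_sketch(text, ["the", "quick", "fox"], unmatched=True)
print(spans)  # [(0, 3, 'the'), (4, 9, 'quick'), (16, 19, 'fox')]
print(rest)   # [(3, 4), (9, 16)]
```

Because matching only ever moves forward through the text, items given out of order (for example ["fox", "the"]) fail in this sketch, which mirrors the ordering requirement stated above.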

gatenlp.utils.to_dict(obj)

If obj is not None, call its to_dict method and return the result; otherwise return None.

Parameters
  • obj – the object on which to call to_dict

Returns

the result of to_dict or None

gatenlp.utils.to_list(obj)

If obj is not None, call its to_list method and return the result; otherwise return None.

Parameters
  • obj – the object on which to call to_list

Returns

the result of to_list or None
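The None-safe behavior of to_dict and to_list can be sketched as follows. The helper names to_dict_sketch and to_list_sketch and the Span class are hypothetical stand-ins, not part of gatenlp; they only show the pattern of passing None through unchanged.

```python
def to_dict_sketch(obj):
    """Return obj.to_dict(), or None when obj itself is None."""
    return None if obj is None else obj.to_dict()


def to_list_sketch(obj):
    """Return obj.to_list(), or None when obj itself is None."""
    return None if obj is None else obj.to_list()


class Span:
    """Hypothetical class providing both conversion methods."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def to_dict(self):
        return {"start": self.start, "end": self.end}

    def to_list(self):
        return [self.start, self.end]


print(to_dict_sketch(Span(0, 3)))  # {'start': 0, 'end': 3}
print(to_list_sketch(Span(0, 3)))  # [0, 3]
print(to_dict_sketch(None))        # None
```

Helpers of this kind are convenient when serializing optional fields, since the caller does not need a separate None check for each one.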