gatenlp.utils module
Various utilities that could be useful in several modules.
gatenlp.utils.match_substrings(text, items, getstr=None, cmp=None, unmatched=False)
Matches each item from the items sequence with some substring of the text in a greedy fashion. An item is either already a string, or getstr is used to retrieve a string from it. By default the text and substrings are compared with ordinary string equality, but cmp can be set to a two-argument function that performs the comparison instead. This function expects all items to be present in the text, in order and without overlapping. If this is not the case, an exception is raised.
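To make the greedy, in-order matching concrete, here is a minimal standalone sketch of the behaviour described above. It is a hypothetical reimplementation, not gatenlp's own code: the name `greedy_match_substrings` is illustrative, the `unmatched` option is left out, the matched substring is assumed to have the same length as the item's string, and raising `ValueError` is an assumption (the documentation only says "an exception").

```python
from operator import eq
from typing import Any, Callable, List, Optional, Sequence, Tuple


def greedy_match_substrings(
    text: str,
    items: Sequence[Any],
    getstr: Optional[Callable[[Any], str]] = None,
    cmp: Optional[Callable[[str, str], bool]] = None,
) -> List[Tuple[int, int, Any]]:
    """Hypothetical sketch: match items to substrings of text, in order, greedily."""
    if cmp is None:
        cmp = eq  # default comparison: plain string equality
    result = []
    pos = 0  # each search starts right after the end of the previous match
    for item in items:
        s = item if getstr is None else getstr(item)
        n = len(s)  # assumption: the matched substring has the item's length
        start = None
        # Scan forward and take the first position that compares equal (greedy).
        for i in range(pos, len(text) - n + 1):
            if cmp(text[i:i + n], s):
                start = i
                break
        if start is None:
            # Item missing, out of order, or overlapping a previous match.
            raise ValueError(f"could not match {item!r} at or after offset {pos}")
        result.append((start, start + n, item))
        pos = start + n
    return result
```

For example, `greedy_match_substrings("Hello, world!", [{"t": "Hello"}, {"t": "world"}], getstr=lambda d: d["t"])` yields spans `(0, 5)` and `(7, 12)`, and passing `cmp=lambda a, b: a.lower() == b.lower()` makes the comparison case-insensitive. Because the search position only moves forward, items listed out of order fail to match.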
- Parameters
text – the text to use for matching
items – items that are, or contain, the substrings to match
getstr – a function that retrieves the text from an item
cmp – a function that compares two strings and returns a boolean indicating whether they should be considered equal
unmatched – if True, return two lists of tuples, where the second list contains the offsets of the text not matched by any item
- Returns
a list of tuples (start, end, item) where start and end are the start and end offsets of a substring in the text and item is the item for that substring.
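Because the returned (start, end, item) tuples are in text order and do not overlap, the text left uncovered can be recovered from them. The sketch below shows one way to do that. It is a hypothetical helper, not part of gatenlp.utils, and representing each gap as a (start, end) pair is an assumption, since the unmatched description above only speaks of "offsets".

```python
from typing import Any, List, Sequence, Tuple


def unmatched_spans(
    text: str, matches: Sequence[Tuple[int, int, Any]]
) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of text not covered by any match.

    Assumes matches are sorted by offset and non-overlapping.
    """
    gaps = []
    pos = 0
    for start, end, _item in matches:
        if start > pos:  # text between the previous match and this one
            gaps.append((pos, start))
        pos = end
    if pos < len(text):  # trailing text after the last match
        gaps.append((pos, len(text)))
    return gaps
```

For the matches `[(0, 5, "Hello"), (7, 12, "world")]` over `"Hello, world!"`, this gives `[(5, 7), (12, 13)]`, i.e. the separator `", "` and the final `"!"`.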