Module PdmContext.utils.mapping_functions
Classes
class map_categorical_to_continuous
-
Wrapper Class for mapping categorical to continuous time-series.
class map_categorical_to_continuous:
    """
    Wrapper Class for mapping categorical to continuous time-series.
    """

    def __init__(self):
        pass

    def map(self, target_series, occurrences, name):
        """
        This method generates context time series for a categorical source.

        It produces one binary (0/1) series per distinct category, similar to
        the "isolated" type, with ones at the timestamps where that category
        appears. It then adds one more series named state_{name}, which is zero
        until the last observed category occurs and one from that point on.

        **Parameters**:

        **target_series**: Used to align sample rate.

        **occurrences**: a list of (value, timestamp) tuples holding the
        observed values of a categorical source.

        **name**: name of the categorical source.

        **return**: A list of time-series to populate CD part of the context.
        """
        vector = [[0] for i in range(len(target_series))]
        pos = 0
        unique_categories = set([occ[0] for occ in occurrences])
        # this is to align the series in case of different sample rate
        for i in range(len(target_series)):
            timestamp = target_series[i][1]
            current_pos = pos
            for q in range(pos, len(occurrences)):
                if occurrences[q][1] > timestamp:
                    current_pos = q
                    break
            # no data found
            if current_pos == pos:
                # if no data in between values use the previous value
                if i > 0:
                    vector[i] = [occurrences[-1][0]]
                # if no data until i timestamp use the first occurrence as value
                else:
                    vector[i] = [occurrences[0][0]]
            # if multiple values in between two timestamps use the last as value
            else:
                dataInBetween = [value for value, time in occurrences[pos:current_pos]]
                vector[i] = [v for v in set(dataInBetween)]
            # if no other occurrences just repeat the last value
            if current_pos == len(occurrences):
                for k in range(i + 1, len(vector)):
                    vector[k] = [occurrences[-1][0]]
                break
            pos = current_pos
        all_vectors = []
        all_names = []
        # one-hot encoding of unique categories
        for value in unique_categories:
            in_vector = [1 if value in v else 0 for v in vector]
            if len(set(in_vector)) == 1:
                in_vector[0] = 0
            all_vectors.append(in_vector)
            all_names.append(f"{value}_{name}")
        # creation of the state variable
        state_vector = [0 for i in range(len(target_series))]
        lastv = occurrences[-1][0]
        ## not stable
        for i in range(len(state_vector) - 1, -1, -1):
            if lastv in vector[i]:
                state_vector[i] = 1
            else:
                break
        all_vectors.append(state_vector)
        all_names.append(f"state_{name}")
        return all_vectors, all_names
Methods
def map(self, target_series, occurrences, name)
-
This method generates context time series for a categorical source.
It produces one binary (0/1) series per distinct category, similar to the "isolated" type, with ones at the timestamps where that category appears. It then adds one more series named state_{name}, which is zero until the last observed category occurs and one from that point on.
Parameters:
target_series: Used to align sample rate.
occurrences: a list of (value, timestamp) tuples holding the observed values of a categorical source.
name: name of the categorical source.
return: A list of time series together with a list of their names ({value}_{name} for each category, plus state_{name}), to populate the CD part of the context.
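The encoding described above can be illustrated with a minimal, self-contained sketch. It is not the library implementation: `categorical_to_series` is a hypothetical helper, it assumes `(value, timestamp)` tuples with integer timestamps sorted in time, and it marks a category only at the target sample that closes the interval in which it was observed.

```python
def categorical_to_series(target_series, occurrences, name):
    """Hypothetical helper: one 0/1 series per category plus a state series."""
    timestamps = [t for _, t in target_series]
    categories = sorted({value for value, _ in occurrences})

    # For each target timestamp, collect the categories observed since the
    # previous target timestamp (inclusive of the current one).
    seen = []
    prev = None
    for ts in timestamps:
        window = {v for v, t in occurrences
                  if (prev is None or t > prev) and t <= ts}
        seen.append(window)
        prev = ts

    vectors = [[1 if c in window else 0 for window in seen] for c in categories]
    names = [f"{c}_{name}" for c in categories]

    # State series: 0 before the last observed occurrence, 1 from then on.
    _, last_time = occurrences[-1]
    vectors.append([1 if ts >= last_time else 0 for ts in timestamps])
    names.append(f"state_{name}")
    return vectors, names


# Illustrative data only: the values in target_series are ignored here.
target_series = [(0.0, 1), (0.0, 2), (0.0, 3), (0.0, 4), (0.0, 5)]
occurrences = [("idle", 1), ("run", 3)]
vectors, names = categorical_to_series(target_series, occurrences, "mode")
# names   -> ['idle_mode', 'run_mode', 'state_mode']
# vectors -> [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 1, 1]]
```

The sketch leaves out the gap-filling and constant-series handling found in the source listing, so its output can differ from `map` on real data.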
class map_configuration_to_continuous
-
Wrapper Class for mapping configuration events (defined as events with constant impact) to continuous time-series.
class map_configuration_to_continuous:
    """
    Wrapper Class for mapping configuration events (defined as events with
    constant impact) to continuous time-series.
    """

    def __init__(self):
        pass

    def map(self, target_series, occurrences, name):
        """
        Configuration events are configuration changes or other events that
        alter the state of the monitored asset. To transform these events into
        a continuous signal, we start with a series of 0s and, for each
        occurrence, add 1 to every position from the first target_series
        timestamp at or after the occurrence onward.

        **Parameters**:

        **target_series**: Used to align sample rate of the continuous series.

        **occurrences**: Contains the timestamps of the occurrences of a
        configuration-type source.

        **return**: A step-wise count series (one increment per occurrence)
        with the same size as target_series, that models the occurrences of the
        provided configuration source, to populate the CD part of the context.
        """
        vector = [0 for i in range(len(target_series))]
        for occ in occurrences:
            for q in range(len(target_series)):
                if target_series[q][1] >= occ[1]:
                    for k in range(q, len(vector)):
                        vector[k] += 1
                    break
        ## not stable
        if len(set(vector)) == 1:
            vector[0] = 0
        return [vector], [name]
Methods
def map(self, target_series, occurrences, name)
-
Configuration events are configuration changes or other events that alter the state of the monitored asset. To transform these events into a continuous signal, we start with a series of 0s and, for each occurrence, add 1 to every position from the first target_series timestamp at or after the occurrence onward.
Parameters:
target_series: Used to align sample rate of the continuous series.
occurrences: Contains the timestamps of the occurrences of a configuration-type source.
return: A step-wise count series (one increment per occurrence) with the same size as target_series, that models the occurrences of the provided configuration source, to populate the CD part of the context.
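The cumulative behaviour described above amounts to counting, for each target timestamp, how many configuration events have occurred at or before it. The following is a minimal sketch under stated assumptions: `configuration_to_series` is a hypothetical helper, tuples are `(value, timestamp)` with integer timestamps, and the first tuple element of each occurrence is a placeholder label.

```python
def configuration_to_series(target_series, occurrences):
    """Hypothetical helper: running count of events up to each target sample."""
    timestamps = [t for _, t in target_series]
    return [
        sum(1 for _, occ_time in occurrences if occ_time <= ts)
        for ts in timestamps
    ]


# Illustrative data only: the values in target_series are ignored here.
target_series = [(0.0, 1), (0.0, 2), (0.0, 3), (0.0, 4), (0.0, 5)]
occurrences = [("firmware_update", 2), ("recalibration", 4)]
config_vector = configuration_to_series(target_series, occurrences)
# config_vector -> [0, 1, 1, 2, 2]
```

The source listing additionally resets the first element to 0 when the resulting series is constant; the sketch omits that step.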
class map_isolated_to_continuous
-
Wrapper Class for mapping isolated events (defined as events with instant impact) to continuous time-series.
class map_isolated_to_continuous:
    """
    Wrapper Class for mapping isolated events (defined as events with instant
    impact) to continuous time-series.
    """

    def __init__(self):
        pass

    def map(self, target_series, occurrences, name):
        """
        Isolated events are discrete events that have an immediate impact on
        the behavior of the asset. To transform such events into a continuous
        representation, we start with a series of 0s as an initial signal and
        assign 1 to the position of each event. An event is placed at the first
        target_series timestamp that is later than the event's timestamp;
        events later than the last target_series timestamp are not represented.

        **Parameters**:

        **target_series**: Used to align sample rate of the continuous series.

        **occurrences**: Contains the timestamps of the occurrences of an
        isolated-type source.

        **return**: A binary time series with the same size as target_series,
        that models the occurrences of the provided isolated source, to
        populate the CD part of the context.
        """
        vector = [0 for i in range(len(target_series))]
        for occ in occurrences:
            for q in range(len(target_series)):
                if target_series[q][1] > occ[1]:
                    vector[q] = 1
                    break
        return [vector], [name]
Methods
def map(self, target_series, occurrences, name)
-
Isolated events are discrete events that have an immediate impact on the behavior of the asset. To transform such events into a continuous representation, we start with a series of 0s as an initial signal and assign 1 to the position of each event. In the source code, an event is placed at the first target_series timestamp that is later than the event's timestamp; events later than the last target_series timestamp are not represented.
Parameters:
target_series: Used to align sample rate of the continuous series.
occurrences: Contains the timestamps of the occurrences of an isolated-type source.
return: A binary time series with the same size as target_series, that models the occurrences of the provided isolated source, to populate the CD part of the context.
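As a rough illustration of how isolated events land on the target grid, the sketch below mirrors the comparison used in the source listing (strictly later target timestamp). `isolated_to_series` is a hypothetical helper, and the `(value, timestamp)` tuples with integer timestamps are assumptions made for the example.

```python
def isolated_to_series(target_series, occurrences):
    """Hypothetical helper: mark the first target sample after each event."""
    timestamps = [t for _, t in target_series]
    vector = [0] * len(timestamps)
    for _, occ_time in occurrences:
        for i, ts in enumerate(timestamps):
            if ts > occ_time:  # first target timestamp strictly after the event
                vector[i] = 1
                break
    return vector


# Illustrative data only: the values in target_series are ignored here.
target_series = [(0.0, 10), (0.0, 20), (0.0, 30), (0.0, 40)]
occurrences = [("alarm", 12), ("alarm", 30)]
isolated_vector = isolated_to_series(target_series, occurrences)
# isolated_vector -> [0, 1, 0, 1]
```

The event at timestamp 30 coincides with a target sample but is marked at 40, because the comparison is strict.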
class map_univariate_to_continuous
-
Wrapper Class for mapping univariate time-series to continuous time-series with the same frequency as a target time-series.
class map_univariate_to_continuous:
    """
    Wrapper Class for mapping univariate time-series to continuous time-series
    with the same frequency as a target time-series.
    """

    def __init__(self):
        self.existing_results = {}

    def map(self, target_series, occurrences, name):
        """
        For continuous data sources, we simply collect the values within the
        time window. Although the time window is the same for all sources, each
        source may have a different sample rate. To create a signal of the same
        size as target_series, we perform mean aggregation if a source has a
        higher sample rate than target_series, using the mean value of the data
        between each timestamp of the target_series.

        **Parameters**:

        **target_series**: Used to align sample rate of the continuous series.

        **occurrences**: The univariate time series.

        **return**: A time series with the same size as target_series, to
        populate the CD part of the context.
        """
        if name not in self.existing_results.keys():
            self.existing_results[name] = []
        # position of first timestamp of target series in aggregated data
        spos = -1
        for tup in self.existing_results[name]:
            spos += 1
            if tup[1] >= target_series[0][1]:
                break
        self.existing_results[name] = self.existing_results[name][spos:]
        pos = 0
        if len(self.existing_results[name]) > 0:
            for q in range(len(occurrences)):
                pos = q
                if occurrences[q][1] > self.existing_results[name][-1][1]:
                    break
        vector = [tup[0] for tup in self.existing_results[name]] + [
            0 for i in range(len(target_series) - len(self.existing_results[name]))
        ]
        for i in range(len(self.existing_results[name]), len(target_series)):
            timestamp = target_series[i][1]
            current_pos = pos
            for q in range(pos, len(occurrences)):
                if occurrences[q][1] > timestamp:
                    current_pos = q
                    break
            if i == len(target_series) - 1:
                current_pos = len(occurrences) + 1
            # no data found
            if current_pos == pos:
                # if no data in between values use the previous value
                if i > 0:
                    vector[i] = vector[i - 1]
                # if no data until i timestamp use the first occurrence as value
                else:
                    vector[i] = occurrences[0][0]
            # if multiple values in between two timestamps use the mean of them as value
            else:
                dataInBetween = [value for value, time in occurrences[pos:current_pos]]
                vector[i] = sum(dataInBetween) / len(dataInBetween)
            # if no other occurrences just repeat the last value
            if current_pos == len(occurrences):
                for k in range(i + 1, len(vector)):
                    vector[k] = vector[k - 1]
                break
            pos = current_pos
        self.existing_results[name] = [(v, tup[1]) for v, tup in zip(vector, target_series)]
        return [vector], [name]
Methods
def map(self, target_series, occurrences, name)
-
For continuous data sources, we simply collect the values within the time window. Although the time window is the same for all sources, each source may have a different sample rate. To create a signal of the same size as target_series, we perform mean aggregation if a source has a higher sample rate than target_series, using the mean value of the data between each timestamp of the target_series.
Parameters:
target_series: Used to align sample rate of the continuous series.
occurrences: The univariate time series.
return: A time series with the same size as target_series, to populate the CD part of the context.
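The mean aggregation described above can be sketched as follows. This is a simplified, stateless illustration rather than the library code: `mean_align` is a hypothetical helper, it assumes `(value, timestamp)` tuples sorted in time with integer timestamps, and it does not cache results between calls as the class does through `existing_results`.

```python
def mean_align(target_series, occurrences):
    """Hypothetical helper: average the samples between target timestamps."""
    timestamps = [t for _, t in target_series]
    vector = []
    prev = None
    for ts in timestamps:
        # Samples that arrived after the previous target timestamp, up to ts.
        window = [v for v, t in occurrences
                  if (prev is None or t > prev) and t <= ts]
        if window:
            vector.append(sum(window) / len(window))
        elif vector:
            vector.append(vector[-1])  # no new data: repeat the previous value
        else:
            vector.append(occurrences[0][0])  # no data yet: use the first sample
        prev = ts
    return vector


# Illustrative data only: the values in target_series are ignored here.
target_series = [(0.0, 10), (0.0, 20), (0.0, 30)]
occurrences = [(1.0, 2), (3.0, 8), (5.0, 15), (7.0, 18)]
aligned = mean_align(target_series, occurrences)
# aligned -> [2.0, 6.0, 6.0]
```

Unlike the source listing, the sketch does not fold samples later than the final target timestamp into the last value.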