spacr.utils¶
Module Contents¶
- spacr.utils.calculate_activation_correlations(inputs, activation_maps, file_names, manders_thresholds=[15, 50, 75])[source]¶
Calculates Pearson and Manders correlations between input image channels and activation map channels.
- Parameters:
inputs – A batch of input images, Tensor of shape (batch_size, channels, height, width)
activation_maps – A batch of activation maps, Tensor of shape (batch_size, channels, height, width)
file_names – List of file names corresponding to each image in the batch.
manders_thresholds – List of intensity percentiles at which to calculate Manders correlations. Defaults to [15, 50, 75].
- Returns:
A DataFrame with columns for pairwise correlations (Pearson and Manders) between input channels and activation map channels.
- Return type:
pandas.DataFrame
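A minimal usage sketch (the tensors and file names here are hypothetical stand-ins for real image batches and activation maps):

import torch
from spacr.utils import calculate_activation_correlations

inputs = torch.rand(4, 3, 224, 224)           # (batch_size, channels, height, width)
activation_maps = torch.rand(4, 8, 224, 224)  # one activation map channel stack per image
file_names = [f'img_{i}.tif' for i in range(4)]

df_correlations = calculate_activation_correlations(
    inputs, activation_maps, file_names, manders_thresholds=[15, 50, 75]
)
print(df_correlations.head())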
- spacr.utils.load_settings(csv_file_path, show=False, setting_key='setting_key', setting_value='setting_value')[source]¶
Convert a CSV file with ‘setting_key’ and ‘setting_value’ columns into a dictionary. Handles special cases where values are lists, tuples, booleans, None, integers, floats, and nested dictionaries.
- Parameters:
csv_file_path (str) – The path to the CSV file.
show (bool) – Whether to display the dataframe (for debugging).
setting_key (str) – The name of the column that contains the setting keys.
setting_value (str) – The name of the column that contains the setting values.
- Returns:
A dictionary where the ‘setting_key’ entries are the keys and the ‘setting_value’ entries are the values.
- Return type:
dict
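A sketch of how a settings CSV might be loaded (the file contents below are hypothetical; the column names match the defaults above):

# settings.csv:
#   setting_key,setting_value
#   channels,"[0, 1, 2]"
#   normalize,True
#   background,100.5
from spacr.utils import load_settings

settings = load_settings('settings.csv', show=False)
print(settings['channels'])  # parsed back into a Python list, e.g. [0, 1, 2]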
- spacr.utils.print_progress(files_processed, files_to_process, n_jobs, time_ls=None, batch_size=None, operation_type='')[source]¶
- spacr.utils.is_multiprocessing_process(process)[source]¶
Check if the process is a multiprocessing process.
- spacr.utils.mask_object_count(mask)[source]¶
Counts the number of objects in a given mask.
- Parameters:
mask (numpy.ndarray) – The mask containing object labels.
- Returns:
The number of objects in the mask.
- Return type:
int
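A quick sketch with a small labeled mask (assuming labels are positive integers on a zero background):

import numpy as np
from spacr.utils import mask_object_count

mask = np.array([[0, 1, 1],
                 [0, 0, 2],
                 [3, 0, 2]])
print(mask_object_count(mask))  # expected: 3 (labels 1, 2 and 3)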
- spacr.utils.normalize_to_dtype(array, p1=2, p2=98, percentile_list=None, new_dtype=None)[source]¶
Normalize each image in the stack to its own percentiles.
- Parameters:
array (numpy.ndarray) – The input stack to be normalized.
p1 (int, optional) – The lower percentile value for normalization. Defaults to 2.
p2 (int, optional) – The upper percentile value for normalization. Defaults to 98.
percentile_list (list, optional) – A list of pre-calculated percentiles for each image in the stack. Defaults to None.
- Returns:
The normalized stack with the same shape as the input stack.
- Return type:
numpy.ndarray
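A usage sketch; the stack layout below (images along the first axis) is an assumption for illustration:

import numpy as np
from spacr.utils import normalize_to_dtype

stack = np.random.randint(0, 65535, size=(2, 128, 128), dtype=np.uint16)
normalized = normalize_to_dtype(stack, p1=2, p2=98)  # each image normalized to its own 2nd–98th percentiles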
- spacr.utils.annotate_conditions(df, cells=None, cell_loc=None, pathogens=None, pathogen_loc=None, treatments=None, treatment_loc=None)[source]¶
Annotates conditions in a DataFrame based on specified criteria and combines them into a ‘condition’ column. NaN is used for missing values, and they are excluded from the ‘condition’ column.
- Parameters:
df (pandas.DataFrame) – The DataFrame to annotate.
cells (list/str, optional) – Host cell types. Defaults to None.
cell_loc (list of lists, optional) – Values for each host cell type. Defaults to None.
pathogens (list/str, optional) – Pathogens. Defaults to None.
pathogen_loc (list of lists, optional) – Values for each pathogen. Defaults to None.
treatments (list/str, optional) – Treatments. Defaults to None.
treatment_loc (list of lists, optional) – Values for each treatment. Defaults to None.
- Returns:
Annotated DataFrame with a combined ‘condition’ column.
- Return type:
pandas.DataFrame
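A hypothetical sketch of annotating a plate-layout DataFrame; the name of the location column that the *_loc values are matched against is an assumption here:

import pandas as pd
from spacr.utils import annotate_conditions

df = pd.DataFrame({'column_name': ['c1', 'c2', 'c3']})  # hypothetical location column
df = annotate_conditions(
    df,
    cells=['HeLa'], cell_loc=[['c1', 'c2', 'c3']],
    pathogens=['wt', 'mutant'], pathogen_loc=[['c1'], ['c2']],
    treatments=['dmso'], treatment_loc=[['c1', 'c2']],
)
print(df['condition'])  # combined condition labels; NaN locations are excluded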
- class spacr.utils.ScaledDotProductAttention(d_k)[source]¶
Bases:
torch.nn.Module
Scaled Dot-Product Attention module.
- Parameters:
d_k (int) – The dimension of the key and query vectors.
- forward(Q, K, V)[source]¶
Performs the forward pass of the attention mechanism.
- Parameters:
Q (torch.Tensor) – The query tensor of shape (batch_size, seq_len_q, d_k).
K (torch.Tensor) – The key tensor of shape (batch_size, seq_len_k, d_k).
V (torch.Tensor) – The value tensor of shape (batch_size, seq_len_v, d_k).
- Returns:
The output tensor of shape (batch_size, seq_len_q, d_k).
- Return type:
torch.Tensor
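For reference, scaled dot-product attention computes softmax(QK^T / sqrt(d_k)) V. A usage sketch with the shapes from the docstring:

import torch
from spacr.utils import ScaledDotProductAttention

attn = ScaledDotProductAttention(d_k=64)
Q = torch.rand(2, 10, 64)  # (batch_size, seq_len_q, d_k)
K = torch.rand(2, 16, 64)  # (batch_size, seq_len_k, d_k)
V = torch.rand(2, 16, 64)  # (batch_size, seq_len_v, d_k)
out = attn(Q, K, V)        # -> (2, 10, 64)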
- class spacr.utils.SelfAttention(in_channels, d_k)[source]¶
Bases:
torch.nn.Module
Self-Attention module that applies scaled dot-product attention mechanism.
- Parameters:
in_channels (int) – Number of input channels.
d_k (int) – Dimensionality of the key and query vectors.
- class spacr.utils.EarlyFusion(in_channels)[source]¶
Bases:
torch.nn.Module
Early Fusion module for image classification.
- Parameters:
in_channels (int) – Number of input channels.
- class spacr.utils.SpatialAttention(kernel_size=7)[source]¶
Bases:
torch.nn.Module
Spatial attention module.
- Parameters:
kernel_size (int, optional) – Convolution kernel size. Defaults to 7.
- class spacr.utils.MultiScaleBlockWithAttention(in_channels, out_channels)[source]¶
Bases:
torch.nn.Module
Multi-scale convolutional block with attention.
- Parameters:
in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
- class spacr.utils.CustomCellClassifier(num_classes, pathogen_channel, use_attention, use_checkpoint, dropout_rate)[source]¶
Bases:
torch.nn.Module
Custom cell classification network.
- Parameters:
num_classes (int) – Number of output classes.
pathogen_channel (int) – Index of the pathogen channel.
use_attention (bool) – Whether to use attention.
use_checkpoint (bool) – Whether to use checkpointing during training.
dropout_rate (float) – Dropout rate.
- class spacr.utils.TorchModel(model_name='resnet50', pretrained=True, dropout_rate=None, use_checkpoint=False)[source]¶
Bases:
torch.nn.Module
Wrapper for TorchVision classification models.
- Parameters:
model_name (str, optional) – Name of the TorchVision model to load. Defaults to ‘resnet50’.
pretrained (bool, optional) – Whether to initialize with pre-trained weights. Defaults to True.
dropout_rate (float, optional) – Dropout rate. Defaults to None.
use_checkpoint (bool, optional) – Whether to use checkpointing during training. Defaults to False.
- class spacr.utils.FocalLossWithLogits(alpha=1, gamma=2)[source]¶
Bases:
torch.nn.Module
Focal loss computed on raw logits.
- Parameters:
alpha (float, optional) – Balancing factor. Defaults to 1.
gamma (float, optional) – Focusing parameter. Defaults to 2.
- class spacr.utils.ResNet(resnet_type='resnet50', dropout_rate=None, use_checkpoint=False, init_weights='imagenet')[source]¶
Bases:
torch.nn.Module
ResNet-based classification model.
- Parameters:
resnet_type (str, optional) – The ResNet variant to use. Defaults to ‘resnet50’.
dropout_rate (float, optional) – Dropout rate. Defaults to None.
use_checkpoint (bool, optional) – Whether to use checkpointing during training. Defaults to False.
init_weights (str, optional) – Weight initialization scheme. Defaults to ‘imagenet’.
- spacr.utils.split_my_dataset(dataset, split_ratio=0.1)[source]¶
Splits a dataset into training and validation subsets.
- Parameters:
dataset (torch.utils.data.Dataset) – The dataset to be split.
split_ratio (float, optional) – The ratio of validation samples to total samples. Defaults to 0.1.
- Returns:
A tuple containing the training dataset and validation dataset.
- Return type:
tuple
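A usage sketch with a toy dataset (the tensors are placeholders):

import torch
from torch.utils.data import TensorDataset
from spacr.utils import split_my_dataset

dataset = TensorDataset(torch.rand(100, 3, 64, 64), torch.randint(0, 2, (100,)))
train_ds, val_ds = split_my_dataset(dataset, split_ratio=0.1)
print(len(train_ds), len(val_ds))  # roughly 90 and 10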
- spacr.utils.classification_metrics(all_labels, prediction_pos_probs, loss, epoch)[source]¶
Calculate classification metrics for binary classification.
- Parameters:
all_labels (list) – List of true labels.
prediction_pos_probs (list) – List of predicted positive probabilities.
loss (float) – Loss value.
epoch (int) – Epoch number.
- Returns:
DataFrame containing the calculated metrics.
- Return type:
pandas.DataFrame
- spacr.utils.compute_irm_penalty(losses, dummy_w, device)[source]¶
Computes the Invariant Risk Minimization (IRM) penalty.
- Parameters:
losses (list) – A list of losses.
dummy_w (torch.Tensor) – A dummy weight tensor.
device (torch.device) – The device to perform computations on.
- Returns:
The computed IRM penalty.
- Return type:
float
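For reference, the standard IRM penalty (Arjovsky et al., 2019) is the squared gradient of the risk with respect to a fixed dummy scaling of the classifier. A sketch of that idea, not necessarily spacr’s exact implementation:

import torch
import torch.nn.functional as F

def irm_penalty_sketch(logits, labels, device):
    # dummy multiplicative weight; the penalty is the squared gradient of the loss w.r.t. it
    dummy_w = torch.tensor(1.0, requires_grad=True, device=device)
    loss = F.binary_cross_entropy_with_logits(logits * dummy_w, labels.float())
    grad = torch.autograd.grad(loss, [dummy_w], create_graph=True)[0]
    return (grad ** 2).sum()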
- spacr.utils.choose_model(model_type, device, init_weights=True, dropout_rate=0, use_checkpoint=False, channels=3, height=224, width=224, chan_dict=None, num_classes=2, verbose=False)[source]¶
Choose a model for classification.
- Parameters:
model_type (str) – The type of model to choose. Can be one of the pre-defined TorchVision models or ‘custom’ for a custom model.
device (str) – The device to use for model inference.
init_weights (bool, optional) – Whether to initialize the model with pre-trained weights. Defaults to True.
dropout_rate (float, optional) – The dropout rate to use in the model. Defaults to 0.
use_checkpoint (bool, optional) – Whether to use checkpointing during model training. Defaults to False.
channels (int, optional) – The number of input channels for the model. Defaults to 3.
height (int, optional) – The height of the input images for the model. Defaults to 224.
width (int, optional) – The width of the input images for the model. Defaults to 224.
chan_dict (dict, optional) – A dictionary containing channel information for custom models. Defaults to None.
num_classes (int, optional) – The number of output classes for the model. Defaults to 2.
- Returns:
The chosen model.
- Return type:
torch.nn.Module
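A usage sketch:

import torch
from spacr.utils import choose_model

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = choose_model('resnet50', device, dropout_rate=0.2, num_classes=2)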
- spacr.utils.check_multicollinearity(x)[source]¶
Checks multicollinearity of the predictors by computing the variance inflation factor (VIF).
- spacr.utils.resize_images_and_labels(images, labels, target_height, target_width, show_example=True)[source]¶
- spacr.utils.compute_segmentation_ap(true_masks, pred_masks, iou_thresholds=np.linspace(0.5, 0.95, 10))[source]¶
- spacr.utils.merge_touching_objects(mask, threshold=0.25)[source]¶
Merges touching objects in a binary mask based on the percentage of their shared boundary.
- Parameters:
mask (ndarray) – Binary mask representing objects.
threshold (float, optional) – Threshold value for merging objects. Defaults to 0.25.
- Returns:
Merged mask.
- Return type:
ndarray
- spacr.utils.remove_intensity_objects(image, mask, intensity_threshold, mode)[source]¶
Removes objects from the mask based on their mean intensity in the original image.
- Parameters:
image (ndarray) – The original image.
mask (ndarray) – The mask containing labeled objects.
intensity_threshold (float) – The threshold value for mean intensity.
mode (str) – The mode for intensity comparison. Can be ‘low’ or ‘high’.
- Returns:
The updated mask with objects removed.
- Return type:
ndarray
- spacr.utils.preprocess_image(image_path, normalize=True, image_size=224, channels=[1, 2, 3])[source]¶
- spacr.utils.class_visualization(target_y, model_path, dtype, img_size=224, channels=[0, 1, 2], l2_reg=0.001, learning_rate=25, num_iterations=100, blur_every=10, max_jitter=16, show_every=25, class_names=['nc', 'pc'])[source]¶
- spacr.utils.reduction_and_clustering(numeric_data, n_neighbors, min_dist, metric, eps, min_samples, clustering, reduction_method='umap', verbose=False, embedding=None, n_jobs=-1, mode='fit', model=False)[source]¶
Perform dimensionality reduction and clustering on the given data.
- Parameters:
numeric_data (np.ndarray) – Numeric data for embedding and clustering.
n_neighbors (int or float) – Number of neighbors for UMAP, or perplexity for t-SNE.
min_dist (float) – Minimum distance for UMAP.
metric (str) – Metric for UMAP and DBSCAN.
eps (float) – Epsilon for DBSCAN.
min_samples (int) – Minimum samples for DBSCAN, or number of clusters for KMeans.
clustering (str) – Clustering method (‘DBSCAN’ or ‘KMeans’).
reduction_method (str) – Dimensionality reduction method (‘UMAP’ or ‘tSNE’). Defaults to ‘umap’.
verbose (bool) – Whether to print verbose output. Defaults to False.
embedding (np.ndarray, optional) – Precomputed embedding. Defaults to None.
n_jobs (int) – Number of parallel jobs to run. Defaults to -1.
model (bool) – Whether to return the reducer model. Defaults to False.
- Returns:
The embedding and cluster labels (and, optionally, the reducer model).
- Return type:
tuple
- spacr.utils.plot_embedding(embedding, image_paths, labels, image_nr, img_zoom, colors, plot_by_cluster, plot_outlines, plot_points, plot_images, smooth_lines, black_background, figuresize, dot_size, remove_image_canvas, verbose)[source]¶
- spacr.utils.plot_clusters(ax, embedding, labels, colors, cluster_centers, plot_outlines, plot_points, smooth_lines, figuresize=10, dot_size=50, verbose=False)[source]¶
- spacr.utils.plot_umap_images(ax, image_paths, embedding, labels, image_nr, img_zoom, colors, plot_by_cluster, remove_image_canvas, verbose)[source]¶
- spacr.utils.plot_images_by_cluster(ax, image_paths, embedding, labels, image_nr, img_zoom, colors, cluster_indices, remove_image_canvas, verbose)[source]¶
- spacr.utils.plot_clusters_grid(embedding, labels, image_nr, image_paths, colors, figuresize, black_background, verbose)[source]¶
- spacr.utils.preprocess_data(df, filter_by, remove_highly_correlated, log_data, exclude, column_list=False)[source]¶
Preprocesses the given dataframe by applying filtering, removing highly correlated columns, applying log transformation, filling NaN values, and scaling the numeric data.
- Parameters:
df (pandas.DataFrame) – The input dataframe.
filter_by (str or None) – The channel of interest to filter the dataframe by.
remove_highly_correlated (bool or float) – Whether to remove highly correlated columns. If a float is provided, it is used as the correlation threshold.
log_data (bool) – Whether to apply log transformation to the numeric data.
exclude (list or None) – List of features to exclude from the filtering process.
- Returns:
The preprocessed numeric data.
- Return type:
numpy.ndarray
- Raises:
ValueError – If no numeric columns are available after filtering.
- spacr.utils.remove_low_variance_columns(df, threshold=0.01, verbose=False)[source]¶
Removes columns from the dataframe that have low variance.
- Parameters:
df (pandas.DataFrame) – The DataFrame containing the data.
threshold (float) – The variance threshold below which columns will be removed. Defaults to 0.01.
- Returns:
The DataFrame with low-variance columns removed.
- Return type:
pandas.DataFrame
- spacr.utils.remove_highly_correlated_columns(df, threshold=0.95, verbose=False)[source]¶
Removes columns from the dataframe that are highly correlated with one another.
- Parameters:
df (pandas.DataFrame) – The DataFrame containing the data.
threshold (float) – The correlation threshold above which columns will be removed. Defaults to 0.95.
- Returns:
The DataFrame with highly correlated columns removed.
- Return type:
pandas.DataFrame
- spacr.utils.filter_dataframe_features(df, channel_of_interest, exclude=None, remove_low_variance_features=True, remove_highly_correlated_features=True, verbose=False)[source]¶
Filter the dataframe df based on the specified channel_of_interest and exclude parameters.
- Parameters:
df (pandas.DataFrame) – The input dataframe to be filtered.
channel_of_interest (str, int, list, or None) – The channel(s) of interest to filter the dataframe by. If None, no filtering is applied. If ‘morphology’, only morphology features are included. If an integer or string, only the specified channel is included. If a list, only the specified channels are included.
exclude (str, list, or None) – The feature(s) to exclude from the filtered dataframe. If None, no features are excluded. If a string, the specified feature is excluded. If a list, the specified features are excluded.
- Returns:
The filtered dataframe and the list of selected features after filtering.
- Return type:
tuple (filtered_df, features)
- spacr.utils.find_non_overlapping_position(x, y, image_positions, threshold, max_attempts=100)[source]¶
- spacr.utils.search_reduction_and_clustering(numeric_data, n_neighbors, min_dist, metric, eps, min_samples, clustering, reduction_method, verbose, reduction_param=None, embedding=None, n_jobs=-1)[source]¶
Perform dimensionality reduction and clustering on the given data.
- Parameters:
numeric_data (np.array) – Numeric data to process.
n_neighbors (int) – Number of neighbors for UMAP, or perplexity for tSNE.
min_dist (float) – Minimum distance for UMAP.
metric (str) – Metric for UMAP, tSNE, and DBSCAN.
eps (float) – Epsilon for DBSCAN clustering.
min_samples (int) – Minimum samples for DBSCAN, or number of clusters for KMeans.
clustering (str) – Clustering method (‘DBSCAN’ or ‘KMeans’).
reduction_method (str) – Dimensionality reduction method (‘UMAP’ or ‘tSNE’).
verbose (bool) – Whether to print verbose output.
reduction_param (dict, optional) – Additional parameters for the reduction method. Defaults to None.
embedding (np.array, optional) – Precomputed embedding. Defaults to None.
n_jobs (int) – Number of parallel jobs to run. Defaults to -1.
- Returns:
The embedding of the data and the cluster labels.
- Return type:
tuple (np.array, np.array)
- spacr.utils.extract_features(image_paths, resnet=resnet50)[source]¶
Extract features from images using a pre-trained ResNet model.
- spacr.utils.check_normality(series)[source]¶
Helper function to check if a feature is normally distributed.
- spacr.utils.random_forest_feature_importance(all_df, cluster_col='cluster')[source]¶
Random Forest feature importance.
- spacr.utils.perform_statistical_tests(all_df, cluster_col='cluster')[source]¶
Perform ANOVA or Kruskal-Wallis tests depending on normality of features.
- spacr.utils.combine_results(rf_df, anova_df, kruskal_df)[source]¶
Combine the results into a single DataFrame.
- spacr.utils.cluster_feature_analysis(all_df, cluster_col='cluster')[source]¶
Perform Random Forest feature importance, ANOVA for normally distributed features, and Kruskal-Wallis for non-normally distributed features. Combine results into a single DataFrame.
- spacr.utils.adjust_cell_masks(parasite_folder, cell_folder, nuclei_folder, overlap_threshold=5, perimeter_threshold=30)[source]¶
Process all npy files in the given folders. Merge and relabel cells in cell masks based on parasite overlap and cell perimeter sharing conditions.
- Parameters:
parasite_folder (str) – Path to the folder containing parasite masks.
cell_folder (str) – Path to the folder containing cell masks.
nuclei_folder (str) – Path to the folder containing nuclei masks.
overlap_threshold (float) – The percentage threshold for merging cells based on parasite overlap.
perimeter_threshold (float) – The percentage threshold for merging cells based on shared perimeter.
- spacr.utils.process_masks(mask_folder, image_folder, channel, batch_size=50, n_clusters=2, plot=False)[source]¶
- spacr.utils.merge_regression_res_with_metadata(results_file, metadata_file, name='_metadata')[source]¶
- spacr.utils.augment_image(image)[source]¶
Perform data augmentation by rotating and reflecting the image.
- Parameters:
image (PIL.Image or numpy array) – The input image.
- Returns:
A list of augmented images.
- Return type:
list
- spacr.utils.augment_dataset(dataset, is_grayscale=False)[source]¶
Perform data augmentation on the entire dataset by rotating and reflecting the images.
- Parameters:
dataset (list of tuples) – The input dataset, where each entry is a tuple (image, label, filename).
is_grayscale (bool, optional) – Flag indicating if the images are grayscale. Defaults to False.
- Returns:
A dataset with augmented (image, label, filename) tuples.
- Return type:
list of tuples
- spacr.utils.convert_and_relabel_masks(folder_path)[source]¶
Converts all int64 npy masks in a folder to uint16 with relabeling to ensure all labels are retained.
- Parameters:
folder_path (str) – The path to the folder containing int64 npy mask files.
- Returns:
None
- spacr.utils.download_models(repo_id='einarolafsson/models', retries=5, delay=5)[source]¶
Downloads all model files from Hugging Face and stores them in the resources/models directory within the installed spacr package.
- Parameters:
repo_id (str) – The repository ID on Hugging Face (default is ‘einarolafsson/models’).
retries (int) – Number of retry attempts in case of failure.
delay (int) – Delay in seconds between retries.
- Returns:
The local path to the downloaded models.
- Return type:
str
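A usage sketch (requires network access to Hugging Face):

from spacr.utils import download_models

models_path = download_models()  # defaults to repo_id='einarolafsson/models'
print(models_path)               # local resources/models path inside the spacr package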
- spacr.utils.generate_cytoplasm_mask(nucleus_mask, cell_mask)[source]¶
Generates a cytoplasm mask from nucleus and cell masks.
- Parameters:
nucleus_mask (np.array) – Binary or segmented mask of the nucleus (non-zero values represent the nucleus).
cell_mask (np.array) – Binary or segmented mask of the whole cell (non-zero values represent the cell).
- Returns:
Mask for the cytoplasm (1 for cytoplasm, 0 for nucleus and pathogens).
- Return type:
np.array
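A sketch of the underlying idea, the cell mask minus the nucleus (not necessarily spacr’s exact implementation, which per the docstring also excludes pathogens):

import numpy as np

def cytoplasm_sketch(nucleus_mask, cell_mask):
    # cytoplasm = inside the cell but outside the nucleus
    return ((cell_mask > 0) & (nucleus_mask == 0)).astype(np.uint8)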
- spacr.utils.add_column_to_database(settings)[source]¶
Adds a new column to the database table by matching on a common column from the DataFrame. If the column already exists in the database, it adds the column with a suffix. NaN values will remain as NULL in the database.
- Parameters:
settings (dict) – A dictionary containing the following keys:
csv_path (str) – Path to the CSV file with the data to be added.
db_path (str) – Path to the SQLite database (or connection string for other databases).
table_name (str) – The name of the table in the database.
update_column (str) – The name of the new column in the DataFrame to add to the database.
match_column (str) – The common column used to match rows.
- Returns:
None
- spacr.utils.fill_holes_in_mask(mask)[source]¶
Fill holes in each object in the mask while keeping objects separated.
- Parameters:
mask (np.ndarray) – A labeled mask where each object has a unique integer value.
- Returns:
A mask with holes filled and original labels preserved.
- Return type:
np.ndarray
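A per-label sketch of the same idea using scipy (not necessarily spacr’s exact implementation):

import numpy as np
from scipy.ndimage import binary_fill_holes

def fill_holes_sketch(mask):
    filled = mask.copy()
    for label in np.unique(mask):
        if label == 0:  # skip background
            continue
        filled[binary_fill_holes(mask == label)] = label
    return filled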
- spacr.utils.group_feature_class(df, feature_groups=['cell', 'cytoplasm', 'nucleus', 'pathogen'], name='compartment')[source]¶
- spacr.utils.filter_and_save_csv(input_csv, output_csv, column_name, upper_threshold, lower_threshold)[source]¶
Reads a CSV into a DataFrame, retains rows whose value in a given column is greater than upper_threshold or less than lower_threshold, and saves the filtered DataFrame to a new CSV file.
- Parameters:
input_csv (str) – Path to the input CSV file.
output_csv (str) – Path to save the filtered CSV file.
column_name (str) – Column name to apply the filters on.
upper_threshold (float) – Upper threshold for filtering (values greater than this are retained).
lower_threshold (float) – Lower threshold for filtering (values less than this are retained).
- Returns:
None
- spacr.utils.extract_tar_bz2_files(folder_path)[source]¶
Extracts all .tar.bz2 files in the given folder into subfolders with the same name as the tar file.
- Parameters:
folder_path (str) – Path to the folder containing .tar.bz2 files.
- spacr.utils.calculate_shortest_distance(df, object1, object2)[source]¶
Calculate the shortest edge-to-edge distance between two objects (e.g., pathogen and nucleus).
- Parameters:
df (pandas.DataFrame) – DataFrame containing the measurements.
object1 (str) – Name of the first object (e.g., “pathogen”).
object2 (str) – Name of the second object (e.g., “nucleus”).
- Returns:
The DataFrame with a new column for the shortest edge-to-edge distance.
- Return type:
pandas.DataFrame
- spacr.utils.format_path_for_system(path)[source]¶
Takes a file path and reformats it to be compatible with the current operating system.
- Parameters:
path (str) – The file path to be formatted.
- Returns:
The formatted path for the current operating system.
- Return type:
str
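A usage sketch:

from spacr.utils import format_path_for_system

path = format_path_for_system('data\\experiment_1\\images')
print(path)  # e.g. 'data/experiment_1/images' on Linux/macOS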
- spacr.utils.normalize_src_path(src)[source]¶
Ensures that the ‘src’ value is properly formatted as either a list of strings or a single string.
- Parameters:
src (str or list) – The input source path(s).
- Returns:
A correctly formatted list if the input was a list (or a string representation of a list), otherwise a single string.
- Return type:
list or str
- spacr.utils.generate_image_path_map(root_folder, valid_extensions=('tif', 'tiff', 'png', 'jpg', 'jpeg', 'bmp', 'czi', 'nd2', 'lif'))[source]¶
Recursively scans a folder and its subfolders for images, then creates a mapping of: {original_image_path: new_image_path}, where the new path includes all subfolder names.
- Parameters:
root_folder (str) – The root directory to scan for images.
valid_extensions (tuple) – Tuple of valid image file extensions.
- Returns:
A dictionary mapping original image paths to their new paths.
- Return type:
dict
- spacr.utils.copy_images_to_consolidated(image_path_map, root_folder)[source]¶
Copies images from their original locations to a ‘consolidated’ folder, renaming them according to the generated dictionary.
- Parameters:
image_path_map (dict) – Dictionary mapping {original_path: new_path}.
root_folder (str) – The root directory where the ‘consolidated’ folder will be created.
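A sketch of the full consolidation workflow combining the two functions above (the root folder is hypothetical):

from spacr.utils import generate_image_path_map, copy_images_to_consolidated

root = '/path/to/experiment'
path_map = generate_image_path_map(root)      # {original_path: new_path}
copy_images_to_consolidated(path_map, root)   # copies into root/consolidated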