1 Introduction

The MOFA2 R package provides a wide range of downstream analyses to visualise and interpret the model output. Here we give a brief description of the main functionalities. Please refer to the vignettes for details on the different analyses.

2 Load libraries

library(ggplot2)
library(MOFA2)
## Warning: replacing previous import 'DelayedArray::pnorm' by 'stats::pnorm'
## when loading 'MOFA2'

3 Load trained model

Options:

  • sort_factors: logical indicating whether factors should be sorted by variance explained (default is TRUE)
  • on_disk: logical indicating whether to work from memory or from disk. This should be set to TRUE when the training data is too big to fit into memory. On-disk operations are performed using the HDF5Array and DelayedArray framework.
model <- load_model("/Users/ricard/data/mofaplus/test/model.hdf5")

3.1 Overview of data

The function plot_data_overview can be used to obtain an overview of the input data. It shows how many views (rows) and how many groups (columns) exist, what their dimensionalities are, and how much missing information they contain (grey bars).

plot_data_overview(model)

4 Add metadata to the model

We have implemented the option to add sample metadata to the MOFA object. This makes it easier to color the subsequent plots.
The metadata is stored as a data.frame in model@samples_metadata, and it requires at least two columns: sample and group. The number of rows must match the total number of samples in the model (sum(model@dimensions$N)).
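
The requirements above can be checked before assigning the metadata to the model. The following is a minimal sketch in base R; check_sample_metadata and toy_metadata are hypothetical names introduced for illustration and are not part of MOFA2.

```r
# Hypothetical helper (not part of MOFA2): check a metadata data.frame
# against the two requirements stated above.
check_sample_metadata <- function(metadata, n_samples) {
  stopifnot(is.data.frame(metadata))
  # Requirement 1: the columns "sample" and "group" must be present
  missing_cols <- setdiff(c("sample", "group"), colnames(metadata))
  if (length(missing_cols) > 0) {
    stop("missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Requirement 2: one row per sample in the model
  if (nrow(metadata) != n_samples) {
    stop("expected ", n_samples, " rows, found ", nrow(metadata))
  }
  invisible(TRUE)
}

# Toy metadata for two groups of three samples each
toy_metadata <- data.frame(
  sample = paste0("sample", 0:5),
  group  = rep(c("group0", "group1"), each = 3),
  stringsAsFactors = FALSE
)
metadata_ok <- check_sample_metadata(toy_metadata, n_samples = 6)
```

With a trained model, n_samples would be sum(model@dimensions$N), as described above.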

Let’s create some metadata…

sample_metadata <- model@samples_metadata
head(sample_metadata, n=3)
##           sample  group
## 1 sample0_group0 group0
## 2 sample1_group0 group0
## 3 sample2_group0 group0
sample_metadata$condition <- sample(c("A","B"), size = nrow(sample_metadata), replace=TRUE)
sample_metadata$age <- sample(1:100, size = nrow(sample_metadata), replace=TRUE)

samples_metadata(model) <- sample_metadata
head(model@samples_metadata, n=3)
##           sample  group condition age
## 1 sample0_group0 group0         A  90
## 2 sample1_group0 group0         A  62
## 3 sample2_group0 group0         B  44

5 Variance decomposition

MOFA reduces the dimensionality of a multi-omics data set in terms of a small number of latent factors, and it quantifies the fraction of variance explained (\(R^2\)) for each of the factors in the different omics. If using multiple groups of data, the model quantifies how much variance each factor explains in each combination of view and group (see figure above).
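
To make the notion of variance explained concrete, the following base R sketch simulates data from a linear factor model and computes \(R^2\) for each factor and for all factors jointly. It does not use MOFA2. The simulated matrices Z, W and Y, the helper r2, and the definition of \(R^2\) as one minus the ratio of residual to total sum of squares on centered data are assumptions made for illustration; the package's own computation may differ in detail.

```r
set.seed(1)
n_samples  <- 100
n_features <- 20
n_factors  <- 2

# Simulate centered factors (samples x factors) and weights (features x factors)
Z <- scale(matrix(rnorm(n_samples * n_factors), n_samples, n_factors), scale = FALSE)
W <- matrix(rnorm(n_features * n_factors), n_features, n_factors)

# Data generated by a linear factor model plus noise, centered per feature
noise <- matrix(rnorm(n_samples * n_features, sd = 0.5), n_samples, n_features)
Y <- scale(Z %*% t(W) + noise, center = TRUE, scale = FALSE)

# Assumed definition: R^2 = 1 - residual sum of squares / total sum of squares
r2 <- function(Y, Y_hat) 1 - sum((Y - Y_hat)^2) / sum(Y^2)

# Variance explained by each factor on its own
r2_per_factor <- sapply(seq_len(n_factors), function(k) {
  r2(Y, Z[, k, drop = FALSE] %*% t(W[, k, drop = FALSE]))
})

# Variance explained by all factors jointly
r2_total <- r2(Y, Z %*% t(W))
```

In a multi-view, multi-group setting the same calculation would be repeated for every combination of view and group.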

The variance explained estimates are expensive to compute, so they are stored in the hdf5 file. In R, they are loaded into model@cache$variance_explained:

# Total variance explained per view and group
head(model@cache$variance_explained$r2_total[[1]]) # group 1
## $view0
## [1] 0.8271578
## 
## $view1
## [1] 0.8491216
## 
## $view2
## [1] 0.5499326
# Variance explained by every factor, per view and group
head(model@cache$variance_explained$r2_per_factor[[1]]) # group 1
##                view0        view1        view2
## Factor1 7.564619e-06 1.591348e-01 5.494409e-01
## Factor2 2.398969e-05 9.459490e-06 8.689685e-05
## Factor3 3.107064e-01 1.857647e-01 3.959269e-05
## Factor4 2.460983e-01 2.556742e-01 3.398418e-05
## Factor5 2.545364e-01 2.459087e-01 7.648372e-05
## Factor6 0.000000e+00 1.090930e-04 2.088264e-04
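
A matrix like the one above can be used to see which factors are shared between views and which are inactive. The sketch below rebuilds the printed values by hand so that it runs without a model; the 0.01 cut-off and the names threshold and active_views are assumptions chosen for illustration only.

```r
# r2_per_factor values for group 1, copied from the output above
r2 <- matrix(
  c(7.564619e-06, 1.591348e-01, 5.494409e-01,
    2.398969e-05, 9.459490e-06, 8.689685e-05,
    3.107064e-01, 1.857647e-01, 3.959269e-05,
    2.460983e-01, 2.556742e-01, 3.398418e-05,
    2.545364e-01, 2.459087e-01, 7.648372e-05,
    0.000000e+00, 1.090930e-04, 2.088264e-04),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("Factor", 1:6), paste0("view", 0:2))
)

# Assumed cut-off: a factor is called "active" in a view if it explains
# more than 1% of the variance there
threshold <- 0.01

# For each factor, list the views in which it is active
active_views <- setNames(
  lapply(rownames(r2), function(f) colnames(r2)[r2[f, ] > threshold]),
  rownames(r2)
)
```

Under this cut-off, Factor1 is active in view1 and view2, Factor3 to Factor5 are active in view0 and view1, and Factor2 and Factor6 are not active in any view.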

Variance explained estimates can be plotted using plot_variance_explained(model, ...). Options:

  • factors: character vector with a factor name(s), or numeric vector with the index(es) of the factor(s). Default is “all”.
  • x: character specifying the dimension for the x-axis (“view”, “factor”, or “group”).
  • y: character specifying the dimension for the y-axis (“view”, “factor”, or “group”).
  • split_by: character specifying the dimension to be faceted (“view”, “factor”, or “group”).
  • plot_total: logical indicating whether to also plot the total variance explained (for the variable on the x-axis).
plot_variance_explained(model, x="group", y="factor")

p <- plot_variance_explained(model, x="view", y="group")
p + theme(axis.text.x = element_text(color="black", angle=40, vjust=1, hjust=1))

p <- plot_variance_explained(model, x="group", y="factor", plot_total = TRUE)
p[[2]]

5.0.1 Visualisation of samples in the factor space

Mathematically, each factor orders samples along a one-dimensional axis centered at zero. Samples with different signs have opposite effects along the inferred axis of variation. Samples that remain close to zero can represent either an intermediate phenotype or no phenotype at all associated with the factor under consideration.

5.1 Visualisation of single factors

Factors can be plotted using plot_factor (beeswarm plots) or plot_factors (scatter plots).

plot_factor(model, 
  factor = 1:3,
  color_by = "age",
  shape_by = "condition"
)

Adding more options

p <- plot_factor(model, 
  factor = 1,
  color_by = "condition",
  dot_size = 0.2,      # change dot size
  dodge = TRUE,        # dodge points with different colors
  legend = FALSE,      # remove legend
  add_violin = TRUE,   # add violin plots
  violin_alpha = 0.25  # transparency of violin plots
)

# The output of plot_factor is a ggplot2 object that we can edit
p <- p + 
  scale_color_manual(values=c("A"="black", "B"="red")) +
  scale_fill_manual(values=c("A"="black", "B"="red"))

print(p)

5.2 Visualisation of combinations of factors

Scatter plots

plot_factors(model, 
  factors = 1:3,
  color_by = "condition"
)

5.3 Visualisation of feature weights

The weights provide a score for how strongly each feature relates to each factor. Features with no association with the factor have values close to zero, while features with a strong association have large absolute values. The sign of the weight indicates the direction of the effect: a positive weight indicates that the feature has higher levels in the samples with positive factor values, and vice versa.
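
The interpretation above can be illustrated on a toy weight vector in base R. The feature names and values below are hypothetical, and dividing by the largest absolute weight is one assumed way of scaling weights to the range -1 to 1; it is not necessarily what the package does internally.

```r
# Toy weights for one factor in one view (hypothetical values)
w <- c(featureA = 0.02, featureB = -1.50, featureC = 0.75,
       featureD = -0.01, featureE = 3.00)

# Scale to [-1, 1] by dividing by the largest absolute weight
w_scaled <- w / max(abs(w))

# Rank features by absolute weight and keep the top three
top <- names(sort(abs(w_scaled), decreasing = TRUE))[1:3]

# Direction of the effect for the top features
direction <- ifelse(w_scaled[top] > 0, "positive", "negative")
```

Here featureE, featureB and featureC are the top features, while featureA and featureD have weights close to zero and are effectively unrelated to the factor.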

Weights can be plotted using plot_weights (beeswarm plots) or plot_top_weights (scatter plots).

plot_weights(model,
  view = "view0",
  factor = 1,
  nfeatures = 10,     # Number of features to highlight
  scale = TRUE        # Scale weights from -1 to 1
)

6 Visualisation of patterns in the data

Instead of looking at an “abstract” weight, it is useful to observe the coordinated heterogeneity of the informative features in the original data. This can be done using the plot_data_heatmap and plot_data_scatter functions.

6.1 Heatmaps

Heatmap of observations. Top features are selected by their weight in the selected factor. By default, samples are ordered according to their corresponding factor value.

plot_data_heatmap(model,
  view = "view1",         # view of interest
  factor = 1,             # factor of interest
  features = 20,          # number of features to plot (they are selected by weight)
  
  # extra arguments that are passed to the `pheatmap` function
  cluster_rows = TRUE, cluster_cols = FALSE,
  show_rownames = TRUE, show_colnames = FALSE
)

6.2 Scatter plots

Scatter plots of observations vs factor values. It is useful to add a linear regression estimate to visualise if the relationship between (top) features and factor values is linear.

plot_data_scatter(model,
  view = "view1",         # view of interest
  factor = 1,             # factor of interest
  features = 5,           # number of features to plot (they are selected by weight)
  add_lm = TRUE,          # add linear regression
  color_by = "condition"
)
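
The linear regression estimate mentioned above can also be reproduced outside the plotting function. The following base R sketch fits a line to simulated values; factor_values and feature_values are hypothetical stand-ins for one factor and one top feature, not data from the model.

```r
set.seed(1)

# Toy factor values and a feature that depends linearly on them, plus noise
factor_values  <- rnorm(50)
feature_values <- 2 * factor_values + rnorm(50, sd = 0.5)

# Fit feature ~ factor and summarise the strength of the linear relationship
fit       <- lm(feature_values ~ factor_values)
slope     <- unname(coef(fit)["factor_values"])
r_squared <- summary(fit)$r.squared
```

A slope far from zero together with a high r_squared suggests that a straight line describes the relationship well; a clearly curved scatter with a low r_squared would suggest a non-linear relationship.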

6.3 Non-linear dimensionality reduction

Interpretability at the factor level is achieved at the expense of limited information content per factor (due to the linearity assumption). Nevertheless, the MOFA factors can be used as input to other methods that learn compact nonlinear manifolds (t-SNE or UMAP).

Run UMAP and t-SNE

set.seed(42)
model <- run_umap(model)
model <- run_tsne(model)

Plot non-linear dimensionality reduction

plot_dimred(model,
  method = "TSNE",
  color_by = "condition",
  legend = FALSE
)

7 Other functionalities

7.1 Renaming dimensions

The user can rename the dimensions of the model:

views(model) <- c("Transcriptomics", "Methylation", "Proteomics")
groups(model) <- c("Condition_A", "Condition_B", "Condition_C")
views(model)
## [1] "Transcriptomics" "Methylation"     "Proteomics"
groups(model)
## [1] "Condition_A" "Condition_B" "Condition_C"

7.2 Extracting data for downstream analysis

The user can extract the feature weights, the data and the factors to generate their own plots.

Extract factors

# factors is a list of matrices, one matrix per group with dimensions (nsamples, nfactors)
factors <- get_factors(model,
  groups = "all",
  factors = "all"
)

lapply(factors,dim)
## $Condition_A
## [1] 300   7
## 
## $Condition_B
## [1] 550   7
## 
## $Condition_C
## [1] 150   7

Extract weights

# weights is a list of matrices, one matrix per view with dimensions (nfeatures, nfactors)
weights <- get_weights(model,
  views = "all",
  factors = "all"
)

lapply(weights,dim)
## $Transcriptomics
## [1] 200   7
## 
## $Methylation
## [1] 100   7
## 
## $Proteomics
## [1] 400   7

Extract data

# data is a nested list of matrices, one matrix per view and group with dimensions (nfeatures, nsamples)
data <- get_data(model)

lapply(data, function(x) lapply(x, dim))[[1]]
## $Condition_A
## [1] 200 300
## 
## $Condition_B
## [1] 200 550
## 
## $Condition_C
## [1] 200 150
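
The shapes above fit together as follows: a weight matrix (nfeatures, nfactors) multiplied by the transpose of a factor matrix (nsamples, nfactors) gives a matrix with the same shape as the data (nfeatures, nsamples). The sketch below shows this with toy matrices in base R, assuming the usual linear factor model; Z, W and Y_hat are hypothetical stand-ins, not objects returned by MOFA2.

```r
set.seed(1)
n_samples  <- 30
n_features <- 10
n_factors  <- 3

# Toy factor matrix: samples in rows, factors in columns
Z <- matrix(rnorm(n_samples * n_factors), n_samples, n_factors,
            dimnames = list(paste0("sample", seq_len(n_samples)),
                            paste0("Factor", seq_len(n_factors))))

# Toy weight matrix: features in rows, factors in columns
W <- matrix(rnorm(n_features * n_factors), n_features, n_factors,
            dimnames = list(paste0("feature", seq_len(n_features)),
                            paste0("Factor", seq_len(n_factors))))

# Low-rank reconstruction: (nfeatures x nfactors) %*% (nfactors x nsamples)
Y_hat <- W %*% t(Z)
```

With a trained model, one element of the factors list and one element of the weights list would take the place of Z and W, and the result could be compared against the corresponding matrix in data.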

For convenience, the user can extract the data in long data.frame format to plug into ggplot:

# with as.data.frame = TRUE, each function returns a single long-format data.frame
factors <- get_factors(model, as.data.frame = TRUE)
weights <- get_weights(model, as.data.frame = TRUE)
data <- get_data(model, as.data.frame = TRUE)
head(factors, n=3)
##           sample  factor       value       group
## 1 sample0_group0 Factor1 -0.05370162 Condition_A
## 2 sample1_group0 Factor1  0.22420164 Condition_A
## 3 sample2_group0 Factor1 -0.22473685 Condition_A
head(weights, n=3)
##          feature  factor         value            view
## 1 feature0_view0 Factor1 -0.0006858052 Transcriptomics
## 2 feature1_view0 Factor1  0.0005172773 Transcriptomics
## 3 feature2_view0 Factor1 -0.0006623013 Transcriptomics
head(data, n=3)
##              view       group        feature         sample      value
## 2 Transcriptomics Condition_A feature1_view0 sample0_group0 -0.4013383
## 3 Transcriptomics Condition_A feature2_view0 sample0_group0  2.1211231
## 4 Transcriptomics Condition_A feature3_view0 sample0_group0  2.8630833
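
As a sketch of how such a long table plugs into ggplot2, the example below builds a toy data.frame with the same columns as the factors table above and draws one boxplot per group. The values are simulated and toy_factors is a hypothetical name; with a trained model, the factors data.frame itself would be used instead.

```r
library(ggplot2)

# Toy long-format table with the same columns as the factors data.frame above
set.seed(1)
toy_factors <- data.frame(
  sample = paste0("sample", 1:40),
  factor = "Factor1",
  value  = rnorm(40),
  group  = rep(c("Condition_A", "Condition_B"), each = 20)
)

# One boxplot of factor values per group
p <- ggplot(toy_factors, aes(x = group, y = value, fill = group)) +
  geom_boxplot() +
  labs(x = NULL, y = "Factor1 value") +
  theme_classic()
```

Because every row holds one value together with its sample, factor and group, the same table can be faceted by factor or merged with the sample metadata without reshaping.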

8 SessionInfo

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] MOFA2_0.99.1     ggplot2_3.2.1    BiocStyle_2.12.0
## 
## loaded via a namespace (and not attached):
##  [1] tidyr_1.0.0         jsonlite_1.6        foreach_1.4.7      
##  [4] RcppParallel_4.4.4  assertthat_0.2.1    BiocManager_1.30.9 
##  [7] stats4_3.6.1        vipor_0.4.5         yaml_2.2.0         
## [10] ggrepel_0.8.1       corrplot_0.84       pillar_1.4.2       
## [13] backports_1.1.5     lattice_0.20-38     glue_1.3.1         
## [16] reticulate_1.13     digest_0.6.22       RColorBrewer_1.1-2 
## [19] ggsignif_0.6.0      colorspace_1.4-1    cowplot_1.0.0      
## [22] htmltools_0.4.0     Matrix_1.2-17       plyr_1.8.4         
## [25] pkgconfig_2.0.3     pheatmap_1.0.12     bookdown_0.14      
## [28] purrr_0.3.3         scales_1.0.0        RSpectra_0.15-0    
## [31] HDF5Array_1.12.3    Rtsne_0.15          BiocParallel_1.18.1
## [34] tibble_2.1.3        IRanges_2.18.3      ellipsis_0.3.0     
## [37] ggpubr_0.2.3        withr_2.1.2         BiocGenerics_0.30.0
## [40] lazyeval_0.2.2      magrittr_1.5        crayon_1.3.4       
## [43] evaluate_0.14       GGally_1.4.0        doParallel_1.0.15  
## [46] forcats_0.4.0       FNN_1.1.3           beeswarm_0.2.3     
## [49] tools_3.6.1         lifecycle_0.1.0     matrixStats_0.55.0 
## [52] stringr_1.4.0       Rhdf5lib_1.6.3      S4Vectors_0.22.1   
## [55] munsell_0.5.0       DelayedArray_0.10.0 compiler_3.6.1     
## [58] rlang_0.4.1         rhdf5_2.28.1        grid_3.6.1         
## [61] iterators_1.0.12    labeling_0.3        rmarkdown_1.16     
## [64] gtable_0.3.0        codetools_0.2-16    reshape_0.8.8      
## [67] reshape2_1.4.3      R6_2.4.0            knitr_1.25         
## [70] dplyr_0.8.3         uwot_0.1.4          zeallot_0.1.0      
## [73] stringi_1.4.3       ggbeeswarm_0.6.0    parallel_3.6.1     
## [76] Rcpp_1.0.2          vctrs_0.2.0         tidyselect_0.2.5   
## [79] xfun_0.10