MOFA2 0.99.1
Model selection is an important step in probabilistic modelling. As happens in most Bayesian models, the optimisation procedure of MOFA is not guaranteed to find a consistent optimal solution at every trial, and factors can vary between different model instances. Hence, it is important to asess the consistency of factors we trained models under different random parameter initialisations (different seeds).
Having said that, we observe MOFA to be very robust over trials. In practice, a single model is totally fine for an exploratory analysis.
library(MOFA2)
## Warning: replacing previous import 'DelayedArray::pnorm' by 'stats::pnorm'
## when loading 'MOFA2'
To illustrate the MOFA workflow we simulate a small example data set with 3 different views. make_example_data
generates an untrained MOFAobject containing the simulated data. If you work on your own data use create_mofa
to create the untrained MOFA object (see our vignettes on scRNA (gastrulation) or scMethylation (cortex)).
By default the function make_example_data
produces a small data set containing 3 views with 100 features each and 2 groups with 50 samples each. These parameters can be varied using the arguments n_views
, n_features
, n_groups
and n_samples
.
set.seed(1234)
sim_data <- make_example_data()
MOFAobject <- create_mofa(sim_data$data, groups = sim_data$groups)
## Creating MOFA object from a list of matrices (features as rows, sample as columns)...
## Warning in quality_control(object): There are duplicated features names across different views. We will add the suffix *_view* only for those features
## Example: if you have both TP53 in mRNA and mutation data it will be renamed to TP53_mRNA, TP53_mutation
MOFAobject
## Untrained MOFA model with the following characteristics:
## Number of views: 3
## Views names: view_1 view_2 view_3
## Number of features (per view): 100 100 100
## Number of groups: 2
## Groups names: group_1 group_2
## Number of samples (per group): 50 50
Details on the various options can be found in the getting_started vignette. Here, we will simply use the default options, changing only the number of factors:
data_opts <- get_default_data_options(MOFAobject)
model_opts <- get_default_model_options(MOFAobject)
model_opts$num_factors <- 5
train_opts <- get_default_training_options(MOFAobject)
Once the MOFAobject is set up we can use run_mofa
to train the model.
As depending on the random initilization the results might differ, we recommend to use run_mofa
multiple times (e.g. ten times, here we use a smaller number for illustration as the model training can take some time) with different random seeds. This allows you to assess the robustness of the inferred factors across different random initilizations and select a model for downstream analysis. As a next step we will show how to compare the different fits and select the best model for downstream analyses.
n_inits <- 3
MOFAlist <- lapply(seq_len(n_inits), function(it) {
# change the seed
train_opts$seed <- 2019 + it
MOFAobject <- prepare_mofa(MOFAobject,
data_options = data_opts,
model_options = model_opts,
training_options = train_opts
)
run_mofa(MOFAobject)
})
## Warning in run_mofa(MOFAobject): No output filename provided. Using /tmp/mofa_20191113-164639.hdf5 to store the trained model.
## Warning in run_mofa(MOFAobject): No output filename provided. Using /tmp/mofa_20191113-164644.hdf5 to store the trained model.
## Warning in run_mofa(MOFAobject): No output filename provided. Using /tmp/mofa_20191113-164647.hdf5 to store the trained model.
Having a list of trained models we can use compare_elbo
to get an overview of what the optimized ELBO value is (a model with larger ELBO is preferred).
compare_elbo(MOFAlist)
## Plotting the absolute value of the ELBO for every model (the smaller the better)
With compare_factors
we can get an overview of how robust the factors are between different model instances.
compare_factors(MOFAlist)
For down-stream analyses we recommned to choose the model with the best ELBO value as is done by select_model
.
MOFAobject <- select_model(MOFAlist)
MOFAobject
## Trained MOFA with the following characteristics:
## Number of views: 3
## Views names: view_1 view_2 view_3
## Number of features (per view): 100 100 100
## Number of groups: 2
## Groups names: group_1 group_2
## Number of samples (per group): 50 50
## Number of factors: 5
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] MOFA2_0.99.1 BiocStyle_2.12.0
##
## loaded via a namespace (and not attached):
## [1] ggrepel_0.8.1 Rcpp_1.0.2 lattice_0.20-38
## [4] tidyr_1.0.0 assertthat_0.2.1 zeallot_0.1.0
## [7] digest_0.6.22 foreach_1.4.7 plyr_1.8.4
## [10] R6_2.4.0 backports_1.1.5 stats4_3.6.1
## [13] evaluate_0.14 ggplot2_3.2.1 pillar_1.4.2
## [16] rlang_0.4.1 lazyeval_0.2.2 S4Vectors_0.22.1
## [19] Matrix_1.2-17 reticulate_1.13 rmarkdown_1.16
## [22] labeling_0.3 BiocParallel_1.18.1 Rtsne_0.15
## [25] stringr_1.4.0 uwot_0.1.4 pheatmap_1.0.12
## [28] munsell_0.5.0 DelayedArray_0.10.0 HDF5Array_1.12.3
## [31] compiler_3.6.1 vipor_0.4.5 xfun_0.10
## [34] pkgconfig_2.0.3 BiocGenerics_0.30.0 ggbeeswarm_0.6.0
## [37] htmltools_0.4.0 tidyselect_0.2.5 tibble_2.1.3
## [40] bookdown_0.14 IRanges_2.18.3 codetools_0.2-16
## [43] matrixStats_0.55.0 ggpubr_0.2.3 crayon_1.3.4
## [46] dplyr_0.8.3 grid_3.6.1 jsonlite_1.6
## [49] gtable_0.3.0 lifecycle_0.1.0 magrittr_1.5
## [52] scales_1.0.0 RcppParallel_4.4.4 stringi_1.4.3
## [55] reshape2_1.4.3 ggsignif_0.6.0 doParallel_1.0.15
## [58] vctrs_0.2.0 cowplot_1.0.0 Rhdf5lib_1.6.3
## [61] RColorBrewer_1.1-2 iterators_1.0.12 tools_3.6.1
## [64] forcats_0.4.0 glue_1.3.1 beeswarm_0.2.3
## [67] purrr_0.3.3 parallel_3.6.1 yaml_2.2.0
## [70] colorspace_1.4-1 rhdf5_2.28.1 BiocManager_1.30.9
## [73] corrplot_0.84 knitr_1.25