Intermediate files¶
Although the primary output of RSMSummarize is an HTML report, we also want the user to be able to conduct additional analyses outside of RSMTool. To this end, all of the tables produced in the experiment report are saved as files in the format as specified by file_format
parameter in the output
directory. The following sections describe all of the intermediate files that are produced.
Note
The names of all files begin with the summary_id
provided by the user in the experiment configuration file.
Marginal and partial correlations with score¶
filenames: margcor_score_all_data
, pcor_score_all_data
, `pcor_score_no_length_all_data
The first file contains the marginal correlations between each pre-processed feature and human score. The second file contains the partial correlation between each pre-processed feature and human score after controlling for all other features. The third file contains the partial correlations between each pre-processed feature and human score after controlling for response length, if length_column
was specified in the configuration file.
Model information¶
model_summary
This file contains the main information about the models included into the report including:
- Total number of features
- Total number of features with non-negative coefficients
- The learner
- The label used to train the model
betas
: standardized coefficients (for built-in models only).model_fit
: R squared and adjusted R squared computed on the training set. Note that these values are always computed on raw predictions without any trimming or rounding.
Note
If the report includes a combination of rsmtool
and rsmeval
experiments, the summary tables with model information will only include rsmtool
experiments since no model information is available for rsmeval
experiments.
Evaluation metrics¶
eval_short
- descriptives for predicted and human scores (mean, std.dev etc.) and association metrics (correlation, quadartic weighted kappa, SMD etc.) for specific score types chosen based on recommendations by Williamson (2012). Specifically, the following columns are included (theraw
orscale
version is chosen depending on the value of theuse_scaled_predictions
in the configuration file).- h_mean
- h_sd
- corr
- sys_mean [raw/scale trim]
- sys_sd [raw/scale trim]
- SMD [raw/scale trim]
- adj_agr [raw/scale trim_round]
- exact_agr [raw/scale trim_round]
- kappa [raw/scale trim_round]
- wtkappa [raw/scale trim_round]
- sys_mean [raw/scale trim_round]
- sys_sd [raw/scale trim_round]
- SMD [raw/scale trim_round]
- R2 [raw/scale trim]
- RMSE [raw/scale trim]
Note
Please note that for raw scores, SMD values are likely to be affected by possible differences in scale.