Title: | Extra Recipes Steps for Dealing with Omics Data |
---|---|
Description: | Omics data (e.g. transcriptomics, proteomics, metagenomics...) offer a detailed and multi-dimensional perspective on the molecular components and interactions within complex biological (eco)systems. Analyzing these data requires adapted procedures, which are implemented as steps according to the 'recipes' package. |
Authors: | Antoine BICHAT [aut, cre] , Julie AUBERT [ctb] |
Maintainer: | Antoine BICHAT <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.0.2 |
Built: | 2024-10-31 04:21:32 UTC |
Source: | https://github.com/abichat/scimo |
Fungal community abundance of 74 ASVs sampled from the surface of three different French cheeses.
data("cheese_abundance", package = "scimo")
data("cheese_taxonomy", package = "scimo")
For cheese_abundance, a tibble with columns:
Sample ID.
Appellation of the cheese. One of Saint-Nectaire, Livarot or Epoisses.
One of Natural or Washed.
Count of the ASV.
For cheese_taxonomy, a tibble with columns:
Amplicon Sequence Variant (ASV) ID.
Character corresponding to a standard concatenation of taxonomic clades.
Clade to which the ASV belongs.
This dataset comes from doi:10.24072/pcjournal.321.
data("cheese_abundance", package = "scimo")
cheese_abundance

data("cheese_taxonomy", package = "scimo")
cheese_taxonomy
Gene expression of 108 CCLE cell lines from 5 different pediatric cancers.
data("pedcan_expression", package = "scimo")
A tibble with columns:
Cell line name.
One of Male, Female or Unknown.
One of Primary, Metastasis or Unknown.
One of Neuroblastoma, Ewing Sarcoma, Rhabdomyosarcoma, Embryonal Tumor or Osteosarcoma.
Expression of the gene, given in log2(TPM + 1).
This dataset was generated from the DepMap Public 23Q4 primary files, available at https://depmap.org/portal/download/all/.
data("pedcan_expression", package = "scimo")
pedcan_expression
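The expression columns are stored as log2(TPM + 1). As a small worked illustration, the transform and its inverse can be sketched in base R. The TPM values below are made up for the example and are not taken from the dataset.

```r
# Hypothetical TPM values for one gene across four cell lines (illustrative only)
tpm <- c(0, 1, 3, 1023)

# Forward transform: the scale used for the expression columns
log_expr <- log2(tpm + 1)

# Inverse transform: recover TPM from log2(TPM + 1)
tpm_back <- 2^log_expr - 1

log_expr
tpm_back
```

The added 1 keeps genes with zero TPM at an expression value of exactly 0 instead of minus infinity.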
Aggregate variables according to hierarchical clustering.
step_aggregate_hclust(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  n_clusters,
  fun_agg,
  dist_metric = "euclidean",
  linkage_method = "complete",
  res = NULL,
  prefix = "cl_",
  keep_original_cols = FALSE,
  skip = FALSE,
  id = rand_id("aggregate_hclust")
)

## S3 method for class 'step_aggregate_hclust'
tidy(x, ...)
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
---|---|
... | One or more selector functions to choose variables for this step. See selections() for more details. |
role | For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
n_clusters | Number of clusters to create. |
fun_agg | Aggregation function, such as sum. |
dist_metric | Defaults to "euclidean". |
linkage_method | Defaults to "complete". |
res | This parameter is only produced after the recipe has been trained. |
prefix | A character string for the prefix of the resulting new variables. |
keep_original_cols | A logical to keep the original variables in the output. Defaults to FALSE. |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_aggregate_hclust object. |
An updated version of recipe with the new step added to the sequence of any existing operations.
Antoine Bichat
rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_aggregate_hclust(all_numeric_predictors(),
                        n_clusters = 2, fun_agg = sum) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
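To make the idea concrete, here is a rough base-R sketch of aggregating variables by hierarchical clustering. It is not scimo's implementation. Three choices are assumptions made for illustration: clustering the transposed data so that variables are grouped, applying the aggregation row-wise, and naming the outputs with the cl_ prefix followed by the cluster index.

```r
# Numeric predictors of iris as a matrix (samples in rows, variables in columns)
x <- as.matrix(iris[, 1:4])

# Cluster the variables: distances are computed between columns, hence t(x)
tree <- hclust(dist(t(x), method = "euclidean"), method = "complete")

# Cut the tree into two groups of variables
groups <- cutree(tree, k = 2)

# Sum, sample by sample, the variables belonging to each cluster
agg <- sapply(
  split(names(groups), groups),
  function(cols) rowSums(x[, cols, drop = FALSE])
)
colnames(agg) <- paste0("cl_", colnames(agg))

groups
head(agg)
```

Because every variable lands in exactly one cluster and the aggregation is a sum, the row totals of the aggregated table match those of the original table.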
Aggregate variables according to prior knowledge.
step_aggregate_list(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  list_agg = NULL,
  fun_agg = NULL,
  others = "discard",
  name_others = "others",
  res = NULL,
  prefix = "agg_",
  keep_original_cols = FALSE,
  skip = FALSE,
  id = rand_id("aggregate_list")
)

## S3 method for class 'step_aggregate_list'
tidy(x, ...)
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
---|---|
... | One or more selector functions to choose variables for this step. See selections() for more details. |
role | For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
list_agg | Named list of aggregated variables. |
fun_agg | Aggregation function, such as prod. |
others | Behavior for the selected variables in |
name_others | If |
res | This parameter is only produced after the recipe has been trained. |
prefix | A character string for the prefix of the resulting new variables that are not named in list_agg. |
keep_original_cols | A logical to keep the original variables in the output. Defaults to FALSE. |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_aggregate_list object. |
An updated version of recipe with the new step added to the sequence of any existing operations.
Antoine Bichat
list_iris <- list(sepal.size = c("Sepal.Length", "Sepal.Width"),
                  petal.size = c("Petal.Length", "Petal.Width"))

rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_aggregate_list(all_numeric_predictors(),
                      list_agg = list_iris, fun_agg = prod) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
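For intuition, aggregation from a named list can be approximated in base R as below. This is only a sketch of the idea, assuming the aggregation function is applied row by row to the variables of each list element. It does not reproduce how scimo handles variables that are absent from the list.

```r
# Named list mapping each new variable to the original variables it aggregates
list_agg <- list(
  sepal.size = c("Sepal.Length", "Sepal.Width"),
  petal.size = c("Petal.Length", "Petal.Width")
)

# For each group, multiply its variables together within each sample
agg <- as.data.frame(
  lapply(list_agg, function(cols) apply(iris[, cols], 1, prod))
)

head(agg)
```

The first iris flower has sepal dimensions 5.1 and 3.5, so its sepal.size is 5.1 * 3.5.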
Normalize a set of variables by converting them to proportions, so that they sum to 1 within each row. Also known as simplex projection.
step_rownormalize_tss(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  res = NULL,
  skip = FALSE,
  id = rand_id("rownormalize_tss")
)

## S3 method for class 'step_rownormalize_tss'
tidy(x, ...)
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
---|---|
... | One or more selector functions to choose variables for this step. See selections() for more details. |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_rownormalize_tss object. |
An updated version of recipe with the new step added to the sequence of any existing operations.
Antoine Bichat
rec <-
  recipe(Species ~ ., data = iris) %>%
  step_rownormalize_tss(all_numeric_predictors()) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
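The row normalization described here amounts to dividing each value by the total of its row. A minimal base-R sketch of that arithmetic, independent of scimo, could look like this:

```r
# Numeric predictors of iris as a matrix (samples in rows)
x <- as.matrix(iris[, 1:4])

# Divide every value by the total of its row; rowSums(x) is recycled
# down the columns, so each row is scaled by its own sum
tss <- x / rowSums(x)

head(tss)
rowSums(tss)[1:5]
```

The first flower has measurements 5.1, 3.5, 1.4 and 0.2, which sum to 10.2, so its first proportion is 5.1 / 10.2 = 0.5.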
Select features that exceed a background level in at least a defined number of samples.
step_select_background(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  background_level = NULL,
  n_samples = NULL,
  prop_samples = NULL,
  res = NULL,
  skip = FALSE,
  id = rand_id("select_background")
)

## S3 method for class 'step_select_background'
tidy(x, ...)
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
---|---|
... | One or more selector functions to choose variables for this step. See selections() for more details. |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
background_level | Background level to exceed. |
n_samples, prop_samples | Count or proportion of samples in which a feature must exceed background_level to be kept. |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_select_background object. |
An updated version of recipe with the new step added to the sequence of any existing operations.
Antoine Bichat
rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_select_background(all_numeric_predictors(),
                         background_level = 4, prop_samples = 0.5) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
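The selection rule can be pictured with a few lines of base R. This sketch assumes that "exceed" means strictly greater than the background level and that a feature is kept when the proportion of such samples is at least prop_samples. The package may handle the boundaries differently.

```r
x <- iris[, 1:4]
background_level <- 4
prop_samples <- 0.5

# Proportion of samples in which each feature is above the background level
prop_above <- colMeans(x > background_level)

# Keep the features that are above background in enough samples
kept <- names(prop_above)[prop_above >= prop_samples]

prop_above
kept
```

With these settings, Sepal.Length and Petal.Length pass the filter. Sepal.Width and Petal.Width are rarely or never above 4.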
Select variables with the highest coefficient of variation.
step_select_cv(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  res = NULL,
  skip = FALSE,
  id = rand_id("select_cv")
)

## S3 method for class 'step_select_cv'
tidy(x, ...)
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
---|---|
... | One or more selector functions to choose variables for this step. See selections() for more details. |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
n_kept | Number of variables to keep. |
prop_kept | A numeric value between 0 and 1 representing the proportion of variables to keep. |
cutoff | Threshold beyond which (below or above) the variables are discarded. |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_select_cv object. |
An updated version of recipe with the new step added to the sequence of any existing operations.
Antoine Bichat
rec <-
  recipe(Species ~ ., data = iris) %>%
  step_select_cv(all_numeric_predictors(), n_kept = 2) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
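As a rough illustration of the criterion, the coefficient of variation is the standard deviation divided by the mean. The base-R sketch below ranks the iris predictors by that ratio. Taking the absolute value of the mean is an assumption of this sketch, not a documented behavior of the step.

```r
x <- iris[, 1:4]

# Coefficient of variation of each variable: sd / |mean|
cv <- sapply(x, function(v) sd(v) / abs(mean(v)))

# Keep the two variables with the highest coefficient of variation
n_kept <- 2
kept <- names(sort(cv, decreasing = TRUE))[seq_len(n_kept)]

round(cv, 3)
kept
```

The petal measurements vary much more relative to their mean than the sepal measurements, so they are the two retained variables.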
Select variables with the lowest (adjusted) p-value of a Kruskal-Wallis test against an outcome.
step_select_kruskal(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  outcome = NULL,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  correction = "none",
  res = NULL,
  skip = FALSE,
  id = rand_id("select_kruskal")
)

## S3 method for class 'step_select_kruskal'
tidy(x, ...)
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
---|---|
... | One or more selector functions to choose variables for this step. See selections() for more details. |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
outcome | Name of the variable to perform the test against. |
n_kept | Number of variables to keep. |
prop_kept | A numeric value between 0 and 1 representing the proportion of variables to keep. |
cutoff | Threshold beyond which (below or above) the variables are discarded. |
correction | Multiple testing correction method, such as "fdr". Defaults to "none". |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_select_kruskal object. |
An updated version of recipe with the new step added to the sequence of any existing operations.
Antoine Bichat
rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_select_kruskal(all_numeric_predictors(),
                      outcome = "Species", correction = "fdr",
                      prop_kept = 0.5) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
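The selection logic can be sketched with base R's kruskal.test() and p.adjust(). This is an approximation of the idea, not the package's code. In particular, rounding the number of kept variables with ceiling() is an assumption.

```r
x <- iris[, 1:4]
outcome <- iris$Species

# Kruskal-Wallis p-value of each variable against the outcome
pv <- sapply(x, function(v) kruskal.test(v, outcome)$p.value)

# Adjust for multiple testing with the "fdr" method
qv <- p.adjust(pv, method = "fdr")

# Keep the half of the variables with the lowest adjusted p-values
n_kept <- ceiling(0.5 * length(qv))
kept <- names(sort(qv))[seq_len(n_kept)]

qv
kept
```

On iris, the two petal measurements separate the three species best and are the ones retained.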
Select variables with the lowest (adjusted) p-value of a Wilcoxon-Mann-Whitney test against an outcome.
step_select_wilcoxon(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  outcome = NULL,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  correction = "none",
  res = NULL,
  skip = FALSE,
  id = rand_id("select_wilcoxon")
)

## S3 method for class 'step_select_wilcoxon'
tidy(x, ...)
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
---|---|
... | One or more selector functions to choose variables for this step. See selections() for more details. |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
outcome | Name of the variable to perform the test against. |
n_kept | Number of variables to keep. |
prop_kept | A numeric value between 0 and 1 representing the proportion of variables to keep. |
cutoff | Threshold beyond which (below or above) the variables are discarded. |
correction | Multiple testing correction method, such as "fdr". Defaults to "none". |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_select_wilcoxon object. |
An updated version of recipe with the new step added to the sequence of any existing operations.
Antoine Bichat
rec <-
  iris %>%
  dplyr::filter(Species != "virginica") %>%
  recipe(formula = Species ~ .) %>%
  step_select_wilcoxon(all_numeric_predictors(),
                       outcome = "Species", correction = "fdr",
                       prop_kept = 0.5) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
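The Wilcoxon-Mann-Whitney test compares two groups, which is why the example first removes one species. A base-R sketch of the same selection idea follows. The use of exact = FALSE (to avoid warnings about ties) and the ceiling() rounding are choices made for this illustration only.

```r
# Restrict iris to two species so that the outcome has two groups
two <- iris[iris$Species != "virginica", ]
g <- two$Species

# Wilcoxon-Mann-Whitney p-value of each variable between the two species
pv <- sapply(
  two[, 1:4],
  function(v) wilcox.test(v[g == "setosa"], v[g == "versicolor"],
                          exact = FALSE)$p.value
)

# Adjust for multiple testing and keep half of the variables
qv <- p.adjust(pv, method = "fdr")
n_kept <- ceiling(0.5 * length(qv))
kept <- names(sort(qv))[seq_len(n_kept)]

qv
kept
```

The petal measurements do not overlap at all between setosa and versicolor, so they obtain the lowest p-values.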
Extract clades from a lineage, as defined in the {yatah} package.
step_taxonomy(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  rank = NULL,
  res = NULL,
  keep_original_cols = FALSE,
  skip = FALSE,
  id = rand_id("taxonomy")
)

## S3 method for class 'step_taxonomy'
tidy(x, ...)
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
---|---|
... | One or more selector functions to choose variables for this step. See selections() for more details. |
role | For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
rank | The desired ranks, a combination of taxonomic rank names such as "order" and "genus". |
res | This parameter is only produced after the recipe has been trained. |
keep_original_cols | A logical to keep the original variables in the output. Defaults to FALSE. |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_taxonomy object. |
An updated version of recipe with the new step added to the sequence of any existing operations.
Antoine Bichat
data("cheese_taxonomy")

rec <-
  cheese_taxonomy %>%
  select(asv, lineage) %>%
  recipe(~ .) %>%
  step_taxonomy(lineage, rank = c("order", "genus")) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
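To show what extracting clades from a lineage involves, here is a base-R sketch that parses a single lineage string. The string itself is hypothetical, and its format (clades separated by "|" and prefixed with a rank letter and two underscores) is an assumption about the lineage convention. The step itself relies on the {yatah} package, not on this code.

```r
# Hypothetical lineage string, assuming a "k__...|p__...|..." convention
lineage <- paste(
  "k__Fungi", "p__Ascomycota", "c__Saccharomycetes",
  "o__Saccharomycetales", "f__Debaryomycetaceae", "g__Debaryomyces",
  sep = "|"
)

# Split the lineage into its clades
clades <- strsplit(lineage, "|", fixed = TRUE)[[1]]

# Map the one-letter prefixes to rank names
ranks <- c(k = "kingdom", p = "phylum", c = "class", o = "order",
           f = "family", g = "genus", s = "species")
prefix <- sub("__.*$", "", clades)

# Strip the prefixes and name each clade by its rank
parsed <- setNames(sub("^[a-z]__", "", clades), ranks[prefix])

# Extract the requested ranks
parsed[c("order", "genus")]
```

Selecting by rank name mirrors the rank argument of the step, where one column is created per requested rank.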