| Title: | Lipid Set Enrichment Analysis with Dual KS and 'fgsea' Engines |
|---|---|
| Description: | Provides biology-aware lipid set enrichment analysis (LSEA) for lipidomics data using dual engines: the Kolmogorov-Smirnov test and the fast gene set enrichment algorithm from the 'fgsea' package. Annotates lipids into biological groups at three levels (lipid class, LIPID MAPS category, functional category) and tests for coordinated directional shifts between conditions. Includes fatty acid chain analysis with trend plots weighted by lipid abundance (Spearman rank correlation, configurable smoothing), wide-format chain position output (sn-1, sn-2, sn-3, sn-4), annotation confidence filtering, and export utilities for reproducible reporting in CSV, 'Excel', and PDF formats. Vignettes are available in English and Spanish. Methods are based on Subramanian et al. (2005) <doi:10.1073/pnas.0506580102> and Korotkevich et al. (2021) <doi:10.1101/060012>. |
| Authors: | David Guardamino Ojeda [aut, cre] (ORCID: <https://orcid.org/0000-0002-3122-2218>) |
| Maintainer: | David Guardamino Ojeda <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.1 |
| Built: | 2026-06-17 09:23:48 UTC |
| Source: | https://github.com/davidgo464/easylsea |
Parses lipid names in any format used by lipidomics software (LipidSearch, MS-DIAL, LipidView) and returns a structured data frame with LIPID MAPS canonical classification, chain-level metadata, and optional shorthand notation per Liebisch et al. (2020).
annotate_lipid(
  molecules,
  detail = c("compact", "standard", "full"),
  shorthand = FALSE,
  sn_confirmed = FALSE,
  lyso_explicit = FALSE,
  no_match = c("warn", "remove", "ignore"),
  sphingoid_default = "d"
)
molecules |
Character vector of lipid names to parse. |
detail |
Level of detail in the output table:
|
shorthand |
Logical. If |
sn_confirmed |
Logical. If |
lyso_explicit |
Logical. If |
no_match |
How to handle unparsed names: |
sphingoid_default |
Default sphingoid base prefix for sphingolipids
without explicit prefix. |
A data frame with one row per unique lipid name. Key columns include
Class, lm_category, lm_class_id, annotation_level,
is_ether, is_plasmalogen, is_istd, sphingoid_prefix,
total_cl, total_cs, and optionally shorthand_lm.
Liebisch G et al. Update on LIPID MAPS classification, nomenclature, and shorthand notation for MS-derived lipid structures. J Lipid Res. 2020;61(12):1539-1555. doi:10.1194/jlr.S120001025
Conroy MJ et al. LIPID MAPS: update to databases and tools for the lipidomics community. Nucleic Acids Res. 2024;52(D1):D1677-D1682. doi:10.1093/nar/gkad896
lipids <- c("PC 16:0/18:1", "PC O-18:1/20:4", "Cer d18:1/16:0",
            "TG(16:0/18:1/18:1)", "Lyso PE 18:1(d7)", "plasmenylPE (16:0/18:1)",
            "Sa1P d 18:0", "WE 16:0/18:1", "CoA 16:0", "15-HETE", "PGE2",
            "LTB4", "Resolvin D1", "12(13)-EpOME")
annotate_lipid(lipids)
annotate_lipid(lipids, detail = "standard")
annotate_lipid(lipids, detail = "full", shorthand = TRUE)
Assigns lipid class (e.g. PC, TG, Cer), full class name, LIPID MAPS
structural category, and functional category to each lipid in data.
Returns the input data.frame with annotation columns appended, ready for
use in run_lsea and parse_lipid_chains.
annotate_lipids(
  data,
  lipid_col = "LipidName",
  shorthand_col = "Shorthand",
  method = c("internal", "lipidAnnotator"),
  verbose = TRUE
)
data |
A |
lipid_col |
Character(1). Name of the column containing lipid
identifiers. Default: "LipidName". |
shorthand_col |
Character(1) or |
method |
Character(1). Annotation method:
|
verbose |
Logical(1). Print annotation summary (class distribution
and count of unclassified lipids). Default: TRUE. |
The input data.frame with five columns appended:
LipidClass: Abbreviated class (e.g. "PC", "TG", "Cer").
LipidClass_Full: Descriptive class name (e.g. "Ceramide", "Ether-PC").
LipidCategory_LMAPS: LIPID MAPS structural category (e.g. "Glycerophospholipids", "Sphingolipids").
LipidCategory_functional: Functional category, with Oxylipins and Bile Acids as standalone groups rather than nested under Fatty Acyls.
LipidCategory: Simplified category for plotting; same as LipidCategory_functional except that Saccharolipids are shown as "Glycolipids".
Lipids that cannot be classified receive LipidClass = "Unknown".
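The relationship between the functional and plotting categories can be pictured with a small base R sketch. The category values below are illustrative only, and the code is not the package's internal implementation; it only mirrors the documented rule that Saccharolipids are relabelled as "Glycolipids".

```r
# Illustrative functional categories (hypothetical input values).
functional <- c("Glycerophospholipids", "Saccharolipids", "Oxylipins", "Bile Acids")

# Plotting category: identical to the functional category,
# except that Saccharolipids are shown as "Glycolipids".
plotting <- ifelse(functional == "Saccharolipids", "Glycolipids", functional)

data.frame(
  LipidCategory_functional = functional,
  LipidCategory = plotting,
  stringsAsFactors = FALSE
)
```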
df <- data.frame(
  LipidName = c("PC 36:2", "TG(54:3)", "SM d18:1/16:0",
                "Cer(d18:1/24:0)", "LPC 18:0", "CE 18:1"),
  logFC = c(1.2, -0.8, 0.5, -1.1, 0.3, 0.9),
  stringsAsFactors = FALSE
)
annotated <- annotate_lipids(df)
annotated[, c("LipidName", "LipidClass", "LipidCategory")]
Returns the default list that maps lipid classes to their parsing strategy.
Pass the output of this function as the cls_config argument of
parse_lipid_chains() to override individual entries.
default_chain_config()
Named list with elements sn2, nacyl, long,
single, and excl.
One-call interface to the complete easyLSEA workflow:
lipid annotation, KS and/or fgsea enrichment across three biological
levels (class, LIPID MAPS category, functional category), and fatty
acid chain analysis. Returns a structured easyLSEA_result object
that can be plotted and exported.
easyLSEA(
  data,
  lipid_col = "LipidName",
  fc_col = "logFC",
  pval_col = "P.Value",
  case_lbl = "Case",
  ref_lbl = "Reference",
  engine = c("both", "ks", "fgsea"),
  annotator = c("internal", "lipidAnnotator"),
  run_chains = TRUE,
  min_rank = "E",
  group_cols = NULL,
  min_n = 3L,
  n_perm = 2000L,
  fgsea_nperm = 10000L,
  plots = TRUE,
  bubble_label = c("FDR", "DS", "NES", "n"),
  output = c("combined", "separate"),
  seed = 42L,
  verbose = TRUE
)
data |
A |
lipid_col |
Character(1). Name of the lipid identifier column.
Default: "LipidName". |
fc_col |
Character(1). Name of the log2 fold-change column.
Default: "logFC". |
pval_col |
Character(1) or |
case_lbl |
Character(1). Label for the case group, used in output
tables and plot titles. Default: "Case". |
ref_lbl |
Character(1). Label for the reference group.
Default: "Reference". |
engine |
Character(1). Enrichment engine: |
annotator |
Character(1). Lipid annotation method:
|
run_chains |
Logical(1). Whether to run fatty acid chain analysis
in addition to LSEA. Default: TRUE. |
min_rank |
Character(1). Minimum confidence rank for chain analysis.
Ranks are ordered |
group_cols |
Character vector. Grouping columns to test in LSEA.
If |
min_n |
Integer(1). Minimum set size to test. Default: 3L. |
n_perm |
Integer(1). KS permutations for |
fgsea_nperm |
Integer(1). fgsea Monte Carlo permutations.
Default: 10000L. |
plots |
Logical(1). Whether to generate ggplot2 objects.
Set to |
bubble_label |
Character vector. Which statistics to show next to
each bubble in the LSEA bubble plots. Any subset of |
output |
Character(1). Return format when both modules run:
|
seed |
Integer(1) or |
verbose |
Logical(1). Print progress messages. Default: TRUE. |
An object of class easyLSEA_result: a named list with
five slots.
$meta: Named list: call, date, labels, engine, counts.
$lsea: Named list: results (data.frame with KS and/or fgsea statistics), combined (merged table with Convergence column).
$chains: Named list: parsed and summary from parse_lipid_chains, or NULL if run_chains = FALSE.
$plots: Named list of ggplot objects, or NULL if plots = FALSE.
$input: Named list: data (annotated input), group_cols.
When output = "separate", returns
list(lsea = ..., chains = ...) instead.
annotate_lipids for standalone annotation,
run_lsea for the enrichment engine,
parse_lipid_chains for chain analysis,
plot_lsea, plot_chains,
export_lsea() to save results.
data("lipid_example", package = "easyLSEA")
result <- easyLSEA(
  data = lipid_example,
  lipid_col = "LipidName",
  fc_col = "logFC",
  case_lbl = "NASH",
  ref_lbl = "Control",
  engine = "ks",
  plots = FALSE
)
print(result)
head(result$lsea$results)
Saves the contents of an easyLSEA result object to a
timestamped output folder. Supported formats: CSV tables, a multi-sheet
Excel workbook, PDF or PNG plots, and a standalone HTML report.
Any combination of formats can be requested in a single call.
export_lsea(
  result,
  dir,
  prefix = "easyLSEA",
  format = c("csv", "excel", "pdf"),
  overwrite = FALSE,
  plot_width = NULL,
  plot_height = NULL,
  plot_dpi = 300L,
  verbose = TRUE
)
result |
An |
dir |
Character(1). Base directory where the output folder will be
created. Required: there is no default, so the function never writes
to the working directory, the package directory, or the user's home
filespace unless the caller explicitly provides a location. For
examples, tests, or throwaway output, pass |
prefix |
Character(1). Prefix for the output folder name. The folder
is named |
format |
Character vector. One or more of |
overwrite |
Logical(1). If |
plot_width |
Numeric(1) or |
plot_height |
Numeric(1) or |
plot_dpi |
Integer(1). Resolution for PNG output. Default: 300L. |
verbose |
Logical(1). Print progress messages. Default: TRUE. |
<prefix>_<YYYY-MM-DD>/
  tables/
    lsea_results_ks.csv
    lsea_results_fgsea.csv
    lsea_combined.csv
    chain_results.csv
    chain_parsed.csv
    chain_wide.csv
  plots/
    lsea/
      bubble_ks.pdf
      bubble_fgsea.pdf
    chains/
      tile/
        tile_PC.pdf
        tile_TG.pdf ...
      trend/
        trend_length_PC.pdf
        trend_unsat_PC.pdf ...
  results.xlsx
  report.html
Excel export requires openxlsx (install.packages("openxlsx")).
HTML export requires rmarkdown and knitr.
Invisibly returns a named character vector of all file paths created. Useful for programmatic use or verification.
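As a rough illustration of the dated folder layout and the path vector described above, the following base R sketch builds a folder name of the form prefix_YYYY-MM-DD under tempdir() and records one written file. The helper `build_output_dir()` is hypothetical and is not part of easyLSEA; the real function may name or organise files differently.

```r
# Hypothetical helper: compose "<dir>/<prefix>_<YYYY-MM-DD>".
build_output_dir <- function(dir, prefix = "easyLSEA", date = Sys.Date()) {
  file.path(dir, paste0(prefix, "_", format(date, "%Y-%m-%d")))
}

# A fixed date keeps the sketch reproducible; tempdir() keeps it throwaway.
out_dir <- build_output_dir(tempdir(), date = as.Date("2026-06-17"))
dir.create(file.path(out_dir, "tables"), recursive = TRUE, showWarnings = FALSE)

# Write one illustrative table and collect its path in a named character vector.
csv_path <- file.path(out_dir, "tables", "lsea_results_ks.csv")
write.csv(data.frame(Group = "PC", padj = 0.01), csv_path, row.names = FALSE)
paths <- c(lsea_results_ks = csv_path)
paths
```

Because the base directory is always passed in explicitly, nothing is written outside the location the caller chose.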
easyLSEA, run_lsea,
parse_lipid_chains
data("lipid_example", package = "easyLSEA")
result <- suppressWarnings(suppressMessages(easyLSEA(
  data = lipid_example,
  engine = "ks",
  n_perm = 100L,
  plots = FALSE,
  verbose = FALSE
)))

# Export CSV and PDF to a temporary folder
paths <- export_lsea(result, dir = tempdir(), format = c("csv", "pdf"))
paths
A synthetic dataset of 200 lipid species simulating a case vs control lipidomics comparison, with known enrichment patterns built in: PC and PE species are enriched in the case group, TG species are depleted. Used in package examples and tests.
lipid_example
A data.frame with 200 rows and 6 columns:
Character. Lipid identifier in shorthand notation (e.g. "PC 36:2").
Character. Pre-assigned lipid class abbreviation.
Numeric. Log2 fold change (case / control).
Numeric. Raw p-value from simulated differential analysis.
Numeric. Benjamini-Hochberg adjusted p-value.
Integer. 1 if adj.P.Val < 0.05 and |logFC| > log2(1.25), 0 otherwise.
Simulated data. See data-raw/lipid_example.R for the
generation script. Seed: 2026.
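The significance flag described above (adjusted p-value below 0.05 and absolute logFC above log2(1.25)) can be reproduced on any table with the same columns. The sketch below simulates its own small table rather than reusing the package's generation script, and the flag column name `is_sig` is a placeholder, since the actual column name is not shown here.

```r
set.seed(1)  # arbitrary seed for this sketch only
n <- 200
sim <- data.frame(
  logFC   = rnorm(n, mean = 0, sd = 0.6),
  # A handful of very small p-values so that some rows survive adjustment.
  P.Value = c(runif(20, min = 0, max = 1e-4), runif(n - 20))
)

# Benjamini-Hochberg adjustment, then the two-part significance rule.
sim$adj.P.Val <- p.adjust(sim$P.Value, method = "BH")
sim$is_sig <- as.integer(sim$adj.P.Val < 0.05 & abs(sim$logFC) > log2(1.25))

table(sim$is_sig)
```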
Applies biology-aware chain parsing to each lipid in data,
routing each species to the appropriate parser based on its lipid class:
sn-2 (PC, PE, PE O), N-acyl (SM, Cer, HexCer, GlcCer, Hex2Cer, Hex3Cer),
long-format (TG, DG, PS, PG, PA, PI, CL),
single-chain (LPC, LPE, LPI, LPG, LPA, LPS, CAR, FFA, FA, CE), or excluded.
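The class-based routing can be sketched as a simple lookup. The list below reuses the element names documented for default_chain_config() (sn2, nacyl, long, single, excl) and the class memberships listed above, but treating each element as a character vector of class names is an assumption, as is sending unlisted classes to "excl". `route_class()` is a hypothetical helper, not a package function.

```r
# Hypothetical stand-in for a chain configuration.
chain_config <- list(
  sn2    = c("PC", "PE", "PE O"),
  nacyl  = c("SM", "Cer", "HexCer", "GlcCer", "Hex2Cer", "Hex3Cer"),
  long   = c("TG", "DG", "PS", "PG", "PA", "PI", "CL"),
  single = c("LPC", "LPE", "LPI", "LPG", "LPA", "LPS", "CAR", "FFA", "FA", "CE"),
  excl   = character(0)
)

route_class <- function(lipid_class, config = chain_config) {
  # Flatten the config into a named vector: names are classes, values are strategies.
  lookup <- rep(names(config), lengths(config))
  names(lookup) <- unlist(config, use.names = FALSE)
  strategy <- unname(lookup[lipid_class])
  # Assumption: classes missing from the config are treated as excluded.
  strategy[is.na(strategy)] <- "excl"
  strategy
}

route_class(c("PC", "Cer", "TG", "LPC", "XYZ"))
```

Overriding a single entry then amounts to editing one element of the list before passing it on, which matches the documented purpose of the cls_config argument.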
parse_lipid_chains(
  data,
  lipid_col = "LipidName",
  class_col = "LipidClass",
  shorthand_col = "Shorthand",
  rank_col = "Confidence_rank",
  min_rank = "E",
  cls_config = default_chain_config()
)
data |
A |
lipid_col |
Character(1). Name of the lipid identifier column.
Default: "LipidName". |
class_col |
Character(1). Name of the lipid class column (must contain
abbreviated class names such as "PC", "TG", "SM"). Default:
"LipidClass". |
shorthand_col |
Character(1) or |
rank_col |
Character(1) or |
min_rank |
Character(1). Minimum confidence rank to include in
analysis. Ranks are ordered |
cls_config |
Named list from |
A named list with two elements:
parsed: Long-format data.frame with one row per chain observation. Contains all columns from data plus chain fields (analysis_chain_cl, analysis_chain_cs, chain_type, etc.).
summary: Per-lipid parsing log data.frame with columns LipidName, LipidClass, Confidence_rank, status, and chain_type.
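To make the "one row per chain observation" shape concrete, here is a deliberately naive base R sketch that extracts every carbons:double-bonds token from a lipid name. It ignores the class-specific rules the package applies, and the function name and the output columns `chain_index`, `carbons`, and `double_bonds` are hypothetical rather than the package's own field names.

```r
parse_chains_long <- function(lipid_name) {
  # Pull every "carbons:double_bonds" token, e.g. "16:0" and "18:1".
  tokens <- regmatches(lipid_name, gregexpr("[0-9]+:[0-9]+", lipid_name))[[1]]
  parts <- strsplit(tokens, ":", fixed = TRUE)
  data.frame(
    LipidName    = rep(lipid_name, length(tokens)),
    chain_index  = seq_along(tokens),
    carbons      = as.integer(vapply(parts, `[`, character(1), 1)),
    double_bonds = as.integer(vapply(parts, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

# Stack the per-lipid tables into one long-format data.frame.
long <- do.call(
  rbind,
  lapply(c("PC 16:0/18:1", "TG(16:0/18:1/18:1)"), parse_chains_long)
)
long
```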
default_chain_config, plot_chains()
data("lipid_example", package = "easyLSEA")
annotated <- annotate_lipids(lipid_example)
chains <- parse_lipid_chains(annotated)
head(chains$parsed)
head(chains$summary)
Produces tile and trend plots for each lipid class with sufficient
chain observations. Returns a named list of ggplot
objects; does not write files. Use export_lsea() to save.
plot_chains(
  chains_result,
  case_lbl = "Case",
  ref_lbl = "Reference",
  fdr_thresh = 0.05,
  min_n_tile = 4L,
  min_n_trend = 5L,
  smooth_method = c("loess", "lm"),
  smooth_span = 0.75,
  smooth_weighted = TRUE,
  smooth_se = TRUE,
  show_points = TRUE,
  tile_label = c("both", "n", "sig", "none"),
  trend_test = c("spearman", "lm", "none"),
  trend_x_step_length = 2L,
  trend_x_step_unsat = 1L
)
chains_result |
Named list returned by |
case_lbl |
Character(1). Label for the case group. Default:
"Case". |
ref_lbl |
Character(1). Label for the reference group. Default:
"Reference". |
fdr_thresh |
Numeric(1). FDR threshold to colour individual lipid
points in trend plots (red = FDR sig, grey = NS) and to label
significant counts in tile cells. Default: 0.05. |
min_n_tile |
Integer(1). Minimum chain observations per class to
produce a tile plot. Default: 4L. |
min_n_trend |
Integer(1). Minimum chain observations per class to
produce trend plots. Default: 5L. |
smooth_method |
Character(1). Smoothing method for trend plots.
|
smooth_span |
Numeric(1). Span for loess smoothing (only used when
|
smooth_weighted |
Logical(1). If |
smooth_se |
Logical(1). Whether to display the 95% confidence
interval ribbon around the smoothing curve. Default: TRUE. |
show_points |
Logical(1). Whether to display individual lipid points
in trend plots, coloured by FDR significance. Default: TRUE. |
tile_label |
Character(1). What to display inside each tile cell:
|
trend_test |
Character(1). Statistical test to annotate on trend plots.
|
trend_x_step_length |
Integer(1) or |
trend_x_step_unsat |
Integer(1) or |
Named list of ggplot objects with elements
tile_<CLASS>, trend_length_<CLASS>,
trend_unsat_<CLASS>.
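The statistics behind a trend plot can be approximated in base R. The sketch below runs a Spearman rank correlation between chain length and logFC (the "spearman" trend test) and fits an abundance-weighted straight line, which corresponds to the "lm" smoothing option rather than the loess default. The simulated data and the `abundance` column are hypothetical, and the package's own computation may differ.

```r
set.seed(1)
chains <- data.frame(
  chain_length = rep(seq(14, 24, by = 2), each = 5),
  abundance    = runif(30, min = 1, max = 100)  # hypothetical weights
)
# Simulated upward trend of logFC with chain length.
chains$logFC <- 0.1 * (chains$chain_length - 18) + rnorm(30, sd = 0.3)

# Spearman rank correlation; exact = FALSE because chain lengths are tied.
trend <- cor.test(chains$chain_length, chains$logFC,
                  method = "spearman", exact = FALSE)

# Abundance-weighted linear fit (the "lm" smoother, not loess).
fit <- lm(logFC ~ chain_length, data = chains, weights = abundance)

c(rho = unname(trend$estimate), p = trend$p.value,
  slope = coef(fit)[["chain_length"]])
```

Weighting by abundance lets highly abundant species pull the fitted line more strongly than trace species.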
parse_lipid_chains, export_lsea()
Produces a boxplot of logFC distributions for each lipid set, with jittered
individual lipid points. Significant sets are drawn with red borders and
labelled with FDR/DS/NES. When engine = "both" (KS + fgsea),
fill colour encodes convergence (KS only, fgsea only, or KS+fgsea).
plot_distribution(
  data,
  lsea_result,
  group_col,
  fc_col = "logFC",
  case_lbl = "Case",
  ref_lbl = "Control",
  fdr_thresh = 0.05,
  min_n = 3L,
  sig_only = FALSE,
  label_angle = 0
)
data |
A |
lsea_result |
A named list as returned by |
group_col |
Character(1). Grouping column name
(e.g. |
fc_col |
Character(1). Column with log fold-change values.
Default: "logFC". |
case_lbl |
Character(1). Label for the case group. Default:
"Case". |
ref_lbl |
Character(1). Label for the reference group. Default:
"Control". |
fdr_thresh |
Numeric(1). FDR threshold for significance.
Default: 0.05. |
min_n |
Integer(1). Minimum number of lipids per set to include.
Default: 3L. |
sig_only |
Logical(1). If |
label_angle |
Numeric(1). Angle for FDR labels. |
A ggplot object, or NULL if no groups pass
min_n.
Produces bubble, barplot, and running sum plots from a run_lsea()
result. Returns a named list of ggplot objects.
plot_lsea(
  lsea_result,
  which = c("bubble_ks", "bubble_fgsea", "bubble_combined", "barplot", "running_sum"),
  fdr_thresh = 0.05,
  case_lbl = "Case",
  ref_lbl = "Reference",
  bubble_label = c("FDR", "DS", "NES", "n")
)
lsea_result |
Named list returned by |
which |
Character vector. Which plots to generate:
|
fdr_thresh |
Numeric(1). Significance threshold for highlighting.
Default: 0.05. |
case_lbl |
Character(1). Case label for plot annotations. |
ref_lbl |
Character(1). Reference label for plot annotations. |
bubble_label |
Character vector. Which statistics to display next to
each bubble. Any subset of |
Named list of ggplot objects.
run_lsea, export_lsea()
Print method for easyLSEA_result
## S3 method for class 'easyLSEA_result'
print(x, ...)
x |
An |
... |
Ignored. |
Invisibly returns the input easyLSEA_result object
(x). Called for its side effect of printing a formatted
summary of the enrichment results to the console.
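The "print for the side effect, return the object invisibly" convention can be illustrated with a toy S3 class. The class name `demo_result` and its contents are hypothetical and unrelated to the real easyLSEA_result structure.

```r
# Toy print method: write a one-line summary, then hand the object back invisibly.
print.demo_result <- function(x, ...) {
  cat("<demo_result>", length(x$results), "sets tested\n")
  invisible(x)
}

res <- structure(
  list(results = c(PC = 0.01, TG = 0.20)),
  class = "demo_result"
)

# withVisible() exposes both the returned value and its visibility flag.
returned <- withVisible(print(res))
returned$visible
```

Returning the object invisibly keeps the console clean while still allowing the call to be used inside a pipeline or assignment.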
Runs KS-based LSEA, fgsea, or both for each grouping level in
group_cols and returns a tidy data.frame with enrichment
statistics.
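As a conceptual illustration of KS-based set enrichment, the sketch below compares the logFC distribution of each lipid class against all remaining lipids with a two-sample Kolmogorov-Smirnov test and then applies Benjamini-Hochberg correction. It uses the analytic p-value from stats::ks.test() instead of the permutation scheme controlled by n_perm, so it is a simplified stand-in rather than the package's algorithm. The helper `ks_one_set()` and the `direction` column are hypothetical.

```r
set.seed(42)
lipids <- data.frame(
  LipidClass = rep(c("PC", "TG", "Cer"), times = c(30, 30, 40)),
  logFC      = c(rnorm(30, mean = 0.8), rnorm(30, mean = -0.8), rnorm(40, mean = 0)),
  stringsAsFactors = FALSE
)

ks_one_set <- function(data, group, group_col = "LipidClass", fc_col = "logFC") {
  in_set <- data[[group_col]] == group
  fc_in  <- data[[fc_col]][in_set]
  fc_out <- data[[fc_col]][!in_set]
  test <- ks.test(fc_in, fc_out)  # set members vs all other lipids
  data.frame(
    Group = group,
    n = sum(in_set),
    D = unname(test$statistic),
    p = test$p.value,
    # Sign of the median shift gives the direction of the enrichment.
    direction = sign(median(fc_in) - median(fc_out)),
    stringsAsFactors = FALSE
  )
}

res <- do.call(rbind, lapply(unique(lipids$LipidClass), ks_one_set, data = lipids))
res$padj <- p.adjust(res$p, method = "BH")
res
```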
run_lsea(
  data,
  group_cols = c("LipidClass", "LipidCategory_LMAPS", "LipidCategory_functional"),
  fc_col = "logFC",
  pval_col = "P.Value",
  lipid_id_col = NULL,
  case_lbl = "Case",
  ref_lbl = "Reference",
  engine = c("both", "ks", "fgsea"),
  fgsea_rank = c("pi_value", "logFC", "t_stat"),
  min_n = 3L,
  n_perm = 2000L,
  fgsea_nperm = 10000L,
  fgsea_eps = 0,
  seed = 42L,
  verbose = TRUE
)
data |
A |
group_cols |
Character vector. Names of grouping columns to test.
Each column defines one level of analysis (e.g. class, LIPID MAPS
category, functional category). Default:
|
fc_col |
Character(1). Log2 fold-change column. Default:
"logFC". |
pval_col |
Character(1) or |
lipid_id_col |
Character(1) or |
case_lbl |
Character(1). Case group label. Default: "Case". |
ref_lbl |
Character(1). Reference group label. Default:
"Reference". |
engine |
Character(1). Enrichment engine: |
fgsea_rank |
Character(1). Rank metric for fgsea: |
min_n |
Integer(1). Minimum set size to test. Default: 3L. |
n_perm |
Integer(1). KS permutations for |
fgsea_nperm |
Integer(1). fgsea Monte Carlo permutations.
Default: 10000L. |
fgsea_eps |
Numeric(1). fgsea epsilon (0 = reduce approximation
error). Default: 0. |
seed |
Integer(1) or |
verbose |
Logical(1). Print progress messages. Default: TRUE. |
A named list with elements:
ks: data.frame of KS results (or NULL if engine = "fgsea").
fgsea: data.frame of fgsea results (or NULL if engine = "ks" or fgsea is not installed).
combined: data.frame merging both engines by Group and Level, including a Convergence column.
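How a merged table with a convergence label could be assembled is sketched below in base R. The join keys Group and Level come from the description above; the column names `padj_ks` and `padj_fgsea`, the 0.05 cutoff, and the "none" label are assumptions made for illustration, and the values are invented.

```r
ks <- data.frame(
  Group = c("PC", "TG", "Cer"), Level = "LipidClass",
  padj_ks = c(0.01, 0.03, 0.40), stringsAsFactors = FALSE
)
fg <- data.frame(
  Group = c("PC", "TG", "Cer"), Level = "LipidClass",
  padj_fgsea = c(0.02, 0.20, 0.04), stringsAsFactors = FALSE
)

# Full outer join so a set tested by only one engine is kept.
combined <- merge(ks, fg, by = c("Group", "Level"), all = TRUE)

# Missing values count as "not significant" for that engine.
sig_ks <- !is.na(combined$padj_ks) & combined$padj_ks < 0.05
sig_fg <- !is.na(combined$padj_fgsea) & combined$padj_fgsea < 0.05

combined$Convergence <- ifelse(
  sig_ks & sig_fg, "KS+fgsea",
  ifelse(sig_ks, "KS only", ifelse(sig_fg, "fgsea only", "none"))
)
combined
```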
Korotkevich G, Sukhov V, Budin N, Shpak B, Artyomov MN, Sergushichev A (2021). Fast gene set enrichment analysis. bioRxiv. doi:10.1101/060012
Xiao Y, Hsiao TH, Suresh U, Chen HI, Wu X, Wolf SE, Chen Y (2014). A novel significance score for gene selection and ranking. Bioinformatics, 30(6), 801–807. doi:10.1093/bioinformatics/btr671
annotate_lipids(), plot_lsea,
export_lsea()
data("lipid_example", package = "easyLSEA")
annotated <- annotate_lipids(lipid_example)
result <- run_lsea(
  data = annotated,
  fc_col = "logFC",
  engine = "ks",
  case_lbl = "NASH",
  ref_lbl = "Control",
  n_perm = 100L
)
head(result$ks)
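For orientation on the "pi_value" ranking option, the significance score of Xiao et al. (2014) multiplies the log fold change by the negative log10 of the p-value. The sketch below computes it on invented numbers; whether easyLSEA applies exactly this formula internally is an assumption, and `pi_value()` here is a local helper rather than a package function.

```r
# pi-value: log fold change scaled by -log10(p).
pi_value <- function(logFC, p) logFC * -log10(p)

de <- data.frame(
  LipidName = c("PC 36:2", "TG 54:3", "Cer d18:1/24:0"),
  logFC     = c(1.2, -0.8, -1.1),
  P.Value   = c(0.001, 0.05, 0.5),
  stringsAsFactors = FALSE
)
de$pi_value <- pi_value(de$logFC, de$P.Value)

# Rank from most up-regulated to most down-regulated.
ranked <- de[order(de$pi_value, decreasing = TRUE), ]
ranked
```

A large fold change with a weak p-value (the ceramide here) ends up ranked closer to zero than a smaller but better-supported change, which is the point of combining the two quantities.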
Summary method for easyLSEA_result
## S3 method for class 'easyLSEA_result'
summary(object, padj_cutoff = 0.05, ...)
object |
An |
padj_cutoff |
Numeric(1). FDR threshold for significant sets.
Default: 0.05. |
... |
Ignored. |
Invisibly returns the input easyLSEA_result object
(object). Called for its side effect of printing a summary
table of the significant lipid sets to the console.