Jackknife isoform switching analysis on TSENATAnalysis object
Source: R/s4_functions_jis.R
calculate_jis.Rd
Wrapper around .calculate_jis() that manages a TSENATAnalysis object. Identifies transcripts with significant isoform switching patterns using jackknife resampling across samples to detect influential isoforms.
Usage
calculate_jis(
analysis,
condition_col = NULL,
subject_col = NULL,
gene_col = NULL,
isoform_col = NULL,
q = c(0, 0.5, 1, 1.5, 2),
norm = NULL,
log_base = NULL,
threshold = 90,
nboot = 1000,
pseudocount = NULL,
lm_results = NULL,
lm_p_threshold = 0.05,
use_lm_fdr = TRUE,
output_file = NULL,
verbose = FALSE,
...
)
Arguments
- analysis
TSENATAnalysis object containing @se (SummarizedExperiment with count data) and @config (configuration metadata).
- condition_col
character. Column name in colData(se) specifying group assignments. Default: NULL, in which case the value is taken from @config and then falls back to 'sample_type'.
- subject_col
character. Optional column for paired/repeated measures designs. If provided, enables paired analysis. Default: NULL (unpaired).
- gene_col
character. Column name in rowData(se) or metadata identifying genes. Default: NULL, in which case 'gene' or 'Gene' is auto-detected.
- isoform_col
character. Column name in rowData(se) or metadata identifying isoforms/transcripts. Default: NULL, in which case 'transcript', 'isoform', or 'Isoform' is auto-detected.
- q
numeric. Tsallis entropy parameter(s) to analyze. Can be a single value or a vector for multi-q analysis (default: c(0, 0.5, 1, 1.5, 2)). If NULL, uses @config$q.
- norm
logical. Whether to use normalized diversity values. The signature default is NULL; the documented effective default is TRUE.
- log_base
numeric. Logarithm base for entropy calculations. Default: NULL (uses e).
- threshold
numeric. Percentile threshold for detecting transcript switching (default: 90). Transcripts with delta_influence at or above this percentile are classified as 'switching'.
- nboot
integer. Number of bootstrap resamples for confidence intervals (default: 1000).
- pseudocount
numeric. Pseudocount value for count regularization. Default: NULL (uses @config$pseudocount or 0).
- lm_results
data.frame. Optional LM interaction results used to filter genes. If provided, only genes in lm_results are analyzed.
- lm_p_threshold
numeric. P-value threshold for filtering genes from lm_results (default: 0.05).
- use_lm_fdr
logical. If TRUE, uses adjusted p-values from lm_results (default: TRUE).
- output_file
character or NULL. Optional file path to save results. Supported formats: .tsv, .csv, .txt (for tables), or .rds (for S4 objects). Default: NULL (no file output).
When a text format is specified, two files are generated:
output_file: Gene-level summary (one row per gene with switching statistics)
output_file_transcripts.ext: Transcript-level details (one row per transcript with p-values and FDR)
For .rds format, only the full analysis object is saved.
- verbose
logical. Print progress messages (default: FALSE).
- ...
Additional arguments for future extensibility.
Value
TSENATAnalysis object with jackknife results stored in the @jackknife_results slot. Results are keyed by q-value (e.g., 'q_1.00'). For multi-q analysis, multiple calls accumulate results in the slot.
The analysis object is returned visibly to support method chaining:
analysis <- calculate_jis(analysis, q = 0.5)
analysis <- calculate_jis(analysis, q = 1.0)
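The key format can be illustrated with a small sketch. The helper q_key() below is hypothetical and not part of TSENAT; the two-decimal format is inferred only from the 'q_1.00' example above, and the slot access in the final comment assumes @jackknife_results behaves like a named list.

```r
# Hypothetical helper (not a TSENAT function): builds a key such as "q_1.00"
# from a numeric q. The two-decimal format is an assumption based on the
# 'q_1.00' example in this page.
q_key <- function(q) sprintf("q_%.2f", q)

keys <- q_key(c(0.5, 1))
print(keys)

# Assuming the slot is a named list, results for q = 1 might be read as:
# analysis@jackknife_results[[q_key(1)]]
```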
Details
Key Features:
- Jackknife resampling: Robust outlier detection across all samples
- Delta-influence metric: Measures how much each isoform drives phenotype
- Confidence intervals: Bootstrap-based uncertainty quantification
- Multi-q analysis: Tests across the full q-spectrum simultaneously
- LM filtering: Optional restriction to genes with significant interactions
- Paired designs: Supports repeated measures/longitudinal data
**Parameter Resolution from Config**
The following parameters are resolved using a three-level priority system:
1. User-provided argument (if not NULL)
2. Value from analysis@config (if key exists)
3. Function default value
Affected parameters:
- q: Multi-q vector c(0, 0.5, 1, 1.5, 2) if not provided, or @config$q if available
- nboot: Uses @config$nboot if available, else 1000
- threshold: Uses @config$threshold if available, else 90
- lm_p_threshold: Uses @config$lm_p_threshold if available, else 0.05
This allows setting defaults once in the config and reusing across multiple analyses.
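The three-level priority can be sketched as a small resolver. This is an illustration only: resolve_param() is a hypothetical name, and the config is modelled as a plain named list rather than the actual @config structure.

```r
# Hypothetical resolver (not the TSENAT implementation): returns the user
# value if given, else the config entry if present, else the default.
resolve_param <- function(user_value, config, key, default) {
  if (!is.null(user_value)) return(user_value)       # 1. explicit argument
  if (!is.null(config[[key]])) return(config[[key]]) # 2. value from config
  default                                            # 3. function default
}

# Assumed stand-in for analysis@config
config <- list(nboot = 500)

from_config  <- resolve_param(NULL, config, "nboot", 1000)
from_user    <- resolve_param(200, config, "nboot", 1000)
from_default <- resolve_param(NULL, config, "threshold", 90)
```

Here from_config picks up the config entry, from_user keeps the explicit argument, and from_default falls through to the function default because the config has no threshold key.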
**Mathematical Background:** Delta-influence measures how much removing each sample changes entropy:
Delta = H_q(leave-one-out) - H_q(original)
A high |Delta| for specific isoforms indicates that those isoforms drive the differences. The same quantity flags 'outlier samples' in which isoforms contribute unusually strongly.
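A minimal base-R sketch of the leave-one-out idea follows. It is not the package's implementation: the function names, the toy count matrix, and the choice to pool counts across samples before computing isoform proportions are all assumptions made for illustration. The entropy uses the standard Tsallis form, with the Shannon limit at q = 1.

```r
# Tsallis entropy of a probability vector p; the q = 1 case is the Shannon
# limit (natural log). Hypothetical helper, not a TSENAT function.
tsallis_entropy <- function(p, q) {
  p <- p[p > 0]
  if (isTRUE(all.equal(q, 1))) {
    -sum(p * log(p))
  } else {
    (1 - sum(p^q)) / (q - 1)
  }
}

# Assumption: isoform proportions come from counts pooled across samples.
pooled_entropy <- function(m, q) {
  totals <- rowSums(m)
  tsallis_entropy(totals / sum(totals), q)
}

# Toy data for one gene: 2 isoforms x 4 samples; s4 is dominated by iso_A.
counts <- matrix(c(10, 10, 12, 9, 11, 10, 50, 2), nrow = 2,
                 dimnames = list(c("iso_A", "iso_B"), paste0("s", 1:4)))

h_full <- pooled_entropy(counts, q = 1)

# Delta = H_q(leave-one-out) - H_q(original), one value per removed sample
delta <- vapply(seq_len(ncol(counts)), function(j) {
  pooled_entropy(counts[, -j, drop = FALSE], q = 1) - h_full
}, numeric(1))
names(delta) <- colnames(counts)

print(round(delta, 3))
```

In this toy example, removing s4 changes the entropy most, because s4 is the only sample in which one isoform strongly dominates.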
**Example Use Case:**
- A sample shows high Delta for isoform X → X has outsized importance in that sample
- Transcripts are classified as 'switching' if their Delta falls in the top percentile (e.g., at or above the 90th)
- This reveals condition-specific isoforms that are crucial for phenotype determination
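The percentile rule can be sketched with base R's quantile(). The Delta values below are made up for illustration, and ranking by absolute Delta is an assumption; the package may rank signed values or use a different quantile definition.

```r
# Made-up Delta values for ten transcripts of one gene (illustration only)
delta <- c(tx1 = 0.02, tx2 = -0.01, tx3 = 0.30, tx4 = 0.05, tx5 = -0.03,
           tx6 = 0.01, tx7 = 0.04, tx8 = -0.02, tx9 = 0.00, tx10 = 0.06)

threshold <- 90  # percentile, mirroring the threshold argument

# Cutoff at the requested percentile of |Delta| (assumed ranking quantity)
cutoff <- quantile(abs(delta), probs = threshold / 100, names = FALSE)

# Transcripts at or above the cutoff are labelled 'switching'
switching <- abs(delta) >= cutoff
switching_ids <- names(delta)[switching]
print(switching_ids)
```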
**Automatic Setup:**
This wrapper automatically:
1. Extracts SummarizedExperiment from @se slot
2. Detects condition_col, gene_col, isoform_col from colData/rowData or @config
3. Calls .calculate_jis() with extracted parameters
**Parameter Auto-Detection:**
- condition_col: Uses explicit parameter, then @config, then 'sample_type'
- gene_col: Uses explicit parameter, then looks for 'gene' or 'Gene'
- isoform_col: Uses explicit parameter, then looks for 'transcript', 'isoform', or 'Isoform'
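The lookup order above can be sketched as follows. detect_col() is a hypothetical helper, not a TSENAT function, and the column names passed in stand for what colnames(rowData(se)) might return; how the package behaves when no candidate matches is not documented here, so the error is an assumption.

```r
# Hypothetical helper: returns the explicit column name if supplied,
# otherwise the first candidate present among the available columns.
detect_col <- function(available, candidates, explicit = NULL) {
  if (!is.null(explicit)) return(explicit)
  hit <- candidates[candidates %in% available]
  if (length(hit) == 0) {
    # Assumed behaviour: fail loudly when nothing matches
    stop("None of the candidate columns found: ",
         paste(candidates, collapse = ", "))
  }
  hit[1]
}

# Assumed stand-in for colnames(rowData(se))
row_cols <- c("Gene", "isoform", "length")

gene_col    <- detect_col(row_cols, c("gene", "Gene"))
isoform_col <- detect_col(row_cols, c("transcript", "isoform", "Isoform"))
explicit    <- detect_col(row_cols, c("gene", "Gene"), explicit = "symbol")
```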
See also
jackknife_isoform_switching for the underlying implementation, TSENATAnalysis for object structure.
Examples
data(readcounts)
metadata_df <- read.table(
  system.file('extdata', 'metadata.tsv', package = 'TSENAT'),
  header = TRUE, sep = '\t'
)
gff3_file <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT')
config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition')
analysis <- build_analysis(
  readcounts = readcounts, tx2gene = gff3_file,
  metadata = metadata_df, config = config,
  tpm = tpm,
  effective_length = effective_length
)
analysis <- filter_analysis(
  analysis, min_samples = 1,
  subset_n_genes = 20, subset_n_samples = 8
)
analysis <- calculate_diversity(analysis, q = 1)