Filter Low-Abundance Transcripts in a TSENATAnalysis Object

S4 wrapper for .filter_se() that filters low-abundance transcripts directly within a TSENATAnalysis object. This maintains the consistent S4 workflow pattern where functions accept and return analysis objects.

Usage

filter_analysis(
  analysis,
  min_tpm = 1,
  tpm_assay_name = NULL,
  min_samples = 5L,
  stringency = "medium",
  pair_col = NULL,
  min_tx_per_gene = 2L,
  min_isoform_abundance = NULL,
  assay_name = "counts",
  subset_n_genes = NULL,
  subset_genes = NULL,
  subset_n_samples = NULL,
  subset_samples = NULL,
  subset_select_by = c("variance", "mean", "random"),
  subset_seed = 42,
  subset_min_count = NULL,
  verbose = FALSE
)

Arguments

analysis: A TSENATAnalysis S4 object containing the SummarizedExperiment to be filtered.
min_tpm: Numeric TPM threshold (default 1.0). Keeps transcripts with TPM >= min_tpm in >= min_samples samples. Ignored if stringency is specified.
tpm_assay_name: Character; name of assay containing TPM data (default: NULL). If NULL, searches for TPM assay automatically.
min_samples: Numeric. Minimum number of samples in which a transcript must be present (default: 5). Ignored if stringency is specified.
stringency: Character. Filtering stringency level: 'soft' (permissive), 'medium' (balanced), or 'severe' (stringent). When specified, auto-estimates: min_samples, min_tpm, min_tx_per_gene, and min_isoform_abundance from data. Requires pair_col in colData for paired designs. User-provided values for any parameter override stringency defaults. Default: 'medium' (balanced filtering recommended for most analyses).
pair_col: Character; column name in colData containing pair IDs for paired designs. Default: NULL (auto-detect if needed).
min_tx_per_gene: Integer; minimum number of transcripts per gene required (default 2L). Single-transcript genes are always kept. Ignored if stringency is specified, in which case the value is set automatically from the stringency level.
min_isoform_abundance: Numeric in [0, 1]; minimum relative abundance threshold for isoforms within each gene. Implements Soneson et al. (2016) filtering. Default behavior:
- If stringency is specified: uses the stringency-based default (soft: 0.01, medium: 0.05, severe: 0.15).
- If stringency is NULL: uses the default 0.05 (5%).
- If explicitly provided: overrides any stringency default.
Set to 0 or NULL (post-stringency processing) to skip isoform-level filtering.
assay_name: Character; name or index of the assay to use for filtering (default: 'counts'). Deprecated: use tpm_assay_name instead.
subset_n_genes: Integer; optional number of genes to retain after filtering. If provided, genes are selected based on subset_select_by. Default: NULL.
subset_genes: Character vector; optional specific genes to retain after filtering. Default: NULL.
subset_n_samples: Integer; optional number of samples to retain after filtering. If provided, samples are selected (balanced by condition if available). Default: NULL.
subset_samples: Character vector; optional specific samples to retain after filtering. Default: NULL.
subset_select_by: Character; gene selection method for subset_n_genes: 'variance' (highest variance), 'mean' (highest mean expression), or 'random'. Default: 'variance'.
subset_seed: Integer; random seed for reproducibility when subset_select_by = 'random'. Default: 42.
subset_min_count: Numeric; optional minimum count threshold applied during subsetting. Default: NULL.
verbose: Logical. If TRUE, print filtering progress and summary statistics (default: FALSE).
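
The min_tpm / min_samples rule described above can be illustrated in base R. This is a minimal sketch on a made-up TPM matrix (`tpm_toy` is hypothetical), not the internals of .filter_se(); it only shows the "TPM >= min_tpm in >= min_samples samples" criterion.

```r
# Toy TPM matrix: 4 transcripts (rows) x 6 samples (columns); values are made up.
tpm_toy <- matrix(
  c(5,   3,   2,   4, 6, 1,     # tx1: passes in 6 samples
    0,   0,   2,   0, 0, 0,     # tx2: passes in 1 sample
    1,   1,   1,   1, 1, 0,     # tx3: passes in 5 samples
    0.5, 0.2, 0.9, 3, 2, 0.1),  # tx4: passes in 2 samples
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("tx", 1:4), paste0("s", 1:6))
)

min_tpm <- 1
min_samples <- 5L

# Count, per transcript, the samples in which TPM reaches the threshold.
n_pass <- rowSums(tpm_toy >= min_tpm)

# Keep transcripts that reach the threshold in at least min_samples samples.
keep <- n_pass >= min_samples
filtered <- tpm_toy[keep, , drop = FALSE]
rownames(filtered)
```

With these toy values only tx1 and tx3 are retained; raising min_tpm or min_samples makes the filter stricter.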

Value

Invisibly returns the modified analysis object with the filtered SummarizedExperiment in the @se slot. All other slots (results, metadata, etc.) are preserved. Assign the return value to keep the filtered object, as shown in the Examples.

Details

This wrapper applies .filter_se() to the SummarizedExperiment within the TSENATAnalysis object, optionally followed by subsetting parameters. The filtering and subsetting operations are applied in sequence:

1. Extracts the SE from analysis@se.
2. Filters using .filter_se() with the specified filtering parameters (default: 'medium' stringency).
3. If any subset parameters are provided, applies gene/sample selection to retain specific genes and/or samples.
4. Stores the filtered/subsetted SE back in analysis@se.
5. Returns the modified analysis object invisibly.
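
The five steps above can be sketched with a toy S4 class. Everything here is hypothetical: `ToyAnalysis` stands in for TSENATAnalysis, a plain matrix stands in for the SummarizedExperiment, and `toy_filter_analysis()` is an illustrative helper rather than the package's code.

```r
library(methods)

# Hypothetical stand-in for TSENATAnalysis: a matrix plays the role of the SE.
setClass("ToyAnalysis", representation(se = "matrix", results = "list"))

toy_filter_analysis <- function(analysis, min_value = 1, min_samples = 2L,
                                subset_n_genes = NULL) {
  se <- analysis@se                                  # 1. extract
  keep <- rowSums(se >= min_value) >= min_samples    # 2. filter rows
  se <- se[keep, , drop = FALSE]
  if (!is.null(subset_n_genes)) {                    # 3. optional subsetting
    v <- apply(se, 1, var)                           #    rank rows by variance
    n_keep <- min(subset_n_genes, nrow(se))
    top <- order(v, decreasing = TRUE)[seq_len(n_keep)]
    se <- se[sort(top), , drop = FALSE]              #    keep original row order
  }
  analysis@se <- se                                  # 4. store back
  invisible(analysis)                                # 5. return invisibly
}

m <- matrix(c(5, 6, 7,
              0, 0, 3,
              2, 9, 1,
              4, 4, 4),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("tx", 1:4), paste0("s", 1:3)))
toy <- new("ToyAnalysis", se = m, results = list(note = "kept"))
toy_filtered <- toy_filter_analysis(toy, subset_n_genes = 2)
```

In this sketch `toy` itself still holds all four rows afterwards, because ordinary S4 objects follow R's copy-on-modify semantics; only `toy_filtered` carries the reduced matrix, while its `results` slot is untouched.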

**Default Filtering (stringency = 'medium'):** By default, filtering applies balanced stringency: it requires transcripts to be present in >= 50% of samples and a minimum isoform abundance of 5%, combining noise reduction with preservation of isoform diversity for reliable entropy calculations.
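
As an illustration of the isoform-abundance part of the 'medium' default (threshold 0.05), the following base R sketch computes each transcript's share of its gene's total and drops isoforms below the threshold. The values and names are made up, and the sketch uses a single expression value per transcript; how the package aggregates across samples is not stated here, so that is an assumption.

```r
# Hypothetical per-transcript expression values and their gene assignments.
expr <- c(txA1 = 90, txA2 = 8, txA3 = 2, txB1 = 60, txB2 = 40)
gene <- c("geneA", "geneA", "geneA", "geneB", "geneB")

min_isoform_abundance <- 0.05  # the 'medium' default quoted in the Arguments

# Relative abundance: transcript value divided by the total of its gene.
rel_abundance <- expr / ave(expr, gene, FUN = sum)

# Keep isoforms contributing at least the threshold fraction of their gene.
keep <- rel_abundance >= min_isoform_abundance
kept_transcripts <- names(expr)[keep]
kept_transcripts
```

Here txA3 contributes 2% of geneA and is removed, while txA2 at 8% is kept.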

**Important:** Filtering should be performed BEFORE computing diversity, divergence, or SAIT interaction results. If called after analysis results have been computed, those results will be based on unfiltered data and may not align with the filtered SE dimensions.
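
The misalignment described above can be made concrete with a small check. This is a hypothetical sketch: `se_before`, `results`, and `se_after` are toy objects, and the real result slots of a TSENATAnalysis object may be structured differently.

```r
# Toy "SE" with four genes, and results computed on the unfiltered data.
se_before <- matrix(1:12, nrow = 4,
                    dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
results <- data.frame(gene = rownames(se_before),
                      diversity = c(0.2, 0.9, 0.5, 0.7))

# Filtering afterwards removes two genes from the SE but not from the results.
se_after <- se_before[c("gene1", "gene3"), , drop = FALSE]

# Genes that still appear in the results but no longer exist in the SE.
stale <- setdiff(results$gene, rownames(se_after))
aligned <- length(stale) == 0
```

A non-empty `stale` vector indicates results that were computed before filtering and should be recomputed on the filtered object.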

Examples

# Create test analysis and filter
data(readcounts)
readcounts <- as.matrix(readcounts)
mode(readcounts) <- 'numeric'
metadata_df <- read.table(
  system.file('extdata', 'metadata.tsv', package = 'TSENAT'),
  header = TRUE, sep = '\t'
)
gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT')
config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition')
# NOTE: `tpm` and `effective_length` are not created in this example; they are
# assumed to already exist in the workspace before build_analysis() is called.
analysis <- build_analysis(
  readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df,
  config = config, tpm = tpm, effective_length = effective_length
)
analysis <- filter_analysis(analysis, stringency = 'medium')