
S4 wrapper for .filter_se() that filters low-abundance transcripts directly within a TSENATAnalysis object. This maintains the consistent S4 workflow pattern where functions accept and return analysis objects.

Usage

filter_analysis(
  analysis,
  min_tpm = 1,
  tpm_assay_name = NULL,
  min_samples = 5L,
  stringency = "medium",
  pair_col = NULL,
  min_tx_per_gene = 2L,
  min_isoform_abundance = NULL,
  assay_name = "counts",
  subset_n_genes = NULL,
  subset_genes = NULL,
  subset_n_samples = NULL,
  subset_samples = NULL,
  subset_select_by = c("variance", "mean", "random"),
  subset_seed = 42,
  subset_min_count = NULL,
  verbose = FALSE
)

Arguments

analysis

A TSENATAnalysis S4 object containing the SummarizedExperiment to be filtered.

min_tpm

Numeric TPM threshold (default 1.0). Keeps transcripts with TPM >= min_tpm in >= min_samples samples. Ignored if stringency is specified.
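The rule "TPM >= min_tpm in >= min_samples samples" can be illustrated with a minimal base R sketch. The matrix values and transcript names below are made up for illustration, and this is not the internal code of .filter_se(); it only shows the counting logic the description implies.

```r
# Toy TPM matrix: rows are transcripts, columns are samples (made-up values).
tpm <- matrix(
  c(5.0, 3.2, 0.0, 4.1, 2.2, 6.3,   # tx1: meets the threshold in 5 of 6 samples
    0.2, 0.0, 1.5, 0.1, 0.0, 0.3,   # tx2: meets the threshold in 1 of 6 samples
    1.0, 1.0, 1.0, 1.0, 1.0, 0.9),  # tx3: exactly at the threshold in 5 samples
  nrow = 3, byrow = TRUE,
  dimnames = list(c("tx1", "tx2", "tx3"), paste0("s", 1:6))
)

min_tpm <- 1
min_samples <- 5L

# Count, per transcript, the samples that meet the TPM threshold.
n_passing <- rowSums(tpm >= min_tpm)

# Keep transcripts that pass in at least `min_samples` samples.
keep <- n_passing >= min_samples
filtered <- tpm[keep, , drop = FALSE]
```

Here tx1 and tx3 are retained and tx2 is dropped; note that the comparison is inclusive, so a value exactly equal to min_tpm counts as passing.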

tpm_assay_name

Character; name of assay containing TPM data (default: NULL). If NULL, searches for TPM assay automatically.

min_samples

Numeric. Minimum number of samples in which a transcript must be present (default: 5). Ignored if stringency is specified.

stringency

Character. Filtering stringency level: 'soft' (permissive), 'medium' (balanced), or 'severe' (stringent). When specified, auto-estimates: min_samples, min_tpm, min_tx_per_gene, and min_isoform_abundance from data. Requires pair_col in colData for paired designs. User-provided values for any parameter override stringency defaults. Default: 'medium' (balanced filtering recommended for most analyses).

pair_col

Character; column name in colData containing pair IDs for paired designs. Default: NULL (auto-detect if needed).

min_tx_per_gene

Integer; minimum number of transcripts per gene required (default: 2L). Single-transcript genes are always kept. When stringency is specified, this value is not used as given but is adjusted automatically based on the stringency level.

min_isoform_abundance

Numeric in [0, 1]; minimum relative abundance threshold for isoforms within each gene. Implements Soneson et al. (2016) filtering. Default behavior:
- If stringency is specified: uses the stringency-based default (soft: 0.01, medium: 0.05, severe: 0.15).
- If stringency is NULL: uses the default 0.05 (5%).
- If explicitly provided: overrides any stringency default.

Set to 0 or NULL (post-stringency processing) to skip isoform-level filtering.
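To make "relative abundance within each gene" concrete, here is a minimal base R sketch. The data, the `gene` mapping vector, and the choice to keep an isoform when it reaches the threshold in at least one sample are assumptions made for illustration; the documentation above does not state which per-isoform summary .filter_se() actually uses.

```r
# Toy abundance matrix: 4 transcripts from 2 genes, 3 samples (made-up values).
abund <- matrix(
  c(90, 80, 95,   # geneA, dominant isoform
     8, 18,  3,   # geneA, minor isoform
     2,  2,  2,   # geneA, rare isoform
    50, 40, 60),  # geneB, only isoform
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A.1", "A.2", "A.3", "B.1"), c("s1", "s2", "s3"))
)
gene <- c("geneA", "geneA", "geneA", "geneB")  # hypothetical tx-to-gene map
min_isoform_abundance <- 0.05

# Per-sample gene totals, expanded back to one row per transcript.
gene_totals <- rowsum(abund, group = gene)[gene, , drop = FALSE]

# Relative abundance of each isoform within its gene, per sample.
rel <- abund / gene_totals
rel[is.nan(rel)] <- 0  # guard for genes with zero total in a sample

# Assumption: keep an isoform if it reaches the threshold in any sample.
keep <- apply(rel, 1, max) >= min_isoform_abundance
kept_isoforms <- names(keep)[keep]
```

With the 0.05 threshold, the rare isoform A.3 (2% of geneA in every sample) is removed, while A.2 survives because it reaches 18% in one sample.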

assay_name

Character; name or index of the assay to use for filtering (default: 'counts'). Deprecated: use tpm_assay_name instead.

subset_n_genes

Integer; optional number of genes to retain after filtering. If provided, genes are selected based on subset_select_by. Default: NULL.

subset_genes

Character vector; optional specific genes to retain after filtering. Default: NULL.

subset_n_samples

Integer; optional number of samples to retain after filtering. If provided, samples are selected (balanced by condition if available). Default: NULL.
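One way to read "balanced by condition" is to take an equal number of samples from each condition group. The base R sketch below illustrates that idea only; the sample names, the `condition` vector, and the choice of taking the first samples of each group are assumptions, not the package's documented behavior.

```r
# Hypothetical sample-to-condition mapping (names are sample IDs).
condition <- c(s1 = "ctrl", s2 = "ctrl", s3 = "ctrl",
               s4 = "treat", s5 = "treat", s6 = "treat")
subset_n_samples <- 4

# Equal share per condition (integer division).
per_group <- subset_n_samples %/% length(unique(condition))

# Take the first `per_group` samples from each condition group.
groups <- split(names(condition), condition)
picked <- unlist(lapply(groups, head, n = per_group), use.names = FALSE)
```

A real implementation might pick samples at random within each group or handle a remainder when the requested number is not divisible by the number of conditions; this sketch does neither.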

subset_samples

Character vector; optional specific samples to retain after filtering. Default: NULL.

subset_select_by

Character; gene selection method for subset_n_genes: 'variance' (highest variance), 'mean' (highest mean expression), or 'random'. Default: 'variance'.

subset_seed

Integer; random seed for reproducibility when subset_select_by = 'random'. Default: 42.
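The three selection modes can be sketched in base R as follows. The helper `select_genes()` and the toy matrix are hypothetical and exist only for this illustration; they show what ranking by variance or mean, or seeded random sampling, might look like, not how the package implements subset_select_by.

```r
# Hypothetical helper: pick `n` genes from a gene-by-sample matrix.
select_genes <- function(expr, n,
                         select_by = c("variance", "mean", "random"),
                         seed = 42) {
  select_by <- match.arg(select_by)
  n <- min(n, nrow(expr))
  if (select_by == "random") {
    set.seed(seed)  # same seed gives the same selection
    return(sample(rownames(expr), n))
  }
  score <- switch(select_by,
    variance = apply(expr, 1, var),
    mean = rowMeans(expr)
  )
  # Highest-scoring genes first.
  names(sort(score, decreasing = TRUE))[seq_len(n)]
}

# Toy gene-level expression matrix (made-up values).
expr <- matrix(
  c(  1,   1,   1,   1,
      0,  10,   0,  10,
    100, 101, 100, 101,
      5,   6,   7,   8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"), paste0("s", 1:4))
)

top_var <- select_genes(expr, 2, "variance")
top_mean <- select_genes(expr, 2, "mean")
rand_a <- select_genes(expr, 2, "random", seed = 42)
rand_b <- select_genes(expr, 2, "random", seed = 42)
```

The variance and mean rankings pick different genes here (g3 has the highest mean but almost no variance), which is why the choice of subset_select_by matters when subsetting for downstream diversity analyses.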

subset_min_count

Numeric; optional minimum count threshold applied during subsetting. Default: NULL.

verbose

Logical. If TRUE, print filtering progress and summary statistics (default: FALSE).

Value

Invisibly returns the modified analysis object, with the filtered SummarizedExperiment in the @se slot. The filtering operation modifies the analysis object in-place while maintaining all other slots (results, metadata, etc.).

Details

This wrapper applies .filter_se() to the SummarizedExperiment within the TSENATAnalysis object, optionally followed by subsetting parameters. The filtering and subsetting operations are applied in sequence:

1. Extracts the SE from analysis@se
2. Filters using .filter_se() with the specified filtering parameters (default: 'medium' stringency)
3. If any subset parameters are provided, applies gene/sample selection to select specific genes and/or samples
4. Stores the filtered/subsetted SE back in analysis@se
5. Returns the modified analysis object invisibly

**Default Filtering (stringency = 'medium'):** By default, filtering applies balanced stringency: requires transcripts in >= 50% [...] isoform abundance of 5% [...] noise reduction with preservation of isoform diversity for reliable entropy calculations.

**Important:** Filtering should be performed BEFORE computing diversity, divergence, or LM interaction results. If called after analysis results have been computed, those results will be based on unfiltered data and may not align with the filtered SE dimensions.

See also

build_analysis for creating a new analysis object

Examples

# Create test analysis and filter
data(readcounts)
readcounts <- as.matrix(readcounts)
mode(readcounts) <- 'numeric'
metadata_df <- read.table(
  system.file('extdata', 'metadata.tsv', package = 'TSENAT'),
  header = TRUE, sep = '\t'
)
gff3_dataset <- system.file(
  'extdata', 'annotation.gff3.gz', package = 'TSENAT'
)
config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition')
# NOTE: `tpm` and `effective_length` are not created in this snippet; they are
# assumed to be available in the session already.
analysis <- build_analysis(
  readcounts = readcounts, tx2gene = gff3_dataset,
  metadata = metadata_df, config = config,
  tpm = tpm, effective_length = effective_length
)
analysis <- filter_analysis(analysis, stringency = 'medium')