CONCORD

concord.Concord

A contrastive learning framework for single-cell data analysis.

CONCORD performs dimensionality reduction, denoising, and batch correction in an unsupervised manner while preserving local and global topological structures.

Attributes:

Name Type Description
adata AnnData

Input AnnData object.

save_dir Path

Directory to save outputs and logs.

config Config

Configuration object storing hyperparameters.

model ConcordModel

The main contrastive learning model.

trainer Trainer

Handles model training.

loader DataLoaderManager or ChunkLoader

Data loading utilities.

__init__(adata, save_dir='save/', inplace=True, verbose=False, **kwargs)

Initializes the Concord framework.

Parameters:

Name Type Description Default
adata AnnData

Input single-cell data in AnnData format.

required
save_dir str

Directory to save model outputs. Defaults to 'save/'.

'save/'
inplace bool

If True, modifies adata in place. Defaults to True.

True
verbose bool

Enable verbose logging. Defaults to False.

False
**kwargs

Additional configuration parameters.

{}

Raises:

Type Description
ValueError

If inplace is set to True on a backed AnnData object.

get_default_params()

Returns the default hyperparameters used in CONCORD.

Returns:

Name Type Description
dict

A dictionary containing default configuration values.

setup_config(**kwargs)

Sets up the configuration for training.

Parameters:

Name Type Description Default
**kwargs

Key-value pairs to override default parameters.

{}

Raises:

Type Description
ValueError

If an invalid parameter is provided.
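
The override-and-validate behavior described above can be pictured with a small stand-alone sketch. The `DEFAULTS` dictionary and the `build_config` helper are hypothetical and are not part of CONCORD; the real parameter names are whatever `get_default_params()` returns.

```python
# Illustrative only: DEFAULTS and build_config are hypothetical stand-ins,
# not CONCORD's actual defaults or implementation.
DEFAULTS = {"n_epochs": 5, "batch_size": 64, "lr": 1e-3}


def build_config(**kwargs):
    """Merge overrides into the defaults, rejecting unknown keys."""
    unknown = sorted(set(kwargs) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"Invalid parameter(s): {', '.join(unknown)}")
    return {**DEFAULTS, **kwargs}


# Override a single value; every other key keeps its default.
config = build_config(batch_size=128)
```

Rejecting unknown keys up front means a misspelled parameter name fails immediately instead of being silently ignored.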

init_model()

Initializes the CONCORD model and loads a pre-trained model if specified.

Raises:

Type Description
FileNotFoundError

If the specified pre-trained model file is missing.
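
One way to picture the missing-file check is the sketch below. The helper name `resolve_pretrained` and the convention that `None` means "no pre-trained model" are assumptions made for illustration; this page does not say how CONCORD receives or validates the path.

```python
from pathlib import Path


def resolve_pretrained(path):
    """Return a validated Path, or None when no pre-trained model is requested.

    Hypothetical helper: not part of CONCORD.
    """
    if path is None:
        return None
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Pre-trained model file not found: {path}")
    return path
```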

init_trainer()

Initializes the model trainer, setting up loss functions, optimizer, and learning rate scheduler.

init_dataloader(input_layer_key='X_log1p', preprocess=True, train_frac=1.0, use_sampler=True)

Initializes the data loader for training and evaluation.

Parameters:

Name Type Description Default
input_layer_key str

Key in adata.layers to use as input. Defaults to 'X_log1p'.

'X_log1p'
preprocess bool

Whether to apply preprocessing. Defaults to True.

True
train_frac float

Fraction of data to use for training. Defaults to 1.0.

1.0
use_sampler bool

Whether to use the probabilistic sampler. Defaults to True.

True

Raises:

Type Description
ValueError

If train_frac < 1.0 and contrastive loss mode is 'nn'.
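
To make `train_frac` concrete, here is a minimal sketch of splitting cell indices by fraction. The `split_indices` helper, the shuffling, and the `seed` argument are assumptions for illustration; this page does not describe how CONCORD actually partitions cells.

```python
import random


def split_indices(n_cells, train_frac=1.0, seed=0):
    """Shuffle cell indices and split them into train and validation sets.

    Hypothetical helper: not part of CONCORD.
    """
    if not 0.0 < train_frac <= 1.0:
        raise ValueError("train_frac must be in (0, 1]")
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_cells * train_frac))
    return indices[:n_train], indices[n_train:]


# With train_frac=0.8, 8 of 10 cells go to training and 2 are held out.
train_idx, val_idx = split_indices(10, train_frac=0.8)
```

With the default `train_frac=1.0` the validation set is empty, which is the only setting allowed when the contrastive loss mode is 'nn'.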

train(save_model=True, patience=2)

Trains the model on the dataset.

Parameters:

Name Type Description Default
save_model bool

Whether to save the trained model. Defaults to True.

True
patience int

Number of epochs to wait for improvement before early stopping. Defaults to 2.

2
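
The effect of `patience` can be illustrated with a generic early-stopping loop. This is a sketch of the usual technique, not CONCORD's trainer: which loss is monitored and the exact improvement criterion are not stated on this page, and `stopping_epoch` is a hypothetical helper.

```python
def stopping_epoch(losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once `patience` consecutive epochs fail to improve on
    the best loss seen so far; otherwise it runs through every epoch.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(losses) - 1


# Epochs 2 and 3 both fail to beat 0.8, so training stops at epoch 3.
stopped_at = stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.7], patience=2)
```

A larger `patience` tolerates longer plateaus at the cost of extra epochs: with `patience=3` the same loss curve runs to the final epoch.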

predict(loader, sort_by_indices=False, return_decoded=False, decoder_domain=None, return_latent=False, return_class=True, return_class_prob=True)

Runs inference on a dataset.

Parameters:

Name Type Description Default
loader DataLoader or list

Data loader or chunked loader for batch processing.

required
sort_by_indices bool

Whether to return results in original cell order. Defaults to False.

False
return_decoded bool

Whether to return decoded gene expression. Defaults to False.

False
decoder_domain str

Specifies a domain for decoding. Defaults to None.

None
return_latent bool

Whether to return latent variables. Defaults to False.

False
return_class bool

Whether to return predicted class labels. Defaults to True.

True
return_class_prob bool

Whether to return class probabilities. Defaults to True.

True

Returns:

Name Type Description
tuple

Encoded embeddings, decoded matrix (if requested), class predictions, class probabilities, true labels, and latent variables.
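
When a loader yields cells in shuffled order, `sort_by_indices=True` asks for results in the original cell order. The sketch below shows the general idea using plain lists; `restore_order` is a hypothetical helper, and how CONCORD tracks indices internally is not described here.

```python
def restore_order(embeddings, indices):
    """Reorder per-cell outputs so row i corresponds to original cell i.

    `indices[k]` is the original position of the cell found in row k.
    """
    ordered = [None] * len(indices)
    for row, original_pos in zip(embeddings, indices):
        ordered[original_pos] = row
    return ordered


# Rows as they came out of a shuffled loader, with their original positions.
shuffled = [[0.3, 0.1], [0.9, 0.4], [0.2, 0.7]]
positions = [2, 0, 1]
in_order = restore_order(shuffled, positions)
```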

encode_adata(input_layer_key='X_log1p', output_key='Concord', preprocess=True, return_decoded=False, decoder_domain=None, return_latent=False, return_class=True, return_class_prob=True, save_model=True)

Encodes an AnnData object using the CONCORD model.

Parameters:

Name Type Description Default
input_layer_key str

Input layer key. Defaults to 'X_log1p'.

'X_log1p'
output_key str

Output key for storing results in AnnData. Defaults to 'Concord'.

'Concord'
preprocess bool

Whether to apply preprocessing. Defaults to True.

True
return_decoded bool

Whether to return decoded gene expression. Defaults to False.

False
decoder_domain str

Specifies a domain for decoding. Defaults to None.

None
return_latent bool

Whether to return latent variables. Defaults to False.

False
return_class bool

Whether to return predicted class labels. Defaults to True.

True
return_class_prob bool

Whether to return class probabilities. Defaults to True.

True
save_model bool

Whether to save the model after training. Defaults to True.

True
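
A minimal usage sketch, built only from the signatures documented on this page, might look as follows. The wrapper name `embed_cells` is hypothetical, the import path is inferred from the `concord.Concord` heading, and where exactly the results are stored under `output_key` is not stated here.

```python
def embed_cells(adata, save_dir="save/"):
    """Run CONCORD on `adata` with the documented defaults spelled out."""
    # Imported lazily so this module loads even where concord is not installed.
    from concord import Concord

    # inplace=True raises ValueError on a backed AnnData object (see __init__).
    model = Concord(adata, save_dir=save_dir, inplace=True, verbose=False)

    # Results are stored in the AnnData object under output_key.
    model.encode_adata(
        input_layer_key="X_log1p",
        output_key="Concord",
        preprocess=True,
        save_model=True,
    )
    return adata
```

All keyword values shown are the documented defaults; they are written out only to make the call explicit.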

get_domain_embeddings()

Retrieves domain embeddings from the trained model.

Returns:

Name Type Description
pd.DataFrame

A DataFrame containing domain embeddings.

get_covariate_embeddings()

Retrieves covariate embeddings from the trained model.

Returns:

Name Type Description
dict

A dictionary of DataFrames, each containing embeddings for a covariate.

save_model(model, save_path)

Saves the trained model to a file.

Parameters:

Name Type Description Default
model Module

The trained model.

required
save_path str or Path

Path to save the model file.

required

Returns:

Type Description

None