CONCORD
concord.Concord
A contrastive learning framework for single-cell data analysis.
CONCORD performs dimensionality reduction, denoising, and batch correction in an unsupervised manner while preserving local and global topological structures.
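This sketch illustrates what "contrastive" means in this context using a generic InfoNCE-style loss in pure Python. Each cell is seen through two views, and the loss rewards each cell's two views for being more similar to each other than to any other cell's view. This page does not specify CONCORD's actual loss, sampling scheme, or temperature. The function names and the `temperature` value below are illustrative assumptions, not part of the CONCORD API.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a: List[List[float]], view_b: List[List[float]],
             temperature: float = 0.5) -> float:
    """Generic InfoNCE loss (illustrative, not CONCORD's implementation).

    Row i of view_a is the positive partner of row i of view_b; every other
    row of view_b acts as a negative for it.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching row as the target class.
        total += log_sum_exp - logits[i]
    return total / n
```

Views that line up row for row give a lower loss than views whose rows are mismatched. Minimising this kind of objective pulls each cell's views together in the embedding space and pushes different cells apart.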
Attributes:
Name | Type | Description |
---|---|---|
adata | AnnData | Input AnnData object. |
save_dir | Path | Directory to save outputs and logs. |
config | Config | Configuration object storing hyperparameters. |
model | ConcordModel | The main contrastive learning model. |
trainer | Trainer | Handles model training. |
loader | DataLoaderManager or ChunkLoader | Data loading utilities. |
__init__(adata, save_dir='save/', inplace=True, verbose=False, **kwargs)
Initializes the Concord framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata | AnnData | Input single-cell data in AnnData format. | required |
save_dir | str | Directory to save model outputs. Defaults to 'save/'. | 'save/' |
inplace | bool | If True, modifies `adata` in place. Defaults to True. | True |
verbose | bool | Enable verbose logging. Defaults to False. | False |
**kwargs | | Additional configuration parameters. | {} |
Raises:
Type | Description |
---|---|
ValueError | If |
get_default_params()
Returns the default hyperparameters used in CONCORD.
Returns:
Type | Description |
---|---|
dict | A dictionary containing default configuration values. |
setup_config(**kwargs)
Sets up the configuration for training.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | | Key-value pairs to override default parameters. | {} |
Raises:
Type | Description |
---|---|
ValueError | If an invalid parameter is provided. |
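`setup_config` overrides default parameters with keyword arguments and raises `ValueError` for an invalid parameter. This sketch shows that pattern under the assumption that "invalid" means "not among the default keys". `merge_config`, `DEFAULT_PARAMS`, and the parameter names and values in it are hypothetical. Call `get_default_params()` on a real `Concord` instance to see the actual defaults.

```python
from typing import Any, Dict

# Hypothetical defaults for illustration only; these are NOT CONCORD's
# actual parameter names or values.
DEFAULT_PARAMS: Dict[str, Any] = {
    "batch_size": 64,
    "n_epochs": 10,
    "lr": 1e-2,
}


def merge_config(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for setup_config: defaults overridden by kwargs."""
    # Assumption: any key absent from the defaults counts as invalid.
    unknown = sorted(set(kwargs) - set(DEFAULT_PARAMS))
    if unknown:
        raise ValueError(f"Invalid parameter(s): {', '.join(unknown)}")
    config = dict(DEFAULT_PARAMS)  # copy so the shared defaults are never mutated
    config.update(kwargs)
    return config
```

For example, `merge_config(n_epochs=20)` returns the defaults with only `n_epochs` changed, while a misspelled key fails immediately instead of being silently ignored.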
init_model()
Initializes the CONCORD model and loads a pre-trained model if specified.
Raises:
Type | Description |
---|---|
FileNotFoundError | If the specified pre-trained model file is missing. |
init_trainer()
Initializes the model trainer, setting up loss functions, optimizer, and learning rate scheduler.
init_dataloader(input_layer_key='X_log1p', preprocess=True, train_frac=1.0, use_sampler=True)
Initializes the data loader for training and evaluation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_layer_key | str | Key in | 'X_log1p' |
preprocess | bool | Whether to apply preprocessing. Defaults to True. | True |
train_frac | float | Fraction of data to use for training. Defaults to 1.0. | 1.0 |
use_sampler | bool | Whether to use the probabilistic sampler. Defaults to True. | True |
Raises:
Type | Description |
---|---|
ValueError | If |
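`train_frac` sets the fraction of cells used for training. With the default of 1.0, every cell is used and none are held out. This sketch shows one common way to apply such a fraction: a seeded random permutation of cell indices. The helper name `split_indices`, its seed handling, and its range check are assumptions, not CONCORD's implementation.

```python
import random
from typing import List, Tuple


def split_indices(n_cells: int, train_frac: float = 1.0,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split cell indices into training and held-out sets."""
    # Assumed validation; the page does not say what CONCORD checks here.
    if not 0.0 < train_frac <= 1.0:
        raise ValueError("train_frac must be in (0, 1]")
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(round(n_cells * train_frac))
    return indices[:n_train], indices[n_train:]
```

A held-out set only exists when `train_frac < 1.0`. With the default, the second list is empty.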
train(save_model=True, patience=2)
Trains the model on the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
save_model | bool | Whether to save the trained model. Defaults to True. | True |
patience | int | Number of epochs to wait for improvement before early stopping. Defaults to 2. | 2 |
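`patience` is the number of consecutive epochs without improvement that `train` tolerates before it stops early. This sketch implements the standard early-stopping rule on a precomputed list of per-epoch losses. It assumes that "improvement" means a strictly lower loss; the page does not state which metric CONCORD monitors. `stopping_epoch` is a hypothetical helper.

```python
from typing import Sequence


def stopping_epoch(losses: Sequence[float], patience: int = 2) -> int:
    """Hypothetical helper: index of the epoch after which training stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best:  # assumption: strictly lower loss counts as improvement
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses) - 1  # ran through every epoch without stopping
```

With losses `[1.0, 0.8, 0.85, 0.9, 0.7]`, `patience=2` stops after epoch 3 and never sees the later drop to 0.7. `patience=3` reaches it. A small patience saves compute but can cut training short during a temporary plateau.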
predict(loader, sort_by_indices=False, return_decoded=False, decoder_domain=None, return_latent=False, return_class=True, return_class_prob=True)
Runs inference on a dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loader | DataLoader or list | Data loader or chunked loader for batch processing. | required |
sort_by_indices | bool | Whether to return results in original cell order. Defaults to False. | False |
return_decoded | bool | Whether to return decoded gene expression. Defaults to False. | False |
decoder_domain | str | Specifies a domain for decoding. Defaults to None. | None |
return_latent | bool | Whether to return latent variables. Defaults to False. | False |
return_class | bool | Whether to return predicted class labels. Defaults to True. | True |
return_class_prob | bool | Whether to return class probabilities. Defaults to True. | True |
Returns:
Type | Description |
---|---|
tuple | Encoded embeddings, decoded matrix (if requested), class predictions, class probabilities, true labels, and latent variables. |
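`sort_by_indices` controls whether `predict` returns results in the original cell order. When it is False, results presumably follow the order in which the loader produced cells, which can differ from the order in `adata` (for example, when a sampler shuffles cells). This sketch shows the reordering step. It assumes the loader reports each cell's original position alongside its output. `restore_order` is a hypothetical helper, not part of the CONCORD API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def restore_order(results: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Hypothetical helper: reorder per-cell results by original cell index.

    results[k] belongs to the cell whose original position is indices[k].
    """
    if len(results) != len(indices):
        raise ValueError("results and indices must have the same length")
    # Positions of the results, sorted by the original cell index they carry.
    order = sorted(range(len(indices)), key=lambda k: indices[k])
    return [results[k] for k in order]
```

Sorting by the reported index, rather than scattering results into a preallocated list, also works when only a subset of cells is predicted. That subset comes back in ascending original order.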
encode_adata(input_layer_key='X_log1p', output_key='Concord', preprocess=True, return_decoded=False, decoder_domain=None, return_latent=False, return_class=True, return_class_prob=True, save_model=True)
Encodes an AnnData object using the CONCORD model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_layer_key | str | Input layer key. Defaults to 'X_log1p'. | 'X_log1p' |
output_key | str | Output key for storing results in AnnData. Defaults to 'Concord'. | 'Concord' |
preprocess | bool | Whether to apply preprocessing. Defaults to True. | True |
return_decoded | bool | Whether to return decoded gene expression. Defaults to False. | False |
decoder_domain | str | Specifies a domain for decoding. Defaults to None. | None |
return_latent | bool | Whether to return latent variables. Defaults to False. | False |
return_class | bool | Whether to return predicted class labels. Defaults to True. | True |
return_class_prob | bool | Whether to return class probabilities. Defaults to True. | True |
save_model | bool | Whether to save the model after training. Defaults to True. | True |
get_domain_embeddings()
Retrieves domain embeddings from the trained model.
Returns:
Type | Description |
---|---|
pd.DataFrame | A DataFrame containing domain embeddings. |
get_covariate_embeddings()
Retrieves covariate embeddings from the trained model.
Returns:
Type | Description |
---|---|
dict | A dictionary of DataFrames, each containing embeddings for a covariate. |
save_model(model, save_path)
Saves the trained model to a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | The trained model. | required |
save_path | str or Path | Path to save the model file. | required |
Returns:
Type | Description |
---|---|
None | |