Utilities
`utils` can be replaced by `ul`; e.g., `concord.utils.list_adata_files` can be written as `concord.ul.list_adata_files`.
concord.utils.select_features(adata, n_top_features=2000, flavor='seurat_v3', filter_gene_by_counts=False, normalize=False, log1p=False, grouping='cluster', emb_key='X_pca', k=512, knn_samples=100, gini_cut_qt=None, save_path=None, figsize=(10, 3), subsample_frac=1.0, random_state=0)
Selects top informative features from an AnnData object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`adata` | `AnnData` | AnnData object containing gene expression data. | *required* |
`n_top_features` | `int` | Number of top features to select. | `2000` |
`flavor` | `str` | Feature selection method: `'seurat_v3'` for highly variable gene selection based on Seurat v3, or `'iff'` for Informative Feature Filtering (IFF). | `'seurat_v3'` |
`filter_gene_by_counts` | `Union[int, bool]` | Minimum count threshold for feature filtering. | `False` |
`normalize` | `bool` | Whether to normalize the data before feature selection. | `False` |
`log1p` | `bool` | Whether to apply a log1p transformation before feature selection. | `False` |
`grouping` | `Union[str, Series, List[str]]` | Clustering/grouping strategy for the IFF method. | `'cluster'` |
`emb_key` | `str` | Embedding key in `adata.obsm`. | `'X_pca'` |
`k` | `int` | Number of neighbors for k-NN if … | `512` |
`knn_samples` | `int` | Number of k-NN samples if … | `100` |
`gini_cut_qt` | `float` | Quantile threshold for selecting features by Gini coefficient in IFF. | `None` |
`save_path` | `Optional[Union[str, Path]]` | Path to save the Gini coefficient plot. | `None` |
`figsize` | `tuple` | Size of the Gini coefficient plot. | `(10, 3)` |
`subsample_frac` | `float` | Fraction of the data to subsample for feature selection. | `1.0` |
`random_state` | `int` | Random seed for reproducibility. | `0` |
Returns:
Type | Description |
---|---|
`List[str]` | List of selected feature names. |
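To make `gini_cut_qt` more concrete, here is a minimal sketch of scoring features by a Gini coefficient and keeping those at or above a quantile cutoff. It is not CONCORD's IFF implementation: the input layout (a hypothetical `group_means` dict of per-group mean expression per feature), the helper names, and the direction of the cut (keep high-Gini, i.e. group-specific, features) are all assumptions for illustration.

```python
import math

def gini(values):
    """Gini coefficient of non-negative values: 0 = perfectly even, (n-1)/n = all mass in one entry."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))  # rank-weighted sum over ascending values
    return 2.0 * weighted / (n * total) - (n + 1) / n

def quantile(values, q):
    """Linear-interpolation quantile (same convention as NumPy's default)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def select_by_gini(group_means, gini_cut_qt):
    """Hypothetical helper: keep features whose Gini across groups is >= the gini_cut_qt quantile."""
    scores = {feature: gini(v) for feature, v in group_means.items()}
    cutoff = quantile(list(scores.values()), gini_cut_qt)
    return [feature for feature, g in scores.items() if g >= cutoff]

means = {
    "GeneA": [1.0, 1.0, 1.0, 1.0],  # uniform across groups
    "GeneB": [0.0, 0.0, 0.0, 4.0],  # specific to one group
    "GeneC": [0.5, 0.5, 2.0, 2.0],  # partially specific
}
print({f: round(gini(v), 3) for f, v in means.items()})
print(select_by_gini(means, 0.5))
```

Features with uneven expression across groups get higher Gini scores, so a higher `gini_cut_qt` in this sketch keeps fewer, more group-specific features.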
concord.utils.generate_synthetic_doublets(adata, doublet_synth_ratio, seed, batch_key, droplet_type_key, mean=0.5, var=0.1, clip_range=(0.2, 0.8), plot_histogram=True, combine_with_original=False)
Generate synthetic doublets from singlet data in an AnnData object within each batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`adata` | `AnnData` | AnnData object containing the singlet data (possibly including unclassified doublets). | *required* |
`doublet_synth_ratio` | `float` | Ratio of synthetic doublets to true singlets. | *required* |
`seed` | `int` | Random seed for reproducibility. | *required* |
`batch_key` | `str` | Key in `.obs` indicating batch information. | *required* |
`droplet_type_key` | `str` | Key in `.obs` indicating droplet type. | *required* |
`mean` | `float` | Mean of the normal distribution used to generate the doublet fractions. | `0.5` |
`var` | `float` | Variance of the normal distribution used to generate the doublet fractions. | `0.1` |
`clip_range` | `tuple` | Range to which the generated fractions are clipped. | `(0.2, 0.8)` |
`plot_histogram` | `bool` | Whether to plot a histogram of the synthetic doublet fractions. | `True` |
Returns:
Name | Type | Description |
---|---|---|
`adata_synthetic_doublets` | `AnnData` | AnnData object containing the synthetic doublets. |
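The parameters above suggest a fraction drawn from a normal distribution, clipped, and used to combine two singlets from the same batch. The sketch below illustrates that idea under stated assumptions: a doublet is modeled as the linear mix `f * a + (1 - f) * b` of two distinct singlet profiles, expression vectors are plain lists, and the helper names are hypothetical. It is not CONCORD's implementation.

```python
import random

def sample_doublet_fraction(rng, mean=0.5, var=0.1, clip_range=(0.2, 0.8)):
    """Draw a mixing fraction from Normal(mean, var) and clip it to clip_range."""
    frac = rng.gauss(mean, var ** 0.5)  # var is a variance, so the standard deviation is sqrt(var)
    lo, hi = clip_range
    return min(max(frac, lo), hi)

def synth_doublets_for_batch(singlets, doublet_synth_ratio, rng, **frac_kwargs):
    """Mix random pairs of singlet profiles from ONE batch into synthetic doublets (assumed linear mixing)."""
    if len(singlets) < 2:
        return []  # a batch with fewer than two cells cannot form a pair
    n_doublets = int(round(len(singlets) * doublet_synth_ratio))
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(singlets, 2)  # two distinct cells from the same batch
        f = sample_doublet_fraction(rng, **frac_kwargs)
        doublets.append([f * x + (1 - f) * y for x, y in zip(a, b)])
    return doublets

def synth_doublets_by_batch(singlets, batch_labels, doublet_synth_ratio, seed, **frac_kwargs):
    """Group singlets by batch label and synthesize doublets separately within each batch."""
    rng = random.Random(seed)  # one seeded generator makes the whole run reproducible
    groups = {}
    for vec, label in zip(singlets, batch_labels):
        groups.setdefault(label, []).append(vec)
    return {label: synth_doublets_for_batch(cells, doublet_synth_ratio, rng, **frac_kwargs)
            for label, cells in groups.items()}

cells = [[10.0, 0.0, 5.0], [0.0, 8.0, 1.0], [4.0, 4.0, 4.0], [2.0, 0.0, 9.0]]
batches = ["b1", "b1", "b2", "b2"]
out = synth_doublets_by_batch(cells, batches, doublet_synth_ratio=1.0, seed=0)
print({b: len(d) for b, d in out.items()})
```

Pairing only within a batch keeps synthetic doublets from mixing batch effects that real doublets, which form inside a single droplet run, would not have.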
concord.utils.list_adata_files(folder_path, substring=None, extension='*.h5ad')
Recursively list files matching `extension` (`.h5ad` by default) in a directory, optionally keeping only those whose filenames contain a given substring.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`folder_path` | `str` | Path to the folder to search. | *required* |
`substring` | `str`, optional | Substring to filter filenames by; `None` means no filtering. | `None` |
`extension` | `str`, optional | Glob pattern for the file extension to search for. | `'*.h5ad'` |
Returns:
Type | Description |
---|---|
`list` | A list of file paths matching the criteria. |
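The described behavior (recursive search, glob-style `extension`, substring filter on filenames) maps naturally onto `pathlib.Path.rglob`. The following is a minimal standalone sketch with a hypothetical name, `list_h5ad_files`; unlike the documented function, it also sorts the results, which is an assumption made here for deterministic output.

```python
import tempfile
from pathlib import Path

def list_h5ad_files(folder_path, substring=None, extension="*.h5ad"):
    """Recursively list files matching `extension`, optionally keeping only names containing `substring`."""
    matches = sorted(Path(folder_path).rglob(extension))  # rglob searches all subdirectories
    if substring is not None:
        matches = [p for p in matches if substring in p.name]  # filter on the filename only
    return [str(p) for p in matches]

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    for name in ["a_liver.h5ad", "sub/b_liver.h5ad", "sub/c_lung.h5ad", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = list_h5ad_files(tmp, substring="liver")
    everything = list_h5ad_files(tmp)
print([Path(p).name for p in found])
```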
concord.utils.read_and_concatenate_adata(adata_files, merge='unique', add_dataset_col=False, dataset_col_name='dataset', output_file=None)
Read multiple AnnData `.h5ad` files and concatenate them into a single AnnData object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`adata_files` | `list` | List of paths to `.h5ad` files. | *required* |
`merge` | `str`, optional | How to handle conflicting columns, e.g., `'unique'` or `'first'`. | `'unique'` |
`add_dataset_col` | `bool`, optional | Whether to add a new column identifying each cell's source dataset. | `False` |
`dataset_col_name` | `str`, optional | Name of the new column storing dataset names. | `'dataset'` |
`output_file` | `str`, optional | Path to save the concatenated AnnData object. If `None`, the object is not saved. | `None` |
Returns:
Type | Description |
---|---|
`ad.AnnData` | The concatenated AnnData object. |
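The `'unique'` and `'first'` options match the names of merge strategies in `anndata.concat`, so `merge` is plausibly forwarded there; that is an assumption. The sketch below is a simplified model of those two strategies on plain dictionaries, not anndata's code. In anndata, merging applies to annotations aligned to the axis that is *not* concatenated (for example `.var` when stacking cells).

```python
def merge_metadata(dicts, merge="unique"):
    """Simplified model of two anndata.concat merge strategies.

    'unique': keep a key only if every object that has it agrees on a single value.
    'first':  keep the first value seen for each key.
    """
    keys = []
    for d in dicts:
        for k in d:
            if k not in keys:
                keys.append(k)  # union of keys, in first-seen order
    merged = {}
    for k in keys:
        values = [d[k] for d in dicts if k in d]
        if merge == "first":
            merged[k] = values[0]
        elif merge == "unique":
            if all(v == values[0] for v in values):
                merged[k] = values[0]  # conflicting keys are dropped
        else:
            raise ValueError(f"unsupported merge strategy: {merge!r}")
    return merged

a = {"gene_symbol": "CD3E", "chrom": "11"}
b = {"gene_symbol": "CD3E", "chrom": "chr11", "biotype": "protein_coding"}
print(merge_metadata([a, b], "unique"))
print(merge_metadata([a, b], "first"))
```

With `'unique'`, the conflicting `chrom` annotation is dropped instead of silently taking one file's value, which is why it is a conservative default.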
concord.utils.filter_and_copy_attributes(adata_target, adata_source)
Filter `adata_target` to match the cells in `adata_source`, then copy `.obs` and `.obsm` from `adata_source`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`adata_target` | `ad.AnnData` | The AnnData object to be filtered. | *required* |
`adata_source` | `ad.AnnData` | The reference AnnData object containing the desired cells and attributes. | *required* |
Returns:
Type | Description |
---|---|
`ad.AnnData` | The filtered AnnData object with updated `.obs` and `.obsm`. |
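Copying `.obsm` row by row is only valid when both objects list the same cells in the same order. The sketch below models that with plain Python containers: a list of cell names stands in for `obs_names`, a dict of per-cell annotations for `.obs`, and a dict of row lists for `.obsm`. Ordering the result like the source is an assumption of this sketch, and `filter_and_copy` is a hypothetical name, not the library function.

```python
def filter_and_copy(target_cells, source_cells, source_obs, source_obsm):
    """Keep target cells that exist in the source, ordered as in the source, and copy its annotations."""
    target_set = set(target_cells)
    pos = {c: i for i, c in enumerate(source_cells)}  # row position of each cell in the source
    kept = [c for c in source_cells if c in target_set]
    obs = {c: dict(source_obs[c]) for c in kept}  # copy per-cell annotations
    obsm = {key: [rows[pos[c]] for c in kept] for key, rows in source_obsm.items()}  # row-aligned copy
    return kept, obs, obsm

target = ["c1", "c2", "c3", "c4"]
source = ["c3", "c1", "c9"]
source_obs = {"c3": {"cluster": "T"}, "c1": {"cluster": "B"}, "c9": {"cluster": "NK"}}
source_obsm = {"X_umap": [[0.1, 0.2], [1.0, 1.1], [5.0, 5.0]]}
kept, obs, obsm = filter_and_copy(target, source, source_obs, source_obsm)
print(kept, obsm)
```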
concord.utils.ensure_categorical(adata, obs_key=None, drop_unused=True)
Convert an `.obs` column to categorical dtype.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`adata` | `ad.AnnData` | The AnnData object. | *required* |
`obs_key` | `str` | Column in `.obs` to convert. | `None` |
`drop_unused` | `bool`, optional | Whether to remove unused categories. | `True` |
concord.utils.save_obsm_to_hdf5(adata, filename)
Save the `.obsm` attribute of an AnnData object to an HDF5 file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`adata` | `anndata.AnnData` | The AnnData object containing the `.obsm` attribute to save. | *required* |
`filename` | `str` | The path to the HDF5 file where `.obsm` will be saved. | *required* |
Returns:
Type | Description |
---|---|
`None` | Saves `.obsm` to the specified HDF5 file. |
concord.utils.load_obsm_from_hdf5(filename)
Load the .obsm
attribute from an HDF5 file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`filename` | `str` | Path to the HDF5 file containing the saved `.obsm` data. | *required* |
Returns:
Type | Description |
---|---|
`dict` | A dictionary where keys are `.obsm` keys and values are the corresponding arrays. |
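The save/load pair round-trips `.obsm` as a mapping from embedding key to a 2-D array. To show that contract without extra dependencies, this sketch swaps HDF5 for JSON and nested lists for arrays; the file format, the on-disk layout, and the helper names `save_obsm_json` / `load_obsm_json` are all assumptions, not how `save_obsm_to_hdf5` stores data.

```python
import json
import os
import tempfile

def save_obsm_json(obsm, filename):
    """JSON stand-in for an HDF5 writer: store each .obsm entry (2-D, as nested lists) under its key."""
    with open(filename, "w") as fh:
        json.dump({key: [[float(v) for v in row] for row in mat] for key, mat in obsm.items()}, fh)

def load_obsm_json(filename):
    """JSON stand-in for an HDF5 reader: return a dict mapping .obsm key -> 2-D nested list."""
    with open(filename) as fh:
        return json.load(fh)

obsm = {"X_pca": [[0.1, 0.2], [0.3, 0.4]], "X_umap": [[1.0, 2.0], [3.0, 4.0]]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "obsm.json")
    save_obsm_json(obsm, path)
    restored = load_obsm_json(path)
print(sorted(restored))
```

Keeping one entry per `.obsm` key means a reader can restore a single embedding without touching the others; HDF5 datasets give the same property with binary arrays.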
concord.utils.subset_adata_to_obsm_indices(adata, obsm)
Subset an AnnData object to match the indices present in the `obsm` dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`adata` | `anndata.AnnData` | The original AnnData object. | *required* |
`obsm` | `dict` | A dictionary containing `.obsm` data. | *required* |
Returns:
Type | Description |
---|---|
`anndata.AnnData` | A subsetted AnnData object that contains only the indices available in `obsm`. |
concord.utils.anndata_to_viscello(adata, output_dir, project_name='MyProject', organism='hsa', clist_only=False)
Converts an AnnData object to a VisCello project directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`adata` | `AnnData` | AnnData object containing single-cell data. | *required* |
`output_dir` | `str` | Directory where the VisCello project will be created. | *required* |
`project_name` | `str` | Name of the project. | `'MyProject'` |
`organism` | `str` | Organism code (e.g., `'hsa'` for human). | `'hsa'` |
`clist_only` | `bool` | Whether to generate only the clist file. | `False` |
Returns:
Type | Description |
---|---|
`None` | |
Side Effects
- Creates a directory with the necessary files for VisCello.
- Saves `eset.rds` (ExpressionSet), `config.yml`, and `clist.rds`.
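Because the conversion's result is a set of files rather than a return value, a quick check of the output directory can catch a failed or partial export. The helper below is hypothetical; it only tests for the three files named in the side effects, and it assumes `clist_only=True` produces `clist.rds` alone.

```python
import tempfile
from pathlib import Path

def missing_viscello_files(output_dir, clist_only=False):
    """Return the expected VisCello files that are absent from output_dir."""
    expected = ["clist.rds"] if clist_only else ["eset.rds", "config.yml", "clist.rds"]
    return [name for name in expected if not (Path(output_dir) / name).is_file()]

with tempfile.TemporaryDirectory() as tmp:
    for name in ("config.yml", "clist.rds"):
        (Path(tmp) / name).touch()  # simulate a partial export with no eset.rds
    missing_full = missing_viscello_files(tmp)
    missing_clist = missing_viscello_files(tmp, clist_only=True)
print(missing_full, missing_clist)
```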
concord.utils.update_clist_with_subsets(global_adata, adata_subsets, viscello_dir, cluster_key=None)
Updates an existing VisCello clist with new subsets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`global_adata` | `AnnData` | The full AnnData object. | *required* |
`adata_subsets` | `dict` | Dictionary mapping subset names to AnnData objects. | *required* |
`viscello_dir` | `str` | Path to the existing VisCello directory. | *required* |
`cluster_key` | `str` | Key in `.obs` containing cluster labels. | `None` |
Returns:
Type | Description |
---|---|
`None` | |
Side Effects
- Reads the existing `clist.rds` file from `viscello_dir`.
- Adds the new subsets as `Cello` objects to the clist.
- Saves the updated `clist.rds` file in `viscello_dir`.
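Each subset added to the clist describes cells that must already exist in the global object. A pre-check like the one below, a hypothetical helper working on plain lists of cell names (standing in for `obs_names`), can catch subsets that were filtered or renamed independently of `global_adata`. It returns 0-based Python positions; how `Cello` objects actually index cells is not specified here.

```python
def subset_positions(global_cells, subsets):
    """Map each subset name to the 0-based positions of its cells in the global cell list.

    Raises KeyError if a subset contains a cell absent from the global object.
    """
    index = {c: i for i, c in enumerate(global_cells)}
    positions = {}
    for name, cells in subsets.items():
        missing = [c for c in cells if c not in index]
        if missing:
            raise KeyError(f"subset {name!r} has cells not in the global data: {missing[:5]}")
        positions[name] = [index[c] for c in cells]
    return positions

global_cells = ["c1", "c2", "c3", "c4"]
subsets = {"T_cells": ["c2", "c4"], "B_cells": ["c1"]}
print(subset_positions(global_cells, subsets))
```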