API
SCIPR Model
class scipr.SCIPR(match_algo, transform_algo, n_iter=5, input_normalization='l2')
Single Cell Iterative Point set Registration (SCIPR).
Aligns batches of scRNA-seq data using an adaptation of the Iterative Closest Points (ICP) algorithm. SCIPR’s core steps are matching the points and learning the transformation function; the user specifies which strategy to use for each step.
Parameters: - match_algo (scipr.matching.Match) – Which matching strategy to use, an instance of a Match.
- transform_algo (scipr.transform.Transformer) – Which transformation strategy to use, an instance of a Transformer.
- n_iter (int) – Number of steps of SCIPR to run. Each step is a matching phase, followed by updating the transformation function.
- input_normalization ({'l2', 'std', 'log'}) – Which input normalization to apply to data before aligning with SCIPR.
  - 'l2' : Scale each cell’s count vector to have unit norm (vector length).
  - 'std' : Scale each gene to have zero mean and unit variance.
  - 'log' : Apply Seurat-style log normalization to each cell’s count vector (as in Seurat’s normalize function).
Methods
fit(A, B[, tensorboard, tensorboard_dir]) – Fit the model to align to a reference batch.
fit_adata(adata, batch_key, source, target) – Fit the model to align to a reference batch, taking AnnData input.
transform(A) – Apply alignment to a batch of cells.
transform_adata(adata, batch_key, batch[, …]) – Apply alignment to a batch of cells, taking AnnData input.
fit(A, B, tensorboard=False, tensorboard_dir=None)
Fit the model to align to a reference batch.
Parameters: - A (numpy.ndarray) – The “source” batch of cells to align. Dimensions are (cellsA, genes).
- B (numpy.ndarray) – The “target” (or “reference”) batch data to align to. A is aligned onto B, where B is unchanged and remains a stationary “reference”. Dimensions are (cellsB, genes).
- tensorboard (bool) – If True, enable tensorboard logging of SCIPR algorithm metrics.
- tensorboard_dir (None or str) – If None, an automatically generated folder is used to store tensorboard event files. If specified, event files are placed in the given directory (which is created if it doesn’t already exist).
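The fitting procedure alternates a matching phase with a transform update while B stays fixed. The loop below is a rough NumPy sketch of that idea, not the library's implementation: it swaps in plain closest-point matching and a closed-form least-squares affine fit, and all function names (`closest_pairs`, `fit_affine_lstsq`, `icp_align`, `mean_sq_nn_dist`) are hypothetical.

```python
import numpy as np

def closest_pairs(A, B):
    """For each source cell in A, the index of its nearest target cell in B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_affine_lstsq(X, Y):
    """Least-squares W, b such that X @ W.T + b approximates Y."""
    X1 = np.hstack([X, np.ones((len(X), 1))])      # append a bias column
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)  # shape (genes + 1, genes)
    return coef[:-1].T, coef[-1]

def icp_align(A, B, n_iter=5):
    """Alternate matching and transform fitting; B is never modified."""
    steps, A_cur = [], A.copy()
    for _ in range(n_iter):
        idx = closest_pairs(A_cur, B)              # matching phase
        W, b = fit_affine_lstsq(A_cur, B[idx])     # update the transform
        A_cur = A_cur @ W.T + b
        steps.append((W, b))
    return steps, A_cur

def mean_sq_nn_dist(X, Y):
    """Mean squared distance from each row of X to its nearest row of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

rng = np.random.default_rng(0)
B = rng.normal(size=(40, 3))                       # stationary reference batch
A = B + np.array([0.3, -0.2, 0.1])                 # source batch: a shifted copy
steps, A_aligned = icp_align(A, B, n_iter=5)
before, after = mean_sq_nn_dist(A, B), mean_sq_nn_dist(A_aligned, B)
```

With this combination of matching and fitting, the mean squared nearest-neighbor distance cannot increase from one step to the next, because the identity map is always among the candidate affine transforms.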
fit_adata(adata, batch_key, source, target, tensorboard=False, tensorboard_dir=None)
Fit the model to align to a reference batch, taking AnnData input.
Parameters: - adata (anndata.AnnData) – Annotated data object containing the “source” and “target” cell batches. Dimensions are (cells, genes).
- batch_key (str) – The name of the column in adata.obs which contains the batch annotations.
- source, target (str) – The batch annotation values of the “source” and “target” batches.
- tensorboard (bool) – If True, enable tensorboard logging of SCIPR algorithm metrics.
- tensorboard_dir (None or str) – If None, an automatically generated folder is used to store tensorboard event files. If specified, event files are placed in the given directory (which is created if it doesn’t already exist).
transform(A)
Apply alignment to a batch of cells.
Cells are transformed to be aligned to the same “reference” batch from the fit() method.
Parameters: A (numpy.ndarray) – The batch of cells to align. Dimensions are (cellsA, genes).
Returns: The aligned batch of cells, same shape as input A.
Return type: numpy.ndarray
Raises: RuntimeError – If this method is called before the fit() method.
transform_adata(adata, batch_key, batch, inplace=False)
Apply alignment to a batch of cells, taking AnnData input.
Cells are transformed to be aligned to the same “reference” batch from the fit() method.
Parameters: - adata (anndata.AnnData) – Annotated data object containing the batch of cells to align. Dimensions are (cells, genes).
- batch_key (str) – The name of the column in adata.obs which contains the batch annotations.
- batch (str) – The batch annotation value of the cells to align.
- inplace (bool) – If True, replace the cells (observations) in adata.X that have the batch annotation with their transformed values. Beware that these cells are first normalized and then transformed, so they are aligned to a normalized representation (i.e. using the normalization specified in the constructor), whereas the other cells in adata may not be in this normalized representation. Otherwise, return the tuple (transformed numpy.ndarray, row indexer into adata of the transformed cells).
Returns: - transformed (numpy.ndarray) – The aligned batch of cells, same shape as input batch. Only provided if inplace is False.
- indexer (numpy.ndarray) – Boolean row indexer into adata of the transformed cells. Only provided if inplace is False.
Raises: RuntimeError – If this method is called before the fit() method.
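The (transformed, indexer) return pair described for inplace=False can be pictured with plain NumPy arrays. In this sketch `X` and `batches` are stand-ins for adata.X and the batch_key column of adata.obs, and the halving step is only a placeholder for a real alignment.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)   # stand-in for adata.X
batches = np.array(["b1", "b2", "b1", "b2"])   # stand-in for adata.obs[batch_key]

indexer = batches == "b1"                      # boolean row indexer for the batch
transformed = X[indexer] * 0.5                 # placeholder for the aligned cells

# The caller can write the aligned rows back using the indexer.
X_new = X.copy()
X_new[indexer] = transformed
```

Rows outside the selected batch are left untouched, which is why they may sit in a different (unnormalized) representation than the transformed rows.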
Matching Functions
Closest
Greedy
class scipr.matching.Greedy(alpha=0.5, beta=2)
Use the greedy matching algorithm from the SCIPR paper.
First, all pairings between the two sets are sorted by distance from smallest to largest, and pairs are then selected by proceeding down the list. Selection stops once an alpha fraction of the source set has been assigned to pairs. When a pair is considered, it is skipped if the target point in that pair has already participated in beta pairs.
Parameters: - alpha (float) – The alpha hyperparameter in the above algorithm.
- beta (int) – The beta hyperparameter in the above algorithm.
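The greedy selection described above can be sketched in NumPy as follows. This is an illustration rather than the library's code: the function name `greedy_match` is hypothetical, Euclidean distance is assumed, and the sketch assumes each source cell is paired at most once.

```python
import numpy as np

def greedy_match(A, B, alpha=0.5, beta=2):
    """Return a list of (source_idx, target_idx) pairs. Illustrative sketch."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    order = np.argsort(d, axis=None)            # all pairings, closest first
    n_to_match = int(alpha * len(A))            # stop after this many sources
    used_sources = set()
    target_uses = np.zeros(len(B), dtype=int)
    pairs = []
    for flat in order:
        if len(pairs) >= n_to_match:
            break
        i, j = (int(v) for v in np.unravel_index(flat, d.shape))
        # Assumption: a source cell that is already paired is skipped.
        if i in used_sources or target_uses[j] >= beta:
            continue
        pairs.append((i, j))
        used_sources.add(i)
        target_uses[j] += 1
    return pairs

A = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
B = np.array([[0.0, 0.1], [5.0, 5.1]])
half = greedy_match(A, B, alpha=0.5, beta=1)    # two closest pairs only
full = greedy_match(A, B, alpha=1.0, beta=2)    # each target reused up to twice
capped = greedy_match(A, B, alpha=1.0, beta=1)  # beta limits the pair count
```

A small beta keeps a single target cell from absorbing many source cells, while alpha controls how much of the source batch takes part in each step.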
MNN
class scipr.matching.MNN(k=10)
Use the Mutual Nearest Neighbors strategy to assign pairs.
For any given cell a in a set A, if a cell b in a set B is in the set of nearest neighbors of a among B, and a is in the set of nearest neighbors of b among A, then the pair (a, b) is added to the set of pairs.
Parameters: k (int) – The number of neighbors of each cell to consider when finding mutual nearest neighbors.
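The mutual-neighbor condition can be written compactly with two boolean masks. The sketch below assumes Euclidean distance and a brute-force neighbor search; the function name `mnn_pairs` is hypothetical and the library may compute neighbors differently.

```python
import numpy as np

def mnn_pairs(A, B, k=10):
    """Return (a_idx, b_idx) pairs that are mutual k-nearest neighbors."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    k_ab, k_ba = min(k, len(B)), min(k, len(A))
    nn_ab = np.argsort(d, axis=1)[:, :k_ab]      # for each a: its k nearest in B
    nn_ba = np.argsort(d.T, axis=1)[:, :k_ba]    # for each b: its k nearest in A
    a_to_b = np.zeros(d.shape, dtype=bool)
    a_to_b[np.arange(len(A))[:, None], nn_ab] = True
    b_to_a = np.zeros(d.shape, dtype=bool)
    b_to_a[nn_ba, np.arange(len(B))[:, None]] = True
    # A pair is kept only when the neighbor relation holds in both directions.
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(a_to_b & b_to_a))]

A = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
B = np.array([[0.1, 0.0], [10.1, 0.0], [20.0, 0.0]])
pairs = mnn_pairs(A, B, k=1)
```

In this toy input the second cell of A and the third cell of B end up unpaired, since neither is the nearest neighbor of the cell it is closest to.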
Hungarian
class scipr.matching.Hungarian(frac_to_match=1.0)
Use the Hungarian algorithm for the assignment problem to assign pairs.
The “Hungarian” method is an efficient algorithm to solve the assignment problem, where finding pairs between two sets A and B is treated as a bipartite matching problem.
Parameters: frac_to_match (float) – If not 1.0, then this is the fraction of the matches to keep. All of the matches are sorted from smallest distance to largest, and then only the top n are returned, where n is the frac_to_match portion of the smaller of the two sets of cells.
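One way to picture this strategy is with SciPy's `linear_sum_assignment`, which solves the same assignment problem. The sketch below is an assumption about how the pieces fit together, not the library's code; the function name `hungarian_pairs` and the use of Euclidean distance are illustrative choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_pairs(A, B, frac_to_match=1.0):
    """Min-cost one-to-one pairs, optionally keeping only the closest fraction."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(d)         # optimal bipartite assignment
    order = np.argsort(d[rows, cols])             # matches, closest first
    n = int(frac_to_match * min(len(A), len(B)))  # portion of the smaller set
    keep = order[:n]
    return [(int(i), int(j)) for i, j in zip(rows[keep], cols[keep])]

A = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
B = np.array([[0.2, 0.0], [5.5, 0.0]])
all_pairs = hungarian_pairs(A, B)                 # every match is kept
top_half = hungarian_pairs(A, B, frac_to_match=0.5)
```

Lowering frac_to_match discards the most distant matches, which are the ones most likely to pair cells that have no true counterpart in the other batch.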
Transformation Functions
Rigid
Affine
class scipr.transform.Affine(optim='adam', lr=0.001, epochs=1000)
Use an affine transformation function to align the pairs of cells.
The affine function is of the form f(x) = Wx + b, where W and b are the learnable weights. W has the shape (genes, genes) and b (the bias term) has shape (genes,).
Parameters: - optim ({'adam', 'sgd'}) – Which torch optimizer to use.
- lr (float) – Learning rate to use in gradient descent.
- epochs (int) – Number of iterations to run gradient descent.
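To make the shapes of W and b concrete, here is a small sketch that fits f(x) = Wx + b to paired cells by gradient descent on the mean squared error. It uses plain NumPy with full-batch updates instead of a torch optimizer, and the identity initialization, the loss, and the name `fit_affine_gd` are assumptions made for illustration.

```python
import numpy as np

def fit_affine_gd(X, Y, lr=0.1, epochs=500):
    """Fit f(x) = W x + b to paired rows of X and Y by minimizing MSE."""
    n, genes = X.shape
    W, b = np.eye(genes), np.zeros(genes)        # assumed start: identity map
    for _ in range(epochs):
        resid = X @ W.T + b - Y                  # (n, genes) residuals
        W -= lr * (2.0 / n) * resid.T @ X        # gradient of the MSE w.r.t. W
        b -= lr * (2.0 / n) * resid.sum(axis=0)  # gradient of the MSE w.r.t. b
    return W, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # matched source cells
W_true = np.array([[1.0, 0.2, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 1.1]])
b_true = np.array([0.5, -0.3, 0.1])
Y = X @ W_true.T + b_true                        # matched target cells
W, b = fit_affine_gd(X, Y)
mse = np.mean((X @ W.T + b - Y) ** 2)
```

Here W has shape (genes, genes) and b has shape (genes,), matching the description above; lr and epochs play the same roles as the constructor parameters, though with different values.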
StackedAutoEncoder
class scipr.transform.StackedAutoEncoder(hidden_sizes=[64], act='leaky_relu', last_layer_act=None, optim='adam', lr=0.001, epochs=1000)
Use multiple autoencoders to align the pairs of cells.
Fit a “stack” of autoencoders, the output of one feeding in as the input into the next one (thereby “composing” them). At each step of SCIPR, the next autoencoder is fitted, and then added onto the overall stack. Since these are autoencoders, the output dimensions of each are the same as the input, so the dimensions are maintained.
Will automatically search for and use a GPU if available.
Parameters: - hidden_sizes (list of int) – The sizes (widths) of the hidden layers of each autoencoder. This is one side of the “funnel” of the autoencoder architecture, and the other side is built to be symmetric (same as hidden_sizes but in reverse).
- act ({'leaky_relu', 'relu', 'sigmoid', 'tanh'}) – Which non-linear activation function to use for the autoencoders.
- last_layer_act ({None, 'leaky_relu', 'relu', 'sigmoid', 'tanh'}) – Which non-linear activation function to use for the final layer (output) of the autoencoders. None means no non-linearity. See the Warning below.
- optim ({'adam', 'sgd'}) – Which torch optimizer to use.
- lr (float) – Learning rate to use in gradient descent.
- epochs (int) – Number of iterations to run gradient descent.
Warning
Be aware that your choice of input normalization to SCIPR might have implications for your choice of the last_layer_act parameter. For example, if your input normalization scales your input features to [0, 1.0], then you may want your final layer to use a sigmoid activation. If your input normalization allows input features to range over (-infty, +infty), then having None as the last layer’s activation might be best. This is just something to keep in mind; ultimately you may choose whatever performs the best alignment for you.
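The way stacked autoencoders preserve dimensions can be seen in a forward-only NumPy sketch. Nothing here is trained and the weights are random. The layer layout is an assumption: the last entry of hidden_sizes is taken to be a shared bottleneck, so hidden_sizes=[64] gives genes → 64 → genes. All function names are hypothetical.

```python
import numpy as np

def layer_sizes(genes, hidden_sizes):
    # Assumed layout: encoder widths, then the same widths mirrored back out.
    hidden = list(hidden_sizes)
    return [genes] + hidden + hidden[-2::-1] + [genes]

def random_autoencoder(sizes, rng):
    # Untrained stand-in weights: one (W, b) per consecutive pair of widths.
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def apply_autoencoder(layers, X):
    for idx, (W, b) in enumerate(layers):
        X = X @ W + b
        if idx < len(layers) - 1:   # mimics last_layer_act=None on the output
            X = leaky_relu(X)
    return X

def apply_stack(stack, X):
    # Compose the autoencoders: each one consumes the previous one's output.
    for layers in stack:
        X = apply_autoencoder(layers, X)
    return X

rng = np.random.default_rng(0)
genes = 20
stack = [random_autoencoder(layer_sizes(genes, [64]), rng) for _ in range(3)]
cells = rng.normal(size=(5, genes))
out = apply_stack(stack, cells)
```

Because every autoencoder maps back to the input width, any number of them can be appended to the stack without changing the shape of the data.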