API

SCIPR Model

class scipr.SCIPR(match_algo, transform_algo, n_iter=5, input_normalization='l2')

Single Cell Iterative Point set Registration (SCIPR).

Aligns scRNA-seq data batches using an adaptation of the Iterative Closest Points (ICP) algorithm. SCIPR’s core steps are matching the points and learning the transform function; the user specifies which strategy to use for each.
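The alternation between matching and updating the transform can be pictured with a minimal NumPy sketch. This is not SCIPR's implementation: it assumes closest-point matching with Euclidean distance and a translation-only update, and the function name icp_translation is hypothetical.

```python
import numpy as np

def icp_translation(A, B, n_iter=5):
    """Toy ICP loop: match each source point to its closest target, then shift."""
    A = A.astype(float)  # astype returns a copy, so the input is left untouched
    for _ in range(n_iter):
        # Matching phase: index of the closest row of B for every row of A.
        dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        # Transform phase: the least-squares translation is the mean residual.
        A += (B[nearest] - A).mean(axis=0)
    return A

B = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
A = B + np.array([1.0, -1.0])   # same points, shifted by a simulated batch effect
aligned = icp_translation(A, B)
```

In this toy case the shift is recovered in the first step and later steps leave the points unchanged; real data generally needs a richer transform and several steps.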

Parameters:
  • match_algo (scipr.matching.Match) – Which matching strategy to use, an instance of a Match.
  • transform_algo (scipr.transform.Transformer) – Which transformation strategy to use, an instance of a Transformer.
  • n_iter (int) – Number of steps of SCIPR to run. Each step is a matching phase, followed by updating the transformation function.
  • input_normalization ({'l2', 'std', 'log'}) –
    Which input normalization to apply to data before aligning with SCIPR.
    • 'l2' : Scale each cell’s count vector to have unit norm (vector length).
    • 'std' : Scale each gene to have zero mean and unit variance.
    • 'log' : Apply Seurat-style log normalization to each cell’s count vector (as in Seurat’s normalize function).
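As a rough illustration of the three options, the sketch below applies each one to a small dense matrix. The helper name normalize is hypothetical, the scale factor of 10,000 is Seurat's default and is assumed here, and the library's own handling of edge cases (all-zero cells, constant genes) may differ.

```python
import numpy as np

def normalize(X, method="l2"):
    """Normalize a dense (cells, genes) count matrix; edge cases are not handled."""
    X = np.asarray(X, dtype=float)
    if method == "l2":
        # Each row (cell) is divided by its Euclidean norm.
        return X / np.linalg.norm(X, axis=1, keepdims=True)
    if method == "std":
        # Each column (gene) is centered and scaled to unit variance.
        return (X - X.mean(axis=0)) / X.std(axis=0)
    if method == "log":
        # Seurat-style: log1p of counts per 10,000 (assumed scale factor).
        return np.log1p(X / X.sum(axis=1, keepdims=True) * 1e4)
    raise ValueError(f"unknown method: {method}")

counts = np.array([[3.0, 4.0], [6.0, 0.0], [1.0, 2.0]])
unit_cells = normalize(counts, "l2")
scaled_genes = normalize(counts, "std")
log_counts = normalize(counts, "log")
```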

Methods

  • fit(A, B[, tensorboard, tensorboard_dir]) – Fit the model to align to a reference batch.
  • fit_adata(adata, batch_key, source, target) – Fit the model to align to a reference batch, taking AnnData input.
  • transform(A) – Apply alignment to a batch of cells.
  • transform_adata(adata, batch_key, batch[, …]) – Apply alignment to a batch of cells, taking AnnData input.
fit(A, B, tensorboard=False, tensorboard_dir=None)

Fit the model to align to a reference batch.

Parameters:
  • A (numpy.ndarray) – The “source” batch of cells to align. Dimensions are (cellsA, genes).
  • B (numpy.ndarray) – The “target” (or “reference”) batch data to align to. A is aligned onto B, where B is unchanged, and remains a stationary “reference”. Dimensions are (cellsB, genes).
  • tensorboard (bool) – If True, enable tensorboard logging of SCIPR algorithm metrics.
  • tensorboard_dir (None or str) – If None, will use an automatically generated folder to store tensorboard event files. If specified, will place event files in the specified directory (creates it if it doesn’t already exist).
fit_adata(adata, batch_key, source, target, tensorboard=False, tensorboard_dir=None)

Fit the model to align to a reference batch, taking AnnData input.

Parameters:
  • adata (anndata.AnnData) – Annotated data object containing the two “source” and “target” cell batches. Dimensions are (cells, genes).
  • batch_key (str) – The name of the column in adata.obs which contains the batch annotations.
  • source, target (str) – The batch annotation values of the “source” and “target” batches.
  • tensorboard (bool) – If True, enable tensorboard logging of SCIPR algorithm metrics.
  • tensorboard_dir (None or str) – If None, will use an automatically generated folder to store tensorboard event files. If specified, will place event files in the specified directory (creates it if it doesn’t already exist).

See also

fit()

transform(A)

Apply alignment to a batch of cells.

Cells are transformed to be aligned to the same “reference” batch from the fit() method.

Parameters: A (numpy.ndarray) – The batch of cells to align. Dimensions are (cellsA, genes).
Returns: The aligned batch of cells, same shape as input A.
Return type: numpy.ndarray
Raises: RuntimeError – If this method is called before the fit() method.
transform_adata(adata, batch_key, batch, inplace=False)

Apply alignment to a batch of cells, taking AnnData input.

Cells are transformed to be aligned to the same “reference” batch from the fit() method.

Parameters:
  • adata (anndata.AnnData) – Annotated data object containing the batch of cells to align. Dimensions are (cells, genes).
  • batch_key (str) – The name of the column in adata.obs which contains the batch annotations.
  • batch (str) – The batch annotation value of the cells to align.
  • inplace (bool) –

    If True, replace the cells (observations) in adata.X that have the batch annotation with the new transformed values. Beware that these cells are first normalized and then transformed, so they are aligned to a normalized representation (i.e. using the normalization specified in the constructor), whereas the other cells in adata may not be in this normalized representation.

    Otherwise, return the tuple (transformed numpy.ndarray, row indexer into adata of the transformed cells).

Returns:

  • transformed (numpy.ndarray) – The aligned batch of cells, same shape as input batch. Only provided if inplace is False.
  • indexer (numpy.ndarray) – Boolean row indexer into adata of the transformed cells. Only provided if inplace is False.

Raises:

RuntimeError – If this method is called before the fit() method.
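When inplace is False, the returned boolean indexer says which rows of adata the transformed array corresponds to. The sketch below shows that bookkeeping with plain NumPy stand-ins; it makes no scipr or anndata calls, and X, batches and the halved values are placeholders, not real output.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)    # stand-in for adata.X
batches = np.array(["b1", "b2", "b1", "b2"])    # stand-in for adata.obs[batch_key]

indexer = batches == "b2"                        # boolean row indexer for the batch
transformed = X[indexer] * 0.5                   # placeholder for the aligned values

aligned_X = X.copy()
aligned_X[indexer] = transformed                 # write the aligned rows back by row
```

Working on a copy keeps the original matrix available, which matters because the untouched rows are not in the normalized representation.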

See also

transform()

Matching Functions

Closest

class scipr.matching.Closest

Use the classic “closest” strategy to assign pairs.

For each cell in source batch A, pair it with the closest cell to it in the target batch B.
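A minimal sketch of this strategy, assuming Euclidean distance and small dense inputs (the function name closest_pairs is hypothetical and this is not the library's code):

```python
import numpy as np

def closest_pairs(A, B):
    """Return (i, j) pairs: each row i of A with its nearest row j of B."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return [(i, int(j)) for i, j in enumerate(dists.argmin(axis=1))]

A = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.0]])
B = np.array([[0.0, 0.0], [5.0, 4.0]])
pairs = closest_pairs(A, B)
```

Note that every source cell receives a partner and that a target cell may be reused by several source cells.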

Greedy

class scipr.matching.Greedy(alpha=0.5, beta=2)

Use the greedy matching algorithm from the SCIPR paper.

First, all pairings between the two sets are sorted by distance from smallest to largest, and pairs are then selected by proceeding down the list. Selection stops once an alpha fraction of the source set has been assigned to pairs. When a pair is considered, it is skipped if its target point has already participated in beta pairs.

Parameters:
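  • alpha (float) – The alpha hyperparameter in the above algorithm: the fraction of the source set to assign to pairs before selection stops.
  • beta (int) – The beta hyperparameter in the above algorithm: the maximum number of pairs a single target point may participate in.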
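The procedure can be sketched as follows. This is an illustration rather than the library's code: it assumes Euclidean distance, assumes each source point is paired at most once, and rounds alpha times the source size down with int(); the name greedy_pairs is hypothetical.

```python
import numpy as np

def greedy_pairs(A, B, alpha=0.5, beta=2):
    """Greedy matching sketch: shortest pairings first, with a per-target cap."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    n_needed = int(alpha * len(A))      # stop once this many sources are paired
    used_sources = set()
    target_uses = {}
    pairs = []
    # Visit all (source, target) pairings from smallest to largest distance.
    for flat in np.argsort(dists, axis=None):
        if len(pairs) >= n_needed:
            break
        i, j = divmod(int(flat), len(B))
        # Skip sources that are already paired and targets that hit the cap.
        if i in used_sources or target_uses.get(j, 0) >= beta:
            continue
        pairs.append((i, j))
        used_sources.add(i)
        target_uses[j] = target_uses.get(j, 0) + 1
    return pairs

A = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 3.0], [9.5, 0.0]])
B = np.array([[0.0, 0.0], [10.0, 0.0]])
loose = greedy_pairs(A, B, alpha=0.75, beta=2)
strict = greedy_pairs(A, B, alpha=0.75, beta=1)
```

With beta=1 the list can run out before the alpha fraction is reached, so fewer pairs come back than requested.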

MNN

class scipr.matching.MNN(k=10)

Use the Mutual Nearest Neighbors strategy to assign pairs.

For any given cell a in a set A, if a cell b in a set B is among the k nearest neighbors of a in B, and a is among the k nearest neighbors of b in A, then the pair (a, b) is added to the set of pairs.

Parameters: k (int) – The number of neighbors of each cell to consider when finding mutual nearest neighbors.

Hungarian

class scipr.matching.Hungarian(frac_to_match=1.0)

Use the Hungarian algorithm for the assignment problem to assign pairs.

The “Hungarian” method is an efficient algorithm to solve the assignment problem, where finding pairs between two sets A and B is treated as a bipartite matching problem.

Parameters: frac_to_match (float) – If not 1.0, then this is the fraction of the matches to keep. All of the matches are sorted from smallest distance to largest, and only the first n (the closest) are returned, where n is the frac_to_match portion of the smaller of the two sets of cells.
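The effect of frac_to_match can be sketched on a tiny example. To stay dependency-free, the assignment below is found by exhaustive search rather than by the Hungarian algorithm; both return a minimum-cost one-to-one assignment, but exhaustive search is only feasible for a handful of points. The function names, the Euclidean distance and the int() rounding of n are assumptions.

```python
from itertools import permutations
import math

def optimal_assignment(A, B):
    """Exhaustive minimum-cost one-to-one assignment (assumes len(A) <= len(B))."""
    best_cost, best = math.inf, None
    for cols in permutations(range(len(B)), len(A)):
        pairs = [(i, j, math.dist(A[i], B[j])) for i, j in enumerate(cols)]
        cost = sum(d for _, _, d in pairs)
        if cost < best_cost:
            best_cost, best = cost, pairs
    return best

def keep_fraction(pairs, n_a, n_b, frac_to_match=1.0):
    """Keep the closest pairs; n is frac_to_match of the smaller set size."""
    n = int(frac_to_match * min(n_a, n_b))
    return sorted(pairs, key=lambda p: p[2])[:n]

A = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
B = [(0.2, 0.0), (1.5, 0.0), (9.0, 0.0)]
matches = optimal_assignment(A, B)
kept = keep_fraction(matches, len(A), len(B), frac_to_match=0.67)
```

Dropping the most distant matches removes pairs such as the third one here, where the assignment is forced rather than a plausible correspondence.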

Transformation Functions

Rigid

class scipr.transform.Rigid

Use a rigid transformation function to align the pairs of cells.

Rigid transformations are constrained to the operations of rotation, reflection, translation, and combinations of these.
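One standard way to fit such a transformation to paired points is the orthogonal Procrustes solution, sketched below with NumPy. Whether the library uses this exact method is not stated here, so treat it as an illustration; fit_rigid is a hypothetical name and rows of A and B are assumed to be already paired.

```python
import numpy as np

def fit_rigid(A, B):
    """Least-squares orthogonal map plus translation taking paired rows of A onto B."""
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    # The SVD of the cross-covariance gives the optimal orthogonal matrix.
    U, _, Vt = np.linalg.svd((A - mu_a).T @ (B - mu_b))
    R = U @ Vt                  # orthogonal; may include a reflection
    t = mu_b - mu_a @ R
    return R, t

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
R_true = np.array([[0.0, 1.0], [-1.0, 0.0]])    # a 90 degree rotation
t_true = np.array([2.0, -1.0])
B = A @ R_true + t_true

R, t = fit_rigid(A, B)
aligned = A @ R + t
```

Because R is orthogonal, distances between cells within the source batch are preserved by the alignment.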

Affine

class scipr.transform.Affine(optim='adam', lr=0.001, epochs=1000)

Use an affine transformation function to align the pairs of cells.

The affine function is of the form f(x) = Wx + b, where W and b are the learnable weights. W has the shape (genes, genes) and b (the bias term) has shape (genes,).

Parameters:
  • optim ({'adam', 'sgd'}) – Which torch optimizer to use.
  • lr (float) – Learning rate to use in gradient descent.
  • epochs (int) – Number of iterations to run gradient descent.
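To make the roles of lr and epochs concrete, the sketch below fits f(x) = Wx + b with plain full-batch gradient descent on mean squared error in NumPy. It is a stand-in for the torch optimizers named above, not the library's training loop; fit_affine, the identity initialization and the learning rate used here are assumptions chosen for this toy data.

```python
import numpy as np

def fit_affine(A, B, lr=0.1, epochs=500):
    """Fit f(x) = W x + b by full-batch gradient descent on mean squared error."""
    n, genes = A.shape
    W, b = np.eye(genes), np.zeros(genes)   # start from the identity map
    for _ in range(epochs):
        resid = A @ W.T + b - B             # (cells, genes) prediction error
        W -= lr * 2 * resid.T @ A / n       # gradient of the MSE with respect to W
        b -= lr * 2 * resid.mean(axis=0)    # gradient of the MSE with respect to b
    return W, b

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))                # 50 paired cells, 3 genes
W_true = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, -0.5], [0.2, 0.0, 1.0]])
b_true = np.array([0.5, -1.0, 2.0])
B = A @ W_true.T + b_true

W, b = fit_affine(A, B)
```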

StackedAutoEncoder

class scipr.transform.StackedAutoEncoder(hidden_sizes=[64], act='leaky_relu', last_layer_act=None, optim='adam', lr=0.001, epochs=1000)

Use multiple autoencoders to align the pairs of cells.

Fit a “stack” of autoencoders, the output of one feeding in as the input into the next one (thereby “composing” them). At each step of SCIPR, the next autoencoder is fitted, and then added onto the overall stack. Since these are autoencoders, the output dimensions of each are the same as the input, so the dimensions are maintained.

Will automatically search for and use a GPU if available.

Parameters:
  • hidden_sizes (list of int) – The sizes (widths) of the hidden layers of each autoencoder. This is one side of the “funnel” of the autoencoder architecture, and the other side is built to be symmetric (same as hidden_sizes but in reverse).
  • act ({'leaky_relu', 'relu', 'sigmoid', 'tanh'}) – Which non-linear activation function to use for the autoencoders.
  • last_layer_act ({None, 'leaky_relu', 'relu', 'sigmoid', 'tanh'}) – Which non-linear activation function to use for the final layer (output) of the autoencoders. None means no non-linearity. See the Warning below.
  • optim ({'adam', 'sgd'}) – Which torch optimizer to use.
  • lr (float) – Learning rate to use in gradient descent.
  • epochs (int) – Number of iterations to run gradient descent.
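The symmetric funnel and the composition of fitted autoencoders can be sketched in plain Python. Both helpers are hypothetical, and the layout assumes the narrowest layer is shared between the two sides rather than repeated; the library's actual layer construction may differ.

```python
def layer_widths(genes, hidden_sizes):
    """Widths of one autoencoder: encoder funnel, then its mirror image."""
    encoder = [genes] + list(hidden_sizes)
    # Assumption: the narrowest layer is shared, so it is not repeated.
    return encoder + encoder[-2::-1]

def apply_stack(stack, x):
    """Feed x through every fitted autoencoder in the order they were added."""
    for autoencoder in stack:
        x = autoencoder(x)
    return x

default_widths = layer_widths(2000, [64])
deeper_widths = layer_widths(2000, [64, 32])
# Two toy callables stand in for fitted autoencoders.
composed = apply_stack([lambda v: v + 1, lambda v: v * 2], 3)
```

Each autoencoder starts and ends at the gene dimension, which is what lets the output of one serve as the input of the next.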

Warning

Be aware that your choice of input normalization to SCIPR might have implications for your choice of the last_layer_act parameter. For example, if your input normalization scales your input features to [0, 1.0], then you may want your final layer to use a sigmoid activation. If your input normalization allows input features to range over (-∞, +∞), then None as the last layer’s activation might be best. This is just something to keep in mind; ultimately, choose whichever gives the best alignment for your data.
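A small numeric check illustrates the mismatch. The values below are made up for illustration: a zero-mean gene, as 'std' normalization would produce, contains values that a sigmoid output layer cannot reach.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A zero-mean gene, as after 'std' normalization: some values are negative.
gene = np.array([-1.2, -0.3, 0.4, 1.1])

# Even with extreme pre-activations, a sigmoid output stays inside (0, 1).
outputs = sigmoid(np.array([-20.0, -1.0, 1.0, 20.0]))

# Which target values fall inside the range the sigmoid produced?
reachable = (gene > outputs.min()) & (gene < outputs.max())
```

Only one of the four values lies inside the sigmoid's range, so an autoencoder with that final activation could not reproduce this gene.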