Base

manify.embedders._base

Base embedder class.

BaseEmbedder(pm, random_state=None, device=None)

Bases: BaseEstimator, TransformerMixin, ABC

Base class for everything in manify.embedders.

This is an abstract class that defines a common interface for all embedding methods. We assume only that a ProductManifold object is given. We try to follow the scikit-learn API's fit/transform paradigm as closely as possible, while accommodating the nuances of product manifold geometry and PyTorch/Geoopt.

Attributes:
  • pm

    ProductManifold object associated with the embedder.

  • random_state

    Random state for reproducibility.

  • device

    Device for tensor computations. If not provided, defaults to pm.device.

  • loss_history_ (dict[str, list[float]]) –

    History of loss values during training.

  • is_fitted_ (bool) –

    Boolean flag indicating if the embedder has been fitted.

Source code in manify/embedders/_base.py
def __init__(self, pm: ProductManifold, random_state: int | None = None, device: str | None = None) -> None:
    self.pm = pm
    self.random_state = random_state
    self.device = device or pm.device
    self.loss_history_: dict[str, list[float]] = {}
    self.is_fitted_: bool = False
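
The device fallback in the constructor can be illustrated in isolation. The sketch below does not import manify: `FakeManifold` and `resolve_device` are hypothetical names introduced here, and only the `device or pm.device` expression is taken from the source above.

```python
from dataclasses import dataclass


@dataclass
class FakeManifold:
    """Hypothetical stand-in for ProductManifold; only `device` is modeled."""

    device: str = "cpu"


def resolve_device(pm, device=None):
    # Same expression as in __init__: any falsy value (None or "")
    # falls back to the manifold's own device.
    return device or pm.device


pm = FakeManifold(device="cpu")
default_device = resolve_device(pm)             # nothing passed, so pm.device is used
explicit_device = resolve_device(pm, "cuda:0")  # an explicit device takes precedence
```

Because the fallback uses `or` rather than an `is None` check, an empty string is treated the same as `None`.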

fit(X=None, D=None, lr=0.01, burn_in_lr=0.001, curvature_lr=0.0, burn_in_iterations=2000, training_iterations=18000, loss_window_size=100, logging_interval=10) abstractmethod

Abstract method to fit an embedder. Requires at least one of (features, distances).

Parameters:
  • X (Float[Tensor, 'n_points n_features'] | None, default: None ) –

    Features to embed. Used by Mixed-curvature VAE and Siamese Network classes.

  • D (Float[Tensor, 'n_points n_points'] | None, default: None ) –

    Distances to embed. Used by coordinate learning and Siamese Network classes.

  • lr (float, default: 0.01 ) –

    Learning rate for the main training phase.

  • burn_in_lr (float, default: 0.001 ) –

    Learning rate for the burn-in phase.

  • curvature_lr (float, default: 0.0 ) –

    Learning rate for optimizing manifold scale factors. Off (no learning) by default.

  • burn_in_iterations (int, default: 2000 ) –

    Number of iterations for the burn-in phase.

  • training_iterations (int, default: 18000 ) –

    Number of iterations for the main training phase.

  • loss_window_size (int, default: 100 ) –

    Window size for computing moving average loss.

  • logging_interval (int, default: 10 ) –

    Interval for logging training progress.

Returns:
  • self( 'BaseEmbedder' ) –

    Fitted embedder instance.
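
The parameters above suggest a two-phase schedule. As a rough sketch, assuming the burn-in phase runs first and that the moving average is taken over a trailing window (neither detail is confirmed by this page), the schedule could look like this. `learning_rate_at` and `moving_average` are hypothetical helpers, not part of manify.

```python
def learning_rate_at(step, lr=1e-2, burn_in_lr=1e-3,
                     burn_in_iterations=2_000, training_iterations=18_000):
    """Hypothetical helper: learning rate at a 0-indexed step.

    Assumes burn-in occupies the first `burn_in_iterations` steps.
    """
    total = burn_in_iterations + training_iterations
    if not 0 <= step < total:
        raise ValueError(f"step must be in [0, {total})")
    return burn_in_lr if step < burn_in_iterations else lr


def moving_average(losses, loss_window_size=100):
    """Hypothetical helper: mean of the last `loss_window_size` losses."""
    if not losses:
        raise ValueError("losses must be non-empty")
    window = losses[-loss_window_size:]  # fewer values early in training
    return sum(window) / len(window)
```

With the defaults, steps 0 to 1999 would use `burn_in_lr` and steps 2000 to 19999 would use `lr`, for 20000 iterations in total.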

Source code in manify/embedders/_base.py
@abstractmethod
def fit(
    self,
    X: Float[torch.Tensor, "n_points n_features"] | None = None,
    D: Float[torch.Tensor, "n_points n_points"] | None = None,
    lr: float = 1e-2,
    burn_in_lr: float = 1e-3,
    curvature_lr: float = 0.0,  # Off by default
    burn_in_iterations: int = 2_000,
    training_iterations: int = 18_000,
    loss_window_size: int = 100,
    logging_interval: int = 10,
) -> "BaseEmbedder":
    """Abstract method to fit an embedder. Requires at least one of (features, distances).

    Args:
        X: Features to embed. Used by Mixed-curvature VAE and Siamese Network classes.
        D: Distances to embed. Used by coordinate learning and Siamese Network classes.
        lr: Learning rate for the main training phase.
        burn_in_lr: Learning rate for the burn-in phase.
        curvature_lr: Learning rate for optimizing manifold scale factors. Off (no learning) by default.
        burn_in_iterations: Number of iterations for the burn-in phase.
        training_iterations: Number of iterations for the main training phase.
        loss_window_size: Window size for computing moving average loss.
        logging_interval: Interval for logging training progress.

    Returns:
        self: Fitted embedder instance.
    """
    pass
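
Since `fit` and `transform` are abstract, the base class cannot be instantiated directly; a subclass has to implement both. The following is a minimal sketch of that contract using only the standard library. `SketchBase` is a stand-in that mirrors the abstract surface (it is not the real `BaseEmbedder`), and `IdentityEmbedder` and the `"total"` loss key are hypothetical.

```python
from abc import ABC, abstractmethod


class SketchBase(ABC):
    """Stand-in mirroring BaseEmbedder's abstract methods (not the real class)."""

    def __init__(self, pm):
        self.pm = pm
        self.loss_history_ = {}
        self.is_fitted_ = False

    @abstractmethod
    def fit(self, X=None, D=None, **kwargs):
        ...

    @abstractmethod
    def transform(self, X):
        ...


class IdentityEmbedder(SketchBase):
    """Hypothetical subclass that 'embeds' features by returning them unchanged."""

    def fit(self, X=None, D=None, **kwargs):
        if X is None and D is None:
            raise ValueError("Provide at least one of X (features) or D (distances).")
        self.loss_history_["total"] = []  # a real embedder would append losses here
        self.is_fitted_ = True
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, X):
        if not self.is_fitted_:
            raise RuntimeError("Call fit before transform.")
        return X


try:
    SketchBase(pm=None)  # abstract methods are unimplemented
    base_instantiable = True
except TypeError:
    base_instantiable = False

embedder = IdentityEmbedder(pm=None).fit(X=[[0.0, 1.0]])
```

Returning `self` from `fit` follows the scikit-learn convention that the class description refers to.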

transform(X) abstractmethod

Apply embedding to new data. Not defined for coordinate learning.

Parameters:
  • X (Float[Tensor, 'n_points n_features'] | None) –

    New features to embed using the trained embedder.

Returns:
  • X_embedded( Float[Tensor, 'n_points embedding_dim'] ) –

    Embedded representation of the input features.

Source code in manify/embedders/_base.py
@abstractmethod
def transform(
    self, X: Float[torch.Tensor, "n_points n_features"] | None
) -> Float[torch.Tensor, "n_points embedding_dim"]:
    """Apply embedding to new data. Not defined for coordinate learning.

    Args:
        X: New features to embed using the trained embedder.

    Returns:
        X_embedded: Embedded representation of the input features.
    """
    pass

fit_transform(X=None, D=None, **fit_kwargs)

Fit the embedder and transform the data in one step.

Parameters:
  • X (Float[Tensor, 'n_points n_features'] | None, default: None ) –

    Features to embed.

  • D (Float[Tensor, 'n_points n_points'] | None, default: None ) –

    Distances to embed.

  • **fit_kwargs (Any, default: {} ) –

    Additional keyword arguments for fitting.

Returns:
  • X_embedded( Float[Tensor, 'n_points embedding_dim'] ) –

    Embedded representation of the input features.

Source code in manify/embedders/_base.py
def fit_transform(
    self,
    X: Float[torch.Tensor, "n_points n_features"] | None = None,
    D: Float[torch.Tensor, "n_points n_points"] | None = None,
    **fit_kwargs: Any,
) -> Float[torch.Tensor, "n_points embedding_dim"]:
    """Fit the embedder and transform the data in one step.

    Args:
        X: Features to embed.
        D: Distances to embed.
        **fit_kwargs: Additional keyword arguments for fitting.

    Returns:
        X_embedded: Embedded representation of the input features.
    """
    return self.fit(X=X, D=D, **fit_kwargs).transform(X=X)
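
To see what the one-line body implies, the sketch below records the arguments each method receives. `RecordingEmbedder` is a hypothetical stand-in that copies only the `fit_transform` body from the source above; its `fit` and `transform` are placeholders.

```python
class RecordingEmbedder:
    """Hypothetical stand-in that logs the arguments each method receives."""

    def __init__(self):
        self.calls = []

    def fit(self, X=None, D=None, **fit_kwargs):
        self.calls.append(("fit", X, D, fit_kwargs))
        return self

    def transform(self, X):
        self.calls.append(("transform", X))
        return X  # placeholder: a real embedder would return coordinates

    def fit_transform(self, X=None, D=None, **fit_kwargs):
        # Same body as the source above.
        return self.fit(X=X, D=D, **fit_kwargs).transform(X=X)


emb = RecordingEmbedder()
distances = [[0.0, 1.0], [1.0, 0.0]]
result = emb.fit_transform(D=distances, lr=0.05)
```

In this run `transform` is called with `X=None`, because only `D` was supplied. Distance-only subclasses would presumably need to handle that case themselves, given that `transform` is documented as not defined for coordinate learning.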