Pipelines

manify.curvature_estimation._pipelines

distortion_pipeline(pm, dists, embedder_init_kwargs=None, embedder_fit_kwargs=None)

Embeds the given distances in a product manifold and returns the resulting distortion loss, for use as a scoring step in greedy signature selection.

Parameters:
  • pm (ProductManifold) –

    Product manifold to use for the pipeline.

  • dists (Float[Tensor, 'n_nodes n_nodes']) –

    Pairwise distances to approximate.

  • embedder_init_kwargs (dict[str, Any] | None, default: None ) –

    Additional keyword arguments for initializing the embedder model.

  • embedder_fit_kwargs (dict[str, Any] | None, default: None ) –

    Additional keyword arguments for fitting the embedder model.

Returns:
  • float

    The distortion loss of the learned embedding, measured against the input distances rescaled by their maximum.
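The kind of loss involved can be worked through by hand on a toy example. The sketch below is plain Python with no manify dependency: it rescales the target distances by their maximum, as the source listing does, and then scores two sets of estimated distances. The formula used here, the mean of |(d_est / d_true)^2 - 1| over distinct pairs, is one common definition of distortion and is an assumption; manify's `distortion_loss` may use a different form or normalization. The helper names `rescale` and `mean_distortion` are hypothetical.

```python
from itertools import combinations


def rescale(dists):
    """Divide every entry by the largest entry, so the maximum distance is 1."""
    d_max = max(max(row) for row in dists)
    return [[d / d_max for d in row] for row in dists]


def mean_distortion(est, true):
    """Mean of |(est/true)^2 - 1| over distinct pairs i < j (assumed form)."""
    pairs = list(combinations(range(len(true)), 2))
    total = sum(abs((est[i][j] / true[i][j]) ** 2 - 1.0) for i, j in pairs)
    return total / len(pairs)


# Toy target distances for three nodes.
target = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
target_rescaled = rescale(target)

# A perfect embedding reproduces the rescaled distances exactly.
perfect = [row[:] for row in target_rescaled]

# A worse embedding doubles the distance between nodes 0 and 1.
worse = [row[:] for row in target_rescaled]
worse[0][1] = worse[1][0] = 2 * target_rescaled[0][1]

print(mean_distortion(perfect, target_rescaled))
print(mean_distortion(worse, target_rescaled))
```

Under this assumed definition a perfect embedding scores zero and any mismatch raises the score, which is what makes the value usable as a quantity to minimize when comparing candidate signatures.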

Source code in manify/curvature_estimation/_pipelines.py
def distortion_pipeline(
    pm: ProductManifold,
    dists: Float[torch.Tensor, "n_nodes n_nodes"],
    embedder_init_kwargs: dict[str, Any] | None = None,
    embedder_fit_kwargs: dict[str, Any] | None = None,
) -> float:
    """Builds a distortion‐based pipeline function for greedy signature selection.

    Args:
        pm: Product manifold to use for the pipeline.
        dists: Pairwise distances to approximate.
        embedder_init_kwargs: Additional keyword arguments for initializing the embedder model.
        embedder_fit_kwargs: Additional keyword arguments for fitting the embedder model.

    Returns:
        A function f(signature) → loss, where signature is a list
        of (curvature, dim) tuples.
    """
    embedder_init_kwargs = embedder_init_kwargs or {}
    embedder_fit_kwargs = embedder_fit_kwargs or {}

    dists = dists.to(pm.device)
    dists_rescaled = dists / dists.max()

    # Initialize embedder model
    model = CoordinateLearning(pm=pm, device=pm.device, **embedder_init_kwargs)

    # Fit the model
    model.fit(X=None, D=dists_rescaled, **embedder_fit_kwargs)

    # Loss is the distortion loss of the new embeddings
    embeddings = model.embeddings_
    new_dists = pm.pdist(X=embeddings)
    return float(distortion_loss(new_dists, dists_rescaled).item())

predictor_pipeline(pm, dists, labels, classifier=ProductSpaceDT, task='classification', embedder_init_kwargs=None, embedder_fit_kwargs=None, model_init_kwargs=None, model_fit_kwargs=None)

Embeds the given distances in a product manifold, trains a predictor on the embeddings, and returns its test loss, for use as a scoring step in greedy signature selection.

Parameters:
  • pm (ProductManifold) –

    Product manifold to use for the pipeline.

  • dists (Float[Tensor, 'n_nodes n_nodes']) –

    Pairwise distances to approximate.

  • labels (Float[Tensor, 'n_nodes']) –

    Labels for the nodes, used for training the classifier.

  • classifier (type[BasePredictor], default: ProductSpaceDT ) –

    Classifier to use for evaluating the signature.

  • task (Literal['classification', 'regression'], default: 'classification' ) –

    Task type, either "classification" or "regression".

  • embedder_init_kwargs (dict[str, Any] | None, default: None ) –

    Additional keyword arguments for initializing the coordinate learning model.

  • embedder_fit_kwargs (dict[str, Any] | None, default: None ) –

    Additional keyword arguments for fitting the coordinate learning model.

  • model_init_kwargs (dict[str, Any] | None, default: None ) –

    Additional keyword arguments for initializing the classifier.

  • model_fit_kwargs (dict[str, Any] | None, default: None ) –

    Additional keyword arguments for fitting the classifier.

Returns:
  • float

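    The predictor's score on a held-out test split of the embeddings, signed so that lower is better: the negated score for classification, and the score as returned (treated as an error to minimize) for regression.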
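The source listing negates the score for classification and leaves it unchanged for regression, so that a smaller return value always means a better signature. The sketch below illustrates that convention in isolation. It assumes, as the comment in the source suggests, that the score is an accuracy for classification (higher is better) and an error for regression (lower is better); the helper name `to_loss` and the candidate names are hypothetical and not part of manify.

```python
def to_loss(score: float, task: str) -> float:
    """Map a predictor score to a quantity where smaller is always better."""
    if task not in ("classification", "regression"):
        raise ValueError(f"unknown task: {task!r}")
    # Accuracy is maximized, so negate it; an error is already minimized.
    return -score if task == "classification" else score


# Two candidate signatures scored on a classification task (made-up numbers).
candidates = {"signature_a": 0.80, "signature_b": 0.92}
losses = {name: to_loss(acc, "classification") for name, acc in candidates.items()}

# Picking the minimum loss selects the most accurate candidate.
best = min(losses, key=losses.get)
print(best)
```

With a single "lower is better" convention, a selection loop can compare candidates with `min` regardless of the task type.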

Source code in manify/curvature_estimation/_pipelines.py
def predictor_pipeline(
    pm: ProductManifold,
    dists: Float[torch.Tensor, "n_nodes n_nodes"],
    labels: Float[torch.Tensor, "n_nodes"],
    classifier: type[BasePredictor] = ProductSpaceDT,
    task: Literal["classification", "regression"] = "classification",
    embedder_init_kwargs: dict[str, Any] | None = None,
    embedder_fit_kwargs: dict[str, Any] | None = None,
    model_init_kwargs: dict[str, Any] | None = None,
    model_fit_kwargs: dict[str, Any] | None = None,
) -> float:
    """Builds a classifier‐based pipeline function for greedy signature selection.

    Args:
        pm: Product manifold to use for the pipeline.
        dists: Pairwise distances to approximate.
        labels: Labels for the nodes, used for training the classifier.
        classifier: Classifier to use for evaluating the signature.
        task: Task type, either "classification" or "regression".
        embedder_init_kwargs: Additional keyword arguments for initializing the coordinate learning model.
        embedder_fit_kwargs: Additional keyword arguments for fitting the coordinate learning model.
        model_init_kwargs: Additional keyword arguments for initializing the classifier.
        model_fit_kwargs: Additional keyword arguments for fitting the classifier.

    Returns:
        The loss of the classifier on the test set after embedding the distances using the product manifold.
    """
    embedder_init_kwargs = embedder_init_kwargs or {}
    embedder_fit_kwargs = embedder_fit_kwargs or {}
    model_init_kwargs = model_init_kwargs or {}
    model_fit_kwargs = model_fit_kwargs or {}

    dists = dists.to(pm.device)
    dists_rescaled = dists / dists.max()

    # Embedding steps
    embedder = CoordinateLearning(pm=pm, device=pm.device, **embedder_init_kwargs)
    embedder.fit(X=None, D=dists_rescaled, **embedder_fit_kwargs)
    X = embedder.embeddings_

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, labels)

    # Train classifier
    model_init_kwargs["task"] = task
    model = classifier(pm=pm, **model_init_kwargs)
    model.fit(X=X_train, y=y_train, **model_fit_kwargs)
    loss = model.score(X=X_test, y=y_test)

    # For classification, we want to maximize accuracy; for regression, we minimize MSE.
    return -loss if task == "classification" else loss
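One detail of the listing matters when the same keyword dictionary is reused across calls: `model_init_kwargs or {}` evaluates to the caller's own dictionary whenever that dictionary is non-empty, so the later assignment `model_init_kwargs["task"] = task` writes into it. The sketch below reproduces the pattern in plain Python. The functions `with_task_in_place` and `with_task_copied` are hypothetical stand-ins, the `max_depth` key is only an illustrative entry, and the copying variant is a suggestion rather than what the library does.

```python
def with_task_in_place(model_init_kwargs=None, task="classification"):
    """Same defaulting pattern as the listing: may mutate the caller's dict."""
    model_init_kwargs = model_init_kwargs or {}
    model_init_kwargs["task"] = task
    return model_init_kwargs


def with_task_copied(model_init_kwargs=None, task="classification"):
    """Alternative: build a new dict so the caller's argument is untouched."""
    return {**(model_init_kwargs or {}), "task": task}


shared = {"max_depth": 3}
with_task_in_place(shared, task="regression")
print(shared)  # the caller's dict now also contains the "task" key

fresh = {"max_depth": 3}
merged = with_task_copied(fresh, task="regression")
print(fresh)   # unchanged
print(merged)
```

Callers who pass the same dictionary to several pipeline runs may prefer to pass a copy, for example `dict(my_kwargs)`, to avoid carrying the `task` entry from one run into the next.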