pyrichlet.mixture_models.BetaBernoulliMixture

class pyrichlet.mixture_models.BetaBernoulliMixture(*, p=1, alpha=1, rng=None, **kwargs)
fit_gibbs(y, init_groups=None, warm_start=False, show_progress=None, init_method='kmeans')

Fit posterior distribution using Gibbs sampling.

This method runs self.total_iter steps of the Gibbs sampler and stores the sampled variables so that the expected a posteriori (EAP) density or clustering can be computed later.

Parameters:
y: {array-like} of shape (n_samples, n_features)

The input sample.

init_groups: int, default=None

Maximum number of groups to assign in the initialization. If None, the initial number of groups is drawn from the weighting structure model’s attribute n. This parameter is only used in k-means initialization.

warm_start: bool, default=False

Whether to continue the sampling process from a past run or start over. If False, the sampling will start from the prior and saved states will be deleted.

show_progress: bool, default=None

If show_progress is True, a progress bar from the tqdm library is displayed.

init_method: str, default="kmeans"

"random": does a random initialization based on the prior models.
"kmeans": does a k-means initialization.
"variational": fits the variational distribution and uses the MAP parameters as initialization.
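As an illustration of what a single Gibbs update can involve, the sketch below resamples the Bernoulli success probabilities of each group from their conjugate Beta posterior, given the current assignments. It is not pyrichlet's implementation: the arrays, the hyperparameters a and b, and the fixed number of groups are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary data and current group assignments.
y = np.array([[1, 0], [1, 1], [0, 0], [1, 0]])
groups = np.array([0, 0, 1, 0])
a, b = 1.0, 1.0  # assumed Beta prior hyperparameters (illustrative only)
n_groups = 2

# One conjugate update: for each group and feature, draw the success
# probability from Beta(a + successes, b + failures).
atoms = np.empty((n_groups, y.shape[1]))
for k in range(n_groups):
    members = y[groups == k]
    successes = members.sum(axis=0)
    failures = len(members) - successes
    atoms[k] = rng.beta(a + successes, b + failures)
```

A full sampler would alternate updates like this one with updates of the weights and of the assignments, saving the state after each step.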

fit_variational(y, n_groups=None, warm_start=False, show_progress=None, tol=1e-08, init_method='kmeans')

Fit posterior variational distribution using mean field theory.

This method runs up to self.total_iter steps of the gradient descent algorithm to fit the variational distributions of the weights, atoms and assignments of the mixture.

Parameters:
y: {array-like} of shape (n_samples, n_features)

The input sample.

n_groups: int, default=None

The number of groups of the truncated variational distribution. If None, the number of groups will be deduced from the weighting structure if possible.

warm_start: bool, default=False

Whether to continue the fitting process from a past run or start over. If False, the fit will start from the prior parameters and any previous calculations will be discarded.

show_progress: bool, default=None

If show_progress is True, a progress bar from the tqdm library is displayed.

tol: float, default=1e-8

The tolerance of change in the evidence lower bound (ELBO) between iterations. The process finishes when the change is less than tol.

init_method: str, default="kmeans"

"kmeans": initialize variational parameters using the k-means algorithm.
"random": initialize variational parameters using a random assignment.
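The interaction between the iteration cap and tol can be sketched as a generic loop. The helper run_until_converged and the toy update below are hypothetical stand-ins, not part of pyrichlet; they only show the stopping rule described for tol.

```python
def run_until_converged(step, total_iter, tol=1e-8):
    """Call step() up to total_iter times; stop once the ELBO change is below tol.

    `step` is a hypothetical callable that performs one update and returns
    the current ELBO value.
    """
    previous = None
    for iteration in range(1, total_iter + 1):
        elbo = step()
        if previous is not None and abs(elbo - previous) < tol:
            return iteration, elbo
        previous = elbo
    return total_iter, previous


# Toy stand-in for an update: the "ELBO" approaches 0 geometrically.
state = {"gap": 1.0}


def toy_step():
    state["gap"] *= 0.5
    return -state["gap"]


n_steps, final_elbo = run_until_converged(toy_step, total_iter=1000, tol=1e-8)
```

In this toy run the loop stops well before the cap of 1000 iterations, because the change between consecutive values drops below tol.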

gibbs_eap_affinity_matrix(y=None)

Returns the (Gibbs fitted) affinity matrix for the observations y

This method must be called after fitting a dataset with fit_gibbs. It returns an affinity matrix for y. The entry (i,j) of the returned matrix denotes the proportion of draws where the observation i shared the same group as the observation j.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The data points for which to get an affinity matrix. If None, the data used for fitting is used.
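The definition of entry (i, j) as a proportion of draws can be made concrete with a small NumPy sketch. The draws array is hypothetical (one row per saved Gibbs step, one column per observation); how pyrichlet stores its saved states is not shown here.

```python
import numpy as np

# Hypothetical saved assignments: rows are saved Gibbs steps,
# columns are observations, values are group labels.
draws = np.array([
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 2, 1, 1],
    [0, 1, 1, 1],
])

# same_group[t, i, j] is True when observations i and j share a group at step t.
same_group = draws[:, :, None] == draws[:, None, :]

# Averaging over steps gives the proportion of draws with a shared group.
affinity = same_group.mean(axis=0)
```

The resulting matrix is symmetric with ones on the diagonal, since every observation always shares a group with itself.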

gibbs_eap_density(y=None, dim=None, component=None, periods=None)

Returns the (Gibbs fitted) expected a posteriori density at y

This method must be called after fitting a dataset with fit_gibbs. It returns the density at y as defined by the average of the mixture at every saved Gibbs step.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The data points over which to evaluate the EAP density. If None, the data used for fitting is used.

dim: int or array-like, default=None

The desired dimension index for which to marginalize the density. If None, all dimensions are used.

component: int, default=None

If given, only the scaled density of that component is returned.

periods: int, default=None

The number of saved periods to use counting backwards from the last Gibbs step. If None, all saved periods are used.
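A minimal sketch of the averaging described above, assuming each saved step can be summarized by a pair (weights, probs) of a mixture of independent Bernoulli features. The function names and the saved_steps layout are assumptions for illustration, not pyrichlet's internals.

```python
import numpy as np


def bernoulli_mixture_density(y, weights, probs):
    """Density of binary rows y under a mixture of independent Bernoullis.

    weights has shape (n_components,); probs has shape (n_components, n_features).
    """
    y = np.asarray(y)[:, None, :]                        # (n, 1, d)
    per_feature = np.where(y == 1, probs, 1.0 - probs)   # (n, k, d)
    return (per_feature.prod(axis=2) * weights).sum(axis=1)


def eap_density(y, saved_steps, periods=None):
    """Average the mixture density over the last `periods` saved steps.

    `periods` is assumed to be a positive int or None (use every step).
    """
    steps = saved_steps if periods is None else saved_steps[-periods:]
    return np.mean([bernoulli_mixture_density(y, w, p) for w, p in steps], axis=0)


# Two hypothetical saved steps.
saved_steps = [
    (np.array([0.5, 0.5]), np.array([[0.9, 0.9], [0.1, 0.1]])),
    (np.array([0.6, 0.4]), np.array([[0.8, 0.8], [0.2, 0.2]])),
]
y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
density = eap_density(y, saved_steps)
```

Because y lists every possible binary outcome for two features, the averaged densities sum to one.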

gibbs_eap_spectral_consensus_cluster(y=None, n_clusters=1)

Returns the (Gibbs fitted) expected a posteriori cluster for y

This method must be called after fitting a dataset with fit_gibbs. It returns the EAP consensus clustering for the observations y. It uses the spectral clustering algorithm over the EAP affinity matrix as the consensus algorithm.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The data points to cluster. If None, the data used for fitting is used.

n_clusters: int, default=1

The number of clusters to output.
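To show how a clustering can be read off an affinity matrix, here is a NumPy-only sketch of the two-cluster case, which splits observations by the sign of the second eigenvector of the normalized Laplacian. The function spectral_bipartition and the example matrix are hypothetical, and this simplified variant is not necessarily what pyrichlet runs internally; for more than two clusters, a general routine such as scikit-learn's SpectralClustering with affinity="precomputed" is the usual choice.

```python
import numpy as np


def spectral_bipartition(affinity):
    """Split observations into two clusters from a symmetric affinity matrix."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)


# Hypothetical affinity matrix with two visible blocks.
affinity = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])
labels = spectral_bipartition(affinity)
```

The label values themselves are arbitrary (the eigenvector sign is not fixed); only which observations share a label is meaningful.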

gibbs_map_cluster(y=None, full=False)

Returns the (Gibbs fitted) maximum a posteriori cluster for y

This method must be called after fitting a dataset with fit_gibbs. It returns the clustering for y using the mixture from the saved Gibbs step with the greatest likelihood.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The data points to cluster. If None, the data used for fitting is used.

full: bool, default=False

If False, only a vector with the clustering output is returned. If True, a tuple with the clusters and the assignment uncertainties is returned.
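The two return shapes controlled by full can be sketched for a single fixed mixture. The function map_cluster below is hypothetical, and the uncertainty is taken, as an assumption, to be one minus the posterior probability of the winning group; the library may define it differently.

```python
import numpy as np


def map_cluster(y, weights, probs, full=False):
    """Assign each binary row of y to the group with the largest weighted density."""
    y = np.asarray(y)[:, None, :]
    per_feature = np.where(y == 1, probs, 1.0 - probs)
    joint = per_feature.prod(axis=2) * weights   # (n, k): weight times component density
    labels = joint.argmax(axis=1)
    if not full:
        return labels
    posterior = joint / joint.sum(axis=1, keepdims=True)
    # Assumed uncertainty measure: probability mass outside the chosen group.
    uncertainty = 1.0 - posterior.max(axis=1)
    return labels, uncertainty


weights = np.array([0.5, 0.5])
probs = np.array([[0.9, 0.8], [0.1, 0.3]])
y = np.array([[1, 1], [0, 0], [1, 0]])

labels_only = map_cluster(y, weights, probs)
labels, uncertainty = map_cluster(y, weights, probs, full=True)
```

The third observation is the most ambiguous of the three, which shows up as the largest uncertainty value.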

gibbs_map_density(y=None, dim=None, component=None)

Returns the (Gibbs fitted) maximum a posteriori density at y

This method must be called after fitting a dataset with fit_gibbs. It returns the density at y as defined by the sampled mixture from the saved Gibbs step with the highest likelihood.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The data points over which to evaluate the MAP density. If None, the data used for fitting is used.

dim: int or array-like, default=None

The desired dimension index for which to marginalize the density. If None, all dimensions are used.

component: int, default=None

If given, only the scaled density of that component is returned.
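Selecting the saved step with the highest likelihood can be sketched as follows. The score used here is the data log-likelihood under each saved mixture, which is an assumption: the library may score steps differently (for example, by including prior terms). The saved_steps layout and the function name are hypothetical.

```python
import numpy as np


def log_likelihood(y, weights, probs):
    """Log-likelihood of binary rows y under a mixture of independent Bernoullis."""
    y = np.asarray(y)[:, None, :]
    per_feature = np.where(y == 1, probs, 1.0 - probs)
    density = (per_feature.prod(axis=2) * weights).sum(axis=1)
    return np.log(density).sum()


# Hypothetical fitting data and two saved steps.
y_fit = np.array([[1, 1], [1, 1], [0, 0], [0, 0]])
saved_steps = [
    (np.array([0.5, 0.5]), np.array([[0.6, 0.6], [0.4, 0.4]])),
    (np.array([0.5, 0.5]), np.array([[0.9, 0.9], [0.1, 0.1]])),
]

scores = [log_likelihood(y_fit, w, p) for w, p in saved_steps]
best_index = int(np.argmax(scores))
map_weights, map_probs = saved_steps[best_index]
```

The second step describes the two well-separated groups in y_fit better, so it receives the higher score and its mixture is the one used for the MAP density.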

var_eap_affinity_matrix(y=None)

Returns the (Variational fitted) affinity matrix for the observations y

This method must be called after fitting a dataset with fit_variational. It returns an affinity matrix for y. The entry (it, it2) of the returned matrix denotes the variational probability that y[it] and y[it2] are assigned to the same group.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The data points for which to get an affinity matrix. If None, the data used for fitting is used.
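Under a mean-field approximation with independent assignments, the probability that two distinct observations share a group is the inner product of their assignment probabilities. The sketch below uses a hypothetical responsibilities array and is an illustration of that identity, not pyrichlet's code; in particular, how the library treats the diagonal is an assumption.

```python
import numpy as np

# Hypothetical variational assignment probabilities: row i holds q(z_i = k).
responsibilities = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
])

# For i != j: P(z_i = z_j) = sum_k q(z_i = k) * q(z_j = k).
affinity = responsibilities @ responsibilities.T

# An observation always shares a group with itself (assumed convention).
np.fill_diagonal(affinity, 1.0)
```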

var_eap_density(y=None, dim=None, component=None)

Returns the expected a posteriori density at y using variational inference

This method must be called after fitting a dataset with fit_variational. It returns the density at y as described by the fitted variational distributions using the expected density at each point.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The points at which to evaluate the variational EAP density. If None, the data used for fitting is used.

dim: int or array-like, default=None

The desired dimension index for which to marginalize the density. If None, all dimensions are used.

component: int, default=None

If given, only the scaled density of that component is returned.
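The dim and component parameters, which the density methods share, can be illustrated on a single fixed mixture of independent Bernoulli features. The sketch assumes that dim selects the features kept in the marginal and that the "scaled" density is the component density multiplied by its weight; both readings, and the function mixture_density itself, are assumptions for illustration.

```python
import numpy as np


def mixture_density(y, weights, probs, dim=None, component=None):
    """Density of binary rows y, optionally marginal in `dim` or for one component."""
    y = np.asarray(y)
    probs = np.asarray(probs)
    if dim is not None:
        # With independent features, the marginal simply keeps the chosen columns.
        dims = np.atleast_1d(dim)
        y, probs = y[:, dims], probs[:, dims]
    per_feature = np.where(y[:, None, :] == 1, probs, 1.0 - probs)
    scaled = per_feature.prod(axis=2) * weights   # (n, k): weight times component density
    if component is not None:
        return scaled[:, component]
    return scaled.sum(axis=1)


weights = np.array([0.7, 0.3])
probs = np.array([[0.9, 0.2], [0.1, 0.6]])
y = np.array([[1, 0]])

full_density = mixture_density(y, weights, probs)
marginal_first = mixture_density(y, weights, probs, dim=0)
first_component = mixture_density(y, weights, probs, component=0)
```

Summing the scaled densities of all components recovers the full mixture density at each point.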

var_eap_spectral_consensus_cluster(y=None, n_clusters=1)

Returns the (Variational fitted) expected a posteriori cluster for y

This method must be called after fitting a dataset with fit_variational. It returns the EAP consensus clustering for the observations y. It uses the spectral clustering algorithm over the EAP affinity matrix as the consensus algorithm.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The data points to cluster. If None, the data used for fitting is used.

n_clusters: int, default=1

The number of clusters to output.

var_map_cluster(y=None, full=False)

Returns the maximum a posteriori clustering for y using variational inference

This method must be called after fitting a dataset with fit_variational. It returns a clustering for y using the fitted variational distributions and the assignments with the greatest likelihood.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The points to cluster using the MAP assignments. If None, the data used for fitting is used.

full: bool, default=False

If False (default), only the maximum a posteriori clustering is returned. If True, the variational assignment probability is also returned.

var_map_density(y=None, dim=None, component=None)

Returns the maximum a posteriori density at y using variational inference

This method must be called after fitting a dataset with fit_variational. It returns the density at y as described by the fitted variational distributions using the maximum likelihood density at each point.

Parameters:
y: {array-like} of shape (n_samples, n_features), default=None

The points at which to evaluate the variational MAP density. If None, the data used for fitting is used.