Usage
Note
The weighting structure models that this package implements are listed under Models.
Hint
Code blocks that use the rng parameter do so only for reproducibility.
pyrichlet has two families of classes: weighting models and mixture models. A weighting model object can be imported and initialized as:
>>> from pyrichlet import weight_models
>>> wm = weight_models.DirichletProcess(rng=0)
The object wm can then be used to draw a vector of 10 weights from its prior distribution:
>>> wm.random(10)
array([7.02467947e-01, 2.12012016e-01, 8.52339410e-02, 2.75312557e-04,
8.69186887e-06, 8.68128787e-07, 2.27208506e-07, 6.14190832e-07,
6.03943713e-08, 2.02785792e-07])
We can get a new realization of the vector by running wm.random again, or get more weights for the same realization with:
>>> wm.complete(12)
array([7.02467947e-01, 2.12012016e-01, 8.52339410e-02, 2.75312557e-04,
8.69186887e-06, 8.68128787e-07, 2.27208506e-07, 6.14190832e-07,
6.03943713e-08, 2.02785792e-07, 7.66088536e-08, 2.75049683e-08])
or draw 100 independent random assignments using the current truncated structure:
>>> wm.random_assignment(100)
array([0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 2, 0, 0, 0, 2, 0, 2, 0, 0, 0, 2, 2, 0, 1, 0, 0, 1, 0, 1, 1, 2,
0, 1, 2, 2, 0, 1, 2, 2, 0, 2, 1, 1, 0, 0, 1, 2, 0, 0, 0, 2, 0, 1,
0, 0, 1, 0, 1, 0, 2, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 2, 0, 0, 0, 1, 0])
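The three operations above (draw a truncated weight vector, extend the same realization, sample assignments) can be illustrated with a minimal standard-library sketch of the usual stick-breaking construction of Dirichlet process weights. This is not pyrichlet's implementation: the function name `stick_breaking`, the concentration parameter `alpha`, and the choice to renormalize the truncated weights when sampling labels are all assumptions made for illustration.

```python
import random


def stick_breaking(n, alpha=1.0, rng=None, sticks=None):
    """Return the first n stick-breaking weights and the stick fractions used.

    Passing back the returned `sticks` extends the same realization
    instead of drawing a new one (hypothetical helper, not pyrichlet API).
    """
    rng = random.Random() if rng is None else rng
    sticks = [] if sticks is None else list(sticks)
    # Draw only the fractions that are still missing.
    while len(sticks) < n:
        sticks.append(rng.betavariate(1.0, alpha))
    weights, remaining = [], 1.0
    for v in sticks[:n]:
        weights.append(remaining * v)  # break off a fraction v of what is left
        remaining *= 1.0 - v
    return weights, sticks


rng = random.Random(0)
w10, sticks = stick_breaking(10, rng=rng)                 # new realization
w12, sticks = stick_breaking(12, rng=rng, sticks=sticks)  # same realization, longer
# Sample 100 labels from the truncated vector; random.choices renormalizes
# the weights, which is an assumption of this sketch.
labels = rng.choices(range(len(w12)), weights=w12, k=100)
```

The first 10 entries of `w12` equal `w10` because the stored stick fractions are reused, which mirrors the relationship between the outputs of wm.random(10) and wm.complete(12) shown above.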
These methods can also be applied to the posterior distribution after fitting a dataset of assignments. Once fitted, the minimum vector size can be inferred from the data:
>>> wm.fit([0, 1, 2, 3, 3, 3, 5])
>>> wm.random()
array([0.22802716, 0.14438267, 0.2624806 , 0.3200459 , 0.0360923 ,
0.00697771])
>>> wm.random_assignment(20)
array([1, 3, 3, 2, 1, 5, 1, 0, 3, 3, 3, 4, 3, 3, 3, 1, 0, 3, 3, 0])
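To make the posterior step concrete, here is a hedged sketch of the textbook conjugate update for stick-breaking weights given observed labels: each stick fraction is drawn from a Beta distribution whose parameters add the label counts to the prior. The helper name `posterior_stick_breaking` and the parameter `alpha` are hypothetical, and pyrichlet may organize this computation differently.

```python
import random
from collections import Counter


def posterior_stick_breaking(assignments, alpha=1.0, rng=None):
    """Draw one weight vector from the stick-breaking posterior.

    Uses v_k ~ Beta(1 + n_k, alpha + number of labels greater than k),
    the standard conjugate update (a sketch, not pyrichlet's code).
    """
    rng = random.Random() if rng is None else rng
    counts = Counter(assignments)
    size = max(assignments) + 1   # minimum vector size implied by the data
    n_after = len(assignments)    # becomes the count of labels > k below
    weights, remaining = [], 1.0
    for k in range(size):
        n_k = counts.get(k, 0)
        n_after -= n_k
        v = rng.betavariate(1.0 + n_k, alpha + n_after)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


weights = posterior_stick_breaking([0, 1, 2, 3, 3, 3, 5], rng=random.Random(0))
```

With the labels used above, the largest label is 5, so the sketch returns 6 weights, matching the length of the vector returned by wm.random() after fitting.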
The fitted data can be replaced by calling the fit method again, or cleared by resetting the weighting structure:
>>> wm.reset()
For each weighting structure there is an associated Gaussian mixture model.
>>> from pyrichlet import mixture_models
>>> mm = mixture_models.DirichletProcessMixture(rng=0)
The mixture models can fit data represented as an array or as a dataframe:
>>> mm.fit_gibbs([1, 2, 3, 4], init_groups=2)
We can then get the EAP density at a single point:
>>> mm.gibbs_eap_density(2.5)
array([0.25341657])
or at several points:
>>> mm.gibbs_eap_density([1.5, 2.5, 3.5])
array([0.18017642, 0.25341657, 0.19041053])
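Reading EAP as "expected a posteriori", the density estimate can be thought of as the average of the mixture density over posterior draws. The sketch below shows that averaging for a univariate Gaussian mixture using only the standard library. The function names and the three `draws` are hypothetical toy values chosen for illustration; they are not output from pyrichlet, and the library's internal representation of its Gibbs samples may differ.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, means, sds):
    """Density of a Gaussian mixture at x for one posterior draw."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def eap_density(points, draws):
    """Average the mixture density over all draws at each point."""
    return [
        sum(mixture_density(x, *draw) for draw in draws) / len(draws)
        for x in points
    ]


# Hypothetical posterior draws: (weights, means, standard deviations).
draws = [
    ([0.6, 0.4], [1.5, 3.5], [0.8, 0.8]),
    ([0.5, 0.5], [2.0, 3.0], [1.0, 1.0]),
    ([1.0], [2.5], [1.2]),
]
density = eap_density([1.5, 2.5, 3.5], draws)
```

Because each draw may have a different number of components, averaging densities point by point avoids having to match components across draws.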
Mixture models can also be fitted using variational inference, after which the EAP density can be computed:
>>> mm.fit_variational([1, 2, 3, 4], n_groups=2)
>>> mm.var_eap_density([1.5, 2.5, 3.5])
array([0.20426694, 0.32282514, 0.20426694])
Mixture models can also be used for clustering:
>>> mm.var_map_cluster()
array([0, 0, 1, 1])
>>> mm.gibbs_map_cluster()
array([0, 0, 0, 0])
>>> mm.gibbs_eap_spectral_consensus_cluster()
array([0, 0, 0, 0], dtype=int32)
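Consensus clustering methods commonly start from a co-clustering (posterior similarity) matrix: the fraction of posterior draws in which two observations share a label. The sketch below computes that matrix from a list of label vectors. It is an illustration of the general idea only: the function name and the `label_draws` values are hypothetical, and this text does not confirm how gibbs_eap_spectral_consensus_cluster is implemented.

```python
def coclustering_matrix(label_draws):
    """Fraction of draws in which each pair of observations shares a label."""
    n = len(label_draws[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in label_draws:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    total = len(label_draws)
    return [[value / total for value in row] for row in matrix]


# Hypothetical label draws for four observations.
label_draws = [
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],  # same partition as the first draw with labels swapped
    [0, 0, 0, 1],
]
similarity = coclustering_matrix(label_draws)
```

The matrix depends only on which observations are grouped together, so it is unaffected by label switching between draws, as the first and third draws illustrate.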
Depending on the dataset, fitting can take a noticeable amount of time. To show the progress of the fitting method, set the show_progress parameter:
>>> mm.fit_gibbs([1, 2, 3, 4], init_groups=2, show_progress=True)