(by Antoine Liutkus)
In a nutshell, the beta_ntf module
The beta_ntf Python module provides a simple-to-use, lightweight, yet powerful implementation of the nonnegative PARAFAC model, also known as Nonnegative Tensor Factorization, Nonnegative Matrix Factorization (NMF) or Nonnegative Canonical Decomposition.
As an example, let X be an ndarray of shape (1000, 400, 5) composed of nonnegative entries:
>>> X.shape
(1000, 400, 5)
The objective of beta_ntf in that case is to approximate X using 3 matrices of respective shapes (1000, K), (400, K) and (5, K), where K is called the number of components. Let us call these matrices A, B and C. We want to approximate X as:

X[i, j, l] ≈ sum_k A[i, k] * B[j, k] * C[l, k]
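In NumPy terms, this approximation can be computed as follows, using hypothetical random factors A, B and C (a sketch of the model itself, not of beta_ntf's internals):

```python
import numpy as np

# Hypothetical random nonnegative factors with K = 10 components
# (illustration only; in practice A, B and C are estimated by beta_ntf)
K = 10
A = np.abs(np.random.randn(1000, K))
B = np.abs(np.random.randn(400, K))
C = np.abs(np.random.randn(5, K))

# PARAFAC model: P[i, j, l] = sum_k A[i, k] * B[j, k] * C[l, k]
P = np.einsum('ik,jk,lk->ijl', A, B, C)
print(P.shape)  # (1000, 400, 5)
```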
Such problems occur in many fields, such as source separation, data mining, image processing, etc. To perform this with, say, K=10 while minimizing the Kullback-Leibler divergence, simply use:
>>> import beta_ntf
>>> ntf = beta_ntf.BetaNTF(X.shape, n_components=10, beta=1, n_iter=100, verbose=True)
>>> ntf.fit(X)
Now, the ntf object contains an approximation of X using 10 components. You can access the estimated factors through its factors_ attribute.
Given a list factors of matrices, i.e. ndarrays with the same number of columns, the beta_ntf module can efficiently compute the corresponding PARAFAC model. For example, assume we have:

>>> A.shape
(100, 5)
>>> B.shape
(50, 5)
>>> C.shape
(10, 5)
>>> D.shape
(5, 5)
>>> factors = [A, B, C, D]
To compute the (100, 50, 10, 5) tensor P given by

P[i, j, k, l] = sum_r A[i, r] * B[j, r] * C[k, r] * D[l, r],

simply use:

>>> P = beta_ntf.parafac(factors)
As a small extra, the beta_ntf module provides the nnrandn function, which generates nonnegative random ndarrays of arbitrary shape. Nothing complex really, but useful:
>>> E = beta_ntf.nnrandn((10, 40, 10))
>>> E.shape
(10, 40, 10)
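For illustration, such a helper can be sketched with plain NumPy (the actual nnrandn may use a different distribution):

```python
import numpy as np

def nnrandn(shape):
    """Generate a nonnegative random ndarray of the given shape.

    Sketch: absolute values of standard normal draws.
    """
    return np.abs(np.random.randn(*shape))

E = nnrandn((10, 40, 10))
print(E.shape)  # (10, 40, 10)
```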
Interestingly, the beta_ntf module supports fitting data of arbitrary shape (up to 25 dimensions). In the example above, X could just as well have been a (100, 40, 5, 10, 8) ndarray, with exactly the same syntax for using beta_ntf. This feature is noticeable, since it is not so common (as of this writing in December 2012) to find lightweight implementations of nonnegative PARAFAC models for higher-order tensors, fitting in a single source file of less than 250 lines.
Generating synthetic data
>>> data_shape = (15, 10, 5, 40)
>>> true_ncomponents = 50
>>> true_factors = [beta_ntf.nnrandn((shape, true_ncomponents)) for shape in data_shape]
>>> X = beta_ntf.parafac(true_factors)
>>> X.shape
(15, 10, 5, 40)
Creating the BetaNTF object
>>> ntf = beta_ntf.BetaNTF(X.shape, n_components=10, beta=1, n_iter=100, verbose=False)
Fitting the data
>>> ntf.fit(X)
Fitting NTF model with 100 iterations....
Done.
<beta_ntf.BetaNTF instance at 0x7f664c01e0e0>
Now, construct the approximation
>>> model = beta_ntf.parafac(ntf.factors_)
>>> model.shape
(15, 10, 5, 40)
Print the shape of the factors
>>> for f in ntf.factors_:
...     print(f.shape)
(15, 10)
(10, 10)
(5, 10)
(40, 10)
Extract the first 3 components
>>> firstComponents = ntf[..., 0:3]
>>> firstComponents.shape
(15, 10, 5, 40, 3)
While performing the approximation, beta_ntf minimizes a cost function between the original data X and its approximation P. The cost function D to be minimized is any β-divergence, summed over all entries of X. We thus have:

D(X | P) = sum over all entries i of d_β(X[i] | P[i])

where the β-divergence d_β to use for each entry is defined as:

d_β(x | y) = x/y - log(x/y) - 1                                 if β = 0
d_β(x | y) = x * log(x/y) - x + y                               if β = 1
d_β(x | y) = (x^β + (β-1) y^β - β x y^(β-1)) / (β(β-1))         otherwise
Particular cases of interest are:

- β = 0: the Itakura-Saito divergence
- β = 1: the Kullback-Leibler divergence
- β = 2: the squared Euclidean distance (up to a factor 1/2)
Simply change the beta property of a BetaNTF object to change the divergence it will minimize during fit.
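As an illustration of these definitions (not the module's internal code), the summed β-divergence can be written as:

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of entrywise beta-divergences d_beta(x | y) over all entries."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 0:
        # Itakura-Saito divergence
        return np.sum(x / y - np.log(x / y) - 1)
    if beta == 1:
        # Kullback-Leibler divergence
        return np.sum(x * np.log(x / y) - x + y)
    # General case; beta = 2 gives half the squared Euclidean distance
    return np.sum((x ** beta + (beta - 1) * y ** beta
                   - beta * x * y ** (beta - 1)) / (beta * (beta - 1)))

print(beta_divergence([3.0], [1.0], 2))  # 2.0, i.e. (3 - 1)**2 / 2
```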
It may be required to compute the score corresponding to a given decomposition, e.g. for model selection. If you want to know how far your NTF model is from data X, simply use:
>>> ntf.score(X)
304.01426732381253
The returned quantity is the overall selected β-divergence between the model and the given ndarray, which must be of shape ntf.data_shape. Since this score is a divergence, the smaller it is, the better the approximation.
Some datapoints may be more important than others. Similarly, some datapoints may not be available, leading to missing data. This is common in data inpainting, or in source separation. In that case, it is desirable to weight the cost function accordingly.
More precisely, let X be the nonnegative ndarray to approximate and let P be the corresponding NTF model. Until now, I have only described estimation of P that minimizes the simple sum D of individual entrywise β-divergences:

D(X | P) = sum over all entries i of d_β(X[i] | P[i])

where d_β has been defined above. Instead of giving the same weight to all these individual divergences, we may weight the entries so as to consider instead:

D_W(X | P) = sum over all entries i of W[i] * d_β(X[i] | P[i])
where W gives the weights to assign to individual divergences. If a datapoint X[i] is less important than others, its corresponding weight W[i] may be smaller, or even 0 if X[i] is actually not known. This feature is implemented in beta_ntf. If required, you can simply provide these weights as the W parameter to the fit function.
As a first example, suppose we have two nonnegative ndarrays X and W such that:

>>> X.shape
(400, 50, 30)
>>> W.shape
(400, 50, 30)
You can use W to weight the divergence simply through:

>>> ntf.fit(X, W=W)
where the BetaNTF object ntf has been defined as above.
Interestingly, W and X do not have to be of the same shape. It is only required that W*X be defined, i.e. that they can be broadcast together and multiplied. We could thus have, for instance:

>>> X.shape
(400, 50, 30)
>>> W.shape
(1, 1, 30)

and use the fit function the same way. This can be practical, e.g. for giving different weights to whole modalities of the experiment.
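The weighted cost and the broadcasting requirement can be illustrated with plain NumPy (a sketch of the cost for beta = 1, not of beta_ntf's internals):

```python
import numpy as np

X = np.abs(np.random.randn(400, 50, 30)) + 1e-9   # nonnegative data
P = np.abs(np.random.randn(400, 50, 30)) + 1e-9   # some model of it

# One weight per entry of the last modality: W broadcasts against X
W = np.random.rand(1, 1, 30)

# Weighted KL divergence (beta = 1), summed over all entries
D = np.sum(W * (X * np.log(X / P) - X + P))
print(D >= 0)  # True: each entrywise KL term is nonnegative
```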
The algorithm implemented by beta_ntf features standard multiplicative updates minimizing the β-divergence, which were recently shown to guarantee a decrease of the cost function at each step. For a recent reference on the topic, check for example:

C. Févotte and J. Idier, "Algorithms for nonnegative matrix factorization with the β-divergence", Neural Computation, vol. 23, no. 9, 2011.
Algorithms for the weighted NTF are derived in exactly the same way. As far as I know, they were first derived by Virtanen in his 2007 paper.
In any case, the update equations I derived in the general case of arbitrary data dimension are really nothing original with respect to those works and the references therein. The main challenge was actually in terms of implementation. Still, I will write a small technical report about this and upload it here when available.
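To give the flavor of such updates, here is a minimal two-way (plain NMF) sketch of the multiplicative rules for beta = 1; beta_ntf generalizes this scheme to arbitrary-order tensors:

```python
import numpy as np

rng = np.random.RandomState(0)
V = np.abs(rng.randn(30, 20)) + 1e-9   # nonnegative data to factorize
A = np.abs(rng.randn(30, 5))           # left factor, 5 components
B = np.abs(rng.randn(5, 20))           # right factor

def kl_cost(V, P):
    # Kullback-Leibler divergence between data and model, summed
    return np.sum(V * np.log(V / P) - V + P)

costs = []
for _ in range(50):
    P = A.dot(B)
    A *= (V / P).dot(B.T) / B.sum(axis=1)          # multiplicative update of A
    P = A.dot(B)
    B *= A.T.dot(V / P) / A.sum(axis=0)[:, None]   # multiplicative update of B
    costs.append(kl_cost(V, A.dot(B)))

# The cost is non-increasing along iterations, as the theory guarantees
print(all(c2 <= c1 + 1e-8 for c1, c2 in zip(costs, costs[1:])))  # True
```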
beta_ntf makes use of the powerful voodoo magic of numpy.einsum to perform its computations, which explains both its computational efficiency and its conciseness. Hence, it requires numpy >= 1.6, in which einsum first appeared.
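For instance, the core PARAFAC computation for an arbitrary list of factors can be expressed in a few lines with einsum; this is a sketch of the idea, not beta_ntf's exact code:

```python
import numpy as np
from string import ascii_lowercase

def parafac(factors):
    """PARAFAC model of a list of (I_n, K) factors, via a single einsum."""
    # One subscript letter per mode, 'z' denoting the shared component axis;
    # up to 25 modes ('a'..'y') can be handled this way
    letters = ascii_lowercase[:len(factors)]
    subscripts = ','.join(l + 'z' for l in letters) + '->' + letters
    return np.einsum(subscripts, *factors)

factors = [np.abs(np.random.randn(n, 5)) for n in (100, 50, 10, 5)]
P = parafac(factors)
print(P.shape)  # (100, 50, 10, 5)
```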
Extensions of this small module to very large datasets would be nice. For this small project, I have focused on a lightweight implementation that is easy to start with. It should run quite fast provided the data fits in memory.