My favorites | Sign in
Project Home Downloads Wiki Issues Source
Project Information

Super β-NTF 2000

(by Antoine Liutkus)

In a nutshell, the beta_ntf module

  • supports weighted parafac factorization of nonnegative numpy ndarray of arbitrary shape minimizing any β-divergence
  • fits in one small single .py file
  • requires numpy v1.6


the beta_ntf Python module provides a very simple to use, light and yet powerful implementation of the nonnegative PARAFAC model, also known as Nonnegative Tensor Factorization, Nonnegative Matrix Factorization (NMF) or Nonnegative Canonical Decomposition.

As an example, let X be a ndarray of shape (1000,400,5) composed of nonnegative entries:

(1000, 400, 5) 

The objective of beta_ntf in that case is to approximate X using 3 matrices of respective shapes (1000,K) (400,K) and (5,K) where K is called the number of components. Let us call these matrices A, B and C. We want to approximate X as:

Such problems occur in many fields such as source separation, data-mining, image processing, etc. To perform this with say K=10 while minimizing the Kullback-Leibler divergence, simply use:

>>>import beta_ntf
>>>ntf = beta_ntf.BetaNTF(X.shape, n_components=10, beta=1, n_iter=100,verbose=True) 

Now, the ntf object contains an approximation of X using 10 components. You can

  • Directly access the factors as ntf.factors_ which is a list composed of 3 matrices here.
  • Get the approximation using beta_ntf.parafac(ntf.factors_)
  • Get any of the constitutive components, e.g. using ntf[...,3] for the 4th component
  • Compute the total β-divergence of your approximation using ntf.score(X)

Detailed Features

1) Simply compute the parafac model

Given a list factors of matrices which are ndarrays with the same number of columns, the beta_ntf module permits to efficiently compute the corresponding PARAFAC model. For example, assume >>>factors=[A,B,C,D] where


To compute the (100,50,10,5) tensor P given by

simply use


2) Easily generate random nonnegative data

As a small extra, the beta_ntf module provides the nnrandn, which permits to generate nonnegative random ndarrays of arbitrary shape. Nothing complex really, but useful :

>>>E = beta_ntf.nnrandn((10,40,10))

3) Arbitrary shapes are supported

Interestingly, the beta_ntf module supports data to fit of arbitrary shape (up to 25 dimensions). In the example above, X could well have been a (100,40,5,10,8) ndarray, with exactly the same syntax for using beta_ntf. This feature is noticeable, since it is not so common (as of this writing in December 2012) to find light implementations of nonnegative PARAFAC models for higher order tensors, fitting in only one source file of less than 250 lines.

>>>import beta_ntf 

Generating synthetic data

>>>true_factors = [beta_ntf.nnrandn((shape, n_components)) for shape in data_shape]

Creating the BetaNTF object

>>>ntf = beta_ntf.BetaNTF(X.shape, n_components=10, beta=1, n_iter=100,verbose=False)

Fitting the data

Fitting NTF model with 100 iterations....
<beta_ntf.BetaNTF instance at 0x7f664c01e0e0>

Now, construct the approximation

>>>model = beta_ntf.parafac(ntf.factors_)
(15, 10, 5, 40)

Print the shape of the factors

>>>for f in ntf.factors_:print f.shape 
(15, 10)
(10, 10)
(5, 10)
(40, 10)

Extract the 3 first components

(15, 10, 5, 40, 3)

4) All β-divergences are supported

While performing the approximation, beta_ntf minimizes a cost function between original data X and its approximation P. The cost function D to be minimized for this optimization is any β-divergence, summed over all entries of X. We thus have :

where the β-divergence dβ to use for each entry is defined as :

Particular cases of interest are:

  • β = 2 : Euclidean distance
  • β = 1 : Kullback-Leibler divergence
  • β = 0 : Itakura-Saito divergence.

Simply change the beta property of a BetaNTF object to change the divergence it will minimize during fit.

5) Easily compute the score of your decomposition

It may be required to compute the score corresponding to a given decomposition, e.g. for model selection. If you want to know how far your NTF model is from data X, simply use :

>>> ntf.score(X)

The quantity returned gives the overall selected β-divergence between the model and the given ndarray, which ought to be of the same shape as ntf.data_shape. Since this score is a divergence, the smaller, the better the approximation.

6) Weighting the cost function

Some datapoints may be more important than others. Similarly, some datapoints may not be available, leading to missing data. This is common in data inpainting, or in source separation. In that case, it is desirable to weight the cost function accordingly.

More precisely, let X be the nonnegative ndarray to approximate and let P be the corresponding NTF model. Until now, I have only discribed estimation of P that minimizes the simple sum D of individual entrywise β-divergence :

where dβ has been defined above. Instead of giving the same weight to all these individual divergences, we may instead weight these entries so as to consider instead :

where W gives the weights to assign to individual divergences. If a datapoint X[i] is less important than others, its corresponding weight W[i] may be smaller, or even 0 if X[i] is actually not known. This feature is implemented in beta_ntf. If required, you can simply provide these weights as the W parameter to the fit function.

As a first exemple, suppose we have two nonnegative X and W such that


You can use W to weight the divergence simply through:


where the BetaNTF object ntf has been defined as above.

Interestingly, W and X do not have to be of the same shape. It is only required that W*X be defined, and thus that they can be broadcasted and multiplied. We could thus have instead :


and use the fit function the same way. This can be practical for e.g. giving different weights to whole modalities of the experiment.

About the algorithm

The algorithm implemented by beta_ntf features standard multiplicative updates minimizing β-divergence, which were recently shown to guarantee a decrease of the cost-function at each step. For recent references on the topic, check for example:

  • A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari, Nonnegative matrix and tensor factorizations : Applications to exploratory multi-way data analysis and blind source separation, Wiley Publishing, September 2009.
  • C. Févotte and J. Idier, Algorithms for nonnegative matrix factorization with the beta-divergence, Neural Computation 23 (2011), no. 9, 2421–2456.

Algorithms for the weighted NTF are derived exactly the same way. As far as I known they were first derived by Virtanen in his 2007 paper :

  • Virtanen, Tuomas, Monaural Sound Source Separation by Perceptually Weighted Non-Negative Matrix Factorization, Technical report, Tampere University of Technology, Institute of Signal Processing, 2007.

In any case, the update equations I derived in the general case of arbitrary dimension of the data is really nothing original with respect to those works and references therein. The main challenge was actually in terms of implementation. Still, I will write a small technical report about this and upload it here when available.

beta_ntf makes use of the powerful voodoo magic of numpy.einsum to perform its computations, thus explaining both its computational efficiency and its conciseness. Hence, it requires numpy 1.6, in which einsum first appears.

Extensions of this small module for very large datasets would be nice. For this small project, I have focused on a light implementation that is easy to start with. It should run quite fast provided the data used fits in memory.

Powered by Google Project Hosting