
This project creates various types of statistics and graphs from Subversion repository log data.

Oscar Castaneda (SVNPlot Committers) talks about his Summer of Code project on the Google blog (Life after Google Summer of Code).

IMPORTANT: Users of versions 0.6.x and 0.7.0, please upgrade to 0.7.4 and RECREATE the database.

News/Updates

NEW version 0.7.6 available (14 Nov 2011)

  • Fixes a bug where, in rare cases, svnlog2sqlite.py failed with an 'Inconsistent line ending style' error.
  • Fixes a bug where svnplot crashed on exit with a 'super object has no attribute del' error message.

Version 0.7.5 available (28 May 2011)

  • Fixes critical bugs about wrong line count, 'unknown node kind' messages, unicode errors ('expected character buffer' message).
  • There is a separate svnstatscsv.py which exports basic repository data in CSV (Comma Separated Values) format.
  • Please recreate the sqlite database after upgrade.

Version 0.7.4 available (5 Feb 2011)

  • Fixes critical bugs about wrong line count, 'unknown node kind' messages, unicode errors ('expected character buffer' message).
  • Please recreate the sqlite database after upgrade.
  • LocChurn graph added for the matplotlib-based svnplot (svnplot.py)
  • Command line parameters for specifying the username/password for repository authentication.
  • Some basic support for exporting the stats in CSV format (svnstatscsv.py)
  • GSoC 2010/2009 changes merged into trunk. (Thanks Oscar)
  • Bug fixes for correct display of JavaScript charts in IE 7 and IE 8.
  • Improvements in the computation of author activity index.
  • Many small bug fixes.

Version 0.5.14 available (4 Feb 2010) - Binary files are now detected using a list of commonly used binary file extensions. Diff calculation is improved for large repositories that can be accessed as 'file://' repositories.

DO NOT USE 0.5.13. Version 0.5.13 has a bug in the line count computations. If you are using 0.5.13, please discard the repository stats database and regenerate it.


Steps to generate these statistics:

  1. Subversion log information is first converted into a sqlite database.
  2. Various stats are then generated using SQL queries.
  3. These stats are converted into graphs using the matplotlib package.
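
The three steps above can be sketched as follows. This is only an illustration: the `revisions` table, its columns, the sample log entries and the function names are all hypothetical and are not SVNPlot's actual schema or code. The plotting function assumes matplotlib is installed and is not called here.

```python
import sqlite3

# Step 1: store log data in sqlite.
# Hypothetical schema for illustration only; SVNPlot's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revisions (revno INTEGER PRIMARY KEY, author TEXT, commitdate TEXT)"
)

# Made-up log entries; a real run would read these from the repository log.
log_entries = [
    (1, "alice", "2011-01-03 09:15:00"),
    (2, "bob", "2011-01-03 14:40:00"),
    (3, "alice", "2011-01-04 10:05:00"),
    (4, "alice", "2011-01-05 16:30:00"),
]
conn.executemany("INSERT INTO revisions VALUES (?, ?, ?)", log_entries)


# Step 2: generate stats with SQL queries.
def general_stats(conn):
    """Return (revision count, author count, head revision number)."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT author), MAX(revno) FROM revisions"
    ).fetchone()


def top_authors(conn, limit=10):
    """Return (author, commit count) pairs, most active author first."""
    return conn.execute(
        "SELECT author, COUNT(*) AS commits FROM revisions "
        "GROUP BY author ORDER BY commits DESC, author LIMIT ?",
        (limit,),
    ).fetchall()


# Step 3: turn a stat into a graph (requires matplotlib; not called in this sketch).
def plot_top_authors(rows, path):
    """Save a horizontal bar chart of commit counts per author to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    names = [name for name, _ in rows]
    counts = [count for _, count in rows]
    fig, ax = plt.subplots()
    ax.barh(names, counts)
    ax.set_xlabel("Commits")
    fig.savefig(path)
    plt.close(fig)


stats = general_stats(conn)
authors = top_authors(conn)
```

Keeping the queries separate from the plotting means the same sqlite database can feed several graphs without re-reading the repository log.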

The various graphs generated are inspired by the graphs generated using StatSVN/StatCVS.

Currently the following statistics and graphs are generated:

  • General Statistics
    1. Revision count
    2. Author count
    3. File Count
    4. Head revision number
  • Top 10 Hot List
    1. Top 10 Active Authors
    2. Top 10 Active Files
  • LoC graphs
    1. Total LoC line graph (LoC vs date)
    2. Average file size vs date line graph
    3. Contributed lines of code line graph (LoC vs date), using a different coloured line for each developer
    4. LoC and Churn graph (LoC vs date, churn vs date) - Churn is the number of lines touched (i.e. lines added + lines deleted + lines modified)
  • File Count graphs
    1. file count vs dates line graph
    2. file type vs number of files horizontal bar chart
  • Directory size graphs
    1. directory size vs date line graph. Using different coloured lines for each directory
    2. directory size pie chart (latest status)
    3. Directory file count pie chart (latest status)

  • Commit Activity Graphs
    1. Commit Activity Index
    2. Activity by hour of day bar graph (commits vs hour of day)
    3. Activity by day of week bar graph (commits vs day of week)
    4. Author Commit trend history (histogram of time between consecutive commits by same author)
    5. Author Activity horizontal bar graph (author vs adding+committing percentage)
    6. Commit activity for each developer - scatter plot (hour of day vs date)
    7. NEW Daily Commit count
  • Others
    1. Tag cloud of words from revision log messages.
    2. Tag cloud of author names.
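
As an illustration of two of the statistics listed above, the sketch below computes churn (lines added + lines deleted + lines modified) and the commit counts by hour of day and day of week. The sample records and function names are hypothetical. The running LoC total assumes that added lines increase it, deleted lines decrease it and modified lines leave it unchanged, which may not match how SVNPlot counts lines.

```python
from collections import Counter
from datetime import datetime

# Hypothetical per-revision records:
# (author, commit time, lines added, lines deleted, lines modified)
commits = [
    ("alice", "2011-01-03 09:15:00", 120, 0, 0),
    ("bob", "2011-01-03 14:40:00", 10, 5, 3),
    ("alice", "2011-01-04 09:50:00", 0, 20, 7),
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def churn(added, deleted, modified):
    """Churn is the number of lines touched."""
    return added + deleted + modified


def loc_and_churn(commits):
    """Return (date, running LoC, churn) for each commit, in commit order."""
    loc = 0
    rows = []
    for _author, stamp, added, deleted, modified in commits:
        # Assumption: modified lines do not change the total line count.
        loc += added - deleted
        rows.append((stamp[:10], loc, churn(added, deleted, modified)))
    return rows


def activity_by_hour(commits):
    """Count commits per hour of day (0-23)."""
    return Counter(datetime.strptime(c[1], TIME_FORMAT).hour for c in commits)


def activity_by_weekday(commits):
    """Count commits per day of week (0 = Monday, 6 = Sunday)."""
    return Counter(datetime.strptime(c[1], TIME_FORMAT).weekday() for c in commits)


rows = loc_and_churn(commits)
by_hour = activity_by_hour(commits)
by_weekday = activity_by_weekday(commits)
```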

These scripts depend on the following Python packages:

  1. pysvn - Python interface to Subversion
  2. sqlite3 - Included by default in the Python distribution
  3. matplotlib - Python graph library
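
One way to check whether these packages are importable before running the scripts is sketched below. The helper name `missing_packages` is hypothetical and is not part of SVNPlot.

```python
import importlib.util

# Top-level module names of the packages listed above.
REQUIRED = ["pysvn", "sqlite3", "matplotlib"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages: " + ", ".join(missing))
```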

Currently I am experimenting with applying social network analysis to repositories. Check the preliminary results at 'Social Network Analysis of Rietveld Subversion Repository' and 'Treemap of Commit count vs centrality for Rietveld repository'.

I am a novice at Python, sqlite and matplotlib, so any suggestions for improvements are welcome.
