Pick is a Python toolset for extracting and analyzing data from open source mailing lists and source code repositories.
It currently parses the following information and stores it in a relational database:
- email messages, with parsed header information and directed conversation graphs
- threads of conversation in mailing lists
- email addresses and repository usernames, linked to the real people who use them
- successful and unsuccessful patch submissions to mailing lists
- repository commits and comments
- version controlled files
- lines changed in each commit
- method interfaces, interface changes, start and stop line of methods (limited language support)
- methods changed in each commit (limited language support)
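To illustrate the kind of data behind the conversation graphs and threads listed above, here is a minimal sketch that derives both from raw message headers using only Python's standard `email` package. It is not Pick's own implementation: the function name `build_reply_graph`, the edge direction (replier to original author), and the use of `In-Reply-To` alone (ignoring `References`) are assumptions made for illustration.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr


def build_reply_graph(raw_messages):
    """Hypothetical helper: build a directed reply graph and threads.

    Assumes every message has a Message-ID header.
    Returns (edges, threads):
      edges:   {(replier, original_author): number_of_replies}
      threads: {root_message_id: [message_ids in that thread]}
    """
    author_of = {}  # Message-ID -> lowercased sender address
    parent_of = {}  # Message-ID -> In-Reply-To value, or None
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"].strip()
        author_of[msg_id] = parseaddr(msg["From"])[1].lower()
        parent = msg["In-Reply-To"]
        parent_of[msg_id] = parent.strip() if parent else None

    # Directed edge from the replier to the author being replied to,
    # counted only when the parent message is present in the archive.
    edges = defaultdict(int)
    for msg_id, parent in parent_of.items():
        if parent in author_of:
            edges[(author_of[msg_id], author_of[parent])] += 1

    def root(msg_id):
        # Walk up In-Reply-To links until a message with no known parent;
        # the `seen` set guards against malformed reply cycles.
        seen = set()
        while parent_of.get(msg_id) in author_of and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent_of[msg_id]
        return msg_id

    threads = defaultdict(list)
    for msg_id in author_of:
        threads[root(msg_id)].append(msg_id)
    return dict(edges), dict(threads)


# Small made-up archive: Alice posts, Bob replies, Alice answers Bob.
raw = [
    "From: Alice <alice@example.org>\nMessage-ID: <1@x>\nSubject: patch\n\nbody\n",
    "From: Bob <bob@example.org>\nMessage-ID: <2@x>\nIn-Reply-To: <1@x>\n\nlooks good\n",
    "From: Alice <alice@example.org>\nMessage-ID: <3@x>\nIn-Reply-To: <2@x>\n\nthanks\n",
]
edges, threads = build_reply_graph(raw)
# edges   -> {('bob@example.org', 'alice@example.org'): 1,
#             ('alice@example.org', 'bob@example.org'): 1}
# threads -> {'<1@x>': ['<1@x>', '<2@x>', '<3@x>']}
```

Real archives are messier (missing or duplicated Message-IDs, replies that only carry a `References` header, one person posting from several addresses), which is why linking addresses to people is listed as a separate step.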
See the wiki for more information on each module.
Pick uses a custom framework for state management and logging. It is designed to be extended and easily modified, which makes Pick well suited to research.
Many academic papers have been based on the data produced by Pick. Some examples are:
- C. Bird, A. Gourley, P. Devanbu, A. Swaminathan, and M. Gertz. Mining Email Social Networks. ICSE 2006 Workshop on Mining Software Repositories (MSR 2006), 2006.
- C. Bird, A. Gourley, and P. Devanbu. Detecting Patch Submission and Acceptance in OSS Projects. ICSE 2007 Workshop on Mining Software Repositories (MSR 2007), 2007.
- M. Ogawa, K.-L. Ma, C. Bird, P. Devanbu, and A. Gourley. Visualizing Social Interaction in Open Source Software Projects. Asia-Pacific Symposium on Visualisation, 2007.
Thanks go to Professor Devanbu at UC Davis for paying me to write the first version of Pick as an undergraduate and then letting me liberate it. Thanks also go to Christian Bird for letting me include his clustering module.