| Title | pygr: examples, documentation, and datasets |
|---|---|
| Student | Rachel McCreary |
| Mentor | Charles Titus Brown |
| Abstract | |
|
Computational biology is a rapidly growing field that relies heavily on comparing large data sets efficiently. Genomic information stored in an organism's DNA can be compared with that of similar organisms to ascertain the function and organization of the genome. As the quantity of data increases and the need for effective methods of storage and retrieval grows, pygr stands out as an effective solution.
pygr establishes a consistent framework for genomic data sets, enabling users to work easily with multiple genomes and annotation sets. Furthermore, pygr provides a namespace system (pygr.Data) that gives multiple machines consistent access to the same data sets. Among other benefits, this enables MapReduce-style data processing and linkages between data sets.
However, while pygr is a valuable resource for bioinformatics researchers, it remains underutilized. One of the major goals of the project is to update, refactor, and extend the documentation, with the intent of attracting new users to pygr and ensuring ease of use. To do so, examples must be developed that introduce new users to pygr; these could be created by enhancing existing tutorials, developing pygr cookbook snippets into tutorials, and writing new tutorials for areas not yet thoroughly covered in the pygr documentation, such as those listed in http://bio.scipy.org/wiki/index.php/Most_wanted. The examples could include work with microbial genomes as well as much larger genomes. Another example could document connecting to an SQL database such as MySQL, which is a common requirement for pygr users. By writing the examples as doctests, the documentation and tests could be presented together transparently, making it easier for new users to adopt pygr. The examples and tutorials would be posted to http://bio.scipy.org, where prospective and current users would have easy access to all of the pertinent information, as well as illustrations of pygr’s potential as a tool in bioinformatics research. |
|
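To illustrate the doctest approach the abstract proposes, here is a minimal sketch in which a function's documentation doubles as its test. It uses only Python's standard `doctest` module and does not call pygr; `reverse_complement` and `run_examples` are hypothetical names invented for this illustration.

```python
import doctest


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence.

    Hypothetical helper for illustration only; it is not part of pygr.

    >>> reverse_complement("ATGC")
    'GCAT'
    >>> reverse_complement("aacg")
    'cgtt'
    """
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C",
             "a": "t", "t": "a", "c": "g", "g": "c"}
    # Walk the sequence backwards, swapping each base for its partner.
    return "".join(pairs[base] for base in reversed(seq))


def run_examples():
    """Run the docstring examples above; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a test from the docstring, exposing the function by name
    # so the examples can call it.
    test = parser.get_doctest(
        reverse_complement.__doc__,
        {"reverse_complement": reverse_complement},
        "reverse_complement",
        None,
        0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_examples())  # prints (0, 2)
```

Saved as a standalone file, the same examples can also be checked with `python -m doctest -v <file>`, so a tutorial written this way stays readable as documentation while remaining verifiable.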