| Title | Blue Sky Project - Bridging the Gaps Between Statistics, Biology and E-commerce |
|---|---|
| Student | Renee McElhaney |
| Mentor | Warren A Kibbe |
| Abstract | |
|
The original goals were to improve support for the Agilent microarray platform in Bioconductor, to streamline the microarray data analysis pipeline from statistical testing to experimental validation, and to demonstrate the utility of pathway-focused annotation in the MAQC-II study; these may be implemented at a later date. Due to a change in priorities, I am now creating an embedded Entrez Gene database. We believe this change in focus will demonstrate a different methodology for scientific discovery.
The Entrez Gene database holds a wealth of genetic information that is accessible via the internet. With the interactive search mechanisms in place at NCBI, you can learn such things as a gene's proposed function, where it is located, and its relationship to other genes. For researchers, however, mining this information for specific goals can be a cumbersome task. Moreover, there is a need to integrate genomics information with the statistical applications written in R/Bioconductor. Downloading a copy of the Entrez Gene information to your own computer makes it easy to access and enables data mining. SQLite is a free database engine written in C that can be downloaded from the web. With this tool, you can access the Entrez Gene data that we selected as most useful. We have shown that this database, once constructed, can be easily embedded into applications written in R, Java and Perl. R is a free software environment for statistical computing in which Bioconductor packages run, and Bioconductor is an open-source project for the analysis of genomic data; accessing the database from R lets researchers get the information they need without switching programs. BioJava is an open-source bioinformatics project that focuses on using Java for the analysis of biological data. Java is widely used by bioinformatics researchers; with an SQLite Java wrapper/JDBC driver, the data can be extracted without switching to a C environment, although a C environment is still required during setup. BioPerl is a project that creates reusable Perl programs for bioinformatics, and Perl is likewise a good language for parsing specific information out of a database. Allowing the Entrez Gene database to be accessed from any of these languages makes the data usable for data mining by a wider audience. Our goal is to make examination of genomic data faster for the end user and available at no cost, and to make the database highly portable. |
|
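As a rough illustration of the embedded-database idea in the abstract, the sketch below builds a tiny SQLite file-style database and queries it locally, with no network access. It is written in Python only because its standard library bundles SQLite; the project itself targets R, Java and Perl. The table name `gene`, its columns, and the helper functions `build_demo_db` and `genes_on_chromosome` are hypothetical, since the abstract does not describe the actual schema, and the three rows are a small illustrative sample rather than the selected Entrez Gene data.

```python
import sqlite3

# Hypothetical schema: the abstract does not specify the real tables or columns.
SCHEMA = """
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,  -- Entrez Gene identifier
    symbol      TEXT NOT NULL,        -- official gene symbol
    description TEXT,                 -- short gene name
    chromosome  TEXT                  -- chromosome the gene maps to
);
"""

# A few well-known human genes, used here only as sample rows.
SAMPLE_ROWS = [
    (7157, "TP53", "tumor protein p53", "17"),
    (672, "BRCA1", "BRCA1 DNA repair associated", "17"),
    (1956, "EGFR", "epidermal growth factor receptor", "7"),
]


def build_demo_db(path=":memory:"):
    """Create the demo database (in memory by default) and load the sample rows."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    conn.commit()
    return conn


def genes_on_chromosome(conn, chromosome):
    """Return the symbols of all genes on the given chromosome, sorted by symbol."""
    cur = conn.execute(
        "SELECT symbol FROM gene WHERE chromosome = ? ORDER BY symbol",
        (chromosome,),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = build_demo_db()
    print(genes_on_chromosome(conn, "17"))
    conn.close()
```

Passing a file path instead of `":memory:"` would write the database to a single file on disk, which is the property that makes an SQLite database easy to copy between machines and open from other language bindings.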