|
GSoC2011
Google Summer of Code 2011
Featured IntroductionClimate Code Foundation's list of projects and ideas for Google Summer of Code 2011 (GSoC). GSoC is a programme funded by Google where students work on open source projects and earn a stipend for doing so. Please discuss these projects and make suggestions on the mailing list. Note: these ideas are just starting points. Some of them, as they stand, are far too small to form a suitable project. In contrast, some of them are far too large and wide-ranging to be completed in a single GSoC project. None of them are anything like detailed enough to make a project proposal. If you are a potential GSoC student, we invite you to devise a proposal based on one or more of these, or on anything else which addresses our Foundational goal (promoting the public understanding of climate science). A cut-and-paste from this ideas page is not a proposal. Ideasccc-gistemp ideasFlexible GridsReplace the 8000 cell equal area grid, which is unique to the GISTEMP analysis, with a flexible gridding system so that "industry standard" grids can be used. These seem mostly to be simple NxN or NxM degree grids. The user should be able to specify a grid on the command line. All the code for a given grid should be in a separate module, so third parties could develop new grids. There is some overlap with GIS Integration because GIS tools may expect a regular grid. This will require Python expertise and a willingness to get very intimate with the internals of ccc-gistemp. GIS IntegrationThe current GISTEMP products (gridded and zonal) are in binary formats that are peculiar to GISTEMP. We would like to see output files that can be plugged straight into industry standard GIS tools. We are not experts in this field, so this would suit a student who already has some GIS expertise, who can make sensible choices regarding tools and formats. We expect that this idea could be completed without needing to get bogged down with the internals of ccc-gistemp. One-click PackagingAny interested party would currently need Python and a certain level of gumption to download and run ccc-gistemp (although it's a lot easier than GISTEMP, which requires Fortran, Python, C++, ksh, awk, sed, a big-endian machine, and a following wind). This idea is to make it more shrink-wrapped and easy-access, from download, data acquisition, processing, and display of results. Possibly part of this will involve using py2exe for Windows, although other platforms are important too. We want it to be easy enough that even busy journalists and busy high school teachers can run it. Of course if ccc-gistemp can be made to run a hundred times faster (see NumPy item below), that will also help there. Excel IntegrationThis is somewhat open ended. The idea here is to engage with people who use Excel (which we generally don't). Could be as simple as providing the time series products in a file format that can be opened in Excel without any further processing, to an example complete analysis done entirely in Excel. Super Whizzy Visualisation BrowserA really cool super whizzy dataset and station record browser where you can see the time series, gridded datasets, stations, and the station combinations and adjustments made. Allows "drilling down" to see how any particular part of the global time series is comprised of combinations of cells and stations, and "drilling up" to see how stations combine to form averages over larger areas. Also allows inspection of the internals of GISTEMP to show exactly how stations combine, and the periods and references series used for the urban adjustment (say). Alternate NumPy Implementationccc-gistemp is never quite as fast as we'd like (because of the emphasis on clarity). Most of the time is spent inside a very small amount of code (primarily series.combine). We don't advocate requiring non-standard Python modules (such as numpy) in order to run ccc-gistemp, but it may be a good idea to exploit them if they are available. Provide a "drop-in" replacement for a small amount of the ccc-gistemp code that accelerates it if numpy is available. Would require Python and numpy expertise. Publication-Quality FiguresThe figures and charts that ccc-gistemp produces, such as the one on the project googlecode home page, are fine for simple visualisation and checking, but are not really good enough for a figure for a scientific publication. The current figures require no additional software to be installed, but we see no reason (similar to the Alternate NumPy Implementation) why extra optional tools that produced publication quality figures could not require some additional installation. We would expect a suitable framework to be identified (matplotlib / rpy / something we haven't heard of). Every Published FigureTake a publication like http://pubs.giss.nasa.gov/cgi-bin/abstract.cgi?id=ha00510u (Hansen et al 2010) and make sure that we have software that can do a ccc-gistemp version of every figure in there. Other ideasClimate Code DirectoryWe have plans for a "Climate Code Directory": a comprehensive and searchable catalogue of published climate science software. We aim to make it possible to quickly discover any available code underpinning a given climate science publication. There are already some lists of climate software, and some climate science bibliography sites, on the web, we could start by scraping those. This project is definitely going to happen anyway, so you might be working fairly closely with an existing Foundation developer. This will obviously need quite a bit of web development, scraping, and presumably RDF and Semantic Web. Coding is likely to be in Python and JavaScript. If you don't know what "Dublin Core" means, this may not be for you. Climate Science LibrariesThere are a number of functions of climate science software - or of science software in general - which could be formed into useful open-source software libraries. Some such libraries already exist, but are not available in our preferred languages. A project to create new libraries, or provide Python bindings for existing ones, could be very useful. A good example is the CLiMT project http://people.su.se/~rcaba/climt/ a climate-modelling toolkit which includes Fortran components with Python bindings. Another, seemingly trivial example: software often needs to obtain source data across the internet, for instance by FTP. The data files may change their names, or require a changing web query. Has the dataset been updated? Or maybe each month's new data is in a separate file, and we need to check whether there are any new files? Does the software fail elegantly if the data is missing, unobtainable, or the connection times out? What format is the data in? Plain-text? Fortran or other raw binary? What endianness? Can I plug in readers for a variety of different data formats? Can I validate it? Write a library to take all the pain out of this process. Similarly, we are in touch with climate scientists in Toronto who are addressing the problem of management of and computation with climate-science datasets which are far too large to fit in memory. A student interested in this area should contact us for more details before writing a proposal. Open Climate ProjectWe are in contact with a group of US climate scientists who have started an "Open Climate Project" (OCP), to "provide a free, transparent and open source environment that will enable validation and rapid improvement of climate reconstructions of the past 2 millennia" using open-source code in an open-source language (R or Python) and open-access datasets. Starting this off could be an excellent GSoC project. There are some obvious overlaps with our ccc-gistemp work and the ccc-gistemp projects above (for instance, gridding and visualisation). A project in this area would be mentored by one of the OCP scientists. Clear R Implementation of GISTEMP or other algorithms("R" is a codeword for "some language other than Python") Being aware that Python is not as popular as we would like in some communities, we see a need for Clear implementations (of GISTEMP-style analyses) in other languages. R has a good track record in certain communities, and is entirely Open Source (so we would prefer it over IDL or MATLAB, say). We would not expect the same "bug for bug" compatible approach that ccc-gistemp takes. Given the timescales, we would expect a simpler approach would be taken. We could advise. CRUTEM ImplementationWe can't see the code for CRUTEM, but perhaps you, a student at the Met Office or CRU, can? Get the bosses to sign off on making a replica Open Source version in a language like Python, and have at it. Menne/Williams Homogenization Algorithm(this is a placeholder for any other climate science related algorithm that is a similar size to GISTEMP: a few thousand lines at most). There aren't many fully automated temperature record homogenization algorithms that are intended to be deployed on a global network. This is one of them; it's the algorithm used in the GHCN v3 adjusted dataset, described in "Menne, M.J., and C.N. Williams, Jr., 2009: Homogenization of temperature series via pairwise comparisons. Journal of Climate, 22, 1700-1717, doi: 10.1175/2008JCLI2263.1.". It's a FORTRAN 77 program available from the NCDC FTP server (about 17,000 lines). Clear rewrite (in Python?). Sea-Ice Visualisation / Crowd-Sourcing WebsiteA website which presents satellite images of arctic sea ice, such as the visible-light ones from MODIS, and allows users to visually identify features. The site collects the data into some useful dataset, and also provides visual feedback to the users. For instance, the site might show several images from different days. The user can select a distinctive feature on a floe, and mark it on several separate images. The website deduces the drift velocity, displays it somehow, and collects the data. Compare with Galaxy Zoo and Old Weather, for successful crowd-sourced science data sites. Alternatively, a site which just presents the variety of arctic sea ice data sets in separate layers on a single map - a Google Maps mashup? Or a Google Earth application of some sort. There's a lot of information out there. We are very open to further suggestions, and will update this page over time. |
Subject: Excel Integration
Hi,I am interested in doing the excel integration project. But I am a bit concerned about what format of data is going to be exported into excel and how it should be presented. I have some software development experience with C++ and I am willing to learn new languages if necessary to complete the project. I would also like to know how is the application process here for Climate Code Foundation. Thanks so much in advance.
Start at http://www.google-melange.com/gsoc/org/show/google/gsoc2011/climatecode and http://mailman.climatecode.org/mailman/listinfo/gsoc-2011
Hello, is it permissible to submit two proposals to your organization? I have interests in two projects based upon the above ideas entitled GIS Integration and the Open Climate Project.
Do you have any other potential suggestions for a homogenization algorithm to be rewritten in python? Also is there a preference (perhaps one the members of the project are more familiar with?) as to which version of python?
Hi, I am interested in the climate science libraries project for GSoC 2011. And I noticed that I need to contact the Clear Climate Code committee for more details before writing a proposal. Therefore, I would like to have more information before starting writing my proposal for this project. BTW, how do I contact with the CCC community?
Hello, i'm interested in develop an automatic scheme for cyclone tracking and plot using NetCDF data in python and matplotlibs. These can be used to investigate how these phenomena behave in future climate scenarios
Hi, I am a 3rd Year Dual Degree(B.Tech+ M.S.) Computer Science Student in International Institute of Information Technology Hyderabad. I am doing my masters in the field of Data-Mining. I have worked on MS-Excel this year. I built an add-on for it with the help of which one could do basic financial functionalities on excel itself. I am interested in the project of Excel Integration. It will be really helpful if you can guide me with respect to this project. Thanking You, Saket Bharambe