
Google Code University

Distributed Systems

One of the most important recent developments in computing is the growth in distributed and parallel applications.

Tutorials

In these tutorials, we distinguish between local programming (on a single machine) and distributed programming using multiple components via a network.

We cover what designers and programmers need to consider when developing applications in a distributed environment. We also cover parallel computation using Hadoop, an open source MapReduce implementation that runs on a distributed file system. The goal is to help build an understanding of these important new trends and to provide opportunities to practice with them.
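
For readers new to the MapReduce model mentioned above, the following is a minimal single-process sketch of its three phases (map, group by key, reduce) applied to word counting. It is an illustration only: it does not use Hadoop or a distributed file system, and the function names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical helper names for illustration; none come from Hadoop.


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce step: sum all the counts emitted for a single word."""
    return word, sum(counts)


def run_mapreduce(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle (group values by key), and reduce in one process."""
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_mapreduce(["the quick brown fox", "the lazy dog"]))
```

In a real cluster, the map and reduce calls would run on many machines and the grouping step would move data across the network; this sketch only shows the shape of the computation.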

Contributed Course Content

These submissions from industry and academia are designed to help teach distributed computing to students around the world.

By Peter A. Buhr
An introduction to advanced control flow with an emphasis on concurrency and writing concurrent programs at the programming-language level in C++. Programming techniques and styles are examined to express complex forms of control flow, such as exceptions, coroutines, and multiple forms of concurrency. Students will learn how to structure, implement, and debug complex control flow.
By Tia Newhall
Spring 2008
Three projects designed to familiarize students with developing client/server applications and dealing with issues of asynchronous communication and parallel programming.
By Paul Krzyzanowski
Spring 2008
The course covers a broad spectrum of topics encompassing system architecture, software abstractions, distributed algorithms, and issues pertaining to distributed environments such as security. Course topics include network communications, remote procedure calls, remote file systems, distributed agreement, clock synchronization, clustering, and a variety of security and system design topics.
By Aaron Kimball, Sierra Michels-Slettvet, Christophe Bisciglia
Summer 2007
During the summer of 2007, a week-long course in Cluster Computing and MapReduce was offered to interns working at Google. This submission contains the materials used in that class, along with video recordings of each of the lectures. This material builds on Introduction to Problem Solving on Large Scale Clusters, listed below.
By Aaron Kimball, Sierra Michels-Slettvet, Christophe Bisciglia, et al.
Spring 2007
The University of Washington ran an upper-division course on Distributed Computing with MapReduce in Spring 2007. This submission contains the materials used for the class: five lectures in PowerPoint format, as well as four lab exercises designed to create a toolbox of distributed algorithms and data structures for the student. Students in the course completed these exercises on a cluster running Hadoop. This material builds on MapReduce in a Week, listed below.
By Hannah Tang, Albert Wong, Aaron Kimball
Winter 2007
This submission contains a complete set of lectures, programming assignments, and reading materials. It is designed to provide you with all the material you need in order to teach MapReduce as a section within a course on distributed systems.

Hadoop tools and resources

Getting started with a distributed system environment can be challenging. To help with this, we've assembled a few tools and resources that can be useful to both students and educators.

    Hadoop Virtual Image
    by Google
    This VMware image contains a preconfigured single-node instance of Hadoop. It provides the same interface as a full cluster without any of the overhead, and is suitable for educators exploring the platform and students working independently. The following Download and VMware Player links point to websites external to Google.
    MapReduce Tools for Eclipse Plug-In
    by IBM
    A robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs, and browsing of the distributed file system. The following Plug-in and Eclipse links point to websites external to Google.
    Sample Datasets
    The following links provide interesting data samples that are most efficiently manipulated using distributed systems techniques.

Video lectures

In this area, you will find a set of video-taped lectures from Google Video on various technology areas. These videos are great opportunities for students and faculty to hear directly from some of the current pioneers in high-tech. They can also potentially serve as "guest lectures" for courses in these areas.


Building Large Systems at Google

    Presenter: Shiva Shivakumar - Google Distinguished Entrepreneur

    Google deals with large amounts of data and millions of users. We'll take a behind-the-scenes look at some of the distributed systems and computing platforms that power Google's various products and make them scalable and reliable.




Big Table: A Distributed Structured Storage System

    Presenter: Jeff Dean - Google Distinguished Engineer

    In a talk at the University of Washington, Google's Jeff Dean discusses the Bigtable content storage system used in Google's backend.




Testing Distributed Systems

    Presenters: Martin Omander, Jason Huggins

    Testing Distributed Systems with AJAX, XML - Lessons Learned from Google Checkout.