Google Code University

Distributed Systems

One of the most important recent developments in computing is the growth in distributed and parallel applications.

Tutorials

In these tutorials, we distinguish between local programming (on a single machine) and distributed programming (across multiple components that communicate over a network).

We cover what designers and programmers need to consider in developing applications in a distributed environment. We also cover parallel computation using an open source tool called Hadoop, which is a MapReduce implementation, running on a distributed file system. The goal is to help build an understanding of these important new trends, and provide opportunities to practice with them.
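As a rough illustration of the MapReduce model mentioned above, here is a minimal single-process sketch in Python. It does not use Hadoop or a distributed file system, and it is not taken from the tutorials; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only to show how a map step, a grouping step, and a reduce step fit together.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (a real framework does this between phases)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

In a real cluster, the map and reduce calls would run on many machines and the grouping step would move data across the network; this sketch only mirrors the shape of the computation.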

Contributed Course Content

These submissions from industry and academia are designed to help teach distributed computing to students around the world.

By Michael Haungs, Aaron Keen
Spring 2009
We are providing a simple, one-week module that can be used to introduce MapReduce in a wide variety of courses.
By Ed Lazowska, Aaron Kimball, Slava Chernyak
Fall 2008
This course will "put the meat on the bones" of our popular Hadoop programming course. We will discuss the software techniques employed to construct and program reliable, highly scalable systems. You should think of this course as covering "cool current topics in computer systems."
By Matthew Johnson, Daniel D. Garcia, Brian K. Harvey, Robert H. Liao, Alexander Rasmussen, Ramesh Sridharan
We provide software support and curriculum for parallelism units in our main lower division sequence. In CS 61A, Structure and Interpretation of Computer Programs, we use a purely functional Scheme-based interface to Hadoop, so that the connection between MapReduce and the underlying Map and Reduce operations is manifest. In CS 61C, Machine Structures, we use Pthreads and MPI to expose an even lower level of parallel control, with the focus on timing measurements.
By Amin Vahdat
CSE 124 is an undergraduate course on networking and distributed systems. The continued exponential growth of the Internet has made the network an important part of our everyday lives. Companies use the network to conduct business, doctors use it to diagnose medical issues, and so on. This course will provide a broad understanding of exactly how the network infrastructure supports distributed applications ranging from email to web browsing to electronic commerce. Topics covered in the course include the socket API, security, naming, network file systems, and transport protocols (TCP). Hands-on programming assignments provide in-depth understanding of issues in distributed systems and networking.
By Peter A. Buhr
An introduction to advanced control flow with an emphasis on concurrency and writing concurrent programs at the programming-language level in C++. Programming techniques and styles are examined to express complex forms of control flow, such as exceptions, coroutines, and multiple forms of concurrency. Students will learn how to structure, implement, and debug complex control flow.
By Tia Newhall
Spring 2008
Three projects designed to familiarize students with developing client/server applications and dealing with issues of asynchronous communication and parallel programming.
By Paul Krzyzanowski
Spring 2008
The course covers a broad spectrum of topics encompassing system architecture, software abstractions, distributed algorithms, and issues pertaining to distributed environments such as security. Course topics include network communications, remote procedure calls, remote file systems, distributed agreement, clock synchronization, clustering, and a variety of security and system design topics.
By Aaron Kimball, Sierra Michels-Slettvet, Christophe Bisciglia
Summer 2007
During the summer of 2007, a week-long course in Cluster Computing and MapReduce was offered to interns working at Google. This submission contains the materials used in that class, along with video recordings of each of the lectures. This material builds on Introduction to Problem Solving on Large Scale Clusters, listed below.
By Aaron Kimball, Sierra Michels-Slettvet, Christophe Bisciglia, et al.
Spring 2007
The University of Washington ran an upper-division course on Distributed Computing with MapReduce in Spring 2007. This submission contains the materials used for the class: five lectures in PowerPoint format, as well as four lab exercises designed to create a toolbox of distributed algorithms and data structures for the student. Students in the course completed these exercises on a cluster running Hadoop. This material builds on MapReduce in a Week, listed below.
By Hannah Tang, Albert Wong, Aaron Kimball
Winter 2007
This submission contains a complete set of lectures, programming assignments, and reading materials. It is designed to provide you with all the material you need in order to teach MapReduce as a section within a course on distributed systems.

Hadoop tools and resources

Getting started with a distributed system environment can be challenging. To help with this, we've assembled a few tools and resources that can be useful to both students and educators.

    Hadoop Virtual Image
    by Google
    This VMware image contains a preconfigured single-node instance of Hadoop. It provides the same interface as a full cluster without the overhead of setting one up. It is suitable for educators exploring the platform and students working independently. The following Download and VMware Player links point to websites external to Google.
    MapReduce Tools for Eclipse Plug-In
    by IBM
    A robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs, and browsing of the distributed file system. The following Plug-in and Eclipse links point to websites external to Google.
    Sample Datasets
    The following links provide interesting data samples that are most efficiently manipulated using distributed systems techniques.

Video lectures

In this area, you will find a set of videotaped lectures from Google Video on various technology areas. These videos give students and faculty a chance to hear directly from some of the current pioneers in high-tech. They can also serve as "guest lectures" for courses in these areas.


Building Large Systems at Google

    Presenter: Shiva Shivakumar - Google Distinguished Entrepreneur

    Google deals with large amounts of data and millions of users. We'll take a behind-the-scenes look at some of the distributed systems and computing platforms that power Google's various products and make them scalable and reliable.




Bigtable: A Distributed Structured Storage System

    Presenter: Jeff Dean - Google Distinguished Engineer

    In a talk at the University of Washington, Google's Jeff Dean discusses Bigtable, the content storage system used in Google's backend.




Testing Distributed Systems

    Presenters: Martin Omander, Jason Huggins

    Testing Distributed Systems with AJAX, XML - Lessons Learned from Google Checkout.