
To install or use MemX now, see the Usage page.

The MemX system, developed in the Operating Systems and Networks (OSNET) Lab at Binghamton University, provides a mechanism to virtualize the collective memory resources of machines in a cluster with zero modifications to your application or operating system. It has the following features:

New features (rc-1):

  1. (beta) Live client state transfers: you can shut down a client and reattach its block device on a new host or virtual machine. This is something like a self-migrating block device, without needing VMs.
  2. (stable) Live server shutdowns: you can migrate the memory of individual server hosts without disconnecting clients (self-dispersing servers).

Current features (stable):

  1. Auto-discovery and pooling of unused memory of nodes within a Gigabit Ethernet-based cluster.
  2. Complete Linux kernel-space implementation.
  3. Completely transparent to memory clients (no application or OS modifications).
  4. 80-microsecond round-trip time (RTT) for individual remote memory page accesses (on a 1 Gbps network).
  5. Linux 2.6.x-based deployment, with potential for portability to other OSes.
  6. Ability to automatically tear down memory servers during machine shutdown or network restart, without human intervention.
  7. Ability to use MemX either from other kernel modules or purely as a Linux block device for storage.
  8. Ability to grant and revoke access to specific groupings of memory servers.
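As a rough sanity check on the 80-microsecond figure, assume a 4 KB page (an assumption; the page size is not stated here) and ignore all header and protocol overhead. The time needed just to serialize one page onto a 1 Gbps link is then:

```latex
t_{\text{wire}} = \frac{4096\,\text{B} \times 8\,\text{bit/B}}{10^{9}\,\text{bit/s}} = 32.768\,\mu\text{s} \approx 0.41 \times 80\,\mu\text{s}
```

Under that assumption, roughly 40% of each round trip is spent moving the page bits themselves. The remainder would cover the request message, framing, and processing on both hosts.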

MemX uses about 5000 lines of kernel code for all of the above features.

Coming features:

  1. Page-sharing
  2. RAID-like reliability
  3. Multiple-client read support. (Please note that MemX is not a distributed shared memory (DSM) system, so it does not require message passing or coherency/consistency protocols.)

How it Works:

To begin, we categorize the operational modes of remote memory systems, which differ from one another in three main ways:

  1. The type of network interconnect used. Popular choices include Ethernet, Myrinet, and InfiniBand, as well as RDMA-capable interconnects in general.
  2. The level of the operating system at which they are implemented. This affects programmability and transparency.
  3. The method used for resource discovery and page access among nodes in the cluster.

All of these affect performance. Although the interconnect and implementation level are decided at design time of the remote memory system, the method of paging and resource discovery is more dynamic.

In MemX, a Linux kernel module is inserted into the vanilla kernel of every participating machine in the cluster. This turns the machine into either a client or a server that contributes its memory to the cluster.

The client module implements a block device. This block device can be used either by the Linux swap daemon for paging (just as it would use a disk) or to host a low-latency file system on top of collective remote memory.
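The client block device's job can be illustrated with a toy user-space model: page-sized block reads and writes are translated into operations on pages held by remote servers, and a page map records where each page lives. The class name, the 4 KB page size, the in-memory dicts that stand in for servers, and the "fewest pages" placement rule are all hypothetical. This is a sketch under those assumptions, not MemX's kernel code.

```python
PAGE_SIZE = 4096  # assumed page/block size in bytes (hypothetical)


class RemoteMemoryBlockDevice:
    """Toy user-space model of a client block device backed by remote memory.

    `servers` maps a server name to a dict of {page number: bytes}, standing in
    for the memory each server contributes. The real MemX client is a kernel
    block driver that reaches servers over the network; none of that is modeled.
    """

    def __init__(self, servers):
        self.servers = servers
        self.page_map = {}  # page number -> name of the server holding it

    def _page(self, offset):
        if offset % PAGE_SIZE:
            raise ValueError("this sketch only handles page-aligned offsets")
        return offset // PAGE_SIZE

    def write(self, offset, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("this sketch only handles whole pages")
        page = self._page(offset)
        name = self.page_map.get(page)
        if name is None:
            # First write to this page: place it on the server that currently
            # holds the fewest pages (a hypothetical placement policy).
            name = min(self.servers, key=lambda s: len(self.servers[s]))
            self.page_map[page] = name
        self.servers[name][page] = bytes(data)

    def read(self, offset):
        page = self._page(offset)
        name = self.page_map.get(page)
        if name is None:
            return bytes(PAGE_SIZE)  # never-written pages read back as zeros
        return self.servers[name][page]
```

The swap daemon or a file system would issue these reads and writes. For example, if two pages are written to a device backed by two empty servers, each server receives one page.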

Node Discovery:
  1. Servers announce (broadcast) themselves and their load statistics to every machine in the cluster at regular intervals. These announcements let clients decide how to allocate and retrieve pages across all available servers. During transient conditions, servers may reject allocation requests, which forces clients to try elsewhere.
  2. Clients accept block I/O requests, either from the local Linux swap daemon or from the file system layer, and service those requests from cluster-wide memory.
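The announce, allocate, and retry-on-rejection pattern described above can be sketched in a few lines. In this sketch a client ranks servers by the free memory they last advertised and falls back to the next server when one refuses. The `Announcement` fields, the ranking by free pages, and all names are assumptions for illustration. MemX's actual load statistics and policy are not specified here.

```python
from dataclasses import dataclass


@dataclass
class Announcement:
    """Hypothetical contents of a periodic server broadcast."""
    server: str
    free_pages: int


class Server:
    def __init__(self, name, free_pages):
        self.name = name
        self.free_pages = free_pages

    def announce(self):
        return Announcement(self.name, self.free_pages)

    def allocate(self, n):
        # A server may refuse if its free memory shrank since its last
        # broadcast (the "transient" case described above).
        if n > self.free_pages:
            return False
        self.free_pages -= n
        return True


def allocate_pages(announcements, servers, n):
    """Try servers in order of advertised free memory; fall back on rejection."""
    for ann in sorted(announcements, key=lambda a: a.free_pages, reverse=True):
        if servers[ann.server].allocate(n):
            return ann.server
    return None  # no server could take the request
```

For example, suppose server A advertised 100 free pages but has since dropped to 10. A request for 40 pages is first rejected by A and then satisfied by the next-best server.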

Recent Features:
  1. Servers can be shut down live. The module receives notifier callbacks from the NIC when the machine shuts down or reboots. At that point, all of the server's memory is automatically migrated, page by page, to adjacent hosts with free memory.
  2. Clients can be transferred to new host kernels (something like a self-migrating block device). This was done to keep server memory more stateless, which allows a client host machine to be shut down or rebooted, or a VM that depends on MemX to be migrated.
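The page-by-page draining step of a live server shutdown can be modeled as moving each page to another host that still has room. The function returns the new location of every page so that clients could update their page maps. The "most free space" target choice, the capacity dict, and the idea of returning a relocation map are hypothetical. The NIC notifier and the wire protocol are not modeled.

```python
def drain_server(leaving, others, capacity):
    """Move every page off a server that is shutting down, one page at a time.

    leaving:  dict page_id -> bytes held by the departing server (emptied here)
    others:   dict server name -> dict page_id -> bytes (remaining servers)
    capacity: dict server name -> maximum number of pages that server accepts
    Returns a dict page_id -> new server name (hypothetical client update).
    """
    moved = {}
    for page_id in list(leaving):
        # Pick the remaining server with the most free space (assumed policy).
        target = max(others, key=lambda s: capacity[s] - len(others[s]))
        if capacity[target] - len(others[target]) <= 0:
            raise RuntimeError(f"no free memory left in the cluster for page {page_id!r}")
        others[target][page_id] = leaving.pop(page_id)
        moved[page_id] = target
    return moved
```

Draining pages one at a time means that clients only ever see each page either on the old server or on its new home. If the rest of the cluster runs out of memory, the sketch stops and leaves the remaining pages in place.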

Our approach has several advantages and performance improvements in either mode:

  1. We get the benefits of a kernel-space implementation, without changing the virtual memory system.
  2. The network protocol stack is bypassed because we need neither routing, fragmentation, nor a transport layer.
  3. Cluster-wide memory is managed and virtualized across a low-latency, high-bandwidth interconnect.
  4. Depending on the state of the cluster, any client or server can be made to join or leave the system at will.
  5. Any workstation with a vanilla Linux kernel can be used to run any legacy memory-intensive application.
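To show what bypassing the protocol stack means at the wire level, the sketch below packs a page request directly behind a 14-byte Ethernet header, with no IP or transport header in between. The EtherType 0x88B5 (an IEEE local-experimental value), the opcode values, and the 13-byte request layout are hypothetical. This page does not document MemX's actual frame format.

```python
import struct

# Hypothetical EtherType: 0x88B5 is reserved by IEEE for local experiments.
ETH_P_MEMX_ASSUMED = 0x88B5

ETH_HDR = struct.Struct("!6s6sH")  # destination MAC, source MAC, EtherType
# Hypothetical request header, network byte order, no padding:
# opcode (1 byte), request id (4 bytes), page offset (8 bytes).
MEMX_HDR = struct.Struct("!BIQ")

OP_READ, OP_WRITE = 1, 2  # hypothetical opcodes


def mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


def build_request(dst, src, opcode, req_id, page_offset):
    """Ethernet header followed immediately by the request: no IP, no TCP/UDP."""
    return (ETH_HDR.pack(mac(dst), mac(src), ETH_P_MEMX_ASSUMED)
            + MEMX_HDR.pack(opcode, req_id, page_offset))


def parse_request(frame):
    _dst, _src, ethertype = ETH_HDR.unpack_from(frame, 0)
    if ethertype != ETH_P_MEMX_ASSUMED:
        raise ValueError("not a request frame for this sketch")
    return MEMX_HDR.unpack_from(frame, ETH_HDR.size)
```

The sketch only builds and parses bytes. Payload transfer is not shown: a 4 KB page is larger than the standard 1500-byte Ethernet MTU, so carrying one requires either several frames or jumbo frames, and this page does not say which approach MemX uses.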

Previous Work:

The following is a list of implemented (non-simulation) remote memory systems that come close to MemX but do not meet all of its goals:

(Refer to "How it Works" for explanation of terms used)

Classification

| Name | Test platform used | Code available | Test cluster size | Test network used | Page-fault time or speedup | Main caveat | Self-migration capable | Currently active (or year last active) |
|---|---|---|---|---|---|---|---|---|
| MemX | Linux 2.6 | Yes | 140 GB DRAM total across twelve 8-core 2.5 GHz Opterons | Gigabit Ethernet | 80 usec | Needs replication | Yes | Current |
| JumboMem | Linux/Unix | Yes | 250 nodes, 4 GB each, 2 GHz Opterons | 4X InfiniBand | 54 usec | Application changes to malloc() library call | No | Current |
| Nswap | Linux 2.6 | No | Eight 512 MB nodes, Pentium 4 | Gigabit Ethernet | unsure | None | No | Current |
| LambdaRAM | unsure | No | unsure | Wide-area networks | 80 ms? | unsure | No | 2003? |
| "SAMSON" | Linux 2.2 | No | Seven 933 MHz Pentiums | Myrinet or Ethernet | 300 usec | Kernel changes | No | 2003 |
| "Network Ramdisk" | Linux 2.0 or Digital Unix 4.0 | Yes | 233 MHz Pentium or DEC Alpha 3000 | 155 Mbps ATM or 10 Mbps Ethernet | 10-20 msec | User-level servers | No | 1999 |
| "User-level VM" | Nemesis | No | 200 MHz Pentium nodes | 10 Mbps Ethernet | Several msec | Application is changed | No | 1999 |
| Berkeley "NOW" | GLUnix + Solaris | Yes | 105 nodes, UltraSPARC I | 1.28 Gbps Myrinet | unsure | unsure | No | 1998 |
| User-level "Dodo" | Linux 2.0, Condor-based | No | 14 nodes, 200 MHz Pentium | 100 Mbps Ethernet | Speedup of 2 to 3 | Application is changed | No | 1998 |
| "Global Memory System" | Digital Unix 4.0 | No | 5-20 nodes, 225 MHz DEC Alphas | 1.28 Gbps Myrinet | 370 usec | Kernel VM subsystem changes | No | 1999 |
| "Reliable Remote Memory Pager" | DEC OSF/1 | No | 16? nodes, DEC Alpha 3000 | 10 Mbps Ethernet | Several msec, speedup of 2 | User-level servers, not Linux-based | No | 1996 |

Publications









Hosted by Google Code