To install or use MemX now, go to Usage.
The MemX system, developed in the Operating Systems and Networks (OSNET) Lab at Binghamton University, provides a mechanism to virtualize the collective memory resources of machines in a cluster with zero modifications to your application or operating system. It has the following features:
Recent features:
- Live client state transfers. Allows you to shut down a client and reattach its block device on a new host or virtual machine, much like a self-migrating block device that does not require VMs.
- Live server shutdowns. Allows you to migrate the server memory of individual hosts without disconnecting clients (self-dispersing servers).
Default features:
- Auto-discovery and pooling of unused memory of nodes within a Gigabit Ethernet-based cluster.
- Complete Linux kernel-space implementation.
- Completely transparent access to remote memory for clients (no application or OS modifications).
- 80-microsecond RTT for individual remote memory page accesses (on a 1 Gbps network).
- Linux 2.6.XX-based deployment, with potential for portability to other OSes.
- Ability to automatically tear down memory servers during machine shutdown or network restart without human intervention.
- Ability to use MemX either across kernel modules or purely as a Linux block device for storage.
- Ability to grant and revoke access to specific groupings of memory servers
MemX uses about 5000 lines of kernel code for all of the above features.
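To put the 80-microsecond round-trip figure from the feature list in context, the following back-of-the-envelope sketch compares it with the raw wire time of one page on a 1 Gbps link. The 4 KiB page size is an assumption (the text above does not state it), and the function names are purely illustrative.

```python
# Back-of-the-envelope numbers for one remote page access.
PAGE_BYTES = 4096          # assumption: 4 KiB pages (not stated in the text)
LINK_BPS = 1_000_000_000   # 1 Gbps network, as quoted in the feature list
RTT_US = 80.0              # round-trip time per page access, as quoted


def wire_time_us(nbytes, link_bps=LINK_BPS):
    """Time to serialize nbytes onto the link, in microseconds."""
    return nbytes * 8 / link_bps * 1e6


def sync_pages_per_second(rtt_us=RTT_US):
    """Pages per second if each access waits for the previous one to finish."""
    return 1e6 / rtt_us


def sync_throughput_mb_per_s(page_bytes=PAGE_BYTES, rtt_us=RTT_US):
    """Throughput in MB/s (10**6 bytes) for strictly one-at-a-time accesses."""
    return sync_pages_per_second(rtt_us) * page_bytes / 1e6


if __name__ == "__main__":
    print(f"wire time per page: {wire_time_us(PAGE_BYTES):.3f} us")
    print(f"synchronous pages/s: {sync_pages_per_second():.0f}")
    print(f"synchronous MB/s:    {sync_throughput_mb_per_s():.1f}")
```

Under these assumptions, serializing the page accounts for roughly 33 of the 80 microseconds, and a client that issues strictly one request at a time would top out near 12,500 pages per second. Overlapping requests would raise that ceiling.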
Coming features:
- Page-sharing
- RAID-like reliability
- Multiple-client read support (Please note that MemX is not a DSM system requiring message passing and/or coherency/consistency protocols.)
To begin, we categorize the operational modes of a remote memory system.
Remote memory systems differ in three main ways:
- The type of network interconnect used. Popular choices include Ethernet, Myrinet, and InfiniBand, sometimes combined with RDMA.
- The level of the operating system at which they are implemented. This affects programmability and transparency.
- The method used for resource discovery and page access among nodes in the cluster.
All of these affect performance. Although the interconnect and implementation level are decided at design time of the remote memory system, the method of paging and resource discovery is more dynamic.
In MemX, a Linux module is inserted into the vanilla kernel of every participating machine in the cluster, turning the machine into either a client or a server that contributes remote memory to the cluster.
The client module implements a block device. This block device can be used either by the Linux swap daemon for paging (just as it would use a disk) or to configure a low-latency file system over collective remote memory.
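As a rough illustration of what backing a block device with remote memory involves, the sketch below splits a block I/O request, expressed in sectors, into page-sized pieces that could each be fetched from a remote server. It is not MemX code: the 512-byte sector size, the 4 KiB page size, and the name `split_request` are assumptions made for this example.

```python
# Map a sector-based block request onto fixed-size remote pages.
SECTOR_BYTES = 512   # assumption: 512-byte sectors
PAGE_BYTES = 4096    # assumption: 4 KiB pages


def split_request(start_sector, num_sectors):
    """Return a list of (page_no, offset_in_page, length) covering the request."""
    pos = start_sector * SECTOR_BYTES
    end = pos + num_sectors * SECTOR_BYTES
    chunks = []
    while pos < end:
        page_no, offset = divmod(pos, PAGE_BYTES)
        # Stop at the page boundary or at the end of the request,
        # whichever comes first.
        length = min(PAGE_BYTES - offset, end - pos)
        chunks.append((page_no, offset, length))
        pos += length
    return chunks


if __name__ == "__main__":
    # A request that straddles the boundary between page 0 and page 1.
    print(split_request(6, 4))
```

A page-aligned request such as `split_request(0, 8)` touches exactly one page, while an unaligned one is divided at each page boundary.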
Node Discovery
- Servers announce (broadcast) themselves and their load statistics to every machine in the cluster at regular intervals. These announcements allow clients to make decisions about page allocation and retrieval across all available servers. During transient periods, servers may reject allocation requests, forcing clients to try elsewhere.
- Clients accept block I/O requests, either from the local Linux swap daemon or from the file system layer, and service them from cluster-wide memory.
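The announce-then-allocate behavior described above can be sketched as follows. This is a simplified user-space model, not the MemX kernel code: the class names, the "most free memory first" ordering, and the `try_allocate` callback (standing in for a network request that a server may reject) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Announcement:
    """Hypothetical contents of one server broadcast."""
    server_id: str
    free_pages: int


class ClientView:
    """A client's picture of the cluster, built from server announcements."""

    def __init__(self) -> None:
        self.servers: Dict[str, Announcement] = {}
        self.page_owner: Dict[int, str] = {}

    def on_announcement(self, ann: Announcement) -> None:
        # The latest announcement from a server replaces the previous one.
        self.servers[ann.server_id] = ann

    def allocate(self, page_no: int,
                 try_allocate: Callable[[str, int], bool]) -> Optional[str]:
        """Place page_no on a server; fall through to the next on rejection."""
        # Assumed policy: try the servers advertising the most free memory first.
        candidates = sorted(self.servers.values(),
                            key=lambda a: a.free_pages, reverse=True)
        for ann in candidates:
            if ann.free_pages <= 0:
                continue
            if try_allocate(ann.server_id, page_no):
                self.page_owner[page_no] = ann.server_id
                ann.free_pages -= 1
                return ann.server_id
        return None  # every server rejected the request or was full


if __name__ == "__main__":
    view = ClientView()
    view.on_announcement(Announcement("s1", 100))
    view.on_announcement(Announcement("s2", 50))
    # Pretend s1 is in a transient state and rejects every request.
    print(view.allocate(7, lambda server, page: server != "s1"))
```

In this model the client remembers which server holds each page, so a later read of the same page can go straight to that server.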
Recent Features
- Servers can be shut down live. The server receives notifier callbacks from the NIC when the machine shuts down or reboots, at which point all of the server's memory is automatically migrated page by page to adjacent hosts with free memory.
- Clients can be transferred to new host kernels (a sort of self-migrating block device). This was done to keep server memory more stateless, which allows a client host machine to be shut down or rebooted, or a VM that depends on MemX to be migrated.
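The page-by-page migration performed during a live server shutdown can be pictured with the following sketch. It only plans where each page would go. The function name `plan_migration` and the choice of always picking the peer with the most free memory are assumptions for illustration; the text above does not say how MemX chooses a destination host.

```python
from typing import Dict, List


def plan_migration(pages: List[int], peers_free: Dict[str, int]) -> Dict[int, str]:
    """Assign each page of a departing server to a peer with free memory.

    pages:      page identifiers held by the server that is shutting down
    peers_free: free page count advertised by each remaining server
    Returns a mapping from page identifier to destination peer.
    """
    free = dict(peers_free)  # work on a copy; leave the caller's data intact
    plan: Dict[int, str] = {}
    for page in pages:
        # Assumed policy: send the page to the peer with the most free memory.
        target = max(free, key=free.get, default=None)
        if target is None or free[target] <= 0:
            raise RuntimeError("not enough free memory to migrate all pages")
        plan[page] = target
        free[target] -= 1
    return plan


if __name__ == "__main__":
    print(plan_migration([1, 2, 3], {"a": 2, "b": 1}))
```

Planning one page at a time mirrors the page-by-page description: the free-memory picture is updated after every placement, so no peer is handed more pages than it advertised.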
Our approach has several advantages and performance improvements in either mode:
- We get the benefits of a kernel-space implementation, without changing the virtual memory system.
- The network protocol stack is bypassed because we do not need routing, fragmentation, or a transport layer.
- Cluster-wide memory is managed and virtualized across a low-latency, high-bandwidth interconnect.
- Depending on the state of the cluster, any client or server can be made to join or leave the system at will.
- Any workstation with a vanilla Linux kernel can be used to run any legacy memory-intensive application.
Previous Work:
The following is a list of implemented (non-simulated) remote memory systems that come close to what we do but do not meet all of our goals:
(Refer to "How it Works" for explanation of terms used)
Classification
| Name | Test Platform Used | Code Available | Test Cluster Size | Test Network Used | Page-Fault Time or Speedup | Main Caveat | Self-Migration Capable | Currently Active |
| MemX | Linux 2.6. | Yes | 140 GB DRAM total across twelve 8-core 2.5 GHz Opterons | Gigabit Ethernet | 80 usec | Needs replication | Yes | Current |
| JumboMem | Linux/Unix | Yes | 250 nodes, 4 GB each, 2 GHz Opterons | 4X InfiniBand | 54 usec | Application changes to malloc() library call | No | Current |
| Nswap | Linux 2.6. | No | Eight 512 MB nodes, Pentium 4 | Gigabit Ethernet | unsure | None | No | Current |
| LambdaRAM | unsure | No | unsure | Wide-area networks | 80 msec? | unsure | No | 2003? |
| "SAMSON" | Linux 2.2 | No | Seven 933 MHz Pentiums | Myrinet or Ethernet | 300 usec | Kernel changes | No | 2003 |
| "Network Ramdisk" | Linux 2.0 or Digital Unix 4.0 | Yes | 233 MHz Pentium or DEC Alpha 3000 | 155 Mbps ATM or 10 Mbps Ethernet | 10-20 msec | User-level servers | No | 1999 |
| "User-level VM" | Nemesis | No | 200 MHz Pentium nodes | 10 Mbps Ethernet | Several msec | Application is changed | No | 1999 |
| Berkeley "NOW" | GLUnix + Solaris | Yes | 105 nodes, UltraSparc I | 1.28 Gbps Myrinet | unsure | unsure | No | 1998 |
| User-level "Dodo" | Linux 2.0, Condor-based | No | 14 nodes, 200 MHz Pentium | 100 Mbps Ethernet | Speedup of 2 to 3 | Application is changed | No | 1998 |
| "Global Memory System" | Digital Unix 4.0 | No | 5-20 nodes, 225 MHz DEC Alphas | 1.28 Gbps Myrinet | 370 usec | Kernel VM subsystem changes | No | 1999 |
| "Reliable Remote Memory Pager" | DEC OSF/1 | No | 16? nodes, DEC Alpha 3000 | 10 Mbps Ethernet | Several msec, speedup of 2 | User-level servers, not Linux-based | No | 1996 |
Publications
- MemX: Supporting Large Memory Applications in Xen Virtual Machines. In Proc. of the Second International Workshop on Virtualization Technology in Distributed Computing (VTDC07), November 2007. A workshop held in conjunction with Supercomputing 2007, Reno, Nevada. M. Hines and K. Gopalan
- Distributed Anemone: Transparent Low-Latency Access to Remote Memory in Commodity Clusters. In Proc. of the International Conference on High-Performance Computing (HiPC'06), December 2006, Bangalore, India. M. Hines, J. Wang and K. Gopalan