radc


RADC - Resiliency Against Data Corruptions

Storage resiliency is one of the most demanded capabilities in mid-range storage solutions today. Storage is a vital component for any enterprise. Big players in the corporate world invest millions to guard against storage failure and data loss, since a single failure can cause a loss of billions. Data stored on storage devices is modified regularly, which makes it more susceptible to the different failures and corruptions that can occur while writing the data. Hence, there is a big market for storage solution providers such as NetApp, EMC, HDS and HP, which offer data protection against storage failures at a very high cost.

RADC stands for Resiliency Against Data Corruptions, an open source solution that offers resiliency against the different failures encountered in the storage stack. It is primarily based on the interoperability of three modules in the Linux kernel: Device-Mapper, RAID and the file system. From a bird’s eye view, each module has its own responsibility: Device-Mapper stores redundancy in the form of a 64-byte checksum per block and detects data integrity errors, RAID stores redundancy at the block level in the form of parity or block replication, and the file system serves data. One of the most desirable properties of RADC is to keep the file system as transparent as possible to the underlying RAID and Device-Mapper changes.

Error Detection & Recovery:

The error detection and recovery mechanism ensures that when a client reads data, any data integrity error is detected by the device mapper and propagated to the RAID layer. The RAID layer then uses its redundancy to recover the block. The amount of redundancy depends entirely on the RAID level: 1 block per stripe for RAID-5, 2 blocks per stripe for RAID-6, or all blocks in the case of a RAID-1 deployment.
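
To make the intended flow concrete, here is a minimal userspace-style sketch of the read path. The helpers read_block, verify_block_checksum and raid_recover_block are illustrative placeholders of ours, not existing kernel APIs, and the real implementation works on bios inside the kernel.

/* Illustrative sketch of the RADC read path; read_block,
 * verify_block_checksum and raid_recover_block are hypothetical
 * placeholders, not existing kernel APIs. */
#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SIZE 4096

/* Hypothetical lower-layer primitives (return 0 on success). */
int  read_block(uint64_t blk, uint8_t buf[BLOCK_SIZE]);          /* dm-cksum read   */
bool verify_block_checksum(uint64_t blk, const uint8_t *buf);    /* dm-cksum verify */
int  raid_recover_block(uint64_t blk, uint8_t buf[BLOCK_SIZE]);  /* RAID rebuild    */

/* Read one block on behalf of a client; on a media error or a
 * checksum mismatch the error is handed to the RAID layer, which
 * rebuilds the block from redundancy and returns the good copy
 * transparently to the client. */
int radc_read(uint64_t blk, uint8_t buf[BLOCK_SIZE])
{
    if (read_block(blk, buf) != 0)
        return raid_recover_block(blk, buf);   /* media error        */

    if (!verify_block_checksum(blk, buf))
        return raid_recover_block(blk, buf);   /* silent corruption  */

    return 0;                                  /* data is good       */
}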

Device-Mapper:

Device mapper requires a new target ‘dm-cksum’, which is responsible for maintaining redundancy in the form of checksums. A checksum is a 64-byte entity stored for each block during any write to the device. This is independent of whether the write is client I/O or not, which means the module guarantees data integrity regardless of whether a file system sits on top of the device.
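
As an illustration of what dm-cksum would maintain, the sketch below builds the 64-byte checksum entry for a 4 KB block from one 8-byte checksum per 512-byte sector. FNV-1a is used purely as a stand-in here; the project does not specify the actual checksum algorithm.

/* Userspace sketch of the per-block checksum dm-cksum maintains:
 * one 8-byte checksum per 512-byte sector, eight sectors per 4 KB
 * block, i.e. 64 bytes of checksum per block.  The FNV-1a hash is
 * only an illustrative choice of algorithm. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE        512
#define SECTORS_PER_BLOCK  8
#define BLOCK_SIZE         (SECTOR_SIZE * SECTORS_PER_BLOCK)   /* 4096 bytes */
#define CKSUM_ENTRY_SIZE   (8 * SECTORS_PER_BLOCK)             /* 64 bytes   */

/* 64-bit FNV-1a over one sector. */
static uint64_t sector_checksum(const uint8_t *sector)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < SECTOR_SIZE; i++) {
        h ^= sector[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Build the 64-byte checksum entry for one 4 KB block. */
void block_checksum(const uint8_t block[BLOCK_SIZE],
                    uint8_t entry[CKSUM_ENTRY_SIZE])
{
    for (int s = 0; s < SECTORS_PER_BLOCK; s++) {
        uint64_t c = sector_checksum(block + s * SECTOR_SIZE);
        memcpy(entry + s * 8, &c, sizeof(c));   /* 8 bytes per sector */
    }
}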

A checksum is stored in every 64th block of the device. Every collection of 64 blocks aligned at 64 will henceforth be referred to as a ‘zone’. The last block in the zone, holding the checksums for the previous 63 blocks, will be referred to as the ‘checksum block’. For performance reasons, the checksum block could also be placed in the middle of the zone, as the 32nd block. The last 64-byte entry in the checksum block holds the checksum of the checksum block itself.

Though checksums are stored at the granularity of sectors, we work with and refer to checksums at the granularity of blocks for the sake of simplicity. This also means we can provide resiliency independent of the block size.

[1][2][3] ........[62][63] [Z1] [65][66]........[127] [Z2] [129] ........

Size of Z block = 8 sectors = 4096 bytes
Checksum for each block = 8 bytes x 8 sectors = 64 bytes
Checksum for 63 blocks = 63 blocks x 64 bytes = 4032 bytes
Remaining space on Z block = 4096 bytes - 4032 bytes = 64 bytes

The remaining 64 bytes are used to store the checksum of the Z block.

There is a limitation from the space perspective: one block in every 64 is used internally to store the checksums for the other 63 blocks. Hence, a device of size 64 MB would eventually allow the file system to use only 63 MB of storage space.
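
The zone geometry and the resulting space overhead come down to a little arithmetic. The sketch below assumes the end-of-zone checksum placement and 0-based block indices (the 1-based diagram above starts at [1]); the function names are ours, for illustration only.

/* Sketch of the zone geometry described above, assuming the checksum
 * block is the last block of each 64-block zone (the mid-zone variant
 * would only change CKSUM_INDEX).  Block indices are 0-based. */
#include <stdint.h>

#define ZONE_BLOCKS   64    /* 63 data blocks + 1 checksum block      */
#define DATA_BLOCKS   63
#define CKSUM_INDEX   63    /* position of the Z block within a zone  */
#define ENTRY_SIZE    64    /* bytes of checksum per data block       */

/* Map a logical (file-system visible) block to its physical block,
 * skipping one checksum block per 63 data blocks.  This is why a
 * 64 MB device exposes only 63 MB of usable space. */
uint64_t logical_to_physical(uint64_t lblk)
{
    return lblk + lblk / DATA_BLOCKS;
}

/* For a physical data block, locate its checksum: the Z block of its
 * zone and the byte offset of its 64-byte entry inside that block. */
void checksum_location(uint64_t pblk, uint64_t *cksum_blk, uint32_t *offset)
{
    uint64_t zone = pblk / ZONE_BLOCKS;

    *cksum_blk = zone * ZONE_BLOCKS + CKSUM_INDEX;
    *offset    = (uint32_t)(pblk % ZONE_BLOCKS) * ENTRY_SIZE;
    /* The last entry (offset 63 * 64 = 4032) holds the checksum of
     * the Z block itself. */
}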

DM-RAID:

As part of data recovery in case of an error reported by any subsystem at the underlying layers, the RAID must be created on top of the Z device, which is created using the device mapper target described above. The current implementation of RAID in the Linux kernel has no support for handling errors reported by the underlying subsystems. RADC intends to implement that error handling support, so that whenever an error is reported by the underlying layers, the data is recovered and transparently returned to the clients. Different RAID level protections could be supported at this layer, but the initial intention is to support RAID-1 to avoid complexity during the course of development; other RAID levels can be taken up as future enhancements.
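
As a rough illustration of the intended RAID-1 error handling, the sketch below reads from one mirror, falls back to the other on any error (including a checksum error reported by dm-cksum below it), and repairs the bad copy before returning the data. mirror_read and mirror_write are hypothetical placeholders, not existing kernel APIs.

/* Minimal sketch of the RAID-1 recovery RADC intends to add.
 * mirror_read/mirror_write are hypothetical per-mirror primitives
 * (return 0 on success), not existing kernel APIs. */
#include <stdint.h>

#define BLOCK_SIZE 4096
#define NR_MIRRORS 2

int mirror_read(int mirror, uint64_t blk, uint8_t buf[BLOCK_SIZE]);
int mirror_write(int mirror, uint64_t blk, const uint8_t buf[BLOCK_SIZE]);

/* Read a block from a 2-way mirror, recovering transparently:
 * the first mirror that returns good data is used, and any mirror
 * that failed before it is rewritten with the good copy. */
int raid1_read(uint64_t blk, uint8_t buf[BLOCK_SIZE])
{
    for (int m = 0; m < NR_MIRRORS; m++) {
        if (mirror_read(m, blk, buf) == 0) {
            for (int bad = 0; bad < m; bad++)
                mirror_write(bad, blk, buf);   /* repair the bad copy */
            return 0;
        }
    }
    return -1;   /* all copies bad: unrecoverable */
}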

File System:

No changes have been identified at the file system layer. Commodity file systems today have no protection against, or capability to handle, sector-level failures, which makes it difficult to even identify data as valid or invalid. Remember that if a block is corrupted using dd, reading it back through the file system will never report invalid data; it will simply return the corrupted data as valid, making the situation even worse.

For any feedback or suggestions, mail us at radc.fscops@gmail.com or radc_fscops.googlegroups.com

Project Information

  • License: GNU GPL v2
  • svn-based source control

Labels:
Storage Linux DeviceMapper