from IEEE Transactions on Computers http://www.computer.org/csdl/trans/tc/preprint/06522411-abs.html
In data storage systems, it is crucial to protect data from loss due to failures. Erasure codes lay the foundation of this protection, enabling systems to reconstruct lost data when components fail. Erasure codes can, however, impose significant performance overhead in two core operations: encoding, where parity is calculated from newly written data, and decoding, where data is reconstructed after failures. This paper focuses on improving the performance of encoding, the more frequent operation. We observe that CPU cache efficiency has a great impact on encoding performance, and we propose several encoding scheduling algorithms that optimize the use of cache memory. We call the technique XOR-scheduling and demonstrate how it applies to a wide variety of existing erasure codes. To illustrate the generality of this technique, we evaluate the performance of scheduling these codes on a variety of platforms and show that XOR-scheduling significantly improves upon the conventional approach. Hence, we believe that XOR-scheduling has great potential for wide impact in practical storage systems.
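The encoding and decoding operations described above can be illustrated with a toy XOR-based code. This is only a sketch under stated assumptions: the parity equations, the function names (`encode_parity_major`, `encode_data_major`, `recover`), and the two loop orders are hypothetical choices made for illustration, and they are not the scheduling algorithms proposed in the paper. The sketch assumes that scheduling amounts to choosing the order in which XOR operations are performed, and it shows that two different orders produce the same parity. Which order uses the cache better depends on the platform and is not measured here.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity_major(data, equations):
    """Finish each parity block completely before starting the next one.

    `equations[j]` lists the indices of the data blocks XORed into parity j
    (a hypothetical representation chosen for this sketch).
    """
    parities = []
    for eq in equations:
        p = bytes(len(data[0]))  # all-zero block
        for i in eq:
            p = xor_blocks(p, data[i])
        parities.append(p)
    return parities


def encode_data_major(data, equations):
    """Visit each data block once and fold it into every parity that uses it."""
    parities = [bytes(len(data[0])) for _ in equations]
    for i, block in enumerate(data):
        for j, eq in enumerate(equations):
            if i in eq:
                parities[j] = xor_blocks(parities[j], block)
    return parities


def recover(lost_index, data, eq, parity):
    """Rebuild one lost data block from a parity equation that includes it."""
    block = parity
    for i in eq:
        if i != lost_index:
            block = xor_blocks(block, data[i])
    return block


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
    equations = [[0, 1, 2], [0, 2]]  # assumed parity equations

    p_a = encode_parity_major(data, equations)
    p_b = encode_data_major(data, equations)
    print(p_a == p_b)  # both orders compute the same parity blocks

    # Simulate losing data block 1 and rebuilding it from parity 0.
    damaged = [data[0], None, data[2]]
    print(recover(1, damaged, equations[0], p_a[0]) == data[1])
```

Because XOR is associative and commutative, any ordering of the same set of XOR operations yields identical parity, which is why the order can be chosen for performance rather than correctness.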