In the world of technology, there is no such thing as perfect reliability. However, it is possible to come close by achieving unusually high levels of failure-free operation. Performance of this kind is required in banking, the military, aeronautics, and medicine, and it usually involves very complex systems with very high upkeep costs.
However, the trend of high complexity and high maintenance costs is gradually shifting, at least in the field of information technology, where the cost of producing a system is becoming increasingly low compared with the cost of servicing that same system.
This is the idea behind a new research paper published on arXiv.org this week. The article, written by a team of computer engineering researchers, proposes disk array systems for data storage that would require no major maintenance even when individual components fail, while simultaneously achieving a reliability (i.e., a probability of not losing data) as high as 99.999 percent over the course of four years.
Now this sounds really captivating, doesn’t it? Is it really possible to make a data storage system completely maintenance-free or, to be more exact, to make this idea financially viable? From a technical point of view, you could buy several of the most advanced data disks, connect them in a proper RAID configuration, and you would be done. But what about that “maintenance-free” option and the cost efficiency?
In traditional solutions, failed disks in disk array systems need to be replaced and their contents regenerated. This approach, however, has some disadvantages. “First, it introduces an additional delay, which will have a detrimental effect on the reliability of the storage system. Second, the cost of the service call is likely to exceed that of the equipment being replaced”, argue the authors of the study.
To avoid these negative effects, the researchers propose using self-repairing disk arrays. The disks are not literally self-repairing, the authors explain. Rather, such an array would contain enough spare disks to be switched in for failed ones automatically over the expected lifetime of the array.
How many spare disks exactly would we need? The researchers argue that it is possible to build a disk array that achieves 99.999 percent reliability over a duration of four years with a space overhead not exceeding that of disk mirroring. To prove this concept, they devised a model to simulate the reliability of a disk array system based on standard disk performance parameters (e.g., a five-year lifetime and a mean time to failure (MTTF) of 100,000 hours). The model was later complemented with disk reliability data collected in practice from a major online backup service.
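To get an intuition for how MTTF and spare count interact, here is a deliberately simplified back-of-envelope model (our illustration, not the authors’ actual simulation): each active disk has an exponentially distributed lifetime with MTTF of 100,000 hours, a failed disk is instantly swapped for a spare, and the array is considered lost once the spare pool runs out. Under those assumptions the number of failures over the mission follows a Poisson distribution, and the disk counts used below are hypothetical.

```python
import math

def survival_probability(n_active, n_spares, mttf_hours=100_000,
                         mission_hours=4 * 365 * 24):
    """P(no. of failures over the mission <= n_spares).

    With instant replacement, failures among n_active disks form a
    Poisson process with expected count n_active * mission / MTTF.
    """
    lam = n_active * mission_hours / mttf_hours  # expected failures
    return sum(math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(n_spares + 1))

# Example: 20 active disks -- how many spares reach "five nines"
# over four years in this toy model?
for spares in range(100):
    if survival_probability(20, spares) >= 0.99999:
        print(spares)  # first spare count meeting the target
        break
```

Even this crude model reproduces the flavor of the paper’s finding: to push reliability to five nines, the spare pool ends up comparable in size to the set of active disks. The real simulation additionally accounts for parity protection and rebuild behavior, which this sketch ignores.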
The scientists selected complete two-dimensional adaptive RAID arrays as the base configuration for the simulation model, because they have lower update costs than three-dimensional RAID arrays. The simulation results showed that, to achieve the desired level of reliability (99.999% over four years), the combined number of spare and parity disks would need to be approximately equal to the number of data disks. However, this approach would not be economically viable in systems with a very large number of data disks (the most efficient scenario, with the smallest space overhead, contained 45 disks).
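To see why a two-dimensional organization is attractive, consider a generic two-dimensional parity layout (our simplified illustration, not necessarily the paper’s exact “complete adaptive” organization): data disks form a k × k grid with one parity disk per row and one per column, so every data disk is covered by two independent parity groups, and the parity overhead shrinks as the grid grows.

```python
# Generic k x k two-dimensional parity layout (illustrative sketch):
# k*k data disks, plus one parity disk per row and per column.

def two_d_overhead(k):
    data = k * k
    parity = 2 * k  # one row parity + one column parity per line
    return data, parity, parity / data

for k in (3, 5, 6):
    data, parity, ratio = two_d_overhead(k)
    print(f"{k}x{k}: {data} data, {parity} parity, overhead {ratio:.0%}")
```

The parity overhead falls as 2/k, so in larger grids it is the spare pool, not the parity disks, that dominates the total space overhead cited in the study.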
The team also demonstrated that the same approach could be extended to a more traditional RAID level 6 array; however, in this case only a system containing 10 data disks, 2 parity disks, and 18 spare disks could achieve the reliability target. Otherwise, a non-standard RAID solution, or a standard RAID with at least triple parity, would be required to meet these objectives, the authors of the paper conclude.
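The arithmetic behind that RAID-6 configuration makes clear why it falls short of the paper’s efficiency goal:

```python
# RAID-6 configuration quoted above: 10 data disks, 2 parity disks,
# and 18 spare disks consumed over the array's lifetime.
data, parity, spares = 10, 2, 18
total = data + parity + spares
overhead = (parity + spares) / data

print(total)     # 30 disks in all
print(overhead)  # 2.0 -> 200% space overhead
```

At 200 percent, the space overhead is twice that of plain mirroring, which is why the authors point to non-standard layouts or triple parity as the alternatives.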
Written by Alius Noreika