Backblaze hard disk drive failure data: Update to Q2 2016

Ross Lazarus, September 2016

This is a Kaplan-Meier analysis of the Backblaze hard drive reliability data, using all available data to the end of the second quarter of 2016 from

Previous posts are at and

I reran my scripts and got the plots shown below. Reading all the data now takes a while, as there are a very large number of drives spinning. A total of 41,740,623 rows were processed in about 35 minutes on my home desktop by the Python script in the GitHub repository.
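As a rough illustration of the data reduction involved, and not the script in the repository, the sketch below collapses daily drive records into one survival observation per drive. It assumes the `date`, `serial_number`, `model` and `failure` columns used in the Backblaze daily CSV files; the sample rows, the `summarise` helper and the choice of "days observed" as the duration are all hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical sample rows laid out like the daily CSV files (assumed columns).
SAMPLE = """date,serial_number,model,failure
2016-04-01,A1,ModelX,0
2016-04-02,A1,ModelX,0
2016-04-03,A1,ModelX,1
2016-04-01,B2,ModelY,0
2016-04-02,B2,ModelY,0
"""

def summarise(rows):
    """Collapse daily rows into {serial: (model, days_observed, failed)}."""
    drives = {}
    for row in rows:
        day = date.fromisoformat(row["date"])
        rec = drives.setdefault(
            row["serial_number"],
            {"model": row["model"], "first": day, "last": day, "failed": 0},
        )
        rec["first"] = min(rec["first"], day)
        rec["last"] = max(rec["last"], day)
        # A drive counts as failed if any of its daily rows has failure == 1.
        rec["failed"] = max(rec["failed"], int(row["failure"]))
    return {
        serial: (r["model"], (r["last"] - r["first"]).days + 1, r["failed"])
        for serial, r in drives.items()
    }

result = summarise(csv.DictReader(io.StringIO(SAMPLE)))
for serial, (model, days, failed) in sorted(result.items()):
    print(serial, model, days, failed)
```

Drives with `failed == 0` are the right-censored observations: they were still running when last seen. Holding only one small record per drive keeps memory use low even when the number of daily rows is very large.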

The new 8TB drives are performing the best of all - even better than the HGST and Hitachi drives - and far better than any of the earlier Seagates. This is hard to miss in the curves here, but not so obvious in the report at Backblaze

Updated curves:

By Manufacturer:

Survival analysis of hard disk drive failure data: Update to Q1 2016

Ross Lazarus, May 2016

This is an update to now that additional data for Q1 2016 has been released from
I reran my scripts and got the plots shown below. The whole process takes only a few minutes.

For me, the interesting thing is that so little really changes in the KM curves and statistics with 10% more data, suggesting that this statistical approach is reliable and robust, although in general we expect that more data provides better resolution. 

The WD30-EFRX and WD10-EADS drives are reordered in terms of failure risk with more data, down near the middle of the pack, but the updated model KM curves otherwise suggest the same pattern of risk of failure over time. Hitachi and HGST have reversed their positions at the top of the manufacturer survival curves as a result of the additional data, but the other manufacturers remain largely unchanged.

In t…
Survival analysis of hard disk drive failure data.
Ross Lazarus, February 2016
Executive Summary:
Using a well-established, objective analysis and data presentation method designed for right-censored hard disk drive failure data provides insights that simple descriptive statistics or charts do not. Kaplan-Meier statistics and plots are recommended for routine use with hard drive failure data, and their use is illustrated with 30M data points from the Backblaze public data.
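For readers unfamiliar with the method, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the code used for the plots in these posts; the `kaplan_meier` function and the toy durations are hypothetical and serve only to show how right-censored drives (still running when last observed) stay in the risk set until they drop out.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival)] at each time with at least one failure.

    durations: observed days in service for each drive
    events: 1 if the drive failed at that time, 0 if right-censored
    """
    failures = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # failed or censored drives leaving at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = failures.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Drives censored at t counted as at risk at t, and are removed now.
        at_risk -= leaving[t]
    return curve

# Hypothetical toy data: six drives, three failures, three censored.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 0, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.3f}")
```

Censored drives never lower the curve themselves; they only shrink the denominator for later failures. A simple failure percentage cannot express this, because it treats a drive observed for a week the same as one observed for years.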

Hard disk drives are widely used for mass storage in servers, network attached storage devices, laptops and desktop computers. Familiar and convenient as they are, these complex electro-mechanical devices are prone to sudden catastrophic failure, which can lead to very unpleasant consequences such as loss of data which was not securely backed up elsewhere. Selecting drive manufacturers and models for home or for commercial applications is complicated by the problem that objective and r…