A deeper dive into disk drive survival time

Evaluating newer classes in the context of historical failure data: Time windowed KM survival curves
Background: A substantial proportion of online data and services rely on hard disk drives, which form a ubiquitous part of modern information infrastructure, so reliable statistical analysis of differences in failure over time between disk drive models is of particular interest to anyone responsible for maintaining storage integrity at home or at work. The Backblaze hard disk failure data represent an interesting "big data" analytic opportunity to compare enterprise and consumer hard disk drives over time under real-world operating conditions. In this article, some statistical issues are discussed and the results of some simple analyses are presented. The results provide insight that cannot be obtained from simple descriptive statistics, and the statistical tests show that many of the differences observed are important and unlikely to have arisen pur…
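This excerpt doesn't name the tests used. A common choice for comparing two Kaplan-Meier curves is the log-rank test, so the following is a minimal pure-Python sketch of that test, assuming each drive is summarised as a follow-up time plus a failed/censored flag. The function name and input layout are illustrative, not taken from the repository, and the original analysis may have used a different test.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test (a sketch, not the original script).

    times_*  : follow-up time for each drive, e.g. days observed
    events_* : 1 if the drive failed at that time, 0 if still running (censored)
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    # Pool both groups, tagging each observation with its group (0 = A, 1 = B)
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    o_minus_e = 0.0  # observed minus expected failures in group A
    variance = 0.0
    for t in event_times:
        # Drives still under observation at time t form the risk set
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        d_a = sum(1 for ti, ei, g in data if g == 0 and ti == t and ei)
        d = sum(1 for ti, ei, _ in data if ti == t and ei)
        n = n_a + n_b
        # Under the null hypothesis, failures split in proportion to the risk sets
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the failure count in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

The loops rescan every record at each failure time, so this is quadratic and only meant to show the calculation; with tens of thousands of drives you would sort once and sweep through the times, or use an established survival analysis library.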

Update to Q1 2017: Seagate redeemed?

Update, June 8, 2017

After some delay, I finally got around to downloading another 9 months of data and rerunning the KM plots. Methods are documented in the first post and won't be repeated here. Note that drive models with fewer than 500 units and manufacturers with fewer than 200 units are ignored to simplify the plots; you can change these thresholds in the code if you need to.
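As a rough sketch of how such thresholds might be applied, assuming the drives have already been reduced to a mapping from serial number to model string: the helper `manufacturer_of` and its prefix rules are hypothetical guesses based on how Backblaze model strings look (for example "ST4000DM000", "HGST HMS5C4040ALE640", "WDC WD30EFRX"), not the repository's actual logic.

```python
from collections import Counter


def manufacturer_of(model):
    """Hypothetical helper: guess the manufacturer from a model string."""
    if model.startswith("ST"):
        return "Seagate"
    first = model.split()[0]
    if first == "WDC":
        return "Western Digital"
    return first  # e.g. "HGST", "Hitachi", "TOSHIBA"


def groups_to_plot(drives, min_model_units=500, min_maker_units=200):
    """Return (models, manufacturers) with enough units to be worth plotting.

    drives: dict mapping serial_number -> model string, one entry per unit.
    The default thresholds match the ones described in the post.
    """
    model_counts = Counter(drives.values())
    maker_counts = Counter(manufacturer_of(m) for m in drives.values())
    models = {m for m, n in model_counts.items() if n >= min_model_units}
    makers = {mk for mk, n in maker_counts.items() if n >= min_maker_units}
    return models, makers
```

Keeping the thresholds as keyword arguments means lowering them (say, to include a newly introduced model) is a one-line change at the call site.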

The images below are available for the closer inspection they deserve at - they really are too detailed to display well here. Sorry for the ugly layout, but you can download them or clone the repository if you want a closer look.

Cutting to the chase, here is the drive model survival curve to date:

The newer Seagate ST8000NM0055 is showing promising longevity, although the observation period is still very short, so these early curves may change with time.

Also, we haven't tested if this is related…

Backblaze hard disk drive failure data: Update to Q2 2016

Ross Lazarus, September 2016

This is a Kaplan-Meier analysis of the Backblaze hard drive reliability data, using all data available up to the end of the second quarter of 2016 from
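For readers new to the method, the Kaplan-Meier (product-limit) estimate multiplies, at each time a failure occurs, the fraction of drives still at risk that survive that time: S(t) is the product over failure times t_i ≤ t of (1 - d_i / n_i), where d_i is the number of failures at t_i and n_i the number of drives still under observation just before it. Drives still running at the end of the data are censored: they count in the risk set until they drop out but never as failures. This is a minimal sketch; the function name and inputs are illustrative, and how durations are measured is an assumption here.

```python
from collections import Counter


def kaplan_meier(durations, failed):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    durations: time each drive was observed (e.g. days in the data)
    failed:    True if the drive failed at the end of its observation,
               False if it was still running (right-censored)
    Returns a list of (time, survival probability) at each failure time.
    """
    deaths = Counter(t for t, f in zip(durations, failed) if f)
    # Drives leaving the risk set at each time, whether failed or censored
    exits = Counter(durations)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            # Conditional probability of surviving past t, given at risk at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Censored drives are counted as at risk at t, then removed
        at_risk -= exits[t]
    return curve
```

For five drives observed for 1, 2, 2, 3 and 4 days, with failures at 1, 2 and 3 days, the estimate steps down to 0.8, 0.6 and then 0.3.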

Previous posts are at and

I reran my scripts and got the plots shown below. Reading all the data now takes a while, as there are a very large number of drives spinning: a total of 41,740,623 rows were processed in about 35 minutes on my home desktop by the Python script in the GitHub repository.
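The Backblaze files contain one row per drive per day, with columns including date, serial_number, model, capacity_bytes and failure, so the rows have to be collapsed into one survival record per drive before any KM curve can be drawn. A minimal sketch of that step, assuming a drive's time at risk runs from its first to its last appearance in the data; the function name and output layout are illustrative and not the repository's code.

```python
import csv
from datetime import date


def drive_records(csv_files):
    """Collapse daily drive-state rows into one survival record per drive.

    csv_files: iterable of open text files (or lists of lines), one per day,
               each with at least the columns date, serial_number, model, failure.
    Returns dict serial_number -> (model, days_observed, failed).
    """
    first_seen, last_seen, model_of, failed = {}, {}, {}, {}
    for f in csv_files:
        for row in csv.DictReader(f):
            serial = row["serial_number"]
            day = date.fromisoformat(row["date"])
            # Track the earliest and latest day each drive appears
            if serial not in first_seen or day < first_seen[serial]:
                first_seen[serial] = day
            if serial not in last_seen or day > last_seen[serial]:
                last_seen[serial] = day
            model_of[serial] = row["model"]
            # Backblaze flags failure=1 on a failed drive's last reported day
            failed[serial] = failed.get(serial, False) or row["failure"] == "1"
    return {
        s: (model_of[s], (last_seen[s] - first_seen[s]).days + 1, failed[s])
        for s in first_seen
    }
```

Memory grows with the number of drives rather than the number of rows, which is what keeps tens of millions of rows tractable on a desktop. In practice each argument would be a file opened with `open(path, newline="")`. Measuring time from first appearance treats drives that were already running when the data begin as new; the real script may handle this differently.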

The new 8TB drives are performing the best of all: even better than the HGST and Hitachi drives, and far better than any of the earlier Seagates. This is hard to miss in the KM curves here, but not so obvious in the report at Backblaze.

Updated curves:

By Manufacturer:

Survival analysis of hard disk drive failure data: Update to Q1 2016

Ross Lazarus, May 2016

This is an update to now that additional data for Q1 2016 has been released from
I reran my scripts and got the plots shown below. The whole process only takes a few minutes.

For me, the interesting thing is how little changes in the KM curves and statistics with 10% more data, suggesting that this statistical approach is reliable and robust, although in general we expect more data to provide better resolution.
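One way to see why 10% more data barely moves the curves is Greenwood's formula for the variance of a Kaplan-Meier estimate, Var[S(t)] ≈ S(t)² Σ d_i / (n_i (n_i - d_i)). The standard error shrinks roughly as one over the square root of the number of drives, so 10% more drives should only narrow it by about 5% (1/√1.1 ≈ 0.95). The sketch below extends the product-limit calculation with Greenwood standard errors; it is illustrative only, and the post does not say whether the original analysis reports them.

```python
import math
from collections import Counter


def km_with_greenwood(durations, failed):
    """Kaplan-Meier survival with Greenwood standard errors.

    durations: time each drive was observed; failed: True if it failed then.
    Returns a list of (time, survival, standard_error) at each failure time.
    """
    deaths = Counter(t for t, f in zip(durations, failed) if f)
    exits = Counter(durations)
    at_risk = len(durations)
    survival, greenwood_sum = 1.0, 0.0
    out = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            survival *= 1 - d / at_risk
            # If every drive at risk fails, S drops to 0 and the term is
            # undefined; skipping it leaves a standard error of 0 at S = 0
            if at_risk > d:
                greenwood_sum += d / (at_risk * (at_risk - d))
            out.append((t, survival, survival * math.sqrt(greenwood_sum)))
        at_risk -= exits[t]
    return out
```

Replicating the same ten drives ten times leaves every survival estimate unchanged but divides each standard error by √10, which is the "better resolution" that more data buys.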

With more data, the WD30-EFRX and WD10-EADS drives, down near the middle of the pack, are reordered in terms of failure risk, but the updated model KM curves otherwise suggest the same pattern of failure risk over time. Hitachi and HGST have swapped positions at the top of the manufacturer survival curves as a result of the additional data, but the other manufacturers remain largely unchanged.

In t…