A deeper dive into disk drive survival time

Evaluating newer classes in the context of historical failure data: Time windowed KM survival curves
Background: A substantial proportion of online data and services relies on hard disk drives, which form a ubiquitous part of modern information infrastructure, so reliable statistical analysis of differences in failure over time between disk drive models is of particular interest to those responsible for maintaining storage integrity at home or at work. The Backblaze hard disk failure data represent an interesting "big data" analytic opportunity to compare enterprise and consumer hard disk drives over time under real-world operating conditions. In this article, some statistical issues are discussed and the results of some simple analyses are presented. The results provide insight that cannot be obtained from simple descriptive statistics, and the statistical tests show that many of the differences observed are important and unlikely to have arisen pur…

Update to Q1 2017: Seagate redeemed?

Update June 8 2017

After some delay, I finally got around to downloading another 9 months of data and rerunning the KM plots. Methods are documented in the first post and won't be repeated here. Note that drive models with fewer than 500 units and manufacturers with fewer than 200 units are ignored to simplify the plots - you can change these thresholds in the code if you need to.
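For readers who want to see what the unit thresholds and a KM curve amount to, here is a minimal sketch in plain Python. It is an illustration only, not the code from the repository: the record key `model`, the constant names, and both function names are assumptions made for this example, and it computes the standard Kaplan-Meier product-limit estimate without confidence intervals.

```python
from collections import Counter, defaultdict

# Thresholds quoted in the text; the constant names are hypothetical.
MIN_MODEL_UNITS = 500
MIN_MANUFACTURER_UNITS = 200


def filter_small_groups(drives, key, min_units):
    """Keep only drives whose group (e.g. model) has at least `min_units` drives.

    `drives` is a list of dicts, one per drive; `key` names the grouping field.
    """
    counts = Counter(d[key] for d in drives)
    return [d for d in drives if counts[d[key]] >= min_units]


def kaplan_meier(durations, failed):
    """Return [(time, survival)] at each time with at least one failure.

    durations: days each drive was observed
    failed:    True if the drive failed at that time, False if it was
               still running when observation ended (censored)
    """
    events = defaultdict(lambda: [0, 0])  # time -> [failures, censored]
    for t, f in zip(durations, failed):
        events[t][0 if f else 1] += 1

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(events):
        deaths, censored = events[t]
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Failed and censored drives both leave the risk set after time t.
        at_risk -= deaths + censored
    return curve
```

To mirror the filtering described above, you would call `filter_small_groups(drives, "model", MIN_MODEL_UNITS)` before estimating one curve per model; lowering the threshold keeps rarer models at the cost of busier plots and noisier curves.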

The images below deserve closer inspection than this page allows - they really are too detailed to appear here. Sorry for the ugly layout, but you can download them or clone the repository if you want a closer look.

Straight to the chase. Here's the drive model survival curve to date:

The newer Seagate ST8000NM0055 shows promising early longevity, although it has only been observed for a short time, so its initial curve may change as more data accumulate.

Also, we haven't tested if this is related…