
We bought 1347 Used Data Center SSDs to See SSD Endurance



We bought 1347 used data center SSDs to see just how much SSD endurance people are actually using, and how that is changing over time. The result is that we may need to re-think the DWPD metric.

We have been collecting this data for years, but this video came about after a discussion with Solidigm. That is why we are saying it is sponsored by Solidigm and are showing their drives. Of course, most of the actual data collection happened well before Solidigm was even a company. It is, however, valuable to be able to get some vendor insights as well.
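
For reference, the DWPD rating being questioned here is just rated endurance (TBW) spread across capacity and warranty period, and the same arithmetic can be run against a drive's SMART counters to see how much of that budget a workload actually uses. A minimal sketch, with purely illustrative numbers that are not from the study:

```python
def rated_dwpd(tbw, capacity_tb, warranty_years=5):
    """Rated endurance (TBW) expressed as full drive writes per day."""
    return tbw / (capacity_tb * warranty_years * 365)

def observed_dwpd(host_writes_tb, capacity_tb, power_on_hours):
    """Average full drive writes per day a workload has actually consumed."""
    days = power_on_hours / 24
    return host_writes_tb / (capacity_tb * days)

# Hypothetical 3.84 TB drive rated for 7,000 TBW over a 5-year warranty:
print(f"rated:    {rated_dwpd(7000, 3.84):.2f} DWPD")            # ~1.00
# The same drive after ~3.4 years powered on with 250 TB of host writes:
print(f"observed: {observed_dwpd(250, 3.84, 30000):.3f} DWPD")   # ~0.052
```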

STH Main Site Article:
STH Top 5 Weekly Newsletter:

———————————————————————-
Become a STH YT Member and Support Us
———————————————————————-
Join STH YouTube membership to support the channel:

———————————————————————-
Where to Find STH
———————————————————————-
STH Forums:
Follow on Twitter:
Follow on LinkedIn:
Follow on Facebook:
Follow on Instagram:

———————————————————————-
Other STH Content Mentioned in this Video
———————————————————————-
– 2016 SSD Endurance study:
– 61.44TB SSD Video:

———————————————————————-
Timestamps
———————————————————————-
00:00 Introduction and Our 2016 Data
06:38 Looking at DWPD and SSD Endurance in our 2024 Study
13:40 Talking SSD Reliability
17:24 Key Lessons Learned




40 Comments

  1. If you're talking about storing videos, unless it's a university, it's storage for trash: it's silly, useless to society, and it just makes access to storage for useful information harder. I have really changed my view about YouTube overall, especially when it comes to these guys, once I stopped to think about it for a couple of seconds.
    Not the platform, but its use.

  2. Your drive count is odd: out of 1300 drives you have 1000+ 2.5" AND 1000+ NVMe drives… meaning some 2.5" drives are actually NVMe? I have never heard of such; I must have missed those…

  3. The issue you mentioned with MTTR is why I have begun to wonder if there might not be a place for something like a delayed (or lazy) RAID 1: instead of pure spares or a pure RAID 1, the data is synced over on a much-delayed basis. That would decrease the number of writes on the second pair of drives a lot (since multiple rewrites on the primary disk become a single sync to the secondary), so once the primary signals near-death you already have almost pristine data on the secondary drives, which also carry much less risk of failing in the near future. (A toy sketch of this idea appears after the comments.)

  4. The endurance is there so the disks last and we don't have to replace them in 2-4 years; they can sit there until we decommission the whole box… that's the idea of having a lot of endurance.

  5. Good analysis, but with an oversight. I've had many SSDs and HDDs fail over the years. The problem is that a lot of the time an SSD will fail quickly and then be unrecoverable en masse with 100% data loss. An HDD often fails progressively, with errors showing up in scans or with bad behaviour. When data from some sectors is lost, almost always some of the rest is salvageable.

    With the appropriate backup strategy this makes it less of a problem of course. It does shift the emphasis of how one cares for the data though.

  6. We've had Intel S3500/3510 SATA SSDs as the boot drives in RAID 1 for all our production Dell R730s for coming up on 8 years and have never had an issue with any of them. We had 3x P5800X Optanes fail under warranty, but the 750 PCIe cards are still going strong.

  7. Can you please test the data access rates and data transfer rates to see if the used drives are really performing according to manufacturer promises?

    Steve Gibson's GRC free ReadSpeed acknowledges "… we often witness a significant slowdown at the front of solid state drives (SSDs), presumably due to excessive use in that region …".

    And free HDD Scan and free HD Tune can show us graphs of the slow or even unreadable sectors.

    And then SpinRite 6.1 Level 5 or HDD Regenerator will show the qualities of every sector's REWRITABILITY.

    Without that information, it's impossible to know the true value of any of those SSDs, really.

    Let us know when you have the next video with a full analysis of the SSDs' REAL ability to READ and WRITE compared to manufacturer performance specifications.

    Thanks.


  8. I suppose the sequential-write numbers apply to hibernation files too? The biggest cause of SSD wear on my laptops is likely to be hibernation file writes, as I set them to hibernate after a certain period of inactivity.

  9. This is a very interesting video… even though it does not apply to me; I use Intel 905Ps and Seagate Exos X14 12TB hard drives in my gaming system (utilizing ZFS on Debian).

  10. Wow! Only $8250 for a 61.44 TB SSD. I'll take 3, please. That should be enough storage for my little home lab.🤣 That's just bonkers! I have about 16 TB available on my NAS and I back up all my running machines to it, and it's STILL only about 1/3 full!!😄

  11. 😮🤔 Great video! I've seen a few videos of all-SSD NASes and thought, well, that is bold. Now, watching this, I'm thinking I want to try it too! I happen to have a collection of enterprise SSDs from decommissioned servers at work. The SMART numbers on these are probably crazy low. This also sounds very appealing from a power/heat perspective. I'm always trying to make the homelab more efficient.

  12. I remember, around 2010 as the IT guy at a company, introducing 2x 60GB drives in a RAID 1 config for the main database of their accounting software. Reports and software upgrades that used to run for minutes up to half an hour were done in seconds. The software technician was apprehensive about using SSDs for databases, but after seeing those performance numbers he was convinced. The drives ran for around 4 years before being retired and were still working.
    Capacitors and other support electronics seem to be less reliable than the flash chips themselves lol! I upgraded all my HDDs to SSDs last year and never looked back.

  13. Controversially, years ago I estimated that most quality commercial SSDs would simply obsolete themselves in terms of capacity long before reaching their half-life, given even "enthusiast" levels of use. Thus far, this has been the case, even with QLC drives.

    Capacities continue to increase, write endurance continues to improve, and costs continue to decrease. It will be interesting to see what levels of performance and endurance PLC delivers.

  14. Just for shits and giggles I ran the DWPD numbers on all of the SSDs in my server. The highest was on my two Optanes (which I use as mirrored SLOG drives). They have a whopping ~0.1 DWPD average over ~3 years. :p

  15. You don't think that maybe your data set might be skewed, since sellers won't sell drives that have already consumed all, or close to all, of their write cycles? Because of this, I just don't think your sample is truly random or representative.

  16. SSDs aren't actually so much larger now. The vast majority of SSDs in use, even by IT geeks, are vastly smaller than HDDs. Even in 2024, 1 or 2 TB is normal, and that's insane; that was normal for HDDs in 2009. No actual human can really afford to buy an SSD that is larger than an HDD. That is only something corporate persons can do.

  17. Outside of the early generations of non-SLC SSDs, I don't think I have had any wear out. Far more of those drives died from controller failure, as was the style of the time. 100% failure rate on some brands.

    I recently bought around 50 Intel SSDs that are 10-12 years old. Discounting the one that was DOA, the worst drive was down to 93% life remaining, the next worst was 97%, and the rest were 98-99%. A bunch of them still had data (the seller should not have done that…) and I could tell that many of them had been in use until about a year ago.

  18. I have been sticking to 5-year-old Dell enterprise hardware.

    Currently my need is just 8 TB within TrueNAS. Enterprise SAS SSDs have been a huge leap for my use case.

  19. Intel drives were known for going read-only and then bricking themselves on the next power reset when lifetime bytes written hit the warranty limit, whether those had been small-block writes or large sequential, and whether or not the drive was still perfectly good. Does Solidigm retain that behavior?

  20. Where are you buying your used SSDs? I'm trying to find a reasonable price for 4 TB+ SSDs. I would have expected them to have dropped considerably by now, but they have not.

  21. The takeaway message seems to be that SSDs are ~10x more reliable than mechanical drives:
    Helpful to know that SSDs in servers have almost eliminated HDD failures.
    Helpful to point out that larger SSDs help improve reliability.

    Mechanical HDDs have to be swapped out every ~5 years even if they've had light use.
    That starts to get very expensive and inconvenient.
    SSDs are a much better solution.

    Most users just want a drive that is not going to fail during the life of the computer.
    The lifespan of many computers might be 10 years or more.
    NVMe drives are great because you get speed, a small form factor, and a low price all in one package.
    The faster the drive the better in most cases… especially if you like searching your drives for emails or files.

    My key metric remains total data written before failure… although it is useful to know over what time period the data was written.
    I've yet to have an SSD fail.
    Most of my SSDs live on in various upgrades, e.g. laptops.
    That means that old SSDs will continue to be used until they become obsolete.

    It's rare to see meaningful usability data on SSDs. Nicely done.

    🙂

  22. Data center drives are aged out on a schedule. It's not something you want to play with. What they do is stagger the replacements at ~60% of wear life across the clusters.

    These drives are often fine for home use… IF you have the right interface.

  23. I have several Intel 311s (20 GB, purchased 2010) that I should upgrade; they are on ZFS SLOG duty with PONH=93683 and DWPD=1.53, but everything has been optimized not to write unless needed, and moving everything to containers helped with this even more.

  24. Three things:

    1) SSD usage and by extension, endurance, REALLY depends on what it is that you do.

    One of the guys that I went to college with, who is now a Mechanical Design Lead at SpaceX, runs Monte Carlo simulations on his new workstation, which uses E1.S NVMe SSDs: a SINGLE batch of runs consumed 2% of the drives' total write endurance.

    (When you are using SSDs as scratch disk space for HPC/CFD/FEA/CAE applications, especially FEA applications, it just rains data like no tomorrow. For some of the FEA work that I used to do on vehicle suspension systems and body-on-frame pickup trucks, a single run can easily cycle through about 10 TB of scratch disk data.)

    So, if customers are using the SSDs because they're fast, and they're using it for storage of large, sequential (read: video) files, then I would 100% agree with you.

    But if they are using it for its blazing fast random read/write capabilities (rather than sequential transfers), then the resulting durability and reliability is very different.

    2) I've killed 2 NVMe SSDs (ironic that you mentioned the Intel 750 Series NVMe SSD, because that was the one that I killed. Twice.) and 5 SATA 6 Gbps SSDs (all Intel drives) over the past 8 years, because I use the SSDs as swap space for Windows clients (which is also the default when you install Windows), on systems that had a minimum of 64 GB of RAM and a maximum of 128 GB of RAM.

    The Intel 750 Series 400 GB AIC NVMe SSDs died with an average of only 2.29 GB of writes/day, and yet, because they were used as swap drives, they still died within the warranty period (at 4 years of the 5-year warranty).

    On top of that, the manner in which they died was also really interesting, because you would think that when you burn up the write endurance of the NAND flash cells/modules/chips you'd still be able to read the data, but that wasn't true either. In fact, it was reads that first indicated the drive had a problem/died, because it never hit the write endurance limits (by STR, DWPD, or TBW).

    The workload makes a HUGE difference.

    3) It is quite a pity that a 15.36 TB Intel/Solidigm D5-P5316 U.2 NVMe SSD costs a minimum of $1295 USD, whereas a WD HC550 16 TB SATA 6 Gbps HDD can be had for as little as $129.99 USD, so the HDD is almost 1/10th the cost for a similar capacity (rough $/TB math after the comments).

    Of course, the speed and the latency are night and day and aren't comparable at all, but from a cost perspective I can buy 10 WD HC550 16 TB SATA HDDs for the price of one Intel D5-P5316 15.36 TB U.2 NVMe SSD.

    So, it'll be a while before I will be able to replace my homelab server with these SSDs, possibly never.

  25. Ironically, our own usage of NVMe SSDs keeps going up, since we keep migrating more and more data to the cloud yet need legacy tools to be able to read the data as if it were on a POSIX file system. So we end up needing filesystem drivers that transparently cache the S3 data on NVMe while it's being used. Which means that tasks which used to only read data now have to write the data first before reading it 😂

  26. Way too many people liked to spread scare stories because they didn't like SSDs and didn't understand how vastly different real-world lifespan is from theory for solid state devices. The number of people who would still spec HDDs in servers until fairly recently, because they kept claiming the drives would burn out, was sad.
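
Comment 3 above floats the idea of a delayed ("lazy") RAID 1 where the secondary is only synced periodically. A toy sketch of why that cuts writes on the secondary, assuming a simple dirty-block map flushed on a schedule (not any real RAID implementation):

```python
class LazyMirror:
    """Toy model: writes land on the primary immediately; the secondary only
    receives blocks when flush() runs, so repeated rewrites of the same block
    collapse into a single secondary write."""

    def __init__(self):
        self.primary, self.secondary = {}, {}
        self.dirty = set()                     # blocks changed since last flush
        self.primary_writes = self.secondary_writes = 0

    def write(self, block_id, data):
        self.primary[block_id] = data
        self.primary_writes += 1
        self.dirty.add(block_id)

    def flush(self):
        for block_id in self.dirty:            # sync only what actually changed
            self.secondary[block_id] = self.primary[block_id]
            self.secondary_writes += 1
        self.dirty.clear()

m = LazyMirror()
for _ in range(50):                            # rewrite 100 "hot" blocks 50 times
    for blk in range(100):
        m.write(blk, b"payload")
m.flush()
print(m.primary_writes, m.secondary_writes)    # 5000 vs 100
```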
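
And the rough $/TB math behind point 3 of comment 24, using only the prices quoted there:

```python
ssd_price, ssd_tb = 1295.00, 15.36   # Intel/Solidigm D5-P5316 U.2 NVMe, per the comment
hdd_price, hdd_tb = 129.99, 16.00    # WD Ultrastar HC550 SATA HDD, per the comment

ssd_per_tb = ssd_price / ssd_tb      # ~$84.31 per TB
hdd_per_tb = hdd_price / hdd_tb      # ~$8.12 per TB
print(f"SSD ${ssd_per_tb:.2f}/TB vs HDD ${hdd_per_tb:.2f}/TB "
      f"(~{ssd_per_tb / hdd_per_tb:.1f}x)")   # ~10.4x
```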
