marlies :tblverified: · 1w
Degraded data redundancy: 240248/360372 objects degraded (66.667%), 129 pgs degraded, 129 pgs undersized

you do not, in fact, love to see it. Thankfully I have backups for the important stuff.
Dr. Christopher Kunz
@nprofile1q... Holy crap, that gives me instant PTSD. Back in 2014-ish, when Ceph was still relatively new, we had a large outage in our production Ceph cluster that resulted in a similar outcome (fck those intel SSDs). Luckily, we had an Inktank support contract and Sage Weil personally wrote a Python script that helped reconstruct the data from the one remaining replica. It took about a week until all VMs in the cluster were running, and I had customers on the phone who were literally in tears.
marlies :tblverified: · 1w
nostr:nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpqdyt7rahrhvmy300sk9jsr46enmzu08w5qkyyl4sp6dm2pc02kccq7jmej2 ha, those f}>€£#~, intel enterprise ssds that would just die after a fixed number of minutes of uptime were what originally taught me this lesson as well. Thank god we were able to fl...