@nprofile1q... Holy crap, that gives me instant PTSD. Back in 2014-ish, when Ceph was still relatively new, we had a large outage in our production Ceph cluster that resulted in a similar outcome (fck those Intel SSDs). Luckily, we had an Inktank support contract, and Sage Weil personally wrote a Python script that helped reconstruct the data from the one remaining replica. It took about a week until all VMs in the cluster were running again, and I had customers on the phone who were literally in tears.