Is a secure-erase suggested after the firmware update? Sometimes manufacturers do that.
On Sep 1, 2023, at 05:16, Frédéric Nass
<frederic.nass(a)univ-lorraine.fr> wrote:
Hello,
This message is to inform you that DELL has released new firmware for these SSD drives to
fix the 70,000 POH (power-on hours) issue:
Toshiba A3B4 for model number(s) PX02SMF020, PX02SMF040, PX02SMF080 and PX02SMB160:
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=69j…
Toshiba A4B4 for model number(s) PX02SSF010, PX02SSF020, PX02SSF040 and PX02SSB080:
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=31j…
Toshiba A5B4 for model number(s) PX03SNF020, PX03SNF080 and PX03SNB160:
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=tc8…
Based on our recent experience, this firmware brings dead SSD drives back to life with
their data intact (after the upgrade, you may need to import the foreign config by
pressing the 'F' key at the next boot).
Many thanks to DELL French TAMs and DELL engineering for providing this firmware in a
short time.
Best regards,
Frédéric.
----- On 19 June 2023, at 10:46, Frédéric Nass <frederic.nass(a)univ-lorraine.fr>
wrote:
Hello,
This message does not concern Ceph itself but a hardware vulnerability which can
lead to permanent loss of data on a Ceph cluster equipped with the same
hardware in separate fault domains.
The DELL / Toshiba PX02SMF020, PX02SMF040, PX02SMF080 and PX02SMB160 SSD drives
of the 13G generation of DELL servers are subject to a vulnerability which
renders them unusable after 70,000 hours of operation, i.e. approximately 7
years and 11 months of activity.
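As a quick check of that conversion, here is a minimal Python sketch. The function name and the 365.25-day average year are assumptions made for illustration; they do not come from DELL or Toshiba.

```python
HOURS_PER_YEAR = 24 * 365.25  # assumption: average year of 365.25 days

def hours_to_years_months(hours):
    """Convert a power-on-hour count to (whole years, whole months)."""
    total_months = int(hours / HOURS_PER_YEAR * 12)  # truncate to whole months
    return divmod(total_months, 12)

years, months = hours_to_years_months(70_000)
print(f"{years} years and {months} months")  # prints "7 years and 11 months"
```

The result is truncated rather than rounded, so 70,000 hours is in practice just short of 8 years.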
The risk is all the greater since these disks may die at the same time in the
same server, leading to the loss of all data in the server.
To date, DELL has not provided any firmware fixing this vulnerability, the
latest firmware version being "A3B3" released on Sept. 12, 2016:
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=hhd9k
If you have servers running these drives, check the drives' uptime (power-on
hours). If they are close to the 70,000 hour limit, replace them immediately.
The smartctl tool does not report the uptime for these SSDs, but if you have
HDDs in the server, you can query their SMART status and get their uptime,
which should be about the same as the SSDs.
The smartctl command is: smartctl -a -d megaraid,XX /dev/sdc (where XX is the
disk's device number on the MegaRAID controller).
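To compare such a reading with the 70,000 hour limit, something like the following Python sketch could be applied to the saved text output of the command above. The two line formats it searches for are assumptions about what smartctl prints for SAS disks and may differ between smartmontools versions, so check them against your own output; the function names are hypothetical.

```python
import re

POH_LIMIT = 70_000  # failure threshold reported in this thread

# Assumed smartctl line formats for SAS disks; verify against real output.
PATTERNS = [
    re.compile(r"Accumulated power on time, hours:minutes\s+(\d+):\d+"),
    re.compile(r"number of hours powered up\s*=\s*(\d+)"),
]

def power_on_hours(smartctl_output):
    """Return the power-on hours found in smartctl text, or None."""
    for pattern in PATTERNS:
        match = pattern.search(smartctl_output)
        if match:
            return int(match.group(1))
    return None

def hours_remaining(smartctl_output, limit=POH_LIMIT):
    """Return hours left before the limit, or None if no count was found."""
    hours = power_on_hours(smartctl_output)
    return None if hours is None else limit - hours

sample = "Accumulated power on time, hours:minutes 69500:12\n"
print(hours_remaining(sample))  # prints 500
```

A small or negative result on an HDD that has been in the server as long as the SSDs suggests the SSDs are at or past the limit.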
We have informed DELL about this but have no information yet on the arrival of a
fix.
We have lost 6 disks, in 3 different servers, in the last few weeks. Our
observations show that the drives don't survive a full shutdown and restart of
the machine (power off then power on in iDRAC), but they may also die during a
single reboot (init 6) or even while the machine is running.
Fujitsu released a corrective firmware in June 2021 but this firmware is most
certainly not applicable to DELL drives:
https://www.fujitsu.com/us/imagesgig5/PY-CIB070-00.pdf
Regards,
Frederic
Sous-direction Infrastructure and Services
Direction du Numérique
Université de Lorraine
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io