The disk is not OK. Look at this line in the output below:
SMART Health Status: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE [asc=5d, ascq=10]
You should replace the disk.
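The overall verdict comes from the health-status line, not from the self-test log showing "Completed". Below is a minimal Python sketch of that check. It assumes smartctl 7.x output labels: SAS/SCSI drives print `SMART Health Status:` and ATA drives print `SMART overall-health self-assessment test result:`. The helper names are hypothetical, and the sketch assumes that only `OK`/`PASSED` counts as healthy.

```python
import re
import subprocess
from typing import Optional

# Assumed smartctl 7.x labels: SAS/SCSI drives print "SMART Health Status: <text>",
# ATA drives print "SMART overall-health self-assessment test result: <text>".
_HEALTH_RE = re.compile(
    r"^(?:SMART Health Status|SMART overall-health self-assessment test result):\s*(.+)$",
    re.MULTILINE,
)


def health_status(smartctl_output: str) -> Optional[str]:
    """Return the health text reported by `smartctl -a`, or None if the line is absent."""
    match = _HEALTH_RE.search(smartctl_output)
    return match.group(1).strip() if match else None


def disk_is_healthy(smartctl_output: str) -> bool:
    """Assumption of this sketch: only an explicit OK (SCSI) or PASSED (ATA) is healthy."""
    return health_status(smartctl_output) in ("OK", "PASSED")


def run_smartctl(device: str, dev_type: str) -> str:
    """Run `smartctl -a -d <dev_type> <device>` (hypothetical helper) and return stdout."""
    # smartctl signals problems through a non-zero exit bitmask, so do not raise on it.
    result = subprocess.run(
        ["smartctl", "-a", "-d", dev_type, device],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Example: `disk_is_healthy(run_smartctl("/dev/sda", "megaraid,1"))` should return False for the drive quoted below, because its status line is not `OK`. Running smartctl typically requires root and the smartmontools package.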
On Wed, May 20, 2020 at 5:11 PM Thomas <74cmonty(a)gmail.com> wrote:
>
> Hello,
>
> I have a pool of 300+ OSDs, all of the same model (Seagate
> ST1800MM0129, size 1.64 TiB).
> Only one OSD crashes regularly, but I cannot identify the root cause.
>
> Based on the smartctl output, the disk is OK.
>
> # smartctl -a -d megaraid,1 /dev/sda
> smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.3.18-2-pve] (local build)
> Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Vendor: LENOVO-X
> Product: ST1800MM0129
> Revision: L2B6
> Compliance: SPC-4
> User Capacity: 1,800,360,124,416 bytes [1.80 TB]
> Logical block size: 512 bytes
> Physical block size: 4096 bytes
> LU is fully provisioned
> Rotation Rate: 10500 rpm
> Form Factor: 2.5 inches
> Logical Unit id: 0x5000c500bb7822cf
> Serial number: WBN0QHX80000E852944J
> Device type: disk
> Transport protocol: SAS (SPL-3)
> Local Time is: Mon May 18 09:19:41 2020 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> Temperature Warning: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART Health Status: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE [asc=5d, ascq=10]
>
> Grown defects during certification <not available>
> Total blocks reassigned during format <not available>
> Total new blocks reassigned = 68
> Power on minutes since format <not available>
> Current Drive Temperature: 33 C
> Drive Trip Temperature: 65 C
>
> Manufactured in week 31 of year 2018
> Specified cycle count over device lifetime: 10000
> Accumulated start-stop cycles: 21
> Specified load-unload count over device lifetime: 300000
> Accumulated load-unload cycles: 709
> Elements in grown defect list: 18
>
> Error counter log:
>            Errors Corrected by           Total   Correction     Gigabytes    Total
>                ECC          rereads/    errors   algorithm      processed    uncorrected
>            fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
> read:   3278853896        1         0  3278853897         32      83933.567          19
> write:           0        0         0           0          0      24093.894           0
> verify: 3080361880        0         0  3080361880          0      12630.494           0
>
> Non-medium error count: 244
>
> SMART Self-test log
> Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
>      Description                              number   (hours)
> # 1  Background short  Completed                   -    3761                 - [-   -    -]
> # 2  Background short  Completed                   -    3737                 - [-   -    -]
> # 3  Background short  Completed                   -    3713                 - [-   -    -]
> # 4  Background short  Completed                   -    3689                 - [-   -    -]
> # 5  Background short  Completed                   -    3665                 - [-   -    -]
> # 6  Background short  Completed                   -    3641                 - [-   -    -]
> # 7  Background short  Completed                   -    3617                 - [-   -    -]
> # 8  Background short  Completed                   -    3593                 - [-   -    -]
> # 9  Background long   Completed                   -    3569                 - [-   -    -]
> #10  Background short  Completed                   -    3545                 - [-   -    -]
> #11  Background short  Completed                   -    3521                 - [-   -    -]
> #12  Background short  Completed                   -    3497                 - [-   -    -]
> #13  Background short  Completed                   -    3473                 - [-   -    -]
> #14  Background short  Completed                   -    3449                 - [-   -    -]
> #15  Background short  Completed                   -    3425                 - [-   -    -]
> #16  Background short  Completed                   -    3401                 - [-   -    -]
> #17  Background short  Completed                   -    3377                 - [-   -    -]
> #18  Background short  Completed                   -    3353                 - [-   -    -]
> #19  Background short  Completed                   -    3329                 - [-   -    -]
> #20  Background short  Completed                   -    3305                 - [-   -    -]
>
> Long (extended) Self-test duration: 9459 seconds [157.7 minutes]
>
> I have attached the log of the affected OSD.
>
> THX
> Thomas
>
> I have uploaded 1 file belonging to this email:
> ceph-osd.92.log.1.gz <https://we.tl/t-7DzNCDP3iZ> (578 KB, WeTransfer)
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io