I have a small Proxmox Backup Server (Morefine S500+ mini PC, AMD R7-5800H) running two WD Red SN700 NVMe drives. A few days ago one of the drives temporarily dropped out. A reboot did not help, but after a full shutdown and power-on the drive came back online. The temperature never goes above 45 °C, so in theory that should not be the problem. Does anyone have an idea what could be causing this?
Code:
Oct 19 23:24:25 pbs kernel: nvme nvme1: I/O tag 753 (92f1) opcode 0x1 (I/O Cmd) QID 5 timeout, aborting req_op:WRITE(1) size:8192
Oct 19 23:26:30 pbs kernel: nvme nvme1: I/O tag 193 (50c1) opcode 0x1 (I/O Cmd) QID 2 timeout, aborting req_op:WRITE(1) size:4096
Oct 19 23:26:30 pbs kernel: nvme nvme1: I/O tag 606 (325e) opcode 0x1 (I/O Cmd) QID 9 timeout, aborting req_op:WRITE(1) size:4096
Oct 19 23:26:30 pbs kernel: nvme nvme1: I/O tag 753 (92f1) opcode 0x1 (I/O Cmd) QID 5 timeout, reset controller
Oct 19 23:26:30 pbs kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
Oct 19 23:26:30 pbs kernel: nvme nvme1: Abort status: 0x371
Oct 19 23:26:30 pbs kernel: nvme nvme1: Abort status: 0x371
Oct 19 23:26:30 pbs kernel: nvme nvme1: Abort status: 0x371
Oct 19 23:26:30 pbs kernel: INFO: task txg_sync:460 blocked for more than 122 seconds.
Oct 19 23:26:30 pbs kernel: Tainted: P O 6.8.12-2-pve #1
Oct 19 23:26:30 pbs kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 19 23:26:30 pbs kernel: task:txg_sync state:D stack:0 pid:460 tgid:460 ppid:2 flags:0x00004000
Oct 19 23:26:30 pbs kernel: Call Trace:
Oct 19 23:26:30 pbs kernel: <TASK>
Oct 19 23:26:30 pbs kernel: __schedule+0x401/0x15e0
Oct 19 23:26:30 pbs kernel: ? ttwu_queue_wakelist+0x101/0x110
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? try_to_wake_up+0x248/0x5f0
Oct 19 23:26:30 pbs kernel: schedule+0x33/0x110
Oct 19 23:26:30 pbs kernel: cv_wait_common+0x109/0x140 [spl]
Oct 19 23:26:30 pbs kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Oct 19 23:26:30 pbs kernel: __cv_wait+0x15/0x30 [spl]
Oct 19 23:26:30 pbs kernel: zil_sync+0xdd/0x580 [zfs]
Oct 19 23:26:30 pbs kernel: ? spa_taskq_dispatch_ent+0x66/0xe0 [zfs]
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? zio_issue_async+0x53/0xb0 [zfs]
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? zio_nowait+0xd5/0x1c0 [zfs]
Oct 19 23:26:30 pbs kernel: dmu_objset_sync+0x441/0x600 [zfs]
Oct 19 23:26:30 pbs kernel: dsl_dataset_sync+0x61/0x200 [zfs]
Oct 19 23:26:30 pbs kernel: dsl_pool_sync+0xb2/0x4e0 [zfs]
Oct 19 23:26:30 pbs kernel: spa_sync+0x578/0x1050 [zfs]
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? spa_txg_history_init_io+0x120/0x130 [zfs]
Oct 19 23:26:30 pbs kernel: txg_sync_thread+0x207/0x3a0 [zfs]
Oct 19 23:26:30 pbs kernel: ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
Oct 19 23:26:30 pbs kernel: ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Oct 19 23:26:30 pbs kernel: thread_generic_wrapper+0x5f/0x70 [spl]
Oct 19 23:26:30 pbs kernel: kthread+0xf2/0x120
Oct 19 23:26:30 pbs kernel: ? __pfx_kthread+0x10/0x10
Oct 19 23:26:30 pbs kernel: ret_from_fork+0x47/0x70
Oct 19 23:26:30 pbs kernel: ? __pfx_kthread+0x10/0x10
Oct 19 23:26:30 pbs kernel: ret_from_fork_asm+0x1b/0x30
Oct 19 23:26:30 pbs kernel: </TASK>
Oct 19 23:26:30 pbs kernel: INFO: task tokio-runtime-w:366496 blocked for more than 122 seconds.
Oct 19 23:26:30 pbs kernel: Tainted: P O 6.8.12-2-pve #1
Oct 19 23:26:30 pbs kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 19 23:26:30 pbs kernel: task:tokio-runtime-w state:D stack:0 pid:366496 tgid:921 ppid:1 flags:0x00000002
Oct 19 23:26:30 pbs kernel: Call Trace:
Oct 19 23:26:30 pbs kernel: <TASK>
Oct 19 23:26:30 pbs kernel: __schedule+0x401/0x15e0
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? zio_nowait+0xd5/0x1c0 [zfs]
Oct 19 23:26:30 pbs kernel: schedule+0x33/0x110
Oct 19 23:26:30 pbs kernel: cv_wait_common+0x109/0x140 [spl]
Oct 19 23:26:30 pbs kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Oct 19 23:26:30 pbs kernel: __cv_wait+0x15/0x30 [spl]
Oct 19 23:26:30 pbs kernel: zil_commit_impl+0x326/0x14b0 [zfs]
Oct 19 23:26:30 pbs kernel: zil_commit+0x3d/0x80 [zfs]
Oct 19 23:26:30 pbs kernel: zfs_fsync+0xa5/0x140 [zfs]
Oct 19 23:26:30 pbs kernel: zpl_fsync+0x112/0x1a0 [zfs]
Oct 19 23:26:30 pbs kernel: __x64_sys_fdatasync+0x52/0xa0
Oct 19 23:26:30 pbs kernel: x64_sys_call+0x21d4/0x24b0
Oct 19 23:26:30 pbs kernel: do_syscall_64+0x81/0x170
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? __f_unlock_pos+0x12/0x20
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? ksys_write+0xe6/0x100
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? syscall_exit_to_user_mode+0x89/0x260
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? do_syscall_64+0x8d/0x170
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? syscall_exit_to_user_mode+0x89/0x260
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? do_syscall_64+0x8d/0x170
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? syscall_exit_to_user_mode+0x89/0x260
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? do_syscall_64+0x8d/0x170
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: ? irqentry_exit+0x43/0x50
Oct 19 23:26:30 pbs kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 19 23:26:30 pbs kernel: entry_SYSCALL_64_after_hwframe+0x78/0x80
Oct 19 23:26:30 pbs kernel: RIP: 0033:0x7f8d81781bfa
Oct 19 23:26:30 pbs kernel: RSP: 002b:00007f8d791ff6d0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Oct 19 23:26:30 pbs kernel: RAX: ffffffffffffffda RBX: 00006396898912a0 RCX: 00007f8d81781bfa
Oct 19 23:26:30 pbs kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000c
Oct 19 23:26:30 pbs kernel: RBP: 00007f8d30267d80 R08: 0000000000000007 R09: 00007f8cbc1057c0
Oct 19 23:26:30 pbs kernel: R10: 0ad5968c6ef1cbc4 R11: 0000000000000293 R12: 000063968985e218
Oct 19 23:26:30 pbs kernel: R13: 0000639687a2f3f0 R14: 0000639689891290 R15: 00007f8d30267db0
Oct 19 23:26:30 pbs kernel: </TASK>
Oct 19 23:26:30 pbs kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x3
Oct 19 23:26:30 pbs kernel: nvme nvme1: Disabling device after reset failure: -19
Oct 19 23:26:30 pbs kernel: zio pool=rpool vdev=/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 error=5 type=2 offset=1924908388352 size=4096 flags=1572992
Oct 19 23:26:30 pbs kernel: zio pool=rpool vdev=/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 error=5 type=2 offset=343609511936 size=8192 flags=1572992
Oct 19 23:26:30 pbs kernel: zio pool=rpool vdev=/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 error=5 type=2 offset=1925171638272 size=4096 flags=1572992
Oct 19 23:26:30 pbs kernel: zio pool=rpool vdev=/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 error=5 type=2 offset=343609520128 size=8192 flags=1572992
Oct 19 23:26:30 pbs kernel: zio pool=rpool vdev=/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 error=5 type=5 offset=0 size=0 flags=1049728
Oct 19 23:26:30 pbs kernel: zio pool=rpool vdev=/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 error=5 type=5 offset=0 size=0 flags=1049728
Oct 19 23:26:30 pbs kernel: zio pool=rpool vdev=/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 error=5 type=5 offset=0 size=0 flags=1049728
Oct 19 23:26:30 pbs kernel: zio pool=rpool vdev=/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 error=5 type=5 offset=0 size=0 flags=1049728
Oct 19 23:26:30 pbs zed[372909]: eid=10 class=statechange pool='rpool' vdev=nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 vdev_state=REMOVED
Oct 19 23:26:30 pbs zed[372916]: eid=11 class=removed pool='rpool' vdev=nvme-eui.e8238fa6bf530001001b448b4736aea1-part3 vdev_state=REMOVED
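To make the sequence in the log above easier to read, here is a minimal Python sketch that condenses the NVMe messages into a summary. The function names (`decode_csts`, `summarize`) and the regular expressions are my own assumptions and only cover the message shapes that appear in this excerpt. The CSTS decoding uses the two lowest bits of the NVMe Controller Status register (bit 0 = RDY, bit 1 = CFS, Controller Fatal Status).

```python
import re
from collections import Counter

# Hypothetical patterns: they only match the message shapes seen in the log above.
TIMEOUT_RE = re.compile(
    r"nvme (?P<dev>nvme\d+): I/O tag \d+ \([0-9a-f]+\) opcode 0x[0-9a-f]+ "
    r".*?QID (?P<qid>\d+) timeout, (?P<action>aborting|reset controller)"
)
CSTS_RE = re.compile(
    r"nvme (?P<dev>nvme\d+): Device not ready; aborting reset, CSTS=(?P<csts>0x[0-9a-f]+)"
)
DISABLE_RE = re.compile(
    r"nvme (?P<dev>nvme\d+): Disabling device after reset failure: (?P<err>-?\d+)"
)


def decode_csts(value):
    """Name the two lowest CSTS bits: bit 0 = RDY, bit 1 = CFS."""
    flags = []
    if value & 0x1:
        flags.append("RDY")
    if value & 0x2:
        flags.append("CFS")
    return flags


def summarize(lines):
    """Count timeouts and resets per device, collect CSTS values and disable events."""
    summary = {"timeouts": Counter(), "resets": Counter(), "csts": [], "disabled": []}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            key = "timeouts" if m["action"] == "aborting" else "resets"
            summary[key][m["dev"]] += 1
            continue
        m = CSTS_RE.search(line)
        if m:
            value = int(m["csts"], 16)
            summary["csts"].append((m["dev"], value, decode_csts(value)))
            continue
        m = DISABLE_RE.search(line)
        if m:
            summary["disabled"].append((m["dev"], int(m["err"])))
    return summary


if __name__ == "__main__":
    import sys

    # Usage (assumed): journalctl -k | python3 this_script.py
    print(summarize(sys.stdin))
```

Run against the excerpt above, this should show the controller status going from 0x1 (RDY only) to 0x3 (RDY plus CFS) before the kernel disables the device. As far as I can tell that means the controller itself reported a fatal status, rather than the drive merely being slow.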