Hi,

 

The SUSE (SES4Win) build is significantly out of date, and previous versions had some known overflows. Here’s a daily build of our MSI: https://cloudbase.it/downloads/ceph_v16_0_0_beta.msi

 

Please send me the output of the following commands before upgrading Ceph:

 

    wnbd-client.exe -v

    rbd-wnbd.exe -v

 

Thanks,

Lucian

 

From: Kostas Liakakis
Sent: Saturday, March 6, 2021 1:56 PM
To: dev@ceph.io
Cc: Lucian Petrut
Subject: Windows port: RBD image gets corrupted

 

 

Hello Lucian,

After you confirmed in private correspondence that, unless there is huge demand for it, Server 2008R2 support is unlikely to come soon, I bit the bullet and tried the Windows port on Server 2019. I used both the SES4Win driver and Cloudbase Solutions' WNBD driver, compiled from your repository.

My use case is backing up a bunch of files to an RBD image mounted with WNBD and formatted as NTFS. The backup is done with robocopy. File sizes range from a few KB to 1-2 GB at most, though the vast majority are under 10 MB. The total size is about 180 GB.

The problem is that once enough data has been copied, the WNBD-mounted NTFS volume gets corrupted and copying stops. With the SES4Win driver, the copy stopped after about 77 GB had been copied. Cloudbase's driver did a bit better, managing to copy about 120 GB.

The NTFS filesystem in the RBD image is rendered unusable. 'rbd unmap' fails and resorts to forceful unmapping (pardon the vague terms; I didn't write it down the first time, and I haven't unmapped it this time yet, in case you need me to do something). Mapping it again succeeds, but Windows sees only a raw, unformatted partition. That is to say, the partition table (GPT) survived, but the NTFS filesystem was toasted.

On the last attempt, with the Cloudbase storage driver, the failure shows up in the robocopy output like this:

100%        New File              424404        5.blabla.pdf
2021/03/06 00:08:36 ERROR 1393 (0x00000571) Time-Stamping Destination Directory r:\pf-bak\2021-05-03-18-24-31\docs\dir1\dir2\[rest of path removed.....]
The disk structure is corrupted and unreadable.
Waiting 1 seconds... Retrying...

[... more retries, same error message ...]

ERROR: RETRY LIMIT EXCEEDED.

          New Dir          5    \\10.207.6.1\c$\publicfolders\[path name removed.....]\
2021/03/06 00:08:42 ERROR 1392 (0x00000570) Creating Destination Directory r:\pf-bak\2021-05-03-18-24-31\docs\dir1\dir2\[rest of path removed.....]
The file or directory is corrupted and unreadable.
Waiting 1 seconds... Retrying...

The errors go on like this until robocopy is able to write again, on a different path:

ERROR: RETRY LIMIT EXCEEDED.

2021/03/06 00:15:35 ERROR 1392 (0x00000570) Creating Destination Directory r:\pf-bak\2021-05-03-18-24-31\docs\dir1\dir2\[rest of path removed.....]
The file or directory is corrupted and unreadable.

          New Dir         37    \\10.207.6.1\c$\publicfolders\docs\dir1\dir3\
100%        New File              475357        blabla.pdf

 

But a bit further on, the errors return:

 

          New Dir          4    \\10.207.6.1\c$\publicfolders\docs\dir1\dir3\[rest of path removed.....]
100%        New File               1.1 m        blabla.pdf
100%        New File               1.1 m        blabla.pdf
100%        New File               1.0 m        blabla.pdf
100%        New File               1.0 m        blabla.pdf
          New Dir         15    \\10.207.6.1\c$\publicfolders\docs\dir4\
2021/03/06 01:15:21 ERROR 1392 (0x00000570) Copying NTFS Security to Destination Directory r:\pf-bak\2021-05-03-18-24-31\docs\dir4\
The file or directory is corrupted and unreadable.

          New Dir          0    \\10.207.6.1\c$\publicfolders\docs\dir5\
2021/03/06 01:15:21 ERROR 1392 (0x00000570) Copying NTFS Security to Destination Directory r:\pf-bak\2021-05-03-18-24-31\docs\dir5\
The file or directory is corrupted and unreadable.

(the errors continue until the source directory list is exhausted)
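
For anyone triaging a full log of this kind, a minimal sketch of how the error lines above could be tallied by code. The function name `tally_errors` and the regular expression are illustrative assumptions based only on the line format shown in the excerpts; they are not part of robocopy or of any Ceph tooling.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts above:
#   2021/03/06 00:08:36 ERROR 1393 (0x00000571) <operation and path>
ERROR_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"ERROR (?P<code>\d+) \(0x(?P<hex>[0-9A-Fa-f]{8})\) (?P<rest>.*)$"
)


def tally_errors(log_text):
    """Count error codes in robocopy output and record when each first appeared."""
    counts = Counter()
    first_seen = {}
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match is None:
            continue  # progress lines, retries, blank lines
        code = int(match.group("code"))
        # The decimal and hex forms should agree; skip lines where they do not.
        if code != int(match.group("hex"), 16):
            continue
        counts[code] += 1
        first_seen.setdefault(code, match.group("ts"))
    return counts, first_seen
```

Run over the complete log, this would show whether 1393 appears only once before the 1392 errors take over, and the timestamp at which each code first shows up.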

 

The errors with the SES4Win storage driver were much like the above.

 

The Cloudbase driver version string is 16.59.18.505, dated March 5th, 2021. I don't have the SES4Win version handy, but I believe it is well known.

My Ceph cluster version is 14.2.3.

This is the image being mapped:

rbd image 'grph.publicfolders.backup':
    size 500 GiB in 128000 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 5f47936179aab9
    block_name_prefix: rbd_data.5f47936179aab9
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:
    flags:
    create_timestamp: Thu Mar  4 23:11:25 2021
    access_timestamp: Sat Mar  6 13:08:53 2021
    modify_timestamp: Sat Mar  6 13:08:52 2021
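
As a quick sanity check on the figures in that output, a minimal sketch of the arithmetic relating image size, order, and object count. The helper name `object_count` is hypothetical; the only inputs are the values printed above (500 GiB, order 22).

```python
def object_count(size_gib, order):
    """Number of backing objects for an image of size_gib GiB with 2**order-byte objects."""
    object_size = 1 << order       # order 22 -> 4 MiB objects
    size_bytes = size_gib * 2**30  # GiB -> bytes
    # Round up so a partially filled last object is still counted.
    return -(-size_bytes // object_size)


print(object_count(500, 22))  # 128000, matching "size 500 GiB in 128000 objects"
```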

 

Please tell me if there is any other info I can provide.

 

Thanks in advance,

-Kostas