On Tue, 28 Jan 2020, Paul Emmerich wrote:
> Yes, data that is not synced is not guaranteed to be written to disk,
> this is consistent with POSIX semantics.
However, getting all 0s back from read() for a region where a write() of
non-zero data returned successfully does not seem consistent with POSIX:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
"
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was
modified by that write shall return the data specified by the write() for
that position until such byte positions are again modified.
"
The reason for my concern is a typical use case: using the standard tool
'cp -r' to copy a set of important files from one place into a CEPH
filesystem. When the out-of-space condition is not reported - even for
large amounts of data - the user might remove the files from the original
location without realising that the data is lost, possibly only
discovering this months later.
Changing cp (or whatever standard tool is used) to call fsync() before
each close() is not an option for a user. Doing so would also hurt
performance badly in general: I just tested, and a recursive copy of a
70k-file Linux source tree went from 15 s to 6 minutes on a local
filesystem I have at hand.
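For reference, a sketch of what each file copy would then have to do -
the delayed ENOSPC would surface at the fsync() (or possibly the close())
call:

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  /* Sketch only: flush one written file so that delayed errors such
   * as ENOSPC are reported before the descriptor goes away. */
  static int flush_and_close(int fd, const char *name)
  {
      if (fsync(fd) != 0) {           /* flush data and metadata */
          fprintf(stderr, "%s: fsync: %s\n", name, strerror(errno));
          close(fd);
          return -1;
      }
      if (close(fd) != 0) {           /* close() may also report errors */
          fprintf(stderr, "%s: close: %s\n", name, strerror(errno));
          return -1;
      }
      return 0;
  }

It is exactly that per-file fsync() that causes the slowdown measured
above.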
Best regards,
Håkan
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at
> https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
>
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Mon, Jan 27, 2020 at 9:11 PM Håkan T Johansson <f96hajo(a)chalmers.se> wrote:
>>
>>
>> Hi,
>>
>> for test purposes, I have set up two 100 GB OSDs, one holding the data
>> pool and the other the metadata pool for cephfs.
>>
>> Am running 14.2.6-1-gffd69200ad-1 with packages from
>>
>> https://mirror.croit.io/debian-nautilus
>>
>> Am then running a program that creates a lot of 1 MiB files by calling
>> fopen()
>> fwrite()
>> fclose()
>> for each of them. Error codes are checked.
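
In essence, each file is written like this sketch (the actual test
program differs in details such as file naming):

  #include <stdio.h>
  #include <string.h>

  /* Sketch: write one 1 MiB file of non-zero data, checking the
   * error indication of every stdio call involved. */
  static int write_one(const char *name)
  {
      static char buf[1024 * 1024];
      FILE *f;

      memset(buf, 0x5a, sizeof(buf));   /* anything other than 0s */
      f = fopen(name, "w");
      if (!f)
          return -1;
      if (fwrite(buf, 1, sizeof(buf), f) != sizeof(buf)) {
          fclose(f);
          return -1;
      }
      if (fclose(f) != 0)               /* fclose() returns EOF on error */
          return -1;
      return 0;                         /* every call reported success */
  }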
>>
>> This works successfully for ~100 GB of data, and then strangely also
>> succeeds for many hundreds of GB more... ??
>>
>> All written files have size 1 MiB according to 'ls', and thus should
>> contain the data written. However, on inspection, the files written
>> after the first ~100 GiB are full of just 0s. (hexdump -C)
>>
>>
>> To further test this, I use the standard tool 'cp' to copy a few
>> random-content files into the full cephfs filesystem. cp reports no
>> complaints, and after the copy operations the content is seen with
>> hexdump -C. However, after forcing the data out of cache on the client
>> by reading other, earlier created files, hexdump -C shows all-0 content
>> for the files copied with 'cp'. Data that was there is suddenly
>> gone...?
>>
>>
>> I am new to ceph. Is there an option I have missed to avoid this behaviour?
>> (I could not find one in
>>
>> https://docs.ceph.com/docs/master/man/8/mount.ceph/ )
>>
>> Is this behaviour related to
>>
>> https://docs.ceph.com/docs/mimic/cephfs/full/
>> ?
>>
>> (That page states 'sometime after a write call has already returned 0'.
>> But if write returns 0, then no data has been written, so the user
>> program would not assume any kind of success.)
>>
>> Best regards,
>>
>> Håkan
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>