Hi,
I've got a problem on an Octopus (15.2.3, Debian packages) install: the bucket's S3 index shows a file:
s3cmd ls s3://upvid/255/38355 --recursive
2020-07-27 17:48 50584342 s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4
radosgw-admin bi list also shows it:
{
    "type": "plain",
    "idx": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
    "entry": {
        "name": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
        "instance": "",
        "ver": {
            "pool": 11,
            "epoch": 853842
        },
        "locator": "",
        "exists": "true",
        "meta": {
            "category": 1,
            "size": 50584342,
            "mtime": "2020-07-27T17:48:27.203008Z",
            "etag": "2b31cc8ce8b1fb92a5f65034f2d12581-7",
            "storage_class": "",
            "owner": "filmweb-app",
            "owner_display_name": "filmweb app user",
            "content_type": "",
            "accounted_size": 50584342,
            "user_data": "",
            "appendable": "false"
        },
        "tag": "_3ubjaztglHXfZr05wZCFCPzebQf-ZFP",
        "flags": 0,
        "pending_map": [],
        "versioned_epoch": 0
    }
},
But trying to download it via curl (I've set the permissions to public) only gets me:
<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><BucketName>upvid</BucketName><RequestId>tx0000000000000000e716d-005f1f14cb-e478a-pl-war1</RequestId><HostId>e478a-pl-war1-pl</HostId></Error>
(files that actually don't exist return access denied in the same context)
Same with other tools:
$ s3cmd get s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4 /tmp
download: 's3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' -> '/tmp/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' [1 of 1]
ERROR: S3 error: 404 (NoSuchKey)
Cluster health is OK.
Any ideas what is happening here?
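The bucket index entry only records that the object should exist. In RGW the object data lives in separate RADOS objects (a head object, plus further objects for multipart parts), so an index hit combined with NoSuchKey on GET suggests that the index and the stored data disagree. The `-7` suffix on the ETag follows the usual S3 convention for an object uploaded in 7 multipart parts. A minimal Python sketch that pulls those fields out of a saved `bi list` entry; `summarize` and `multipart_part_count` are hypothetical helpers, not part of any Ceph tool:

```python
import json

# Trimmed copy of the "radosgw-admin bi list" entry quoted above.
BI_ENTRY = json.loads("""
{
  "type": "plain",
  "idx": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
  "entry": {
    "name": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
    "exists": "true",
    "meta": {
      "size": 50584342,
      "etag": "2b31cc8ce8b1fb92a5f65034f2d12581-7"
    }
  }
}
""")


def multipart_part_count(etag):
    """Return N for an S3-style multipart ETag "<md5>-N", else None."""
    _, sep, suffix = etag.rpartition("-")
    return int(suffix) if sep and suffix.isdigit() else None


def summarize(bi_entry):
    """Extract the fields that matter when an index entry and a GET disagree."""
    entry = bi_entry["entry"]
    meta = entry["meta"]
    return {
        "name": entry["name"],
        # bi list prints this boolean as the string "true"/"false".
        "exists": entry["exists"] == "true",
        "size": meta["size"],
        "multipart_parts": multipart_part_count(meta["etag"]),
    }


if __name__ == "__main__":
    print(summarize(BI_ENTRY))
```

A multipart object with a missing head or part object would be one plausible explanation, but the sketch only reads the index; it does not check the data pool.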
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
NOC: [+48] 22 380 10 20
E: admin(a)efigence.com
Hi,
On a recently deployed Octopus (15.2.2) cluster (240 OSDs) we are seeing
OSDs randomly drop out of the cluster.
Usually it's 2 to 4 OSDs spread out over different nodes. Each node has
16 OSDs and not all the failing OSDs are on the same node.
The OSDs are marked down, and all they keep printing in their logs is:
monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2020-06-04T07:57:17.706529-0400)
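When double-checking clocks, it can help to normalize the cutoff in that message (which carries a `-0400` offset) to UTC before comparing it with host clocks or NTP output. A small Python sketch of that conversion; `to_utc` and `skew_seconds` are illustrative helpers, and the reference time you compare against is whatever you collect yourself:

```python
from datetime import datetime, timezone

# Format of the timestamp printed in the monclient message above,
# e.g. "2020-06-04T07:57:17.706529-0400".
CEPH_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def to_utc(ceph_ts):
    """Parse a Ceph log timestamp with a numeric UTC offset and convert it to UTC."""
    return datetime.strptime(ceph_ts, CEPH_TS_FORMAT).astimezone(timezone.utc)


def skew_seconds(ceph_ts, reference):
    """Signed difference in seconds between a Ceph timestamp and a timezone-aware reference."""
    return (to_utc(ceph_ts) - reference).total_seconds()


if __name__ == "__main__":
    cutoff = to_utc("2020-06-04T07:57:17.706529-0400")
    print(cutoff.isoformat())  # 2020-06-04T11:57:17.706529+00:00
```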
Looking at their status through the admin socket:
{
    "cluster_fsid": "68653193-9b84-478d-bc39-1a811dd50836",
    "osd_fsid": "87231b5d-ae5f-4901-93c5-18034381e5ec",
    "whoami": 206,
    "state": "active",
    "oldest_map": 73697,
    "newest_map": 75795,
    "num_pgs": 19
}
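One comparison that may be worth making is the OSD's `newest_map` against the cluster's current osdmap epoch (the `epoch` line at the top of `ceph osd dump`): an OSD that the monitors have marked down while it still reports `"state": "active"` and a stale `newest_map` has possibly stopped receiving map updates. A small Python sketch of that arithmetic, assuming you paste the admin-socket JSON (e.g. from `ceph daemon osd.206 status`) and supply the cluster epoch yourself; the helper names are hypothetical:

```python
import json

# Admin socket "status" output for osd.206, as quoted above.
OSD_STATUS = json.loads("""
{
  "cluster_fsid": "68653193-9b84-478d-bc39-1a811dd50836",
  "osd_fsid": "87231b5d-ae5f-4901-93c5-18034381e5ec",
  "whoami": 206,
  "state": "active",
  "oldest_map": 73697,
  "newest_map": 75795,
  "num_pgs": 19
}
""")


def stored_map_count(status):
    # Assumes oldest_map..newest_map is an inclusive range of stored epochs.
    return status["newest_map"] - status["oldest_map"] + 1


def epochs_behind(status, cluster_epoch):
    # cluster_epoch: current osdmap epoch reported by the monitors,
    # looked up separately by the caller.
    return cluster_epoch - status["newest_map"]


if __name__ == "__main__":
    print(stored_map_count(OSD_STATUS))
```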
The message brought me to my own ticket I created 2 years ago:
https://tracker.ceph.com/issues/23460
The first thing I checked was NTP/time; I double- and triple-checked it. All
the times on the cluster are in sync. Nothing wrong there.
Again, it's not all the OSDs on a node failing. Just 1 or 2 dropping out.
Restarting them brings them back right away and then within 24h some
other OSDs will drop out.
Has anybody seen this behavior with Octopus as well?
Wido
Hello,
I’m running KVM virtualization with RBD storage; some images in the RBD pool become effectively unusable after a VM restart.
All I/O to a problematic RBD image blocks indefinitely.
I've checked that it is not a permission or locking problem.
The bug was silent until we performed a planned restart of a few VMs and some of the VMs failed to start (the kvm process timed out).
It could be related to the recent upgrades from Luminous to Nautilus or from Proxmox 5 to 6.
The Ceph backend is clean, with no observable problems; all mons/mgrs/OSDs are up and running. The network is OK.
There is nothing relevant to the problem in the logs.
ceph version 14.2.6 (ba51347bdbe28c7c0e2e9172fa2983111137bb60) nautilus (stable)
kernel 5.3.13-2-pve #1 SMP PVE 5.3.13-2 (Fri, 24 Jan 2020 09:49:36 +0100) x86_64 GNU/Linux
HEALTH_OK
No locks:
# rbd status rbd-technet/vm-402-disk-0
Watchers: none
# rbd status rbd-technet/vm-402-disk-1
Watchers: none
Normal image vs problematic:
# rbd object-map check rbd-technet/vm-402-disk-0
Object Map Check: 100% complete…done.
# rbd object-map check rbd-technet/vm-402-disk-1
^C
disk-0 is good while disk-1 is effectively lost: the command hangs for many minutes with no visible activity, so I interrupted it.
rbd export runs without problems; however, some data is lost after importing the export back (ext4 errors).
rbd deep copy worked for me; the copy looks good, with no errors.
# rbd info rbd-technet/vm-402-disk-1
rbd image 'vm-402-disk-1':
size 16 GiB in 4096 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: c600d06b8b4567
block_name_prefix: rbd_data.c600d06b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
op_features:
flags:
create_timestamp: Fri Jan 31 17:50:50 2020
access_timestamp: Sat Mar 7 00:30:53 2020
modify_timestamp: Sat Mar 7 00:33:35 2020
journal: c600d06b8b4567
mirroring state: disabled
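One way to narrow down where the I/O blocks is to probe the image's RADOS objects one by one. From the `rbd info` output above, the image is 16 GiB with order 22 (4 MiB objects), i.e. 16 GiB / 2^22 bytes = 4096 objects, and format-2 data objects are named `<block_name_prefix>.<16-hex-digit object number>`. A Python sketch that generates probe commands, assuming the data objects live in `rbd-technet` itself (no `data_pool` is listed above) and that a stuck object would make `rados stat` hang; the helper names are hypothetical:

```python
# Sketch: derive RBD data object names for vm-402-disk-1 from "rbd info"
# (size 16 GiB, order 22, block_name_prefix rbd_data.c600d06b8b4567).
GIB = 1024 ** 3


def object_count(size_bytes, order):
    """Number of objects backing an image whose objects are 2**order bytes."""
    obj_size = 1 << order
    return (size_bytes + obj_size - 1) // obj_size


def data_object_name(block_name_prefix, index):
    """Format-2 RBD data objects: <block_name_prefix>.<index as 16 hex digits>."""
    return f"{block_name_prefix}.{index:016x}"


def stat_commands(pool, block_name_prefix, count):
    """Yield one 'rados stat' command per data object, wrapped in a timeout."""
    for i in range(count):
        yield (f"timeout 30 rados -p {pool} stat "
               f"{data_object_name(block_name_prefix, i)}")


if __name__ == "__main__":
    n = object_count(16 * GIB, 22)
    print(n)  # 4096
    for cmd in stat_commands("rbd-technet", "rbd_data.c600d06b8b4567", n):
        print(cmd)
```

With object-map enabled many objects may simply not exist, which `rados stat` reports quickly as an error; a command that hits the timeout is the interesting one, and `ceph osd map rbd-technet <object>` shows which PG and OSDs serve it.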
What can be done to debug this problem?
Thanks,
Ilia.
Hi,
I am running a nice Ceph (Proxmox 4 / Debian 8 / Ceph 0.94.3) cluster on
3 nodes (Supermicro X8DTT-HIBQF), 2 OSDs each (2 TB SATA hard disks),
interconnected via 40 Gbit/s InfiniBand.
The problem is that Ceph performance is quite bad (approx. 30 MiB/s
reading, 3-4 MiB/s writing), so I thought about plugging a PCIe-to-NVMe/M.2
adapter into each node and installing SSDs. The idea is to get faster
Ceph storage and also some extra capacity.
The question now is which SSDs I should use. If I understand it right,
not every SSD is suitable for Ceph, as described at the links below:
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-i…
or here:
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a
fast SSD for Ceph. As the 950 is no longer available, I ordered a
Samsung 970 1TB for testing; unfortunately, the "EVO" instead of the PRO.
Before equipping all nodes with these SSDs, I ran some tests with fio
as recommended, e.g.:
fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting \
    --name=journal-test
The results are as follows:
-----------------------
1) Samsung 970 EVO NVMe M.2 with PCIe adapter
Jobs: 1:
read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec
Jobs: 4:
read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec
Jobs: 10:
read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
-----------------------
So the read speed is impressive, but the write speed is really bad.
Therefore I ordered the Samsung 970 PRO (1TB), as it has faster NAND
chips (MLC instead of TLC). The results, however, are even worse for writing:
-----------------------
2) Samsung 970 PRO NVMe M.2 with PCIe adapter
Jobs: 1:
read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec
Jobs: 4:
read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec
Jobs: 10:
read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
-----------------------
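Because the test issues 4 KiB writes with --iodepth=1, the write IOPS figures above translate directly into bandwidth and, via Little's law, into an average per-write latency: 1052 IOPS x 4 KiB is about 4.1 MiB/s and roughly 0.95 ms per synchronous write. A small Python sketch of that arithmetic using only the numbers posted above (the helper names are illustrative):

```python
KIB = 1024
MIB = 1024 * 1024


def bandwidth_mib_s(iops, block_size=4 * KIB):
    """Throughput implied by an IOPS figure at a fixed block size (fio --bs=4k)."""
    return iops * block_size / MIB


def avg_latency_ms(iops, outstanding=1):
    """Little's law: latency = outstanding I/Os / throughput.

    With --iodepth=1, each of the --numjobs jobs keeps one write in flight,
    so outstanding == numjobs.
    """
    return outstanding / iops * 1000


if __name__ == "__main__":
    # Write IOPS from the results above: (drive, numjobs, iops)
    runs = [
        ("970 EVO", 1, 1052), ("970 EVO", 4, 1759), ("970 EVO", 10, 1815),
        ("970 PRO", 1, 830), ("970 PRO", 4, 1365), ("970 PRO", 10, 1430),
    ]
    for drive, jobs, iops in runs:
        print(f"{drive} jobs={jobs}: {bandwidth_mib_s(iops):.1f} MiB/s, "
              f"{avg_latency_ms(iops, jobs):.2f} ms per write")
```

Going from 1 to 10 jobs raises write IOPS by less than a factor of two, so the average latency per write grows to about 5.5 ms on the EVO, which suggests the drives largely serialize these synchronous writes.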
I did some research and found out that the "--sync" flag sets the
O_DSYNC flag, which seems to disable the SSD cache and leads to these
horrid write speeds.
It seems this is because the write cache stays effectively enabled only
on SSDs that implement some kind of battery (or capacitor) buffer that
guarantees cached data is flushed to flash in case of a power loss.
However, it seems impossible to find out which SSDs actually have this
power-loss protection. Moreover, these enterprise SSDs are crazy
expensive compared to the SSDs above, and it's unclear whether
power-loss protection is even available in the NVMe form factor. So
building a 1 or 2 TB cluster does not seem really affordable/viable.
So, can anyone please give me hints on what to do? Is it possible to
ensure in some way that the write cache is not disabled (my server is
located in a data center, so there will probably never be a power loss)?
Or is the link above already outdated because newer Ceph releases somehow
deal with this problem? Or maybe a later Debian release (10) will handle
the O_DSYNC flag differently?
Perhaps I should simply invest in faster (and bigger) harddisks and
forget the SSD-cluster idea?
Thank you in advance for any help,
Best Regards,
Hermann
--
hermann(a)qwer.tk
PGP/GPG: 299893C7 (on keyservers)
I'm trying to deploy a Ceph cluster with the cephadm tool. I've already successfully completed all steps except adding OSDs. My test setup consists of three hosts. Each host has SSD storage on which the OS is installed. On that storage I created a partition that can be used as a Ceph block.db. The hosts also have 2 additional HDDs (spinning drives) for OSD data. I couldn't find in the docs how to deploy such a configuration. Do you have any hints on how to do that?
Thanks for help!
Hello everybody,
Can somebody add support for Debian buster to ceph-deploy?
https://tracker.ceph.com/issues/42870
Highly appreciated,
Regards,
Jelle de Jong
Dear all,
After enabling "allow_standby_replay" on our cluster, we are getting
lots of identical errors in the client's /var/log/messages, like:
Apr 29 14:21:26 hal kernel: ceph: mdsmap_decode got incorrect state(up:standby-replay)
We are using the ml kernel 5.6.4-1.el7 on Scientific Linux 7.8.
The cluster and client are running Ceph v14.2.9.
The setting was enabled with:
# ceph fs set cephfs allow_standby_replay true
[root@ceph-s1 ~]# ceph mds stat
cephfs:1 {0=ceph-s3=up:active} 1 up:standby-replay 2 up:standby
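If you want to track from a script whether a standby-replay daemon is present (for instance while testing with allow_standby_replay on and off), here is a small Python sketch that parses the plain-text `ceph mds stat` line shown above. `parse_mds_stat` is a hypothetical helper that only handles the shape seen here; the plain-text format is not a stable interface, so JSON output (`--format json`) is the safer choice for anything long-lived.

```python
import re


def parse_mds_stat(line):
    """Parse plain-text 'ceph mds stat' output such as
    'cephfs:1 {0=ceph-s3=up:active} 1 up:standby-replay 2 up:standby'.

    Returns (fs_field, {rank: (daemon, state)}, {state: count}).
    """
    fs_field, _, rest = line.strip().partition(" ")
    ranks = {}
    braces = re.search(r"\{([^}]*)\}", rest)
    if braces:
        for item in braces.group(1).split(","):
            if not item.strip():
                continue  # tolerate an empty "{}" rank list
            rank, daemon, state = item.strip().split("=")
            ranks[int(rank)] = (daemon, state)
        rest = rest[braces.end():]
    # Remaining "<count> <state>" pairs, e.g. "1 up:standby-replay".
    others = {state: int(n) for n, state in re.findall(r"(\d+) (up:[\w-]+)", rest)}
    return fs_field, ranks, others


if __name__ == "__main__":
    print(parse_mds_stat(
        "cephfs:1 {0=ceph-s3=up:active} 1 up:standby-replay 2 up:standby"))
```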
Is this something to worry about, or should we just disable
allow_standby_replay?
Any advice appreciated,
many thanks
Jake
Note: I am working from home until further notice.
For help, contact unixadmin(a)mrc-lmb.cam.ac.uk
--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
Phone 01223 267019
Mobile 0776 9886539