Hi
Last week we created an NFS service like this:
"
ceph nfs cluster create jumbo "ceph-flash1,ceph-flash2,ceph-flash3"
--ingress --virtual_ip 172.21.15.74/22 --ingress-mode keepalive-only
"
Worked like a charm. Yesterday we upgraded from 17.2.7 to 18.2.0 and
the NFS virtual IP seems to have gone missing in the process:
"
# ceph nfs cluster info jumbo
{
  "jumbo": {
    "backend": [
      {
        "hostname": "ceph-flash1",
        "ip": "172.21.15.148",
        "port": 2049
      }
    ],
    "virtual_ip": null
  }
}
"
Service spec:
"
service_type: nfs
service_id: jumbo
service_name: nfs.jumbo
placement:
  count: 1
  hosts:
  - ceph-flash1
  - ceph-flash2
  - ceph-flash3
spec:
  port: 2049
  virtual_ip: 172.21.15.74
"
I've tried restarting the nfs.jumbo service which didn't help. Suggestions?
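With --ingress-mode keepalive-only, the virtual IP is carried by a separate ingress service that cephadm deploys alongside the NFS daemons, so it is worth checking whether that service survived the upgrade. A sketch of possible checks (the service name ingress.nfs.jumbo is an assumption based on cephadm's usual naming convention):

```shell
# List the ingress services the orchestrator knows about
ceph orch ls --service_type ingress

# Export the generated spec for the NFS cluster's ingress service
# (name assumed to follow cephadm's ingress.<nfs-service> convention)
ceph orch ls --service_name ingress.nfs.jumbo --export

# If the ingress service is gone, re-applying the original ingress
# parameters may recreate it
ceph nfs cluster create jumbo "ceph-flash1,ceph-flash2,ceph-flash3" \
    --ingress --virtual_ip 172.21.15.74/22 --ingress-mode keepalive-only
```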
Mvh.
Torkil
--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: torkil(a)drcmr.dk
Hi,
I'm running Pacific 16.2.4 and I want to start a manual PG split
on the data pool (from 2048 to 4096 PGs). I'm reluctant to upgrade
to 16.2.14/15 at this point. Can I avoid the dups bug
(https://tracker.ceph.com/issues/53729) if I increase the PG count
slowly, in increments of 32 or 64, instead of moving directly to
4096? I don't have the autoscaler enabled.
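For illustration, a gradual split could look like the following (pool name "data" taken from the message; whether this avoids the dups bug is exactly the open question, this only shows the mechanics):

```shell
# Raise pg_num in small steps and let the cluster settle in between;
# on Pacific, pgp_num follows pg_num automatically.
ceph osd pool set data pg_num 2080

# Wait until all PGs are active+clean before the next increment
ceph -s

ceph osd pool set data pg_num 2112
# ...and so on, up to 4096
```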
Thanks.
Hello.
I have a ceph cluster which works in stretch mode:
*DC1:*
node1 (osd, mon, mgr)
node2 (osd, mon)
node3 (osd, mds)
*DC2:*
node1 (osd, mon, mgr)
node2 (osd, mon)
node3 (osd, mds)
*DC3:*
node1 (mon)
The datacenters are distributed across different locations.
I use RBD on my clients.
How can I set up my clients to connect to the local datacenter?
I don't want much traffic between datacenters other than replication, so a
client in DC1 should connect to Ceph in DC1. But if something happens
to DC1, my clients in DC1 should keep working. In the configs on my
clients I have listed all cluster monitors.
Is this possible at all?
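Partially: writes always go to a PG's primary OSD, so they cannot be pinned to one datacenter, but reads can prefer the closest replica. A sketch, assuming CRUSH locations are declared on the clients (the option names are real librbd/Ceph options; "DC1" is taken from the layout above):

```shell
# Ask librbd to read from the nearest replica instead of always the primary
ceph config set client rbd_read_from_replica_policy localize

# In each client's ceph.conf, declare where that client sits, e.g.:
# [client]
#     crush_location = datacenter=DC1
```

Listing all monitors in the client configs, as already done, is the right move for surviving the loss of one DC.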
Hello,
last week I got a HEALTH_OK on our Ceph cluster, so I started
upgrading the firmware in the network cards.
When I had upgraded the sixth card of nine (one by one), that
server didn't start correctly and our Proxmox had problems
accessing disk images on Ceph.
rbd ls pool
was OK, but:
rbd ls pool -l
didn't work. Our virtual servers had trouble working with their
disks.
After I resolved the network problem with the OSD server, everything
returned to normal.
But I've found that every OSD node has very high activity: when
I started 'iotop', there was a very high load, around 180 MB/s
read and 20 MB/s write. At that time the cluster was in the HEALTH_OK
state. I've found that there is massive scrubbing activity...
After a few days, I still have around 90 MB/s read and
70 MB/s write on our OSD nodes, while 'ceph -s' shows client I/O of
2.5 MB/s read and 50 MB/s write.
I've found many lines in the log file of our mon server about
the start of scrubbing, but there are many messages about
starting to scrub the same PG. I've grepped syslog for some of
them and attached the result to this e-mail.
Is this activity OK? Why does Ceph start scrubbing the same PG
again and again?
And another question: Is scrubbing part of mClock scheduler?
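On releases that default to the mClock scheduler (Quincy onward), scrub I/O is indeed scheduled by mClock as background work, so the active profile affects how aggressively scrubs run. The relevant settings can be inspected like this:

```shell
# Which mClock profile is active
# (high_client_ops, balanced, or high_recovery_ops)
ceph config get osd osd_mclock_profile

# How many concurrent scrubs a single OSD may run
ceph config get osd osd_max_scrubs

# Overall cluster state, including scrub-related health messages
ceph -s
```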
Many thanks for explanation.
Sincerely
Jan Marek
--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html
Hi
When I deployed my cluster I didn't notice that on two of my servers the
private (cluster) network was not working (wrong VLAN). It's working now, but
how can I check that it is indeed working (currently I don't have data)?
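A few ways to verify the cluster network is actually in use, sketched with placeholder interface and address:

```shell
# The cluster network configured for the OSDs
ceph config get osd cluster_network

# Each OSD advertises its cluster-side address as "back_addr" in its
# metadata; it should fall inside the cluster network's subnet
ceph osd metadata 0 | grep back_addr

# Basic reachability between OSD hosts over the cluster interface
# (interface name and address are placeholders)
ping -c 3 -I eth1 192.168.1.2
```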
Regards
--
Albert SHIH 🦫 🐸
France
Heure locale/Local time:
lun. 29 janv. 2024 22:36:01 CET
Hi
We put a host into maintenance and had issues bringing it back.
Is there a safe way to exit maintenance while the host is unreachable / offline?
We would like the cluster to rebalance while we work on getting this host back online.
Maintenance was set using:
ceph orch host maintenance enter osd1
I tried exiting using:
ceph orch host maintenance exit osd1
but got the below stacktrace.
root@mon1 ~ # ceph orch host maintenance exit osd1
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1756, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 455, in _host_maintenance_exit
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 225, in raise_if_exception
    e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
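One possible workaround while the orchestrator path is broken: maintenance mode sets group flags such as noout on the host's OSDs, and clearing that group flag manually should allow the cluster to mark the OSDs out and rebalance while the host stays down. This is a sketch, not a tested procedure (hostname "osd1" from the example):

```shell
# Check whether noout is set (globally or as a per-host group flag)
ceph osd dump | grep -i noout

# Clear the per-host group flag set by maintenance mode
ceph osd unset-group noout osd1
```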
Thanks
Bryce
Bryce Nicholls
OpenStack Engineer
Bryce.Nicholls92(a)thehutgroup.com
Hey All,
We will be having a Ceph science/research/big cluster call on Wednesday
January 31st. If anyone wants to discuss something specific they can add
it to the pad linked below. If you have questions or comments you can
contact me.
This is an informal open call of community members mostly from
hpc/htc/research/big cluster environments (though anyone is welcome)
where we discuss whatever is on our minds regarding ceph. Updates,
outages, features, maintenance, etc...there is no set presenter but I do
attempt to keep the conversation lively.
Pad URL:
https://pad.ceph.com/p/Ceph_Science_User_Group_20240131
Virtual event details:
January 31, 2024
15:00 UTC
4pm Central European
9am Central US
Description: Main pad for discussions:
https://pad.ceph.com/p/Ceph_Science_User_Group_Index
Meetings will be recorded and posted to the Ceph Youtube channel.
To join the meeting on a computer or mobile phone:
https://meet.jit.si/ceph-science-wg
Kevin
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS/TROPICS
Space Science & Engineering Center
University of Wisconsin-Madison
Hi all,
how can radosgw be deployed manually? For Ceph cluster deployment,
there is still (fortunately!) a documented method which works flawlessly
even in Reef:
https://docs.ceph.com/en/latest/install/manual-deployment/#monitor-bootstra…
But as for radosgw, there is no such description, unless I am missing
something. Even going back to the oldest docs still available at
docs.ceph.com (mimic), the radosgw installation is described
only using ceph-deploy:
https://docs.ceph.com/en/mimic/install/install-ceph-gateway/
Is it possible to install a new radosgw instance manually?
If so, how can I do it?
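For what it's worth, a manual bring-up can be sketched roughly along the lines of the old pre-ceph-deploy procedure; the instance name rgw.gw1, the port, and the paths below are assumptions:

```shell
# Create a data directory and a keyring for the new radosgw instance
mkdir -p /var/lib/ceph/radosgw/ceph-rgw.gw1
ceph auth get-or-create client.rgw.gw1 \
    mon 'allow rw' osd 'allow rwx' \
    -o /var/lib/ceph/radosgw/ceph-rgw.gw1/keyring

# Add a section to ceph.conf:
# [client.rgw.gw1]
#     rgw_frontends = beast port=7480

# Run the daemon in the foreground to test (wrap in a systemd unit later)
radosgw --cluster ceph --name client.rgw.gw1 --setuser ceph --setgroup ceph
```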
Thanks!
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall
Hi,
Other than listing all objects in the pool and filtering by image ID,
is there any easier way to get the number of allocated objects for
an RBD image?
What I really want to know is the actual usage of an image.
An allocated object could be used only partially, but that's fine,
it doesn't need to be 100% accurate. Getting the object count and
multiplying by the object size should be sufficient.
"rbd export" exports the actual used data, but determining the usage
by exporting the whole image seems like too much. This brings up another
question: is there any way to know the export size before running it?
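"rbd du" may be what you're after: it reports provisioned versus actually used size per image, which also answers the export-size question approximately. It is fast when the image has the fast-diff feature enabled, otherwise it falls back to scanning objects. Pool and image names below are placeholders:

```shell
# Provisioned vs. used size for one image
# (omit the image name to report on the whole pool)
rbd du mypool/myimage

# Object size and total object count are shown in the image metadata,
# e.g. "size 10 GiB in 2560 objects" and "order 22 (4 MiB objects)"
rbd info mypool/myimage
```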
Thanks!
Tony