Hello All,
I have a HW RAID based 240 TB data pool with about 200 million files for
users in a scientific institution. Data sizes range from tiny parameter
files for scientific calculations and experiments to huge images of
brain scans. There are group directories, home directories, Windows
roaming profile directories organized in ZFS pools on Solaris operating
systems, exported via NFS and Samba to Linux, macOS, and Windows clients.
I would like to switch to Ceph because of its flexibility and expandability,
but I cannot find any recommendations for which storage backend would be
suitable for all the functionality we have.
Since I like ZFS features such as instant snapshots of very large data pools,
quotas for each file system within hierarchical data trees, and dynamic
expansion by simply adding new disks or disk images without manual resizing,
would it be a good idea to create RBD images, map them onto the file servers,
and create zpools on the mapped images? I know that ZFS works best with raw
disks, but maybe an RBD image is close enough to a raw disk?
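To make the question more concrete, this is roughly what I have in mind
(pool, image name, sizes and dataset names are just placeholders):

rbd create rbd/homes-img --size 10T
rbd map rbd/homes-img              # returns a block device, e.g. /dev/rbd0
zpool create homes /dev/rbd0
zfs create -o quota=2T homes/userA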
Or would CephFS be the way to go? Can there be multiple CephFS pools, for
example one for the group data folders and one for the users' home
directories, or do I have to have everything in one single file space?
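For what it's worth, from the documentation I understand that CephFS quotas
are set per directory via extended attributes, so perhaps one file system
with separate directory trees for groups and homes would already cover most
of this; something like (paths are just examples):

setfattr -n ceph.quota.max_bytes -v 2199023255552 /cephfs/homes/userA    # 2 TiB
getfattr -n ceph.quota.max_bytes /cephfs/homes/userA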
Maybe someone can share his or her field experience?
Thank you very much.
Best regards
Willi
Hi all,
I am new to Ceph, but I have a good understanding of the iSCSI protocol. I
will dive into Ceph because it looks promising, and I am particularly
interested in Ceph RBD. I have a request: can you please tell me what the
common similarities between iSCSI and Ceph are, if any? If someone had to
work on a common model for iSCSI and Ceph, what significant points would you
suggest to someone who already has some understanding of iSCSI?
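To make my mental model explicit: on the iSCSI side I export a LUN and the
initiator sees it as a local block device; from the docs I assume the rough
RBD equivalent is something like this (pool and image names made up):

rbd create iscsi-pool/lun0 --size 100G
rbd map iscsi-pool/lun0            # the kernel client exposes it as /dev/rbdX
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /mnt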
Looking forward to answers. Thanks in advance :-)
BR
Hello, Ceph users,
does anybody use Ceph on the recently released CentOS 8? Apparently there are
no el8 packages either at download.ceph.com or in the native CentOS package
tree. I am thinking about upgrading my cluster to C8 (because of other
software running on it apart from Ceph). Do the el7 packages simply work?
Can they be rebuilt using rpmbuild --rebuild? Or is running Ceph on
C8 more complicated than that?
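(What I had in mind was simply grabbing the el7 source RPM from
download.ceph.com/rpm-nautilus/el7/SRPMS/ and doing something like

rpmbuild --rebuild ceph-14.2.4-0.el7.src.rpm

untested and with the file name from memory; I suspect the build dependency
list on C8 is not that simple, though.)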
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
sir_clive> I hope you don't mind if I steal some of your ideas?
laryross> As far as stealing... we call it sharing here. --from rcgroups
Hello Team,
We have integrated our Ceph cluster storage with Kubernetes and provision
volumes through rbd-provisioner. When we create volumes from YAML files in
Kubernetes (PV > PVC > mounting into a pod), the PVCs on the Kubernetes side
show the meaningful names defined in the YAML files, but on the Ceph side the
rbd image is created with a dynamic UID in its name.
This makes it tedious to find the exact rbd image during troubleshooting.
Please find the provisioning logs in the snippet pasted below.
kubectl get pods,pv,pvc

NAME            READY   STATUS    RESTARTS   AGE
pod/sleepypod   1/1     Running   0          4m9s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
persistentvolume/pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5   1Gi        RWO            Delete           Bound    default/test-dyn-pvc   ceph-rbd                4m9s

NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/test-dyn-pvc   Bound    pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5   1Gi        RWO            ceph-rbd       4m11s
*rbd-provisioner logs*
I1121 10:59:15.009012 1 provision.go:132] successfully created rbd image "kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2"
I1121 10:59:15.009092 1 controller.go:1087] provision "default/test-dyn-pvc" class "ceph-rbd": volume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" provisioned
I1121 10:59:15.009138 1 controller.go:1101] provision "default/test-dyn-pvc" class "ceph-rbd": trying to save persistentvolume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5"
I1121 10:59:15.020418 1 controller.go:1108] provision "default/test-dyn-pvc" class "ceph-rbd": persistentvolume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" saved
I1121 10:59:15.020476 1 controller.go:1149] provision "default/test-dyn-pvc" class "ceph-rbd": succeeded
I1121 10:59:15.020802 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"test-dyn-pvc", UID:"cd37d2d6-cecc-4a05-9736-c8d80abde7f5", APIVersion:"v1", ResourceVersion:"24545639", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5
*rbd image details on the Ceph cluster side*
rbd -p kube ls --long
NAME                                                          SIZE   PARENT  FMT  PROT  LOCK
kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2   1 GiB          2
Is there a way to set a proper naming convention for the rbd images as well
during the Kubernetes deployment itself?
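As a workaround during troubleshooting, I think the backing image name can be
recovered from the PV spec, roughly like this (assuming the PV carries an
in-tree rbd volume source, as created by rbd-provisioner):

kubectl get pv pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5 -o jsonpath='{.spec.rbd.image}'

but a predictable name from the start would be much nicer.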
Kubernetes version: v1.15.5
Ceph cluster version: 14.2.2 nautilus (stable)
*Best Regards,*
*Palanisamy*
Hi everyone,
my Ceph version is 12.2.12. I want to set require-min-compat-client to
luminous, so I use the command
#ceph osd set-require-min-compat-client luminous
but Ceph reports: Error EPERM: cannot set require_min_compat_client to
luminous: 4 connected client(s) look like jewel (missing
0xa00000000200000); add --yes-i-really-mean-it to do it anyway
[root@node-1 ~]# ceph features
{
    "mon": {
        "group": {
            "features": "0x3ffddff8eeacfffb",
            "release": "luminous",
            "num": 3
        }
    },
    "osd": {
        "group": {
            "features": "0x3ffddff8eeacfffb",
            "release": "luminous",
            "num": 15
        }
    },
    "client": {
        "group": {
            "features": "0x40106b84a842a52",
            "release": "jewel",
            "num": 4
        },
        "group": {
            "features": "0x3ffddff8eeacfffb",
            "release": "luminous",
            "num": 168
        }
    }
}
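(Side note: to find out which clients are still reported as jewel, I think
the monitor admin socket should show the sessions with their release, e.g.

ceph daemon mon.node-1 sessions | grep -i jewel

though the output format may vary by version.)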
So I ran:
[root@node-1 gyt]# ceph osd set-require-min-compat-client luminous
--yes-i-really-mean-it
set require_min_compat_client to luminous
But now I want to set require-min-compat-client back to jewel, so I use:
[root@node-1 gyt]# ceph osd set-require-min-compat-client jewel
Error EPERM: osdmap current utilizes features that require luminous;
cannot set require_min_compat_client below that to jewel
Is there a way to change it from luminous back to jewel?
Hi Folks
We are using Ceph as the storage backend on our 6-node Proxmox VM cluster. To monitor our systems we use Zabbix, and I would like to get some Ceph data into Zabbix so we get alarms when something goes wrong.
Ceph mgr has a module, "zabbix", that uses zabbix_sender to actively push data, but I cannot get the module working. It always responds with "failed to send data".
The network side seems to be fine:
root@vm-2:~# traceroute 192.168.15.253
traceroute to 192.168.15.253 (192.168.15.253), 30 hops max, 60 byte packets
1 192.168.15.253 (192.168.15.253) 0.411 ms 0.402 ms 0.393 ms
root@vm-2:~# nmap -p 10051 192.168.15.253
Starting Nmap 7.70 ( https://nmap.org ) at 2019-09-18 08:40 CEST
Nmap scan report for 192.168.15.253
Host is up (0.00026s latency).
PORT STATE SERVICE
10051/tcp open zabbix-trapper
MAC Address: BA:F5:30:EF:40:EF (Unknown)
Nmap done: 1 IP address (1 host up) scanned in 0.61 seconds
root@vm-2:~# ceph zabbix config-show
{"zabbix_port": 10051, "zabbix_host": "192.168.15.253", "identifier": "VM-2", "zabbix_sender": "/usr/bin/zabbix_sender", "interval": 60}
root@vm-2:~#
But if I try "ceph zabbix send" I get "failed to send data to zabbix", and this shows up in the system journal:
Sep 18 08:41:13 vm-2 ceph-mgr[54445]: 2019-09-18 08:41:13.272 7fe360fe4700 -1 mgr.server reply reply (1) Operation not permitted
The log of ceph-mgr on that machine states:
2019-09-18 08:42:18.188 7fe359fd6700 0 mgr[zabbix] Exception when sending: /usr/bin/zabbix_sender exited non-zero: zabbix_sender [3253392]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 44; total: 44; seconds spent: 0.000179"}]
2019-09-18 08:43:18.217 7fe359fd6700 0 mgr[zabbix] Exception when sending: /usr/bin/zabbix_sender exited non-zero: zabbix_sender [3253629]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 44; total: 44; seconds spent: 0.000321"}]
I'm guessing this could have something to do with user rights, but I have no idea where to start to track it down.
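One thing I plan to try is sending a single value by hand, to see whether the Zabbix server accepts any item for this host at all (the item key below is just a placeholder, not necessarily one from the ceph-mgr template):

zabbix_sender -vv -z 192.168.15.253 -p 10051 -s "VM-2" -k ceph.test -o 1

If that also comes back with "processed: 0; failed: 1", then I guess the problem is the host name / template link on the Zabbix side rather than Ceph itself.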
Maybe someone here has a hint?
If more information is needed, I will gladly provide it.
greetings
Ingo
We've run into a problem on our test cluster this afternoon which is running Nautilus (14.2.2). It seems that any time PGs move on the cluster (from marking an OSD down, setting the primary-affinity to 0, or by using the balancer), a large number of the OSDs in the cluster peg the CPU cores they're running on for a while which causes slow requests. From what I can tell it appears to be related to slow peering caused by osd_pg_create() taking a long time.
This was seen on quite a few OSDs while waiting for peering to complete:
# ceph daemon osd.3 ops
{
    "ops": [
        {
            "description": "osd_pg_create(e179061 287.7a:177739 287.9a:177739 287.e2:177739 287.e7:177739 287.f6:177739 287.187:177739 287.1aa:177739 287.216:177739 287.306:177739 287.3e6:177739)",
            "initiated_at": "2019-08-27 14:34:46.556413",
            "age": 318.25234538000001,
            "duration": 318.25241895300002,
            "type_data": {
                "flag_point": "started",
                "events": [
                    {
                        "time": "2019-08-27 14:34:46.556413",
                        "event": "initiated"
                    },
                    {
                        "time": "2019-08-27 14:34:46.556413",
                        "event": "header_read"
                    },
                    {
                        "time": "2019-08-27 14:34:46.556299",
                        "event": "throttled"
                    },
                    {
                        "time": "2019-08-27 14:34:46.556456",
                        "event": "all_read"
                    },
                    {
                        "time": "2019-08-27 14:35:12.456901",
                        "event": "dispatched"
                    },
                    {
                        "time": "2019-08-27 14:35:12.456903",
                        "event": "wait for new map"
                    },
                    {
                        "time": "2019-08-27 14:40:01.292346",
                        "event": "started"
                    }
                ]
            }
        },
        ...snip...
        {
            "description": "osd_pg_create(e179066 287.7a:177739 287.9a:177739 287.e2:177739 287.e7:177739 287.f6:177739 287.187:177739 287.1aa:177739 287.216:177739 287.306:177739 287.3e6:177739)",
            "initiated_at": "2019-08-27 14:35:09.908567",
            "age": 294.900191001,
            "duration": 294.90068416899999,
            "type_data": {
                "flag_point": "delayed",
                "events": [
                    {
                        "time": "2019-08-27 14:35:09.908567",
                        "event": "initiated"
                    },
                    {
                        "time": "2019-08-27 14:35:09.908567",
                        "event": "header_read"
                    },
                    {
                        "time": "2019-08-27 14:35:09.908520",
                        "event": "throttled"
                    },
                    {
                        "time": "2019-08-27 14:35:09.908617",
                        "event": "all_read"
                    },
                    {
                        "time": "2019-08-27 14:35:12.456921",
                        "event": "dispatched"
                    },
                    {
                        "time": "2019-08-27 14:35:12.456923",
                        "event": "wait for new map"
                    }
                ]
            }
        }
    ],
    "num_ops": 6
}
That "wait for new map" message made us think something was getting hung up on the monitors, so we restarted them all without any luck.
I'll keep investigating, but so far my google searches aren't pulling anything up so I wanted to see if anyone else is running into this?
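For reference, the next things on my list are the blocked ops and perf counters on one of the affected OSDs, and possibly raising the OSD debug level while moving a PG (osd.3 is just the example from above; commands assume the admin socket is available on that host):

ceph daemon osd.3 dump_blocked_ops
ceph daemon osd.3 perf dump | grep -A 3 -i peering
ceph tell osd.3 injectargs '--debug_osd 10'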
Thanks,
Bryan