Hi,
I am new to Ceph. I am trying to increase the maximum object size: I can upload files of up to 128 MB, but how can I upload a file larger than 128 MB?
I can upload a file using:
rados --pool z10 put testfile-128M.txt testfile-128M.txt
That works when the file size is up to 128 MB, but it fails when the file
is larger than 128 MB. I get this error:
error putting z13/testfile-129MB.txt: (27) File too large
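From what I can tell, 128 MB matches the default osd_max_object_size, so I suspect I need to raise that limit, or split/stripe the file across several smaller objects instead. Is something like this the right approach (untested sketch, assuming a release new enough for 'ceph config set'; otherwise the option can go under [osd] in ceph.conf)?

# sketch only: raise the per-object size limit (value in bytes);
# large single objects are generally discouraged, striping is usually better
ceph config set osd osd_max_object_size 268435456    # 256 MiB; the default is 128 MiB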
Please help
Tapas Jana
Hi
Ceph: nautilus (14.2.2)
NFS-Ganesha v 2.8
ceph-ansible stable 4.0 << git checkout 28th Aug
CentOS 7
I am trying to do a fresh installation using Ceph Ansible and I am
getting the following error when running the playbook. I have not
enabled or configured dashboard/grafana/prometheus yet.
fatal: [stor1]: FAILED! =>
msg: |-
The task includes an option with an undefined variable. The error
was: No first item, sequence was empty.
The error appears to be in
'/usr/share/ceph-ansible/roles/ceph-facts/tasks/facts.yml': line 314,
column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: set grafana_server_addr fact - ipv4
^ here
If I remove [grafana-server] from the /etc/ansible/hosts file I get
fatal: [stor1]: FAILED! => changed=false
msg: you must add a [grafana-server] group and add at least one node.
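From the two errors it looks like stable-4.0 insists on a non-empty [grafana-server] group even with the dashboard disabled, and then cannot work out an IPv4 address for the host in that group. A minimal sketch of what I understand it expects in /etc/ansible/hosts (all host names besides stor1 are placeholders):

[mons]
mon1

[osds]
stor1

# any node reachable over IPv4 works here; ceph-facts derives
# grafana_server_addr from this host's address
[grafana-server]
mon1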
Any idea what might be causing this?
Greetings, Alwin Antreich!
On that day you wrote...
> > Is there something I can do? Thanks.
> Did you go through our upgrade guide(s)?
Sure!
> See the link [0] below, for the
> permission changes. They are needed when an upgrade from Hammer to Jewel
> is done.
Sure! The problem arises in the 'Set partition type' section, because:
root@deadpool:~# for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); do echo $l; blkid -o udev -p $l; echo ""; done
/dev/sda5
ID_PART_ENTRY_SCHEME=dos
ID_PART_ENTRY_UUID=9c277a97-05
ID_PART_ENTRY_TYPE=0xfd
ID_PART_ENTRY_NUMBER=5
ID_PART_ENTRY_OFFSET=546877440
ID_PART_ENTRY_SIZE=97654784
ID_PART_ENTRY_DISK=8:0
/dev/sda6
ID_PART_ENTRY_SCHEME=dos
ID_PART_ENTRY_UUID=9c277a97-06
ID_PART_ENTRY_TYPE=0xfd
ID_PART_ENTRY_NUMBER=6
ID_PART_ENTRY_OFFSET=644534272
ID_PART_ENTRY_SIZE=97654784
ID_PART_ENTRY_DISK=8:0
/dev/sdb7
ID_PART_ENTRY_SCHEME=dos
ID_PART_ENTRY_UUID=802474ca-07
ID_PART_ENTRY_TYPE=0xfd
ID_PART_ENTRY_NUMBER=7
ID_PART_ENTRY_OFFSET=742191104
ID_PART_ENTRY_SIZE=97654784
ID_PART_ENTRY_DISK=8:16
/dev/sdb8
ID_PART_ENTRY_SCHEME=dos
ID_PART_ENTRY_UUID=802474ca-08
ID_PART_ENTRY_TYPE=0xfd
ID_PART_ENTRY_NUMBER=8
ID_PART_ENTRY_OFFSET=839847936
ID_PART_ENTRY_SIZE=97853440
ID_PART_ENTRY_DISK=8:16
As stated, partitions are 'DOS'...
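As a workaround, rather than converting the disks to GPT, I'm considering making the chown persistent with a udev rule on each OSD host (just a sketch; device names are the ones from the output above):

# /etc/udev/rules.d/90-ceph-journal.rules : sketch, hand the journal
# partitions to the ceph user at boot, since they carry no GPT type code
KERNEL=="sda5", OWNER="ceph", GROUP="ceph"
KERNEL=="sda6", OWNER="ceph", GROUP="ceph"
KERNEL=="sdb7", OWNER="ceph", GROUP="ceph"
KERNEL=="sdb8", OWNER="ceph", GROUP="ceph"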
--
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797
Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(tax code 00307430132, category ONLUS or RICERCA SANITARIA)
Hello Marco,
On Thu, Aug 29, 2019 at 12:55:56PM +0200, Marco Gaiarin wrote:
>
> I've just finished a double upgrade on my ceph (PVE-based) from hammer
> to jewel and from jewel to luminous.
>
> All went well, apart that... OSD does not restart automatically,
> because permission troubles on the journal:
>
> Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
> Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.449886 7fa505a43e00 -1 filestore(/var/lib/ceph/osd/ceph-2) mount(1822): failed to open journal /var/lib/ceph/osd/ceph-2/journal: (13) Permission denied
> Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453524 7fa505a43e00 -1 osd.2 0 OSD:init: unable to mount object store
> Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453535 7fa505a43e00 -1 #033[0;31m ** ERROR: osd init failed: (13) Permission denied#033[0m
>
>
> A quick rewind: when I set up the cluster I used some 'old' servers,
> with a couple of SSD disks serving as OS and journal disks.
> Because the servers were old, I was forced to partition the boot disk
> in DOS (MBR) mode, not GPT.
>
> While creating the OSDs, I received some warnings:
>
> WARNING:ceph-disk:Journal /dev/sdaX was not prepared with ceph-disk. Symlinking directly.
>
>
> Looking at the cluster now, it seems to me that the OSD init scripts try to
> identify the journal based on GPT partition labels/info, and clearly fail.
>
>
> Note that if I do, on the servers that hold OSDs:
>
> for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); do chown ceph: $l; done
>
> the OSDs start flawlessly.
>
>
> Is there something I can do? Thanks.
Did you go through our upgrade guide(s)? See the link [0] below, for the
permission changes. They are needed when an upgrade from Hammer to Jewel
is done.
On the wiki you can also find the upgrade guides for PVE 5.x -> 6.x and
Luminous -> Nautilus.
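In short, the gist of that section is to hand the OSD data (and journal devices) over to the ceph user; roughly like this (paraphrased, not a verbatim copy, see [0] for the exact steps):

# rough sketch of the Hammer -> Jewel permission change, per OSD node
systemctl stop ceph-osd.target
chown -R ceph:ceph /var/lib/ceph
# the journal symlinks point at raw partitions; chown follows them to the device
chown ceph:ceph /var/lib/ceph/osd/ceph-*/journal
systemctl start ceph-osd.target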
--
Cheers,
Alwin
[0] https://pve.proxmox.com/wiki/Ceph_Hammer_to_Jewel#Set_permission
Hi,
I'm running a small nautilus cluster (14.2.2) which was recently
upgraded from mimic (13.2.6). After the upgrade I enabled the
pg_autoscaler which resulted in most of the pools having their pg count
changed. All the remapping has completed but the cluster is still
reporting HEALTH_WARN. I have adjusted the target ratios so that their
sum is < 1.0, but this didn't help. What else can I look at?
Thanks,
James
# ceph -s
  cluster:
    id:     ...
    health: HEALTH_WARN
            1 subtrees have overcommitted pool target_size_bytes
            1 subtrees have overcommitted pool target_size_ratio

  services:
    mon: 3 daemons, quorum ceph-00,ceph-01,ceph-02 (age 3d)
    mgr: ceph-01(active, since 6d), standbys: ceph-02, ceph-00
    osd: 32 osds: 32 up (since 2d), 32 in (since 2d)
    rgw: 1 daemon active (rgw-00)

  data:
    pools:   14 pools, 1512 pgs
    objects: 4.17M objects, 16 TiB
    usage:   47 TiB used, 69 TiB / 116 TiB avail
    pgs:     1510 active+clean
             2    active+clean+scrubbing+deep
# ceph osd pool autoscale-status (wide output):
POOL                     SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
loc.rgw.buckets.index    0                    3.0   116.1T        0.0000                1.0   4                   on
vms1                     5318G                3.0   116.1T        0.1341  0.2000        1.0   256                 on
vms2                     3419G                3.0   116.1T        0.0862  0.0200        1.0   64                  on
.rgw.root                3648k                3.0   116.1T        0.0000                1.0   4                   on
default.rgw.meta         384.0k               3.0   116.1T        0.0000                1.0   4                   on
lov.rgw.log              384.0k               3.0   116.1T        0.0000                1.0   4                   on
vms3                     35799G               3.0   116.1T        0.9028  0.6000        1.0   1024                on
default.rgw.control      0                    3.0   116.1T        0.0000                1.0   4                   on
loc.rgw.meta             768.5k               3.0   116.1T        0.0000                1.0   4                   on
vms4                     2306G                3.0   116.1T        0.0582  0.1000        1.0   128                 on
loc.rgw.buckets.non-ec   200.4k               3.0   116.1T        0.0000                1.0   4                   on
loc.rgw.buckets.data     56390M               3.0   116.1T        0.0014                1.0   4                   on
loc.rgw.control          0                    3.0   116.1T        0.0000                1.0   4                   on
default.rgw.log          0                    3.0   116.1T        0.0000                1.0   4                   on
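For reference, the per-pool knobs I have been adjusting look like this (a sketch; the pool name and values are only examples, and a leftover target_size_bytes can be cleared by setting it to 0):

ceph osd pool set vms3 target_size_ratio 0.5
ceph osd pool set vms3 target_size_bytes 0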
Hi,
I have a cluster running on Ubuntu Bionic, with stock Ubuntu Ceph packages. When upgrading, I always try to follow the procedure as documented here: https://docs.ceph.com/docs/master/install/upgrading-ceph/
However, the Ubuntu packages restart all daemons upon upgrade, per node. So if I upgrade the first node, it will restart the mon, OSDs, RGW, and MDSes on that node, even though the rest of the cluster is running the old version.
I tried upgrading a single package, to see how that goes, but due to dependencies in dpkg, all other packages are upgraded as well.
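One workaround I am considering (a sketch only, and I am not sure the Ceph maintainer scripts honour it) is to block service restarts with policy-rc.d while the packages upgrade, then restart the daemons by hand in the documented order:

# sketch: forbid init actions on this node for the duration of the upgrade
printf '#!/bin/sh\nexit 101\n' > /usr/sbin/policy-rc.d
chmod +x /usr/sbin/policy-rc.d

apt-get update && apt-get dist-upgrade

rm /usr/sbin/policy-rc.d

# then restart one daemon type at a time, monitors first
systemctl restart ceph-mon.target
systemctl restart ceph-osd.target
systemctl restart ceph-mds.target
systemctl restart ceph-radosgw.target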
How should I proceed?
Thanks,
--
Mark Schouten <mark(a)tuxis.nl>
Tuxis, Ede, https://www.tuxis.nl
T: +31 318 200208
Hello,
I've been facing some issues with a single-node Ceph cluster (Mimic). I
know an environment like this shouldn't be in production, but the server
ended up dealing with operational workloads for the last 2 years.
Some users detected issues in CephFS: some files were not accessible, and
listing the contents of the affected folders hung the node.
I noticed a heavy memory load on the server: main memory was consumed by
cache, and a considerable amount of swap was in use.
The command "ceph health detail" reported some inactive PGs. Those PGs
didn't exist.
After rebooting the node, an fsck was run on the 3 affected OSDs.
ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-1/
Unfortunately, all of them crashed with a core dump and now they don't
start anymore.
The logs report messages like:
2019-08-28 03:00:12.999 7f21d787c240 4 rocksdb:
[/build/ceph-13.2.1/src/rocksdb/db/version_set.cc:3088] Recovering from
manifest file: MANIFEST-004059
2019-08-28 03:00:12.999 7f21d787c240 4 rocksdb:
[/build/ceph-13.2.1/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all
background work
2019-08-28 03:00:12.999 7f21d787c240 4 rocksdb:
[/build/ceph-13.2.1/src/rocksdb/db/db_impl.cc:397] Shutdown complete
2019-08-28 03:00:12.999 7f21d787c240 -1 rocksdb: NotFound:
2019-08-28 03:00:12.999 7f21d787c240 -1 bluestore(/var/lib/ceph/osd/ceph-0)
_open_db erroring opening db:
2019-08-28 03:00:12.999 7f21d787c240 1 bluefs umount
2019-08-28 03:00:12.999 7f21d787c240 1 stupidalloc 0x0x5650c5255800
shutdown
2019-08-28 03:00:12.999 7f21d787c240 1 bdev(0x5650c5604a80
/var/lib/ceph/osd/ceph-0/block) close
2019-08-28 03:00:13.247 7f21d787c240 1 bdev(0x5650c5604700
/var/lib/ceph/osd/ceph-0/block) close
2019-08-28 03:00:13.479 7f21d787c240 -1 osd.0 0 OSD:init: unable to mount
object store
2019-08-28 03:00:13.479 7f21d787c240 -1 ** ERROR: osd init failed: (5)
Input/output error
I'm not sure if the fsck has introduced additional damage.
After that, I tried to mark unfound as lost with the following commands:
ceph pg 4.1e mark_unfound_lost revert
ceph pg 9.1d mark_unfound_lost revert
ceph pg 13.3 mark_unfound_lost revert
ceph pg 13.e mark_unfound_lost revert
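(In case it is useful, the objects each PG considers unfound can be listed with, e.g.:)

ceph pg 4.1e list_unfound    # shows the objects that PG reports as unfound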
Currently, since there are 3 OSDs down, there are:
316 unclean PGs
76 inactive PGs
root@ceph-s01:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-2 0.43599 root ssd
-4 0.43599 disktype ssd_disk
12 ssd 0.43599 osd.12 up 1.00000 1.00000
-1 60.03792 root default
-5 60.03792 disktype hdd_disk
0 hdd 0 osd.0 down 1.00000 1.00000
1 hdd 5.45799 osd.1 down 0 1.00000
2 hdd 5.45799 osd.2 up 1.00000 1.00000
3 hdd 5.45799 osd.3 up 1.00000 1.00000
4 hdd 5.45799 osd.4 up 1.00000 1.00000
5 hdd 5.45799 osd.5 up 1.00000 1.00000
6 hdd 5.45799 osd.6 up 1.00000 1.00000
7 hdd 5.45799 osd.7 down 0 1.00000
8 hdd 5.45799 osd.8 up 1.00000 1.00000
9 hdd 5.45799 osd.9 up 1.00000 1.00000
10 hdd 5.45799 osd.10 up 1.00000 1.00000
11 hdd 5.45799 osd.11 up 1.00000 1.00000
Running the following command shows that a MANIFEST file appeared in the
db/lost folder; I guess the repair moved it there.
# ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-7
--out-dir osd7/
...
db/LOCK
db/MANIFEST-000001
db/OPTIONS-018543
db/OPTIONS-018581
db/lost/
db/lost/MANIFEST-018578
Any ideas? Suggestions?
Thank you.
Regards,
Jordi
I have an OSD that is throwing sense errors. It's at its end of life and needs to be replaced.
The server is in the datacentre and I won't get there for a few weeks, so I've stopped the service (systemctl stop ceph-osd@208) and let the cluster rebalance; all is well.
My thinking is that if, for some reason, the host that OSD 208 resides on were to reboot, that OSD would start and become part of the cluster again.
So I'd like to prevent this OSD from ever starting again, given that I can't physically remove it from the server yet.
I was thinking that deleting its key from the auth list might work, so: a ceph osd purge 208.
Then when the service tries to start it'll fail with an auth error.
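Concretely, something like this is what I have in mind (a sketch):

# remove the OSD's auth key, CRUSH entry and id so it can never rejoin
ceph osd purge 208 --yes-i-really-mean-it

# and/or make sure systemd never starts the unit again on that host
systemctl disable --now ceph-osd@208
systemctl mask ceph-osd@208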
Any other suggestions?
Cheers,
Cory