I'm running some tests with mixed storage devices, and Octopus.
8 nodes, each with 2 SSDs and 8 HDDs.
The SSDs are relatively small: around 100 GB each.
I'm mapping 8 RBDs, striping them together, and running fio on the result for testing.
# fio --filename=/...../fio.testfile --size=120GB --rw=randrw --bs=8k --direct=1 --ioengine=libaio --iodepth=64 --numjobs=4 --time_based --group_reporting --name=readwritelatency-test-job --runtime=120 --eta-newline=1
Trouble is, I'm seeing sporadic IO delays.
When I test ZFS, for comparison, it has this neat latency-histogram status check:
zpool iostat -w 20
and it shows me that some write IOs are taking over 4 seconds to complete; many are taking 1s or 2s.
This kind of thing has sort of happened before (but previously, I think I was using SSDs exclusively). When I emailed the list, people suggested turning off the RBD cache, which worked great in that situation.
This time, I have already done that (I believe), but I still see this behavior.
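For reference, this is how I disabled it client-side in ceph.conf (assuming this is still the right knob):
[client]
rbd cache = false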
Would folks have any further suggestions to smooth performance out?
The odd thing is, I read that BlueStore is supposed to smooth things out and provide consistent response times, but that doesn't seem to be the case here.
Sample output from the zpool iostat below:
twelve total_wait disk_wait syncq_wait asyncq_wait
latency read write read write read write read write scrub trim
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
.. snip ...
1ms 1.01K 0 1.00K 0 0 0 0 0 0 0
2ms 29 0 29 18 0 0 0 1 0 0
4ms 23 3 23 14 0 0 0 3 0 0
8ms 64 6 64 9 0 0 0 7 0 0
16ms 74 10 74 59 0 0 0 11 0 0
33ms 24 17 24 154 0 0 0 19 0 0
67ms 7 25 7 100 0 0 0 26 0 0
134ms 3 40 3 36 0 0 0 36 0 0
268ms 1 59 1 18 0 0 0 59 0 0
536ms 0 116 0 3 0 0 0 113 0 0
1s 0 109 0 0 0 0 0 98 0 0
2s 0 24 0 0 0 0 0 20 0 0
4s 0 2 0 0 0 0 0 1 0 0
8s 0 0 0 0 0 0 0 0 0 0
17s 0 0 0 0 0 0 0 0 0 0
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Hello,
TL;DR
Looking for guidance on ceph-volume lvm activate --all as it would apply to
a containerized ceph deployment (Nautilus or Octopus).
Detail:
I’m planning to upgrade my Nautilus non-container cluster to Octopus
(eventually containerized). There’s an expanded procedure that was tested
and working in our lab, but I won't go into the whole process here. My
question is around existing OSD hosts.
I have to re-platform the host OS, and one of the ways the OSDs were
reactivated previously when this was done (non-containerized) was to
install the Ceph packages, deploy keys, config, etc., then run ceph-volume lvm
activate --all to magically bring up all the OSDs.
I'm looking for a similar approach for when the OSDs are containerized: if I
re-platform the host OS (CentOS -> Ubuntu), how can I reactivate all the OSDs
as containers and avoid rebuilding the data on them?
Thank you.
Hi, if someone knows how to help, please: I have an HDD pool in my cluster,
and after rebooting one server my OSDs started to crash, and things are only
getting worse. This pool is a backup pool with OSD as the failure domain and
a size of 2. I have tried running ceph-bluestore-tool repair, and I get what
I think is the same error that shows up in the OSD logs:
[root@cwvh13 ~]# ceph-bluestore-tool repair --path
/var/lib/ceph/osd/ceph-81 --log-level 10
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/Allocator.cc:
In function 'virtual Allocator::SocketHook::~SocketHook()' thread
7f6467ffcec0 time 2021-03-11 12:13:12.121766
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/Allocator.cc:
53: FAILED ceph_assert(r == 0)
ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14a) [0x7f645e1a7b27]
2: (()+0x25ccef) [0x7f645e1a7cef]
3: (()+0x3cd57f) [0x5642e85c457f]
4: (HybridAllocator::~HybridAllocator()+0x17) [0x5642e85f3f37]
5: (BlueStore::_close_alloc()+0x42) [0x5642e84379d2]
6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x5642e84bbac8]
7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x293) [0x5642e84bbf13]
8: (main()+0x13cc) [0x5642e83caaec]
9: (__libc_start_main()+0xf5) [0x7f645ae24555]
10: (()+0x1fae9f) [0x5642e83f1e9f]
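One thing I'm considering as a test, since the trace goes through HybridAllocator (this is just an assumption on my part, not something I've confirmed is related), is forcing the older bitmap allocator:
ceph config set osd bluestore_allocator bitmap
Has anyone seen this assert before, or is there a safer way to get these OSDs back up?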
Hello, what do you think of a Ceph cluster made up of 6 nodes, each with the
following configuration?
A+ Server 1113S-WN10RT
Barebone: Supermicro A+ Server 1113S-WN10RT - 1U - 10x U.2 NVMe - 2x M.2 - Dual 10-Gigabit LAN - 750W Redundant
Processor: AMD EPYC™ 7272 Processor 12-core 2.90GHz 64MB Cache (120W)
Memory: 8 x 8GB PC4-25600 3200MHz DDR4 ECC RDIMM
U.2/U.3 NVMe Drive: 5 x 8.0TB Intel® SSD DC P4510 Series U.2 PCIe 3.1 x4 NVMe Solid State Drive
Hard Drive: 2 x 240GB Intel® SSD D3-S4610 Series 2.5" SATA 6.0Gb/s Solid State Drive
Network Card: 2 x Intel® 10-Gigabit Ethernet Converged Network Adapter X710-DA2 (2x SFP+)
Server Management: Supermicro Update Manager (SUM) (OOB Management Package), included
Operating System: No Windows Operating System
Warranty: 3 Year Depot Warranty (Return for Repair)
RAID U.2 [NVMe, 10 ports]: No RAID (*OS) -> 5 x 8.0TB Intel® SSD DC P4510 Series U.2 PCIe 3.1 x4 NVMe Solid State Drive
Thanks
Ignazio
Hi! Yesterday I bootstrapped my first Ceph installation (with cephadm)
and things looked somewhat OK, but today the OSDs are still not ready and
I have these warnings in the dashboard:
MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
PG_AVAILABILITY: Reduced data availability: 64 pgs inactive
PG_DEGRADED: Degraded data redundancy: 2/14 objects degraded (14.286%),
66 pgs undersized
TOO_FEW_OSDS: OSD count 2 < osd_pool_default_size 3
and in logs:
3/12/21 12:18:19 PM [INF] OSD <1> is not empty yet. Waiting a bit more
3/12/21 12:18:19 PM [INF] OSD <0> is not empty yet. Waiting a bit more
3/12/21 12:18:19 PM [INF] Can't even stop one OSD. Cluster is probably busy. Retrying later..
3/12/21 12:18:19 PM [ERR] cmd: osd ok-to-stop failed with: 31 PGs are already too degraded, would become too degraded or might become unavailable. (errno:-16)
This is a single-node, all-in-one Ceph install with 2 local NVMe
drives as OSDs (to be used 2x replicated, like a RAID1 array).
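Given the TOO_FEW_OSDS warning, I'm guessing the defaults expect 3 OSDs and that a 2-OSD setup needs something like the following (just my assumption; <pool> is a placeholder):
ceph config set global osd_pool_default_size 2
ceph config set global osd_pool_default_min_size 1
ceph osd pool set <pool> size 2    # for pools that already exist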
So, can anyone tell me what is going on?
Thanks a lot!!
Adrian
Hi Ceph'ers,
I love the ability to take snapshots on Cephfs systems.
However, there is one thing that puzzles me.
Creating a snapshot takes no time at all, while deleting snapshots can bring PGs into the snaptrim state for some hours.
And recovering data from a snapshot always invokes a full data transfer, where data are "physically" copied back into place.
This can make recovering from snapshots on Cephfs a rather heavy procedure.
I have even tried the "mv" command, but that also starts transferring real data instead of just moving metadata pointers.
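For reference, what I tried was along these lines (the paths are made up for the example), and the file's data gets copied in full rather than relinked:
mv /cephfs/mydir/.snap/mysnap/bigfile /cephfs/mydir/bigfile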
Am I missing some "ceph snapshot recover" command that can move metadata pointers and make recovery much lighter, or is this just the way it is?
Best regards,
Jesper
--------------------------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C
E-mail: jelka(a)mbg.au.dk
Tlf: +45 50906203
Hi cephers.
We are currently upgrading our Ceph cluster from Mimic to Nautilus. We had 5
ranks and were decreasing max_mds from 5 down to 1; it went smoothly until we
set max_mds from 2 to 1, at which point the cluster showed that rank 1 had failed.
These are the MDS logs:
2021-03-12 16:21:26.974 7f366e949700 1 mds.1.125077 handle_mds_map state
change up:boot --> up:replay
2021-03-12 16:21:26.974 7f366e949700 1 mds.1.125077 replay_start
2021-03-12 16:21:26.974 7f366e949700 1 mds.1.125077 recovery set is 0
2021-03-12 16:21:26.974 7f366e949700 1 mds.1.125077 waiting for osdmap
460461 (which blacklists prior instance)
2021-03-12 16:21:27.018 7f366893d700 0 mds.1.cache creating system inode
with ino:0x101
2021-03-12 16:21:27.019 7f366893d700 0 mds.1.cache creating system inode
with ino:0x1
2021-03-12 16:21:27.404 7f366713a700 0 mds.1.cache creating system inode
with ino:0x100
2021-03-12 16:21:27.407 7f366713a700 -1 log_channel(cluster) log [ERR] :
client client1: (2934972)loaded with preallocated inodes that are
inconsistent with inotable
2021-03-12 16:21:27.407 7f366713a700 -1 log_channel(cluster) log [ERR] :
client client2: (2862164)loaded with preallocated inodes that are
inconsistent with inotable
2021-03-12 16:21:27.407 7f366713a700 -1 log_channel(cluster) log [ERR] :
client client3: (2579839)loaded with preallocated inodes that are
inconsistent with inotable
2021-03-12 16:21:27.407 7f366713a700 -1 log_channel(cluster) log [ERR] :
client client4: (2579815)loaded with preallocated inodes that are
inconsistent with inotable
...
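For completeness, each reduction step was done with the standard command, as far as I know (<fs_name> stands in for our filesystem name):
ceph fs set <fs_name> max_mds 1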
thanks for any help!
Hi,
I'm struggling with a CNAMEd address on my old cluster.
The s3 and curl commands work properly with the non-CNAMEd address, but with the CNAMEd one I get this in the civetweb log:
2021-03-12 10:24:18.812329 7f6b0c527700 1 ====== starting new request req=0x7f6b0c520f90 =====
2021-03-12 10:24:18.812387 7f6b0c527700 2 req 10:0.000058::HEAD /::initializing for trans_id = tx00000000000000000000a-00604adee2-8e4fc3-default
2021-03-12 10:24:18.812412 7f6b0c527700 10 rgw api priority: s3=5 s3website=4
2021-03-12 10:24:18.812417 7f6b0c527700 10 host=cnamedhostname
2021-03-12 10:24:18.812484 7f6b0c527700 10 handler=25RGWHandler_REST_Bucket_S3
2021-03-12 10:24:18.812490 7f6b0c527700 2 req 10:0.000163:s3:HEAD /::getting op 3
2021-03-12 10:24:18.812499 7f6b0c527700 10 op=25RGWStatBucket_ObjStore_S3
2021-03-12 10:24:18.812503 7f6b0c527700 2 req 10:0.000176:s3:HEAD /:stat_bucket:verifying requester
2021-03-12 10:24:18.812541 7f6b0c527700 2 req 10:0.000214:s3:HEAD /:stat_bucket:normalizing buckets and tenants
2021-03-12 10:24:18.812548 7f6b0c527700 10 s->object=<NULL> s->bucket= cnamedhostname
2021-03-12 10:24:18.812556 7f6b0c527700 2 req 10:0.000229:s3:HEAD /:stat_bucket:init permissions
2021-03-12 10:24:18.812594 7f6b0c527700 10 cache get: name=default.rgw.meta+root+ cnamedhostname : type miss (requested=0x16, cached=0x0)
2021-03-12 10:24:18.813525 7f6b0c527700 10 cache put: name=default.rgw.meta+root+ cnamedhostname info.flags=0x0
2021-03-12 10:24:18.813554 7f6b0c527700 10 moving default.rgw.meta+root+ cnamedhostname to cache LRU end
2021-03-12 10:24:18.813664 7f6b0c527700 10 read_permissions on cnamedhostname [] ret=-2002
2021-03-12 10:24:18.813833 7f6b0c527700 2 req 10:0.001506:s3:HEAD /:stat_bucket:op status=0
2021-03-12 10:24:18.813848 7f6b0c527700 2 req 10:0.001520:s3:HEAD /:stat_bucket:http status=404
2021-03-12 10:24:18.813855 7f6b0c527700 1 ====== req done req=0x7f6b0c520f90 op status=0 http_status=404 ======
2021-03-12 10:24:18.813962 7f6b0c527700 1 civetweb: 0x557d45468000: 10.118.199.248 - - [12/Mar/2021:10:24:18 +0700] "HEAD / HTTP/1.1" 404 0 - curl/7.29.0
And I get this in the s3cmd verbose output:
DEBUG: s3cmd version 2.1.0
DEBUG: ConfigParser: Reading file '.s3cfg-testuser-http'
DEBUG: ConfigParser: access_key->29...17_chars...J
DEBUG: ConfigParser: secret_key->fK...37_chars...R
DEBUG: ConfigParser: host_base->cnamedhostname:80
DEBUG: ConfigParser: host_bucket->cnamedhostname:80/%(bucket)
DEBUG: ConfigParser: use_https->False
DEBUG: ConfigParser: signature_v2->True
DEBUG: Updating Config.Config cache_file ->
DEBUG: Updating Config.Config follow_symlinks -> False
DEBUG: Updating Config.Config verbosity -> 10
DEBUG: Unicodising 'ls' using UTF-8
DEBUG: Command: ls
DEBUG: CreateRequest: resource[uri]=/
DEBUG: Using signature v2
DEBUG: SignHeaders: u'GET\n\n\n\nx-amz-date:Fri, 12 Mar 2021 03:31:39 +0000\n/'
DEBUG: Processing request, please wait...
DEBUG: get_hostname(None): cnamedhostname
DEBUG: ConnMan.get(): creating new connection: http://cnamedhostname
DEBUG: non-proxied HTTPConnection(cnamedhostname, None)
DEBUG: format_uri(): /
DEBUG: Sending request method_string='GET', uri=u'/', headers={'Authorization': u'AWS 293WEU2ADWGIUO4RN39J:Q7kh7kzWXWSqMvUqqWwLOY6QKUE=', 'x-amz-date': 'Fri, 12 Mar 2021 03:31:39 +0000'}, body=(0 bytes)
DEBUG: ConnMan.put(): connection put back to pool (http://cnamedhostname#1)
DEBUG: Response:
{'data': '<?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code><RequestId>tx00000000000000000000b-00604ae09b-8e4fbd-default</RequestId><HostId>8e4fbd-default-default</HostId></Error>',
'headers': {'accept-ranges': 'bytes',
'content-length': '198',
'content-type': 'application/xml',
'date': 'Fri, 12 Mar 2021 03:31:39 GMT',
'x-amz-request-id': 'tx00000000000000000000b-00604ae09b-8e4fbd-default'},
'reason': 'Forbidden',
'status': 403}
DEBUG: S3Error: 403 (Forbidden)
DEBUG: HttpHeader: date: Fri, 12 Mar 2021 03:31:39 GMT
DEBUG: HttpHeader: content-length: 198
DEBUG: HttpHeader: x-amz-request-id: tx00000000000000000000b-00604ae09b-8e4fbd-default
DEBUG: HttpHeader: content-type: application/xml
DEBUG: HttpHeader: accept-ranges: bytes
DEBUG: ErrorXML: Code: 'SignatureDoesNotMatch'
DEBUG: ErrorXML: RequestId: 'tx00000000000000000000b-00604ae09b-8e4fbd-default'
DEBUG: ErrorXML: HostId: '8e4fbd-default-default'
ERROR: S3 error: 403 (SignatureDoesNotMatch)
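One thing I'm wondering about, though this is just my assumption: does RGW need to know the CNAMEd name so that the Host header isn't treated as a bucket name? I.e. something like this in ceph.conf on the RGW node (the section name is only an example):
[client.rgw.gateway]
rgw dns name = cnamedhostname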
Any idea?
Thank you
We're happy to announce the 17th backport release in the Nautilus
series. We recommend that users update to this release. For detailed
release notes with links and a changelog, please refer to the official blog
entry at https://ceph.io/releases/v14-2-17-nautilus-released
Notable Changes
---------------
* $pid expansion in config paths like `admin_socket` will now properly
expand to the daemon pid for commands like `ceph-mds` or `ceph-osd`.
Previously only `ceph-fuse`/`rbd-nbd` expanded `$pid` with the actual
daemon pid.
* RADOS: PG removal has been optimized in this release.
* RADOS: Memory allocations are tracked in finer detail in BlueStore and
displayed as a part of the ``dump_mempools`` command.
* cephfs: clients which acquire capabilities too quickly are throttled
to prevent instability. See new config option
``mds_session_cap_acquisition_throttle`` to control this behavior.
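  As a usage sketch, the new throttle can be adjusted like any other MDS
  option (the value shown is purely illustrative, not a recommendation):
  ceph config set mds mds_session_cap_acquisition_throttle 500000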
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-14.2.17.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 2e95b5d99e0dec516803c8a1b57fbd2c8f45fd63