Why you should wait to upgrade from Ceph Pacific to Quincy when using a non-root user
v16.2.7 -> v17.2.0
Intro
As the official Ceph blog says:
Quincy is the 17th stable release of Ceph. It is named after Squidward Quincy Tentacles from Spongebob Squarepants.
You can read about the changes since Pacific in that blog post, but there is one caveat. If you run Ceph Pacific with a non-root user (as we do with the cephadmin user) and decide to upgrade your cluster to Quincy, you will run into a problem.
I tested the upgrade path from Ceph Pacific 16.2.7 to Ceph Quincy 17.2.0 on a small test cluster. I simply bootstrapped a Ceph cluster with cephadm and made sure it had 3 MGRs, 3 MONs and 9 OSDs.
I run commands as the cephadmin user, which has passwordless SSH access to all nodes in the cluster and sudo rights.
All hosts have the _admin label. The _admin label makes cephadm maintain a copy of the ceph.conf file and a client.admin keyring file in /etc/ceph, and that is where the issue lies. First, though, I will walk through what needs to be done during the upgrade.
Bootstrap
sudo cephadm bootstrap --mon-ip 192.168.250.101 \
--ssh-user cephadmin \
--ssh-private-key /home/cephadmin/.ssh/id_rsa \
--ssh-public-key /home/cephadmin/.ssh/id_rsa.pub
The cluster needs to be in the HEALTH_OK state before the upgrade!
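The health requirement above can be turned into a small guard. This is only a sketch: the helper names `ceph_cmd`, `cluster_is_healthy` and `start_upgrade_if_healthy` are made up for illustration, and it assumes that `ceph health` prints a status line containing `HEALTH_OK` on a healthy cluster.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: run a ceph CLI command the same way this article does.
ceph_cmd() {
    sudo cephadm shell -- ceph "$@"
}

# Hypothetical helper: succeed only if `ceph health` reports HEALTH_OK.
# The pattern matches anywhere in the output, so extra lines printed by
# cephadm do not break the check.
cluster_is_healthy() {
    local status
    status="$(ceph_cmd health 2>/dev/null)" || return 1
    case "$status" in
        *HEALTH_OK*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Hypothetical helper: start the upgrade only on a healthy cluster.
start_upgrade_if_healthy() {
    local version="$1"
    if cluster_is_healthy; then
        ceph_cmd orch upgrade start --ceph-version "$version"
    else
        echo "Cluster is not HEALTH_OK, not starting the upgrade" >&2
        return 1
    fi
}
```

With these functions sourced, `start_upgrade_if_healthy 17.2.0` would run the upgrade command used later in this article only when the health check passes.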
Upgrade
Because the cluster was deployed with cephadm, the upgrade process is entirely automated.
To initiate the upgrade, we need to run
sudo cephadm shell -- ceph orch upgrade start --ceph-version 17.2.0
and we can watch the upgrade process with
sudo cephadm shell -- ceph -W cephadm
The output of ceph -W cephadm:
Inferring fsid fe3bbdd0-778b-11ec-9739-3bc8299859eb
Using recent ceph image quay.io/ceph/ceph@sha256:6f2e9e45515e003fb332bbf9302c55d604810ff35978e88b75fe005a5f470f41
cluster:
id: fe3bbdd0-778b-11ec-9739-3bc8299859eb
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 (age 13m)
mgr: ceph-01.ksjkto(active, since 12m), standbys: ceph-03.wzuhyf, ceph-02.rcxyiz
osd: 9 osds: 9 up (since 12m), 9 in (since 3h)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 51 MiB used, 180 GiB / 180 GiB avail
pgs: 1 active+clean
progress:
Upgrade to 17.2.0 (3s)
[............................]
2022-05-04T11:43:04.882601+0000 mgr.ceph-01.ksjkto [INF] Upgrade: Need to upgrade myself (mgr.ceph-01.ksjkto)
2022-05-04T11:43:06.579199+0000 mgr.ceph-01.ksjkto [INF] Upgrade: Updating mgr.ceph-03.wzuhyf
2022-05-04T11:43:06.645322+0000 mgr.ceph-01.ksjkto [INF] Deploying daemon mgr.ceph-03.wzuhyf on ceph-03
2022-05-04T11:43:16.444486+0000 mgr.ceph-01.ksjkto [INF] Upgrade: Need to upgrade myself (mgr.ceph-01.ksjkto)
2022-05-04T11:43:16.457313+0000 mgr.ceph-01.ksjkto [INF] Failing over to other MGR
2022-05-04T11:43:32.085494+0000 mgr.ceph-03.wzuhyf [INF] Deploying cephadm binary to ceph-03
2022-05-04T11:43:32.298550+0000 mgr.ceph-03.wzuhyf [INF] Deploying cephadm binary to ceph-02
2022-05-04T11:43:32.729798+0000 mgr.ceph-03.wzuhyf [INF] Deploying cephadm binary to ceph-01
2022-05-04T11:43:36.512301+0000 mgr.ceph-03.wzuhyf [INF] [04/May/2022:11:43:36] ENGINE Bus STARTING
2022-05-04T11:43:36.666814+0000 mgr.ceph-03.wzuhyf [INF] [04/May/2022:11:43:36] ENGINE Serving on https://192.168.250.103:7150
2022-05-04T11:43:36.667079+0000 mgr.ceph-03.wzuhyf [INF] [04/May/2022:11:43:36] ENGINE Bus STARTED
2022-05-04T11:43:50.861792+0000 mgr.ceph-03.wzuhyf [INF] Updating ceph-03:/etc/ceph/ceph.conf
2022-05-04T11:43:51.011186+0000 mgr.ceph-03.wzuhyf [ERR] Unable to write ceph-03:/etc/ceph/ceph.conf: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in _write_remote_file
await asyncssh.scp(f.name, (conn, tmp_path))
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
await source.run(srcpath)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
self.handle_error(exc)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
raise exc from None
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
await self._send_files(path, b'')
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files
self.handle_error(exc)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
raise exc from None
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files
await self._send_file(srcpath, dstpath, attrs)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file
await self._make_cd_request(b'C', attrs, size, srcpath)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request
self._fs.basename(path))
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request
raise exc
asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
2022-05-04T11:43:51.018169+0000 mgr.ceph-03.wzuhyf [ERR] executing refresh((['ceph-01', 'ceph-02', 'ceph-03'],)) failed.
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in _write_remote_file
await asyncssh.scp(f.name, (conn, tmp_path))
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
await source.run(srcpath)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
self.handle_error(exc)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
raise exc from None
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
await self._send_files(path, b'')
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files
self.handle_error(exc)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
raise exc from None
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files
await self._send_file(srcpath, dstpath, attrs)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file
await self._make_cd_request(b'C', attrs, size, srcpath)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request
self._fs.basename(path))
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request
raise exc
asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
As we can see, there is an error
scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
which basically means that if the default SSH user is not root, cephadm may be running this step with insufficient permissions.
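To see whether a cluster falls into this case, one can ask cephadm which SSH user it connects with. The sketch below assumes the `ceph cephadm get-user` command is available in your release and prints the user name as its last line of output; `ceph_cmd` and `uses_non_root_ssh_user` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: run a ceph CLI command the same way this article does.
ceph_cmd() {
    sudo cephadm shell -- ceph "$@"
}

# Hypothetical helper: return 0 if cephadm connects as a non-root user,
# 1 if it connects as root, and 2 if the user could not be determined.
uses_non_root_ssh_user() {
    local user
    # Assumption: the user name is the last line of the command output.
    user="$(ceph_cmd cephadm get-user 2>/dev/null | tail -n 1)"
    if [ -z "$user" ]; then
        echo "Could not determine the cephadm SSH user" >&2
        return 2
    fi
    [ "$user" != "root" ]
}
```

A cluster bootstrapped with `--ssh-user cephadmin`, as in this article, should make `uses_non_root_ssh_user` succeed, which is the setup in which the error above appeared.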
The suggested workaround is to remove the _admin label from the hosts, which removes ceph.conf and ceph.client.admin.keyring from the /etc/ceph directory, but then we can't use CLI commands.
We can back up the ceph.* files beforehand and then copy them back into the directory so that we can still run CLI commands and upgrade the Ceph cluster. This actually works, because no task during the upgrade places configuration files on the hosts, and during the upgrade we only see a message like
2022-05-04T09:12:16.213318+0000 mgr.ceph-01.ksjkto [WRN] unable to calc client keyring client.admin placement PlacementSpec(label='_admin'): Cannot place <ServiceSpec for service_name=mon>: No matching hosts for label _admin
but the upgrade will end with
2022-05-04T09:25:23.775510+0000 mgr.ceph-01.ksjkto [INF] Upgrade: Complete!
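The workaround described above could be scripted roughly as follows. This is a sketch, not a tested procedure: the backup directory `$HOME/ceph-backup` and the helper names are assumptions made for illustration, while `ceph orch host label rm` is the orchestrator command for removing a host label.

```shell
#!/usr/bin/env bash

# Prefix for privileged commands; set SUDO="" when already running as root.
SUDO="${SUDO-sudo}"

# Hypothetical wrapper: run a ceph CLI command the same way this article does.
ceph_cmd() {
    $SUDO cephadm shell -- ceph "$@"
}

# Copy ceph.conf and the keyring files to a backup directory.
# Both paths are parameters; the default backup location is an assumption.
backup_ceph_files() {
    local src="${1:-/etc/ceph}" dst="${2:-$HOME/ceph-backup}"
    $SUDO mkdir -p "$dst" && $SUDO cp -a "$src"/ceph.* "$dst"/
}

# Copy the backed-up files back once cephadm has removed the originals.
restore_ceph_files() {
    local src="${1:-$HOME/ceph-backup}" dst="${2:-/etc/ceph}"
    $SUDO mkdir -p "$dst" && $SUDO cp -a "$src"/ceph.* "$dst"/
}

# Remove the _admin label from every host given as an argument.
remove_admin_label() {
    local host
    for host in "$@"; do
        ceph_cmd orch host label rm "$host" _admin
    done
}
```

On each host, the order would be `backup_ceph_files`, then `remove_admin_label ceph-01 ceph-02 ceph-03` from one node, then `restore_ceph_files` after cephadm has cleaned up /etc/ceph, and only then the upgrade. From that point on, cephadm no longer manages the restored files.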
After the upgrade, all components are upgraded, but if we decide to put the _admin label back on a host, we get an error in the log like
[ERR] executing refresh((['ceph-01', 'ceph-02', 'ceph-03'],)) failed.
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in _write_remote_file
await asyncssh.scp(f.name, (conn, tmp_path))
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
await source.run(srcpath)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
self.handle_error(exc)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
raise exc from None
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
await self._send_files(path, b'')
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files
self.handle_error(exc)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
raise exc from None
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files
await self._send_file(srcpath, dstpath, attrs)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file
await self._make_cd_request(b'C', attrs, size, srcpath)
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request
self._fs.basename(path))
File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request
raise exc
asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/utils.py", line 76, in do_work
return f(*arg)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 265, in refresh
self._write_client_files(client_files, host)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1052, in _write_client_files
self.mgr.ssh.write_remote_file(host, path, content, mode, uid, gid)
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 238, in write_remote_file
host, path, content, mode, uid, gid, addr))
File "/usr/share/ceph/mgr/cephadm/module.py", line 569, in wait_async
return self.event_loop.get_result(coro)
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 48, in get_result
return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 226, in _write_remote_file
raise OrchestratorError(msg)
orchestrator._interface.OrchestratorError: Unable to write ceph-01:/etc/ceph/ceph.conf: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
and we need to maintain the ceph.* config files without cephadm, which is not ideal.
Conclusion
We need to wait.
Here is the bug report, and there is a PR to solve this issue. The PR is part of another PR. I hope a backport release will come out soon so that we can upgrade to Quincy without issues.