Why you should wait before upgrading Ceph Pacific with a non-root user to Quincy

v16.2.7 -> v17.2.0

Intro

As the official Ceph blog says:

Quincy is the 17th stable release of Ceph. It is named after Squidward Quincy Tentacles from Spongebob Squarepants.

You can read about the changes since Pacific in that blog post, but there is one caveat: if you run Ceph Pacific with a non-root user, as we do with the cephadmin user, and you decide to upgrade your cluster to Quincy, you will run into a problem.

I tested the upgrade path from Ceph Pacific 16.2.7 to Ceph Quincy 17.2.0 on a small test cluster. I bootstrapped a Ceph cluster with cephadm and made sure it had 3 MGRs, 3 MONs and 9 OSDs.

I run commands as the cephadmin user, which has passwordless SSH access to all nodes in the cluster and sudo rights.

All hosts have the _admin label. The _admin label makes cephadm maintain a copy of the ceph.conf file and a client.admin keyring file in /etc/ceph, and that is where the issue lies. First, though, I will walk through what needs to be done during the upgrade.

Bootstrap

sudo cephadm bootstrap --mon-ip 192.168.250.101 \
--ssh-user cephadmin \
--ssh-private-key /home/cephadmin/.ssh/id_rsa \
--ssh-public-key /home/cephadmin/.ssh/id_rsa.pub

The cluster needs to be in the HEALTH_OK state before the upgrade!
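A health check like this can be scripted as a gate before anything else happens. The following is only a minimal sketch: the function names is_health_ok and require_health_ok are hypothetical, and it assumes that `ceph health` is run through `cephadm shell` the same way as the other commands in this article.

```shell
#!/bin/sh
# Sketch of a pre-upgrade gate: succeed only if the cluster reports HEALTH_OK.

# Returns 0 when any line of the given text starts with HEALTH_OK.
is_health_ok() {
    printf '%s\n' "$1" | grep -q '^HEALTH_OK'
}

# Hypothetical helper: asks the cluster for its health and applies the check.
# stderr is discarded on the assumption that cephadm's own informational
# messages go there; the grep above also tolerates extra lines on stdout.
require_health_ok() {
    status="$(sudo cephadm shell -- ceph health 2>/dev/null)"
    if is_health_ok "$status"; then
        echo "Cluster is healthy, safe to start the upgrade."
    else
        echo "Cluster reports: ${status:-no answer}. Fix this before upgrading." >&2
        return 1
    fi
}
```

Calling require_health_ok on a node with cephadm installed returns a non-zero exit code for anything other than HEALTH_OK, so it can be chained in front of other commands with `&&`.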

Upgrade

Because the cluster was deployed with cephadm, the upgrade process is entirely automated.

To initiate the upgrade, we need to run

sudo cephadm shell -- ceph orch upgrade start --ceph-version 17.2.0

and we can watch the upgrade process with

sudo cephadm shell -- ceph -W cephadm

The output of ceph -W cephadm:

Inferring fsid fe3bbdd0-778b-11ec-9739-3bc8299859eb
Using recent ceph image quay.io/ceph/ceph@sha256:6f2e9e45515e003fb332bbf9302c55d604810ff35978e88b75fe005a5f470f41
  cluster:
    id:     fe3bbdd0-778b-11ec-9739-3bc8299859eb
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 (age 13m)
    mgr: ceph-01.ksjkto(active, since 12m), standbys: ceph-03.wzuhyf, ceph-02.rcxyiz
    osd: 9 osds: 9 up (since 12m), 9 in (since 3h)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   51 MiB used, 180 GiB / 180 GiB avail
    pgs:     1 active+clean

  progress:
    Upgrade to 17.2.0 (3s)
      [............................]


2022-05-04T11:43:04.882601+0000 mgr.ceph-01.ksjkto [INF] Upgrade: Need to upgrade myself (mgr.ceph-01.ksjkto)
2022-05-04T11:43:06.579199+0000 mgr.ceph-01.ksjkto [INF] Upgrade: Updating mgr.ceph-03.wzuhyf
2022-05-04T11:43:06.645322+0000 mgr.ceph-01.ksjkto [INF] Deploying daemon mgr.ceph-03.wzuhyf on ceph-03
2022-05-04T11:43:16.444486+0000 mgr.ceph-01.ksjkto [INF] Upgrade: Need to upgrade myself (mgr.ceph-01.ksjkto)
2022-05-04T11:43:16.457313+0000 mgr.ceph-01.ksjkto [INF] Failing over to other MGR
2022-05-04T11:43:32.085494+0000 mgr.ceph-03.wzuhyf [INF] Deploying cephadm binary to ceph-03
2022-05-04T11:43:32.298550+0000 mgr.ceph-03.wzuhyf [INF] Deploying cephadm binary to ceph-02
2022-05-04T11:43:32.729798+0000 mgr.ceph-03.wzuhyf [INF] Deploying cephadm binary to ceph-01
2022-05-04T11:43:36.512301+0000 mgr.ceph-03.wzuhyf [INF] [04/May/2022:11:43:36] ENGINE Bus STARTING
2022-05-04T11:43:36.666814+0000 mgr.ceph-03.wzuhyf [INF] [04/May/2022:11:43:36] ENGINE Serving on https://192.168.250.103:7150
2022-05-04T11:43:36.667079+0000 mgr.ceph-03.wzuhyf [INF] [04/May/2022:11:43:36] ENGINE Bus STARTED
2022-05-04T11:43:50.861792+0000 mgr.ceph-03.wzuhyf [INF] Updating ceph-03:/etc/ceph/ceph.conf
2022-05-04T11:43:51.011186+0000 mgr.ceph-03.wzuhyf [ERR] Unable to write ceph-03:/etc/ceph/ceph.conf: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in _write_remote_file
    await asyncssh.scp(f.name, (conn, tmp_path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
    await source.run(srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
    await self._send_files(path, b'')
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files
    await self._send_file(srcpath, dstpath, attrs)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file
    await self._make_cd_request(b'C', attrs, size, srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request
    self._fs.basename(path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request
    raise exc
asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
2022-05-04T11:43:51.018169+0000 mgr.ceph-03.wzuhyf [ERR] executing refresh((['ceph-01', 'ceph-02', 'ceph-03'],)) failed.
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in _write_remote_file
    await asyncssh.scp(f.name, (conn, tmp_path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
    await source.run(srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
    await self._send_files(path, b'')
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files
    await self._send_file(srcpath, dstpath, attrs)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file
    await self._make_cd_request(b'C', attrs, size, srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request
    self._fs.basename(path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request
    raise exc
asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied

As we can see, there is an error

scp: /tmp/etc/ceph/ceph.conf.new: Permission denied

which basically means that if the default SSH user is not root, this step may be running with insufficient permissions.
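To see which hosts and files are affected, the failed writes can be filtered out of the cephadm log. This is a minimal sketch: the function name failed_client_files is hypothetical, and it relies only on the "Unable to write ... Permission denied" message format visible in the log above.

```shell
#!/bin/sh
# Sketch: list the client files cephadm could not write.
# Reads log text on stdin and prints each distinct "host:path" target
# taken from "Unable to write <host>:<path>: scp: ... Permission denied".
failed_client_files() {
    sed -n 's/.*Unable to write \([^ ]*\): scp: .*Permission denied.*/\1/p' | sort -u
}
```

It could be fed from the recent cephadm log, for example with `sudo cephadm shell -- ceph log last cephadm | failed_client_files`.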

The suggested workaround is to remove the _admin label from the hosts, which removes ceph.conf and ceph.client.admin.keyring from the /etc/ceph directory, but then we can't use CLI commands.

We can back up the ceph.* files and then copy them back into /etc/ceph, so that we can still run CLI commands and upgrade the Ceph cluster. This actually works, because no task during the upgrade places configuration files on the hosts, and we only see a message like this during the upgrade

2022-05-04T09:12:16.213318+0000 mgr.ceph-01.ksjkto [WRN] unable to calc client keyring client.admin placement PlacementSpec(label='_admin'): Cannot place <ServiceSpec for service_name=mon>: No matching hosts for label _admin

but the upgrade will end with

2022-05-04T09:25:23.775510+0000 mgr.ceph-01.ksjkto [INF] Upgrade: Complete!
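The workaround described above, plus a check that every daemon really ended up on the new release, could be sketched roughly as follows. Everything here is an assumption for illustration: the host list, the BACKUP_DIR path and the helper names run, workaround_plan and all_on_version are hypothetical, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical defaults; adjust to your own cluster.
HOSTS="${HOSTS:-ceph-01 ceph-02 ceph-03}"
BACKUP_DIR="${BACKUP_DIR:-/home/cephadmin/ceph-etc-backup}"

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

workaround_plan() {
    for host in $HOSTS; do
        # 1. Keep copies of the files that disappear with the label.
        run ssh "$host" "mkdir -p $BACKUP_DIR && sudo cp -a /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring $BACKUP_DIR/"
    done
    for host in $HOSTS; do
        # 2. Drop the label so cephadm stops managing /etc/ceph on the host.
        run sudo cephadm shell -- ceph orch host label rm "$host" _admin
    done
    for host in $HOSTS; do
        # 3. Put the copies back so the CLI keeps working.
        run ssh "$host" "sudo cp -a $BACKUP_DIR/ceph.conf $BACKUP_DIR/ceph.client.admin.keyring /etc/ceph/"
    done
}

# Reads `ceph versions` output on stdin; succeeds only if the single
# version string found in it is the expected one ($1).
all_on_version() {
    [ "$(grep -o 'ceph version [0-9][0-9.]*' | sort -u)" = "ceph version $1" ]
}
```

Running workaround_plan prints the plan; setting DRY_RUN=0 would execute it. Step 3 should probably wait until cephadm has actually removed the files, otherwise the restored copies may be removed again. After the upgrade, `sudo cephadm shell -- ceph versions | all_on_version 17.2.0` gives a quick yes/no answer.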

After the upgrade, all components are upgraded, but if we decide to put the _admin label back on a host, we get an error in the log like

[ERR] executing refresh((['ceph-01', 'ceph-02', 'ceph-03'],)) failed.
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in _write_remote_file
    await asyncssh.scp(f.name, (conn, tmp_path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
    await source.run(srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
    await self._send_files(path, b'')
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files
    await self._send_file(srcpath, dstpath, attrs)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file
    await self._make_cd_request(b'C', attrs, size, srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request
    self._fs.basename(path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request
    raise exc
asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 76, in do_work
    return f(*arg)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 265, in refresh
    self._write_client_files(client_files, host)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1052, in _write_client_files
    self.mgr.ssh.write_remote_file(host, path, content, mode, uid, gid)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 238, in write_remote_file
    host, path, content, mode, uid, gid, addr))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 569, in wait_async
    return self.event_loop.get_result(coro)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 48, in get_result
    return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
  File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 226, in _write_remote_file
    raise OrchestratorError(msg)
orchestrator._interface.OrchestratorError: Unable to write ceph-01:/etc/ceph/ceph.conf: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied

and we would need to maintain the ceph.* config files without cephadm, which is not ideal.

Conclusion

We need to wait.

Here is the bug report, and there is a PR to solve this issue. The PR is part of another PR. I hope that a backport release will come out soon, so that we can upgrade to Quincy without issues.