Now that the weekend and the RD-maintenance-pocalypse are over, we've lined up some upgrades to supporting elements of our storage layer for this week, while the platform is quiet(er). Due to the nature of the upgrades, it's necessary to either restart or fully shut down user pods after each update, during the 1-hour elf-glowup maintenance window.
Here's the plan for this week, in US time:
Monday 16 Jun - Decypharr replaces Zurg (for all Aarr users)
Lots of users have already opted into the "Decypharr Replaces Zurg" early trial, and the results have been positive and stable enough that we're now ready to transition all remaining Zurg users to Decypharr by default. If you've not yet opted in, or not yet configured your Decypharr at all, we'll handle this automatically - grabbing your RD token from your Zurg config file and inserting it into your Decypharr config.
To force a consistent and clean environment, all your pods will restart as a result of this change.
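For the curious, the automated migration amounts to something like the sketch below. The paths and the Decypharr field names here are illustrative assumptions, not the exact layout we use - it's just to show the shape of the change:

```python
# Rough sketch of the token migration, assuming Zurg keeps the token under a
# top-level "token:" key in config.yml, and that Decypharr stores debrid
# credentials in a JSON config. Paths and field names are illustrative only.
import json
import yaml  # pip install pyyaml

ZURG_CONFIG = "/zurg/config.yml"             # hypothetical path
DECYPHARR_CONFIG = "/decypharr/config.json"  # hypothetical path

def migrate_rd_token(zurg_path: str = ZURG_CONFIG,
                     decypharr_path: str = DECYPHARR_CONFIG) -> None:
    """Copy the Real-Debrid token from the Zurg config into the Decypharr config."""
    with open(zurg_path) as f:
        zurg = yaml.safe_load(f)
    token = zurg["token"]  # Zurg's RD API token

    with open(decypharr_path) as f:
        decypharr = json.load(f)

    # Illustrative field names - the real Decypharr schema may differ.
    for debrid in decypharr.get("debrids", []):
        if debrid.get("name") == "realdebrid":
            debrid["api_key"] = token

    with open(decypharr_path, "w") as f:
        json.dump(decypharr, f, indent=2)

if __name__ == "__main__":
    migrate_rd_token()
```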
(Tuesday's rclone update - see below - will also be applied to the US cluster a day early, as a "canary test")
Tuesday 17 Jun - rclone update from v1.60 to v1.67/1.68
This is a platform-level update - the pods we use to mount your remote volumes (RD and otherwise) are running an older version of rclone (v1.60). We'll upgrade these to the latest stable version (final testing is currently underway), and once again restart all your pods afterwards to force a fresh mount.
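If you'd like to confirm the upgrade has landed once your pods restart, a quick version check along these lines will do it (a minimal sketch, assuming the `rclone` binary is on the PATH where you run it):

```python
# Quick post-upgrade check: confirm the rclone binary reports v1.67 or newer.
# The first line of `rclone version` output looks like "rclone v1.67.0".
import re
import subprocess

def rclone_version() -> tuple[int, ...]:
    out = subprocess.run(["rclone", "version"],
                         capture_output=True, text=True, check=True)
    match = re.search(r"rclone v(\d+)\.(\d+)\.(\d+)", out.stdout)
    if not match:
        raise RuntimeError(f"Couldn't parse rclone version from: {out.stdout!r}")
    return tuple(int(x) for x in match.groups())

if __name__ == "__main__":
    version = rclone_version()
    assert version >= (1, 67, 0), f"Still on old rclone: {version}"
    print(f"rclone {'.'.join(map(str, version))} - upgrade applied")
```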
(Wednesday's TopoLVM update - see below - will also be applied to the US cluster a day early, as a "canary test")
Wednesday 18 Jun - TopoLVM upgrade
TopoLVM provides local storage for your pods. The last time (Nov 2024) we attempted a TopoLVM upgrade to align with our Kubernetes version, some users experienced data loss as volumes were recreated, and we rolled back rather than trying to debug "live".
This issue is believed to have been caused by an upstream bug which has since been resolved (we've been unable to reproduce it), but out of an abundance of caution, we're going to shut down all user pods while performing the upgrade this time. Given the scale of the shutdown, and the need to confirm safety and stability after the upgrade, this particular elf-glowup may extend beyond the usual 1-hour window.
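Conceptually, the kind of post-upgrade safety check involved looks something like the sketch below (using the Kubernetes Python client; the storage class name is an assumption for illustration, not necessarily what's running in production):

```python
# Conceptual post-upgrade check: confirm every TopoLVM-backed PVC is still
# Bound before user pods are restarted. Uses the official Kubernetes Python
# client ("pip install kubernetes"); the storage class name is illustrative.
from kubernetes import client, config

TOPOLVM_STORAGE_CLASS = "topolvm-provisioner"  # assumed name

def unbound_topolvm_pvcs() -> list[str]:
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    problems = []
    for pvc in v1.list_persistent_volume_claim_for_all_namespaces().items:
        if pvc.spec.storage_class_name != TOPOLVM_STORAGE_CLASS:
            continue
        if pvc.status.phase != "Bound":
            problems.append(
                f"{pvc.metadata.namespace}/{pvc.metadata.name}: {pvc.status.phase}")
    return problems

if __name__ == "__main__":
    broken = unbound_topolvm_pvcs()
    if broken:
        print("PVCs not Bound after upgrade:")
        print("\n".join(broken))
    else:
        print("All TopoLVM PVCs are Bound - safe to restart user pods")
```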
If we encounter issues along the way, we may defer or delay the subsequent upgrades, but the goal is to have everything completed well in advance of next weekend, and to use the US cluster (whose smaller "blast radius" means issues will be quicker to detect and resolve) to canary-test the platform-level updates before applying them to the default DE cluster.