
MoveComms support for Northward Datacentre Switchover (March 2024)
Closed, Resolved, Public

Description

Dear MoveComms,

We are planning a datacentre switchover for the week of March 18th (week 12) with the following schedule:

The expected impact is 2-3 minutes of read-only on Wednesday, 20 March 2024 @ 14:00 UTC.

Note that we are implementing the changes described in Recurring, Equinox-based, Data Center Switchovers, in particular:

  • There is no switchback! We are staying in eqiad until the next switchover.
  • Future switchovers are predictable and take place every 6 months; always on the week of an equinox.

Let serviceops know if you need more info on the changes.

Thank you!

Planning

As soon as the task is received by CRS

  • Ask SRE if anything major changed since the last time (noticeable things that are worth being announced).
    • If yes, update the process or the message.
  • Confirm when the wikis will be in read-only
  • Add the date to Asana, with a link to this task https://app.asana.com/0/1176034857101033/1206737712150740/f

Three weeks before (Feb 29)

  • Tech News message (initial warning) - done
  • Update the message to communities
    • Check on dates and links
  • Have the message translated by emailing both:
    • translators-l for translations
    • wikitech-ambassadors for information and translations
  • Monitor the message's talk page

Two weeks before (March 4)

The week before (March 11)

    • Emailing mailing lists:
      • wikitech-ambassadors and translators-l once again (reminders)
      • wmfall (done by SRE)
    • Added to the news on the Meta front page - diff
    • Send the message to communities
  • Tech News (add to the upcoming issue "this is happening this week")

The week it happens (March 18)

  • Monitor the wikis

The week after (March 25)

  • Debrief how it went, and document (comment on this task) - done in advance

Event Timeline

Trizek-WMF updated the task description.

Thank you @jijiki! We will start the process very soon.

Trizek-WMF changed the task status from Open to In Progress. Feb 29 2024, 2:37 PM
Trizek-WMF triaged this task as High priority.
Trizek-WMF updated the task description.
Trizek-WMF moved this task from To Triage to In current Tech/News draft on the User-notice board.

@jijiki, I'll be your host on this journey.

Have you changed anything major/noticeable compared to the previous read-only time?

Can you carefully read the message I plan to send to communities, please? https://meta.wikimedia.org/wiki/Tech/Server_switch Maybe some items listed there are not relevant anymore.

@Trizek-WMF as per our off-phabricator discussion, the major change is that this is not a procedure we test anymore, but it has become standard practice. Please edit the message as you see fit to reflect that.

Trizek-WMF updated the task description.

The message was updated to remove the idea of a test. As everything is okay, I can continue with the next steps.

sgrabarczuk renamed this task from CommRel support for Northward Datacentre Switchover (March 2024) to MoveComms support for Northward Datacentre Switchover (March 2024). Mar 14 2024, 10:34 AM
sgrabarczuk updated the task description.

The few reactions I observed from communities came from users who thanked me for the information message I sent last Friday.

Debrief with @jijiki:

The read-only period can happen at any point between 14:00 and 14:30, the time window allocated by SRE, depending on how smoothly the process goes. As a consequence, the banner shown on wikis should stay up longer: at the moment it is displayed from 13:30 to 14:01, but it should be scheduled to last until 14:30. SRE and Movement Communication then reach out to each other to confirm that the read-only period is over, and the banner is deactivated.

I updated the documentation accordingly.
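
As a rough illustration of the timing issue from the debrief, here is a minimal Python sketch that checks whether a banner schedule covers the whole read-only window. The times come from the comment above; the constant names and the `banner_covers_window` helper are hypothetical and not tied to any real banner tool or its configuration.

```python
from datetime import time

# Times taken from the debrief (assumed to be UTC, like the announced 14:00 UTC).
# All names below are hypothetical, for illustration only.
BANNER_START = time(13, 30)
OLD_BANNER_END = time(14, 1)    # schedule used for this switchover
NEW_BANNER_END = time(14, 30)   # schedule suggested in the debrief
READ_ONLY_WINDOW = (time(14, 0), time(14, 30))  # window allocated by SRE


def banner_covers_window(banner_start, banner_end, window):
    """Return True if the banner is displayed for the entire window."""
    window_start, window_end = window
    return banner_start <= window_start and banner_end >= window_end


old_ok = banner_covers_window(BANNER_START, OLD_BANNER_END, READ_ONLY_WINDOW)
new_ok = banner_covers_window(BANNER_START, NEW_BANNER_END, READ_ONLY_WINDOW)
print(f"old schedule covers window: {old_ok}")
print(f"new schedule covers window: {new_ok}")
```

With the old end time, the banner could disappear before a read-only period that starts late in the window; ending it at 14:30 and deactivating it manually once SRE confirms avoids that gap.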

Trizek-WMF updated the task description.