
Fix database master queries from HTTP GET/HEAD before active-active multi-dc
Open, Medium, Public

Description

Most of MediaWiki was written with the assumption that MediaWiki and its data stores are collocated and connected via reliable, low-latency network links. Until recently, MediaWiki had minimal facilities for maintaining consistency and partition tolerance across wide-area network links. As a result, although the Wikimedia Foundation operates data centers in multiple locations, we only run MediaWiki in one location at any one time.

This has several practical consequences. First, we are not as fault-tolerant as we'd like to be. We have a secondary datacenter with enough capacity to serve our traffic in case our primary datacenter goes down, but it is in cold standby, meaning it takes some time (and some manual effort) to get it running. Second, site performance is poor for logged-in users who are geographically remote from Ashburn, Virginia, due to the time it takes to transmit and receive data across long-distance links. Third, even in some basic cases, like parsing pages, the master database must be up, which makes it a single point of failure (SPOF).

It's going to take a lot of work to fix this completely, but we are getting closer to being able to serve some traffic from a secondary datacenter. Specifically, we would like to serve "reads" -- requests that don't require a master database connection -- from a secondary datacenter.

In order to serve reads from a different datacenter, we need to be able to predict which incoming requests will modify data, so that we can route them accordingly. We need to be able to make this determination at the edge -- i.e., the outermost layers of the infrastructure, so it cannot be complicated or slow.

The solution we have is to use the HTTP request method (T91820): GETs/HEADs are read-only, while POSTs are not. This was already true for most cases, but there is a long tail of actions with side effects that are done via GET, such as purge, rollback, and markpatrolled.
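As a rough illustration of the method-based rule, here is a minimal Python sketch. It is not the actual edge configuration: the function name `choose_datacenter` and the datacenter labels are hypothetical, and the only logic taken from this task is "GET/HEAD are reads, everything else may write".

```python
# Hypothetical sketch: names and datacenter labels are illustrative only.
READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


def choose_datacenter(method: str, nearest: str, primary: str) -> str:
    """Return the datacenter that should handle a request.

    Only the HTTP method is inspected, so the check stays cheap enough
    to run at the edge.
    """
    if method.upper() in READ_ONLY_METHODS:
        return nearest  # reads can be served from the closest datacenter
    return primary  # anything that may write goes to the primary


if __name__ == "__main__":
    for m in ("GET", "HEAD", "POST"):
        print(m, "->", choose_datacenter(m, nearest="dc-b", primary="dc-a"))
```

The rule is only safe if GET/HEAD handlers really are free of master writes, which is what the rest of this task is about.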

This task mostly involves fixing DBPerformance log warnings. Warnings can be dealt with by:
a) Changing DB master reads to use DB slaves
b) Moving the database updates to POST requests, the jobqueue, or at least to post-send updates via DeferredUpdates
c) Disabling warnings for a few exceptional cases like CentralAuth.
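Options (a) and (b) can be sketched together in Python. This is a toy model, not MediaWiki code: `DeferredUpdateQueue` and `handle_get` are hypothetical names, and plain dictionaries stand in for the replica and master databases. The GET handler reads from the replica and queues its side effect to run after the response has been sent.

```python
from typing import Callable, List


class DeferredUpdateQueue:
    """Hypothetical queue of callbacks to run after the response is sent."""

    def __init__(self) -> None:
        self._pending: List[Callable[[], None]] = []

    def add(self, update: Callable[[], None]) -> None:
        """Queue an update instead of running it immediately."""
        self._pending.append(update)

    def run_all(self) -> int:
        """Run and clear every pending update; return how many ran."""
        ran = 0
        while self._pending:
            self._pending.pop(0)()
            ran += 1
        return ran


def handle_get(queue: DeferredUpdateQueue, replica: dict, primary: dict, key: str) -> str:
    """Serve a read from the replica and defer the side effect."""
    body = replica.get(key, "")
    # The write is queued rather than executed before the response is built.
    queue.add(lambda: primary.__setitem__("last_viewed", key))
    return body
```

In this model the caller would send `body` to the client first and call `queue.run_all()` afterwards. The write still reaches the master from the same request, which is presumably why the list above ranks post-send updates after POST requests and the jobqueue.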

See +channel:DBPerformance on logstash.wikimedia.org

Most of these warnings are writes or master queries on HTTP GET requests, which would be cross-DC in an active-active setup for some users. Ideally we could eventually get these to zero.
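To make the kind of warning concrete, the following Python sketch flags master queries issued during a GET/HEAD request. It is an assumption about the general shape of such a check, not how MediaWiki produces these logs: the `QueryAudit` class and its `record` method are hypothetical, and only the `DBPerformance` channel name comes from this task.

```python
import logging
from typing import List

logger = logging.getLogger("DBPerformance")  # channel name taken from the task text


class QueryAudit:
    """Hypothetical per-request audit of master-database usage."""

    def __init__(self, method: str) -> None:
        self.method = method.upper()
        self.violations: List[str] = []

    def record(self, sql: str, on_master: bool) -> None:
        """Note a query; flag it if it hits the master during GET/HEAD."""
        if on_master and self.method in ("GET", "HEAD"):
            self.violations.append(sql)
            logger.warning("master query on %s: %s", self.method, sql)
```

Getting the warnings "to zero" in this model means `violations` stays empty for every GET/HEAD request.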

Details

Repo Branch Lines +/-
mediawiki/core master +41 -53
mediawiki/core master +19 -27
mediawiki/core master +17 -8
mediawiki/extensions/NewUserMessage master +45 -3
mediawiki/extensions/CentralAuth master +8 -3
mediawiki/extensions/CentralAuth master +1 -1
mediawiki/core master +19 -7
mediawiki/extensions/CentralAuth master +3 -3
mediawiki/extensions/Translate master +1 -1
mediawiki/extensions/PageTriage master +6 -0
mediawiki/extensions/CentralAuth master +7 -3
mediawiki/core master +8 -3
mediawiki/extensions/CentralAuth master +9 -8
mediawiki/core master +77 -23
mediawiki/extensions/CentralAuth master +25 -14
mediawiki/core master +22 -1
mediawiki/extensions/CentralAuth REL1_27 +3 -2
mediawiki/extensions/CentralAuth master +3 -2
mediawiki/extensions/CentralAuth REL1_27 +6 -5
mediawiki/extensions/CentralAuth master +6 -5
mediawiki/extensions/Math master +10 -8
mediawiki/core master +5 -1
mediawiki/extensions/CentralAuth master +29 -27
mediawiki/extensions/FlaggedRevs master +0 -1
mediawiki/core master +33 -21
mediawiki/extensions/LiquidThreads master +1 -1
mediawiki/extensions/Wikibase master +5 -2
mediawiki/extensions/CentralAuth wmf/1.28.0-wmf.2 +3 -2
mediawiki/extensions/EducationProgram master +4 -1
mediawiki/extensions/CentralAuth master +5 -6
mediawiki/extensions/Translate master +10 -9
mediawiki/extensions/LiquidThreads master +4 -1
mediawiki/extensions/FlaggedRevs master +2 -1
mediawiki/extensions/FlaggedRevs master +29 -27
mediawiki/core master +3 -1
mediawiki/extensions/LiquidThreads master +3 -1
mediawiki/core master +4 -4
mediawiki/core master +4 -1
mediawiki/extensions/AbuseFilter master +3 -1
mediawiki/extensions/CentralAuth master +9 -11
mediawiki/extensions/CentralAuth master +11 -2
mediawiki/extensions/CentralAuth master +9 -10
mediawiki/extensions/VisualEditor master +3 -1
mediawiki/core master +4 -2
mediawiki/extensions/VisualEditor master +1 -1
mediawiki/extensions/Translate master +25 -10
mediawiki/core master +3 -7
mediawiki/extensions/LiquidThreads master +3 -3
mediawiki/core master +1 -1
mediawiki/core master +2 -2
mediawiki/core wmf/1.27.0-wmf.8 +266 -419
mediawiki/core master +266 -419
mediawiki/core master +33 -17
mediawiki/extensions/CentralAuth master +133 -30
mediawiki/extensions/CentralNotice master +24 -21
mediawiki/extensions/Translate master +4 -1
mediawiki/extensions/ContentTranslation master +4 -1
mediawiki/extensions/BetaFeatures master +4 -1
mediawiki/core master +6 -9
mediawiki/extensions/Flow master +4 -1
mediawiki/extensions/Echo master +7 -8
mediawiki/extensions/PageTriage master +5 -3
mediawiki/extensions/Echo master +3 -1
mediawiki/extensions/MobileFrontend master +3 -1
mediawiki/extensions/UniversalLanguageSelector master +5 -2
mediawiki/core master +14 -5
mediawiki/core master +3 -0
mediawiki/extensions/CentralAuth master +3 -1
mediawiki/core master +1 -3
mediawiki/extensions/CentralAuth master +3 -0
mediawiki/extensions/FlaggedRevs master +1 -1
mediawiki/extensions/LiquidThreads master +3 -2
mediawiki/extensions/CentralAuth master +6 -3
mediawiki/core master +8 -0
mediawiki/extensions/OAI master +2 -0
mediawiki/extensions/Echo master +4 -1
mediawiki/core master +4 -1
mediawiki/core master +7 -2
mediawiki/extensions/GettingStarted master +5 -1
mediawiki/core master +1 -1
mediawiki/core master +28 -39
mediawiki/extensions/TimedMediaHandler master +30 -10
mediawiki/core REL1_25 +150 -174
mediawiki/core master +150 -174
mediawiki/extensions/TimedMediaHandler master +35 -63
mediawiki/extensions/FlaggedRevs master +67 -21
mediawiki/extensions/LiquidThreads master +8 -7
mediawiki/core master +12 -2
mediawiki/extensions/FlaggedRevs master +7 -11
mediawiki/core master +4 -5
mediawiki/core master +1 -1
mediawiki/core master +1 -4
mediawiki/extensions/CodeReview master +2 -2
mediawiki/core wmf/1.25wmf23 +4 -1
mediawiki/core wmf/1.25wmf24 +4 -1
mediawiki/core master +3 -1
mediawiki/core master +7 -3
mediawiki/core master +1 -0
mediawiki/extensions/CentralAuth master +1 -3
mediawiki/core master +1 -1
mediawiki/core master +3 -12
mediawiki/extensions/MobileFrontend master +1 -0
mediawiki/core master +50 -4
mediawiki/extensions/ConfirmEdit master +9 -4
mediawiki/core master +25 -10
mediawiki/extensions/MobileFrontend master +1 -1
mediawiki/core master +1 -11
mediawiki/core master +12 -12
mediawiki/extensions/LiquidThreads master +1 -1

Related Objects

Status Subtype Assigned Task
Resolved aaron
Open None
Resolved aude
Duplicate Gilles
Duplicate aaron
Resolved aaron
Duplicate None
Resolved hoo
Duplicate None
Duplicate None
Resolved aaron
Open None
Open None
Resolved PRODUCTION ERROR aaron
Duplicate None
Resolved aaron
Resolved Nikerabbit
Resolved aaron
Resolved aaron
Duplicate None
Duplicate None
Resolved MarcoAurelio
Resolved aaron
Resolved tstarling
Resolved Ladsgroup
Resolved Krinkle
Duplicate aaron
Resolved kostajh
Resolved Huji
Resolved aaron
Resolved aaron
Resolved aaron
Open PRODUCTION ERROR None

Event Timeline


Change 348030 had a related patch set uploaded (by Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Make opportunistic password hash upgrades post-send

https://gerrit.wikimedia.org/r/348030

Change 348030 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Make opportunistic password hash upgrades post-send

https://gerrit.wikimedia.org/r/348030

Change 348658 had a related patch set uploaded (by Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Avoid triggering master queries in ApiValidatePassword

https://gerrit.wikimedia.org/r/348658

Change 350969 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Avoid master queries in loadAndLazyInit() for miser mode

https://gerrit.wikimedia.org/r/350969

Change 350969 merged by jenkins-bot:
[mediawiki/core@master] Avoid master queries in loadAndLazyInit() for miser mode

https://gerrit.wikimedia.org/r/350969

Change 351790 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Add $flags parameter to renameInProgressOn()

https://gerrit.wikimedia.org/r/351790

Change 351790 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Add $flags parameter to renameInProgressOn()

https://gerrit.wikimedia.org/r/351790

Change 353823 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Avoid master queries in beginSecondaryAuthentication()

https://gerrit.wikimedia.org/r/353823

Change 353824 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/PageTriage@master] Avoid DB_MASTER queries on HTTP GET in ArticleMetadata->getMetadata

https://gerrit.wikimedia.org/r/353824

Change 353825 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Avoid master queries in SpecialGlobalRenameProgress

https://gerrit.wikimedia.org/r/353825

Change 353827 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Avoid DB_MASTER queries in User::newSystemUser() when possible

https://gerrit.wikimedia.org/r/353827

Change 353829 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/Translate@master] Use TranslateUtils::getSafeReadDB() in loadAggregateGroups

https://gerrit.wikimedia.org/r/353829

Change 353824 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Avoid DB_MASTER queries on HTTP GET in ArticleMetadata->getMetadata

https://gerrit.wikimedia.org/r/353824

Change 353829 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Use TranslateUtils::getSafeReadDB() in loadAggregateGroups

https://gerrit.wikimedia.org/r/353829

Change 353825 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Avoid master queries in SpecialGlobalRenameProgress

https://gerrit.wikimedia.org/r/353825

Change 353827 merged by jenkins-bot:
[mediawiki/core@master] Avoid DB_MASTER queries in User::newSystemUser() when possible

https://gerrit.wikimedia.org/r/353827

Change 353823 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Avoid master queries in beginSecondaryAuthentication()

https://gerrit.wikimedia.org/r/353823

Change 348658 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Avoid triggering master queries in ApiValidatePassword

https://gerrit.wikimedia.org/r/348658

Change 499969 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Use the main stash for basic user talk page notifications

https://gerrit.wikimedia.org/r/499969

Change 499985 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Add UserOptionsUpdateJob class and use it for namespaces at SpecialSearch

https://gerrit.wikimedia.org/r/499985

Change 499990 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/NewUserMessage@master] Use a job for triggering new user talk messages

https://gerrit.wikimedia.org/r/499990

Change 499995 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] [WIP] Avoid Category count refresh DB writes on HTTP GET

https://gerrit.wikimedia.org/r/499995

Change 499990 merged by jenkins-bot:
[mediawiki/extensions/NewUserMessage@master] Use a job for triggering new user talk messages

https://gerrit.wikimedia.org/r/499990

Change 499985 abandoned by Aaron Schulz:
Add UserOptionsUpdateJob class and use it for namespaces at SpecialSearch

Reason:
Taking a different route

https://gerrit.wikimedia.org/r/499985

aaron removed aaron as the assignee of this task. May 21 2019, 9:04 PM
aaron removed a project: Patch-For-Review.
Krinkle renamed this task from Fix problematic database master queries performed on HTTP GET/HEAD to Fix database master queries from HTTP GET/HEAD before active-active multid-dc. Jul 2 2020, 2:53 PM
Krinkle subscribed.

I'm slightly rescoping this to give the tracking task a natural end, namely to track only what we want to get done prior to starting to serve active-active. For the rest we can file individual Sustainability and/or Performance Issue tasks that we might track on our Performance-Team (Radar).

Krinkle renamed this task from Fix database master queries from HTTP GET/HEAD before active-active multid-dc to Fix database master queries from HTTP GET/HEAD before active-active multi-dc. Jul 2 2020, 4:08 PM
Krinkle reassigned this task from Krinkle to aaron.
Krinkle moved this task from Inbox, needs triage to Doing (old) on the Performance-Team board.

Change 499995 abandoned by Aaron Schulz:

[mediawiki/core@master] [WIP] Avoid Category count refresh DB writes on HTTP GET

Reason:

https://gerrit.wikimedia.org/r/499995

Aklapper removed aaron as the assignee of this task. Sep 26 2022, 10:33 AM
Aklapper removed a subscriber: Gilles.

Removing task assignee due to inactivity as this open task has been assigned for more than two years. See the email sent to the task assignee on August 22nd, 2022.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome!
If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!

Change 499969 abandoned by Aaron Schulz:

[mediawiki/core@master] Use the main stash for basic user talk page notifications

Reason:

https://gerrit.wikimedia.org/r/499969