Page MenuHomePhabricator

Table of contents (TOC) missing sporadically without apparent reason
Closed, ResolvedPublic

Description

There are reports by different users in dewiki, that the TOC is sometimes missing on pages, though it should be present. The TOC appears after purging, so it seems that in some cases a version where the TOC is missing for whatever reason is cached.

I could reproduce this with https://de.wikipedia.org/wiki/Herbert_Backe:
Before purging with missing TOC:

<div id="mw-content-text" dir="ltr" class="mw-content-ltr" lang="de">
 <div class="mw-parser-output">
  <div class="thumb tright">...</div>
  <p>(First paragraph of lead)</p>
  <p>(Second paragraph of lead)</p>
  <p></p>
  <h2>...</h2>
  ...
<!-- 
NewPP limit report
Parsed by mw1304
Cached time: 20170609222315
Cache expiry: 2592000
Dynamic content: false
CPU time usage: 0.236 seconds
Real time usage: 0.290 seconds
Preprocessor visited node count: 2697/1000000
Preprocessor generated node count: 0/1500000
Post‐expand include size: 36322/2097152 bytes
Template argument size: 8114/2097152 bytes
Highest expansion depth: 14/40
Expensive parser function count: 1/500
Lua time usage: 0.039/10.000 seconds
Lua memory usage: 2.15 MB/50 MB
-->
<!--
Transclusion expansion time report (%,ms,calls,template)
100.00%  213.924      1 -total
 27.26%   58.311      1 Vorlage:BibISBN
 19.59%   41.900      1 Vorlage:BibISBN/3100910524
 18.92%   40.473      1 Vorlage:BibRecord
 16.59%   35.500      1 Vorlage:BibRecord/Literatur
 14.99%   32.076      1 Vorlage:Literatur
  8.20%   17.550      1 Vorlage:Commonscat
  5.60%   11.974      1 Vorlage:Exzellent
  4.91%   10.505      1 Vorlage:NaviBlock
  4.32%    9.245      1 Vorlage:NDB
-->
</div>
<!-- Saved in parser cache with key dewiki:pcache:idhash:525037-0!canonical and timestamp 20170609222315 and revision id 164319958
 -->
</div>

After purging (with TOC):

<div id="mw-content-text" dir="ltr" class="mw-content-ltr" lang="de">
 <div class="mw-parser-output">
  <div class="thumb tright">...</div>
  <p>...</p>
  <p>....</p>
  <p></p>
  <div id="toc" class="toc">...</div>
  <p></p>
  <h2>...</h2>
...

<!-- 
NewPP limit report
Parsed by mw1241
Cached time: 20170616073144
Cache expiry: 2592000
Dynamic content: false
CPU time usage: 0.240 seconds
Real time usage: 0.286 seconds
Preprocessor visited node count: 2697/1000000
Preprocessor generated node count: 0/1500000
Post‐expand include size: 36322/2097152 bytes
Template argument size: 8114/2097152 bytes
Highest expansion depth: 14/40
Expensive parser function count: 1/500
Lua time usage: 0.040/10.000 seconds
Lua memory usage: 2.15 MB/50 MB
-->
<!--
Transclusion expansion time report (%,ms,calls,template)
100.00%  205.666      1 -total
 27.64%   56.842      1 Vorlage:BibISBN
 19.86%   40.849      1 Vorlage:BibISBN/3100910524
 19.34%   39.785      1 Vorlage:BibRecord
 16.85%   34.659      1 Vorlage:BibRecord/Literatur
 15.61%   32.109      1 Vorlage:Literatur
 10.05%   20.661      1 Vorlage:Commonscat
  5.28%   10.860      1 Vorlage:NaviBlock
  4.78%    9.821      1 Vorlage:Exzellent
  4.17%    8.566      8 Vorlage:Zitat
-->
</div>
<!-- Saved in parser cache with key dewiki:pcache:idhash:525037-0!canonical and timestamp 20170616073144 and revision id 164319958
 -->
</div>

More examples (though most of them already have the TOC back):

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Umherirrender added subscribers: Krinkle, tstarling, Anomie and 2 others.

Sounds for some pollution of the parser cache, because both have the same key, but storing different parser results.

First report on June 9 sounds like a regression in the June 8 deployment (1.30.0-wmf.4)

The Parser output does not vary on whether the TOC is present or absent. Unless it's suppressed with the __NOTOC__ magic word, the parser always outputs the TOC if there are enough headers in the page and later processing strips it out if necessary.

If you find another page affected by this issue, it would be interesting to see if its page properties contain "__NOTOC__" (see https://en.wikipedia.org/wiki/Main_Page?action=info for an example of such a page). It may be that some template was edited to remove that magic word and pages weren't reparsed.

If you find another page affected by this issue, it would be interesting to see if its page properties contain "__NOTOC__" (see https://en.wikipedia.org/wiki/Main_Page?action=info for an example of such a page). It may be that some template was edited to remove that magic word and pages weren't reparsed.

No, according to https://de.wikipedia.org/w/index.php?title=Hungerplan&action=info the page https://de.wikipedia.org/wiki/Hungerplan has no __NOTOC__ set, though currently it doesn't show one.

This commit moved the loading of the mediawiki.toc module, based on a boolean that is maintained from Parser>ParserOutput>OutputPage>Skin. However, it does not affect the HTML content.

The Parser output does not vary on whether the TOC is present or absent. Unless it's suppressed with the __NOTOC__ magic word, the parser always outputs the TOC if there are enough headers in the page and later processing strips it out if necessary.

Yes, except that when retrieving the output through ParserOutput::getText, the TOC will be stripped if setTOCEnabled(false) was called. While this cannot make one appear when there is none, it will remove one if there was one. This is used by MobileFrontend.

This was made to be called by MobileFrontend at run-time on a ParserOutput object after it is received from, or saved to, the parser cache. It's possible that maybe the call is being made too early, and such the state of mTOCEnable ends up being saved?

The same problem occurs at cs-wiki just now. First report is from Jun 22. The TOC is/was missing in many arcticles. The relevant discussion is here. Some articles were already purged by bot, but the articles with this problem could be still easily found, see for example https://cs.wikipedia.org/wiki/Harappsk%C3%A1_kultura.

Thanks, I was able to catch that page and examine it before someone purged it. The page being served says Saved in parser cache with key cswiki:pcache:idhash:177124-0!canonical and timestamp 20170619214508 and revision id 14231519. The corresponding parser cache entry does in fact contain <mw:toc>. It also has mTOCEnabled set to false, so @Krinkle's surmise seems like it may be correct. Although at a quick glance through the code I don't see any code path that would obviously do so.

For people with access to terbium, I saved the serialized ParserOutput object in /home/anomie/T168040.txt for further examination.

There are tow changes which affects the TOCEnabled in the list of changes to 1.30wmf.4[1] and that is in WikidataPageBanner[2] and MobileFrontend[3]

Extensions with a call to ParserOutput::setTOCEnabled:
MobileFrontend
MOOC
TextExtracts
WikidataPageBanner

[1] https://www.mediawiki.org/wiki/MediaWiki_1.30/wmf.4
[2] https://gerrit.wikimedia.org/r/#/c/355274/
[3] https://gerrit.wikimedia.org/r/#/c/355273/

As far as I know WikidataPageBanner isn't installed on dewiki, so shouldn't be able to cause this issue.

Weird behavior related to this: sometimes if I purge page with TOC shown, it hides after purge

Calls to ParserOutput::setTOCEnabled:

  1. mediawiki-core / WikiTextStructure.php#extractWikitextParts
    • Used by WikitextContentHandler::getDataForSearchIndex, which has 1 caller (CirrusSearch), which gets its ParserOutput from ContentHandler:: getParserOutputForIndexing(), which gets it fresh from Parser or PasserCache and saves before returning, and not shared with other code. Appears safe.
  2. mediawiki-core / ApiParse.php
    • Received directly from Parser or ParserCache. Reference not shared elsewhere. Safe.
  3. mediawiki-extensions-MobileFrontend / Hooks.php#onOutputPageParserOutput
    • Shared reference used by OutputPage, but, only called after any save to ParserCache. Appears safe.
  4. mediawiki-extensions-MobileFrontend / ApiMobileView.php#getParserOutput
    • Received directly from ParserCache via WikiPage#getParserOutput. Does involve shared PoolCounter, but no reference sharing. Appears safe.
    • Called by ApiMobileView#getData, which saves it to Memcached, but uses a MobileFrontend-specific cache key. Safe.
  5. mediawiki-extensions-TextExtracts / ApiQueryExtracts.php
    • Received directly from ParserCache. Not shared or saved. Safe.
  6. mediawiki-extensions-WikidataPageBanner / Hooks.php#onOutputPageParserOutput
    • Shared reference from OutputPage, but, only called after core saves it to ParserCache. Appears safe.

If one of these must be the cause, then I'd suspect either 1 (CirrusSearch) or 3/6 (Hooks/OutputPageParserOutput).

@Krinkle: That matches my analysis. I suggest the next step be to add some logging into ParserCache::save() to try to get a backtrace. Perhaps something like this:

if ( !$parserOutput->getTOCEnabled() ) {
    $ex = new Exception( 'Saving a ParserOutput with getTOCEnabled() == false' );
    wfDebugLog( 'AdHocDebug', $ex->getMessage(), 'private', [
        'exception' => $ex,
        'parserOutputKey' => $parserOutputKey,
        'timestamp' => $cacheTime,
        'revid' => $revId,
    ] );
}

Change 361825 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/core@master] [DNM] ParserCache: Add debug logging for T168040

https://gerrit.wikimedia.org/r/361825

Change 361826 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/core@wmf/1.30.0-wmf.6] ParserCache: Add debug logging for T168040

https://gerrit.wikimedia.org/r/361826

Change 361826 merged by jenkins-bot:
[mediawiki/core@wmf/1.30.0-wmf.6] ParserCache: Add debug logging for T168040

https://gerrit.wikimedia.org/r/361826

Mentioned in SAL (#wikimedia-operations) [2017-06-28T08:46:50Z] <legoktm@tin> Synchronized php-1.30.0-wmf.6/includes/parser/ParserCache.php: Add debug logging for T168040 (duration: 00m 48s)

Change 361830 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/core@wmf/1.30.0-wmf.7] ParserCache: Add debug logging for T168040

https://gerrit.wikimedia.org/r/361830

Change 361830 merged by jenkins-bot:
[mediawiki/core@wmf/1.30.0-wmf.7] ParserCache: Add debug logging for T168040

https://gerrit.wikimedia.org/r/361830

Mentioned in SAL (#wikimedia-operations) [2017-06-28T09:05:53Z] <legoktm@tin> Synchronized php-1.30.0-wmf.7/includes/parser/ParserCache.php: Add debug logging for T168040 (duration: 00m 46s)

I've deployed @Anomie's debug logging to all of the currently deployed branches.

In that bug the ParserOutput object in the cache was missing the <mw:toc>...</mw:toc> bit, IIRC.

But, if MobileFrontend's onOutputPageParserOutput was causing trouble then, perhaps it's still causing trouble now. Hopefully the debug logging will point to how.

This phenomenon has been spotted on several fiwiki pages, too. See this, for example:

https://fi.wikipedia.org/wiki/Metal_Gear_Solid

(I hope someone who works with this issue catches this example before someone purges that page.)

2017-07-06 11:20:35 [WV4dAwpAMCUAABE2PCAAAABP] mw1209 enwiki 1.30.0-wmf.7 AdHocDebug INFO: Saving a ParserOutput with getTOCEnabled() == false {"parserOutputKey":"enwiki:stable-pcache:idhash:27072613-0!canonical!responsiveimages=0","timestamp":"20170706112035","revid":788545319} 
[Exception Exception] (/srv/mediawiki/php-1.30.0-wmf.7/includes/parser/ParserCache.php:326) Saving a ParserOutput with getTOCEnabled() == false
  #0 /srv/mediawiki/php-1.30.0-wmf.7/extensions/FlaggedRevs/frontend/FlaggablePageView.php(719): ParserCache->save(ParserOutput, FlaggableWikiPage, ParserOptions)
  #1 /srv/mediawiki/php-1.30.0-wmf.7/extensions/FlaggedRevs/frontend/FlaggablePageView.php(359): FlaggablePageView->showStableVersion(FlaggedRevision, string, string)
  #2 /srv/mediawiki/php-1.30.0-wmf.7/extensions/FlaggedRevs/frontend/FlaggedRevsUI.hooks.php(179): FlaggablePageView->setPageContent(boolean, boolean)
  #3 /srv/mediawiki/php-1.30.0-wmf.7/includes/Hooks.php(186): FlaggedRevsUIHooks::onArticleViewHeader(Article, boolean, boolean)
  #4 /srv/mediawiki/php-1.30.0-wmf.7/includes/page/Article.php(515): Hooks::run(string, array)
  #5 /srv/mediawiki/php-1.30.0-wmf.7/includes/actions/ViewAction.php(68): Article->view()
  #6 /srv/mediawiki/php-1.30.0-wmf.7/includes/MediaWiki.php(499): ViewAction->show()
  #7 /srv/mediawiki/php-1.30.0-wmf.7/includes/MediaWiki.php(293): MediaWiki->performAction(Article, Title)
  #8 /srv/mediawiki/php-1.30.0-wmf.7/includes/MediaWiki.php(862): MediaWiki->performRequest()
  #9 /srv/mediawiki/php-1.30.0-wmf.7/includes/MediaWiki.php(523): MediaWiki->main()
  #10 /srv/mediawiki/php-1.30.0-wmf.7/index.php(43): MediaWiki->run()
  #11 /srv/mediawiki/w/index.php(3): include(string)
  #12 {main}

There are also a few log entries in AdHocDebug.log-20170701.gz, except those all point to enwiki as well.

Change 363656 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/core@wmf/1.30.0-wmf.7] Keep track of what is disabling the TOC

https://gerrit.wikimedia.org/r/363656

So far, we have very few log entries from https://gerrit.wikimedia.org/r/361830. All are via a FlaggedRevs code path that could as well be reading an existing bad cache entry and re-saving it into FlaggedRev's own cache.

One of these log entries pointed to the FlaggedRevs-generated cache key enwiki:stable-pcache:idhash:27072613-0!canonical!responsiveimages=0. The corresponding parser cache key, enwiki:pcache:idhash:27072613-0!canonical!responsiveimages=0, does currently exist, and does have getTOCEnabled() returning false, and is timestampped after we added the logging. But there's no log entry for it. WTF?

  1. mediawiki-extensions-MobileFrontend / Hooks.php#onOutputPageParserOutput
    • Shared reference used by OutputPage, but, only called after any save to ParserCache. Appears safe.

[...]

  1. mediawiki-extensions-WikidataPageBanner / Hooks.php#onOutputPageParserOutput
    • Shared reference from OutputPage, but, only called after core saves it to ParserCache. Appears safe.

Maybe these aren't so safe after all. WMF uses a MultiWriteBagOStuff for the parser cache, with memcached primary and SQL secondary. ObjectCache::newFromParams() defaults the asyncHandler parameter to DeferredUpdates::addCallableUpdate.

So perhaps we have the following sequence of events going on here:

  • Something calls ParserCache::save(), with a ParserOutput that has getTOCEnabled() returning true as it should.
  • ParserCache passes this on to its MultiWriteBagOStuff instance.
  • MultiWriteBagOStuff saves the ParserOutput to memcached.
  • MultiWriteBagOStuff schedules a DefferedUpdate to write the ParserOutput object to the database via SQLBagOStuff. Note the ParserOutput object is held on to, it's not cloned or serialized at this point.
  • Whatever called ParserCache::save() adds the ParserOutput to OutputPage, triggering one of the code paths quoted above (probably the MobileFrontend one). The ParserOutput now has getTOCEnabled() returning false.
  • DeferredUpdates run. At this point SQLBagOStuff serializes the ParserOutput object with getTOCEnabled() returning false.
  • Eventually the entry drops out of memcached and so gets read back from the database. Now, suddenly, we see this bug.

I haven't been able to track down what might have changed to make this start showing up recently, since none of the code here seems new.

I haven't been able to track down what might have changed to make this start showing up recently, since none of the code here seems new.

Prior to that date, some changes have been made regarding deferred updates execution. See rMWd80fca05e1 and rMW24842cfac0. That may explain it, if some jobs are now more delayed, or maybe if those jobs were never inserted before (due to the linked bugs) and now they are.

Change 363831 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] MultiWriteBagOStuff: Fix async writes of mutable objects

https://gerrit.wikimedia.org/r/363831

Change 363656 abandoned by Legoktm:
Keep track of what is disabling the TOC

https://gerrit.wikimedia.org/r/363656

Change 367633 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[mediawiki/extensions/TextExtracts@master] Don't call ParserOuptut::setTOCEnabled()

https://gerrit.wikimedia.org/r/367633

Change 367633 merged by jenkins-bot:
[mediawiki/extensions/TextExtracts@master] Don't call ParserOuptut::setTOCEnabled()

https://gerrit.wikimedia.org/r/367633

Change 363831 merged by Krinkle:
[mediawiki/core@master] MultiWriteBagOStuff: Fix async writes of mutable objects

https://gerrit.wikimedia.org/r/363831

The same problem occurs at be-tarask-wiki. The TOC is/was missing in many articles.

I am still able to find affected pages:

https://cs.wikipedia.org/wiki/Religionistika
https://cs.wikipedia.org/wiki/Andr%C3%A9_Kert%C3%A9sz
https://cs.wikipedia.org/wiki/Chung-%C4%8D%E2%80%99

The patches above are merged, so how long it may take until we can determine whether they are effective? It will probably take some time until the cached versions of the affected pages are renewed?

Anomie claimed this task.

The fix was deployed to all wikis as of 2017-07-27 at about 19:50 UTC. However, cache entries still exist with the problem and will need to be purged.

If you look in the source of the page, you should see an HTML comment along the lines of

<!-- Saved in parser cache with key cswiki:pcache:idhash:12927-0!canonical and timestamp 20170718093727 and revision id 14419001
-->

If the timestamp in there is earlier than 20170727200000, just purge the page to clear the bad cache entry. If it's later, report it here.

In the mean time, let's be optimistic and mark the task as "Resolved".

Change 361825 abandoned by Umherirrender:
[DNM] ParserCache: Add debug logging for T168040

Reason:
Clean up: Task is resolved, no longer needed

https://gerrit.wikimedia.org/r/361825