
Chinese Language Converter is not working in the sidebar table of contents in Vector-2022
Closed, Resolved · Public · Bug Report

Authored By: 50829, Apr 26 2022, 3:46 AM
Referenced Files
  • F36895227: Screenshot 2023-03-06 at 4.37.56 PM.png (Mar 7 2023, 12:44 AM)
  • F36895226: Screenshot 2023-03-06 at 4.38.40 PM.png (Mar 7 2023, 12:44 AM)
  • F36891339: image.png (Mar 4 2023, 10:50 AM)
  • F36885409: Screenshot 2023-02-28 at 7.34.46 PM.png (Mar 1 2023, 3:43 AM)
  • F36885408: Screenshot 2023-02-28 at 7.35.46 PM.png (Mar 1 2023, 3:43 AM)
  • Restricted File (Jun 22 2022, 3:21 PM)
  • Restricted File (Jun 22 2022, 3:21 PM)
  • F35069163: image.png (Apr 26 2022, 10:38 PM)
Tokens
"Party Time" token, awarded by Stang.

Description

What happens?:
Conversion between Traditional and Simplified Chinese does not work in the sidebar table of contents, as the following screenshot shows. Could somebody fix it? Thanks a lot.

Request URL: this

image.png (565×1 px, 120 KB)

What should have happened instead?:
The Chinese characters in the sidebar table of contents should be converted in the same way as those in the article body.

QA steps

In the beta cluster

Check that the text in the table of contents in Vector 2022 (https://zh.wikipedia.beta.wmflabs.org/wiki/%E6%B8%AF%E9%90%B5%E4%B8%AD%E6%9C%9F%E7%BF%BB%E6%96%B0%E5%88%97%E8%BB%8A?variant=zh-cn&useskin=vector-2022) matches Vector 2010 (https://zh.wikipedia.beta.wmflabs.org/wiki/%E6%B8%AF%E9%90%B5%E4%B8%AD%E6%9C%9F%E7%BF%BB%E6%96%B0%E5%88%97%E8%BB%8A?variant=zh-cn&useskin=vector).

In production

Check that the text in the table of contents in Vector 2022 (https://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%B1%82%E5%8A%A9?variant=zh-cn&useskin=vector-2022) matches Vector 2010 (https://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%B1%82%E5%8A%A9?variant=zh-cn&useskin=vector).

QA Results - Beta

AC 1 · Details: T306862#8655249

QA Results - Prod

AC 1 · Details: T306862#8671009

Event Timeline

Older changes are hidden.

I would like to fix ParserOutput::getSections() so that the function itself returns the expected converted result, but I don't know how.

I agree with @Winston_Sung (that's "option 2" above); we should be careful to make sure we don't break clients. This is somewhat related to T315222, which identifies a different bug in the getSections() output that could/should be fixed.

And I agree with @Func that this is probably related to T303855, in that conversion of both the TOCHTML and the getSections() data should probably be applied before the contents are put into the ParserOutput.

Reaffirming the comment above: the language conversion of the section data and the section HTML needs to be moved to the end of the parse, instead of being done in ParserOutput::getText().
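To make the difference concrete, here is a toy Python model (not MediaWiki's PHP code; every name is hypothetical, and a five-character table stands in for the real converter). In the "old" model, section titles are stored unconverted and only the legacy ToC HTML is converted at output time, so a skin that builds its ToC straight from the section data shows unconverted text. In the "fixed" model, titles are converted once at the end of the parse and there is no output-time pass, so every consumer sees the same converted text.

```python
from dataclasses import dataclass

# Toy zh-hant -> zh-cn table (hypothetical; a real converter uses large
# tables plus per-page rules, not a handful of characters).
TO_ZH_CN = str.maketrans({"鐵": "铁", "車": "车", "歷": "历", "設": "设", "計": "计"})


def convert(text: str) -> str:
    """Stand-in for converting text to the zh-cn variant."""
    return text.translate(TO_ZH_CN)


@dataclass
class ToyParserOutput:
    sections: list[str]  # section titles as stored after parsing


def parse(titles: list[str], convert_at_parse_time: bool) -> ToyParserOutput:
    """Store section titles, converting them here only in the 'fixed' model."""
    if convert_at_parse_time:
        titles = [convert(t) for t in titles]
    return ToyParserOutput(sections=list(titles))


def legacy_toc_html(out: ToyParserOutput, convert_at_output: bool) -> str:
    """Old-style ToC: HTML built from section data, optionally converted on output."""
    titles = [convert(t) for t in out.sections] if convert_at_output else out.sections
    return "".join(f"<li>{t}</li>" for t in titles)


def template_toc_titles(out: ToyParserOutput) -> list[str]:
    """Template-data ToC (Vector 2022 style): section data used exactly as stored."""
    return list(out.sections)


titles = ["歷史", "列車設計"]

# Old model: only the legacy HTML path is converted, at output time.
old = parse(titles, convert_at_parse_time=False)
print(legacy_toc_html(old, convert_at_output=True))  # <li>历史</li><li>列车设计</li>
print(template_toc_titles(old))                      # ['歷史', '列車設計'] (the bug)

# Fixed model: convert once at the end of the parse; no output-time pass.
new = parse(titles, convert_at_parse_time=True)
print(legacy_toc_html(new, convert_at_output=False))  # <li>历史</li><li>列车设计</li>
print(template_toc_titles(new))                       # ['历史', '列车设计']
```

Dropping the output-time pass in the fixed model mirrors, loosely, why a back-compatibility flag for output-time ToC conversion becomes removable once conversion happens at parse time.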

@cscott is targeting the week of the 19th to fix this, at which point the web team may want to QA it. C Scott will likely get someone from the Language team to look at it in Subbu’s absence.

Change 836854 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] WIP fix lang conv of sections

https://gerrit.wikimedia.org/r/836854

Sorry, I haven't gotten this patch finished yet, but it's still on my radar.

Hey @cscott - do you have a rough estimate for when we can expect this to be wrapped up? It's currently a blocker for us, as we're trying to deploy Vector 2022 across all Wikipedias, and this is blocking the deployment to most Chinese sites.

Change 881937 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Language-convert Table of Contents at parse time

https://gerrit.wikimedia.org/r/881937

Change 884084 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Remove back-compatibility NO_TOC_CONVERSION code

https://gerrit.wikimedia.org/r/884084

Change 881937 merged by jenkins-bot:

[mediawiki/core@master] Language-convert Table of Contents at parse time

https://gerrit.wikimedia.org/r/881937

This should ride the train this week; added User-notice to get a mention on tech news. If someone could post in the zhwiki village pump that would be helpful!

Jdlrobson added a project: Unplanned-Sprint-Work.
Jdlrobson updated the task description.

C Scott: the web team is going to QA this one.

Test Result - Beta

Status: ✅ PASS per T306862#8662558
Environment: beta
OS: macOS Ventura
Browser: Chrome
Device: MBP
Emulated Device: N/A

Test Artifact(s):

QA Steps

✅ AC1: Check that the text in the table of contents in Vector 2022 (https://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%B1%82%E5%8A%A9?variant=zh-cn&useskin=vector-2022) matches Vector 2010 (https://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%B1%82%E5%8A%A9?variant=zh-cn&useskin=vector).

@Jdlrobson, I'm not sure exactly what I'm verifying. I verified visually the characters, then I converted the text using Chrome's built-in translation tool, and the text matches. Is this sufficient?

Screenshot 2023-02-28 at 7.35.46 PM.png (608×1 px, 227 KB)

Screenshot 2023-02-28 at 7.34.46 PM.png (624×1 px, 229 KB)

I verified visually the characters

Screenshot 2023-02-28 at 7.35.46 PM.png (608×1 px, 227 KB)

AC passed.

@Edtadros As a Chinese speaker, I can confirm it is correctly converted.

@Jdlrobson The example provided does not contain in-article conversion rules, which should also be QA-ed because they are widely used.

@Diskdance would you be able to edit the description and the beta cluster page to add QA steps for those? I agree we should QA anything with wide usage, and I'm not confident I can cover them all myself.

Thanks @Winston_Sung for the confirmation on the screenshots! Much appreciated!

Will this be backported to 1.39, since it is arguably a ToC regression?

@Jdlrobson @Edtadros you don't need to know Chinese to QA this. This variant conversion bug only affects skins that generate the ToC from Mustache template data (e.g. Vector 2022 and Citizen). You can compare the ToC content against skins that do not use Mustache for the ToC (e.g. Timeless, MonoBook, Modern, and most other skins).

I'm not sure where to find the QA template, so I hope the following format works:

QA Steps
  1. Open the article in Vector 2022
  2. Open the article in Timeless (by adding useskin=timeless to the end of the URL)
  3. Switch to a language variant (at least compare simplified (简体) and traditional (繁體))
  4. Compare the ToC titles (✅ if they are identical)

EDIT: I just realized that Legacy Vector also uses the old ToC, so Legacy Vector works for the comparison as well.
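The comparison in these QA steps could be scripted. Below is a rough Python sketch using only the standard library: it pulls ToC titles out of two saved HTML pages and reports the positions where they differ. The CSS class names (`toctext` for the old-style ToC, `vector-toc-text` for Vector 2022) are assumptions about the skins' markup and should be checked against real pages; the HTML fragments are hand-written stand-ins, not captured output.

```python
from html.parser import HTMLParser
from itertools import zip_longest

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class TocTitleExtractor(HTMLParser):
    """Collect the text of every element whose class list contains `css_class`."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching element
        self.buffer: list[str] = []
        self.titles: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no end tag, so they must not affect depth
        if self.depth:
            self.depth += 1  # nested element inside a title, e.g. <b>
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.titles.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def toc_titles(html: str, css_class: str) -> list[str]:
    parser = TocTitleExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.titles


def toc_mismatches(a: list[str], b: list[str]) -> list[tuple]:
    """(index, title_a, title_b) wherever the ToCs differ; None pads the shorter one."""
    return [(i, x, y) for i, (x, y) in enumerate(zip_longest(a, b)) if x != y]


# Hypothetical fragments: a converted old-style ToC vs. an unconverted new-style one.
legacy = ('<span class="tocnumber">1</span> <span class="toctext">历史</span>'
          '<span class="tocnumber">2</span> <span class="toctext">列车<b>设计</b></span>')
vector_2022 = ('<div class="vector-toc-text">歷史</div>'
               '<div class="vector-toc-text">列车设计</div>')

print(toc_mismatches(toc_titles(legacy, "toctext"),
                     toc_titles(vector_2022, "vector-toc-text")))
# [(0, '历史', '歷史')]
```

An empty list corresponds to the ✅ in step 4. If a skin puts the section number inside the same element as the title, the extracted titles will include it and need trimming before comparison.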

This should ride the train this week; added User-notice to get a mention on tech news. [...]

Hi, for Tech News, what wording would you suggest for the entry? (Drafts and summary details always help!) After reading all the comments above, I'm still unsure of the details, and it might be good to include in the entry:

  • what was the fundamental problem(s)?
  • where did it affect? (E.g. is this problem related to only wikis using Chinese languages, or to all wikis that use multiple writing systems?)
  • do any volunteers need to do anything? (or is this just informational?)
  • is it already fixed (so belongs in "Recent changes" section) or on the deployment train ("Changes later this week" section)?

My current best-guess is along the lines of:

  • Recent changes:
    • Chinese language wikis had problems with the language converter (used for multiple writing systems) when used in the site sidebar and table of contents. This has now been fixed.

Something more accurate would be appreciated. Thanks.

In T306862#8655255, @Winston_Sung wrote:

I verified visually the characters

Screenshot 2023-02-28 at 7.35.46 PM.png (608×1 px, 227 KB)

AC passed.

Me too

In T306862#8662356, @Quiddity wrote:

My current best-guess is along the lines of:

  • Recent changes:
    • Chinese language wikis had problems with the language converter (used for multiple writing systems) when used in the site sidebar and table of contents. This has now been fixed.

Something more accurate would be appreciated. Thanks.

Thanks, I've updated https://meta.wikimedia.org/wiki/Tech/News/2023/10

Just curious, can this task be closed without proper QA? We can confirm it works as expected on Chinese Wikipedia.

@Edtadros - could you take a look and add some screenshots as well?

I have created https://zh.wikipedia.org/wiki/User:Diskdance/public/T306862_test for testing, and found this inconsistency (same in Vector 2022):

image.png (486×759 px, 53 KB)

The patch description says the ToC uses all page-wide rules, so I believe this is expected, but I would like to confirm it with @cscott.

@Shizhao I believe this should also be mentioned in Tech News so users won't accidentally write things like this.

@Diskdance That does seem like expected behavior. We'd like to suggest that "best style" is to ensure all conversion rules are at the top of the document (ideally in glossaries, as is typical practice on zhwiki I believe), which avoids this particular discrepancy.
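A toy Python model of this discrepancy, under the assumption (consistent with the two comments above, but not checked against MediaWiki's code) that body text is converted only with rules defined earlier on the page, while the ToC is converted with every page-wide rule. `Rule` and `Heading` are hypothetical types, and plain string replacement stands in for real conversion rules.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical page-wide conversion rule: show `source` as `target`."""
    source: str
    target: str


@dataclass
class Heading:
    text: str


def apply_rules(rules: list[Rule], text: str) -> str:
    for rule in rules:
        text = text.replace(rule.source, rule.target)
    return text


def body_headings(page: list) -> list[str]:
    """Body text: each heading is converted with the rules seen so far."""
    seen: list[Rule] = []
    converted: list[str] = []
    for item in page:
        if isinstance(item, Rule):
            seen.append(item)
        else:
            converted.append(apply_rules(seen, item.text))
    return converted


def toc_headings(page: list) -> list[str]:
    """ToC: every heading is converted with all rules on the page."""
    rules = [item for item in page if isinstance(item, Rule)]
    return [apply_rules(rules, item.text) for item in page if isinstance(item, Heading)]


rule = Rule("软件", "軟體")
late = [Heading("软件"), rule, Heading("软件")]
print(body_headings(late))  # ['软件', '軟體']
print(toc_headings(late))   # ['軟體', '軟體'] (first entry differs from the body)

early = [rule, Heading("软件"), Heading("软件")]
print(body_headings(early) == toc_headings(early))  # True
```

In this model, moving every rule to the top of the page (the glossary style suggested above) makes the two views agree by construction.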

@Shizhao I don't think {{DISPLAYTITLE}} is related to this task (or this patch set) one way or the other, but there was earlier work on displaytitle which might be involved: eed3121a8fca38bbd523c7068087e611f2606bb5 / e085e3f310b201c97c265cd160b321d146097a68 / bacd87e4942baa34808a1b77d3b29bfdb566cc17 / T67747: Sanitizer::removeHTMLtags doesn't close tags correctly when $wgUseTidy is enabled. This needs further investigation; I filed T331316: Vector (or Vector 2022?) displays {{DISPLAYTITLE}} only when no ToC is present for follow-up.

Change 894746 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] WIP: Don't clear LanguageConverter display title when converting ToC

https://gerrit.wikimedia.org/r/894746

Edtadros reassigned this task from Edtadros to ovasileva.

Test Result - Prod

Status: ✅ PASS
Environment: zhwiki
OS: macOS Ventura
Browser: Chrome
Device: MBP
Emulated Device: N/A

Test Artifact(s):

QA Steps

✅ AC1: Check that the text in the table of contents in Vector 2022 (https://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%B1%82%E5%8A%A9?variant=zh-cn&useskin=vector-2022) matches Vector 2010 (https://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%B1%82%E5%8A%A9?variant=zh-cn&useskin=vector).

Screenshot 2023-03-06 at 4.38.40 PM.png (584×1 px, 161 KB)

Screenshot 2023-03-06 at 4.37.56 PM.png (1×1 px, 656 KB)

[mediawiki/core@master] WIP: Don't clear LanguageConverter display title when converting ToC

@cscott perhaps we could capture the remaining work in a new ticket?

Yes, already done: T331316. This ticket can be closed.

Change 894746 merged by jenkins-bot:

[mediawiki/core@master] Don't clear LanguageConverter display title when converting ToC

https://gerrit.wikimedia.org/r/894746

Change 884084 merged by jenkins-bot:

[mediawiki/core@master] Parser: Remove back-compatibility NO_TOC_CONVERSION code

https://gerrit.wikimedia.org/r/884084

Change #1029559 had a related patch set uploaded (by Alistair3149; author: C. Scott Ananian):

[mediawiki/core@REL1_39] Language-convert Table of Contents at parse time

https://gerrit.wikimedia.org/r/1029559

Change #1029559 abandoned by Alistair3149:

[mediawiki/core@REL1_39] Language-convert Table of Contents at parse time

Reason:

Tocdata is not available in Parser for MW 1.39. Instead, it is built during output.

https://gerrit.wikimedia.org/r/1029559