Deploy "add a link" to 15th round of wikis
Closed, Resolved | Public | 1 Estimated Story Points

Description

  • Training models
    • Quechua Wikipedia qu
    • Romansh Wikipedia rm
    • Romani Wikipedia rmy
    • Rundi Wikipedia rn
    • Aromanian Wikipedia roa-rup
    • Tarandíne Wikipedia roa-tara
    • Rusyn Wikipedia rue
    • Kinyarwanda Wikipedia rw
    • Sanskrit Wikipedia sa
    • Sakha Wikipedia sah
    • Santali Wikipedia sat
    • Sardinian Wikipedia sc
    • Sicilian Wikipedia scn
    • Scots Wikipedia sco
    • Sindhi Wikipedia sd
    • Northern Sami Wikipedia se
    • Sango Wikipedia sg
    • Serbo-Croatian Wikipedia sh
    • Shan Wikipedia shn (see T308141#8778455)
    • Sinhala Wikipedia si
    • Slovak Wikipedia sk
    • Slovenian Wikipedia sl
  • Models verification
  • Publish Datasets
  • Populate the excluded section titles
  • Deploy back-end
  • Check how the model works on the wikis
  • In Search, use hasrecommendation:link to find articles
  • Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
  • Inform communities
  • Deploy front-end
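
The two manual checks in the list above (the hasrecommendation:link search and the API test) can be scripted as URL builders. This is a minimal sketch, not part of the deployment tooling: the function names are hypothetical, the API path is inferred from the apidocs anchor quoted above, and the page title is a placeholder that is assumed to be URL-encoded already.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the manual verification steps in this task.

# URL of the link recommendation endpoint for one article.
# Path layout inferred from the apidocs anchor:
#   get_v1_linkrecommendations__project___domain___page_title_
linkrec_url() {
    local project="$1" domain="$2" title="$3"
    printf 'https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/%s/%s/%s\n' \
        "$project" "$domain" "$title"
}

# URL of an on-wiki search for articles that have link recommendations.
# $1 is the wiki origin, e.g. https://qu.wikipedia.org
search_url() {
    local origin="$1"
    printf '%s/w/index.php?search=%s\n' "$origin" 'hasrecommendation%3Alink'
}

# Example_article is a placeholder title, not a real page.
linkrec_url wikipedia qu Example_article
search_url https://qu.wikipedia.org

# To fetch a recommendation for real (network access required):
#   curl -s "$(linkrec_url wikipedia qu Example_article)"
```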

Event Timeline

22/22 models were trained successfully in the 15th round of wikis.

Model evaluation has been completed and below are the backtesting results:

Wiki           Precision@0.5   Recall@0.5
quwiki         0.75            0.16
rmwiki         0.84            0.65
rmywiki        0.89            0.76
rnwiki         0.98            0.67
roa_rupwiki    0.76            0.57
roa_tarawiki   0.97            0.76
ruewiki        0.84            0.57
rwwiki         0.83            0.65
sawiki         0.75            0.13
sahwiki        0.77            0.43
satwiki        0.76            0.52
scwiki         0.80            0.54
scnwiki        0.91            0.56
scowiki        0.84            0.54
sdwiki         0.75            0.45
sewiki         0.95            0.69
sgwiki         1.00            0.89
shwiki         0.90            0.51
shnwiki        0.50            0.02
siwiki         0.74            0.13
skwiki         0.91            0.59
slwiki         0.83            0.53

CCing @MGerlach, in case he would like to add comments on the backtesting evaluation.

The conclusion from the backtesting results is that most of the languages look fine, except:

  • shnwiki has a low precision (0.50) and recall (0.02)
  • siwiki's precision (0.74) is slightly lower than the recommended one (0.75)

Talked to @MGerlach about these results and agreed that siwiki should be published but shnwiki shouldn't.
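
The precision check behind this conclusion can be sketched in shell. This is a minimal illustration, not the evaluation pipeline used for this task: the below_threshold function name and the three-column input layout (wiki, precision, recall) are assumptions, and only a few rows of the backtesting results are reproduced. It only flags candidates; the decision to publish siwiki anyway was made by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print wikis whose precision is below a threshold.
# Reads "wiki precision recall" rows on stdin.
below_threshold() {
    local threshold="$1"
    awk -v t="$threshold" '$2 + 0 < t + 0 { print $1 }'
}

# A few rows copied from the backtesting results in this task.
results='quwiki 0.75 0.16
shnwiki 0.50 0.02
siwiki 0.74 0.13
sgwiki 1.00 0.89'

# 0.75 is the recommended precision mentioned in this task.
flagged=$(printf '%s\n' "$results" | below_threshold 0.75)
echo "$flagged"
```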

kevinbazira added a subscriber: kostajh.

@kostajh, we published datasets for the 21 of 22 models that passed the evaluation in this round.

elukey moved this task from In Progress to Watching on the Machine-Learning-Team board.
elukey added a subscriber: kevinbazira.
Sgs subscribed.

I ran this script for adding the link-recommendation task type and populating the excluded sections entries:

PHAB=T308141
for WIKI in quwiki rmwiki rmywiki rnwiki roa_rupwiki roa_tarawiki ruewiki rwwiki sawiki sahwiki satwiki scwiki scnwiki scowiki sdwiki sewiki sgwiki shwiki siwiki skwiki slwiki; do
    # Canonical server of the wiki, used for the verification links printed below
    ORIGIN=$(mwscript getConfiguration.php "$WIKI" --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer')
    # Add the link-recommendation task type to the NewcomerTasks configuration
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --create-only \
            --json \
            --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
            link-recommendation \
            '{ "type": "link-recommendation", "group": "easy" }'
    # Collect this wiki's section titles with probability > 0.25, deduplicate them,
    # and store the resulting JSON array as the excluded sections
    jq "select(.wiki==\"$WIKI\" and .probability > 0.25) | .section" wiki_sections.jsonl \
        | jq --slurp --compact-output "unique" \
        | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --json \
            --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
            link-recommendation.excludedSections \
            "$(cat)"
    echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
    echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
    echo "Press <Enter> to continue"
    read -r # give time for manual verification
done

Note that the script didn't populate excludedSections for rnwiki, roa_rupwiki, roa_tarawiki and sgwiki because these wikis were not present in wiki_sections.jsonl; see T345562.
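
A gap like this can be spotted before running the loop by checking which wikis have no rows in wiki_sections.jsonl. The sketch below is an assumption-laden illustration rather than the check that was actually used: the missing_wikis function is hypothetical, it relies on plain grep instead of jq, it assumes each line carries a "wiki" field (as the jq filter in the script implies), and the sample rows are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every wiki that has no row in a JSON Lines file.
# Usage: missing_wikis FILE WIKI...
missing_wikis() {
    local file="$1"; shift
    local wiki
    for wiki in "$@"; do
        # Match "wiki": "<dbname>" with optional whitespace around the colon.
        if ! grep -Eq "\"wiki\"[[:space:]]*:[[:space:]]*\"${wiki}\"" "$file"; then
            echo "$wiki"
        fi
    done
}

# Demo on a throwaway file with made-up rows (not real data from this task).
sample=$(mktemp)
printf '%s\n' \
    '{"wiki": "quwiki", "section": "Example section A", "probability": 0.9}' \
    '{"wiki": "skwiki", "section": "Example section B", "probability": 0.4}' > "$sample"

missing=$(missing_wikis "$sample" quwiki rnwiki sgwiki skwiki)
echo "$missing"
rm -f "$sample"
```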

Change 964949 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend 15th round of wikis

https://gerrit.wikimedia.org/r/964949

Change 964949 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend 15th round of wikis

https://gerrit.wikimedia.org/r/964949

Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:27:16Z] <sgimeno@deploy2002> Started scap: Backport for [[gerrit:964949|GrowthExperiments: enable AddLink backend 15th round of wikis (T308141)]]

Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:28:35Z] <sgimeno@deploy2002> sgimeno: Backport for [[gerrit:964949|GrowthExperiments: enable AddLink backend 15th round of wikis (T308141)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:35:01Z] <sgimeno@deploy2002> Finished scap: Backport for [[gerrit:964949|GrowthExperiments: enable AddLink backend 15th round of wikis (T308141)]] (duration: 07m 45s)

Sgs changed the task status from Open to In Progress. (Oct 11 2023, 4:45 PM)
Sgs moved this task from Triaged to Sprint 0 (Growth Team) on the Growth-Team board.
Sgs edited projects, added Growth-Team (Sprint 0 (Growth Team)); removed Growth-Team.
Sgs moved this task from Incoming to In Progress on the Growth-Team (Sprint 0 (Growth Team)) board.
KStoller-WMF triaged this task as Medium priority. (Nov 7 2023, 6:26 PM)
Trizek-WMF set Due Date to Nov 22 2023, 5:00 PM. (Nov 8 2023, 5:22 PM)
Sgs set the point value for this task to 1. (Nov 14 2023, 4:46 PM)

All wikis return results now.

Notes

rnwiki and sgwiki return very few suggestions (4 and 1 respectively), probably because they have so few articles (639 and 318). I think we can enable their frontends anyway, in the hope that the model generates more results as the number of articles grows. cc @Trizek-WMF

rmwiki and shwiki were not present in the wikis.txt file, which led to 400 errors on API requests that used these domains as parameters. This is probably because the datasets were published before we fixed the publish-dataset.sh script in T340944. I've added them manually. cc @kevinbazira
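
A quick way to catch entries missing from wikis.txt is sketched below. The format of wikis.txt is not shown in this task, so the sketch assumes one database name per line; if the file stores domain codes instead, the dbname_to_domain helper shows the conversion (for example roa_rupwiki to roa-rup). Both function names are hypothetical and the demo file is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a database name to the domain code used in
# API requests, e.g. roa_rupwiki -> roa-rup.
dbname_to_domain() {
    printf '%s\n' "${1%wiki}" | tr '_' '-'
}

# Hypothetical helper: print every entry that is not an exact line of FILE.
# Usage: not_listed FILE ENTRY...
not_listed() {
    local file="$1"; shift
    local entry
    for entry in "$@"; do
        grep -Fxq -- "$entry" "$file" || echo "$entry"
    done
}

# Demo on a throwaway list (assumed format: one database name per line).
list=$(mktemp)
printf '%s\n' quwiki skwiki > "$list"
absent=$(not_listed "$list" quwiki rmwiki shwiki skwiki)
rm -f "$list"

for dbname in $absent; do
    echo "$dbname -> $(dbname_to_domain "$dbname")"
done
```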

In T308141#9334257, @Sgs wrote:

I think we can enable their frontends anyway, in the hope that the model generates more results as the number of articles grows. cc @Trizek-WMF

Agreed!

Just to confirm for Tech News purposes: Is this releasing next week even though there isn't a deployment train next week (20 Nov)? Or does the Tech News entry need to be moved to the following week (27 Nov) instead?
(The current entry says "Starting on Wednesday, a new set of Wikipedias will get "Add a link" [...]", meaning 22 Nov.) Thanks!

Thanks for your question. All required code for Add a link tasks is already in all production wikis. We will enable the new set of Wikipedias by performing a config backport in one of the available windows, probably at 14:00 UTC.

And (IIRC) once this code is backported, activation is just a config change, which is not affected by the absence of a train.

Change 977644 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: enable frontend for 15th round of wikis

https://gerrit.wikimedia.org/r/977644

Change 977644 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: enable frontend for 15th round of wikis

https://gerrit.wikimedia.org/r/977644

Mentioned in SAL (#wikimedia-operations) [2023-11-27T14:10:22Z] <urbanecm@deploy2002> Started scap: Backport for [[gerrit:977644|GrowthExperiments: enable frontend for 15th round of wikis (T308141)]], [[gerrit:975378|zghwiki: add timezone, wgSitename (T350241)]], [[gerrit:975376|bbcwiki: add timezone, wgSitename (T350373)]]

Mentioned in SAL (#wikimedia-operations) [2023-11-27T14:11:36Z] <urbanecm@deploy2002> sgimeno and anzx and urbanecm: Backport for [[gerrit:977644|GrowthExperiments: enable frontend for 15th round of wikis (T308141)]], [[gerrit:975378|zghwiki: add timezone, wgSitename (T350241)]], [[gerrit:975376|bbcwiki: add timezone, wgSitename (T350373)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-11-27T14:21:45Z] <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:977644|GrowthExperiments: enable frontend for 15th round of wikis (T308141)]], [[gerrit:975378|zghwiki: add timezone, wgSitename (T350241)]], [[gerrit:975376|bbcwiki: add timezone, wgSitename (T350373)]] (duration: 11m 23s)

Checked some wikis from the list; all looks as expected. Leaving the task in the Test in Production column to monitor it during this week.