
Deploy "add a link" to 14th round of wikis
Closed, Resolved (Public)

Event Timeline

15/16 models were trained successfully in the 14th round of wikis.

The Chinese Wikipedia (zhwiki) training pipeline raised a UnicodeEncodeError, which is being investigated in T325521#8717012.

kevinbazira added a subscriber: MGerlach.

Model evaluation has been completed and below are the backtesting results:

Wiki              Precision@0.5  Recall@0.5
wawiki            0.81           0.40
warwiki           0.95           0.77
wowiki            0.83           0.54
wuuwiki           0.00           0.00
xalwiki           0.99           0.60
xhwiki            0.83           0.32
xmfwiki           0.76           0.27
yiwiki            0.76           0.44
yowiki            0.96           0.83
zawiki            0.91           0.61
zeawiki           0.97           0.78
zh_classicalwiki  0.00           0.00
zh_min_nanwiki    0.97           0.84
zh_yuewiki        0.48           0.00
zuwiki            0.97           0.80

CCing @MGerlach, in case he would like to add comments on the backtesting evaluation.

The conclusion from the backtesting results is that most of the languages look fine, except for:

  • wuuwiki, zh_classicalwiki, and zh_yuewiki, whose precision and recall fall far below the recommended thresholds of 0.75 (precision) and 0.2 (recall).
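
The threshold check described above can be sketched in a few lines of Python. This is only an illustration, not the evaluation code used for the models: the numbers are copied from the backtesting table, and treating the thresholds as inclusive (`>=`) lower bounds is an assumption.

```python
# Backtesting results copied from the table above: wiki -> (precision, recall).
RESULTS = {
    "wawiki": (0.81, 0.40),
    "warwiki": (0.95, 0.77),
    "wowiki": (0.83, 0.54),
    "wuuwiki": (0.00, 0.00),
    "xalwiki": (0.99, 0.60),
    "xhwiki": (0.83, 0.32),
    "xmfwiki": (0.76, 0.27),
    "yiwiki": (0.76, 0.44),
    "yowiki": (0.96, 0.83),
    "zawiki": (0.91, 0.61),
    "zeawiki": (0.97, 0.78),
    "zh_classicalwiki": (0.00, 0.00),
    "zh_min_nanwiki": (0.97, 0.84),
    "zh_yuewiki": (0.48, 0.00),
    "zuwiki": (0.97, 0.80),
}

# Recommended thresholds quoted in the thread.
MIN_PRECISION = 0.75
MIN_RECALL = 0.20


def passes(precision, recall):
    """Return True when both metrics meet the thresholds (inclusive: an assumption)."""
    return precision >= MIN_PRECISION and recall >= MIN_RECALL


passing = [wiki for wiki, (p, r) in RESULTS.items() if passes(p, r)]
failing = [wiki for wiki in RESULTS if wiki not in passing]

print("pass:", len(passing))
print("fail:", failing)
```

Under these assumptions the check flags the same three wikis named in the bullet and leaves 12 wikis passing.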

Talked to @MGerlach about these results and agreed that wuuwiki, zh_classicalwiki, and zh_yuewiki should not be deployed.

He also said:

I think for the zhwiki_* it is expected as we rely mostly on whitespaces for word-tokenization to identify link-candidates in the text which is likely to not work for these languages.
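
A small Python sketch of why whitespace-based tokenization struggles here. It uses plain `str.split()` as a stand-in for the pipeline's tokenizer, which is an assumption, and the sample sentences are illustrative rather than taken from the training data.

```python
def whitespace_tokens(text):
    """Split on runs of whitespace; a simplified stand-in for the real tokenizer."""
    return text.split()


# Illustrative sentences (not from the training data); both mean roughly
# "Wikipedia is a free online encyclopedia".
english = "Wikipedia is a free online encyclopedia"
chinese = "維基百科是一個自由的網路百科全書"  # written without spaces between words

english_tokens = whitespace_tokens(english)
chinese_tokens = whitespace_tokens(chinese)

print(len(english_tokens))  # 6 tokens, one per word
print(len(chinese_tokens))  # 1 token: the whole sentence
```

Because the Chinese sentence comes back as a single token, no individual word is available to match as a link candidate, which is consistent with the near-zero recall reported for wuuwiki, zh_classicalwiki, and zh_yuewiki.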

@kostajh, we published datasets for the 12 of 16 models that passed the evaluation in this round.

Trizek-WMF updated the task description.

I moved Swahili Wikipedia from a later group as we plan to collaborate with them.

Change 954004 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend for swwiki

https://gerrit.wikimedia.org/r/954004

Change 954004 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend for swwiki

https://gerrit.wikimedia.org/r/954004

Mentioned in SAL (#wikimedia-operations) [2023-08-31T14:09:02Z] <sgimeno@deploy1002> Started scap: Backport for [[gerrit:954004|GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-31T14:10:43Z] <sgimeno@deploy1002> sgimeno: Backport for [[gerrit:954004|GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139)]] synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-31T14:16:37Z] <sgimeno@deploy1002> Finished scap: Backport for [[gerrit:954004|GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139)]] (duration: 07m 34s)

Change 960074 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend 14th round of wikis

https://gerrit.wikimedia.org/r/960074

Sgs changed the task status from Open to In Progress. (Sep 22 2023, 3:04 PM)
Sgs claimed this task.
Sgs edited projects, added Growth-Team (Sprint 0 (Growth Team)); removed Growth-Team.
Sgs moved this task from Incoming to In Progress on the Growth-Team (Sprint 0 (Growth Team)) board.

I ran this script to add the link-recommendation task type and populate the excludedSections entries:

PHAB=T308139
# The 12 wikis whose models passed the backtesting evaluation in this round.
for WIKI in wawiki warwiki wowiki xalwiki xhwiki xmfwiki yiwiki yowiki zawiki zeawiki zh_min_nanwiki zuwiki; do
    # Canonical server URL of the wiki, used to print the verification links below.
    ORIGIN=$(mwscript getConfiguration.php "$WIKI" --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer')
    # Add the link-recommendation task type to MediaWiki:NewcomerTasks.json.
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --create-only \
            --json \
            --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
            link-recommendation \
            '{ "type": "link-recommendation", "group": "easy" }'
    # Collect this wiki's sections with probability > 0.25, de-duplicate them,
    # and pass the resulting JSON array as link-recommendation.excludedSections.
    jq "select(.wiki==\"$WIKI\" and .probability > 0.25) | .section" wiki_sections.jsonl \
        | jq --slurp --compact-output "unique" \
        | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --json \
            --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
            link-recommendation.excludedSections \
            "$(cat)"
    # Print the config page and its latest diff for manual review.
    echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
    echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
    echo "Press <Enter> to continue"
    read # give time for manual verification
done

Note that the script didn't populate excludedSections for zawiki and zh_min_nanwiki because these wikis were not present in wiki_sections.jsonl; see T345562.
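
For readers less familiar with jq, the section-selection step of the script can be sketched in Python as follows. The field names (`wiki`, `section`, `probability`) and the 0.25 cutoff come from the jq filter in the script; the sample rows are hypothetical, and the sketch assumes every row carries all three fields.

```python
import json


def excluded_sections(jsonl_lines, wiki, min_probability=0.25):
    """Mirror the jq filter: keep rows for `wiki` whose probability exceeds the
    cutoff, then return the sorted, de-duplicated section titles (jq `unique`)."""
    sections = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        if row["wiki"] == wiki and row["probability"] > min_probability:
            sections.add(row["section"])
    return sorted(sections)


# Hypothetical sample rows standing in for wiki_sections.jsonl.
SAMPLE = [
    '{"wiki": "wawiki", "section": "References", "probability": 0.91}',
    '{"wiki": "wawiki", "section": "External links", "probability": 0.40}',
    '{"wiki": "wawiki", "section": "References", "probability": 0.30}',
    '{"wiki": "wawiki", "section": "History", "probability": 0.10}',
    '{"wiki": "warwiki", "section": "Notes", "probability": 0.80}',
]

print(excluded_sections(SAMPLE, "wawiki"))
print(excluded_sections(SAMPLE, "zawiki"))
```

In this sketch a wiki with no rows in the file, such as zawiki here, simply yields an empty list, so there is nothing meaningful to write into excludedSections for it.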

Change 960074 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend 14th round of wikis

https://gerrit.wikimedia.org/r/960074

Mentioned in SAL (#wikimedia-operations) [2023-09-25T13:01:47Z] <urbanecm@deploy2002> Started scap: Backport for [[gerrit:960074|GrowthExperiments: enable AddLink backend 14th round of wikis (T308139)]]

Mentioned in SAL (#wikimedia-operations) [2023-09-25T13:14:23Z] <urbanecm@deploy2002> urbanecm and sgimeno: Backport for [[gerrit:960074|GrowthExperiments: enable AddLink backend 14th round of wikis (T308139)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-09-25T13:25:15Z] <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:960074|GrowthExperiments: enable AddLink backend 14th round of wikis (T308139)]] (duration: 23m 28s)

I've checked the enabled wikis and all return a fair number of results except for:

  • xalwiki returns 5 results
  • xmfwiki returns 3 results
  • xhwiki returns 0 results

@kevinbazira do you have any clues on why the model produces few results in the mentioned wikis?

I think we can go ahead and enable the frontend on all wikis except xhwiki. Even with only a few results on xalwiki and xmfwiki, we expect the script to keep generating some suggestions. How does that sound, @Trizek-WMF? What's the Phab ticket for tracking wikis with AddLink suggestion issues? I can't find it under the scaling epic 🤔

In T308139#9216703, @Sgs wrote:

I've checked the enabled wikis and all return a fair number of results except for:

  • xalwiki returns 5 results
  • xmfwiki returns 3 results
  • xhwiki returns 0 results

@kevinbazira do you have any clues on why the model produces few results in the mentioned wikis?

@Sgs the models produce few results in the mentioned wikis because those languages have few articles (as shown below) and those articles possibly have fewer links for the models to learn and make predictions from.

CCing @MGerlach, in case he would like to add comments on the above.

What's the Phab ticket for tracking wikis with AddLink suggestion issues? I can't find it under the scaling epic 🤔

The tracking for add-a-link models that had issues and were not published was happening in this task: T309263. We might not want to add these languages to that task though, since they passed the evaluation (T308139#8723643) and got published (T308139#8728522). As more articles with links in these languages become available, the models are expected to perform better after the next training round (T336927#8864508).

Trizek-WMF set Due Date to Oct 11 2023, 4:00 PM. (Oct 3 2023, 1:19 PM)
Trizek-WMF updated the task description.

Change 964929 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink frontend 14th round of wikis

https://gerrit.wikimedia.org/r/964929

Change 964929 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink frontend 14th round of wikis

https://gerrit.wikimedia.org/r/964929

Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:15:43Z] <sgimeno@deploy2002> Started scap: Backport for [[gerrit:964929|GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139)]]

Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:17:10Z] <sgimeno@deploy2002> sgimeno: Backport for [[gerrit:964929|GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:24:48Z] <sgimeno@deploy2002> Finished scap: Backport for [[gerrit:964929|GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139)]] (duration: 09m 05s)

Sgs changed the task status from In Progress to Open. (Oct 11 2023, 4:44 PM)
Sgs moved this task from In Progress to QA on the Growth-Team (Sprint 0 (Growth Team)) board.
Etonkovidova subscribed.

Spot-checked some wikis from the list:

  • xalwiki has only 6 suggested articles; no suggested edits have been made so far
  • xhwiki has 98 suggested articles; no suggested edits have been made so far
  • zawiki has 52 suggested articles; no suggested edits have been made so far

Moving to Test in Production|Watching for monitoring and for possible feedback.