Deploy "add a link" to third round of wikis
Closed, ResolvedPublic

Description

  • Training models
    • Catalan Wikipedia
    • Hebrew Wikipedia
    • Hindi Wikipedia
    • Korean Wikipedia
    • Norwegian Bokmål Wikipedia
    • Portuguese Wikipedia
    • Simple English Wikipedia
    • Swedish Wikipedia
    • Ukrainian Wikipedia
  • Models verification
  • Publish Datasets
  • Populate the excluded section titles
  • Deploy back-end (on May 5th at 13h UTC)
  • [This task only] Notes on throughput and how long "Deploy back-end" took so that we can decide whether to make improvements for future rounds (T304953)
  • Check how the model works on the wikis
  • In Search, use hasrecommendation:link to find articles
  • Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
  • Inform communities
  • Deploy front-end (May 18)

Event Timeline

Training models for the wikis listed has been completed successfully.

We have also worked on models verification using the backtesting results shown below:

Wiki        Precision@0.5  Recall@0.5
cawiki      0.85           0.51
hewiki      0.75           0.28
hiwiki      0.76           0.27
kowiki      0.72           0.25
nowiki      0.84           0.54
ptwiki      0.85           0.48
simplewiki  0.79           0.44
svwiki      0.90           0.61
ukwiki      0.80           0.42

CCing @MGerlach, in case he'd like to add comments on the backtesting evaluation.

The numbers from the backtesting do not raise any red flags for me. They are comparable to what we observed in the second round ( T284481#7163025).

  • Precision is at least 75% everywhere except kowiki, which falls slightly below at 72%. For comparison, in the second round bnwiki had the lowest precision, at 73%.
  • Recall is above 40% except for hewiki, hiwiki, and kowiki, at 25-28%. For comparison, in the second round bnwiki had the lowest recall, at 28%. A low number here indicates that we might have trouble generating enough recommendations. However, given that it has worked for bnwiki so far, I think this is OK.
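
The check described in the two bullets can be reproduced with a short script. This is only a sketch: the numbers are copied from the backtesting results in this task, and the 75% and 40% cut-offs are the informal reference points used in the comment, not settings of the link recommendation service. The function name flag_low is made up for illustration.

```python
# Backtesting results copied from this task: wiki -> (precision, recall) at threshold 0.5.
BACKTEST = {
    "cawiki": (0.85, 0.51),
    "hewiki": (0.75, 0.28),
    "hiwiki": (0.76, 0.27),
    "kowiki": (0.72, 0.25),
    "nowiki": (0.84, 0.54),
    "ptwiki": (0.85, 0.48),
    "simplewiki": (0.79, 0.44),
    "svwiki": (0.90, 0.61),
    "ukwiki": (0.80, 0.42),
}

# Informal reference points from the comment (assumption: not official thresholds).
MIN_PRECISION = 0.75
MIN_RECALL = 0.40


def flag_low(results, min_precision=MIN_PRECISION, min_recall=MIN_RECALL):
    """Return (wikis below the precision cut-off, wikis below the recall cut-off)."""
    low_precision = sorted(w for w, (p, _) in results.items() if p < min_precision)
    low_recall = sorted(w for w, (_, r) in results.items() if r < min_recall)
    return low_precision, low_recall


low_precision, low_recall = flag_low(BACKTEST)
print("precision below cut-off:", low_precision)
print("recall below cut-off:", low_recall)
```

Rerunning this with the numbers of a later round would make it easy to compare against the bnwiki baseline mentioned above.
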
Trizek-WMF added a project: User-notice.
Trizek-WMF updated the task description. (Show Details)
Trizek-WMF set Due Date to Apr 6 2022, 4:00 PM.

@kostajh, we completed generating models and datasets for the third round of wikis (listed in the task description) and shared the models' evaluation above. Would you like us to continue with publishing the datasets?

Yes, please go ahead with publishing the datasets. Once the service imports the datasets, we should be able to do some tests using https://api.wikimedia.org/service/linkrecommendation/apidocs/

Trizek-WMF changed Due Date from Apr 6 2022, 4:00 PM to Apr 20 2022, 4:00 PM.

I added the back-end deployment and the verification of the models on each wiki to the task description, plus a 1-2 week verification period, as Kosta suggested.

I removed the due date because of Growth team's packed schedule. We will soon have a proper calendar for the next steps: T304953: Schedule the deployment of "Add a link" to more wikis.

Keeping here the announcement for Tech News:

* <translate>Starting on Wednesday, a new set of Wikipedias will get "[[<tvar name="1">mw:Special:MyLanguage/Help:Growth/Tools/Add a link</tvar>|Add a link]]" (<tvar name="2">{{int:project-localized-name-cawiki}}, {{int:project-localized-name-hewiki}}, {{int:project-localized-name-hiwiki}}, {{int:project-localized-name-kowiki}}, {{int:project-localized-name-nowiki}}, {{int:project-localized-name-ptwiki}}, {{int:project-localized-name-simplewiki}}, {{int:project-localized-name-svwiki}}, {{int:project-localized-name-ukwiki}}</tvar>). This is part of the progressive deployment of this tool [<tvar name="3">https://phabricator.wikimedia.org/T304110</tvar> to more Wikipedias]. The communities can [[<tvar name="4">mw:Special:MyLanguage/Growth/Community configuration</tvar>|configure how this feature works locally]].</translate> [https://phabricator.wikimedia.org/T304542]

I changed it to wikitext that is even harder to read, but that produces more correct output:

* <translate>Starting on Wednesday, a new set of Wikipedias will get "[[<tvar name="1">mw:Special:MyLanguage/Help:Growth/Tools/Add a link</tvar>|Add a link]]" (<tvar name="2">{{int:project-localized-name-cawiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-hewiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-hiwiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-kowiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-nowiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-ptwiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-simplewiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-svwiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-ukwiki/{{TRANSLATIONLANGUAGE}}}}</tvar>). This is part of the [[<tvar name="3">phab:T304110</tvar>|progressive deployment of this tool to more Wikipedias]]. The communities can [[<tvar name="4">mw:Special:MyLanguage/Growth/Community configuration</tvar>|configure how this feature works locally]].</translate> [https://phabricator.wikimedia.org/T304542]

Thank you @Tacsipacsi. I was not aware of this trick.

Yes, please go ahead with publishing the datasets.

@kostajh, thank you for the confirmation. We have published the datasets for the 9 wikis listed in the description.

Looks like all of the datasets have imported according to the output from https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/wikipedia/zz/Cat?threshold=0.5&max_recommendations=15, and a test query with uk and ca wikis.
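
For repeating this kind of spot check on other wikis, the request URL can be assembled programmatically. The sketch below only rebuilds the URL pattern quoted above; the function names are made up for illustration, the response is returned as parsed JSON without assuming anything about its fields, and the API may expect a descriptive User-Agent header that this minimal version does not set.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base path taken from the URL quoted in this task.
BASE = "https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations"


def recommendation_url(domain, title, threshold=0.5, max_recommendations=15,
                       project="wikipedia"):
    """Build the request URL for one page, e.g. domain="uk", title="Cat"."""
    path = "/".join(quote(part, safe="") for part in (project, domain, title))
    query = urlencode({"threshold": threshold,
                       "max_recommendations": max_recommendations})
    return f"{BASE}/{path}?{query}"


def fetch_recommendations(domain, title):
    """Fetch and parse the JSON response (performs a network request)."""
    with urlopen(recommendation_url(domain, title), timeout=30) as response:
        return json.load(response)


print(recommendation_url("ca", "Cat"))
```
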

Change 789556 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] GrothExperiments: Enable Add Link backend on tier 3 wikis

https://gerrit.wikimedia.org/r/789556

Change 789556 merged by jenkins-bot:

[operations/mediawiki-config@master] GrothExperiments: Enable Add Link backend on tier 3 wikis

https://gerrit.wikimedia.org/r/789556

Mentioned in SAL (#wikimedia-operations) [2022-05-05T13:06:53Z] <tgr@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:789556|GrothExperiments: Enable Add Link backend on tier 3 wikis (T304542)]] (duration: 00m 49s)

Mentioned in SAL (#wikimedia-operations) [2022-05-05T13:25:36Z] <tgr@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:789556|GrothExperiments: Enable Add Link backend on tier 3 wikis (T304542)]] (again, used the wrong directory before) (duration: 00m 48s)

All the task pools were filled by about 14h UTC on the 7th, so in total it took about two days, or ~5.3 hours per wiki. All ended up with decent pool sizes: 18K for simplewiki, 21-22K for the rest.
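
The per-wiki figure can be rechecked from the timestamps in this task. This is a rough sketch: the start time is the second config sync logged in SAL above, the end time is the approximate "about 14h UTC on the 7th", and dividing evenly by nine wikis is a simplification, since the pools are not necessarily filled one after another at the same rate.

```python
from datetime import datetime, timezone

WIKI_COUNT = 9  # wikis listed in the task description

# Second config sync from SAL (the first one used the wrong directory).
backend_enabled = datetime(2022, 5, 5, 13, 25, 36, tzinfo=timezone.utc)
# "About 14h UTC on the 7th" is an approximation, not a logged timestamp.
pools_filled = datetime(2022, 5, 7, 14, 0, tzinfo=timezone.utc)

elapsed_hours = (pools_filled - backend_enabled).total_seconds() / 3600

# Estimate from the rounded "about two days" versus the timestamp difference.
per_wiki_rounded = 48 / WIKI_COUNT
per_wiki_from_timestamps = elapsed_hours / WIKI_COUNT

print(f"elapsed: {elapsed_hours:.1f} h")
print(f"per wiki (two days / 9): {per_wiki_rounded:.1f} h")
print(f"per wiki (timestamps / 9): {per_wiki_from_timestamps:.1f} h")
```

Both estimates land within a few minutes of each other, which is precise enough for planning future rounds.
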

Trizek-WMF updated the task description. (Show Details)
Trizek-WMF updated Other Assignee, added: Tgr.

I will check if the lists are populated and then inform the communities.

Trizek-WMF triaged this task as High priority.May 10 2022, 7:21 PM
Trizek-WMF set Due Date to May 18 2022, 4:00 PM.

Searching for hasrecommendation:link returns no results on the following wikis:

  • Hindi Wikipedia
  • Ukrainian Wikipedia

We can deploy to the listed wikis, except the two missing ones.

These two can be removed from this list if needed, or moved to T304548: Deploy "add a link" to 4th round of wikis, as I haven't informed these communities yet.

The deployment is announced in Tech News, "starting on Wednesday" (May 18), hence it can be done anytime after this date. We can change the list of wikis written in Tech News until the distribution starts, on Monday afternoon UTC.


Anecdotally, running hasrecommendation:link on all wikis listed in this task often returned the same article topics in Special:Search. Several times I saw the articles about the French National Library, ISBN, Encyclopedia of Life, Wikipedia... :)

Trizek-WMF changed the task status from Open to In Progress.May 11 2022, 1:58 PM
Trizek-WMF updated the task description. (Show Details)

Mentioned in SAL (#wikimedia-operations) [2022-05-11T20:28:19Z] <tgr> T304542 running mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php hiwiki --verbose

For hiwiki/ukwiki, all mwaddlink service requests fail with "There was a problem during the HTTP request: 400 Bad Request".

A manually reproduced example gives "Request Line is too large (4228 > 4094)".

I guess we should move sections_to_exclude to the POST body.
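
To illustrate the idea, here is a sketch that measures the request line and falls back to a POST body when it would exceed the limit reported in the error. Several things here are assumptions made for illustration only: that sections_to_exclude is sent as a repeated query parameter, that the body would be JSON, and the function names. The actual change is tracked separately and may look different. Note that non-ASCII section titles grow considerably when percent-encoded, so long exclusion lists in such scripts reach the limit sooner.

```python
import json
from urllib.parse import urlencode

# Limit reported in the error message above.
MAX_REQUEST_LINE = 4094


def request_line(method, path, params):
    """Return the HTTP request line for the given method, path and query params."""
    query = urlencode(params, doseq=True)
    target = f"{path}?{query}" if query else path
    return f"{method} {target} HTTP/1.1"


def choose_transport(path, base_params, sections):
    """Use GET while the request line fits; otherwise move the list to a POST body.

    Returns (method, request_line, body), where body is None for GET.
    """
    get_line = request_line(
        "GET", path, {**base_params, "sections_to_exclude": sections}
    )
    if len(get_line.encode("utf-8")) <= MAX_REQUEST_LINE:
        return "GET", get_line, None
    body = json.dumps({"sections_to_exclude": sections})
    return "POST", request_line("POST", path, base_params), body


# Hypothetical inputs: a short list, and a long list of Cyrillic section titles.
PATH = "/v1/linkrecommendations/wikipedia/uk/Cat"
BASE_PARAMS = {"threshold": 0.5, "max_recommendations": 15}

short_method, _, _ = choose_transport(PATH, BASE_PARAMS, ["References"])
long_sections = ["Посилання"] * 100
long_method, long_line, long_body = choose_transport(PATH, BASE_PARAMS, long_sections)
print(short_method, long_method, len(long_line))
```
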

Anecdotally, running hasrecommendation:link on all wikis listed in this task often returned the same article topics in Special:Search. Several times I saw the articles about the French National Library, ISBN, Encyclopedia of Life, Wikipedia... :)

We are going through the topics in the order defined in code, so early topics might be a bit overrepresented, but otherwise article selection is random. When you use Special:Search, articles are sorted in part by the number of incoming links, which would explain the popularity of things like ISBN. (You can add &sort=random to get a proper random sampling.)
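
A small helper can generate the randomized search URL for each wiki. This is a sketch under a few assumptions: the function name is made up, the wikis are addressed as {domain}.wikipedia.org, and the search is reached through /w/index.php with title=Special:Search.

```python
from urllib.parse import urlencode


def search_url(domain, random_order=True):
    """Build a Special:Search URL listing pages that have a link recommendation."""
    params = {"title": "Special:Search", "search": "hasrecommendation:link"}
    if random_order:
        # sort=random avoids the default ranking, which favours heavily linked pages.
        params["sort"] = "random"
    return f"https://{domain}.wikipedia.org/w/index.php?{urlencode(params)}"


for domain in ("ca", "he", "hi", "ko", "no", "pt", "simple", "sv", "uk"):
    print(search_url(domain))
```
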

In T304542#7923850, @Tgr wrote:
In T304542#7922454, @Tgr wrote:

I guess we should move sections_to_exclude to the POST body.

T308186: Support long section exclusion lists for link recommendations

Will it be ready for next week? If not, it is not a big deal: we can move the two wikis impacted to the next round.

In T304542#7923878, @Tgr wrote:

Anecdotally, running hasrecommendation:link on all wikis listed in this task often returned the same article topics in Special:Search. Several times I saw the articles about the French National Library, ISBN, Encyclopedia of Life, Wikipedia... :)

We are going through the topics in the order defined in code, so early topics might be a bit overrepresented, but otherwise article selection is random. When you use Special:Search, articles are sorted in part by the number of incoming links, which would explain the popularity of things like ISBN. (You can add &sort=random to get a proper random sampling.)

Got it, thank you for the details! :)

Will it be ready for next week? If not, it is not a big deal: we can move the two wikis impacted to the next round.

The bug has been fixed; the task pool should be filled by the end of the day.

I tested Ukrainian and Hindi Wikipedia, and I now see some results. Thank you for fixing it, @Tgr!

hiwiki is a bit short on tasks (ended up with about 7500).

In T304542#7929675, @Tgr wrote:

hiwiki is a bit short on tasks (ended up with about 7500).

What happens when the list is empty?

We may have this issue in other rounds, as those wikis will have fewer pages.

Also, where have you found this number, so that I can check it for future rounds?

It's in https://hi.wikipedia.org/wiki/Special:NewcomerTasksInfo (same as other suggested edits); you need to look for link-recommendation. It only shows up after the task pool is at least partially loaded.

In T304542#7929675, @Tgr wrote:

hiwiki is a bit short on tasks (ended up with about 7500).

What happens when the list is empty?

No structured add a link tasks will be suggested to the users. Depending on the user's task type filters, this can mean only non-structured tasks are suggested, or that no tasks are available. For the small wikis our features are on, "no tasks" or "not enough tasks" is probably already the case, so at the very least, this won't cause us any trouble we wouldn't have without structured add a link deployed.

Also, where have you found this number, so that I can check it for future rounds?

You can check all wikis (in a somewhat hard-to-read format) at https://grafana.wikimedia.org/d/vGq7hbnMz/special-homepage-and-suggested-edits?viewPanel=31 .

Forgot about this yesterday, let's deploy the frontend today.

Change 793395 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] GrothExperiments: Enable Add Link frontend on tier 3 wikis

https://gerrit.wikimedia.org/r/793395

Change 793395 merged by jenkins-bot:

[operations/mediawiki-config@master] GrothExperiments: Enable Add Link frontend on tier 3 wikis

https://gerrit.wikimedia.org/r/793395

Mentioned in SAL (#wikimedia-operations) [2022-05-19T14:36:48Z] <tgr@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:793395|GrothExperiments: Enable Add Link frontend on tier 3 wikis (T304542)]] (duration: 00m 50s)