
Distinguish disambiguation pages from normal articles cheaply in database
Closed, Resolved, Public

Description

Author: mediazilla

Description:
MediaWiki already does not count redirects as "true articles". I'm requesting a feature to distinguish disambiguation pages similarly. This is not about statistics but about the special page "Lonelypages". Most disambiguation pages are lonely pages, simply because a link to a given topic is normally pointed at the disambiguated lemma rather than at the disambiguation page, which is exactly how it should be done. But if you look for lonely pages via the special page, you will find a lot of disambiguation pages that do not need to be linked from other articles, as they exist only for users who have typed the expression while looking for an explanation. This has cost me a lot of nerves.

And it's not only about my nerves; there is another reason why I think this feature should be implemented, and it also has to do with Lonelypages. As I mentioned, you can use this tool to find lonely pages and then link them. But, like most special pages, this page is cached (which is annoying but necessary, and not the point), and the query is mostly limited to 1000 entries. In other words: there are very many lonely pages, more than 1000, so they cannot all be found via the Lonelypages tool. Mostly you will only get the pages beginning with A or B. Okay, no problem, you might think: you can link all the A and B pages so that C and D come up at the next query. But that is exactly what won't work, because of the disambiguation pages. There are also more than 1000 disambiguation pages which, as we just found out, do not need to be linked, so they won't disappear from the list and they block C, D, etc.

A workaround for now would be to create a page, maybe "List of all Disambiguations", which links to those pages to make them disappear. But I guess the tool considers a page lonely if no article links to it, and "List of all Disambiguations" is not a page to put into the article namespace, as it has no encyclopaedic relevance.

I hope you got the point despite a few mistakes in my English :-)


Version: unspecified
Severity: normal

Details

Reference
bz6754

Event Timeline

bzimport raised the priority of this task to High. Nov 21 2014, 9:18 PM
bzimport set Reference to bz6754.

mediazilla wrote:

"if no article links to them": what I mean is no page from the article namespace. I guess most of you will have figured this out already,
but just to make sure everybody understands ;)

dunc_harris wrote:

AFAIK all disambiguation pages should start with

#DISAMBIGUATION

so that they are recognised as different. Then the following can be implemented:

[[special:whatlinkshere]] can identify disambiguation pages.

I'm sure there are other advantages that I have not thought of, but the
#DISAMBIGUATION thing needs to go in first.

ayg wrote:

Disambiguation pages are distinguished technically from non-disambigs. See
[[Special:Disambiguations]]. Any particular requests about what should be done
with disambiguation pages should be in separate bugs.

dunc_harris wrote:

No. This is clearly not fixed. Examples of where disambiguation pages need to
be distinguished include:

  1. In [[special:whatlinkshere]], pages that are disambiguation pages should be identified.
  2. [[special:randompage]] should not take users to disambiguation pages.
  3. [[special:allpages]] should distinguish disambiguation pages from articles.

robchur wrote:

These three issues should be separate feature requests.

dunc_harris wrote:

And so would it be possible to do this without any technical alterations? Just
by using a {{template}} ??

robchur wrote:

Administrators provide a list of disambiguation pages via a page in the
MediaWiki namespace. Therefore, MediaWiki knows what pages are supposed to be
classed as disambiguation pages. To request special treatment for disambiguation
page links, etc. in certain cases, please file requests to have that done.

Do not reopen this bug, which concerned something *else*.

ayg wrote:

Okay, after looking at the code I'm no longer sure about this being resolved.
[[Special:Disambiguations]] uses an isExpensive() database query to pick out a
list of the disambigs, but I really don't see how that's usable for bug 7935,
bug 7936, bug 7937, et al. What we need for those is a cheap and easy method
like Article::isDisambig(), à la Article::isRedirect().

titoxd.wikimedia wrote:

Perhaps adding a boolean marker on the database (or a different disambiguation
table), which is updated via a hook after a page save or a purge would work.
That way, Article::isDisambig() would just make a quick query to the field or
table, and return a simple yes/no, which then can be used accordingly.

ayg wrote:

I've looked at the query and actually we store it efficiently for single-page lookups. This query is fast even on enwiki (using toolserver):

mysql> EXPLAIN SELECT 1 FROM templatelinks WHERE tl_namespace=10 AND tl_title IN ('Bio-dab', 'Dab', 'Diasmbig', 'Disamb', 'Disamb-cleanup', 'Disambig', 'Disambig-cleanup', 'Disambiguation', 'Geodis', 'Hndis', 'Hndisambig', 'Numberdis', 'Roaddis', 'Surname') AND tl_from=1234;
+----+-------------+---------------+------+----------------------+---------+---------+-------------+------+--------------------------+
| id | select_type | table         | type | possible_keys        | key     | key_len | ref         | rows | Extra                    |
+----+-------------+---------------+------+----------------------+---------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | templatelinks | ref  | tl_from,tl_namespace | tl_from | 8       | const,const |    3 | Using where; Using index |
+----+-------------+---------------+------+----------------------+---------+---------+-------------+------+--------------------------+
1 row in set (0.01 sec)

mysql> SELECT 1 FROM templatelinks WHERE tl_namespace=10 AND tl_title IN ('Bio-dab', 'Dab', 'Diasmbig', 'Disamb', 'Disamb-cleanup', 'Disambig', 'Disambig-cleanup', 'Disambiguation', 'Geodis', 'Hndis', 'Hndisambig', 'Numberdis', 'Roaddis', 'Surname') AND tl_from=1234;
Empty set (0.01 sec)

which verifies that article id 1234 is not a disambig. Resolving INVALID and removing dependencies; there's no problem with doing an extra JOIN for whatever queries you want, AFAICT. Special:Disambiguations is, I think, only slow because it needs a filesort for the alphabetization? Actually the query without alphabetization takes a couple of seconds to run on the toolserver, with an appropriate LIMIT, but I suspect that might be because the toolserver is overloaded.
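For anyone who wants to try the lookup without toolserver access, here is a minimal sketch of the same single-page check against a toy table. It assumes SQLite as a stand-in for MySQL, a shortened template list, and made-up rows; `is_disambig` and `build_toy_db` are hypothetical helper names, not MediaWiki functions. Only the column names (`tl_from`, `tl_namespace`, `tl_title`) come from the query above.

```python
import sqlite3

# Shortened, illustrative subset of the template titles used in the query
# above; the real list differs per wiki.
DISAMBIG_TEMPLATES = ("Dab", "Disambig", "Disambiguation", "Geodis", "Hndis")


def is_disambig(conn, page_id):
    """Return True if the page transcludes any listed disambiguation template."""
    placeholders = ", ".join("?" for _ in DISAMBIG_TEMPLATES)
    row = conn.execute(
        "SELECT 1 FROM templatelinks "
        "WHERE tl_from = ? AND tl_namespace = 10 "
        f"AND tl_title IN ({placeholders}) LIMIT 1",
        (page_id, *DISAMBIG_TEMPLATES),
    ).fetchone()
    return row is not None


def build_toy_db():
    """Create an in-memory stand-in for templatelinks with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE templatelinks ("
        "tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT, "
        "PRIMARY KEY (tl_from, tl_namespace, tl_title))"
    )
    conn.executemany(
        "INSERT INTO templatelinks VALUES (?, ?, ?)",
        [
            (1234, 10, "Infobox"),      # ordinary article
            (5678, 10, "Disambig"),     # disambiguation page
            (5678, 10, "Wiktionary"),
        ],
    )
    return conn
```

The primary key starting with `tl_from` plays the role of the `tl_from` index in the EXPLAIN output: the lookup touches only the few rows belonging to one page.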

jasonspiro4 wrote:

Julian, I respectfully disagree with something you wrote. You said that disambig pages should not be shown on the Lonelypages list. But as en:user:Revolving_Bugbear wrote: "Disambiguation pages should not be orphans, otherwise they would serve no purpose. Disambigs get wikilinked from hatnotes." --http://en.wikipedia.org/wiki/Wikipedia_talk:Special:Lonelypages#.22Except_for_disambiguation_pages_....22

jasonspiro4 wrote:

Julian, you said in your original comment that disambig pages should not be shown on the Lonelypages list. That is an old bug, bug 3483 (Disambiguation pages should not be listed in Special:Lonelypages). :-)

bluehairedlawyer wrote:

I can't see how this bug is either resolved or invalid. Surely it can't be both!? It seems like it was more of a won't fix. I'm reopening it. This feature would be very useful for disambiguation.

To respond to some points made above:

We could create a magic word called __DISAMBIG__. When included in a non-template page, the parser would set a field called 'page_class' in the 'page' table to a number indicating that it was a disambiguation page. This could then be used to colour links to disambiguation pages like we do now with redirects.

I've set out my proposal at Bug 18254.

*** Bug 43210 has been marked as a duplicate of this bug. ***

I implemented a two-line fix for this at https://gerrit.wikimedia.org/r/#/c/40343/

It sets a 'disambiguation' page property for any page that includes '__DISAMBIG__'.

We already have a method of identifying a page as a disambig page in core (with a category). And querying for this is already efficient. I see no reason for us to have to add extra columns or unnecessary magic words when this data is already inside the database.

Actually, you are mistaken: it uses templates listed on [[MediaWiki:Disambiguationspage]], which does not work well. I would suggest using a parser function like the one Ryan added, along with a new column in the page table.

@daniel: That solution doesn't work across wikis. It also isn't efficient, since it requires hundreds of administrators to maintain hundreds of special lists in MediaWiki space. This solution is simple and lightweight; it doesn't involve any extra columns, just a simple magic word to add to the disambiguation templates.

I would suggest adding a page_is_disambig column to the DB, as magic words do not work well for database queries: they are not stored in the DB, they just affect page rendering.

@daniel: Well it technically does work across wikis, but in a very hackish way. The existing solution is very painful.

@Betacommand: That's why I use a doubleunderscore magic word, not a regular magic word.

I'm not opposed to storing that it is a disambiguation page in the page properties, but it should be detected via the existing [[MediaWiki:Disambiguationspage]], not with a new magic word.

Since doubleunderscore magic words are especially magical (and have very little documentation), I guess I should explain how my patch actually works and what it does...

Unlike regular magic words, doubleunderscore magic words don't necessarily output anything. For example, you can put one on a line by itself and it won't affect the page rendering at all (even with the newline). The only thing they do by default is set a page property for any page that includes it. For example, when a category includes '__HIDDENCAT__', that just sets a page property on the category which can then be queried using Parser::getProperty(). It does this through the existing page_props table, so no new columns or tables are necessary. In fact, no schema change is needed at all. This is exactly the sort of use that the page_props table was intended for and exactly the sort of use that doubleunderscore magic words were intended for. There's no need to over-engineer this with new extensions, hooks, or schema changes.
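To make the mechanism concrete, here is a rough sketch of the idea, not the actual patch. It assumes SQLite in place of the real database, assumes the magic word is spelled `__DISAMBIG__`, and uses hypothetical helpers (`make_db`, `save_page`, `has_property`) in place of the parser. The `page_props` columns (`pp_page`, `pp_propname`, `pp_value`) follow the MediaWiki schema. Transclusion through templates is not modelled here.

```python
import sqlite3

MAGIC_WORD = "__DISAMBIG__"  # assumed spelling of the doubleunderscore word


def make_db():
    """In-memory stand-in for the page_props table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_props ("
        "pp_page INTEGER, pp_propname TEXT, pp_value TEXT, "
        "PRIMARY KEY (pp_page, pp_propname))"
    )
    return conn


def save_page(conn, page_id, wikitext):
    """Toy 'parse on save': record or clear the property, strip the word."""
    conn.execute(
        "DELETE FROM page_props "
        "WHERE pp_page = ? AND pp_propname = 'disambiguation'",
        (page_id,),
    )
    if MAGIC_WORD in wikitext:
        conn.execute(
            "INSERT INTO page_props (pp_page, pp_propname, pp_value) "
            "VALUES (?, 'disambiguation', '')",
            (page_id,),
        )
    # The magic word itself produces no output in the rendered text.
    return wikitext.replace(MAGIC_WORD, "")


def has_property(conn, page_id, name):
    """Cheap single-row lookup, the analogue of an isDisambig() check."""
    row = conn.execute(
        "SELECT 1 FROM page_props WHERE pp_page = ? AND pp_propname = ?",
        (page_id, name),
    ).fetchone()
    return row is not None
```

The point of the sketch is that the flag lives in an existing key/value table keyed by page id, so reading it is a primary-key lookup and no column has to be added to the page table.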

@Platonides: Why? If we have a simple efficient way to detect them why would we want to use a complicated fragile method?

In case it's not obvious, the way this would work is that we would add the magic word into the disambiguation templates. Then we wouldn't have to keep track of all of these templates via the MediaWiki pages and we would be able to query the state with a single simple function call. I don't understand why this simple solution is so controversial (or why no one has bothered to implement it for 6 years). The existing solution is a terrible hack and I doubt if most wikis are even utilizing it.

I still think a new column in the page table can be very useful. It could then be used not only for this issue but for many other future features (removing disambigs from Special:Random, excluding them from article counts, and several other ideas just off the top of my head). If this is done just via a magic word, we lose a lot of the future features that could be built off this change.

@Betacommand: Page properties are stored in the database. Thus, any piece of MediaWiki code has access to the information. You don't have to query the database directly though, you can just use Parser::getProperty('disambiguation'). There is no reason we need to create a new column for this. Also, any solution that requires a schema change will be about 100x less likely to get deployed.

Sorry I meant ParserOutput::getProperty.

Strange, I could have sworn that we used a message to point to a category name, not to extract template links from.

Does the core handling for MediaWiki:Disambiguationspage also set this property? It should, right?

(In reply to comment #29)

> Does the core handling for MediaWiki:Disambiguationspage also set this
> property? It should, right?

[[MediaWiki:Disambiguationspage]] seems only to be queried on [[Special:Disambiguations]] where pages with the property disambiguation will probably only be duplicates. If you mean if code should be added that if on page save a page transcludes a template contained in [[MediaWiki:Disambiguationspage]] the page property disambiguation is set, hell, no :-). Just add the magic word to the templates, marvel at the nice and easily understandable and maintainable code (not only from the PHP, but also from the wiki side), and when everybody feels comfortable, get rid of [[MediaWiki:Disambiguationspage]].

I killed the gerrit change since apparently no one wants disambiguation code in core. I'll see about writing an extension instead.

(In reply to comment #31)

> I killed the gerrit change since apparently no one wants disambiguation code
> in core. I'll see about writing an extension instead.

Moving to MediaWiki extension to avoid closing it WONTFIX.

(In reply to comment #30)

> (In reply to comment #29)
> > Does the core handling for MediaWiki:Disambiguationspage also set this
> > property? It should, right?
>
> [[MediaWiki:Disambiguationspage]] seems only to be queried on
> [[Special:Disambiguations]] where pages with the property disambiguation will
> probably only be duplicates. If you mean if code should be added that if on
> page save a page transcludes a template contained in
> [[MediaWiki:Disambiguationspage]] the page property disambiguation is set,
> hell, no :-). Just add the magic word to the templates, marvel at the nice
> and easily understandable and maintainable code (not only from the PHP, but
> also from the wiki side), and when everybody feels comfortable, get rid of
> [[MediaWiki:Disambiguationspage]].

Whether or not the handling of MediaWiki:Disambiguationspage uses this property isn't a problem. We can't/shouldn't migrate internals fully to this property as it would require a re-parse of every page.

It could be useful to set the property on-parse, but that's for a later point in time.

However introducing this property is pointless if disambiguation-queries don't use it. If Special:Disambiguations exclusively uses MediaWiki:Disambiguationspage, then this property is just a meaningless property that happens to be named "disambiguation".

(In reply to comment #33)

> [...]
> Whether or not the handling of MediaWiki:Disambiguationspage uses this
> property isn't a problem. We can't/shouldn't migrate internals fully to this
> property as it would require a re-parse of every page.

No, it doesn't. The addition of the magic word to the template triggers the setting of the property in the pages that transclude the template (at least according to my tests). So the load isn't any different than that from other changes to the templates.

> [...]
> However introducing this property is pointless if disambiguation-queries
> don't use it. If Special:Disambiguations exclusively uses
> MediaWiki:Disambiguationspage, then this property is just a meaningless
> property that happens to be named "disambiguation".

I think its introduction is a fine example of the procedure we follow for schema changes as well: small steps that can be rolled back at any time. Even in the initial stage, the property is not pointless, as it can be queried far more easily by gadgets. For example, [[de:MediaWiki:Gadget-bkl-check.js]] doesn't use the "official" [[MediaWiki:Disambiguationspage]], but relies on the templates on the German Wikipedia adding a category "Begriffsklärung". With a property "disambiguation", this gadget could be used globally with a small change and no knowledge of the local category names.
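As a rough illustration of how a client could look the property up without knowing any local category or template names, the sketch below builds a request for the MediaWiki API's `prop=pageprops` module and filters a response. The endpoint URL, the page titles, and the sample response are made up for the example, and the response shape shown is an assumption about the default JSON format; nothing here is taken from the gadget itself.

```python
from urllib.parse import urlencode


def pageprops_query_url(api_base, titles):
    """Build an API URL asking for the 'disambiguation' page property."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "disambiguation",
        "titles": "|".join(titles),
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


def disambig_titles(response):
    """Return the titles whose pageprops include 'disambiguation'.

    Assumes the response maps page ids to page objects under
    response["query"]["pages"], with an optional "pageprops" dict.
    """
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]
        for page in pages.values()
        if "disambiguation" in page.get("pageprops", {})
    }
```

The same request works unchanged on any wiki that sets the property, which is the portability argument made above.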

I have rewritten Special:Disambiguations as Special:Disambiguator. The output is exactly the same, but it uses the page property rather than the complicated multistep queries. This new page will be checked in as soon as I can get a new gerrit project created. The old page will be retired as soon as everyone is happy with using the new page. Besides, once the information is easily available from the database, I don't know why anyone would want to scrape a special page instead.

Change 40349, a dependency for this bug, has been merged.

(In reply to comment #36)

> Change 40349, a dependency for this bug, has been merged.

For convenience: https://gerrit.wikimedia.org/r/#/c/40349/ (Creating new GetDoubleUnderscoreIDs hook)

(In reply to comment #35)

> I have rewritten Special:Disambiguations as Special:Disambiguator. The
> output is exactly the same, but it uses the page property rather than the
> complicated multistep queries. This new page will be checked in as soon
> as I can get a new gerrit project created. The old page will be retired
> as soon as everyone is happy with using the new page. Besides, once the
> information is easily available from the database, I don't know why
> anyone would want to scrape a special page instead.

This is https://gerrit.wikimedia.org/r/41043.

Isn't it fixed with Extension:Disambiguator ?

(In reply to comment #39)

> Isn't it fixed with Extension:Disambiguator ?

I think so and opened bug 50174 requesting it to be installed on WMF wikis.

The Disambiguator extension is installed on all Wikimedia wikis now. For instructions on how to use it, see https://www.mediawiki.org/wiki/Extension:Disambiguator#Usage.

FYI, specific issues and enhancements are now tracked in the Disambiguator component: https://bugzilla.wikimedia.org/buglist.cgi?quicksearch=%3Adisambiguator