Page MenuHomePhabricator

Language converter: unsafe attribute injection via glossary rules (CVE-2017-8815)
Closed, ResolvedPublic

Assigned To
Authored By
Bawolff
Nov 20 2015, 1:14 AM
Referenced Files
F10769778: T119158-REL1_27.patch
Nov 13 2017, 8:27 PM
F10769788: T119158-master.patch
Nov 13 2017, 8:27 PM
F10769782: T119158-REL1_29.patch
Nov 13 2017, 8:27 PM
F10769787: T119158-REL1_30.patch
Nov 13 2017, 8:27 PM
F10769779: T119158-REL1_28.patch
Nov 13 2017, 8:27 PM
F4307602: T119158-REL1_23.patch
Aug 1 2016, 12:08 PM
F4307607: T119158-part2-REL1_23.patch
Jul 26 2016, 1:45 AM
F4307600: T119158-REL1_27.patch
Jul 26 2016, 1:45 AM
Tokens
"Cup of Joe" token, awarded by RandomDSdevel.

Description

While looking at the patch for T97157, I realised that language converter can be tricked in even more ways, by adding conversion rule delimiters.

*sigh*

Basically, -{}- breaks up the string it runs the regex on, so if you insert it in an attribute, then language converter no longer realizes its in the middle of a bunch of html.

Some examples:

-{H|abc=>sr-el:" onfocus="alert(1)" onload="alert(2)" data-foo="}-

{{special:Contributions|target=-{}-abc-{}-}}

[[File:"8th Anniversary Celebration" - NARA - 285576.tif|100px|alt=-{n}-abc-{}-]]

(This assumes your language code is set to 'sr', you are viewing the sr-el variant, and you have an image on your wiki with that name).

Interestingly, Sanitizer::safeEncodeAttribute already escapes all the neccesary stuff. So ordinary tags aren't affected.

See also T118755

Patches

  • Master (As of July 26, 2016): ,
  • REL1_27: ,
  • REL1_26: ,
  • REL1_23: ,

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

If we're going to use the more complex regex here, we may want to consider using it in autoConvert() too -

(This patch is totally separate from the other one on this bug. Technically its more for T73394: XSS in language converter when used with Html class's tricky escaping then this bug. Note also that this patch is based on top of the patch for T124404: language converter can be tricked into replacing text inside tags by adding a lot of junk after the rule definition (CVE-2017-8814)). I tested this patch also with the obama article, and noticed no performance change.

Actually i just discovered that we put raw ">" in the id attributes for headlines (similar to what we used to do with HTML::element). Thus that patch is actually needed for security (unless we want to continue playing whackamole about escaping >)

try #2:

I believe this fixes the (original) issue, and should be safe to deploy.

I was trying to add a parser test for this specific case, and can't seem to get either of your original test cases to run correctly in the parser tests (<poem> works fine though). @Bawolff, would you be able to get a parser test added?

try #2:

I believe this fixes the (original) issue, and should be safe to deploy.

I was trying to add a parser test for this specific case, and can't seem to get either of your original test cases to run correctly in the parser tests (<poem> works fine though). @Bawolff, would you be able to get a parser test added?

This adds a parser test. Locally the test fails before the patch, and passes afterwards [Using the parserTest.php runner]. Sometimes I seem to have issues with tests behaving locally differently for me then other people when images are involved, but I don't think I've had that recently

Note, this will break about 1600 links on zhwiki that are relying on the old unsafe behaviour:

MariaDB [zhwiki_p]> select count(*) from externallinks where  el_to like '%-\%7B%\%7D-%' limit 4;
+----------+
| count(*) |
+----------+
|     1674 |
+----------+
1 row in set (3.67 sec)

This should be considered an estimate. May have false positives

List of affected pages and links https://tools.wmflabs.org/bawolff/zhwiki-langconv.txt

Note: this is out of 5136008, so a very small percentage. If you only count main namespace, its only 1292 links.

@liangent @cscott So this is going to break those links. Any advice for how best to communicate this change to the zhwiki community?

I'm going to help you on that task. I have contacts with zh.wp which can help us.

You say that change will break about 1600 links on zhwiki. Is there a technical way to modify these links locally before you apply that patch?

That task is triaged as high. When do you plan to release that patch? Have a short notice will not allow people to correct links.

Let's warn others wikis too, just with messages on VPs: is the 'gan', 'iu', 'kk', 'ku', 'shi', 'sr', 'tg' and 'uz' list exhaustive? How many pages will be broken on these wikis? If there is a wiki where thousand of links may be broken, we may consider to focus on it.

I'm going to help you on that task. I have contacts with zh.wp which can help us.

You say that change will break about 1600 links on zhwiki. Is there a technical way to modify these links locally before you apply that patch?

That task is triaged as high. When do you plan to release that patch? Have a short notice will not allow people to correct links.

We use prority field to mean severity in the security component. So high just means this is a bad issue (XSS-could lead to taking over other users accounts, etc), not that it will be neccesarily be fixed super soon.

As far as timeline goes. Ideal timeline would be like monday, as then we could get it in 1.27.1 release. However thats probably totally unreasonable. So if we miss the 1.27.1 release we have a lot of time. I would definitely like to see this deployed at the very latest 3 weeks before 1.27.2. 1.27.2 is currently scheduled for first wednessday of october, so definitely by early/mid september. However beyond that we arent super constrained by time. That said, once we start publically talking about the issue, we would want to deploy the fix relatively quickly lest people figure out why we are making the change.

Let's warn others wikis too, just with messages on VPs: is the 'gan', 'iu', 'kk', 'ku', 'shi', 'sr', 'tg' and 'uz' list exhaustive? How many pages will be broken on these wikis? If there is a wiki where thousand of links may be broken, we may consider to focus on it.

Yes, the list is exhaustive (if you add zh) This is language codes, not projects so you've also got wikinews/source/quote etc.

There is a possibility this feature could interact with pages translated to one of the above mentioned languages by the translate extension on a multilingual wiki

Ill run stats on those wikis to see how many links we are dealing with, and get back to you

You say that change will break about 1600 links on zhwiki. Is there a technical way to modify these links locally before you apply that patch?

In theory someone could write a bot to do it. I dont have all that much experiance with bots, so if we go that route, id prefer that someone not be me.

bawolff@tools-bastion-03:~$ for i in `egrep 'commons|meta|zh|gan|iu|kk|ku|shi|sr|tg|uz' all.dblist ` ; do echo "select count(*) "\'$i\'" from externallinks where  el_to like '%-\%7B%\%7D-%'" | sql $i | cat ; done
commonswiki
0
fiu_vrowiki
0
ganwiki
0
iuwiki
0
iuwiktionary
0
kkwiki
0
kkwikibooks
0
kkwikiquote
0
kkwiktionary
0
kuwiki
2
kuwikibooks
0
kuwikiquote
0
kuwiktionary
0
ltgwiki
0
metawiki
0
srnwiki
0
srwiki
2287
srwikibooks
0
srwikinews
0
srwikiquote
0
srwikisource
0
srwiktionary
0
tgwiki
5
tgwikibooks
0
tgwiktionary
0
uzwiki
8
uzwikibooks
0
uzwikiquote
0
uzwiktionary
0
zh_classicalwiki
0
zh_min_nanwiki
0
zh_min_nanwikibooks
0
zh_min_nanwikiquote
0
zh_min_nanwikisource
0
zh_min_nanwiktionary
0
zh_yuewiki
0
zhwiki
1662
zhwikibooks
0
zhwikinews
0
zhwikiquote
0
zhwikisource
0
zhwikivoyage
0
zhwiktionary
4

So looks like serbian is also an issue.

If you need to deploy the fix on Monday, we can leave emergency messages on sr.wp and zh.wp tomorrow. That is not the best option at all, but it is possible. I would prefer to have more time, and the mid-September window would be optimal.

If we decide to take the September option, we can leave a message 3 weeks before, and a new one every week. I can post it on bot assistance pages on these wikis (Serbian ; Chinese). Other wikis can also have a warning if we are on that option.

The message would contain:

  • LanguageConverter, used to switch from one language to an other, has a security issue. Add explanation about what is affected.
  • That change is only affecting a few percentage of pages (0,67% of pages on sr.wp ; 0,18% on zh.wp)
  • There is a possibility this feature could interact with pages translated to one of the above mentioned languages by the translate extension on a multilingual wiki
  • These broken links can be fixed by using a bot. We are not providing that bot assistance.

Questions:

  • Are pages affected because Templates are using that bad syntax?
  • Would it be possible to create a category tracking pages containing these errors? Or to have a list of pages to give to the communities? That would be very helpful and appreciated.
  • What will happen of the change is not done when you deploy that patch?
  • Is it possible to do the change before that patch is deployed? Is there an alternative to obtain the same result? If yes, we have to add that to the message.

If you need to deploy the fix on Monday, we can leave emergency messages on sr.wp and zh.wp tomorrow. That is not the best option at all, but it is possible. I would prefer to have more time, and the mid-September window would be optimal.

Sounds good. (To be clear, when I said september, I meant that's the longest period we can wait, we can also do it sooner than that if desired).

If we decide to take the September option, we can leave a message 3 weeks before, and a new one every week. I can post it on bot assistance pages on these wikis (Serbian ; Chinese). Other wikis can also have a warning if we are on that option.

One thing however, is that we want to avoid drawing attention to this being a "security" issue until after the patch is deployed (Its not very hard for someone to figure out what the issue is if we call it a security issue).

Perhaps we can use T87332 as cover (Links using this syntax do not show up properly on Special:LinkSearch)

The message would contain:

  • LanguageConverter, used to switch from one language to an other, has a security issue. Add explanation about what is affected.
  • That change is only affecting a few percentage of pages (0,67% of pages on sr.wp ; 0,18% on zh.wp)

Note, the previous stats I ran gave the number of links, not the number of pages affected (An affected page could have multiple links). I could re-run the stats for number of pages affected if we need.

  • There is a possibility this feature could interact with pages translated to one of the above mentioned languages by the translate extension on a multilingual wiki
  • These broken links can be fixed by using a bot. We are not providing that bot assistance.

Questions:

  • Are pages affected because Templates are using that bad syntax?

Not particularly, however {{GeoGroup}} is probably responsible for a third of the links on zhwiki. There is no significant cases from templates on srwiki

  • Would it be possible to create a category tracking pages containing these errors? Or to have a list of pages to give to the communities? That would be very helpful and appreciated.

We can get a list from the db. https://tools.wmflabs.org/bawolff/zhwiki-langconv.txt is an example for zh (We could probably prettify that for users).

I'd prefer not to do a tracking category (Tracking categories take a while to show up on pages, which would make it kind of ineffective)

  • What will happen of the change is not done when you deploy that patch?

The links will break. By break I mean they will go to a website that doesn't exist.

  • Is it possible to do the change before that patch is deployed? Is there an alternative to obtain the same result? If yes, we have to add that to the message.

The change required for the links can be done right now. That is

http://-{zh-cn:foo.com; zh-hk:bar.com; zh-tw:baz.com}-

Can be replaced with the desired

-{zh-cn: http://foo.com ; zh-hk: http://bar.com ; zh-tw:http://baz.com }-

right now and everything works fine (Note that the space before }- is really important)

One other thought I had - Once we make the announcement, we could potentially add this form of link to the spam blacklist, preventing anyone from adding new links of this type. This could help ensure that people won't try to add new links of this form in the pre-deployment warning period. I'm not sure if that's a good idea.

Based on your answers (thanks!), I think we should send a message to Bots handlers, and introduce that as a trivial change. We don't need to panic the whole community.

Key points are now:

  • LanguageConverter, used to switch from one language to an other, has a problem: links using LC do not show up properly on Special:LinkSearch. To fix that, we need to change how links are written (add the example)
  • That change is only affecting a few percentage of pages (0,67% of pages on sr.wp ; 0,18% on zh.wp). We have lists of concerned pages (to generate and attach).
  • If you don't make that change, links will break.
  • There is a possibility this feature could interact with pages translated to one of the above mentioned languages by the translate extension on a multilingual wiki
  • These broken links can be fixed by using a bot. We are not providing that bot assistance, but the change is trivial. Note: I think any bot handler can perform the change.
  • Please spread the word and update documentation, etc.

When you would have reviewed these key points and generated proper lists of pages where changes must be done, I would recommend to leave that message asap, and monitor changes (I'll cc you on these messages). I'll ping some users (at least on zh.wp) to help us to monitor that change too, as well as updating Help pages if needed. Then, let's wait for one week to apply the patch (let's say on 16th of August to hurry them a little bit).

One other thought I had - Once we make the announcement, we could potentially add this form of link to the spam blacklist, preventing anyone from adding new links of this type. This could help ensure that people won't try to add new links of this form in the pre-deployment warning period. I'm not sure if that's a good idea.

zh.wp has an AbuseFilter system, not sr.wp. AbuseFulter would be nicer than the brutal spam filter because it can provide an explanation.

The change required for the links can be done right now. That is

http://-{zh-cn:foo.com; zh-hk:bar.com; zh-tw:baz.com}-

Can be replaced with the desired

-{zh-cn: http://foo.com ; zh-hk: http://bar.com ; zh-tw:http://baz.com }-

right now and everything works fine (Note that the space before }- is really important)

To clarify, this is the only change that the bot would need to do?

The change required for the links can be done right now. That is

http://-{zh-cn:foo.com; zh-hk:bar.com; zh-tw:baz.com}-

Can be replaced with the desired

-{zh-cn: http://foo.com ; zh-hk: http://bar.com ; zh-tw:http://baz.com }-

right now and everything works fine (Note that the space before }- is really important)

To clarify, this is the only change that the bot would need to do?

Yep.

Re: Trizek: im going to be away all next week. I meant to generate the list, etc yesterday but didnt get to it. I probably wont be able to do it until week after next

Re: Trizek: im going to be away all next week. I meant to generate the list, etc yesterday but didnt get to it. I probably wont be able to do it until week after next

FYI I'll be OoO the week after you.

@Trizek-WMF Sorry, I kind of let this fall by the wayside. I would really like to get this going.

An Updated version of the lists of affected links are at https://tools.wmflabs.org/bawolff/langconv/ (Note, that these lists are statically generated, so they won't change as people update the wiki until I manually regenerate them.

Is there anything else you need from me?

(Note to self: SQL query used is: select page_namespace "ns", page_title "title", el_to "link" from externallinks inner join page on el_from = page_id where el_to like "%-\%7B%\%7D-%" order by page_namespace, page_title, el_to;

That's the message I plan to send to bots owners on the concerned wikis:


LanguageConverter (LC), used to switch from one language to an other, has a problem: links using LanguageConverter do not show up properly on Special:LinkSearch.

These broken links can be fixed manually or by using a bot. We are not providing that bot assistance, but the change is trivial:

Old syntax:
http://-{zh-cn:foo.com; zh-hk:bar.com; zh-tw:baz.com}-

New syntax:
-{zh-cn: http://foo.com ; zh-hk: http://bar.com ; zh-tw:http://baz.com }-

The space before }- is really important.

That change is only affecting a few percentage of pages. You can find the list for your wiki here: https://tools.wmflabs.org/bawolff/langconv/

If you don't make that change, links will break.

Please spread the word and update your local documentation! If you have any question, please ping me or @Bawolff.


Is there anything important that I'm missing?

OK, I'll post that tomorrow!

Announcements links to follow up:

Reading the task again, I've noticed wikis that were not included on these 6 ones:

  • commons
  • meta
  • gan
  • iu
  • kk
  • shi

I guess we have to warn them. Is that list exhaustive?

I plan to send them the following message:

Hello!

LanguageConverter (LC), used to switch from one language to an other, has a problem: links using LanguageConverter with a wrond wyntax do not show up properly on Special:LinkSearch.

If you want to use LC for an external link, a new syntax must be used:

Old syntax (examples for Chinese languages):
<code><nowiki>http://-{zh-cn:foo.com; zh-hk:bar.com; zh-tw:baz.com}-</nowiki></code>

New syntax:
<code><nowiki>-{zh-cn: http://foo.com ; zh-hk: http://bar.com ; zh-tw:http://baz.com }-</nowiki></code>

The space before <code><nowiki> }-</nowiki></code> is ''really'' important.

If you use the old syntax, links will break.

Please spread the word and update your local documentation! If you have any question, please ping me or [[user:Bawolff|Bawolff]].

Thanks, ~~~~

I dont think it is neccesary to warn those other wikis. They are the wikis with lang converter enabled but have 0 usages of this aspect of lang converter, so there is nothing to fix.

Perhaps the message should say to ping me as [[User:BWolff (WMF)]] since im doing this in a work not volunteer context.

Other than that, message sounds good. Thank you for doing this.

I dont think it is neccesary to warn those other wikis. They are the wikis with lang converter enabled but have 0 usages of this aspect of lang converter, so there is nothing to fix.

We should warn them of that change is they pnal to use it, shouldn't we?
If yes, is that list correct?

  • commons
  • meta
  • gan
  • iu
  • kk
  • shi

Perhaps the message should say to ping me as [[User:BWolff (WMF)]] since im doing this in a work not volunteer context.

I've checked your user profile attached to your account and I've seen that page where "I'm a WMF staff" (paraphrasing) is written. My bad :/

Other than that, message sounds good. Thank you for doing this.

De rien !

I have only followed this discussion tangentially.

This comment is not relevant to the work that you guys are doing, but, based on what I am reading in the message, T43716#469890 seems related. While we (Parsoid team back then) never got to doing that, looks like that strategy needs to be adapted to deal with this potential security scenario till such time lang variant code runs *after* the sanitizer.

I dont think it is neccesary to warn those other wikis. They are the wikis with lang converter enabled but have 0 usages of this aspect of lang converter, so there is nothing to fix.

We should warn them of that change is they pnal to use it, shouldn't we?
If yes, is that list correct?

  • commons
  • meta
  • gan
  • iu
  • kk
  • shi

Ok. I think we should skip commons and meta because language converter can only come into play on certain subpages using the translate extension, so its a really obscure edge case for those wikis. The rest of that list is correct.

Can't access to shi.wp. Which language were you targeting?

For Commons, I've posted the message on Templates' project. I haven't found an equivalent on Meta, so I've skipped it.

@Trizek-WMF when should i make the actual change?

@Trizek-WMF when should i make the actual change?

You have announced that change in Tech News, which will be delivered on Monday/Tuesday. When you want after Tuesday.

Communies had time to notice the change. I've checked the announcements, and no comment has been done. Have you noticed any change on the use of the incorrect syntax after the announcement?

@Trizek-WMF when should i make the actual change?

You have announced that change in Tech News, which will be delivered on Monday/Tuesday. When you want after Tuesday.

Communies had time to notice the change. I've checked the announcements, and no comment has been done. Have you noticed any change on the use of the incorrect syntax after the announcement?

I've been talking with someone from zh at WikiCon NA, he says the change is more complicated then it appears at first glance but they are working on it (So far about half of the links on zh have been fixed thus far)

Currently a lot of these links come from wikidata #property expansion:

http://example.com/?{{#property:P1402|from=Q776995}}

gives out

*************************** 1. row ***************************
   el_id: 14586147
 el_from: 548296
   el_to: http://example.com/?-%7Bzh:61803;zh-hans:61803;zh-hant:61803;zh-cn:61803;zh-hk:61803;zh-mo:61803;zh-sg:61803;zh-tw:61803;%7D-
el_index: http://com.example./?-%7Bzh:61803;zh-hans:61803;zh-hant:61803;zh-cn:61803;zh-hk:61803;zh-mo:61803;zh-sg:61803;zh-tw:61803;%7D-
1 row in set (0.01 sec)

These are difficult for us to workaround imo.

Also [-{zh-cn:http://example.com ;zh-tw:http://example.net ;}- 文字] doesn't work, thus requiring bots to convert the text part explicitly. It would be great if this could be avoided :)

This is a lot more complicated than I originally thought. Maybe I have to rethink the approach to fixing this bug.

This is a lot more complicated than I originally thought. Maybe I have to rethink the approach to fixing this bug.

Is it going to have an impact on communities?

Reedy lowered the priority of this task from High to Medium.Jul 10 2017, 9:25 PM
Bawolff raised the priority of this task from Medium to High.Jul 11 2017, 7:00 PM

So this kind of got forgotten about in the hustle and bustle combined with me just not knowing what to do about the wikidata thing. That's my bad (sorry).

So in addition to the wikidata thing, the other concerning thing was people using code like this:

<pre>-{}- This text here will now be converted</pre>

(and similar for <code>)

Now looking at zhwiki, it appears both the wikidata links (http://example.com/?{{#property:P1402|from=Q776995}}) and the translated blocks inside <pre>, are pretty rare. Are there other things that might break here?

Ok, I think we need to bite the bullet and just do this. I intend to make this change tonight.

Revised versions of patches (Note, I combined the part 1 & 2 thing into just one patch)

(as of nov 13, 2017)



So this kind of got forgotten about in the hustle and bustle combined with me just not knowing what to do about the wikidata thing. That's my bad (sorry).

So in addition to the wikidata thing, the other concerning thing was people using code like this:

<pre>-{}- This text here will now be converted</pre>

(and similar for <code>)

Actually, this patch probably won't affect the <pre> behvaiour.

[22:49]	bawolff	!log deployed patch T119158 (Will affect language converter. -{}- no longer allowed in link urls)
MoritzMuehlenhoff renamed this task from Language converter: unsafe attribute injection via glossary rules to Language converter: unsafe attribute injection via glossary rules (CVE-2017-8815).Nov 14 2017, 9:30 AM
Reedy changed the visibility from "Custom Policy" to "Public (No Login Required)".Nov 15 2017, 12:06 AM
Reedy changed the edit policy from "Custom Policy" to "All Users".

Change 391416 merged by Ejegg:
[mediawiki/core@fundraising/REL1_27] SECURITY: Handle -{}- syntax in attributes safely

https://gerrit.wikimedia.org/r/391416

Change 391455 had a related patch set uploaded (by Reedy; owner: Brian Wolff):
[mediawiki/core@master] SECURITY: Handle -{}- syntax in attributes safely

https://gerrit.wikimedia.org/r/391455

Change 391455 merged by Reedy:
[mediawiki/core@master] SECURITY: Handle -{}- syntax in attributes safely

https://gerrit.wikimedia.org/r/391455

Change 391738 had a related patch set uploaded (by Chad; owner: Brian Wolff):
[mediawiki/core@wmf/1.31.0-wmf.8] SECURITY: Handle -{}- syntax in attributes safely

https://gerrit.wikimedia.org/r/391738

@Bawolff

Just to make sure I parse the release notes and stuff correctly for Tech News: Is this a good description and will it be in production from wmf.10?

"Language converter syntax will no longer work inside external links. Wikitext like http://-{zh-cn:foo.com; zh-hk:bar.com; zh-tw:baz.com}- must be replaced. You will have to write -{zh-cn: http://foo.com ; zh-hk: http://bar.com ; zh-tw:http://baz.com }- instead. This only affects languages with Language Converter enabled. Examples of such languages are Chinese and Serbian."

Yes thats correct.

I sent out an email to wikitech-ambassadors as well https://lists.wikimedia.org/pipermail/wikitech-ambassadors/2017-November/001730.html but im hopeful that tech news will better reach these communities as its an issue that only affects a few non english languages.

e

Change 391738 abandoned by Chad:
SECURITY: Handle -{}- syntax in attributes safely

https://gerrit.wikimedia.org/r/391738

ssastry lowered the priority of this task from High to Low.Nov 30 2021, 6:01 PM

The only calls that involve the Parser object are:

->replaceVariables( $input )
->getTargetLanguage()->getDir();
->getOptions()->getUserLangObj();
->getOutput()->addModuleStyles( [
->getContentLanguage();
->recursiveTagParse( $this->mLabelText );
->getTargetLanguageConverter();

Of those, repalceVariables call is probably no longer relevant in the Parsoid context since all variables will be replaced when templates are expanded. Of the rest, the only one that is a potential issue is the call to fetch the language converted to lang-convert any language variant markup in extension options. That would need some Parsoid changes to expose it directly to extensions.

That said, the current Parsoid strategy of using the legacy parser to expand extensions works well with InputBox and I don't see it breaking immediately. As such, it is lower priority to migrate this extension over to use Parsoid natively. So, I am going to lower the priority of this task.