Page MenuHomePhabricator

Tighten up ParserOutput::setPageProperty() to only allow string values; add ::setIndexedPageProperty() for numeric values
Open, Needs TriagePublic

Description

See the discussion in T301915#7821799: although ParserOutput will properly store any JSON-serializable value, the deferred LinksUpdate task will stringify everything before storing it in the database. Further, the sort key will be null unless the value is int|float|bool.

This is a dangerous pattern, as many implicit conversions in PHP will convert between numeric strings (strings which "look like" numbers) and int|float. For example, storing a numeric string as an array_key will convert it to a number. It is too easy to either (a) add bogus sort keys to things like title strings that just sometimes happen to be numbers (and which were stored as array keys and thus numeric-ified), or (b) store a numeric string instead of an actual number and thus get the sort key dropped (see T350224).

It would be best to split this into two methods:

  • ::setIndexedPageProperty() which only takes int|float and coerces its values to numbers --- at least until PHP 8 when we can use the union type as a type hint
    • This can also allow specification of sort key independent from value, as hinted by a note in PagePropsTable (Txxxx)
  • ::setUnindexedPageProperty() which only takes string and ensures the values won't be inadvertently indexed

Details

Subject Repo Branch Lines +/-
mediawiki/vendor master +279 -121
mediawiki/extensions/ProofreadPage master +2 -2
mediawiki/services/parsoid REL1_42 +136 -20
mediawiki/services/parsoid master +136 -20
mediawiki/services/parsoid master +111 -20
mediawiki/core REL1_42 +5 -3
mediawiki/core REL1_42 +4 -3
mediawiki/core master +5 -3
mediawiki/core master +4 -3
mediawiki/core REL1_42 +174 -48
mediawiki/core master +174 -48
mediawiki/core master +15 -2
mediawiki/extensions/Wikibase master +2 -2
mediawiki/core master +29 -15
mediawiki/extensions/Wikibase master +34 -33
mediawiki/core master +24 -5
mediawiki/extensions/WikibaseLexeme master +6 -6
mediawiki/extensions/Wikibase master +1 -1
mediawiki/extensions/PagedTiffHandler master +4 -4
mediawiki/extensions/LiquidThreads master +3 -3
mediawiki/extensions/WikidataPageBanner master +2 -2
mediawiki/extensions/Score master +3 -2
mediawiki/extensions/JsonConfig master +2 -2
mediawiki/extensions/Kartographer master +2 -2
mediawiki/extensions/Cite master +2 -2
mediawiki/extensions/ProofreadPage master +4 -1
mediawiki/core master +14 -7
Show related patches Customize query in gerrit

Event Timeline

Change 775881 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] [doc only] Improve the documentation re non-string page property values

https://gerrit.wikimedia.org/r/775881

Change 775881 merged by jenkins-bot:

[mediawiki/core@master] [doc only] Improve the documentation re non-string page property values

https://gerrit.wikimedia.org/r/775881

Change 775901 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Wikibase@master] WIP: Only store strings as page property values

https://gerrit.wikimedia.org/r/775901

Change 775902 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Cite@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775902

Change 775903 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/JsonConfig@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775903

Change 775905 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Kartographer@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775905

Change 775882 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] WIP: Deprecate the use of non-string page property values

https://gerrit.wikimedia.org/r/775882

Change 775907 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/LiquidThreads@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775907

Change 775908 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/PagedTiffHandler@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775908

Change 775909 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/ProofreadPage@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775909

Change 775911 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Score@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775911

Change 775915 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/WikidataPageBanner@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775915

Change 775909 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775909

Change 776016 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/WikibaseLexeme@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/776016

Change 776019 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Wikibase@master] Ensure page properties are cast to string before comparison

https://gerrit.wikimedia.org/r/776019

Change 775902 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775902

Change 775905 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775905

Change 775903 merged by jenkins-bot:

[mediawiki/extensions/JsonConfig@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775903

Change 775911 merged by jenkins-bot:

[mediawiki/extensions/Score@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775911

Change 775915 merged by jenkins-bot:

[mediawiki/extensions/WikidataPageBanner@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775915

Change 775907 merged by jenkins-bot:

[mediawiki/extensions/LiquidThreads@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775907

Change 775908 merged by jenkins-bot:

[mediawiki/extensions/PagedTiffHandler@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/775908

Change 776019 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Ensure page properties are cast to string before comparison

https://gerrit.wikimedia.org/r/776019

Change 776016 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Page properties should always be strings

https://gerrit.wikimedia.org/r/776016

This change makes no sense to me. The database table is specifically optimized for numerical values with pp_sortkey – why should we cast properties that semantically represent numbers and will be used as numbers into strings?

cscott renamed this task from Tighten up ParserOutput::setPageProperty() to only allow string values to Tighten up ParserOutput::setPageProperty() to only allow string values; add ::setIndexedPageProperty() for numeric values.Feb 16 2024, 2:54 PM
cscott updated the task description. (Show Details)

Change 1004146 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] ParserOutput::setPageProperty(): Update documentation

https://gerrit.wikimedia.org/r/1004146

Change 1004147 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] ParserOutput::setPageProperty(): Emit deprecation warning for non-scalar values

https://gerrit.wikimedia.org/r/1004147

Change 1004148 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties

https://gerrit.wikimedia.org/r/1004148

Change 775882 abandoned by C. Scott Ananian:

[mediawiki/core@master] Deprecate the use of non-string page property values

Reason:

Abandoned in favor of I8a39a7c90341dfee932aa819c9a0a637a8782f69; see updated T305158.

https://gerrit.wikimedia.org/r/775882

Change 1004195 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Wikibase@master] [GlobeCoordinateKartographerDataUpdaterTest] Use valid page property values

https://gerrit.wikimedia.org/r/1004195

Change 775901 abandoned by C. Scott Ananian:

[mediawiki/extensions/Wikibase@master] Page properties should always be strings

Reason:

Abandoned in favor of Ib36787d04c0ca713587dc8b814ca1c5a827f6f72

https://gerrit.wikimedia.org/r/775901

Change 1004146 merged by jenkins-bot:

[mediawiki/core@master] ParserOutput::setPageProperty(): Update documentation

https://gerrit.wikimedia.org/r/1004146

Change 1004195 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] [GlobeCoordinateKartographerDataUpdaterTest] Use valid page property values

https://gerrit.wikimedia.org/r/1004195

Change 1004147 merged by jenkins-bot:

[mediawiki/core@master] ParserOutput::setPageProperty(): Emit deprecation warning for non-scalar values

https://gerrit.wikimedia.org/r/1004147

Change #1004148 merged by jenkins-bot:

[mediawiki/core@master] Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties

https://gerrit.wikimedia.org/r/1004148

Change #1018798 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] [CMC] Add ::setUnindexedPageProperty and ::setIndexedPageProperty

https://gerrit.wikimedia.org/r/1018798

Change #1018970 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_42] Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties

https://gerrit.wikimedia.org/r/1018970

Change #1018970 abandoned by C. Scott Ananian:

[mediawiki/core@REL1_42] Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties

Reason:

Botched the cherry-pick

https://gerrit.wikimedia.org/r/1018970

Change #1018970 restored by C. Scott Ananian:

[mediawiki/core@REL1_42] Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties

https://gerrit.wikimedia.org/r/1018970

Change #1018970 merged by jenkins-bot:

[mediawiki/core@REL1_42] Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties

https://gerrit.wikimedia.org/r/1018970

Change #1019071 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] ParserOutput::setUnindexedPageProperty(): use empty string as default value

https://gerrit.wikimedia.org/r/1019071

Change #1019258 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] ParserOutput: clarify that “indexed” refers to value

https://gerrit.wikimedia.org/r/1019258

Change #1019179 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_42] ParserOutput::setUnindexedPageProperty(): use empty string as default value

https://gerrit.wikimedia.org/r/1019179

Change #1019300 had a related patch set uploaded (by C. Scott Ananian; author: Lucas Werkmeister (WMDE)):

[mediawiki/core@REL1_42] ParserOutput: clarify that “indexed” refers to value

https://gerrit.wikimedia.org/r/1019300

Change #1019071 merged by jenkins-bot:

[mediawiki/core@master] ParserOutput::setUnindexedPageProperty(): use empty string as default value

https://gerrit.wikimedia.org/r/1019071

Change #1019258 merged by jenkins-bot:

[mediawiki/core@master] ParserOutput: clarify that “indexed” refers to value

https://gerrit.wikimedia.org/r/1019258

Change #1019360 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] [CMC] Add ::setUnsortedPageProperty and ::setSortedPageProperty

https://gerrit.wikimedia.org/r/1019360

Change #1019179 merged by jenkins-bot:

[mediawiki/core@REL1_42] ParserOutput::setUnindexedPageProperty(): use empty string as default value

https://gerrit.wikimedia.org/r/1019179

Change #1019300 merged by jenkins-bot:

[mediawiki/core@REL1_42] ParserOutput: clarify that “indexed” refers to value

https://gerrit.wikimedia.org/r/1019300

Change #1018798 abandoned by C. Scott Ananian:

[mediawiki/services/parsoid@master] [CMC] Add ::setUnindexedPageProperty and ::setIndexedPageProperty

Reason:

Abandoned in favor of Ied9948e10e49113dc0082726b16759e0676444f8

https://gerrit.wikimedia.org/r/1018798

Change #1019360 merged by jenkins-bot:

[mediawiki/services/parsoid@master] [CMC] Add ::setUnsortedPageProperty and ::setNumericPageProperty

https://gerrit.wikimedia.org/r/1019360

Change #1020340 had a related patch set uploaded (by Arlolra; author: C. Scott Ananian):

[mediawiki/services/parsoid@REL1_42] [CMC] Add ::setUnsortedPageProperty and ::setNumericPageProperty

https://gerrit.wikimedia.org/r/1020340

Change #1020340 merged by jenkins-bot:

[mediawiki/services/parsoid@REL1_42] [CMC] Add ::setUnsortedPageProperty and ::setNumericPageProperty

https://gerrit.wikimedia.org/r/1020340

Change #1004193 had a related patch set uploaded (by Subramanya Sastry; author: C. Scott Ananian):

[mediawiki/extensions/ProofreadPage@master] Ensure pages are indexed by quality level

https://gerrit.wikimedia.org/r/1004193

Change #1004193 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Ensure pages are indexed by quality level

https://gerrit.wikimedia.org/r/1004193

Change #1022258 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a2

https://gerrit.wikimedia.org/r/1022258

Change #1022258 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a2

https://gerrit.wikimedia.org/r/1022258