Make validation of block html tags and attributes case insensitive #19207

glendaviesnz · 2019-12-17T23:14:25Z

Description

Fixes #18666 by lower casing html tags and attributes before comparison is done.

How has this been tested?

Have tested manually by modifying html tag and attribute casing in the block editor 'code editor' view and then switching back to 'visual editor' view. Have also updated the existing unit test to cover case differences

Types of changes

Adds lower casing of strings before comparison checks are done.

Checklist:

My code is tested.
My code follows the WordPress code style.
My code has proper inline documentation.

packages/blocks/src/api/test/validation.js

aduth · 2019-12-18T14:58:31Z

packages/blocks/src/api/validation/index.js

+ attribute[ 0 ] = attribute[ 0 ].toLowerCase();
+ return attribute;
+ } ) )
+ .map( fromPairs );


I am hyper-conscious to the performance impact we might have in these changes, considering that this validation runs over every set of attributes for the markup of every block in a (potentially very long) post.

A couple thoughts:

Can we flatten any of these Array#map into a single mapping function?

Could we defer an "expensive" case-insensitive comparison to occur only when an initial failure is detected (i.e. optimize a mismatch as the rare edge case)

good point, will take another look at it from the optimisation angle.

A proxy would be a nice way of deferring the case sensitivity check to only failure cases, eg.

const expectedAttributesProxy = new Proxy(expectedAttributes, { get( target, property ) { if ( target[ property ] ) { return target[ property ]; } const caseInsensitiveMatch = target[ Object.keys( target ).find( key => key.toLowerCase() === property.toLowerCase() ) ]; if ( caseInsensitiveMatch ) { return caseInsensitiveMatch; } return; }, getOwnPropertyDescriptor( target, property ) { if( target.hasOwnProperty( property ) ) { return { configurable: true, enumerable: true }; } // do a case insensitive match if ( Object.keys( target ).find( key => key.toLowerCase() === property.toLowerCase() ) ){ return { configurable: true, enumerable: true }; } } });

But it doesn't look like the getOwnPropertyDescriptor trap can be polyfilled in IE11. Will look for a more old school approach :-)

After a bit of a play around with this, running fromPairs within the lowercasing map made it slower, as did using a single reduce instead of the maps, but a good old fashioned for loop is quite a bit faster in this instance, eg. with a page with 20 paragraph blocks with class and style attributes:

Original case sensitive check - 0.02 milliseconds
Case insensitive check with extra maps (current PR) - 0.045 milliseconds
Single reducer instead of multiple maps - 0.07 milliseconds
Single for loop instead of maps - 0.005 milliseconds

So we can make a case insensitive check faster than the current case sensitive check if we optimise with a for loop for creating the attribute lookup instead of a functional approach, eg.

const expectedAttributes = {}; for (let i = 0; i < expected.length; i++) { expectedAttributes[ expected[ i ][ 0 ].toLowerCase() ] = expected[i][ 1 ]; } const actualAttributes = {}; for (let i = 0; i < expected.length; i++) { actualAttributes[ expected[ i ][ 0 ].toLowerCase() ] = expected[i][ 1 ]; }

What are your thoughts on this?

After a bit of a play around with this, running fromPairs within the lowercasing map made it slower,

By this, do you mean something like:

const [ actualAttributes, expectedAttributes ] = [ actual, expected ] .map( ( attributes ) => fromPairs( attributes.map( ( attribute ) => { attribute[ 0 ] = attribute[ 0 ].toLowerCase(); return attribute; } ) ) );

This sort of flattening was the first thought I had in looking at this. I can't see how that could possibly be slower than two separate Array#map.

but a good old fashioned for loop is quite a bit faster in this instance

It doesn't surprise me a whole lot, since for loops tend to be optimized quite well in browsers. The Array#map tends to be a bit more expressive, and I could imagine future browser optimizations to bring the performance characteristics closer together. I'm fine with either approach, to be honest. Optimizing using a for loop could be a nice way to offset the potential impact of these changes.

Could we defer an "expensive" case-insensitive comparison to occur only when an initial failure is detected (i.e. optimize a mismatch as the rare edge case)

By defer, my original thought was more on the lines of: Only do the case-insensitive mapping / comparison if we reach a point where the function would otherwise return false;. In other words, optimize for the majority case that the block is valid and that we don't need to do a case-insensitive matching.

If it turns out that this is too cumbersome to implement, I don't think we need to worry about it too much. But it could be a nice optimization if it turned out to be simple to implement.

This sort of flattening was the first thought I had in looking at this. I can't see how that could possibly be slower than two separate Array#map.

yeh, I was a bit mystified as to why it was slower, maybe I was just made some error with the timers as I was chopping and changing chunks of code a bit, I might give it another go out of interest.

I will also take another look at possible ways to defer the case sensitivity check as you mention.

I am away over the holiday period so probably won't get to it until sometime after 6 Jan. Anyone else is welcome to take over the PR and do some more work on this in the mean time.

I took over this and tried to optimize the checks by using a for loop and deferring the case-insensitive checks for the minority case that block is invalid: d57882d

It has been a bit cumbersome, but comments should help to understand the intention. I'm happy to simplify that if we prefer easier code to read/maintain over optimizations.

gwwar · 2019-12-19T22:37:31Z

Thanks, this tests well for me manually @glendaviesnz ✅The blocker here is seeing if this is performant enough for us to include the more "moderate" check.

Before	After

With the following markup:

<!-- wp:paragraph -->
<P>A P TAG</P>
<!-- /wp:paragraph -->

… for minority case

…hecking the case-insensitive behavior

aduth · 2019-12-20T13:45:57Z

packages/blocks/src/api/validation/index.js

+ for ( let i = 0; i < actual.length; i++ ) {
+ actualAttributes[ actual[ i ][ 0 ] ] = actual[ i ][ 1 ];
+ expectedAttributes[ expected[ i ][ 0 ].toLowerCase() ] = expected[ i ][ 1 ];
+ }

 for ( const name in actualAttributes ) {


I don't want to let this get carried away, but with the revised implementation, it seems like we could easily combine these two loops into a single loop, since they're both effectively iterating over the same set of actual attributes.

I don't want to let this get carried away, but with the revised implementation, it seems like we could easily combine these two loops into a single loop, since they're both effectively iterating over the same set of actual attributes.

On further review, this isn't possible, because the reason these objects are created separate from the subsequent loop is to support the case that attributes can occur in different orders. However, there's a middle-ground option here, in that only the "expected" comparison needs to be converted to an object. We can avoid creating an actualAttributes object, since we'll loop over the actual set anyways in the next loop.

See 57effe9 .

There's probably not much gained through this, but it's a decent simplification to avoid an unnecessary object.

For me, it's enough to offset a "cost" of doing the lower-casing in all code paths. As noted below, I have some concerns where trying to use it as a fallback could leave potential for unexpected behavior.

aduth · 2019-12-20T13:50:44Z

packages/blocks/src/api/validation/index.js

- if ( ! expectedAttributes.hasOwnProperty( name ) ) {
+ if (
+ ! expectedAttributes.hasOwnProperty( name ) &&
+ // Optimization: case-insensitive check deferred for the minority case that the block is invalid.


I realize this is part of the blocks module, but this utility was meant to be a very generic "HTML equivalence" utility, one which I have some hopes to pull out to a standalone module at some point. References to "block validity", at least within the generic bits (i.e. not isValidBlockContent, etc), would ideally be avoided.

aduth · 2019-12-20T14:20:31Z

packages/blocks/src/api/validation/index.js

 const actualValue = actualAttributes[ name ];
- const expectedValue = expectedAttributes[ name ];
+ const expectedValue = expectedAttributes[ name ] || expectedAttributes[ name.toLowerCase() ];


One thing that leaves me a bit uncomfortable with this is in whether one or these would be a falsey value, where if the expectedAttributes[ name ] value was explicitly an empty string, we should not want it to consider the fallback || here. In trying to come up with a unit test which would fail, I'm unable to. But for the purpose of keeping this maintainable, I'm thinking we should just simplify how we normalize the case. At the very least, I think we should not use a blanket "falsey" check here, but rather condition against expectedAttributes.hasOwnProperty.

Make validation of block html tags and attributes case insensitive

0792f4a

glendaviesnz requested a review from aduth December 17, 2019 23:14

glendaviesnz requested review from ellatrix and youknowriad as code owners December 17, 2019 23:14

glendaviesnz self-assigned this Dec 17, 2019

aduth reviewed Dec 18, 2019

View reviewed changes

mmtr and others added 3 commits December 20, 2019 12:01

Optimize case-insensitive checks with a for loop and deferring checks…

d57882d

… for minority case

Add test cases for isEqualTagAttributePairs and isEqualTokensOfType c…

255aee0

…hecking the case-insensitive behavior

Blocks: Validation: Avoid creation of actualAttributes object map

57effe9

aduth approved these changes Dec 20, 2019

View reviewed changes

mmtr merged commit ddc98cb into master Dec 20, 2019

mmtr deleted the update/block-validation-case-sensitivity branch December 20, 2019 15:23

gwwar mentioned this pull request Dec 20, 2019

Try adding more permissive block validation modes #19188

Closed

youknowriad added this to the Gutenberg 7.2 milestone Jan 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make validation of block html tags and attributes case insensitive #19207

Make validation of block html tags and attributes case insensitive #19207

glendaviesnz commented Dec 17, 2019 •

edited

aduth Dec 18, 2019

glendaviesnz Dec 18, 2019

glendaviesnz Dec 18, 2019 •

edited

glendaviesnz Dec 19, 2019 •

edited

aduth Dec 19, 2019

glendaviesnz Dec 19, 2019

mmtr Dec 20, 2019 •

edited

gwwar commented Dec 19, 2019

aduth Dec 20, 2019

aduth Dec 20, 2019

aduth Dec 20, 2019

aduth Dec 20, 2019 •

edited

Make validation of block html tags and attributes case insensitive #19207

Make validation of block html tags and attributes case insensitive #19207

Conversation

glendaviesnz commented Dec 17, 2019 • edited

Description

How has this been tested?

Types of changes

Checklist:

aduth Dec 18, 2019

Choose a reason for hiding this comment

glendaviesnz Dec 18, 2019

Choose a reason for hiding this comment

glendaviesnz Dec 18, 2019 • edited

Choose a reason for hiding this comment

glendaviesnz Dec 19, 2019 • edited

Choose a reason for hiding this comment

aduth Dec 19, 2019

Choose a reason for hiding this comment

glendaviesnz Dec 19, 2019

Choose a reason for hiding this comment

mmtr Dec 20, 2019 • edited

Choose a reason for hiding this comment

gwwar commented Dec 19, 2019

aduth Dec 20, 2019

Choose a reason for hiding this comment

aduth Dec 20, 2019

Choose a reason for hiding this comment

aduth Dec 20, 2019

Choose a reason for hiding this comment

aduth Dec 20, 2019 • edited

Choose a reason for hiding this comment

glendaviesnz commented Dec 17, 2019 •

edited

glendaviesnz Dec 18, 2019 •

edited

glendaviesnz Dec 19, 2019 •

edited

mmtr Dec 20, 2019 •

edited

aduth Dec 20, 2019 •

edited