Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make validation of block html tags and attributes case insensitive #19207

Merged
merged 4 commits into from Dec 20, 2019

Conversation

glendaviesnz
Copy link
Contributor

@glendaviesnz glendaviesnz commented Dec 17, 2019

Description

Fixes #18666 by lower casing html tags and attributes before comparison is done.

How has this been tested?

Have tested manually by modifying html tag and attribute casing in the block editor 'code editor' view and then switching back to 'visual editor' view. Have also updated the existing unit test to cover case differences

Types of changes

Adds lower casing of strings before comparison checks are done.

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code has proper inline documentation.

@glendaviesnz glendaviesnz added [Feature] Block API API that allows to express the block paradigm. [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Feature] Block Validation/Deprecation Handling block validation to determine accuracy and deprecation labels Dec 17, 2019
@glendaviesnz glendaviesnz self-assigned this Dec 17, 2019
packages/blocks/src/api/test/validation.js Show resolved Hide resolved
attribute[ 0 ] = attribute[ 0 ].toLowerCase();
return attribute;
} ) )
.map( fromPairs );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hyper-conscious to the performance impact we might have in these changes, considering that this validation runs over every set of attributes for the markup of every block in a (potentially very long) post.

A couple thoughts:

  • Can we flatten any of these Array#map into a single mapping function?
  • Could we defer an "expensive" case-insensitive comparison to occur only when an initial failure is detected (i.e. optimize a mismatch as the rare edge case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will take another look at it from the optimisation angle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A proxy would be a nice way of deferring the case sensitivity check to only failure cases, eg.

const expectedAttributesProxy = new Proxy(expectedAttributes, {
	get( target, property ) {
		if ( target[ property ] ) {
			return target[ property ];
		}
		const caseInsensitiveMatch = target[ Object.keys( target ).find( key => key.toLowerCase() === property.toLowerCase() ) ];
		if ( caseInsensitiveMatch ) {
			return caseInsensitiveMatch;
		}
		return;
	},
	getOwnPropertyDescriptor( target, property ) {
		if( target.hasOwnProperty( property ) ) {
			return { configurable: true, enumerable: true };
		}
                 // do a case insensitive match
		if ( Object.keys( target ).find( key => key.toLowerCase() === property.toLowerCase() ) ){
			return { configurable: true, enumerable: true };
		}
	}
});

But it doesn't look like the getOwnPropertyDescriptor trap can be polyfilled in IE11. Will look for a more old school approach :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a bit of a play around with this, running fromPairs within the lowercasing map made it slower, as did using a single reduce instead of the maps, but a good old fashioned for loop is quite a bit faster in this instance, eg. with a page with 20 paragraph blocks with class and style attributes:

Original case sensitive check - 0.02 milliseconds
Case insensitive check with extra maps (current PR) - 0.045 milliseconds
Single reducer instead of multiple maps - 0.07 milliseconds
Single for loop instead of maps - 0.005 milliseconds

So we can make a case insensitive check faster than the current case sensitive check if we optimise with a for loop for creating the attribute lookup instead of a functional approach, eg.

const expectedAttributes = {};
for (let i = 0; i < expected.length; i++) {
	expectedAttributes[ expected[ i ][ 0 ].toLowerCase() ] = expected[i][ 1 ];
}
const actualAttributes = {};
for (let i = 0; i < expected.length; i++) {
	actualAttributes[ expected[ i ][ 0 ].toLowerCase() ] = expected[i][ 1 ];
}

What are your thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a bit of a play around with this, running fromPairs within the lowercasing map made it slower,

By this, do you mean something like:

const [ actualAttributes, expectedAttributes ] = [ actual, expected ]
	.map( ( attributes ) => fromPairs( attributes.map( ( attribute ) => {
		attribute[ 0 ] = attribute[ 0 ].toLowerCase();
		return attribute;
	} ) ) );

This sort of flattening was the first thought I had in looking at this. I can't see how that could possibly be slower than two separate Array#map.

but a good old fashioned for loop is quite a bit faster in this instance

It doesn't surprise me a whole lot, since for loops tend to be optimized quite well in browsers. The Array#map tends to be a bit more expressive, and I could imagine future browser optimizations to bring the performance characteristics closer together. I'm fine with either approach, to be honest. Optimizing using a for loop could be a nice way to offset the potential impact of these changes.

Could we defer an "expensive" case-insensitive comparison to occur only when an initial failure is detected (i.e. optimize a mismatch as the rare edge case)

By defer, my original thought was more on the lines of: Only do the case-insensitive mapping / comparison if we reach a point where the function would otherwise return false;. In other words, optimize for the majority case that the block is valid and that we don't need to do a case-insensitive matching.

If it turns out that this is too cumbersome to implement, I don't think we need to worry about it too much. But it could be a nice optimization if it turned out to be simple to implement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of flattening was the first thought I had in looking at this. I can't see how that could possibly be slower than two separate Array#map.

yeh, I was a bit mystified as to why it was slower, maybe I was just made some error with the timers as I was chopping and changing chunks of code a bit, I might give it another go out of interest.

I will also take another look at possible ways to defer the case sensitivity check as you mention.

I am away over the holiday period so probably won't get to it until sometime after 6 Jan. Anyone else is welcome to take over the PR and do some more work on this in the mean time.

Copy link
Contributor

@mmtr mmtr Dec 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took over this and tried to optimize the checks by using a for loop and deferring the case-insensitive checks for the minority case that block is invalid: d57882d

It has been a bit cumbersome, but comments should help to understand the intention. I'm happy to simplify that if we prefer easier code to read/maintain over optimizations.

@gwwar
Copy link
Contributor

Thanks, this tests well for me manually @glendaviesnz ✅The blocker here is seeing if this is performant enough for us to include the more "moderate" check.

Before After
Screen Shot 2019-12-19 at 2 26 12 PM Screen Shot 2019-12-19 at 2 33 43 PM

With the following markup:

<!-- wp:paragraph -->
<P>A P TAG</P>
<!-- /wp:paragraph -->

Comment on lines 414 to 419
for ( let i = 0; i < actual.length; i++ ) {
actualAttributes[ actual[ i ][ 0 ] ] = actual[ i ][ 1 ];
expectedAttributes[ expected[ i ][ 0 ].toLowerCase() ] = expected[ i ][ 1 ];
}

for ( const name in actualAttributes ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to let this get carried away, but with the revised implementation, it seems like we could easily combine these two loops into a single loop, since they're both effectively iterating over the same set of actual attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to let this get carried away, but with the revised implementation, it seems like we could easily combine these two loops into a single loop, since they're both effectively iterating over the same set of actual attributes.

On further review, this isn't possible, because the reason these objects are created separate from the subsequent loop is to support the case that attributes can occur in different orders. However, there's a middle-ground option here, in that only the "expected" comparison needs to be converted to an object. We can avoid creating an actualAttributes object, since we'll loop over the actual set anyways in the next loop.

See 57effe9 .

There's probably not much gained through this, but it's a decent simplification to avoid an unnecessary object.

For me, it's enough to offset a "cost" of doing the lower-casing in all code paths. As noted below, I have some concerns where trying to use it as a fallback could leave potential for unexpected behavior.

if ( ! expectedAttributes.hasOwnProperty( name ) ) {
if (
! expectedAttributes.hasOwnProperty( name ) &&
// Optimization: case-insensitive check deferred for the minority case that the block is invalid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is part of the blocks module, but this utility was meant to be a very generic "HTML equivalence" utility, one which I have some hopes to pull out to a standalone module at some point. References to "block validity", at least within the generic bits (i.e. not isValidBlockContent, etc), would ideally be avoided.

const actualValue = actualAttributes[ name ];
const expectedValue = expectedAttributes[ name ];
const expectedValue = expectedAttributes[ name ] || expectedAttributes[ name.toLowerCase() ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that leaves me a bit uncomfortable with this is in whether one or these would be a falsey value, where if the expectedAttributes[ name ] value was explicitly an empty string, we should not want it to consider the fallback || here. In trying to come up with a unit test which would fail, I'm unable to. But for the purpose of keeping this maintainable, I'm thinking we should just simplify how we normalize the case. At the very least, I think we should not use a blanket "falsey" check here, but rather condition against expectedAttributes.hasOwnProperty.

@mmtr mmtr merged commit ddc98cb into master Dec 20, 2019
@mmtr mmtr deleted the update/block-validation-case-sensitivity branch December 20, 2019 15:23
@youknowriad youknowriad added this to the Gutenberg 7.2 milestone Jan 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Block API API that allows to express the block paradigm. [Feature] Block Validation/Deprecation Handling block validation to determine accuracy and deprecation [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Block API: Validation should be case insensitive
5 participants