Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT: Optimize obj.GetType() != typeof(X) for sealed classes #32790

Merged
merged 7 commits into from Feb 27, 2020

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Feb 25, 2020

Motivation: #29093 (comment)

impIsClassExact now returns true for arrays of sealed classes, e.g. String[] (but returns false for arrays of other things including primitives and enums).

Also, optimizes obj.GetType() == typeof(X) to a const + nullcheck (can be eliminated later by Jit) if we can prove both types are sealed, e.g.:

public static bool Check<T>(T[] array)
{
    // co-variance check, e.g. see Span<T>'s ctor
    return array.GetType() != typeof(T[]);
}

Old codegen for Check<string>:

       mov      rax, 0xD1FFAB1E
       cmp      qword ptr [rcx], rax 
       setne    al
       movzx    rax, al
       ret

New codegen for Check<string>:

       cmp      dword ptr [rcx], ecx  ;; null-check for array
       xor      eax, eax
       ret

Jit-diff (-f -pmi):

(Correct me if I am wrong but the diff should be way bigger once jit learns how to inline generics into shared methods -- because Span<T> uses that check a lot)

Found 5 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -363 (-0.00% of base)
    diff is an improvement.

Top file improvements (bytes):
        -135 : System.Private.CoreLib.dasm (-0.00% of base)
        -129 : System.IO.FileSystem.Watcher.dasm (-0.84% of base)
         -75 : System.Net.Http.dasm (-0.01% of base)
         -24 : System.Text.RegularExpressions.dasm (-0.01% of base)

4 total files with Code Size differences (4 improved, 0 regressed), 158 unchanged.

Top method improvements (bytes):
         -87 (-55.41% of base) : System.Private.CoreLib.dasm - System.Text.UTF32Encoding:get_Preamble():System.ReadOnlySpan`1[Byte]:this
         -65 (-10.38% of base) : System.IO.FileSystem.Watcher.dasm - ImmutableStringList:Insert(int,System.String):this
         -64 (-10.83% of base) : System.IO.FileSystem.Watcher.dasm - ImmutableStringList:RemoveAt(int):this
         -48 (-3.48% of base) : System.Private.CoreLib.dasm - System.Threading.WaitHandle:WaitMultiple(System.ReadOnlySpan`1[[System.Threading.WaitHandle, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]],bool,int):int
         -25 (-2.34% of base) : System.Net.Http.dasm - System.Net.Http.Http2Connection:WriteHeaderCollection(System.Net.Http.Headers.HttpHeaders):this
         -25 (-1.43% of base) : System.Net.Http.dasm - System.Net.Http.Http3RequestStream:BufferHeaderCollection(System.Net.Http.Headers.HttpHeaders):this
         -25 (-5.94% of base) : System.Net.Http.dasm - System.Net.Http.Headers.HttpHeaders:GetValuesAsStrings(System.Net.Http.Headers.HeaderDescriptor,HeaderStoreItemInfo,System.Object):System.String[]
         -24 (-9.80% of base) : System.Text.RegularExpressions.dasm - System.Text.RegularExpressions.Regex:GetGroupNames():System.String[]:this

Top method improvements (percentages):
         -87 (-55.41% of base) : System.Private.CoreLib.dasm - System.Text.UTF32Encoding:get_Preamble():System.ReadOnlySpan`1[Byte]:this
         -64 (-10.83% of base) : System.IO.FileSystem.Watcher.dasm - ImmutableStringList:RemoveAt(int):this
         -65 (-10.38% of base) : System.IO.FileSystem.Watcher.dasm - ImmutableStringList:Insert(int,System.String):this
         -24 (-9.80% of base) : System.Text.RegularExpressions.dasm - System.Text.RegularExpressions.Regex:GetGroupNames():System.String[]:this
         -25 (-5.94% of base) : System.Net.Http.dasm - System.Net.Http.Headers.HttpHeaders:GetValuesAsStrings(System.Net.Http.Headers.HeaderDescriptor,HeaderStoreItemInfo,System.Object):System.String[]
         -48 (-3.48% of base) : System.Private.CoreLib.dasm - System.Threading.WaitHandle:WaitMultiple(System.ReadOnlySpan`1[[System.Threading.WaitHandle, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]],bool,int):int
         -25 (-2.34% of base) : System.Net.Http.dasm - System.Net.Http.Http2Connection:WriteHeaderCollection(System.Net.Http.Headers.HttpHeaders):this
         -25 (-1.43% of base) : System.Net.Http.dasm - System.Net.Http.Http3RequestStream:BufferHeaderCollection(System.Net.Http.Headers.HttpHeaders):this

8 total methods with Code Size differences (8 improved, 0 regressed), 305673 unchanged.

1 files had text diffs but no metric diffs.
System.Runtime.Serialization.Formatters.dasm had 66 diffs

Will add tests later.

cc @AndyAyersMS

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 25, 2020
Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this.

src/coreclr/src/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/gentree.cpp Outdated Show resolved Hide resolved
@@ -20742,7 +20756,7 @@ bool Compiler::impCanSkipCovariantStoreCheck(GenTree* value, GenTree* array)
}

// Check for T[] with T exact.
if (!impIsClassExact(arrayElementHandle))
if (!impIsClassExact(arrayElementHandle, false))
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I enable impIsClassExact for arrays by default this call will cause ~400bytes regression. Example: https://www.diffchecker.com/nKWlHaG7

This comment was marked as duplicate.

This comment was marked as duplicate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a size regression but quite likely a perf improvement -- the code no longer needs the covariant store check, it just needs a write barrier.

The store check helper also does bounds checking, which now must be done in the jitted code, and is what causes the size growth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all the "bad" diffs are like that, I'm ok with a small size regression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github acting weird.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok, I thought it introduced a new bound check. Will post the new diff later then.

@EgorBo
Copy link
Member Author

EgorBo commented Feb 25, 2020

@AndyAyersMS
New diff is:

Found 13 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: 22 (0.00% of base)
    diff is a regression.

Top file regressions (bytes):
         147 : System.Linq.Parallel.dasm (0.01% of base)
          55 : Microsoft.Diagnostics.FastSerialization.dasm (0.05% of base)
          26 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.00% of base)
          14 : System.Private.CoreLib.dasm (0.00% of base)
          13 : System.Private.Xml.dasm (0.00% of base)

Top file improvements (bytes):
        -129 : System.IO.FileSystem.Watcher.dasm (-0.84% of base)
         -75 : System.Net.Http.dasm (-0.01% of base)
         -24 : System.Text.RegularExpressions.dasm (-0.01% of base)
          -5 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)

9 total files with Code Size differences (4 improved, 5 regressed), 153 unchanged.

Top method regressions (bytes):
          74 (81.32% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.JaggedArray`1[Vector`1][System.Numerics.Vector`1[System.Single]]:Allocate(int,int):System.Numerics.Vector`1[System.Single][][]
          70 ( 3.43% of base) : System.Private.CoreLib.dasm - System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1[ReadOnlyMemory`1][System.ReadOnlyMemory`1[System.Char]]:Return(System.ReadOnlyMemory`1[System.Char][],bool):this (2 methods)
          35 ( 9.11% of base) : Microsoft.Diagnostics.FastSerialization.dasm - System.Collections.Generic.SegmentedList`1[Vector`1][System.Numerics.Vector`1[System.Single]]:EnsureCapacity(int):this
          32 ( 1.20% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.SortHelper`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:MergeSortCooperatively():this
          31 ( 2.82% of base) : System.Private.CoreLib.dasm - System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1[Vector`1][System.Numerics.Vector`1[System.Single]]:Return(System.Numerics.Vector`1[System.Single][],bool):this
          22 ( 5.13% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.SortHelper`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:QuickSortIndicesInPlace(System.Linq.Parallel.GrowingArray`1[Int64],System.Collections.Generic.List`1[Vector`1],ubyte):this
          20 ( 8.23% of base) : Microsoft.Diagnostics.FastSerialization.dasm - System.Collections.Generic.SegmentedList`1[Vector`1][System.Numerics.Vector`1[System.Single]]:.ctor(int,int):this
          19 (16.10% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.AsynchronousChannel`1[Vector`1][System.Numerics.Vector`1[System.Single]]:EnqueueChunk(System.Numerics.Vector`1[System.Single][]):this
          19 ( 4.59% of base) : System.Private.CoreLib.dasm - Bucket[Vector`1][System.Numerics.Vector`1[System.Single]]:Return(System.Numerics.Vector`1[System.Single][]):this
          17 (16.50% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC:EnsureBGCRevisitInfoAlloc():this
          17 (13.71% of base) : System.Private.CoreLib.dasm - LockedStack[Vector`1][System.Numerics.Vector`1[System.Single]]:TryPush(System.Numerics.Vector`1[System.Single][]):bool:this
          13 ( 0.75% of base) : System.Private.Xml.dasm - System.Xml.Xsl.Runtime.XmlQueryStaticData:.ctor(System.Byte[],System.Type[]):this
          12 ( 5.17% of base) : System.Private.CoreLib.dasm - PerCoreLockedStacks[Vector`1][System.Numerics.Vector`1[System.Single]]:TryPush(System.Numerics.Vector`1[System.Single][]):this
           9 ( 2.81% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceCodeAddresses:get_Item(int):Microsoft.Diagnostics.Tracing.Etlx.TraceCodeAddress:this

Top method improvements (bytes):
         -87 (-55.41% of base) : System.Private.CoreLib.dasm - System.Text.UTF32Encoding:get_Preamble():System.ReadOnlySpan`1[Byte]:this
         -65 (-10.38% of base) : System.IO.FileSystem.Watcher.dasm - ImmutableStringList:Insert(int,System.String):this
         -64 (-10.83% of base) : System.IO.FileSystem.Watcher.dasm - ImmutableStringList:RemoveAt(int):this
         -48 (-3.48% of base) : System.Private.CoreLib.dasm - System.Threading.WaitHandle:WaitMultiple(System.ReadOnlySpan`1[[System.Threading.WaitHandle, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]],bool,int):int
         -25 (-2.34% of base) : System.Net.Http.dasm - System.Net.Http.Http2Connection:WriteHeaderCollection(System.Net.Http.Headers.HttpHeaders):this
         -25 (-1.43% of base) : System.Net.Http.dasm - System.Net.Http.Http3RequestStream:BufferHeaderCollection(System.Net.Http.Headers.HttpHeaders):this
         -25 (-5.94% of base) : System.Net.Http.dasm - System.Net.Http.Headers.HttpHeaders:GetValuesAsStrings(System.Net.Http.Headers.HeaderDescriptor,HeaderStoreItemInfo,System.Object):System.String[]
         -24 (-9.80% of base) : System.Text.RegularExpressions.dasm - System.Text.RegularExpressions.Regex:GetGroupNames():System.String[]:this
          -5 (-0.23% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.BuiltInOperators:GetSimpleBuiltInOperators(int,Microsoft.CodeAnalysis.ArrayBuilder`1[BinaryOperatorSignature]):this

Top method regressions (percentages):
          74 (81.32% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.JaggedArray`1[Vector`1][System.Numerics.Vector`1[System.Single]]:Allocate(int,int):System.Numerics.Vector`1[System.Single][][]
          17 (16.50% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC:EnsureBGCRevisitInfoAlloc():this
          19 (16.10% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.AsynchronousChannel`1[Vector`1][System.Numerics.Vector`1[System.Single]]:EnqueueChunk(System.Numerics.Vector`1[System.Single][]):this
          17 (13.71% of base) : System.Private.CoreLib.dasm - LockedStack[Vector`1][System.Numerics.Vector`1[System.Single]]:TryPush(System.Numerics.Vector`1[System.Single][]):bool:this
          35 ( 9.11% of base) : Microsoft.Diagnostics.FastSerialization.dasm - System.Collections.Generic.SegmentedList`1[Vector`1][System.Numerics.Vector`1[System.Single]]:EnsureCapacity(int):this
          20 ( 8.23% of base) : Microsoft.Diagnostics.FastSerialization.dasm - System.Collections.Generic.SegmentedList`1[Vector`1][System.Numerics.Vector`1[System.Single]]:.ctor(int,int):this
          12 ( 5.17% of base) : System.Private.CoreLib.dasm - PerCoreLockedStacks[Vector`1][System.Numerics.Vector`1[System.Single]]:TryPush(System.Numerics.Vector`1[System.Single][]):this
          22 ( 5.13% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.SortHelper`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:QuickSortIndicesInPlace(System.Linq.Parallel.GrowingArray`1[Int64],System.Collections.Generic.List`1[Vector`1],ubyte):this
          19 ( 4.59% of base) : System.Private.CoreLib.dasm - Bucket[Vector`1][System.Numerics.Vector`1[System.Single]]:Return(System.Numerics.Vector`1[System.Single][]):this
          70 ( 3.43% of base) : System.Private.CoreLib.dasm - System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1[ReadOnlyMemory`1][System.ReadOnlyMemory`1[System.Char]]:Return(System.ReadOnlyMemory`1[System.Char][],bool):this (2 methods)
          31 ( 2.82% of base) : System.Private.CoreLib.dasm - System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1[Vector`1][System.Numerics.Vector`1[System.Single]]:Return(System.Numerics.Vector`1[System.Single][],bool):this
           9 ( 2.81% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceCodeAddresses:get_Item(int):Microsoft.Diagnostics.Tracing.Etlx.TraceCodeAddress:this
          32 ( 1.20% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.SortHelper`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:MergeSortCooperatively():this
          13 ( 0.75% of base) : System.Private.Xml.dasm - System.Xml.Xsl.Runtime.XmlQueryStaticData:.ctor(System.Byte[],System.Type[]):this

Top method improvements (percentages):
         -87 (-55.41% of base) : System.Private.CoreLib.dasm - System.Text.UTF32Encoding:get_Preamble():System.ReadOnlySpan`1[Byte]:this
         -64 (-10.83% of base) : System.IO.FileSystem.Watcher.dasm - ImmutableStringList:RemoveAt(int):this
         -65 (-10.38% of base) : System.IO.FileSystem.Watcher.dasm - ImmutableStringList:Insert(int,System.String):this
         -24 (-9.80% of base) : System.Text.RegularExpressions.dasm - System.Text.RegularExpressions.Regex:GetGroupNames():System.String[]:this
         -25 (-5.94% of base) : System.Net.Http.dasm - System.Net.Http.Headers.HttpHeaders:GetValuesAsStrings(System.Net.Http.Headers.HeaderDescriptor,HeaderStoreItemInfo,System.Object):System.String[]
         -48 (-3.48% of base) : System.Private.CoreLib.dasm - System.Threading.WaitHandle:WaitMultiple(System.ReadOnlySpan`1[[System.Threading.WaitHandle, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]],bool,int):int
         -25 (-2.34% of base) : System.Net.Http.dasm - System.Net.Http.Http2Connection:WriteHeaderCollection(System.Net.Http.Headers.HttpHeaders):this
         -25 (-1.43% of base) : System.Net.Http.dasm - System.Net.Http.Http3RequestStream:BufferHeaderCollection(System.Net.Http.Headers.HttpHeaders):this
          -5 (-0.23% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.BuiltInOperators:GetSimpleBuiltInOperators(int,Microsoft.CodeAnalysis.ArrayBuilder`1[BinaryOperatorSignature]):this

23 total methods with Code Size differences (9 improved, 14 regressed), 305658 unchanged.

3 files had text diffs but no metric diffs.
System.Runtime.Serialization.Formatters.dasm had 66 diffs
System.Net.Requests.dasm had 28 diffs
Microsoft.CSharp.dasm had 6 diffs

Asm diffs for top3 regressions:

  1. System.Linq.Parallel.JaggedArray :Allocate

  2. System.Buffers.TlsOverPerCoreLockedStacksArrayPool :Return

  3. System.Collections.Generic.SegmentedList :EnsureCapacity

the first one looks a bit suspicious (see code), others are the optimized covariance check case. JaggedArray + Vector as T -- doesn't sound like a real world application, so I guess it's pmi.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Feb 25, 2020

In Allocate the explicit IR for the array address triggers loop cloning, so you get two copies of the loop and so a larger size impact. This is a known "feature" of the loop cloner; it is very touchy. On the upside you've now likely eliminated a bounds check from the hot path.

What does the diff in System.Text.UTF32Encoding:get_Preamble():System.ReadOnlySpan`1[Byte]:thislook like?

@EgorBo
Copy link
Member Author

What does the diff in System.Text.UTF32Encoding:get_Preamble():System.ReadOnlySpan`1[Byte]:thislook like?

Diff: https://www.diffchecker.com/cMI34jXb
Code:

public override ReadOnlySpan<byte> Preamble =>
GetType() != typeof(UTF32Encoding) ? new ReadOnlySpan<byte>(GetPreamble()) : // in case a derived UTF32Encoding overrode GetPreamble
!_emitUTF32ByteOrderMark ? default :
_bigEndian ? (ReadOnlySpan<byte>)new byte[4] { 0x00, 0x00, 0xFE, 0xFF } : // uses C# compiler's optimization for static byte[] data
(ReadOnlySpan<byte>)new byte[4] { 0xFF, 0xFE, 0x00, 0x00 };

Eliminated that check.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Thanks!

@EgorBo
Copy link
Member Author

EgorBo commented Feb 26, 2020

With this PR, the following snippet:

static Span<string> ToSpan(string[] a)
{
    return a;
}

Asm diff: https://www.diffchecker.com/KtZmiPvY (new codegen is on the right)
cc @GrabYourPitchforks

@AndyAyersMS
Copy link
Member

cc @dotnet/jit-contrib

@BruceForstall BruceForstall changed the title RuyJIT: Optimize obj.GetType() != typeof(X) for sealed classes RyuJIT: Optimize obj.GetType() != typeof(X) for sealed classes Feb 26, 2020
@AndyAyersMS AndyAyersMS merged commit 366a4d6 into dotnet:master Feb 27, 2020
Copy link
Member

@erozenfeld erozenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants