Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate duplicate zero initializations more aggressively. #31960

Merged
merged 1 commit into from Feb 11, 2020

Conversation

erozenfeld
Copy link
Member

We have 3 places in the code where we may need to insert explicit zero
initialization. In some cases the initialization may become redundant with prolog
initialization so we have logic to avoid explicit initialization in some cases.
This change makes the logic less conservative.

  1. If the variable is not being initialized in a loop, we can avoid explicit zero initialization if

    • the variable is a gc pointer or a struct with gc pointer fields, or
    • compInitMem is set and the variable has a long lifetime.

    Before this change we inserted explicit zero initalization in one more case: when compInitMem
    wasn't set and the variable was a gc pointer or a struct with gc fields. That is not necessary,
    so this change fixes that in fgVarNeedsExplicitZeroInit.

  2. When we were allocating structs via newobj, we were always inserting explicit zero initializations
    when importing an inlinee. This change applies the same optimization if the basic block can't be
    in a loop after inlining.

  3. When we were initializing inlinee locals, we were only optimizing explicit zero initializations
    for structs; however, the same logic applies to non-structs.

@erozenfeld
Copy link
Member Author

erozenfeld commented Feb 8, 2020

Framework diffs:

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -98251 (-0.21% of base)
    diff is an improvement.
Top file regressions (bytes):
           2 : System.ServiceProcess.ServiceController.dasm (0.01% of base)
Top file improvements (bytes):
      -22083 : System.Collections.Immutable.dasm (-1.99% of base)
       -9113 : Microsoft.CodeAnalysis.dasm (-0.52% of base)
       -7269 : System.Net.Http.dasm (-1.10% of base)
       -7097 : System.Private.Xml.dasm (-0.20% of base)
       -6953 : System.Collections.dasm (-1.24% of base)
       -5956 : System.Private.CoreLib.dasm (-0.13% of base)
       -5799 : Microsoft.CodeAnalysis.CSharp.dasm (-0.13% of base)
       -4808 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.09% of base)
       -3493 : System.Data.Common.dasm (-0.23% of base)
       -3003 : System.Reflection.Metadata.dasm (-0.70% of base)
       -1898 : Newtonsoft.Json.dasm (-0.22% of base)
       -1257 : System.Data.SqlClient.dasm (-0.14% of base)
       -1131 : System.Security.Cryptography.X509Certificates.dasm (-0.67% of base)
       -1093 : System.Net.HttpListener.dasm (-0.48% of base)
       -1066 : System.Memory.dasm (-0.43% of base)
       -1018 : System.Net.Mail.dasm (-0.48% of base)
       -1004 : System.Linq.dasm (-0.10% of base)
        -927 : System.Net.Security.dasm (-0.57% of base)
        -773 : System.Security.Cryptography.Pkcs.dasm (-0.18% of base)
        -750 : System.Text.Json.dasm (-0.17% of base)
93 total files with Code Size differences (92 improved, 1 regressed), 70 unchanged.
Top method regressions (bytes):
          86 (20.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMethodSymbol:LazyMethodChecks():this
          22 ( 1.02% of base) : Microsoft.VisualBasic.Core.dasm - ObjectType:SubObj(Object,Object):Object
          19 ( 1.76% of base) : System.Data.OleDb.dasm - OleDbConnectionString:LoadStringFromFileStorage(String):String
          14 ( 0.67% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(ReadOnlySpan`1,ReadOnlySpan`1,SymmetricAlgorithm,bool,ReadOnlySpan`1,PbeParameters,ReadOnlySpan`1,ref,Span`1):int
          14 ( 0.67% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(ReadOnlySpan`1,ReadOnlySpan`1,SymmetricAlgorithm,bool,ReadOnlySpan`1,PbeParameters,ReadOnlySpan`1,ref,Span`1):int
          14 ( 0.67% of base) : System.Security.Cryptography.Pkcs.dasm - PasswordBasedEncryption:Encrypt(ReadOnlySpan`1,ReadOnlySpan`1,SymmetricAlgorithm,bool,ReadOnlySpan`1,PbeParameters,ReadOnlySpan`1,ref,Span`1):int
          13 ( 2.29% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SynthesizedEventAccessorSymbol:get_Parameters():ImmutableArray`1:this
          10 ( 0.69% of base) : System.Security.Cryptography.Pkcs.dasm - PfxAsn:VerifyMac(ReadOnlySpan`1,ReadOnlySpan`1):bool:this
           7 ( 2.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - TypeBinder:LookupNullableType(NullableTypeSyntax,Binder,DiagnosticBag,bool):SingleLookupResult
           7 ( 1.72% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(ubyte,MutationInput):MutationResult
           7 ( 1.75% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(int,MutationInput):MutationResult
           7 ( 1.73% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(long,MutationInput):MutationResult
           7 ( 2.01% of base) : System.Net.Http.dasm - KnownHeader:.ctor(String,ubyte,HttpHeaderParser,ref,Nullable`1,Nullable`1):this
           7 ( 1.36% of base) : System.Private.CoreLib.dasm - StringInfo:ParseCombiningCharacters(String):ref
           6 ( 1.49% of base) : System.Linq.Expressions.dasm - LambdaCompiler:EmitLambdaBody(CompilerScope,bool,int):this
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(ubyte,MutationInput):MutationResult
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(short,MutationInput):MutationResult
           5 ( 1.19% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(int,MutationInput):MutationResult
           5 ( 1.21% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(double,MutationInput):MutationResult
           5 ( 0.94% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(Vector`1,MutationInput):MutationResult
Top method improvements (bytes):
       -3243 (-15.73% of base) : System.Net.Http.dasm - HeaderDescriptor:.cctor()
       -1466 (-4.25% of base) : System.Data.Common.dasm - SortExpressionBuilder`1:CloneCast():SortExpressionBuilder`1:this (49 methods)
       -1132 (-15.13% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeProjectedTypes()
       -1117 (-6.97% of base) : System.Collections.Immutable.dasm - Enumerator:MoveNext():bool:this (77 methods)
        -992 (-7.77% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:SymmetricExcept(IEnumerable`1,MutationInput):MutationResult (7 methods)
        -932 (-18.25% of base) : System.Collections.Immutable.dasm - Builder:GetEnumerator():Enumerator:this (35 methods)
        -858 (-2.05% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:ToDictionary(Func`2,IEqualityComparer`1):Dictionary`2:this (49 methods)
        -853 (-14.65% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (42 methods)
        -684 (-7.56% of base) : Microsoft.CodeAnalysis.dasm - PeWriter:WriteHeaders(Stream,NtHeader,CoffHeader,List`1,byref):this
        -643 (-17.23% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Generic.IEnumerable<T>.GetEnumerator():IEnumerator`1:this (28 methods)
        -569 (-20.73% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator():Enumerator:this (21 methods)
        -569 (-21.07% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator(Builder):Enumerator:this (21 methods)
        -560 (-15.44% of base) : Microsoft.CodeAnalysis.dasm - Enumerator:.ctor(ValueSet):this (7 methods)
        -546 (-19.77% of base) : System.Collections.Immutable.dasm - HashBucket:GetEnumerator():Enumerator:this (14 methods)
        -498 (-24.53% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:GetEnumerator():Enumerator:this (7 methods)
        -498 (-24.68% of base) : System.Collections.Immutable.dasm - NodeEnumerable:GetEnumerator():Enumerator:this (7 methods)
        -486 (-17.63% of base) : System.Collections.Immutable.dasm - Node:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (21 methods)
        -483 (-11.28% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:System.Collections.Generic.ICollection<T>.CopyTo(ref,int):this (7 methods)
        -483 (-9.40% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:System.Collections.ICollection.CopyTo(Array,int):this (7 methods)
        -468 (-24.55% of base) : System.Collections.dasm - Enumerator:.ctor(SortedDictionary`2):this (14 methods)
Top method regressions (percentages):
          86 (20.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMethodSymbol:LazyMethodChecks():this
           7 ( 2.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - TypeBinder:LookupNullableType(NullableTypeSyntax,Binder,DiagnosticBag,bool):SingleLookupResult
          13 ( 2.29% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SynthesizedEventAccessorSymbol:get_Parameters():ImmutableArray`1:this
           7 ( 2.01% of base) : System.Net.Http.dasm - KnownHeader:.ctor(String,ubyte,HttpHeaderParser,ref,Nullable`1,Nullable`1):this
           4 ( 1.80% of base) : System.Data.SqlClient.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Private.CoreLib.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.ServiceProcess.ServiceController.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Windows.Extensions.dasm - ValueStringBuilder:Insert(int,String):this
          19 ( 1.76% of base) : System.Data.OleDb.dasm - OleDbConnectionString:LoadStringFromFileStorage(String):String
           7 ( 1.75% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(int,MutationInput):MutationResult
           7 ( 1.73% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(long,MutationInput):MutationResult
           7 ( 1.72% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(ubyte,MutationInput):MutationResult
           6 ( 1.49% of base) : System.Linq.Expressions.dasm - LambdaCompiler:EmitLambdaBody(CompilerScope,bool,int):this
           7 ( 1.36% of base) : System.Private.CoreLib.dasm - StringInfo:ParseCombiningCharacters(String):ref
           5 ( 1.21% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(double,MutationInput):MutationResult
           5 ( 1.19% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(int,MutationInput):MutationResult
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(long,MutationInput):MutationResult
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(ubyte,MutationInput):MutationResult
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(short,MutationInput):MutationResult
           5 ( 1.12% of base) : System.Net.Requests.dasm - FtpControlStream:GetPortV6(String):int:this
Top method improvements (percentages):
         -32 (-28.32% of base) : Microsoft.CodeAnalysis.dasm - ReversedEnumeratorImpl:.ctor(byref):this
         -84 (-26.84% of base) : Microsoft.CodeAnalysis.dasm - EnumeratorImpl:.ctor(byref):this (3 methods)
         -32 (-26.67% of base) : Microsoft.CodeAnalysis.dasm - SyntaxTriviaList:GetEnumerator():Enumerator:this
         -13 (-25.49% of base) : Microsoft.CodeAnalysis.CSharp.dasm - NamespaceSymbol:Microsoft.CodeAnalysis.INamespaceSymbol.get_NamespaceKind():int:this
         -13 (-25.49% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - NamespaceSymbol:get_INamespaceSymbol_NamespaceKind():int:this
         -20 (-25.32% of base) : System.Data.Common.dasm - Index:GetEnumerator(int):RBTreeEnumerator:this
        -224 (-25.31% of base) : System.Memory.dasm - ReadOnlySequence`1:GetEnumerator():Enumerator:this (7 methods)
        -498 (-24.68% of base) : System.Collections.Immutable.dasm - NodeEnumerable:GetEnumerator():Enumerator:this (7 methods)
        -468 (-24.55% of base) : System.Collections.dasm - Enumerator:.ctor(SortedDictionary`2):this (14 methods)
        -498 (-24.53% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:GetEnumerator():Enumerator:this (7 methods)
         -26 (-24.30% of base) : Microsoft.CodeAnalysis.dasm - SyntaxTokenList:GetEnumerator():Enumerator:this
        -260 (-23.92% of base) : System.Collections.dasm - SortedDictionary`2:GetEnumerator():Enumerator:this (7 methods)
         -19 (-23.46% of base) : Newtonsoft.Json.dasm - JsonSerializerSettings:set_Context(StreamingContext):this
         -20 (-22.99% of base) : System.Security.Cryptography.Algorithms.dasm - AsnValueReader:OpenUnchecked(ReadOnlySpan`1,int):AsnValueReader
         -20 (-22.99% of base) : System.Security.Cryptography.Cng.dasm - AsnValueReader:OpenUnchecked(ReadOnlySpan`1,int):AsnValueReader
         -20 (-22.99% of base) : System.Security.Cryptography.Pkcs.dasm - AsnValueReader:OpenUnchecked(ReadOnlySpan`1,int):AsnValueReader
         -20 (-22.99% of base) : System.Security.Cryptography.X509Certificates.dasm - AsnValueReader:OpenUnchecked(ReadOnlySpan`1,int):AsnValueReader
         -58 (-22.48% of base) : Microsoft.CodeAnalysis.dasm - Reversed:GetEnumerator():Enumerator:this (3 methods)
        -437 (-22.28% of base) : System.Collections.Immutable.dasm - NodeEnumerable:System.Collections.Generic.IEnumerable<T>.GetEnumerator():IEnumerator`1:this (7 methods)
        -437 (-22.28% of base) : System.Collections.Immutable.dasm - NodeEnumerable:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (7 methods)
2713 total methods with Code Size differences (2669 improved, 44 regressed), 241635 unchanged.

@erozenfeld
Copy link
Member Author

erozenfeld commented Feb 8, 2020

Sample diff from Microsoft.CodeAnalysis.dll:

; Assembly listing for method ReversedEnumeratorImpl:.ctor(byref):this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] (  3,  3   )     ref  ->  rdi         this class-hnd
;  V01 arg1         [V01,T01] (  3,  3   )   byref  ->  rdx        
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
-; V03 tmp1         [V03    ] (  3,  6   )  struct (56) [rsp+0x20]   do-not-enreg[XSB] must-init addr-exposed "NewObj constructor temp"
+; V03 tmp1         [V03    ] (  2,  4   )  struct (56) [rsp+0x20]   do-not-enreg[XSB] must-init addr-exposed "NewObj constructor temp"
;
; Lcl frame size = 88

G_M63972_IG01:
       push     rdi
       push     rsi
       sub      rsp, 88
-      vzeroupper 
       mov      rsi, rcx
       lea      rdi, [rsp+20H]
       mov      ecx, 14
       xor      rax, rax
       rep stosd 
       mov      rcx, rsi
       mov      rdi, rcx
-					;; bbWeight=1    PerfScore 30.00
+					;; bbWeight=1    PerfScore 29.00
G_M63972_IG02:
-      xor      ecx, ecx
-      vxorps   xmm0, xmm0
-      vmovdqu  xmmword ptr [rsp+20H], xmm0
-      vmovdqu  xmmword ptr [rsp+30H], xmm0
-      vmovdqu  xmmword ptr [rsp+40H], xmm0
-      mov      qword ptr [rsp+50H], rcx
       lea      rcx, bword ptr [rsp+20H]
       call     Enumerator:.ctor(byref):this
       add      rdi, 8
       lea      rsi, bword ptr [rsp+20H]
       call     CORINFO_HELP_ASSIGN_BYREF
       call     CORINFO_HELP_ASSIGN_BYREF
       movsq    
       movsq    
       call     CORINFO_HELP_ASSIGN_BYREF
       call     CORINFO_HELP_ASSIGN_BYREF
       movsq    
-					;; bbWeight=1    PerfScore 13.83
+					;; bbWeight=1    PerfScore 9.25
G_M63972_IG03:
       add      rsp, 88
       pop      rsi
       pop      rdi
       ret      
						;; bbWeight=1    PerfScore 2.25

-; Total bytes of code 113, prolog size 29, PerfScore 57.78, (MethodHash=e622061c) for method ReversedEnumeratorImpl:.ctor(byref):this
+; Total bytes of code 81, prolog size 26, PerfScore 48.60, (MethodHash=e622061c) for method ReversedEnumeratorImpl:.ctor(byref):this

@erozenfeld
Copy link
Member Author

@dotnet/jit-contrib @AndyAyersMS PTAL

@erozenfeld
Copy link
Member Author

Benchmark diffs:

PMI CodeSize Diffs for benchstones and benchmarks game in F:\runtime\artifacts\tests\coreclr\Windows_NT.x64.Release for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -263 (-0.05% of base)
    diff is an improvement.
Top file improvements (bytes):
        -242 : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm (-5.98% of base)
         -21 : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm (-0.51% of base)
2 total files with Code Size differences (2 improved, 0 regressed), 80 unchanged.
Top method improvements (bytes):
         -88 (-8.14% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:compose_r(int,int,int,int):this
         -88 (-8.26% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:compose_l(int,int,int,int):this
         -43 (-11.35% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:Run(bool):this
         -23 (-4.54% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:extract(int):int:this
         -21 (-3.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - KNucleotide:WriteFrequencies(int,ref,bool):bool:this
Top method improvements (percentages):
         -43 (-11.35% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:Run(bool):this
         -88 (-8.26% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:compose_l(int,int,int,int):this
         -88 (-8.14% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:compose_r(int,int,int,int):this
         -23 (-4.54% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:extract(int):int:this
         -21 (-3.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - KNucleotide:WriteFrequencies(int,ref,bool):bool:this
5 total methods with Code Size differences (5 improved, 0 regressed), 1888 unchanged.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement!


newStmt = gtNewStmt(tree, callILOffset);
fgInsertStmtAfter(block, afterStmt, newStmt);
afterStmt = newStmt;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add jitdump here for the case where the local is used by we decide we don't need to zero init?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

@@ -23579,10 +23579,14 @@ Statement* Compiler::fgInlinePrependStatements(InlineInfo* inlineInfo)

CORINFO_METHOD_INFO* InlineeMethodInfo = InlineeCompiler->info.compMethodInfo;

unsigned lclCnt = InlineeMethodInfo->locals.numArgs;
unsigned lclCnt = InlineeMethodInfo->locals.numArgs;
bool bbInALoop = (block->bbFlags & BBF_BACKWARD_JUMP) != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set BBF_BACKWARD_JUMP very early, just based on lexical IL. So it may encompass non-loop blocks (eg loop with a branch that leads to a return).

We may also optimize away backwards branches (probably rare, but not impossible).

I wonder how often we end up with extra zeroing from BBF_BACKWARD_JUMP being approximate? Any Idea?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't know. I guess I can try to allow this optimization for BBJ_RETURN basic blocks in root methods since we can execute those blocks only once even in a loop. That won't account for all cases you are asking about but I don't see an easy way to capture those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To estimate how often this happens, you could perhaps set a new BBF in blocks where we add zero inits. Then later when we run some more robust form of loop detection, see how many non-loop blocks have this BBF.

@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 8, 2020
if (tmpNum != BAD_VAR_NUM)
{
if (!fgVarNeedsExplicitZeroInit(&lvaTable[tmpNum], bbInALoop))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please use lvaGetDesc(varNum) instead of &lvaTable[tmpNum] here and below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

We have 3 places in the code where we may need to insert explicit zero
initialization. In some cases the initialization may become redundant with prolog
initialization so we have logic to avoid explicit initialization in some cases.
This change makes the logic less conservative.

1. If the variable is not being initialized in a loop, we can avoid explicit zero initialization if
      - the variable is a gc pointer, or
      - the variable is a struct with gc pointer fields and either all fields are gc pointer fields
          or the struct is big enough to guarantee block initialization, or
      - compInitMem is set and the variable has a long lifetime or has gc fields.

   Before this change we always inserted explicit zero initalization when compInitMem
   wasn't set and the variable was a gc pointer or a struct with gc fields. This relaxes that
   as described above.

2. When we were allocating structs via newobj, we were always inserting explicit zero initializations
   when importing an inlinee. This change applies the same optimization if the basic block can't be
   in a loop after inlining.

3. When we were initializing inlinee locals, we were only optimizing explicit zero initializations
   for structs; however, the same logic applies to non-structs.

4. If the initialization basic block is in a loop but is BBJ_RETURN, we may avoid zero initialization
   since the block will be executed at most once per method invocation.
@erozenfeld
Copy link
Member Author

It turns out that the initial PR was too aggressive. My assumption that we always fully zero initialize structs with GC fields in the prolog was incorrect. The prolog zero initialization logic lives in CodeGen::genCheckUseBlockInit and CodeGen::genZeroInitFrame. If CodeGen::genCheckUseBlockInit decides that we shouldn't block init locals and compiler->info.compInitMem is false, CodeGen::genZeroInitFrame will only zero initialize gc fields in the prolog:

if ((varDsc->TypeGet() == TYP_STRUCT) && !compiler->info.compInitMem &&
(varDsc->lvExactSize >= TARGET_POINTER_SIZE))
{
// We only initialize the GC variables in the TYP_STRUCT
const unsigned slots = (unsigned)compiler->lvaLclSize(varNum) / REGSIZE_BYTES;
ClassLayout* layout = varDsc->GetLayout();
for (unsigned i = 0; i < slots; i++)
{
if (layout->IsGCPtr(i))
{
GetEmitter()->emitIns_S_R(ins_Store(TYP_I_IMPL), EA_PTRSIZE,
genGetZeroReg(initReg, pInitRegZeroed), varNum, i * REGSIZE_BYTES);
}
}
}

So I modified my changes to skip zero initialization in structs with GC fields when info.compInitMem is false only when all fields are gc fields or when we can guarantee block initialization.

While studying CodeGen::genCheckUseBlockInit I discovered that a bunch of code there is currently unreachable and has been unreachable since at least 2014. So I made it clear by placing it under #if 0. The unreachable code was potentially affecting both correctness and block init heuristics, so this will need to be investigated but can be done outside of this PR, since it's pre-existing.

I also made the change to not block the optimization for BBJ_RETURN basic blocks inside loops.

This leaves us with about half of the original wins, which is still worthwhile.

@erozenfeld
Copy link
Member Author

New framework diff results:

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -54112 (-0.11% of base)
    diff is an improvement.
Top file regressions (bytes):
           4 : System.ServiceProcess.ServiceController.dasm (0.01% of base)
           4 : System.Windows.Extensions.dasm (0.01% of base)
Top file improvements (bytes):
      -18143 : System.Collections.Immutable.dasm (-1.63% of base)
       -7219 : Microsoft.CodeAnalysis.dasm (-0.41% of base)
       -4600 : System.Net.Http.dasm (-0.70% of base)
       -3721 : Microsoft.CodeAnalysis.CSharp.dasm (-0.09% of base)
       -3386 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.06% of base)
       -2777 : System.Private.CoreLib.dasm (-0.06% of base)
       -2746 : System.Collections.dasm (-0.49% of base)
        -969 : System.Data.SqlClient.dasm (-0.11% of base)
        -834 : System.Linq.dasm (-0.08% of base)
        -644 : System.Memory.dasm (-0.26% of base)
        -589 : System.Threading.Channels.dasm (-0.33% of base)
        -493 : System.Text.Json.dasm (-0.11% of base)
        -482 : System.Net.Security.dasm (-0.30% of base)
        -452 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.01% of base)
        -419 : System.Private.Xml.dasm (-0.01% of base)
        -410 : System.Reflection.Metadata.dasm (-0.10% of base)
        -398 : System.Net.HttpListener.dasm (-0.17% of base)
        -398 : System.Net.Mail.dasm (-0.19% of base)
        -384 : System.Threading.Tasks.Dataflow.dasm (-0.05% of base)
        -379 : System.Linq.Parallel.dasm (-0.02% of base)
73 total files with Code Size differences (71 improved, 2 regressed), 90 unchanged.
Top method regressions (bytes):
          86 (20.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMethodSymbol:LazyMethodChecks():this
          22 ( 1.02% of base) : Microsoft.VisualBasic.Core.dasm - ObjectType:SubObj(Object,Object):Object
          19 ( 1.76% of base) : System.Data.OleDb.dasm - OleDbConnectionString:LoadStringFromFileStorage(String):String
          13 ( 2.29% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SynthesizedEventAccessorSymbol:get_Parameters():ImmutableArray`1:this
          10 ( 0.69% of base) : System.Security.Cryptography.Pkcs.dasm - PfxAsn:VerifyMac(ReadOnlySpan`1,ReadOnlySpan`1):bool:this
           7 ( 2.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - TypeBinder:LookupNullableType(NullableTypeSyntax,Binder,DiagnosticBag,bool):SingleLookupResult
           7 ( 1.36% of base) : System.Private.CoreLib.dasm - StringInfo:ParseCombiningCharacters(String):ref
           4 ( 1.80% of base) : System.Data.SqlClient.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Private.CoreLib.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.ServiceProcess.ServiceController.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Windows.Extensions.dasm - ValueStringBuilder:Insert(int,String):this
           3 ( 0.02% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(VisualBasicSyntaxNode,Location,int,byref,ImmutableArray`1,ImmutableArray`1,bool,bool,bool,bool,DiagnosticBag,Symbol,bool,VisualBasicSyntaxNode,Symbol):this
Top method improvements (bytes):
       -3243 (-15.73% of base) : System.Net.Http.dasm - HeaderDescriptor:.cctor()
       -1117 (-6.97% of base) : System.Collections.Immutable.dasm - Enumerator:MoveNext():bool:this (77 methods)
        -932 (-18.25% of base) : System.Collections.Immutable.dasm - Builder:GetEnumerator():Enumerator:this (35 methods)
        -858 (-2.05% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:ToDictionary(Func`2,IEqualityComparer`1):Dictionary`2:this (49 methods)
        -853 (-14.65% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (42 methods)
        -729 (-5.71% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:SymmetricExcept(IEnumerable`1,MutationInput):MutationResult (7 methods)
        -684 (-7.56% of base) : Microsoft.CodeAnalysis.dasm - PeWriter:WriteHeaders(Stream,NtHeader,CoffHeader,List`1,byref):this
        -643 (-17.23% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Generic.IEnumerable<T>.GetEnumerator():IEnumerator`1:this (28 methods)
        -639 (-50.00% of base) : System.Private.CoreLib.dasm - StreamReader:DetectEncoding():this (2 methods)
        -569 (-20.73% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator():Enumerator:this (21 methods)
        -569 (-21.07% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator(Builder):Enumerator:this (21 methods)
        -546 (-19.77% of base) : System.Collections.Immutable.dasm - HashBucket:GetEnumerator():Enumerator:this (14 methods)
        -486 (-17.63% of base) : System.Collections.Immutable.dasm - Node:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (21 methods)
        -448 (-12.36% of base) : Microsoft.CodeAnalysis.dasm - Enumerator:.ctor(ValueSet):this (7 methods)
        -420 (-4.47% of base) : System.Memory.dasm - ReadOnlySequenceDebugView`1:.ctor(ReadOnlySequence`1):this (7 methods)
        -402 (-19.80% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:GetEnumerator():Enumerator:this (7 methods)
        -402 (-19.92% of base) : System.Collections.Immutable.dasm - NodeEnumerable:GetEnumerator():Enumerator:this (7 methods)
        -363 (-8.48% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:System.Collections.Generic.ICollection<T>.CopyTo(ref,int):this (7 methods)
        -363 (-7.07% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:System.Collections.ICollection.CopyTo(Array,int):this (7 methods)
        -355 (-9.65% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Generic.ICollection<T>.CopyTo(ref,int):this (14 methods)
Top method regressions (percentages):
          86 (20.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMethodSymbol:LazyMethodChecks():this
           7 ( 2.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - TypeBinder:LookupNullableType(NullableTypeSyntax,Binder,DiagnosticBag,bool):SingleLookupResult
          13 ( 2.29% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SynthesizedEventAccessorSymbol:get_Parameters():ImmutableArray`1:this
           4 ( 1.80% of base) : System.Data.SqlClient.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Private.CoreLib.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.ServiceProcess.ServiceController.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Windows.Extensions.dasm - ValueStringBuilder:Insert(int,String):this
          19 ( 1.76% of base) : System.Data.OleDb.dasm - OleDbConnectionString:LoadStringFromFileStorage(String):String
           7 ( 1.36% of base) : System.Private.CoreLib.dasm - StringInfo:ParseCombiningCharacters(String):ref
          22 ( 1.02% of base) : Microsoft.VisualBasic.Core.dasm - ObjectType:SubObj(Object,Object):Object
          10 ( 0.69% of base) : System.Security.Cryptography.Pkcs.dasm - PfxAsn:VerifyMac(ReadOnlySpan`1,ReadOnlySpan`1):bool:this
           3 ( 0.02% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(VisualBasicSyntaxNode,Location,int,byref,ImmutableArray`1,ImmutableArray`1,bool,bool,bool,bool,DiagnosticBag,Symbol,bool,VisualBasicSyntaxNode,Symbol):this
Top method improvements (percentages):
        -639 (-50.00% of base) : System.Private.CoreLib.dasm - StreamReader:DetectEncoding():this (2 methods)
         -32 (-28.32% of base) : Microsoft.CodeAnalysis.dasm - ReversedEnumeratorImpl:.ctor(byref):this
         -84 (-26.84% of base) : Microsoft.CodeAnalysis.dasm - EnumeratorImpl:.ctor(byref):this (3 methods)
         -32 (-26.67% of base) : Microsoft.CodeAnalysis.dasm - SyntaxTriviaList:GetEnumerator():Enumerator:this
        -224 (-25.31% of base) : System.Memory.dasm - ReadOnlySequence`1:GetEnumerator():Enumerator:this (7 methods)
         -26 (-24.30% of base) : Microsoft.CodeAnalysis.dasm - SyntaxTokenList:GetEnumerator():Enumerator:this
         -58 (-22.48% of base) : Microsoft.CodeAnalysis.dasm - Reversed:GetEnumerator():Enumerator:this (3 methods)
        -156 (-22.07% of base) : System.Collections.Immutable.dasm - ImmutableSortedDictionary`2:GetEnumerator():Enumerator:this (7 methods)
        -569 (-21.07% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator(Builder):Enumerator:this (21 methods)
         -23 (-20.91% of base) : System.ComponentModel.Composition.dasm - CompositionResult:ToResult(Vector`1):CompositionResult`1:this
        -569 (-20.73% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator():Enumerator:this (21 methods)
        -150 (-20.52% of base) : System.Collections.Immutable.dasm - ImmutableSortedSet`1:GetEnumerator():Enumerator:this (7 methods)
         -26 (-20.47% of base) : Microsoft.CodeAnalysis.dasm - UsedNamespaceOrType:CreateType(ITypeReference,String):UsedNamespaceOrType
         -26 (-20.47% of base) : Microsoft.CodeAnalysis.dasm - UsedNamespaceOrType:CreateNamespace(INamespace,IAssemblyReference,String):UsedNamespaceOrType
         -26 (-20.47% of base) : Microsoft.CodeAnalysis.dasm - UsedNamespaceOrType:CreateExternAlias(String):UsedNamespaceOrType
         -26 (-20.47% of base) : Microsoft.CodeAnalysis.dasm - UsedNamespaceOrType:CreateXmlNamespace(String,String):UsedNamespaceOrType
        -212 (-20.23% of base) : System.Collections.Immutable.dasm - ImmutableList`1:GetEnumerator():Enumerator:this (7 methods)
        -402 (-19.92% of base) : System.Collections.Immutable.dasm - NodeEnumerable:GetEnumerator():Enumerator:this (7 methods)
         -74 (-19.84% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BinaryOperatorAnalysisResult:Worse():BinaryOperatorAnalysisResult:this
        -402 (-19.80% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:GetEnumerator():Enumerator:this (7 methods)
1230 total methods with Code Size differences (1218 improved, 12 regressed), 243118 unchanged.

@erozenfeld
Copy link
Member Author

Benchmark diffs:

PMI CodeSize Diffs for benchstones and benchmarks game in F:\runtime\artifacts\tests\coreclr\Windows_NT.x64.Release for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -21 (-0.00% of base)
    diff is an improvement.
Top file improvements (bytes):
         -21 : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm (-0.51% of base)
1 total files with Code Size differences (1 improved, 0 regressed), 81 unchanged.
Top method improvements (bytes):
         -21 (-3.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - KNucleotide:WriteFrequencies(int,ref,bool):bool:this
Top method improvements (percentages):
         -21 (-3.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - KNucleotide:WriteFrequencies(int,ref,bool):bool:this
1 total methods with Code Size differences (1 improved, 0 regressed), 1892 unchanged.

@erozenfeld
Copy link
Member Author

@AndyAyersMS @sandreenko PTAL

@@ -4656,6 +4656,9 @@ void CodeGen::genCheckUseBlockInit()
continue;
}

// TODO-Review: The code below is currently unreachable. We are guaranteed to execute one of the
// 'continue' statements above.
#if 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to hold up getting this merged - looks great BTW - but why didn't you just delete this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to go back and see if any of this needs to be resurrected. It looks like part of the unreachable code was supposed to be responsible for correctness. I'll clean this up in a subsequent PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps open an issue for follow up on this, so we don't forget?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment to the issue that tracks improving the heuristic: #8890 (comment)

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good.

Seems like we should probably block init more aggressively in the prolog (as well as init more efficiently).

@erozenfeld
Copy link
Member Author

Yes, we have #8890 open [RyuJIT] Improve heuristic for zero-initialization of locals

@erozenfeld erozenfeld merged commit f9dfe45 into dotnet:master Feb 11, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants