Issue 1518: Shorter ia32 deferred code fragments
Deferred code segments could generally be shorter and straight-line in the common case. If the Generate() methods are changed to take the EXIT label, the deferred code has fewer constraints. Other than the space saving, I don't expect this to have much impact on benchmarks.

Example:

    0xf53b0ffd   285  8179ffa14037f5  cmp [ecx+0xff],0xf53740a1   ;; object: 0xf53740a1 <Map>
    0xf53b1004   292  0f85483077ff    jnz 0xf4b24052              ;; deoptimization bailout 5
    0xf53b100a   298  f20f104103      movsd xmm0,[ecx+0x3]
    0xf53b100f   303  f20f2cc8        cvttsd2si ecx,xmm0
    0xf53b1013   307  f20f2ac9        cvtsi2sd xmm1,ecx
    0xf53b1017   311  660f2ec1        ucomisd xmm0,xmm1
    0xf53b101b   315  0f85313077ff    jnz 0xf4b24052              ;; deoptimization bailout 5
    0xf53b1021   321  0f8a2b3077ff    jpe 0xf4b24052              ;; deoptimization bailout 5
    0xf53b1027   327  85c9            test ecx,ecx
    0xf53b1029   329  0f850d000000    jnz 348 (0xf53b103c)
    0xf53b102f   335  660f50c8        movmskpd ecx,xmm0
    0xf53b1033   339  83e101          and ecx,0x1
    0xf53b1036   342  0f85163077ff    jnz 0xf4b24052              ;; deoptimization bailout 5
    0xf53b103c   348  e9e0feffff      jmp 65 (0xf53b0f21)

(68 bytes + 4 relocation records)

Proposed layout:

    ENTRY:
      cmp [r-1],<Map>
      jnz short bail
      movsd
      cvttsd2si
      cvtsi2sd
      ucomisd
      jnz short bail
      jpe short bail
      test r,r
      jnz EXIT
      movmskpd
      and r,1
      jz EXIT
    bail:
      jmp deoptimization_bailout_N

(56 bytes + 1 relocation record)

The above code branches forward in the unexpected case and backwards (to the main code) in the expected case, consistent with:

[Intel® 64 and IA-32 Architectures Optimization Reference Manual, Order Number: 248966-024, April 2011]

Assembly/Compiler Coding Rule 3. (M impact, H generality) Arrange code to be consistent with the static branch prediction algorithm: make the fall-through code following a conditional branch be the likely target for a branch with a forward target, and make the fall-through code following a conditional branch be the unlikely target for a branch with a backward target.
(However, I have read that recent microarchitectures tend to always use dynamic prediction rather than this algorithm.)
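For anyone who wants to check the 68 vs. 56 byte figures, here is a rough accounting sketch in Python. The sizes for the current listing are read straight from the hex encodings in the disassembly. The sizes for the proposed layout are assumptions: the non-branch instructions keep the encodings they have today, every jump to `bail` fits a 2-byte short form, and jumps to EXIT or to the deoptimization entry still need 32-bit displacements. The names `current`, `proposed`, and `total` are made up for this illustration and are not V8 identifiers.

```python
# Byte sizes of the ia32 branch encodings involved.
SHORT_JCC = 2  # 7x rel8
NEAR_JCC = 6   # 0f 8x rel32
NEAR_JMP = 5   # e9 rel32

# Current deferred fragment: (instruction, size in bytes), taken from the hex dump.
current = [
    ("cmp [ecx+0xff],<Map>", 7),
    ("jnz deopt", NEAR_JCC),
    ("movsd xmm0,[ecx+0x3]", 5),
    ("cvttsd2si ecx,xmm0", 4),
    ("cvtsi2sd xmm1,ecx", 4),
    ("ucomisd xmm0,xmm1", 4),
    ("jnz deopt", NEAR_JCC),
    ("jpe deopt", NEAR_JCC),
    ("test ecx,ecx", 2),
    ("jnz 348", NEAR_JCC),
    ("movmskpd ecx,xmm0", 4),
    ("and ecx,0x1", 3),
    ("jnz deopt", NEAR_JCC),
    ("jmp 65", NEAR_JMP),
]

# Proposed fragment (assumed encodings): jumps to the local bail label are
# short, jumps to EXIT and to the deopt entry remain near.
proposed = [
    ("cmp [r-1],<Map>", 7),
    ("jnz short bail", SHORT_JCC),
    ("movsd", 5),
    ("cvttsd2si", 4),
    ("cvtsi2sd", 4),
    ("ucomisd", 4),
    ("jnz short bail", SHORT_JCC),
    ("jpe short bail", SHORT_JCC),
    ("test r,r", 2),
    ("jnz EXIT", NEAR_JCC),
    ("movmskpd", 4),
    ("and r,1", 3),
    ("jz EXIT", NEAR_JCC),
    ("jmp deoptimization_bailout_N", NEAR_JMP),
]


def total(listing):
    """Sum the byte sizes of a list of (instruction, size) pairs."""
    return sum(size for _, size in listing)


if __name__ == "__main__":
    print("current: ", total(current), "bytes")
    print("proposed:", total(proposed), "bytes")
    print("saved:   ", total(current) - total(proposed), "bytes")
```

Under those assumptions the saving comes entirely from turning three 6-byte near conditional jumps into 2-byte short ones; the single shared `jmp` at `bail` takes the place of the old trailing `jmp` back to the main code.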
Jul 5, 2011
Saving space is good. Since this is in deferred code and the deopt case is almost never executed, there shouldn't be any performance degradation.