CRASH (TOT suite/tests/common/decode) linux 32-bit test_modrm16 #105

Closed
derekbruening opened this issue Nov 27, 2014 · 2 comments

@derekbruening

From derek.br...@gmail.com on April 02, 2009 11:26:52

this was PR 305335: linux CRASH suite/tests/common/decode 32-bit test_modrm16

we never had time to track it down before but now we think we know what's
going on

pasting emails here that explain the bug and repercussions/solutions:


From: Derek Bruening

OK, let me know if I have this right: we end up executing from some data
region used by libc's sscanf, which we thus make read-only for code cache
consistency. But since we ourselves use libc's sscanf we end up tripping
our own write watchpoint and can't make forward progress.
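
The failure mode can be reproduced in miniature outside of DR. The following is a minimal sketch, assuming Linux and POSIX signals: it maps a page, writes to it, removes write permission (DR would leave the region r-x; plain read-only is enough to show the effect), and then observes that the next write faults. The names `write_faults_after_protect` and `on_segv` are made up for illustration and are not DR code.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Hypothetical handler: unwind back to the guarded write. */
static void
on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Returns 1 if a write faults once the page is read-only, 0 if the write
 * succeeds, and -1 if the setup itself fails.
 */
static int
write_faults_after_protect(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    volatile int faulted = 0;
    volatile unsigned char *p =
        mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 1; /* The page is still writable here. */
    if (mprotect((void *)p, page, PROT_READ) != 0) {
        munmap((void *)p, page);
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap((void *)p, page);
        return -1;
    }
    if (sigsetjmp(fault_env, 1) == 0)
        p[0] = 2; /* Same store as before, but now it trips the protection. */
    else
        faulted = 1;
    sigaction(SIGSEGV, &old, NULL);
    munmap((void *)p, page);
    return faulted;
}
```

In the real bug the faulting store is inside libc's sscanf, which DR's own fault handling path calls again, so there is no clean place to unwind to.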

Wow, that's a cool bug. Nice job tracking it down.

Some thoughts:

  • This is another reason to add to the list for not using user libraries:
    an interesting twist on the existing transparency dogma.
  • All code DR uses should really be grouped and protected in the same
    manner. E.g., if we had our own copy of sscanf inside our own library,
    we would disallow the app from executing from it (we'd pretend it hit a
    fault) and would not have this bug. The problem is that we're running
    code from libc but we're not treating it as part of DR.
  • sscanf in particular is problematic for portability: xref issue #36 (build: define release package build env; set up nightly regression) and
    "__isoc99_sscanf@@GLIBC_2.7". If we can easily stop using sscanf it
    would solve multiple problems.
  • xref issue #46 (libc independence on Linux) on eventually not using any of libc. Most of the
    heavyweight routines we use are only during init or exit. I was assuming
    that the string routines, which are used in fragile locations, are very
    clean and don't go writing to global data. That seems to not be the case
    for sscanf.
  • xref PR 207635/3157 (not filed on Google yet): linux SIGSEGV should
    consider libc.so part of DR when assigning blame: related issue
  • %gs is used for pthreads TLS. might be interesting to fully understand
    what this _IO_vfscanf code (if it's really in that routine at +1700) is
    doing and what that memory region is but might not be worth the time.
  • I never figured out how to get gdb to display the base address of
    segments (I mention this in Debugging.wiki). This seems like a major
    missing feature so I hope there is a way and I've somehow overlooked it.
  • It does seem like we should perhaps have a separate test that does this
    pathological transfer to a data section that we use, and make the decode
    test a little better behaved to focus on decoding: which is what you were
    suggesting earlier.

Note that one way to avoid using sscanf at runtime is to rely on the
all_memory_areas list, which was supposed to replace the /proc/self/maps
reading but has had some bugs/issues in the past where it gets out of sync
with the maps file. Xref issue #91 (need to watch SYS_brk), PR 213256
(kernel merging regions => mismatch), and PR 246897 (where we switched back
to using maps file on queries).
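
For the other route, dropping sscanf while still reading the maps file, a hand-rolled parser is small. This is only a sketch under assumptions: `maps_entry_t`, `parse_hex`, and `parse_maps_line` are hypothetical names, not DR's actual maps iterator, and only the address range and permission fields are handled. It touches nothing but its arguments, so it writes no global or TLS data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type; not DynamoRIO's actual structure. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
    char perms[5]; /* e.g. "rw-p" plus the terminating NUL */
} maps_entry_t;

/* Parses hex digits at *s, advancing *s past them.
 * Returns the number of digits consumed (0 means no number was present).
 */
static int
parse_hex(const char **s, uintptr_t *out)
{
    uintptr_t val = 0;
    int digits = 0;
    for (;; (*s)++, digits++) {
        char c = **s;
        unsigned d;
        if (c >= '0' && c <= '9')
            d = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            d = (unsigned)(c - 'A') + 10;
        else
            break;
        val = (val << 4) | d;
    }
    *out = val;
    return digits;
}

/* Parses the "start-end perms" prefix of one maps line.
 * Returns 1 on success and 0 on malformed input.
 */
static int
parse_maps_line(const char *line, maps_entry_t *entry)
{
    const char *s = line;
    int i;
    if (parse_hex(&s, &entry->start) == 0 || *s++ != '-')
        return 0;
    if (parse_hex(&s, &entry->end) == 0 || *s++ != ' ')
        return 0;
    for (i = 0; i < 4; i++) {
        if (s[i] == '\0')
            return 0;
        entry->perms[i] = s[i];
    }
    entry->perms[4] = '\0';
    return 1;
}
```

Feeding it the line `b7c35000-b7c36000 rw-p b7c35000 00:00 0` from the email below yields start 0xb7c35000, end 0xb7c36000, and perms "rw-p".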

- Derek

On Thu, Apr 02, 2009 at 12:38:20AM -0400, Qin Zhao wrote:

In the test case code, there is this instruction sequence:
addr16 mov %gs:(%di),%esp
ret

%gs: 0x33 (51)
%edi: 0x8040008

After the mov instruction, the esp value is updated to 0xb7c356b0,
and the target pc stored at (esp) is also 0xb7c356b0.
Area b7c35000-b7c36000 is always the region right before libc:

b7c35000-b7c36000 rw-p b7c35000 00:00 0
b7c36000-b7d8e000 r-xp 00000000 08:01 128430 /lib/tls/i686/cmov/libc-2.8.90.so
b7d8e000-b7d90000 r--p 00158000 08:01 128430 /lib/tls/i686/cmov/libc-2.8.90.so
b7d90000-b7d91000 rw-p 0015a000 08:01 128430 /lib/tls/i686/cmov/libc-2.8.90.so

Then when the ret executes (actually it is the pop ecx, followed by a search for the target code),
a code fragment starting at 0xb7c356b0 is constructed.
In check_thread_vm_area, region b7c35000-b7c36000 is added to the
executable areas and its write permission is removed:
b7c35000-b7c36000 r-xp b7c35000 00:00 0
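
The whole region loses write permission because protection works at page granularity. A minimal sketch of the arithmetic, assuming 4 KiB pages (which matches the 0x1000-byte region above); the helper names are hypothetical and not DR's:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size: 4 KiB, matching the 0x1000-byte region in the maps output. */
#define PAGE_SIZE_ASSUMED 0x1000u

/* Rounds a pc down to the start of its page. */
static uint32_t
page_start(uint32_t pc)
{
    return pc & ~(PAGE_SIZE_ASSUMED - 1);
}

/* Returns the first address past the page containing pc. */
static uint32_t
page_end(uint32_t pc)
{
    return page_start(pc) + PAGE_SIZE_ASSUMED;
}

/* Half-open containment test: start <= addr < end. */
static int
in_region(uint32_t addr, uint32_t start, uint32_t end)
{
    return addr >= start && addr < end;
}
```

With pc = 0xb7c356b0 this gives 0xb7c35000-0xb7c36000, so any data that libc keeps elsewhere on that page becomes read-only along with the "code".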

Then later, in get_memory_info -> get_memory_info_from_os ->
maps_iterator_next -> sscanf -> ...
there is this code:
0xb7c85844 <_IO_vfscanf+1700>: mov %gs:(%esi),%ecx
0xb7c85847 <_IO_vfscanf+1703>: movl $0x0,%gs:(%esi)

%gs: 0x33 (51)
%esi: 0xffffffdc (-36)

The write instruction then causes a SIGSEGV.
Later, in the signal handler, every attempt to use maps_iterator_next ->
sscanf causes another SIGSEGV, so DR hangs.

I suspect %gs:(%esi) points into region b7c35000-b7c36000, so
the read is ok but the write causes a SIGSEGV:
maps_iterator_next never caused a SIGSEGV before the permission change,
and it caused one right after the permission changed.

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=105

@derekbruening

From derek.br...@gmail.com on April 04, 2009 09:46:02

I'm going to disable the common.decode* tests for linux for now to get the test suite
running faster. Please re-enable as part of the fix.

@derekbruening

Re-evaluating as part of #1025:

Mangling of far memref w/ 16-bit base or index reg fails: need 2nd scratch reg

CLOSED: [2021-01-21 Thu 16:14]
Seeing a different error:

interp: start_pc = 0xf7b09000
check_thread_vm_area: pc = 0xf7b09000
stack vs 0xf7b09000: official 0xffe9e000..0xffebf000, esp 0xffebd3e4
make_unwritable: pc 0xf7b09000 -> 0xf7b09000-0xf7b0a000
new shared vm area: 0xf7b09000-0xf7b0a000 W--- unexpected vm area
checking thread vmareas against executable_areas
prepend_entry_to_fraglist: putting fragment @0xf7b09000 (shared) on vmarea 0xf7b09000-0xf7b0a000
check_thread_vm_area: check_stop = 0xf7b0a000
  0xf7b09000  65 67 8b 00          addr16 mov    %gs:(%bx,%si)[4byte] -> %eax
  0xf7b09004  c3                   ret    %esp (%esp)[4byte] -> %esp
mbr exit target = 0x46256480
end_pc = 0xf7b09005

exit_branch_type=0x6 bb->exit_target=0x46256480
bb ilist before mangling:
TAG  0xf7b09000
 +0    L3 @0x462b0d10  65 67 8b 00          addr16 mov    %gs:(%bx,%si)[4byte] -> %eax
 +4    L2              c3                   ret    %esp (%esp)[4byte] -> %esp
 +5    L4 @0x462b1050  e9 9f 88 fa ff       jmp    $0x46256480 <shared_bb_ibl_ret>
END 0xf7b09000

reference with fs/gs segment: addr16 mov    %gs:(%bx,%si)[4byte] -> %eax
re-wrote app tls reference: addr16 mov    (%eax,%si)[4byte] -> %eax
bb ilist after mangling:
TAG  0xf7b09000
 +0    m4 @0x462b0e2c  64 a1 48 00 00 00    mov    %fs:0x48[4byte] -> %eax
ERROR: Could not find encoding for: lea    (%bx,%eax) -> %eax
SYSLOG_ERROR: Application /home/bruening/dr/git/build_x86_dbg_tests/suite/tests/bin/common.decode (4030634) DynamoRIO usage error : instr_encode error: no encoding found (see log)

It has to spill a 2nd scratch b/c you can't combine 16-bit and 32-bit.
With that in place and massaging the prefix flags to make it print nicely
we have:

reference with fs/gs segment: addr16 mov    %gs:(%bx,%si)[4byte] -> %eax
re-wrote app tls reference: mov    (%eax,%ecx)[4byte] -> %eax
bb ilist after mangling:
TAG  0xf7b63000
 +0    m4 @0x43901e2c  64 a1 48 00 00 00    mov    %fs:0x48[4byte] -> %eax
 +6    m4 @0x43901e70  64 89 0d 08 00 00 00 mov    %ecx -> %fs:0x08[4byte]
 +13   m4 @0x43902340  67 8d 08             addr16 lea    (%bx,%si) -> %ecx
 +16   L4 @0x43901d10  8b 04 08             mov    (%eax,%ecx)[4byte] -> %eax
 +19   m4 @0x43901a70  64 8b 0d 08 00 00 00 mov    %fs:0x08[4byte] -> %ecx
 +26   m4 @0x43902128  64 89 0d 08 00 00 00 mov    %ecx -> %fs:0x08[4byte]
 +33   m4 @0x43901b68  59                   pop    %esp (%esp)[4byte] -> %ecx %esp
 +34   L4 @0x43902050  e9 9f 88 fa ff       jmp    $0x438a7480 <shared_bb_ibl_ret>
END 0xf7b63000
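
Why two steps are needed: with an addr16 prefix the offset (%bx,%si) is computed modulo 2^16, and only then added to the 32-bit segment base. A rough model of what the mangled sequence above computes, with hypothetical function names (this is an illustration of the arithmetic, not DR's mangling code):

```c
#include <assert.h>
#include <stdint.h>

/* Step 1: 16-bit effective address, wrapping at 64 KiB, as computed by
 * `addr16 lea (%bx,%si) -> %ecx` (the result is zero-extended to 32 bits).
 */
static uint32_t
addr16_offset(uint16_t bx, uint16_t si)
{
    return (uint16_t)(bx + si);
}

/* Step 2: add the offset to the segment base with 32-bit addressing, as in
 * `mov (%eax,%ecx) -> %eax` where %eax holds the base and %ecx the offset.
 */
static uint32_t
far_linear_address(uint32_t seg_base, uint16_t bx, uint16_t si)
{
    return seg_base + addr16_offset(bx, si);
}
```

For example, bx = 0xffff and si = 0x0002 give an offset of 0x0001, not 0x10001, which is why the 16-bit sum cannot simply be folded into a single 32-bit base+index operand.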

Mangling of far memref storing into xsp fails: can't pick xsp as index reg for part of mangling

CLOSED: [2021-01-21 Thu 16:15]
We hit another bug after that:

bb ilist before mangling:
TAG  0xf7b4610a
 +0    L3 @0x43a65ee0  65 67 8b 26 03 00    addr16 mov    %gs:0x03[4byte] -> %esp
 +6    L2              c3                   ret    %esp (%esp)[4byte] -> %esp
 +7    L4 @0x43a65e28  e9 9f 88 fa ff       jmp    $0x43964480 <shared_bb_ibl_ret>
END 0xf7b4610a

reference with fs/gs segment: addr16 mov    %gs:0x03[4byte] -> %esp
re-wrote app tls reference: addr16 mov    0x03(,%esp)[4byte] -> %esp
bb ilist after mangling:
TAG  0xf7b4610a
 +0    m4 @0x43a65dcc  64 8b 25 48 00 00 00 mov    %fs:0x48[4byte] -> %esp
SYSLOG_ERROR: Application /home/bruening/dr/git/build_x86_dbg_tests/suite/tests/bin/common.decode (4133677) DynamoRIO usage error : encode error: xsp cannot be an index register

That one we can fix by ruling out the dead reg if it's xsp.
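
A sketch of that selection rule, under the assumption that the mangler prefers the dead destination register as the scratch that holds the segment base and otherwise spills another register. The enum, struct, and function names are invented for illustration and do not correspond to DR's reg_id_t or its mangling routines.

```c
#include <assert.h>

/* Hypothetical register ids for illustration; not DynamoRIO's reg_id_t values. */
typedef enum {
    HYP_REG_NONE,
    HYP_REG_EAX,
    HYP_REG_ECX,
    HYP_REG_EDX,
    HYP_REG_EBX,
    HYP_REG_ESP
} hyp_reg_t;

typedef struct {
    hyp_reg_t reg;   /* register that will hold the segment base */
    int needs_spill; /* nonzero if the app value in reg must be saved/restored */
} scratch_choice_t;

/* Reuses the dead destination register when that is legal. Because xsp
 * cannot be encoded as an index register, it is ruled out and a spilled
 * fallback register is used instead.
 */
static scratch_choice_t
pick_segment_base_scratch(hyp_reg_t dead_dst, hyp_reg_t fallback)
{
    scratch_choice_t choice;
    assert(fallback != HYP_REG_NONE && fallback != HYP_REG_ESP);
    if (dead_dst != HYP_REG_NONE && dead_dst != HYP_REG_ESP) {
        choice.reg = dead_dst;
        choice.needs_spill = 0;
    } else {
        choice.reg = fallback;
        choice.needs_spill = 1;
    }
    return choice;
}
```

With a dead %eax (the earlier `-> %eax` case) no spill is needed; with a dead %esp (this case) the fallback is picked and must be spilled.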

Enable test_data16_mbr on Linux 32-bit => immed signedness issue

CLOSED: [2021-01-21 Thu 16:15]
After that, we then have this:

#    ifdef WINDOWS /* FIXME i#105: crashing on Linux so disabling for now */
    /* PR 242815: data16 mbr */
    print("Testing data16 mbr\n");
    test_data16_mbr();
#    endif

First problem on enabling is:

exit_branch_type=0xa bb->exit_target=0x491a0640
bb ilist before mangling:
TAG  0xf355c024
 +0    L1              b9 ef be ad de       mov    $0xdeadbeef -> %ecx
 +5    L3 @0x491fb390  66 ff d1             data16 call   %cx %esp -> %esp 0xfffffffe(%esp)[2byte]
 +8    L4 @0x491fa830  e9 5f 8a fa ff       jmp    $0x491a0640 <shared_bb_ibl_indcall>
END 0xf355c024

bb ilist after mangling:
TAG  0xf355c024
 +0    L1              b9 ef be ad de       mov    $0xdeadbeef -> %ecx
 +5    m4 @0x491fa6a0  64 89 0d 08 00 00 00 mov    %ecx -> %fs:0x08[4byte]
 +12   L4 @0x491fb390  0f b7 c9             data16 movzx  %cx -> %ecx
 +15   m4 @0x491fa6f0  8d 64 24 fe          lea    0xfffffffe(%esp) -> %esp
ERROR: Could not find encoding for: mov    $0xc02c -> 0x02(%esp)[2byte]

Yet it does exist:

$ disasm 66 c7 44 24 02 2c c0
Disassembling 0x66 0xc7 0x44 0x24 0x02 0x2c 0xc0
llvm-mc:   0x66 0xc7 0x44 0x24 0x02 0x2c 0xc0  movw $49196, 2(%rsp) # imm = 0xC02C
capstone:  66c74424022cc0 mov word ptr [rsp + 2], 0xc02c
bfd:       66 c7 44 24 02 2c c0 movw $0xc02c,0x2(%rsp)
DynamoRIO: 66 c7 44 24 02 2c c0 mov word ptr [rsp+0x02], 0xc02c

Huh, on re-run:

bb ilist after mangling:
TAG  0xf3572024
 +0    L1              b9 ef be ad de       mov    $0xdeadbeef -> %ecx
 +5    m4 @0x4524e6a0  64 89 0d 08 00 00 00 mov    %ecx -> %fs:0x08[4byte]
 +12   L4 @0x4524f390  0f b7 c9             data16 movzx  %cx -> %ecx
 +15   m4 @0x4524e6f0  8d 64 24 fe          lea    0xfffffffe(%esp) -> %esp
 +19   m4 @0x45251bb4  66 c7 44 24 02 2c 20 mov    $0x202c -> 0x02(%esp)[2byte]
 +26   L4 @0x4524e830  e9 5f 8a fa ff       jmp    $0x451f4640 <shared_bb_ibl_indcall>
END 0xf3572024

Non-det: must be the top bit of the 16-bit immediate? (0xc02c has it set, 0x202c does not.) Casting to short solves it.
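
To illustrate the suspected signedness issue: the immediate is the low 16 bits of the return address (TAG + 8 in both logs), which varies from run to run. The sketch below assumes the encoder accepts a 2-byte immediate only if the value lies in the signed 16-bit range; `fits_signed_imm16` is a hypothetical stand-in for that check, not DR's actual encoder logic.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoder rule (hypothetical): the value must fit a signed 16-bit field. */
static int
fits_signed_imm16(int64_t val)
{
    return val >= INT16_MIN && val <= INT16_MAX;
}

/* Low 16 bits of a return address reinterpreted as a signed 16-bit value,
 * which is what a cast to short does on two's complement targets.
 */
static int16_t
low16_signed(uint32_t retaddr)
{
    int32_t low = (int32_t)(retaddr & 0xffffu);
    return (int16_t)(low >= 0x8000 ? low - 0x10000 : low);
}
```

Under that assumption, 0xc02c passed as a positive integer (49196) is out of range, while the same bits as a short (-16340) are in range; 0x202c fits either way, which would explain why only some runs failed.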

Enable test_data16_mbr on Linux 64-bit

CLOSED: [2021-01-21 Thu 16:49]

Infinite loop on test_data16_mbr_8.
Somehow the longjmp comes back to the wrong place: it comes back right
after the data16 call to the crashing instr 00 00!
Yet w/o the call it works fine: so the call is messing up the setjmp buffer
or something?

The code has a my_setjmp wrapper: that seems problematic, as there's an
exposed retaddr now after the stack is unwound.
If I call __sigsetjmp directly, that fixes it.
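
For reference, the C standard leaves longjmp undefined once the function that called setjmp has returned, so a wrapper that calls setjmp and then returns is unsafe in general. A minimal sketch of the safe shape, with sigsetjmp invoked directly in the frame that stays live; `run_guarded`, `bail_out`, and `no_op` are hypothetical names, not the test's actual code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for sigsetjmp/siglongjmp */
#endif
#include <assert.h>
#include <setjmp.h>

static sigjmp_buf recover_env;

/* Hypothetical stand-in for code that faults and unwinds via siglongjmp. */
static void
bail_out(void)
{
    siglongjmp(recover_env, 1);
}

/* Hypothetical stand-in for code that completes normally. */
static void
no_op(void)
{
}

/* sigsetjmp is called here, in the frame that is still active when
 * siglongjmp runs, rather than inside a helper that has already returned.
 * Returns 1 if fn unwound via siglongjmp and 0 if it returned normally.
 */
static int
run_guarded(void (*fn)(void))
{
    if (sigsetjmp(recover_env, 1) == 0) {
        fn();
        return 0;
    }
    return 1;
}
```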

derekbruening added a commit that referenced this issue Jan 21, 2021
Fixes several issues as part of enabling the common.decode (and
common.decode-stress) tests for 32-bit Linux:

+ Fix addr16 far ref mangling bugs:
  - We need a 2nd scratch reg for 16-bit addressing registers, with two
    steps to add to the segment base.
  - Xsp cannot be an index reg.
+ Enable test_data16_mbr on 32-bit Linux
  - Fix signedness problem in data16-call mangling code.
  - Update test and template.
+ Enable test_data16_mbr on 64-bit Linux
  - Eliminate the my_setjmp() layer, which adds an exposed retaddr on
    the longjmp return path.

Fixes #105