|
CodeLifecycle
The lifecycle of various kinds of code, including compiled Python functions.
Creating code
All of this happens in a single thread, which lets us assume that the Module doesn't change out from under us during any particular phase. (This will eventually be a bottleneck to threaded compilation, but we'll deal with that when we get there.) The output IR refers to Module globals, including constants, CPython-controlled variables, and eventually inlinable Functions.
Changing code
Once an IR Function is translated from bytecode, its semantics never change. It may be optimized further, but there's currently no way to re-JIT a Function to machine code.
Destroying code
LLVM assumes certain things about when it can destroy or change Values, especially Functions, and other things about when users can destroy Values, and Unladen Swallow needs to be compatible with both sets of assumptions.
LLVM Functions are referenced by a PyLlvmFunctionObject, which is referenced by a PyCodeObject, which is referenced by one or more PyFunctionObjects. A referenced llvm::Function has external linkage to prevent module-level optimizations from deleting it or changing its signature. While a function is executing, its thread keeps a reference to the PyFunctionObject, which keeps the PyLlvmFunctionObject alive.
When the PyLlvmFunctionObject loses its last reference, we currently 1) set the llvm::Function's linkage to internal and 2) if the use-list is empty, free the JITted code, if any, and erase the Function.
We don't yet run Module-level optimization passes, but we want to. nlewycky has said that these won't delete external Functions even if they're unreferenced, but he wasn't 100% sure that they don't call ReplaceAllUsesWith (RAUW) on external Functions. The JIT assumes that any Function it has compiled will live until the machine code is freed. If we run global optimizations that delete even internal unused Functions, we could run into last week's crash again. So, until the JIT is guaranteed safe, we can't run any Module-wide optimizations that may delete or RAUW code.
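The teardown steps described above for a PyLlvmFunctionObject that loses its last reference can be sketched in plain C++. This is a minimal model under stated assumptions, not Unladen Swallow's implementation: IrFunction, ToyModule, LlvmFunctionObject, CodeObject and FunctionObject are hypothetical stand-ins for llvm::Function, llvm::Module, PyLlvmFunctionObject, PyCodeObject and PyFunctionObject, and std::shared_ptr stands in for CPython reference counting.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// All types here are hypothetical stand-ins, not LLVM or CPython APIs.
enum class Linkage { External, Internal };

struct IrFunction {                 // stands in for llvm::Function
    std::string name;
    Linkage linkage = Linkage::External;
    int use_count = 0;              // stands in for the size of the use-list
    bool has_machine_code = false;  // stands in for JITted code being present
};

struct ToyModule {                  // stands in for llvm::Module
    std::vector<std::unique_ptr<IrFunction>> functions;

    IrFunction* add(const std::string& name) {
        functions.push_back(std::make_unique<IrFunction>());
        functions.back()->name = name;
        return functions.back().get();
    }
    void erase(IrFunction* f) {
        functions.erase(
            std::remove_if(functions.begin(), functions.end(),
                           [f](const std::unique_ptr<IrFunction>& p) {
                               return p.get() == f;
                           }),
            functions.end());
    }
    bool contains(const std::string& name) const {
        for (const auto& p : functions)
            if (p->name == name) return true;
        return false;
    }
};

// Stands in for PyLlvmFunctionObject: its destructor performs the two steps.
class LlvmFunctionObject {
public:
    LlvmFunctionObject(ToyModule& m, IrFunction* f) : module_(m), function_(f) {}
    LlvmFunctionObject(const LlvmFunctionObject&) = delete;
    ~LlvmFunctionObject() {
        function_->linkage = Linkage::Internal;   // step 1: drop external linkage
        if (function_->use_count == 0) {          // step 2: only if nothing uses it
            function_->has_machine_code = false;  // free JITted code, if any
            module_.erase(function_);             // erase the Function
        }
    }
private:
    ToyModule& module_;
    IrFunction* function_;
};

struct CodeObject { std::shared_ptr<LlvmFunctionObject> llvm_function; };
struct FunctionObject { std::shared_ptr<CodeObject> code; };

// The Function survives until the last FunctionObject lets go of the chain.
bool erased_after_last_release() {
    ToyModule module;
    IrFunction* f = module.add("f");
    f->has_machine_code = true;
    auto code = std::make_shared<CodeObject>(
        CodeObject{std::make_shared<LlvmFunctionObject>(module, f)});
    auto fn1 = std::make_shared<FunctionObject>(FunctionObject{code});
    auto fn2 = std::make_shared<FunctionObject>(FunctionObject{code});
    code.reset();
    fn1.reset();
    assert(module.contains("f"));   // fn2 still keeps the whole chain alive
    fn2.reset();                    // last reference gone: teardown runs
    return !module.contains("f");
}

// A Function with remaining uses is made internal but is not erased.
bool kept_when_still_used() {
    ToyModule module;
    IrFunction* f = module.add("g");
    f->use_count = 1;
    { LlvmFunctionObject wrapper(module, f); }
    return module.contains("g") && f->linkage == Linkage::Internal;
}
```

In this model, a Function left internal with a non-empty use-list is exactly the kind of Value that a Module-wide pass would be free to delete later, which is the hazard the preceding paragraph describes.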
Once the JIT is guaranteed safe, we'll need to update any internal Value*s to use ValueHandles instead of direct pointers.
Machine code
We call ExecutionEngine::getPointerToFunction(function) to retrieve machine code for any LLVM IR we want to run. The ExecutionEngine keeps the function pointer around as a handle to the machine code until we call freeMachineCodeForFunction(function), so it's illegal to delete function until then. Functions can be deleted by ModulePasses in addition to user calls to eraseFromParent(), so for now we have to manually inspect each ModulePass we want to run to make sure it won't delete global functions. Marking all functions external would work, but would cause a memory leak.
If we reoptimize a Function, we have to be careful about other frames, either in the current thread or other threads, that may be inside that Function's machine code. There are two ways to emit new machine code for a reoptimized function.
When a function is recompiled, we have to forward calls to the old address to the new address. recompileAndRelinkFunction does that for us by overwriting the beginning of the old machine code with a jump to the new machine code. (This means it leaks that stub on every use.) If we copy functions instead, we have to make all calls through a pointer in memory, and we update that pointer atomically when we recompile its function. I think we just give up on propagating reoptimizations to inlined functions and instead rely on their containing function being reoptimized.
Clang-compiled code
We intend to compile some C code with clang to LLVM IR, and use the cpp backend to emit C++ classes that can load this IR into our main module. Then we'll emit calls to these functions from our generated IR. It's absolutely essential that no optimization change the meaning of these functions or delete them, since there's no way to get back the original. Nick has warned that ModulePasses may, for example, notice that a global is never written to in any code LLVM can see and so optimize loads, but he says this should not happen to external globals.
To prevent LLVM's optimizations from messing up clang-compiled code, we'll declare all of it external. We'd like to declare it available_externally, but that still allows LLVM to delete the definition if it's unused, which we don't want. Nick says we can add a use to @llvm.used, which should prevent changes. (Does this make the symbol effectively external?) |
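The pointer-indirection alternative described under Machine code, where every call goes through a pointer in memory that is updated atomically on recompilation, might look roughly like the sketch below. It is an illustration only: CallSlot, slow_square and fast_square are hypothetical names, the two plain C++ functions stand in for the old and new machine code, and nothing here uses the LLVM JIT.

```cpp
#include <atomic>
#include <cassert>

// Signature shared by every "compiled" version of the function (an assumption
// made for this sketch).
using CompiledFn = int (*)(int);

// Stand-in for the original machine code: squares a non-negative x by
// repeated addition.
int slow_square(int x) {
    int result = 0;
    for (int i = 0; i < x; ++i) result += x;
    return result;
}

// Stand-in for the reoptimized machine code with the same semantics.
int fast_square(int x) { return x * x; }

// One slot per function. Callers never embed a code address directly; they
// load it from the slot on every call.
class CallSlot {
public:
    explicit CallSlot(CompiledFn initial) : target_(initial) {}

    int call(int arg) const {
        return target_.load(std::memory_order_acquire)(arg);
    }

    // Publishes the new code and returns the old address. The old code may
    // still be executing in other frames, so the caller must not free it yet.
    CompiledFn swap(CompiledFn replacement) {
        return target_.exchange(replacement, std::memory_order_acq_rel);
    }

private:
    std::atomic<CompiledFn> target_;
};

// Calls through the slot before and after a swap and checks that the
// observable behavior did not change.
int call_before_and_after_swap() {
    CallSlot slot(slow_square);
    int before = slot.call(7);
    CompiledFn old_code = slot.swap(fast_square);
    int after = slot.call(7);
    assert(old_code == slow_square);
    return before == after ? after : -1;
}
```

The sketch leaves open the hard part noted in the text: deciding when no frame in any thread is still inside the old code, so that it can be freed safely.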