Comments on article blog-generating-machine-code-at-runtime at Zathras.de
http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm
Copyright 2004 by M. Uli Kusterer (witness_dot_of_dot_teachtext_at_gmx_dot_net)

Comment 14 by Randy Hollines (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment14)
Executing code for amd64/x64 architectures (thanks to Aaron Kaluszka at Cal Berkeley):

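// execute buffer for amd64/x64 (System V ABI: argument in %rdi, result in %rax)

#include <iostream>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>

using namespace std;

typedef long (*jit_fun_ptr)(long v);

int main() {
    // unsigned char: initializers such as 0xc7 do not fit in a (signed) char
    unsigned char buffer[] = {
        0x48, 0x81, 0xc7, 0x0a, 0x00, 0x00, 0x00, // addq $10, %rdi  (param + const)
        0x48, 0x89, 0xf8,                         // movq %rdi, %rax (set return register)
        0xc3                                      // ret
    };
    size_t total_size = sizeof(buffer);

    // fresh page-aligned, writable mapping (mprotect on valloc'd heap memory
    // would also change the protection of anything else sharing that page)
    void *mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }
    memcpy(mem, buffer, total_size);

    // switch the page to read+execute; calling non-executable memory faults under NX/DEP
    if (mprotect(mem, total_size, PROT_READ | PROT_EXEC) != 0) { perror("mprotect"); return 1; }

    jit_fun_ptr jit_fun = (jit_fun_ptr)mem;
    cout << jit_fun(12L) << endl; // prints 22

    munmap(mem, total_size);
    return 0;
}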
Comment 13 by MegaByte (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment13)
MegaByte writes:
So it turns out that my problems were indeed due to DEP. I found two solutions: either turn off NX protection for the entire executable with the "-z execstack" linker option, or use mprotect in conjunction with valloc to make just the generated code array executable from within the program itself.
Comment 12 by MegaByte (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment12)
MegaByte writes:
I've tried to compile and run the examples given but whenever I hit the function pointer call, the program segfaults. I'm attempting this on gcc 4.3.2 (Fedora 10 on a Pentium M). Does anybody know what might be wrong? Perhaps more robust data execution prevention? Are there any compiler flags that I should be aware of?
Comment 11 by Objeck (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment11)
Objeck writes:
Hey folks, I'm in the process of implementing a JIT compiler for a Java/C# type language. If you're interested in how it's being implemented check out my Wiki page (http://code.google.com/p/objeck/wiki/ObjeckJit).

Cheers!
Comment 10 by Randy Hollines (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment10)
Randy Hollines writes:
Here's another example. I downloaded the IA-32 programmer's guides to look up the opcode and operand values. The guides have some useful tables, e.g. Table B-13 (opcode values) and Table 2-2 (operand values). Cheers!

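// IA-32 only: build as 32-bit code (e.g. gcc -m32); cdecl passes the argument on the stack and returns in %eax
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(int argc, char** argv) {
    typedef int (*fun_ptr)(int*);

    // array of 32-bit values
    int values[] = {14, 16};
    // inline machine code (unsigned char, since bytes such as 0xff overflow a signed char)
    unsigned char fun_bytes[] = {
        /* set up stack frame */
        0xff, 0xf5,        // pushl %ebp
        0x89, 0xe5,        // movl %esp, %ebp
        /* add/sub values */
        0x8b, 0x4d, 0x08,  // movl 8(%ebp), %ecx - p[] -> c
        0x8b, 0x41, 0x00,  // movl 0(%ecx), %eax - p[0] -> a
     // 0x03, 0x41, 0x04,  // addl 4(%ecx), %eax - a + p[1] -> a (add)
        0x2b, 0x41, 0x04,  // subl 4(%ecx), %eax - a - p[1] -> a (sub)
        /* tear down stack frame and return */
        0x8f, 0xc5,        // popl %ebp
        0xc3               // ret
    };
    const size_t size = sizeof(fun_bytes);
    // copy the code into its own writable page: malloc'd memory is not executable under NX/DEP
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }
    memcpy(mem, fun_bytes, size);
    // make the page executable before jumping into it
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) { perror("mprotect"); return 1; }
    fun_ptr rt_fun = (fun_ptr)mem;
    // execute function
    int result = (*rt_fun)(values);
    printf("values at: %p\n", (void*)values);
    printf("stream length: %zu, value: %d\n", size, result);  // value: -2
    munmap(mem, size);
    return 0;
}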
Comment 9 by David Chisnall (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment9)
I forgot to mention: if you're interested in run-time code generation from a practical standpoint, rather than as an academic exercise, you might want to look at GNU Lightning:

http://www.gnu.org/software/lightning/manual/lightning.html

It's quite easy to use, and lets you generate native (although not very well optimised) code at runtime for a number of architectures with a single code path. If you're on OS X, you could use it to create code for PowerPC and x86 without needing different code for each.
Comment 8 by David Chisnall (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment8)
The ISA reference bundled with Shark is also available as a service. It's really great if you have to wade through a lot of assembly, since you can just highlight the instruction, select the architecture from the Services menu, and jump directly to the definition.

Uli, as I recall, returning values in EAX is the MS-DOS calling convention. Most UNIXes pass and return values via the stack. Mach-O uses some hybrid with some very complicated rules about when things go on the stack and when they go in registers. If you're writing a JIT, then for the most part you can make up your own calling convention, but your entry or exit points might need to conform to the platform's calling convention. If you want to do this portably, you can write a little inline assembly shim that will jump into it. For your example, it would be something like this:

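typedef int (*FuncPtr)(void);  // assumed signature of the generated function

// 32-bit x86, GCC/Clang syntax: call the generated code and take its result from %eax
static inline int asm_func_shim(FuncPtr asmFunc)
{
    int ret;
    __asm__ volatile ("call *%1"        // indirect call through the register holding asmFunc
                      : "=a" (ret)      // output: result in %eax
                      : "r" (asmFunc)   // input: any general-purpose register
                      : "ecx", "edx", "memory", "cc");  // caller-saved state the callee may change
    return ret;
}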

The syntax might be slightly wrong here (I haven't tested it), but doing this would isolate you from any concerns about the target ABI. If you ran this on OS X, then the compiler would optimise the shim away completely. If you ran it somewhere where values were returned on the stack, then the shim would get the return value from the register. The one thing to watch out for is that different ABIs have different rules about which registers can be clobbered by functions. You might want to add the ones you use to the clobbered list in the inline assembly fragment, just to be sure, since that will make it the compiler's problem.
Comment 7 by Uli Kusterer (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment7)
Uli Kusterer writes:
For anyone who wants to do more than a just-in-time compiler, here's a link to the docs for the Mach-O file format:

http://developer.apple.com/documentation/DeveloperTools/Conceptual/MachOTopics/index.html

That's what one needs to generate to create a Mac executable.
Comment 6 by Uli Kusterer (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment6)
Uli Kusterer writes:
@Blake: Well, ugly is in the eye of the beholder. It's already machine code, after all. Sure, the instructions won't nicely line up in a hex editor and if you want to patch code at runtime like Wolf Rentzsch's mach_star stuff does, you're in for a shock, but in the end, it's all just bytes that need to be output.

I would call Intel assembly average, and PPC assembly sounds positively gorgeous to me :-)
Comment 5 by Blake C. (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment5)
Blake C. writes:
Tyler: there are no fun bit fields within the primary opcode byte, unlike PPC. They're almost arbitrary. There are some nice bit fields in the ModR/M and SIB bytes, when they exist. The highest 2 bits of the ModR/M byte specify one of the 4 addressing modes, 3 of which can include a SIB byte (when the r/m field is 100, the value that would otherwise select ESP). The Intel manual does a good job of explaining everything, but at the end of the day, x86 machine code is even uglier than x86 assembly.
Comment 4 by Peter Hosey (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment4)
Peter Hosey writes:
Even better, the IA-32 ISA reference that comes with OS X lists the opcode for every instruction. You can get there from Shark, and the underlying PDF files are at /Library/Application Support/Shark/Helpers/XYZ Help.app/Contents/Resources/XYZISA.pdf (for XYZ = {PowerPC,IA32,EM64T}).
Comment 3 by Tyler Vano (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment3)
Tyler Vano writes:
Check this out:

http://groups.google.com/group/alt.2600/attach/ec1d317a2d3e778b/x86_intro?part=2&hl=en
Comment 2 by Uli Kusterer (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment2)
Uli Kusterer writes:
@Tyler: I would love to know how the opcodes are created myself. Intel's description on this is kinda foggy, involving three different bytes that seem to get combined sometimes and sometimes not... I'm still investigating that part.
Comment 1 by Tyler Vano (http://www.zathras.de/angelweb/blog-generating-machine-code-at-runtime.htm#comment1)
Tyler Vano writes:
Fascinating! Please continue this series, as I'm extremely interested in how this all works, as, I'm sure, are many other people. I'm also interested to hear an analysis of the binary format of the opcodes themselves. Knowing that 0x55 means pushl %ebp isn't nearly as interesting as knowing *why* 0x55 means pushl %ebp. ;)