
Intel assembler on Mac OS X
I've always wanted to learn another assembler, and with one of my colleagues being a real assembler guru, and the Intel reference books on my bookshelf, and the Intel switch just behind us, I thought this would be a good opportunity to finally get going with x86 assembler.
Now, assembler programming under Mac OS X isn't quite as well documented as one would wish. There's no tutorial that I could find (lots of tutorials for Linux and Windows, but none for Mac OS X yet). This won't be one either. Rather, this is a blog posting in which I share what I found out about assembler on OS X, and it is probably only useful to someone who already knows some assembler, but just doesn't know Intel on Mac OS X. My main approach is to compile C source code into assembler source files using GCC. Then I can look at that code and find out which assembler instructions correspond to which C statement. If all of this turns out to be correct and I should happen to have loads of time on my hands, I may still go out there and turn this into a decent tutorial.
The basics are pretty simple
.text # start of code indicator.
.globl _main # make the main function visible to the outside.
_main: # actually label this spot as the start of our main function.
pushl %ebp # save the base pointer to the stack.
movl %esp, %ebp # put the previous stack pointer into the base pointer.
subl $8, %esp # Balance the stack onto a 16-byte boundary.
movl $0, %eax # Stuff 0 into EAX, which is where result values go.
leave # leave cleans up base and stack pointers again.
ret # returns to whoever called us.
Now, the underscore in front of "main" is a convention in C, so just accept it (strictly speaking it belongs to the platform's ABI: the C compiler prepends an underscore to every C symbol name). When you enter the _main function, the return address (i.e. the address of the instruction where the program will continue after the function has finished, aka "back pointer") has already been pushed on the stack, taking up 4 bytes. We also save the caller's base pointer (the point where our caller can find its parameters on the stack) to the stack, and then set the base pointer to the current stack pointer (which is where we will find our own parameters). That takes another 4 bytes, so we have used 8 bytes now. Since the stack should be aligned on 16 bytes before you can make a call to another function, we subtract another 8 from the stack pointer, which pads out the stack (we could also just do two "pushl $0" for the same effect). If we used any local variables, we would use this opportunity to subtract more for them.
Now comes the actual body of our function. What we do is simply return 0. This is done by stuffing 0 in the eax register.
Finally, we have the tail end of our function, which executes leave (which cleans up by restoring our caller's base pointer and stack pointer) and then ret, which pops the return address off the stack and continues execution there.
Calling a local function
Calling a function is fairly simple, as long as it's a local one right in the same file as ours. In that case, you first define that function:
.text
.globl _doSomething # Our doSomething function.
_doSomething:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
nop # does nothing.
leave
ret
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp # 8 to align, 16 for our 4-byte parameter and padding.
movl $3, (%esp) # write our parameter at the end of the stack (i.e. padding goes first).
call _doSomething # call doSomething.
movl $0, %eax
leave
ret
"nop" is a do-nothing instruction I just inserted here to show where doSomething's code would go. That's pretty easy. You just write the function, put the parameters on the stack and use call to jump to the function, and that will take care of pushing the return address and all that. The only tricky thing is passing the parameters. You have to pad first, and then push (or mov, in our case) the parameters in reverse order, so that #1 ends up at the lowest address, right at (%esp), #2 four bytes above it, etc. That's because otherwise the function being called would have to skip the padding. Well, could be worse.
Accessing parameters
To access any parameters, you address relative to the base pointer. The value right at the base pointer is your caller's saved base pointer, and the return address sits just above it, so you need to skip 4 + 4 = 8 bytes to reach the first parameter. Yes, since the stack starts at the end of memory and grows towards the beginning, and you subtract from the stack pointer to make it larger, you need to add to the stack pointer to find something on the stack. The same applies to our base pointer, of course:
movl 12(%ebp), %eax # get parameter 2 at offset 4 + 4 + 4
addl 8(%ebp), %eax # get parameter 1 at offset 4 + 4
This would load your second parameter into eax and then add the first parameter to it, leaving the result in eax, where it's ready for use as a return value. Note the ##(foo) syntax, which adds the number ## to the pointer in register foo and accesses the memory at that address. This is register-relative addressing.
An added benefit of this is that you can actually pass more parameters to a function than it knows to handle, and it will just ignore the rest.
Fetching data
To access data (e.g. strings), it gets trickier. You declare data like the following:
.cstring
myHelloWorld:
.ascii "Hello World!\0"
.text
.globl _main
_main:
. . .
So, you add a .cstring section at the top of the file, and in that you declare a label and use the .ascii directive to actually stash your string there. So far, so good, there's only one problem:
All data manipulation is done using absolute addresses. But we don't know at what position in memory our program will be loaded. Labels aren't absolute addresses, they get compiled into relative offsets from the start of our code. So, how do we find out at which absolute address our string myHelloWorld is? Well, the trick Mach-O uses is that it knows that our program will be loaded as one huge chunk. So, we know that the distance between any instruction in our code and our string will always stay the same, no matter where the program ends up in memory.
So, if we could only get the address of one instruction in our code that has a label, we could calculate the absolute address of our string from that. Now, look above, at our function call code. Notice anything? Our return address is an absolute pointer to the next instruction after a function call. So, all we need to do to get our address is call a function. When GCC compiles C source code, it calls this helper function ___i686.get_pc_thunk.bx, which is quite a mouthful. Let's just call it _nextInstructionAddress:
. . .
call _nextInstructionAddress
myAnchorPoint:
. . .
That's what we call somewhere at the start of our code to find our own address. Note how I cleverly already added a label myAnchorPoint, which labels the instruction whose address we'll get. Then, somewhere (e.g. at the bottom), we define that function:
. . .
_nextInstructionAddress:
movl (%esp), %ebx
ret
We don't even bother aligning the stack or changing and restoring the base pointer. This simply peeks at the last item on the stack (the return address) and stashes that in register ebx. Then it returns (and obviously doesn't call leave because we pushed no base pointer that it could restore).
Once we have this address in ebx, we can do the following to get our string's address into a register, and from there onto the stack:
. . .
leal myHelloWorld-myAnchorPoint(%ebx), %eax
movl %eax, (%esp)
. . .
LEA means "Load Effective Address", i.e. take an address and stash it into a register. myHelloWorld-myAnchorPoint calculates the difference between our two labels, and thus tells us how far myHelloWorld is from myAnchorPoint. Since myHelloWorld is probably at the start of the program, e.g. at address 3 maybe, and myAnchorPoint further down, say at address 20, what we get is a negative value, e.g. -17. And xxx(%ebx) is how you tell the assembler that you want to add an offset to a register to get a memory address. ebx contains the address of myAnchorPoint, so what this does is subtract 17 from myAnchorPoint's absolute address, giving us the absolute address of myHelloWorld! Whooo! And this mess is called "position-independent code".
Now, our leal instruction computes a "Long" (which is 32 bits, i.e. the size of a pointer on a 32-bit CPU) and stashes it into register eax. And the movl instruction moves that long from our register into the last item on the stack, ready for use as a parameter to a function.
Calling a system function
Now, it'd be really nice if we could printf() or something, right? Well, trouble is, we don't know the address of printf(). But this time it's actually easy. We add a new section at the bottom of our code:
. . .
.section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
_printf_stub:
.indirect_symbol _printf
hlt ; hlt ; hlt ; hlt ; hlt
_getchar_stub:
.indirect_symbol _getchar
hlt ; hlt ; hlt ; hlt ; hlt
This is a new section named __IMPORT,__jump_table. It has the type symbol_stubs and the attributes self_modifying_code and pure_instructions. 5 is the size of each stub in bytes, and is intentionally the same as the number of hlt instructions below.
This section is special, because when our code is loaded, the dynamic linker will look at it. It will see that there is an .indirect_symbol directive for a symbol named "_printf", and will look up that function. Then it will replace the five hlt instructions, each of which is one byte in size, with a single five-byte instruction that jumps to that address (hence the self_modifying_code). We also added a label for each indirect symbol, which we name the same as the symbol, just with "_stub" appended.
So, to call printf, all you have to do now is push the string on the stack and then
call _printf_stub
Which will jump to _printf_stub and immediately continue to printf itself. And just to show you that you can have several such imported symbols, I've also included a stub for getchar. Now note that the system usually doesn't name these symbols "_foo_stub", but rather "L_foo$stub" (yes, a label name can contain dollar signs. You can even put the label in quotes and have spaces in it...). Same difference.
Okay, so that's how much I've guessed my way through it so far. Comments? Corrections?
PS - Thanks to John Kohr, Alexandre Colucci, Jonas Maebe, Eric Albert and Jordan Krushen, all of whom helped me figure this out one way or the other. Thanks, guys!
Update: Added mention of how to actually access parameters.
Jan Patera writes: Hi,
Do you know if there is a way to make the Xcode disassembly feature use the MacroAssembler syntax commonly used on Wintel?
E.g. "mov ebp,esp" instead of "movl %esp, %ebp"
-- Jan
|
Uli Kusterer replies: ★ @Wolf: Thanks, Jon! :-) Coming from you, that means a lot to me.
@Jan: Jordan pointed me at http://www.niksula.hut.fi/~mtiihone/intel2gas/, which is a converter to convert Intel syntax (what a lot of tutorials on the web use) to AT&T syntax (what GCC uses and hence I use in this tutorial). Its features also mention it has preliminary support for the reverse, AT&T to Intel. Maybe that can help you? Let me know whether it works for you.
|
bxd writes: @Jan: Xcode->Preferences->Debugging->Disassembly Style from "AT&T" to "Intel".
If you're in bare gdb for some reason, it's "set disassembly-flavor intel" vs "att".
|
Blake C. writes: "We also save the *back* pointer (the point where our caller can find its parameters on the stack) to the stack, and set it to the current stack pointer..."
"Yes, since the stack starts at the end of memory and grows towards the beginning, and you subtract from the stack pointer to make it larger, you need to add to the *stack* pointer to find something in it."
Should *back* and *stack* above instead say 'base'?
Thanks for the writeup :)
|
Uli Kusterer replies: ★ Blake, you're right, that should have been "base" pointer in the first case. The second case is meant to be more general, but I've clarified it a bit since, after all, the example code actually messes with the base pointer. Thanks for catching those!
|
Marc Haisenko writes: Good writeup ! Very nice ! This is exactly how it works on Linux as well. The "Calling a system function" is new to me and very interesting, have to try that on Linux as well.
There's only one thing left which I want to figure out: how to make a system call. On Linux, you put the arguments in %eax, %ebx, ... (%eax contains the number of the system call). Then you call interrupt 0x80. E.g. to print a string on stdout, you do:
movl $4, %eax # syscall: write
movl $1, %ebx # file descriptor 1 (stdout)
movl $string, %ecx # address of the string
movl $len, %edx # length of the string
int $0x80
On FreeBSD, it seems that instead of passing the arguments via registers they get passed via the user stack and call a different interrupt. I wonder how it's done in Mac OS X.
|
Uli Kusterer replies: ★ Haven't played with that yet. It may be the same as BSD on OS X, or it may be different (because BSD doesn't use Mach)... I'd be interested in hearing what you find out.
|
Pete H writes: To answer Marc's question, on Mac OS you use the same 0x80 interrupt. For example, to write a 42 byte string labeled 'output' to stdout, the code would look something like:
pushl $42 # length
lea output, %eax
pushl %eax # string address
pushl $1 # file descriptor number
mov $4,%eax # system call number
push %eax
int $0x80 # make the system call
You can see mach/i386/syscall_sw.h for the different software interrupts available, and sys/syscall.h for the different system call numbers.
|
David Liontooth writes: A bunch of us are trying to compile transcode (http://www.transcoding.org) on OSX, both PPC and MacIntel. We're finding that inline assembly is not sufficiently supported for the code to compile, so we were hoping you could give us a pointer (this is being discussed on the transcode and macports mailing lists). Transcode 1.0.3 compiles on PPC but not MacIntel; transcode 1.1.0 causes problems with the asm code on both. A couple of guys working on Pre Make Kit (pmk, see http://sourceforge.net/tracker/index.php?func=detail&aid=1569713&group_id=94395&atid=607694) really push this and end up frustrated, concluding "Projects that mix assembly and C code must all fail on MacOS X Intel machines." They also say mplayer avoids the problem by using C embedded assembly. Could you comment on this whole situation? What's the most likely successful way out?
Cheers,
Dave
|
Uli Kusterer replies: ★ @David: Honestly, I don't know. I've only just started to learn this stuff. I've heard in various places that assembler isn't even portable between different assemblers (i.e. between two manufacturers' "assembler compilers" for the same CPU), but there has to be a way. Whether there are problems with embedded or in-line assembly? No idea, I didn't even know the distinction existed.
|
Amade writes: I modified a simple hello world program as described here and it works pretty well. I even managed to add another .cstring and print it.
But there is some code not mentioned here without which it doesn't compile:
.section __TEXT,__textcoal_nt,coalesced,pure_instructions
.weak_definition _nextInstructionAddress
.private_extern _nextInstructionAddress
could you tell something about it?
|
adil writes: Excellent work ! keep it going and you will be the first one to write a tutorial on the topic :)
|
ema writes: So, basically every access to a global variable involves a function call to find out our current address.
And every call to a shared library function results in another indirect call.
This is disturbingly inefficient.
|
Uli Kusterer replies: ★ @ema: Not at all. You only have to get the anchor's address once. Then you can calculate any subsequent addresses using the value in EBX.
Not to mention that a CALL instruction isn't the same as a function call in a higher-level language. It does not do all the stack set-up and tear-down that a real function call in a high-level language would do, nor does it pass any parameters. So, as long as you don't clobber EBX (or save it somewhere and restore it later), this is completely efficient.
Would absolute addressing be faster? Sure, but it's only a few instructions, and the benefit of being able to dynamically load code at runtime without having to worry about address space collisions far outweighs the downside of those three additional instructions.
Stop thinking like a high-level programmer, already, this is machine language. ;-) A typical function call takes dozens of instructions.
|
pete writes: Hey Uli
Thanks for the guide - nice and simple!
Tschuess!
|
Tony writes: Hi,
I am using the gcc translated code as guidance as well. But I am having trouble making calls to Pthread library functions. I keep getting 'illegal instruction' at run time. And the worst part is that the debuggers (like gdb) cannot reveal at which line the program actually crashed.
The file attached below is the program I experiment with the C library function calls. The program works if I remove the call to '_pthread_mutex_init'.
The function declaration looks like this:
_pthread_mutex_init(pointer to mutex object, NULL)
I am very very confused by Mac OSX.
Can you help me find the error here?
Thanks.
-----------------------------------------------------------------------
.cstring
.align 2
termination_msg:
.ascii "Program has terminated.\0"
.text
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
#pushl %esi
pushl %ebx
call ___i686.get_pc_thunk.bx
"L12$pb":
# indirect addressing for Mac OSX
leal termination_msg-"L12$pb"(%ebx) , %eax
pushl %eax
call L_printf$stub
addl $4 , %esp
pushl $0 # NULL=$0
leal L_mtx$non_lazy_ptr-"L12$pb"(%ebx) , %eax
movl (%eax), %eax # addr(mtx) is in eax now
pushl %eax
call L_pthread_mutex_init$stub
addl $8 , %esp
#pushl $0
#call exit
popl %ebx
#popl %esi
movl %ebp, %esp
popl %ebp
ret
.comm _mtx , 44
.section __IMPORT,__pointers,non_lazy_symbol_pointers
L_mtx$non_lazy_ptr:
.indirect_symbol _mtx
.long 0
.section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
L_pthread_mutex_init$stub:
.indirect_symbol _pthread_mutex_init
hlt ; hlt ; hlt ; hlt ; hlt
L_printf$stub:
.indirect_symbol _printf
hlt ; hlt ; hlt ; hlt ; hlt
#--------------------------------------------------------------
.subsections_via_symbols
.section __TEXT,__textcoal_nt,coalesced,pure_instructions
.weak_definition ___i686.get_pc_thunk.bx
.private_extern ___i686.get_pc_thunk.bx
___i686.get_pc_thunk.bx:
movl (%esp), %ebx
ret
|
Ted Henry writes: > the underscore in front of "main" is a convention in C, so just accept it
Should "C" be "assembly"?
|
alex writes: So here's what I keep getting... does anyone have any ideas?
$ cat asmtest.s
.text
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl $0, %eax
leave
ret
$ as -o asmtest asmtest.s
$ chmod +x asmtest
$ ./asmtest
-bash: ./asmtest: Malformed Mach-o file
|
Cory B writes: Here's a nice Hello World program in OS X assembly language for the GNU assembler. Much of it is the same as FreeBSD assembly language. Notice I didn't calculate the absolute address of msg. The program works anyway.
Assemble and link like this:
$ as -o hello.o hello.S
$ ld -e _start -o hello hello.o
-
.data # data section, where we declare our constants
msg: .ascii "Hello, world!\n" # our message
len: .long . - msg # length of our message... the . means the
# current position pointer...
.text # text section, where we execute instructions
.globl _start # make _start visible to the linker
_syscall: # interrupt 80h, "calls" the kernel to perform a syscall
int $0x80
ret
_start: # our entry point
pushl len # push the length onto the stack
pushl $msg # push the (relative) address of msg onto the stack
pushl $1 # push 1 onto the stack (the file descriptor for stdout)
movl $4, %eax # load 4 into eax (the number for the write() syscall)
call _syscall # tell the kernel to write msg to stdout
addl $12, %esp # clean the three pushed arguments off the stack
pushl len # push the length onto the stack, to return it from exit()
mov $1, %eax # load 1 into eax (the number for the exit() syscall)
call _syscall # tell the kernel to exit our program
-- EOF --
Here's a version for nasm for comparison. It's a nearly one-to-one correspondence, so I didn't bother adding comments (assemble with `nasm -f macho hello.asm`, link the same as before):
section .data
msg db "Hello, world!",0xa
len equ $ - msg
section .text
global _start
_syscall:
int 0x80
ret
_start:
push dword len
push dword msg
push dword 1
mov eax, 4
call _syscall
add esp, 12
push dword len
mov eax, 1
call _syscall
|
David writes: Hi,
I just started playing around with assembler under OS X too and wrote my first little proggy. It basically does nothing but call system call 1, which ends the program. Had some difficulties getting it up and running until I realized I couldn't simply use the -arch x86_64 option for as and ld. So now I guess that I have a 32 bit program where I wanted a 64 bit program. Does anyone here know what is needed to create a 64 bit program?
|
hdx writes: Hi, I'm trying to run my first assembly program in my new iMac core 2 duo, but I'm getting the following error:
ld: could not find entry point "_start" (perhaps missing crt1.o) for inferred architecture i386
The code:
# My first Assembly program
.data
HelloWorldString:
.ascii "Hello World\n"
.text
.globl _start
_start:
# Load all the arguments for write ()
movl $4, %eax
movl $1, %ebx
movl $HelloWorldString, %ecx
movl $12, %edx
int $0x80
# Need to exit the program
movl $1, %eax
movl $0, %ebx
int $0x80
The command:
ld -e _start -o Hello Hello.o
Can anybody help?
|
me writes: @hdx
you have to link it first.
use:
as file.s -o file.o
ld file.o -o file
|
Mo writes: Linux will (usually) only use INT 0x80 as a fallback system-call mechanism. If I remember rightly, there's a page present in every process (I can't honestly recall whether it's managed by the libc or by the kernel-I think the kernel, though) which contains stubs for the system calls - on modern systems, this is SYSENTER instructions, whereas on old ones it'll be INT 0x80 (which is a lot slower than SYSENTER). This way, applications and libraries don't have to care what the system call calling convention (phew) is, which is important when it's changed a few times over the lifetime of the OS and backwards-compatibility needs to be preserved.
Somebody with more knowledge than I would have to chime in about whether Mac OS X/Darwin exhibits any kind of comparable behaviour :)
|
Mo writes: (In relation to my previous comment) - Apple deliberately doesn't publicise the system call mechanism, save for the Darwin sources, and similarly doesn't support static linking. I'm not sure if they've said as much, but presumably this is so that they can change the mechanism in future OS release (or even a minor update) without having to recompile anything except libSystem.
@Ted Henry:
It's actually an ABI convention. Mac OS X uses leading underscores, as do (most) DOS compilers and Win32. No idea if Win64 does, but I'd guess so. Quite a few embedded systems do too. I don't know what the rationale was - I've a feeling that the underscore was (at least once upon a time) used to denote global symbols.
|
al writes: @hdx
if you're having compile problems, the syntax is
gcc youasmfile.as -o executablename
I made the assumption to use "as" and "ld" but those turned up with bus errors and linking problems and such.
|
leffe writes: I tried to assemble a hello world example I found, which was written for FreeBSD, on my Intel Mac with nasm, knowing the similarities between OS X and BSD syscalls.
It worked, but only if i produced a mach-o obj file (obviously!), or else i would get the:
"ld: could not find entry point "_start" (perhaps missing crt1.o) for inferred architecture i386" error.
when the correct obj file is produced: "ld -e _start -o foo foo.o" should work