gbadev.org forum archive

I have just moved some of my ASM code to inline assembler, and I noticed that in my plot-pixel routine I had to double the screen resolution so that pixels are plotted across the full screen. When I didn't, the pixels were only plotted over half of the screen. Can anyone tell me why this is?

I have a function in my ASM file that requires parameters. How do I call that function and supply the necessary parameters from my C source?

thank you
_________________
Keep it real, keep it Free, Keep it GNU

Probably what happened in the first one was that your screen pointer was a u16*, so when you're writing in C, it steps 2 bytes at a time. In ASM all addresses are in bytes, so when you add your offset, you have to do the multiply by 2 yourself.
For the second, function arguments go into the first 4 regs (r0-r3), and any more than that go onto the stack. I think for Func(v1, v2, v3, v4, v5, v6, v7) it would be v1-v4 in r0-r3, v5 at sp, v6 at sp + 4, v7 at sp + 8. I'm not positive about that though.
You can check exactly where they are by putting an infinite loop at the start of your function, going to Tools->Disassemble in VBA, looking at r13's value, typing it into the little address box at the top and clicking the Go button. Then you can see what values are in memory where the stack is.
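To illustrate the u16* point, here is a minimal sketch in portable C. The buffer and function name are made up for the example; it just measures how many bytes C actually moves when you add an index to a u16 pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;

/* Byte distance between base and base + index, as C pointer arithmetic
   sees it. The buffer is a hypothetical stand-in for the screen. */
size_t byte_offset_u16(size_t index)
{
    static u16 buffer[8];
    u16 *base = buffer;

    assert(index < 8);  /* stay inside the stand-in buffer */
    return (size_t)((uint8_t *)(base + index) - (uint8_t *)base);
}
```

So an index of 3 on a u16 pointer moves 6 bytes. In assembly nothing does that scaling for you, so the offset has to be shifted left by 1 by hand.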
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

Read the ARM/Thumb Procedure Call Standard: http://www.arm.com/support/566FQ9/$File/ATPCS.pdf - it specifies how functions should be passed their arguments, how they should return values, and which registers are allowed to be changed, and which must be saved/restored. GCC generated code follows this specification.

Torne
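As a rough sketch of the C side, assuming the assembly routine takes x, y and a color in that order: you declare a prototype and call it like any other function, and the compiler places the first three arguments in r0, r1 and r2. The C body and the fake_vram buffer below are only stand-ins so the sketch can run off-target; on hardware the body would be the assembly routine and the buffer would be video memory.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* Prototype the C source would see. Under the ATPCS, x arrives in r0,
   y in r1 and color in r2. */
void PlotPixel(u32 x, u32 y, u16 color);

/* Hypothetical stand-in for mode 3 video memory (240x160, 16-bit pixels). */
static u16 fake_vram[240 * 160];

/* Plain C stand-in for the assembly routine, for testing on a PC. */
void PlotPixel(u32 x, u32 y, u16 color)
{
    assert(x < 240 && y < 160);     /* valid coordinates are 0..239, 0..159 */
    fake_vram[y * 240 + x] = color; /* u16 indexing scales by 2 bytes itself */
}

/* Helper for the example only: reads a pixel back from the stand-in buffer. */
u16 ReadPixel(u32 x, u32 y)
{
    assert(x < 240 && y < 160);
    return fake_vram[y * 240 + x];
}
```

A call such as PlotPixel(239, 159, 0xFFFF) then sets the bottom-right pixel.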

DekuTree64 wrote:

Probably what happened in the first one was that your screen pointer was a u16*, so when you're writing in C, it steps 2 bytes at a time. In ASM all addresses are in bytes, so when you add your offset, you have to do the multiply by 2 yourself.

All well and good, but I don't have to double the resolution when I am using a pure ASM file or a pure C file, only when it is inline ASM. I am guessing that the inline assembler is operating in Thumb mode, since I didn't specify .ARM in the function. Or am I talking out of my **S here?
_________________
Keep it real, keep it Free, Keep it GNU

Well, it's easy to tell if you're in ARM or Thumb mode. If you're writing ARM assembler, it's ARM mode. If you're writing Thumb assembler, it's Thumb mode. The instruction sets are not the same; you can't just put .arm or .thumb and have it work. ARM arithmetic instructions (add, sub, etc.) take three operands: a destination register and two sources. Most of the corresponding Thumb instructions take only two: the destination is also the first source.

If your code works as external ASM but not as inline ASM, then you are probably not doing the inlining correctly. Writing inline ASM is complicated and I've been unable to find a good reference on exactly what you have to do. If you are prepared to deal with the calling conventions yourself (by reading the procedure call standard, the link I posted previously) then I find it's usually easier to just make your inline ASM blocks into external functions in pure assembler. I have yet to use inline ASM for anything on any platform, simply because it's never been worth the effort to me. Not saying you shouldn't, just maybe something to consider.

T.

Omega81 wrote:

All well and good, but I don't have to double the resolution when I am using a pure ASM file or a pure C file, only when it is inline ASM. I am guessing that the inline assembler is operating in Thumb mode, since I didn't specify .ARM in the function. Or am I talking out of my **S here?

Well that was just my best guess, I'd have to see the code to really track down the problem.
And yes, as Torne says, pure ASM is much easier. And usually faster too, because when you pass arguments, the C compiler will try to get them into r0-r3 as efficiently as possible, and then you can work with that directly, instead of having to tell it to give you some arguments, and then ldr/mov or whatever it is you do to get them in the right regs.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

OK, here is my code for plotting a pixel. It only plots halfway up the screen when you set x to 240.

Code:

.global PlotPixel
.arm
.align 4
PlotPixel:
stmfd sp!, {r4-r5,sp}    @ push these registers on the stack
mov r4, #0x6000000       @ make r4 point to video memory
mov r5,r1,lsl#9          @ r5 = y << 9 = y * 512
sub r1,r5,r1,lsl#5       @ r1 = (y << 9) - (y << 5) = y * 480
add r1,r1,r0             @ add X value
strh r2,[r4,r1]          @ dump pixel on the screen
ldmfd sp!, {r4-r5,sp}    @ pop the registers off the stack
mov pc,lr                @ return to calling function

There. I am working in mode 3 with background 2, and the resolution should be 240x160, so my program should plot a pixel at the bottom right-hand corner if in C I say:

Code:

PlotPixel(240,160,0xffff);

The problem is that it doesn't. It only plots the pixel on the right, halfway down the screen. But this piece of code works fine, by doubling both the X and the Y max (instead of 240x160, we have 480x320):

Code:

.global PlotPixel
.arm
.align 4
PlotPixel:
stmfd sp!, {r4-r5,sp}    @ push these registers on the stack
mov r4, #0x6000000       @ make r4 point to video memory
mov r5,r1,lsl#9          @ r5 = y << 9 = y * 512
sub r1,r5,r1,lsl#5       @ r1 = (y << 9) - (y << 5) = y * 480 (why?)
add r1,r1,r0,lsl#1       @ add X value, shifted left by 1 (x * 2)
strh r2,[r4,r1]          @ dump pixel on the screen
ldmfd sp!, {r4-r5,sp}    @ pop the registers off the stack
mov pc,lr                @ return to calling function

My question is why??
_________________
Keep it real, keep it Free, Keep it GNU

Because pixels in mode 3 are 2 bytes each. A row is 240 pixels, so at 2 bytes each that makes 480 bytes per row, which you did have right in the first version. The reason the second one works is that you also multiplied x by 2, which is correct.
Otherwise, if x is 3 for example, you'll be trying to store a pixel at byte 3, which is the upper half of pixel 1. A halfword store to an odd address isn't an aligned access; on the GBA the low address bit is ignored, so the write lands on the pixel at the even address below it instead of the one you wanted. And of course you'll only make it halfway across the screen.
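The arithmetic can be checked on a PC with a small C model. This is only a sketch: the function names are made up for the example, and the shifts mirror the ones in the posted assembly. Note that valid coordinates run from 0 to 239 and 0 to 159, so the bottom-right pixel is (239, 159).

```c
#include <assert.h>
#include <stdint.h>

#define MODE3_WIDTH     240u
#define MODE3_HEIGHT    160u
#define BYTES_PER_PIXEL 2u

/* Byte offset of pixel (x, y) from the start of video memory in mode 3. */
uint32_t mode3_offset(uint32_t x, uint32_t y)
{
    assert(x < MODE3_WIDTH && y < MODE3_HEIGHT);
    return y * MODE3_WIDTH * BYTES_PER_PIXEL + x * BYTES_PER_PIXEL;
}

/* Same offset using the shifts from the working routine:
   (y << 9) - (y << 5) = y * 480, plus x << 1 = x * 2. */
uint32_t mode3_offset_shifts(uint32_t x, uint32_t y)
{
    return (y << 9) - (y << 5) + (x << 1);
}

/* What the first routine computes: x is added as bytes, not pixels. */
uint32_t mode3_offset_unscaled_x(uint32_t x, uint32_t y)
{
    return (y << 9) - (y << 5) + x;
}
```

With x left unscaled, x = 238 gives the same byte offset as pixel 119 on that row, which is why the plots stopped about halfway across.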

Also, why are you storing all those registers at the start? r3 is unused and free to trash, so at most you'd need to save r4. sp will end up back where it was once you load the stored registers at the end, so there's no need to store it. And with a little cleverness, you can do it without r4 either.
And lastly, don't return with mov pc, lr. Use bx lr. It's the same speed, and it switches the processor to ARM or Thumb mode depending on which mode you were called from, which makes your function support interworking.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

Thanks, I think I get it now. About the redundant storing, I thought so too, but I wanted to follow the ARM/Thumb Procedure Call Standard (ATPCS) to the letter. Anyway, thanks guys.

One more little thing: I am sending a pointer to memory to an ASM function, and I was wondering whether GCC will pass the address pointed to by the pointer or the address of the pointer itself.
_________________
Keep it real, keep it Free, Keep it GNU

http://forum.gbadev.org/viewtopic.php?t=1119

I posted a number of comments to that thread explaining what the ATPCS implies. Deku is right; you may use r3 as a temporary variable as you are allowed to trash any of r0-r3 which you are not using as return values, and storing sp is a bad plan. Returning in an interworking-safe way is covered in the above thread (bx lr is fine for your function but not for functions that call other functions).
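On the pointer question: in C a pointer argument is passed by value, so the function receives the address the pointer holds (the address of the data), not the address of the pointer variable, unless you explicitly pass &ptr. A minimal sketch in portable C, with hypothetical function names, that shows the difference:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Receives the pointer's value: the address of the data it points to.
   For a first argument under the ATPCS, this is what lands in r0. */
uintptr_t address_received(uint16_t *p)
{
    assert(p != NULL);
    return (uintptr_t)p;
}

/* Receives the address of the pointer variable itself, which only
   happens when the caller writes &ptr. */
uintptr_t address_of_pointer_received(uint16_t **pp)
{
    assert(pp != NULL);
    return (uintptr_t)pp;
}
```

So an assembly routine declared as taking a u16* can use r0 directly as the base address for its loads and stores.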

ASM > Inline Assembler

#9882 - Omega81 - Thu Aug 21, 2003 2:54 am

#9885 - DekuTree64 - Thu Aug 21, 2003 4:43 am

#9893 - torne - Thu Aug 21, 2003 11:47 am

#9898 - Omega81 - Thu Aug 21, 2003 12:53 pm

#9903 - torne - Thu Aug 21, 2003 1:24 pm

#9906 - DekuTree64 - Thu Aug 21, 2003 3:38 pm

#9913 - Omega81 - Thu Aug 21, 2003 6:36 pm

#9914 - DekuTree64 - Thu Aug 21, 2003 7:09 pm

#9916 - Omega81 - Thu Aug 21, 2003 8:00 pm

#9919 - torne - Thu Aug 21, 2003 8:53 pm