gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

ASM > problem with TSTS/STRccB (new to GBA ASM)

#18291 - AntiPasta - Tue Mar 23, 2004 6:37 pm

Hello,

This is my first post on this board. Nice to find there's an active ASM scene still around :-)
Cutting my teeth on a non-working piece of code drove me here. I hate to have someone else debug my code, but I've been staring at this for days and it's driving me mad. Basically, what I'm trying to do is draw an 8x8x1bpp character from a font in ROM to a Mode 3 screen, thus 'expanding' each bit in the character to a 16-bit pixel. Due to ARM's funkiness I was glad I could do this with a minimal amount of branching. Here's what I came up with:

Code:

;            R7 = pointer to 8x8x1bpp character in font
;            R8 = RAM location for temporary space, 64 bytes
;            R9 = pointer in VRAM to put sprite (VRAM+(y*480)+x*2)
;            R6 = color to draw the character in
PutChar:
        mov R0, R8              ;backup ram location
        ldr R1, =8              ;loop counter
        sub r4, r4, r4          ;zero R4
        DoLineLoop:
                ldrb r2, [r7]   ;load a byte from the char
                mov r3, 128    ;first check the highest bit
                tsts r2, r3
                strnzh r6, [r0]       ;write the color byte if the bit is 1
                add r0, r0, 2

                mov r3, 64
                tsts r2, r3
                strnzh r6, [r0]
                add r0, r0, 2

                mov r3, 32
                tsts r2, r3
                strnzh r6, [r0]
                add r0, r0, 2

                mov r3, 16
                tsts r2, r3
                strnzh r6, [r0]
                add r0, r0, 2

                mov r3, 8
                tsts r2, r3
                strnzh r6, [r0]
                add r0, r0, 2

                mov r3, 4
                tsts r2, r3
                strnzh r6, [r0]
                add r0, r0, 2

                mov r3, 2
                tsts r2, r3
                strnzh r6, [r0]
                add r0, r0, 2

                mov r3, 1
                tsts r2, r3
                strnzh r6, [r0]
                add r0, r0, 2

                add r7, r7, 1
                subs r1, r1, 1
            bne DoLineLoop



As you can see, the character is first drawn to a temporary location (in RAM) and then copied to VRAM. I have tested the code that does the copying and it appears to be working fine. However, all the above code seems to do is draw an 8x8 square... so the "strnzh" opcodes are always executed!
I'd hate to have you guys do what I've been too stupid to do myself, but could any of you point me to the probably very obvious mistake I've made here?

Thanks in advance!
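
For reference, the expansion being attempted can be sketched in C. This is a minimal model, not the poster's routine: put_char, mode3_offset and the plain array standing in for VRAM are hypothetical names, and it assumes that bit 7 of each glyph byte is the leftmost pixel and that the Mode 3 screen is 240 pixels wide (480 bytes per row), as in the register comments above.

```c
#include <stdint.h>
#include <stddef.h>

#define SCREEN_W 240  /* Mode 3 is 240 pixels wide, 2 bytes per pixel */

/* Byte offset of pixel (x, y), matching the y*480 + x*2 formula in the comments. */
static size_t mode3_offset(int x, int y)
{
    return (size_t)y * SCREEN_W * 2 + (size_t)x * 2;
}

/* Expand an 8x8 1bpp glyph into 16-bit pixels; clear bits leave the
   destination untouched. dst points at the glyph's top-left pixel. */
static void put_char(const uint8_t glyph[8], uint16_t *dst, uint16_t color)
{
    for (int row = 0; row < 8; row++) {
        uint8_t bits = glyph[row];
        for (int col = 0; col < 8; col++) {
            if (bits & (0x80u >> col))   /* assumed: bit 7 = leftmost pixel */
                dst[col] = color;
        }
        dst += SCREEN_W;                 /* always advance one full row */
    }
}
```

Under these assumptions, mode3_offset(60, 50) gives the byte offset for a glyph whose top-left corner is at x = 60, y = 50.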

#18293 - poslundc - Tue Mar 23, 2004 7:00 pm

I haven't examined your algorithm so I can't verify its correctness, but off the top of my head, you don't ever change the address that you are loading from or storing to.

Use the post-indexed offset mode of the load and store commands to increment your address registers whenever you load/store. eg.

Code:
ldrb   r2, [r7], 1   ; load from r7, then increment r7 by 1 byte
strh   r6, [r0], 2   ; store to r0, then increment r0 by 2 bytes


Also, if you're using gcc (but I'm guessing you aren't) then all numerical constants must be prefixed by a # sign.

Dan.

#18295 - Miked0801 - Tue Mar 23, 2004 7:08 pm

If you shift the high bit of r2 into Carry, with say adds r2,r2,r2, you can get rid of each test (if you also load 32 bits at a time and the source is aligned, which shouldn't be hard to achieve).

Code:

mov  r1, 8

fooLoop:
   ldr r2, [r7]!

; repeat 32 times or loop with a register
   adds r2,r2,r2
   strcsh r6,[r0],2

   subs r1,r1,1


Much faster :)

#18301 - poslundc - Tue Mar 23, 2004 8:06 pm

Miked0801 wrote:
Code:
ldr r2, [r7]!


Does this assemble? Although the meaning of your code should be obvious enough to the assembler, according to the ARM instruction set reference you can only use the writeback flag (in ARM mode anyway; I think it's different in Thumb) if you supply a pre-indexed offset (either immediate or a register).

I'd like to know if this works. It certainly would be a more elegant way than having to specify the size as a post-indexed offset.

Dan.

#18303 - Miked0801 - Tue Mar 23, 2004 8:24 pm

I have no idea - doing blind coding again :P

I seem to remember promising not to blind assemble. My how short my memory is. I'll compile and see if it works now.

#18304 - Miked0801 - Tue Mar 23, 2004 8:26 pm

Ok, it compiled fine. I know this works with ldm/stm commands so I don't know why it wouldn't work with ldr/str.

#18308 - DekuTree64 - Tue Mar 23, 2004 8:51 pm

Miked0801 wrote:
Code:

mov  r1, 8

fooLoop:
   ldr r2, [r7]!

; repeat 32 times or loop with a register
   adds r2,r2,r2
   strcsh r6,[r0],2

   subs r1,r1,1


Sorry Mike, that increment on the store won't happen unless it is executed, so you won't increment over blank pixels. That is a nice trick with the adding though. Or if the loop is unrolled like the original post, you can just use tst with an immediate. Also, is the temporary buffer important for something like drawing characters to bunches of little buffers and waiting for VBlank to copy them? If not, you could do it like this
Code:
mov r0, r9   ;VRAM pos
mov r1, 8

loop:
ldrb r2, [r7], 1

tst r2, 1
strneh r6, [r0, 14]  ;draw the last pixel first; the leftmost pixel (bit 7) goes at offset 0

tst r2, 2
strneh r6, [r0, 12]

tst r2, 4
strneh r6, [r0, 10]

...

tst r2, 64
strneh r6, [r0, 2]

tst r2, 128
strneh r6, [r0]
add r0, r0, 480  ;increment to next row; kept unconditional, since a skipped strneh would skip its post-increment too

subs r1, r1, 1
bne loop


I really don't know why the original one wouldn't work though, but hopefully bypassing the temporary buffer will fix it.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
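
The pitfall with a conditional post-increment can be modelled in C. This is a rough sketch, assuming only that a conditional instruction whose condition fails does nothing at all, including the base-register writeback. The function names are made up for illustration.

```c
#include <stdint.h>

/* Models `strcsh r6, [r0], 2`: store AND increment happen only when the bit is set. */
static int plot_row_conditional_increment(uint8_t bits, uint16_t *row, uint16_t color)
{
    uint16_t *p = row;
    for (int i = 0; i < 8; i++) {
        if (bits & (0x80u >> i)) {
            *p = color;
            p++;            /* skipped together with the store */
        }
    }
    return (int)(p - row);  /* how far the pointer actually moved */
}

/* Models `strcsh r6, [r0]` followed by an unconditional `add r0, r0, 2`. */
static int plot_row_separate_increment(uint8_t bits, uint16_t *row, uint16_t color)
{
    uint16_t *p = row;
    for (int i = 0; i < 8; i++) {
        if (bits & (0x80u >> i))
            *p = color;
        p++;                /* always steps to the next pixel */
    }
    return (int)(p - row);
}
```

With a row byte of 0x81, the first version packs both pixels into positions 0 and 1 and only moves two pixels along, while the second places them at positions 0 and 7 and moves the full eight.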

#18314 - AntiPasta - Tue Mar 23, 2004 10:00 pm

Thanks for all your replies guys! Unfortunately I'm too engulfed in my SQL assignment to try implementing any of your suggestions but I'll sure give it a try tomorrow.

And that temporary buffer bit is actually a bit pointless now, when I started writing the code it was intended for Mode 4... and because of the 3@#^Y# mandatory 16bit writes I thought it better to write the output to a buffer first and then copy that over to VRAM 16 bits at a time. Thanks for reminding me :)

#18315 - Lupin - Tue Mar 23, 2004 10:03 pm

Uhm, you are using an EWRAM buffer in mode 3? That could get slow...

Well, using 16 bit writes in mode 4 you get twice the fillrate of mode 3... I am using 32 bit writes in mode 3 because it's a good speed up.
_________________
Team Pokeme
My blog and PM ASM tutorials

#18318 - Miked0801 - Tue Mar 23, 2004 10:36 pm

Doh. Good find. :)

#18337 - AntiPasta - Wed Mar 24, 2004 1:02 pm

Alright, I wrote a new function:
Code:

;            R7 = pointer to 8x8x1bpp character in font
;            R9 = pointer in VRAM to put sprite (VRAM+(y*480)+x*2)
;            R6 = color to draw the character in

PutChar:
        ldr R1, =8              ;loop counter
       
        LoopV:
                ldr R4, =8
                ldr R2, [R7]
                LoopH:
                        adds R2, R2, R2
                        strcsh R6, [R9]
                        add R9, R9, 2
                        subs R4, R4, 1
                bne LoopH
                add R7, R7, 1
                add R9, R9, 464
                subs R1, R1, 1
           bne LoopV

        bx R14                          ;jump back to caller


This one does actually draw something... but I think I figured out the original problem now. I think that the address I pass to the function is somehow not right. I use Goldroad 1.7, and here's my test code to call the function:

Code:


(some init code)

ldr r7, =testchar
ldr r9, =vram+24120
ldr r6, =0x8ff
bl PutChar

label1
B label1

testchar:
@DCB 60
@DCB 66
@DCB 66
@DCB 255
@DCB 66
@DCB 66
@DCB 66
@DCB 66

@include xblit162.asm
@pool
@endarea



Once again, sorry to bother you with all these neophyte problems of mine, but I really hope to have a working font routine done one day; it would be very helpful with my next projects as I can finally print some debug info ;-)

#18338 - poslundc - Wed Mar 24, 2004 2:12 pm

Well, for one thing, the second line of LoopV should be ldrb, not ldr. This will definitely cause problems for you.

Also, I know you aren't trying to optimize it at this point, but it really makes sense to use post-indexed addressing so you don't have to keep manually incrementing your pointers (eg. ldrb r2, [r7], 1). Leave the separate add after strcsh, though: as DekuTree pointed out, a conditional store that doesn't execute doesn't increment either. And a bare exclamation mark (ldrb r2, [r7]!) won't do it: that's writeback with a zero offset, so the pointer never moves.

(Sidebar: yeah, I tried the bare [r7]! form in GAS and it assembles now, although I could've sworn it didn't the last time I tried, and it still isn't kosher according to the docs. Oh well!)

Dan.

#18340 - AntiPasta - Wed Mar 24, 2004 3:35 pm

Once more, thanks for your assistance. I have removed what I think is the last bug in my code now :)
Of course this 'adds r2, r2, r2' thing won't work well if I only load the lower 8 bits with meaningful data! DOH! I did a MOV with LSL now and I finally got it to draw an 'A' character. Man, ARM sure is funky.

However, and I hope I can stop bugging you after this, there's one thing left: I had to copy the character to IWRAM at 0x03000000 first and then load R7 with 0x03000000. When I tried loading R7 with the "testchar" label, it seems Goldroad (or maybe just me) doesn't put the right offset into R7. So this won't work:

Code:

mov r7, testchar
ldr r9, =vram+24080
ldr r6, =0xFFFF
bl PutChar

This draws garbage; my guess is Goldroad either loads the value at [testchar] into R7 or puts some relative offset there, as this does work:

Code:
ldr r1, =0x03000000
ldr r2, [testchar]
str r2, [r1]+4!
ldr r2, [testchar+4]
str r2, [r1]+4!
ldr r7, =0x03000000
ldr r9, =vram+24120
ldr r6, =0x8ff
bl PutChar


How can I get Goldroad to load absolute offsets from labels into a register? Or is it something else entirely (for instance, not being able to do LDRBs from ROM)?

#18343 - poslundc - Wed Mar 24, 2004 5:25 pm

AntiPasta wrote:
Of course this 'adds r2, r2, r2' thing wont work well if I only load the lower 8 bits with meaningful data! DOH! I did a MOV with LSL now and I finally got it to draw an 'A' character. Man, ARM sure is funky.


add r1, r1, r1 and mov r1, r1, lsl #1 will do exactly the same thing. Both multiply the register by two (shift the register left by 1 bit).

For your second part, I'm not 100% sure what you're asking. If it's how to load a pointer that's either 1. in another file or 2. in a distant section of code (so it can't be addressed relative to the PC), do it as follows:

Code:
   ldr      r0, LOCAL_DATA
   ; r0 now has the pointer to myExternalData
   ; blah blah blah more code
   bx      lr
LOCAL_DATA:
   .word      myExternalData

Then in some other file, or much later on in your program:

   .global      myExternalData   ; only necessary if in another file
myExternalData:
   ; data goes here


You will have to replace the .word and .global directives with their Goldroad equivalents; I don't know what they are.

If the data is in proximity to your code, you can use the adr pseudo-op to perform an adjusted addition using the PC to get the correct pointer.

Code:
   adr      r0, myLocalData
   ; r0 now has a pointer to myLocalData in it
   ; more code, blah blah
   bx      lr
myLocalData:
   ; data goes here


Hope this is what you're looking for.

Dan.

#18446 - Lupin - Fri Mar 26, 2004 11:20 am

I just wrote a 12x12 pixel 1 bit font plotter. I managed to code it within an hour and it's working quite well (though I think I will have to change it to use dynamic size fonts).

I don't really understand why you need local data to do this, I just plot the char directly to vram. I also don't understand what this adds r2,r2,r2 thing should do (sounds useless to me...).

Btw, does GAS understand macros? If yes you should think about using macros for reading the character bytes and hardcode the function (if you have enough iwram space left). It is also a good idea to use post incrementing as mentioned before, but sadly you can't use it for incrementing the frame buffer pointer :(

Well, you can have a look at my source code here (for arm assembler): http://home.arcor.de/lupin003/plottext.txt

#18447 - torne - Fri Mar 26, 2004 12:08 pm

yup, GAS has nice macro support (I used them to implement a recursive constant-analyser that works out whether it's faster to use shifts and adds or a constant-pool load, and generates whichever wins). Read the info file.

#18453 - Miked0801 - Fri Mar 26, 2004 6:50 pm

adds rx,rx,rx shifts the high bit of rx into carry allowing easy checking for the bit being set. It works well in a loop. It's also the exact same thing as movs rx,rx,lsl #1.
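
The carry trick, and why a byte loaded with ldrb first has to be moved to the top of the register, can be modelled in C. This is only a sketch: adds_self and first_eight_carries are hypothetical helper names, and the shift amount of 24 in the usage note is an assumption, since the thread only says "a MOV with LSL".

```c
#include <stdint.h>

/* Models `adds rx, rx, rx`: returns the carry flag (old bit 31) and doubles *reg. */
static int adds_self(uint32_t *reg)
{
    int carry = (int)(*reg >> 31);
    *reg <<= 1;     /* unsigned wraparound matches the 32-bit register */
    return carry;
}

/* Collects the carries of eight successive adds; the first carry ends up in bit 7. */
static uint8_t first_eight_carries(uint32_t reg)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++)
        out = (uint8_t)((out << 1) | adds_self(&reg));
    return out;
}
```

With the glyph byte 0x3C sitting in the low 8 bits, first_eight_carries(0x3C) is 0, so nothing would be drawn for that row. After shifting it up, first_eight_carries(0x3Cu << 24) is 0x3C again, one carry per pixel from left to right.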

#18482 - AntiPasta - Sat Mar 27, 2004 2:05 pm

well all those problems have been solved in my code *except* for loading the *absolute* address of a label into a register with Goldroad... I suspect "ldr r7, =testchar" loads a PC-relative pointer instead of the absolute address.

#18483 - torne - Sat Mar 27, 2004 2:23 pm

No, that loads the absolute address. All load/store instructions take absolute addresses as operands; the only time you use PC-relative addressing is when you are explicitly loading/storing from [pc, #offset], or when you are branching to a label (branching to a register also uses absolute addressing).

#18484 - AntiPasta - Sat Mar 27, 2004 2:28 pm

DOH! I found my problem now! Man, this is embarrassing... when I changed the "@textarea" bit near the top of my asm file to "@textarea 0x8000000" everything works like a dream!

Thanks for all your help, now I can finally move on to more interesting things :D
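
One way to picture why the origin matters is the small C sketch below. It assumes the assembler computes a label's absolute address as origin + offset, and that the origin was 0 while "@textarea" had no address. Neither assumption is confirmed in the thread, and label_address is a made-up helper.

```c
#include <stdint.h>

#define GBA_ROM_BASE 0x08000000u  /* where the cartridge ROM is mapped on the GBA */

/* Absolute address an assembler would assign to a label, assuming it
   simply adds the label's offset within the image to the declared origin. */
static uint32_t label_address(uint32_t origin, uint32_t offset)
{
    return origin + offset;
}
```

Under those assumptions, a label 0x40 bytes into the image resolves to 0x00000040 with an origin of 0, which is not where the ROM lives, and to 0x08000040 with an origin of GBA_ROM_BASE. A PC-relative load such as ldr r2, [testchar] does not depend on the origin, which would explain why the copy-to-IWRAM workaround still worked.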