#20580 - nath86 - Thu May 13, 2004 1:18 pm
Hey, just wanted to know a quick way to fill an array with an integer. I want something very similar to the memset function, except 16-bit. Any suggestions would be helpful... I have a feeling it's going to be assembly based...
#20581 - Lupin - Thu May 13, 2004 2:01 pm
You want to fill some memory with just one 16 bit value? You need DMA to do that... use something like this:
Code: |
u16 clear = 0x1337; /* fill value; the DMA reads it by address, so it must live in memory */
#define DMAClearMemory16(dest, wc) do { \
REG_DMA3SAD = (u32)&clear; \
REG_DMA3DAD = (u32)(dest); \
REG_DMA3CNT = (wc) | DMA_ENABLE | DMA_TIMEING_IMMEDIATE | DMA_16 | DMA_SOURCE_FIXED; \
} while (0)
|
Call this macro with a pointer as dest and the number of 16-bit values you want to fill. The value written is whatever is stored in clear.
_________________
Team Pokeme
My blog and PM ASM tutorials
#20582 - poslundc - Thu May 13, 2004 2:03 pm
Use the 32-bit memset function; just set the 32-bit word that you're using to the same 16 bits in the low and high halfwords. For example, if I want to set the array to 0xFA06, I'd use CPUMemSet with the value 0xFA06FA06 and just tell it to repeat half as many times as you need.
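As a rough illustration of the packing trick described above, here is a minimal portable C sketch. The names pack16 and fill16_as_words are hypothetical (they are not from any GBA library), and the plain loop stands in for whatever 32-bit fill routine you actually call:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Duplicate a 16-bit value into both halfwords of a 32-bit word,
   so 0xFA06 becomes 0xFA06FA06. */
static uint32_t pack16(uint16_t value)
{
    return ((uint32_t)value << 16) | value;
}

/* Hypothetical helper: fill `count` 16-bit elements using count / 2
   32-bit stores. Assumes count is even; on the GBA the destination
   should also be 4-byte aligned. */
static void fill16_as_words(uint16_t *dest, uint16_t value, size_t count)
{
    uint32_t word = pack16(value);
    unsigned char *out = (unsigned char *)dest;

    assert(count % 2 == 0);
    for (size_t i = 0; i < count / 2; i++) {
        /* memcpy keeps this sketch free of aliasing problems off-target */
        memcpy(out + i * sizeof word, &word, sizeof word);
    }
}
```

On real hardware the loop would presumably be replaced by one call to the 32-bit fill routine, passing pack16(value) and count / 2.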
Only problem is that the BIOS functions are apparently quite slow to load, so if you need it to be really quick you should write your own ASM function and place it in IWRAM.
... Heck, I'm not doing anything at the moment, I'll even write it for you:
Code: |
.section .iwram
.arm
.global FastMemSet
.type FastMemSet, function
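FastMemSet: @ r0 = 32-bit fill word, r1 = destination, r2 = word count
stmfd sp!, {r4-r8} @ save the callee-saved registers used below
mov r3, r0 @ copy the fill word into seven more registers
mov r4, r0
mov r5, r0
mov r6, r0
mov r7, r0
mov r8, r0
mov r12, r0
L_MEMLOOP:
stmia r1!, {r0, r3-r8, r12} @ store 8 words (32 bytes) and advance r1
subs r2, r2, #8 @ 8 fewer words remaining
bgt L_MEMLOOP @ loop while words remain
ldmfd sp!, {r4-r8} @ restore the saved registers
bx lr |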
Then the prototype for this function is:
Code: |
extern __attribute__ ((long_call)) void FastMemSet(unsigned int val, unsigned int dest, unsigned int count); |
The first parameter is the 32-bit word you want copied, the second parameter is the destination address, and the third parameter is how many 32-bit copies you want to make (should be a multiple of 8, meaning you are setting 32 bytes at a time).
And here's an unrolled version for really large transfers. The third parameter is still the number of 32-bit copies, but now it should be a multiple of 64 (meaning you are setting 256 bytes at a time).
Code: |
.section .iwram
.arm
.global FastMemSetLarge
.type FastMemSetLarge, function
FastMemSetLarge:
stmfd sp!, {r4-r8}
mov r3, r0
mov r4, r0
mov r5, r0
mov r6, r0
mov r7, r0
mov r8, r0
mov r12, r0
L_BIGMEMLOOP:
stmia r1!, {r0, r3-r8, r12}
stmia r1!, {r0, r3-r8, r12}
stmia r1!, {r0, r3-r8, r12}
stmia r1!, {r0, r3-r8, r12}
stmia r1!, {r0, r3-r8, r12}
stmia r1!, {r0, r3-r8, r12}
stmia r1!, {r0, r3-r8, r12}
stmia r1!, {r0, r3-r8, r12}
subs r2, r2, #64
bgt L_BIGMEMLOOP
ldmfd sp!, {r4-r8}
bx lr |
Hope this helps,
Dan.
Edit: fixed a stack error in the code...
Edit: fixed ANOTHER error... if it doesn't work now, I'm going back to programming HyperCard.
Last edited by poslundc on Thu May 13, 2004 8:06 pm; edited 2 times in total
#20583 - poslundc - Thu May 13, 2004 2:05 pm
Lupin wrote: |
You want to fill some memory with just one 16 bit value? You need DMA to do that... |
DMA is generally a poor choice if you are trying to fill it quickly, because it needs to re-read the value being copied each iteration through. I remember DekuTree64 did some testing of it and it came out as being considerably slower overall.
Dan.
#20584 - Lupin - Thu May 13, 2004 2:07 pm
OK, the stmia method is indeed faster... but the function will always fill chunks of 8 32-bit words or 64 32-bit words (as far as I can tell from taking a brief look at the code).
poslundc, won't you use stmfd / ldmfd for stack operations instead of stmia or ldmia? Or doesn't it matter?
#20588 - poslundc - Thu May 13, 2004 2:23 pm
Lupin wrote: |
poslundc, won't you use stmfd / ldmfd for stack operations instead of stmia or ldmia? Or doesn't it matter? |
You're correct... I will go back and edit it.
(I didn't claim I'd tested the code... :P)
Dan.
#20598 - ampz - Thu May 13, 2004 4:13 pm
Or if you are working in C (you probably are), and just want to initialize a 16-bit array at boot, you do it like this:
unsigned short array[8] = {0xFA06, 0xFA06, 0xFA06, 0xFA06, 0xFA06, 0xFA06, 0xFA06, 0xFA06}; /* unsigned, since 0xFA06 does not fit in a signed 16-bit short */
Of course, it is not as efficient as the asm stmia alternative.
#20604 - sajiimori - Thu May 13, 2004 6:03 pm
Code: |
stmia r1!, {r0, r2-r7, r12}
|
You wanna store where? :D
#20616 - poslundc - Thu May 13, 2004 7:59 pm
Goddamnit, this is what I get for blind coding in the morning. OK, as soon as I send this I'm gonna edit the original message for the LAST time. >:(
Dan.
#20658 - nath86 - Fri May 14, 2004 2:36 pm
Wow, thanks heaps guys, especially Dan *huggles*. I don't understand it, but it looks the part...
#20678 - sajiimori - Fri May 14, 2004 6:38 pm
If you don't understand it, then it's more important to understand its limitations. The first version only works on multiples of 8 words, and the unrolled version only works on multiples of 64 words.
To get around those limitations, write a wrapper function in C that does the remainder first, then calls the unrolled version. The overhead is not significant for large copies.
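A minimal sketch of such a wrapper might look like the following. Everything here is an assumption for illustration: fill_blocks_of_8 is a plain C stand-in that mimics the store-then-test loop of the assembly routine so the example compiles off-target, and mem_set_words is a hypothetical name for the wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the assembly routine (assumption, not the real thing).
   Like the original loop, it stores 8 words BEFORE testing the count,
   so it must never be called with a count of 0. */
static void fill_blocks_of_8(uint32_t val, uint32_t *dest, long count)
{
    assert(count > 0 && count % 8 == 0);
    do {
        for (int i = 0; i < 8; i++)
            *dest++ = val;
        count -= 8;
    } while (count > 0);
}

/* Hypothetical wrapper: store the count % 8 leftover words first,
   then hand the remaining multiple of 8 to the block routine. */
static void mem_set_words(uint32_t val, uint32_t *dest, size_t count)
{
    size_t remainder = count % 8;

    for (size_t i = 0; i < remainder; i++)
        *dest++ = val;

    count -= remainder;
    if (count > 0) /* skip the call entirely when nothing is left */
        fill_blocks_of_8(val, dest, (long)count);
}
```

The count > 0 guard matters because the assembly loop performs its first stmia before checking the counter, so calling it with zero would still write 32 bytes. For the unrolled version the same idea should apply with 64 in place of 8.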