#9194 - jonathan_ingram - Wed Jul 30, 2003 2:36 pm
Hello,
I'm currently doing all of my GBA programming in C (I have no real assembly knowledge) and was wondering whether there are any well-known, efficient screen-clearing methods, other than drawing a large black quad or rectangle of 240*160 pixels to the screen?
Thanks.
#9196 - niltsair - Wed Jul 30, 2003 2:47 pm
Create a 32-bit variable containing the color you want the screen cleared to, with the 16-bit color repeated in both halfwords. Then use DMA, with the variable as the source and the screen memory address as the destination. The options would be: fixed source address (so it keeps reading the variable), 32-bit transfers, immediate start, and 240*160/2 transfers for a mode 3 screen (240*160/4 for mode 4).
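A minimal sketch of the settings described above, assuming the DMA control-bit layout documented in GBATEK. The macro and function names are made up for illustration, and the snippet only computes the values (it does not touch the DMA3 registers), so it also compiles on a PC.

```c
#include <stdint.h>

/* Bits of the DMA control halfword (upper half of DMAxCNT), per GBATEK.
   The names are illustrative, not taken from any particular header. */
#define DMA_ENABLE_BIT 0x8000u  /* bit 15: enable/start               */
#define DMA_32BIT      0x0400u  /* bit 10: transfer 32-bit units      */
#define DMA_SRC_FIXED  0x0100u  /* bits 7-8 = 2: source address fixed */
/* Start timing (bits 12-13) is left at 0, which means "immediately". */

/* Control halfword for an immediate, fixed-source, 32-bit fill. */
static uint16_t dma_fill32_control(void)
{
    return (uint16_t)(DMA_ENABLE_BIT | DMA_32BIT | DMA_SRC_FIXED);
}

/* Number of 32-bit transfers needed to cover `bytes` bytes of screen memory. */
static uint32_t dma_fill32_count(uint32_t bytes)
{
    return bytes / 4u;
}
```

A mode 3 screen is 240*160*2 bytes, which gives 19200 transfers; 240*160/4 = 9600 transfers covers one mode 4 page.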
#9211 - hnager - Wed Jul 30, 2003 7:02 pm
This is actually something I've been wondering how to do - I haven't been able to get the fixed source for the DMA transfer working ...examples?
#9213 - niltsair - Wed Jul 30, 2003 7:49 pm
There's only 1 bit to set in the DMA control register. But seeing as I'm at work, I can't really post an example. Have you been using DMA3?
#9214 - hnager - Wed Jul 30, 2003 7:59 pm
Yeah, I've been using DMA3. I don't have anything on hand either (at work), but it was an issue I had in the past, although maybe that was because I was in mode 4.
#9215 - Vortex - Wed Jul 30, 2003 8:29 pm
hnager wrote: |
This is actually something I've been wondering how to do - I haven't been able to get the fixed source for the DMA transfer working ...examples? |
Code: |
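void DMAClearScreenMode3(u16 color)
{
    // Note: color is a parameter, so its address is only valid until this function returns.
    REG_DMA3SAD = (u32)&color;   // Source address: the fill color
    REG_DMA3DAD = 0x06000000;    // Destination address: start of VRAM
    REG_DMA3CNT_L = 160 * 240;   // Number of 16-bit transfers (one per mode 3 pixel)
    REG_DMA3CNT_H = 0x9100;      // Enable | start at VBlank | fixed source, 16-bit units
} |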
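To see what the control value 0x9100 in the snippet above selects, here is a small decoder. It is a sketch that assumes the bit layout given in GBATEK; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Selected fields of the DMA control halfword, per GBATEK.
   The struct and its member names are illustrative only. */
struct dma_ctrl {
    unsigned enable;    /* bit 15                                            */
    unsigned timing;    /* bits 12-13: 0 = immediate, 1 = VBlank, 2 = HBlank */
    unsigned is_32bit;  /* bit 10: 0 = 16-bit units, 1 = 32-bit units        */
    unsigned src_mode;  /* bits 7-8: 0 = increment, 1 = decrement, 2 = fixed */
};

static struct dma_ctrl dma_decode(uint16_t v)
{
    struct dma_ctrl c;
    c.enable   = (v >> 15) & 1u;
    c.timing   = (v >> 12) & 3u;
    c.is_32bit = (v >> 10) & 1u;
    c.src_mode = (v >> 7)  & 3u;
    return c;
}
```

Decoding 0x9100 gives enable = 1, timing = 1 (VBlank), 16-bit units and a fixed source. Under that reading, the transfer does not run until the next VBlank, by which time the `color` parameter may no longer be on the stack. 0x8100 would be the same setup with an immediate start.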
#9218 - NEiM0D - Wed Jul 30, 2003 11:00 pm
It might be possible to code an assembler routine that clears memory using a few stmia instructions; putting that routine in IWRAM might make it even faster than DMA. However, I am not sure about that because I have never tried it.
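For readers without assembly experience, the idea can be approximated in C as an unrolled loop that writes eight words per iteration. This is only a sketch: the function name and the multiple-of-8 restriction are assumptions chosen to mirror an eight-register stmia, and whether the compiler turns it into stmia, or whether it beats DMA, is not something this snippet demonstrates.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `words` 32-bit words at dst with `value`, eight words per iteration.
   `words` is assumed to be a multiple of 8 (hypothetical restriction). */
static void fill_words_x8(uint32_t *dst, uint32_t value, size_t words)
{
    uint32_t *end = dst + words;
    while (dst < end) {
        dst[0] = value; dst[1] = value; dst[2] = value; dst[3] = value;
        dst[4] = value; dst[5] = value; dst[6] = value; dst[7] = value;
        dst += 8;
    }
}
```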
#9228 - DekuTree64 - Thu Jul 31, 2003 1:18 am
Try swi 0xc. Make a file called swi.s or something, and put this in it:
Code: |
.global CpuFastSet
.arm
.align 4
CpuFastSet:
swi 0xc0000 @ BIOS call 0x0C; in ARM state the call number sits in bits 16-23
bx lr @ return to the caller
|
and compile it with GCC exactly as you would a C file; it will work out the language from the filename extension (.s). Then in your C file, put
Code: |
extern void CpuFastSet(void*, void*, u32);
|
And call it like CpuFastSet(source, dest, length | BIT24); for a memset, or skip the BIT24 for memcpy. Length is in words (4 bytes each, so for a 256 byte transfer, length would be 64), and must be a multiple of 8 words (32 bytes). For a memset, the source is still a pointer to the 4-byte value you want to fill with (so don't just give it a 0 like a regular memset(), make a var, set it to 0, and give it the address of that).
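The calling convention described above can be modelled on a PC as follows. This is a sketch of the documented behaviour (word count in bits 0-20, fill flag in bit 24, as listed in GBATEK), not the BIOS routine itself; `cpu_fast_set_model` and the macro names are made up, and `CFS_FILL` plays the role of `BIT24`.

```c
#include <stdint.h>

#define CFS_FILL       (1u << 24)  /* bit 24: fixed source, i.e. a fill */
#define CFS_COUNT_MASK 0x1FFFFFu   /* bits 0-20: length in 32-bit words */

/* Host-side model of CpuFastSet's semantics for testing call sites.
   The real routine expects the count to be a multiple of 8 words. */
static void cpu_fast_set_model(const uint32_t *src, uint32_t *dst, uint32_t lenmode)
{
    uint32_t count = lenmode & CFS_COUNT_MASK;
    uint32_t i;
    for (i = 0; i < count; i++) {
        /* Fill: keep reading the first source word. Copy: walk the source. */
        dst[i] = (lenmode & CFS_FILL) ? src[0] : src[i];
    }
}
```

Note that the fill value is passed by address in both cases, which is why a zero fill needs a variable holding 0 rather than a literal 0.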
I haven't done any speed tests myself, but I've heard this is faster than a DMA set with something as big as the screen to fill, since DMA has to read and write each time, and CpuFastSet just loads 8 regs with the first word at source and uses stmia until it's done. But now I'm curious, so I think I will give it a test, I'll post the results later.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#9238 - DekuTree64 - Thu Jul 31, 2003 5:57 am
Ok, according to my speed tests, a 0x2580 word (that would be one 8-bit screen's worth of data) memset (source var in IWRAM, to EWRAM) takes around 0xf900 cycles with CpuFastSet and 0x10700 with DMA. A 0x100 word transfer takes around 0x760 with CpuFastSet and 0x7c0 with DMA. Then a 0x2580 word copy from/to EWRAM took 0x1e000 with CpuFastSet, and DMA took 0x1c000.
A 0x100 word copy was 0xd80 with CpuFastSet and 0xcb0 with DMA.
All times are on hardware.
Then a 0x800 word copy with CpuFastSet to IWRAM was 0x17b0, and DMA was 0x10a0, and 0x40 word was 0x160 and 0x120.
So my conclusion is that DMA is fast for IWRAM, and CpuFastSet is good for EWRAM. Unfortunately I forgot to test a memset to IWRAM with CpuFastSet, and I didn't test it on VRAM either, which is where the screen to be cleared is, so I'll probably try that tomorrow. But before I do, are there any other src/dest/size combos or anything anyone would like me to try?
#9239 - DekuTree64 - Thu Jul 31, 2003 6:36 am
Ok, I decided to try them tonight. Here's the new results...
Src var in IWRAM to VRAM, 0x2580 word memset, CpuFastSet was 0x6900, DMA 0x7700. 0x100 words, CFS 0x370, DMA 0x3d0. 0x2580 word EWRAM to VRAM copy, CFS 0x15300, DMA 0x12c00. 0x100 words IWRAM to VRAM copy, CFS 0x490, DMA 0x3a0.
For IWRAM to IWRAM, 0x800 word memset, CFS 0xdb0, DMA 0x1090. 0x80 words CFS 0x113, DMA 0x11c.
So generally, CpuFastSet is good for filling, and DMA is good for copying. A screen clear would be faster with CpuFastSet, and an EWRAM backbuffer to mode3 screen copy would be faster with DMA.
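To put the hex figures above in perspective, the sketch below converts them to cycles per word and to a share of one frame. It assumes the reported numbers are CPU cycles and uses the GBA frame length of 280896 cycles (228 scanlines of 1232 cycles); the function names are illustrative.

```c
#include <stdint.h>

#define CYCLES_PER_FRAME 280896u  /* 228 scanlines * 1232 cycles */

/* Cycles per 32-bit word, scaled by 100 to stay in integer arithmetic. */
static uint32_t cycles_per_word_x100(uint32_t cycles, uint32_t words)
{
    return cycles * 100u / words;
}

/* Share of one frame, in tenths of a percent (rounded down). */
static uint32_t frame_share_permille(uint32_t cycles)
{
    return cycles * 1000u / CYCLES_PER_FRAME;
}
```

For the 0x2580-word fill to VRAM, this gives 2.80 cycles per word for CpuFastSet (0x6900 cycles) against about 3.17 for DMA (0x7700 cycles), or roughly 9.5% versus 10.8% of a frame.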
#20142 - LunarCrisis - Mon May 03, 2004 2:03 am
Vortex wrote: |
Code: |
void DMAClearScreenMode3(u16 color)
{
REG_DMA3SAD = (u32) & color;
REG_DMA3DAD = 0x06000000; //Destination Address
(REG_DMA3CNT_L) = (160 * 240);
(REG_DMA3CNT_H) = 0x9100;
} |
|
Hmm, where are those REGs defined? I seem not to have them :S
Also, how would I have to change that for mode 4?
_________________
If a tree falls in the forest and no one is there to hear it, why the heck do you care?
#20144 - dagamer34 - Mon May 03, 2004 2:44 am
For mode 4, you would change the resolution to 120 X 160, since there are 2 pixels per 16 bits.
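The count for each bitmap mode follows from the frame size in bytes. A small sketch, with hypothetical function names, for a DMA that moves 16 bits per transfer:

```c
#include <stdint.h>

/* Size in bytes of one frame in the GBA bitmap modes (0 for other modes). */
static uint32_t frame_bytes(int mode)
{
    switch (mode) {
    case 3:  return 240u * 160u * 2u;  /* 240x160, 16 bits per pixel */
    case 4:  return 240u * 160u;       /* 240x160, 8 bits per pixel  */
    case 5:  return 160u * 128u * 2u;  /* 160x128, 16 bits per pixel */
    default: return 0u;
    }
}

/* Transfer count for a DMA that moves 16 bits at a time. */
static uint32_t dma16_count(int mode)
{
    return frame_bytes(mode) / 2u;
}
```

In mode 4 each 16-bit unit holds two palette indices, so the fill value would need the index in both bytes.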
_________________
Little kids and Playstation 2's don't mix. :(
#20147 - poslundc - Mon May 03, 2004 4:02 am
LunarCrisis wrote: |
Hmm, where are those REGs defined? I seem not to have them :S |
They are in the standard gba.h header file. The L and H variations of REG_DMA3CNT just identify the lower and upper halfwords (16 bits) out of the entire 32-bit register. There is no programmatical reason to assign them separately.
Dan.
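As an illustration of the point about the two halves, the count and control values can be combined into the single 32-bit value that would be written to REG_DMA3CNT. The helper name is hypothetical.

```c
#include <stdint.h>

/* Combine the transfer count (low halfword) and the control bits
   (high halfword) into one 32-bit DMA count/control value. */
static uint32_t dma_cnt32(uint16_t count, uint16_t control)
{
    return ((uint32_t)control << 16) | count;
}
```

With the values from the earlier example, `dma_cnt32(160 * 240, 0x9100)` yields 0x91009600.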
#20148 - LunarCrisis - Mon May 03, 2004 4:04 am
dagamer34 wrote: |
For mode 4, you would change the resolution to 120 X 160, since there are 2 pixels per 16 bits. |
Thanks for that info dagamer
As for the REG defines, I found them in the source for one of the tutorials on www.thepernproject.com
I got it to clear the screen properly, but unfortunately if I start drawing on the screen too soon after I call the clear screen function, it doesn't display it. I added a delay after it and it seems to have helped, but I wanted to ask if there was any way of knowing whether it had finished clearing the screen or not.
EDIT: to poslundc: The problem was that I didn't know where to find the 'standard' header files.
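One possible way to check for completion, sketched under the assumption (from GBATEK) that the hardware clears the enable bit (bit 15 of the control halfword) when a non-repeating transfer finishes. The function names are hypothetical; on the GBA the pointer would be the DMA3 control halfword at 0x040000DE.

```c
#include <stdint.h>

#define DMA_ENABLE_BIT 0x8000u  /* bit 15 of the DMA control halfword */

/* Nonzero while the channel is still armed or transferring. */
static int dma_busy(const volatile uint16_t *cnt_h)
{
    return (*cnt_h & DMA_ENABLE_BIT) != 0;
}

/* Spin until the channel has finished. */
static void dma_wait(const volatile uint16_t *cnt_h)
{
    while (dma_busy(cnt_h)) {
        /* busy-wait */
    }
}
```

If the clear is started with VBlank timing, anything drawn between the call and the next VBlank would be overwritten when the transfer finally runs, which would match the symptom described here; waiting with something like `dma_wait` before drawing is one way around that.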
#24306 - gotenks06 - Sun Aug 01, 2004 4:54 am
I'm trying to make a DMA function to fill an array (mostly for clearing the screen with a color of my choice). I have tried for days and cannot get it to work (the other clear screen function here for some reason worked, but it was always black with my compiler). I am using devkitARM.
Code: |
// gba_dma.h snippet
// ...
#define DMA_DEST_FIXED 0x00400000
#define DMA_DEST_RELOAD 0x00600000
// ...
inline void DMA_FILL16(const u16 src, u16 dst, u16 cnt)
{
REG_DMA3DAD = 0;
REG_DMA3SAD = src;
REG_DMA3DAD = dst;
REG_DMA3CNT = cnt | DMA_SOURCE_FIXED | DMA_ENABLE | DMA_16;
}
#endif
|
Code: |
// main.cpp
#include <gba.h>
int main()
{
SetMode(MODE_3 | BG2_ENABLE);
u16 t = RGB16(0, 0, 31); // So I can pass the address
PlotPixel(5, 5, t);
while(1)
{
//DMA_FILL16(t, *VideoBuffer, SCREEN_WIDTH * SCREEN_HEIGHT);
DMAClearScreenMode3(t);
}
}
|
#24316 - poslundc - Sun Aug 01, 2004 2:14 pm
src, dest, and cnt are all u32s, not u16s.
Dan.
#24366 - Cearn - Mon Aug 02, 2004 8:38 am
REG_DMA3SAD and REG_DMA3DAD contain addresses (which, yes, are 32bits), not values. Off the top of my head, this should work:
Code: |
// note the pointers
inline void DMA_FILL16(const u16 *src, u16 *dst, u16 cnt)
{
REG_DMA3DAD = 0;
REG_DMA3SAD = (u32)src; // the registers hold addresses, so cast the pointers
REG_DMA3DAD = (u32)dst;
REG_DMA3CNT = cnt | DMA_SOURCE_FIXED | DMA_ENABLE | DMA_16;
}
// use:
u16 color= 0xdead;
// note the color reference
DMA_FILL16(&color, VideoBuffer, SCREEN_WIDTH*SCREEN_HEIGHT);
|
#24369 - poslundc - Mon Aug 02, 2004 1:42 pm
For the record, there is no reason to zero REG_DMA3DAD. You might, however, want to zero REG_DMA3CNT, in case there are timing issues with multiple calls to the DMA (known to happen).
Dan.
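A sketch of the sequence suggested here: stop the channel first, then program it. The register block is passed in as a pointer so the same code can be exercised on a PC; the struct and function names are assumptions, while the layout (source, destination, count/control starting at 0x040000D4 for DMA3) follows GBATEK.

```c
#include <stdint.h>

/* Register layout of one DMA channel; on the GBA, DMA3 starts at 0x040000D4. */
struct dma_regs {
    volatile uint32_t sad;  /* source address                       */
    volatile uint32_t dad;  /* destination address                  */
    volatile uint32_t cnt;  /* count (low half), control (high half) */
};

/* Fill `count` halfwords at dst_addr with the halfword stored at src_addr. */
static void dma_fill16(struct dma_regs *dma, uint32_t src_addr,
                       uint32_t dst_addr, uint16_t count)
{
    dma->cnt = 0;                     /* stop any transfer that is still armed */
    dma->sad = src_addr;
    dma->dad = dst_addr;
    dma->cnt = 0x81000000u | count;   /* enable | fixed source, 16-bit, immediate */
}
```

On hardware the call might look like `dma_fill16((struct dma_regs *)0x040000D4, (uint32_t)&color, 0x06000000, 240 * 160);`, with `color` being a variable that stays alive for the duration of the transfer.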
#25266 - Deanonious - Thu Aug 19, 2004 5:57 pm
Okay, I am trying to use the CpuFastSet method and I created a file CpuFastSet.s and included the ASM Code and then in my Header File I added:
extern void CpuFastSet(void*, void*, unsigned int);
as I am not using the typedefs in this project so I changed u32 to unsigned int and I tried calling the function like this:
CpuFastSet( &ColorBlack, VideoBuffer, ( ( 160 * 240 ) / 4 ) );
I am using Mode 3 if it matters but when I compile/link I get an error:
undefined reference to "CpuFastSet( void*, void*, unsigned )"
Dean
#25267 - poslundc - Thu Aug 19, 2004 6:05 pm
You need to link to your .s file by passing it in on the GCC command line. Also, make sure that there is a .global directive in your .s file specifying the function.
Dan (although if you're writing the assembly code anyway, you may as well write your own version that would be faster to access...).
#25286 - Deanonious - Fri Aug 20, 2004 2:07 am
Well I'm using Visual HAM w/out HAMLib and I verified that it is Assembling and Linking the .S file into the Binary but the Linker still claims that it's an Undefined Reference to CpuFastSet?
#25315 - poslundc - Fri Aug 20, 2004 1:53 pm
Is there a .global CpuFastSet directive in your .s file?
Dan.
#25348 - Deanonious - Sat Aug 21, 2004 3:21 am
Yes... here is the contents of my CpuFastSet.s File:
Code: |
.global CpuFastSet
.arm
.align 4
CpuFastSet:
swi 0xc0000
bx lr
|
It's basically cut and paste from an earlier post in this thread.
#41731 - Deanonious - Sun May 01, 2005 4:00 pm
Well I finally realized what I was doing wrong all this time, I had to declare it as:
extern "C" void CpuFastSet( void*, void*, unsigned int )
instead of just declaring it:
extern void CpuFastSet( void *, void *, unsigned int )
So it compiles and executes now. The problem I am having is that when I use the CpuFastSet function the screen clears like it's supposed to, but it has noise all over it: three sets of two "lines" (for lack of a better word) appear on the screen and will not go away.
Also, I was under the impression that the length in Mode 3 would be 0x2580, but that only seems to clear half the screen. Is 0x5160 the proper value, or do I have it totally wrong altogether?
Thanks in advance,
Dean
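The `extern "C"` fix described in this post is commonly packaged in a header that works from both C and C++. A sketch, assuming a hypothetical header name; the prototype is the one used earlier in the thread:

```c
/* cpufastset.h (hypothetical file name) */
#ifndef CPUFASTSET_H
#define CPUFASTSET_H

#ifdef __cplusplus
extern "C" {  /* C++ only: use the unmangled name defined in the .s file */
#endif

void CpuFastSet(void *src, void *dst, unsigned int lenmode);

#ifdef __cplusplus
}
#endif

#endif /* CPUFASTSET_H */
```

Without the guard, a C++ compiler looks for a name-mangled symbol that encodes the parameter types, which the assembly file does not define; that is consistent with the parameter list appearing in the linker error quoted earlier.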
#41839 - Cearn - Mon May 02, 2005 11:11 am
Is the noise somewhere around line 130? In that case, it might be the overlap with the sprite tiles. On the other hand
Deanonious wrote: |
CpuFastSet( &ColorBlack, VideoBuffer, ( ( 160 * 240 ) / 4 ) );
|
If you're still calling it like this, you could be copying IWRAM (with stack and global variables and all) to VRAM. You need to set the fixed-source flag for clears (bit 24). Also, 160*240/4 would be correct if it weren't for the fact that mode 3 uses halfwords, not bytes, which is why only half is erased. Finally, 2*0x2580 isn't 0x5160. This is hex we're talking about: 2*0x2580 = 0x4B00.
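The arithmetic can be spelled out in code. A minimal sketch with hypothetical names, computing the length/mode argument for filling a whole mode 3 screen:

```c
#include <stdint.h>

#define CFS_FILL (1u << 24)  /* bit 24: fixed source, i.e. a fill */

/* Length/mode argument for filling one full mode 3 screen. */
static uint32_t mode3_fill_lenmode(void)
{
    uint32_t bytes = 240u * 160u * 2u;  /* 16 bits per pixel = 76800 bytes */
    uint32_t words = bytes / 4u;        /* 19200 words = 0x4B00            */
    return words | CFS_FILL;            /* 0x01004B00                      */
}
```

19200 is a multiple of 8, so the count also satisfies the 8-word granularity mentioned earlier in the thread.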
#41920 - Deanonious - Tue May 03, 2005 3:05 am
Thank you very much Cearn, DekuTree64, and Poslundc for all of your help.
After brushing up on bit operators and the binary/hex number systems, and with your help, I finally got it running properly and I am fairly happy with the results.
It never ceases to amaze me how important things can be that I never bothered learning before. I used to write Windows software for a living, and everyone knows Windows programming is "lazy" (at least compared to hardware programming).
Sincerely,
Dean
#41921 - Deanonious - Tue May 03, 2005 3:12 am
Oh, I did have one last question... if I try to CpuFastSet with any Color other than Black it interleaves Black and the other Color every other Vertical line on the Screen.
Could it have something to do with the fact that I am calling it from my VBlank ISR?
Dean
#41929 - FluBBa - Tue May 03, 2005 10:11 am
CpuFastSet takes a pointer to a 32bit value, not a 16bit value.
Make sure you copy your 16bit value to both the high and low part of the 32bit source.
http://nocash.emubase.de/gbatek.htm#biosmemorycopy
Edit: removed the address part...
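Copying the 16-bit color into both halves of the 32-bit source can be done with a one-line helper. The name `dup16` is made up for this sketch.

```c
#include <stdint.h>

/* Repeat a 16-bit color in both halfwords of a 32-bit fill value. */
static uint32_t dup16(uint16_t color)
{
    return ((uint32_t)color << 16) | color;
}
```

If only the low halfword is set, the fill word for red is 0x0000001F. On the little-endian GBA that word covers two adjacent mode 3 pixels, one red and one black, which would explain the alternating vertical lines.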
_________________
I probably suck, my not is a programmer.
Last edited by FluBBa on Tue May 03, 2005 1:05 pm; edited 1 time in total
#41930 - Deanonious - Tue May 03, 2005 10:21 am
Thanks FluBBa, I did realize that already, as I was finding it rather impossible to set BIT24 when I had the variable defined as being 16-bit. It seems I misunderstood an earlier post that said something was a u16 not a u32, but looking back on it I believe now that they were talking about the DMA Copy method and not the CpuFastSet. If that is not what you were meaning then I am afraid that I misunderstood what you meant.
#41934 - FluBBa - Tue May 03, 2005 1:08 pm
Pointers are always 32-bit (on the GBA); it's what they point to that can be of different types.
#42017 - Cearn - Wed May 04, 2005 10:02 am
Instead of worrying about pointers and fill-flags, you could also write the function so that it takes care of these things for you
Code: |
@ DECL: void CpuFastFill(u32 wd, void *dst, u32 count);
.align 2
.global CpuFastFill
.code 16
.thumb_func
CpuFastFill:
push {r0} @ push wd on stack
mov r0, #128 @ \
lsl r0, r0, #17 @ -- add fill flag
orr r2, r2, r0 @ /
mov r0, sp @ point to stack (where wd is)
swi 0x0C
add sp, sp, #4 @ 'pop'
bx lr
|
This will behave more like one would expect: you give the word that you want to fill with, where you want to start filling, and how many words to fill.
Code: |
// in C file: fill with red (yes, doubling the color is still necessary)
void CpuFastFill(u32 wd, void *dst, u32 count);
CpuFastFill(0x001F001F, VideoBuffer, 240*160/2);
|
I also have 16 and 32bit memset and memcpy versions, which work for any number of words/halfwords, not just 32byte multiples. Under most circumstances they're about as fast as CpuFastSet (though with a little less overhead). If anyone's interested (crickets chirping...)
#42022 - Deanonious - Wed May 04, 2005 2:53 pm
Hey Cearn,
If you are willing to post those functions I would be happy to take a look at them, if nothing else I am sure that I could examine them and get some good Assembler pointers, I never bothered to learn Assembly Language before, but on a system like the GBA I can see the ultimate potential in finally picking it up.
And I didn't realize FluBBa was telling me that I had to double the Color pointer up until now, darn I'm really slow.
Dean
#42025 - Cearn - Wed May 04, 2005 5:20 pm
You can find the routines here.
#42160 - Deanonious - Fri May 06, 2005 10:30 pm
Thanks again Cearn