#121816 - brave_orakio - Thu Mar 15, 2007 1:48 am
Hi, I've been wondering where the compiler places variables and structures. If you declare something as const, is it placed in EWRAM or just in ROM? If you don't use const, does it go to IWRAM? Is this correct?
Also when copying from ROM to VRAM, which would be faster, DMA32 or DMA16? Since the bandwidth is 16-bits for this type of transfer, I thought that DMA16 is faster. Would this be correct?
_________________
help me
#121819 - tepples - Thu Mar 15, 2007 4:12 am
In -specs=gba.specs, const puts a variable in ROM.
In -specs=gba_mb.specs or on the DS, const puts a variable in EWRAM.
If you don't use const or specify a section using an attribute, the variable goes to IWRAM on the GBA or EWRAM on the DS.
The speed of DMA copies among ROM, EWRAM, and VRAM doesn't really depend on 16-bit vs. 32-bit access. A 16-bit copy looks like read, write, read, write; a 32-bit copy looks like read, read, write, write.
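A minimal sketch of the placement rules above. It assumes a devkitARM-style GBA linker script that provides a `.ewram` section; the variable names are made up for the example, and on any other toolchain the attribute is compiled out so the file still builds.

```c
#include <stdint.h>

/* Assumption: devkitARM-style GBA linker script providing a ".ewram" section.
   On any other target the macro expands to nothing so the file still builds. */
#if defined(__arm__) && defined(__ELF__)
#define EWRAM_DATA __attribute__((section(".ewram")))
#else
#define EWRAM_DATA
#endif

/* const: placed in ROM with gba.specs (EWRAM with gba_mb.specs). */
const uint16_t palette[4] = { 0x0000, 0x001F, 0x03E0, 0x7C00 };

/* No const, no attribute: IWRAM on the GBA. */
uint32_t frame_counter = 0;

/* Explicit section attribute: EWRAM. */
EWRAM_DATA uint8_t level_buffer[256] = { 1 };

/* Ordinary code can use all three without caring where they live. */
uint32_t next_frame(void)
{
    return ++frame_counter;
}
```

The point of the macro is that only the declaration changes; the code that reads or writes the variable stays the same.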
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#121820 - Ant6n - Thu Mar 15, 2007 4:12 am
When copying to VRAM it'd be faster to use 32-bit accesses.
Usually it would take 1 cycle for 16 bit and 2 cycles for 32 bit. But when VRAM is already being accessed by the gfx hardware, you have to add one, so you'd get 2 vs 3 cycles.
that's at least what gbatek says:
Quote: |
Region         Bus  Read     Write  Cycles
VRAM           16   8/16/32  16/32  1/1/2 *
GamePak ROM    16   8/16/32  -      5/5/8 **/***
GamePak Flash  16   8/16/32  16/32  5/5/8 **/***
GamePak SRAM   8    8        8      5     **
Timing Notes:
* Plus 1 cycle if GBA accesses video memory at the same time. |
...although, now i am not sure anymore, the plus 1 cycle might refer to the cycle, not the access
#121827 - brave_orakio - Thu Mar 15, 2007 5:12 am
Thanks to all! I didn't know that the specs had a part in where the variables were put! Also some interesting things about the DMA. I'd better check those specs as well. Thanks again!
#122414 - brave_orakio - Mon Mar 19, 2007 4:50 am
One more question, this time about data alignment. Would it be slower to use a non-const char or short, since it's placed in IWRAM? Since it's not word aligned, would there be a hit in speed/efficiency?
#122420 - tepples - Mon Mar 19, 2007 5:19 am
It depends on what you're using it for.
#122421 - Ant6n - Mon Mar 19, 2007 5:22 am
The wait for a u8, u16 or u32 is the same in IWRAM. Trying to load non-aligned memory is mostly going to give bad behaviour (see http://nocash.emubase.de/gbatek.htm#cpumemoryalignments), so don't. In general, I'd think word-sized variables are the fastest to work with, since there might be some overhead when using smaller types.
#122433 - brave_orakio - Mon Mar 19, 2007 8:39 am
I'm going to use the short in an array to store the positions of the sprites for sorting, so the max would be 128 entries. I want to use short instead of int because of the size. I may need 2 arrays to store the x and y positions.
#123436 - brave_orakio - Wed Mar 28, 2007 5:17 am
Another question! What is the size of a void pointer? Other sites say it has no size, but that can't be right! It still has to store the address, right?
#123439 - DekuTree64 - Wed Mar 28, 2007 5:40 am
brave_orakio wrote: |
Another question! What is the size of a void pointer? Other sites say it has no size, but that can't be right! It still has to store the address, right? |
Yeah, void pointers have no size. Addresses are always in bytes. The compiler multiplies by the size of the data type when adding offsets, but that doesn't necessarily mean that the base address is aligned to the same size. Of course, reading 16 or 32 bit units that aren't aligned won't work properly, but for example a 12 byte structure doesn't have to be aligned to 12 bytes.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#123445 - sgeos - Wed Mar 28, 2007 6:44 am
brave_orakio wrote: |
Another question! What is the size of a void pointer? Other sites say it has no size, but that can't be right! It still has to store the address, right? |
The pointer is the size of any other pointer: 32 bits on the GBA, but this will differ from system to system. The data it points to is of an unknown size. Because the size of the data it points to is unknown, you can't do anything with the address (until you cast it to a known type), so "it has no size".
-Brendan
#123446 - brave_orakio - Wed Mar 28, 2007 7:10 am
So conceptually it has no size? So in a structure
struct DMA
{
    void* src;
    void* dest;
    u32 foo;
};
do src and dest need to point to 32-bit addresses/variables to work correctly?
Whatever variable the void* points to, will it take on the size of that variable?
#123447 - Ant6n - Wed Mar 28, 2007 7:16 am
Whatever the void* points to, the compiler won't know. So you can't dereference that pointer. If you want to access the data, you have to tell the compiler what it is, and then it will know the size (unless it's an array of sorts).
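To make the distinction concrete, here is a small sketch (function names are made up for illustration): the void pointer itself has an ordinary pointer size, and the data behind it only becomes readable once a cast supplies the type.

```c
#include <stdint.h>
#include <stddef.h>

/* A void* only carries an address; the caller must say what lives there. */
uint16_t read_u16(const void *address)
{
    const uint16_t *typed = (const uint16_t *)address; /* now the size is known */
    return *typed;                                     /* reads 2 bytes */
}

/* The pointer itself does have a size: the same as a u16 pointer here. */
int void_pointer_has_size(void)
{
    return sizeof(void *) > 0 && sizeof(void *) == sizeof(uint16_t *);
}

static const uint16_t sample[2] = { 0x1234, 0xABCD };

/* Pass the array element as an untyped address and read it back. */
uint16_t read_second_sample(void)
{
    const void *untyped = &sample[1];
    return read_u16(untyped);
}
```

Writing `*untyped` directly would not compile, which is the "no size" that the other sites were talking about.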
#123450 - brave_orakio - Wed Mar 28, 2007 8:38 am
It's becoming a little clearer now! thanks! but I actually don't need the value that void points to(so no need to dereference), only its address for DMA copy.
The reason I asked in the first place was that I saw the structure that I gave above in the TONC tutorials, and I wondered how it could work when each DMA channel consists of three 32-bit registers and there are four channels (DMA 0-3). So to use the fourth DMA the code looked like:
dma_copy = (volatile DMA*)(address of DMA here);
dma_copy[3].src = foo;
dma_copy[3].dest = foo;
...
So I checked what the size of void* is and I got "no size", and I wondered how the above code would work if void* has no size. Please enlighten me on this, because I want dma_copy[3].src to point to an array of char (the image data, actually) and I thought that this would probably break if I used it like that.
#123456 - DekuTree64 - Wed Mar 28, 2007 9:05 am
brave_orakio wrote: |
So conceptually it has no size? So in a structure
struct DMA
{
    void* src;
    void* dest;
    u32 foo;
};
do src and dest need to point to 32-bit addresses/variables to work correctly?
Whatever variable the void* points to, will it take on the size of that variable? |
Oops, I think I misunderstood the question, and on top of that my answer was a bit poorly worded...
So to recap what the others have said, the pointer itself is 32 bits. Those 32 bits are used to store a memory address, such as 0x02000000 (that's the start of main RAM).
A u16* is basically the same internally, just a 32 bit address. But when you dereference a u16*, it reads 2 bytes starting at that address, because the u16 type is known to be 2 bytes. Also, if you offset a u16*, like when using the array operator
Code: |
u16 *pointer = (u16*)0x02000000;
u16 data = pointer[3]; |
...then the offset is logically in 2 byte units. So that statement loads the 2 byte value from address 0x02000006. The compiler does the multiply by 2 internally.
Now going back to void*, what happens when you try to dereference it? A compiler error. That's because the void type has no size (that is, the pointer doesn't know what kind of data it's pointing to, so it doesn't know how many bytes to read from the address).
Offsetting doesn't work either, for the same reason. In the u16* example, the offset is in 2 byte units. But with void*, the offset is in unknown size units, so you get an error.
And just ignore my comment about structures. I was talking about pointers to structures, but it was confusing and would take too long to explain in full detail.
As for the DMA registers, you're working directly with hardware at that point so the C/C++ concept of data types goes out the window. You have to tell it the unit size (16 or 32 bits) by setting different values in the control register.
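The struct-overlay trick from the question can be checked on a PC with a small model. This is only a sketch: `DmaRecord` and `dma_channel_address` are hypothetical names, and `uint32_t` stands in for the two pointer fields so that the layout matches the GBA (where pointers are 32 bits) even on a 64-bit host.

```c
#include <stdint.h>
#include <stddef.h>

/* Host-side model of one DMA channel: three 32-bit registers.
   On the GBA the first two fields would be pointers (also 32 bits there). */
typedef struct {
    uint32_t src;
    uint32_t dst;
    uint32_t cnt;
} DmaRecord;

#define DMA_BASE 0x040000B0u   /* address of the first DMA register */

/* Address that dma_copy[channel] would refer to: the compiler scales the
   index by sizeof(DmaRecord), exactly as it scales a u16* index by 2. */
uint32_t dma_channel_address(unsigned channel)
{
    return DMA_BASE + channel * (uint32_t)sizeof(DmaRecord);
}

/* Size of one channel's register block in bytes. */
size_t dma_record_size(void)
{
    return sizeof(DmaRecord);
}
```

With a 12-byte record, index 3 lands 36 bytes past the base, which is why `dma_copy[3]` reaches the fourth channel without any manual address arithmetic.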
#123461 - brave_orakio - Wed Mar 28, 2007 10:02 am
Ah! Now its all clear! Thanks much to all who answered!
#123488 - Cearn - Wed Mar 28, 2007 2:39 pm
One further point here. The reason I use 'void*' for DMA and memcpy/set is because it doesn't really matter what the types of the data pointed to are. All the routines need is source/destination addresses and an indication of how much to copy. Addresses correspond to pointers in C. The use of 'void*' says you don't care about the datatype of the data that is stored there. Using 'void*' (and 'const void*' for the source) for copy routines is useful because you won't have to explicitly cast when you make the assignment. Looks cleaner and it saves the user from a little extra work.
Code: |
u8 array[...];
u16 *ptr= array; // ERROR: no implicit conversion from u8* to u16*
u16 *ptr= (u16*)array; // Fine, explicit cast.
void *ptr= array; // Fine, any object pointer converts to void* implicitly.
|
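As a sketch of why `void*` parameters keep call sites clean, consider a hypothetical halfword copy routine (`copy16` is an invented name, not a Tonc or libgba function). The casts happen once, inside the routine, where the unit size is known.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical halfword copy; both buffers must be 2-byte aligned. */
void copy16(void *dst, const void *src, size_t halfwords)
{
    uint16_t *d = (uint16_t *)dst;
    const uint16_t *s = (const uint16_t *)src;
    while (halfwords--)
        *d++ = *s++;
}

static const uint16_t tile_row[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };
static uint16_t vram_mirror[4];   /* stand-in for a destination in VRAM */

/* No casts at the call site: any object pointer converts to void*. */
uint16_t copy_demo(void)
{
    copy16(vram_mirror, tile_row, 4);
    return vram_mirror[3];
}
```

The same routine would accept a struct array or a word array as either argument, again without a cast by the caller.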
#123679 - brave_orakio - Fri Mar 30, 2007 5:50 am
I'm learning a lot in this forum! By the way great work on the TONC tutorials Cearn!
#126428 - brave_orakio - Mon Apr 23, 2007 5:30 am
Another question here!
What happens when the variables aren't aligned?
I had a piece of code that would compile in release 18 of devkitARM but would not compile in the later release. I can't remember the code anymore (hard drive got fried) but I remember that the variables weren't aligned.
#126437 - chishm - Mon Apr 23, 2007 6:28 am
The compiler normally makes them aligned. If you force them to be unaligned, then you'll end up with corrupt data. The effects depend on whether you are reading or writing unaligned memory.
_________________
http://chishm.drunkencoders.com
http://dldi.drunkencoders.com
#126447 - keldon - Mon Apr 23, 2007 8:36 am
Depends on why it was not compiling, could be anything.
#126449 - brave_orakio - Mon Apr 23, 2007 8:42 am
I read that release19 and up didn't align the variables automatically.
I didn't force unalign the variables though. Would this stop the code from compiling?
#129632 - brave_orakio - Thu May 24, 2007 1:50 pm
Another question is, when do I align data? Would it be proper to align say an array of char(image data) or do we just align structs? Or can I do something like this:
struct
{
u32
u16
u16(padding)
}
#129633 - tepples - Thu May 24, 2007 2:08 pm
You can align a char array. But for at least 4-bit image data, I'd recommend making it a u32 array anyway, as a 32-bit data type is convenient for representing an 8x1 pixel slice of a tile. Besides, your converter should output big blocks of read-only data as assembly language (*.s) files, not C (*.c, or incorrectly *.h) files, which take much longer and much more PC RAM to compile.
#129811 - brave_orakio - Sun May 27, 2007 4:08 pm
would it also have an effect on runtime performance if I use a .c instead of a .s file using a DMA copy to vram? And when do you recommend using align?
#129818 - tepples - Sun May 27, 2007 4:47 pm
brave_orakio wrote: |
would it also have an effect on runtime performance if I use a .c instead of a .s file using a DMA copy to vram? |
No, but the compile time performance is worth it, especially once you're rebuilding 14 MB of data that you got back from your artists, level designers, and musicians.
Quote: |
And when do you recommend using align? |
Whenever the data needs to be aligned. If you're doing a 16-bit or 32-bit copy from or to an 8-bit buffer, or you're doing a 32-bit copy from or to a 16-bit buffer, you need attribute aligned. In what cases do you want to know whether or not the copy meets these criteria?
#129845 - brave_orakio - Mon May 28, 2007 5:02 am
I get it! Copying from one memory area to another with a different bus width requires aligning? So copying from IWRAM to OAM or ROM to VRAM will require alignment?
#129860 - Cearn - Mon May 28, 2007 12:10 pm
No. What matters is which datatypes you read or write in, not the memory areas. Data types have 'natural' alignments, indicated by their size: halfword reads/writes require 2-byte alignment, word reads/writes require 4-byte alignment.
The compiler knows the natural alignments and accounts for this when assigning addresses to your data. Alignment problems only occur if you cast pointers from a small type to larger type: byte-pointers to word-pointers, for example. Byte arrays aren't automatically aligned to word addresses, so interpreting them as word-arrays may be problematic. If you never cast to a larger type, there can be no problems.
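A short sketch of the byte-array case described above, using the GCC/Clang `aligned` attribute. The array contents and function names are invented for the example; the check only inspects the address and does not read the data as words.

```c
#include <stdint.h>

/* Force word alignment on a byte array so it may safely be treated as words.
   __attribute__((aligned)) is a GCC/Clang extension. */
static const uint8_t tile_bytes[8] __attribute__((aligned(4))) =
    { 0x10, 0x32, 0x54, 0x76, 0x98, 0xBA, 0xDC, 0xFE };

/* Nonzero if the address is a multiple of 4. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* The array start is word aligned because of the attribute. */
int tile_bytes_are_word_aligned(void)
{
    return is_word_aligned(tile_bytes);
}

/* One byte further in, the address can no longer be used for word access. */
int odd_address_is_word_aligned(void)
{
    return is_word_aligned(tile_bytes + 1);
}
```

Without the attribute the first check may still happen to pass, but nothing guarantees it, which is what makes the cast to a larger type risky.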
#129867 - tepples - Mon May 28, 2007 2:57 pm
Cearn wrote: |
If you never cast to a larger type, there can be no problems. |
Except that if the data is not aligned, memcpy() assumes that the destination is byte writable. Not only does this decrease copy speed, but it also screws up the data copied to VRAM.
#129873 - Cearn - Mon May 28, 2007 3:24 pm
tepples wrote: |
Cearn wrote: | If you never cast to a larger type, there can be no problems. |
Except that if the data is not aligned, memcpy() assumes that the destination is byte writable. Not only does this decrease copy speed, but it also screws up the data copied to VRAM. |
memcpy() effectively casts to words if possible. It's implicit, but it's still casting.
#129932 - brave_orakio - Tue May 29, 2007 5:53 am
I see. So casting is dangerous if you don't align data. How would this affect structures though? Or an array of structures, for example buffered OAM data? Oh, and since assembly was mentioned, is there a way to create a structure in assembly? Like an array of buffered OAM data?
#129942 - Cearn - Tue May 29, 2007 10:54 am
brave_orakio wrote: |
I see. So casting is dangerous if you don't align data. How would this affect structures though? Or an array of structures. For example buffered OAM data. |
Pre-r19, aggregates (i.e., structs, classes, unions and in this case files too) were always word-aligned. In r19 and after, it seems that they are aligned to their largest member. An OAM struct is usually defined as 4 halfwords, so that'd mean 2-byte alignment. An unfortunate consequence for struct copies is that if they're not word-aligned, memcpy() is used even if the struct is very small. This is not only slow, but memcpy() works with byte copies if the size is less than 16, which won't work for OAM. It might be best to add alignment attributes there as well, but again it depends on how you intend to use the things.
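The effect of adding an alignment attribute to such a struct can be sketched as follows. The struct and function names are illustrative rather than taken from any particular header, and the attribute and `__alignof__` are GCC/Clang extensions.

```c
#include <stdint.h>
#include <stddef.h>

/* OAM entry as four halfwords: natural alignment is only 2 bytes. */
typedef struct {
    uint16_t attr0, attr1, attr2;
    int16_t  fill;
} ObjAttrPlain;

/* Same members, but the type is forced to word alignment. */
typedef struct {
    uint16_t attr0, attr1, attr2;
    int16_t  fill;
} __attribute__((aligned(4))) ObjAttrAligned;

size_t plain_alignment(void)  { return __alignof__(ObjAttrPlain); }
size_t forced_alignment(void) { return __alignof__(ObjAttrAligned); }
size_t aligned_size(void)     { return sizeof(ObjAttrAligned); }
```

The size stays at 8 bytes, so an array of the aligned type has the same layout as before; only the starting address is constrained.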
brave_orakio wrote: |
Oh, and since assembly was mentioned, is there a way to create a structure in assembly? Like an array of buffered OAM data? |
No structures, IIRC. But you can still create data in a struct-like fashion and interpret the data as structs. The declaration in the header file determines how the data is used.
#130026 - brave_orakio - Wed May 30, 2007 8:55 am
I get it now! Thanks a lot! I will also look into assembly and how to declare and interpret variables like a structure.
#130035 - Cearn - Wed May 30, 2007 11:07 am
brave_orakio wrote: |
I get it now! Thanks a lot! I will also look into assembly and how to declare and interpret variables like a structure. |
Small example of that:
Code: |
// Header file
typedef struct foo
{
u32 a;
u16 b;
u8 c;
u32 d;
} foo;
extern const foo bar;
extern const foo xyzzy;
|
Code: |
@ Assembly file
.section .rodata @ Put it in ROM
.align @ Align to words
.global bar @ Make it visible from outside
bar:
.word 0x03020100 @ bar.a
.hword 0x0504 @ bar.b
.byte 0x06 @ bar.c
.byte 0xFF @ Padding for alignment of bar.d
.word 0x0B0A0908 @ bar.d
.align @ Align to words
.global xyzzy @ Make it visible from outside
xyzzy:
.word 0x03020100, 0xFF060504, 0x0B0A0908
|
In the header file, a struct called foo is defined and bar and xyzzy are declared as instances of that type. Their definitions are in the assembly file. bar is written in a way that corresponds to the struct format. Please note the padding byte that's required to make the d member word-aligned. This is the way C expects things (unless you use the packed attribute). xyzzy is just formatted as an array of words, but the formatting here really doesn't matter. In memory, bar and xyzzy look exactly the same.
Alternatively, you could write structs to a binary file and use the .incbin directive to include it in an assembly file. Or use a bin2o tool to convert it to an object file, as is done in a number of libgba/libnds examples.
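The padding byte in the assembly version can be confirmed from the C side with `offsetof`. This sketch repeats the struct with `<stdint.h>` types in place of the u32/u16/u8 typedefs; the two helper functions are only there for illustration.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct foo {
    uint32_t a;   /* offset 0 */
    uint16_t b;   /* offset 4 */
    uint8_t  c;   /* offset 6 */
                  /* offset 7: one padding byte inserted by the compiler */
    uint32_t d;   /* offset 8 */
} foo;

/* Where the compiler actually placed the last member. */
size_t offset_of_d(void) { return offsetof(foo, d); }

/* Total size: three words, matching the three .word values of xyzzy. */
size_t size_of_foo(void) { return sizeof(foo); }
```

If hand-written data and the C declaration ever disagree, comparing these values against the byte counts in the assembly file is a quick way to find the mismatch.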
#130051 - brave_orakio - Wed May 30, 2007 2:59 pm
Ah, I see. Same values, just declared in different ways. Quite flexible this assembly. Thanks for the example!
#130538 - gmiller - Mon Jun 04, 2007 10:42 pm
If you use DMA to copy data, then the chunk size (16 or 32 bit) raises alignment concerns if your data is not 2-byte (16-bit) or 4-byte (32-bit) aligned. I ran into this when doing a DMA copy of OAM memory with a 32-bit chunk size. The structures are filled with shorts (16 bit), so their starting address is only 2-byte aligned. I saw OAM values that did not match my copy values until I forced 4-byte alignment or changed the chunk size to 16 bit.
#130653 - brave_orakio - Wed Jun 06, 2007 4:57 am
Doing a DMA copy is like casting to a bigger variable type isn't it? I see why alignment is required for this type of copy.
#130670 - gmiller - Wed Jun 06, 2007 12:50 pm
DMA copies can be done in 16-bit (2-byte) or 32-bit (4-byte) "chunks", so not only does your data need to have a size that is a multiple of the chunk size, its address must also be aligned on a chunk-size boundary. For 2-byte alignment the low-order bit of the address must be zero, and for 4-byte alignment the low-order 2 bits must be zero. Of course other low-order bits could be zero too, but the minimums are as stated.
Your reference to up-casting is the correct concept. In general, though, up-casting is risky from an alignment point of view, while down-casting is safe: if something is 4-byte aligned then it is also 2-byte aligned. If you can declare the variable using a type that has the alignment you require (short - 2 byte, int/long - 4 byte, long long - 8 byte) with the size that you need, then the compiler will align the data for you and you can down-cast the address of the variable to anything you want.
For example:
long foo[256]; // 256 elements 4 bytes each 4 byte aligned, or 1024 bytes
char * stuff = (char *) foo; // or (char *) &foo[0]
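The two conditions (address bits and size multiple) can be turned into a pre-flight check, sketched below. `dma_args_ok` is a hypothetical helper, not part of libgba or Tonc; it only validates the arguments and performs no transfer.

```c
#include <stdint.h>
#include <stddef.h>

/* chunk is the DMA unit size in bytes: 2 or 4. */
int dma_args_ok(const void *src, const void *dst, size_t bytes, size_t chunk)
{
    uintptr_t mask = (uintptr_t)chunk - 1;   /* 1 for 16-bit, 3 for 32-bit */
    return ((uintptr_t)src & mask) == 0      /* low-order bits of source   */
        && ((uintptr_t)dst & mask) == 0      /* low-order bits of dest     */
        && (bytes % chunk) == 0;             /* size is a whole number of chunks */
}

static uint32_t source_words[4];   /* declared as words, so word aligned */
static uint32_t dest_words[4];

/* Word arrays satisfy the 32-bit requirements. */
int aligned_case(void)
{
    return dma_args_ok(source_words, dest_words, sizeof source_words, 4);
}

/* Starting 2 bytes in leaves the address only halfword aligned. */
int misaligned_case(void)
{
    const uint8_t *shifted = (const uint8_t *)source_words + 2;
    return dma_args_ok(shifted, dest_words, 8, 4);
}
```

The shifted pointer would still pass with a chunk size of 2, which mirrors the OAM case above where switching to 16-bit transfers fixed the problem.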
#137814 - brave_orakio - Thu Aug 16, 2007 1:41 am
Here I am again with another question.
I just noticed that registers are defined as for example:
#define REG_X *(vu32*)0x4000000
Is it a double pointer? Or does it mean that 0x4000000 is an address (the first *) and REG_X is pointing to it (the second *)?
#137815 - tepples - Thu Aug 16, 2007 2:11 am
Not a double pointer. The part in the parentheses is a cast. A lot of expressions in C can be read from the inside out:
0x04000000 - a value
(vu32 *)0x04000000 - the same value, reinterpreted as a pointer to vu32
*(vu32 *)0x04000000 - the vu32 value at this address
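The same cast-then-dereference pattern can be tried on a PC by pointing it at an ordinary variable instead of a hardware address. `fake_register`, `FAKE_REG` and `write_then_read` are stand-ins invented for this sketch.

```c
#include <stdint.h>

/* Stand-in for a hardware register so the pattern can run on a PC. */
static uint32_t fake_register;

/* Same shape as the GBA macros: take an address, cast it to a pointer to a
   volatile 32-bit value, then dereference. The outer parentheses keep the
   macro safe inside larger expressions. */
#define FAKE_REG (*(volatile uint32_t *)&fake_register)

uint32_t write_then_read(uint32_t value)
{
    FAKE_REG = value;      /* store through the pointer */
    return FAKE_REG;       /* load through the pointer  */
}
```

Because the macro expands to a dereferenced pointer, it behaves like a plain variable on both sides of an assignment.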
#137817 - brave_orakio - Thu Aug 16, 2007 3:42 am
Thank you!
one more, I forgot to add, how do I know if I require vu32 or vu16 in the #define? In general, is it vu32 for memory locations and 32 bit registers and vu16 for 16 bit registers?
#137820 - tepples - Thu Aug 16, 2007 4:18 am
brave_orakio wrote: |
In general, is it vu32 for memory locations and 32 bit registers and vu16 for 16 bit registers? |
Yes. In fact, there are still other aggregate data types used within the I/O for scrolling, affine transformation, timers, DMA, and the like. See the libgba headers (and to a lesser extent libnds headers) for some Stupid Struct Tricks(tm).
#137883 - brave_orakio - Fri Aug 17, 2007 2:10 am
Thanks again! but I just noticed something about the dma registers
They go like this:
#define REG_DMA0SAD *(vu32*)0x40000B0
#define REG_DMA0DAD *(vu32*)0x40000B4
#define REG_DMA0CNT *(vu32*)0x40000B8
Aren't they 32 bits each? So why are they in offsets of +0x04?
#137884 - kusma - Fri Aug 17, 2007 2:32 am
brave_orakio wrote: |
Aren't they 32 bits each? So why are they in offsets of +0x04? |
Because 32 bit is the same as 4 bytes.
#137887 - brave_orakio - Fri Aug 17, 2007 3:23 am
Shouldn't it be +0x20 for 32 bits? Can you explain how the +0x01 is equal to 1 byte?
#137890 - Lick - Fri Aug 17, 2007 3:40 am
Memory access often goes in chunks as small as bytes. Therefore in C++ when you're using pointers, you will only be able to offset in byte-sized steps. Bits are less commonly used and not directly "point"-able with an address.
0x20 is 32 bytes.
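A quick way to see that addresses count bytes is to measure the distance between two neighbouring 32-bit values. In this sketch the array and the helper names are made up; the base address in the usage note is the one from the DMA defines above.

```c
#include <stdint.h>

static uint32_t three_registers[3];   /* stand-in for SAD, DAD, CNT */

/* Distance in bytes between neighbouring 32-bit elements. */
unsigned byte_stride(void)
{
    return (unsigned)((uintptr_t)&three_registers[1] -
                      (uintptr_t)&three_registers[0]);
}

/* Address of the n-th 32-bit register after a base address. */
uint32_t register_address(uint32_t base, unsigned index)
{
    return base + index * 4u;   /* 32 bits = 4 bytes */
}
```

With a base of 0x040000B0, indices 1 and 2 give 0x040000B4 and 0x040000B8, the same +0x04 steps as in the register list.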
_________________
http://licklick.wordpress.com
#137891 - brave_orakio - Fri Aug 17, 2007 4:02 am
Now I get it. We're not computing variables, we're computing memory offsets in bytes. Thanks!
#138344 - brave_orakio - Thu Aug 23, 2007 3:43 am
Alright another one, about how to align. When I align a variable, this error occurs:
error: expected '=', ',', ';', 'asm' or '__attribute__' before 'ALIGN4'
Also I have declared ALIGN4 as
#define ALIGN4 __attribute__((aligned(4)))
How is this done?
edit: another question is about DMA_copy16. Is the count in 32 bit chunks or is it in 16 bit chunks?
#138448 - elyk1212 - Fri Aug 24, 2007 7:25 am
brave_orakio wrote: |
Now I get it. We're not computing variables, we're computing memory offsets in bytes. Thanks!
#define REG_DMA0SAD *(vu32*)0x40000B0
#define REG_DMA0DAD *(vu32*)0x40000B4
#define REG_DMA0CNT *(vu32*)0x40000B8
|
Well, these are registers, that happen to be memory mapped for your hardware convenience. So, yeah, you're right, those are just memory addresses (but know that they point to specific use registers).
Know that *all* modern architectures are byte addressable (I say 'all' loosely, since right now I'm working on an 18-bit arch for my job, which is strange, really :( ). This makes more sense when you realize how many address lines it would take to access memory should it be bit addressable.
Example:
If you had 4 bytes of memory:
Byte addressable:
You would only need 2 address lines, since 2 address lines give 4 possibilities: {00, 01, 10, 11}.
Bit addressable:
You would need 5 address lines for all 32 bits. Think about that for a minute: {00000 - 11111 (31)} = 32 possibilities.
This is not huge since my example is small memory, but it grows significantly. Also, it does not make a whole lot of sense, from hardware stand point, as you have a bus that is X bits wide. If you were to only pull 1 at a time, it would be very wasteful.
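The address-line counts in the example follow from taking the base-2 logarithm of the number of addressable units, rounded up. A minimal sketch of that calculation (the function name is invented):

```c
/* Smallest number of address lines that can select `locations` distinct
   units, i.e. the ceiling of log2(locations). */
unsigned address_lines(unsigned long locations)
{
    unsigned lines = 0;

    if (locations <= 1)
        return 0;                 /* a single unit needs no address at all */

    locations--;                  /* highest index that must be representable */
    while (locations) {           /* count the bits of that index */
        lines++;
        locations >>= 1;
    }
    return lines;
}
```

Going from byte to bit addressing multiplies the number of units by 8, so it always costs 3 extra address lines regardless of memory size.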
Quote: |
error: expected '=', ',', ';', 'asm' or '__attribute__' before 'ALIGN4'
|
As for your error: it sounds like it could be something leading up to that code ("before 'ALIGN4'"), as I cannot see anything wrong with the macro.
I am guessing DMA16 would be the 16 bit version, name fitting, maybe in THUMB mode (?)
#138792 - brave_orakio - Wed Aug 29, 2007 3:44 am
Yep, I did something stupid to get the error. Thanks for the extra information!
#138793 - brave_orakio - Wed Aug 29, 2007 4:28 am
I just noticed this now, but does copying to memory using DMA significantly slow down the processor, or am I doing something wrong with the data alignment? I tried incrementing attribute 2 of OAM (for animation) and also tried using DMA (overwriting the previous frame with the next one without incrementing attribute 2), and I noticed a significant slowdown with DMA. This is only one sprite, by the way.
Also I noticed that when I use DMA, the sprite sometimes gets corrupted for a split second before returning to normal.
#138797 - elyk1212 - Wed Aug 29, 2007 6:11 am
I am rather new to GBA, so I am not absolutely sure as to all its eccentricities/etc. However, DMA typically does not put significant burden on the CPU. It can however cause contention on a bus that is being used for the transfer (so any logic ops that would use this bus must wait). Not sure how that ties in with what you are doing exactly.
A good read:
http://en.wikipedia.org/wiki/Direct_memory_access
But anyhow, I have never seen what you are talking about with image tearing/corruption, except when I didn't use interrupts. It likely has something to do with you changing sprite gfx memory outside of VBlank, while the screen is still being drawn. Therefore, half the image is the old frame, half the new frame.
I really recommend checking for the proper time to change gfx memory (waitforVblank() or whatever). Also, consider interrupts, unless there is some good reason not to use them in your project.
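For the polling variant, a rough sketch follows. It assumes the GBA's 228 scanlines per frame, of which the first 160 are visible, and that the caller passes a pointer to the scanline counter (REG_VCOUNT on the GBA); the function names are invented, and an interrupt-based wait is usually the better choice on real hardware.

```c
#include <stdint.h>

#define VISIBLE_LINES 160u   /* scanlines 0..159 are drawn    */
#define TOTAL_LINES   228u   /* scanlines 160..227 are VBlank */

/* Nonzero if the given scanline number falls inside VBlank. */
int in_vblank(unsigned line)
{
    return line >= VISIBLE_LINES && line < TOTAL_LINES;
}

/* Busy-wait until the start of the next VBlank. `vcount` is assumed to point
   at the hardware scanline counter, which changes on its own. */
void wait_for_vblank(const volatile uint16_t *vcount)
{
    while (in_vblank(*vcount))  { }   /* leave the current VBlank, if any */
    while (!in_vblank(*vcount)) { }   /* wait for the next one to begin   */
}
```

Graphics memory updates placed right after the wait then run while nothing is being drawn, which avoids the half-old, half-new frame.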
#138884 - brave_orakio - Thu Aug 30, 2007 4:45 am
Whoops! I was doing something stupid so my index went out of bounds(the corrupted picture) and my dma was copying every iteration of the loop even though the animation frame did not change(The slowdown)! thanks for the ideas though!
Anyway, when is it efficient to use DMA instead of an stmia/ldmia assembly routine? For how large a block of data is it still efficient?