gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

ASM > Is Carmack Crazy?

#4018 - regularkid - Sun Mar 16, 2003 8:32 am

Ok, I'm currently making a game using a wolfenstein-like engine (raycasting). Now, I have Carmack's code from wolfenstein but can't seem to figure out how he does the drawing. Here is the problem I'm having:

If I am close up to a wall, then the wall obviously fills the entire screen. In order to draw each wall strip (2 bytes wide each) I need to figure out each wall pixel's texture coordinate. Now, I have optimized this as much as I can think of using huge look-up tables and pure assembly but can't seem to use fewer than 5 or 6 instructions per pixel. Since I'm using mode 4 (240x160), this equals around 96000 to 115200 instructions to draw the entire screen!!! This is obviously too much and my frame rate is really bad! But, for the life of me I can't figure out Carmack's drawing code! He must be doing something crazy to get the speed he needs. The GBA port of wolfenstein runs very smoothly (I'm estimating 30-60 fps), so it's definitely possible to draw all this and still have game logic going on.

So, my question is:

Does anyone have any idea how Carmack is able to draw so much and keep a good framerate? Or does anyone have any suggestions for speeding up the drawing part of a wolfenstein-like raycasting engine, since that is the most costly part? Thanks soooooo much. Any help would be absolutely wonderful!
_________________
- RegularKid

#4026 - Dev - Sun Mar 16, 2003 1:24 pm

Don't use C. Problem solved.

#4028 - Torlus - Sun Mar 16, 2003 3:08 pm

You said that you used 5-6 instructions per pixel. For a wolf-like raycaster, I think you may be able to use only 3 instructions per pixel (without considering for the moment that you use mode 4, where one byte = one pixel, and its memory alignment drawback):

- one "ldr" , to get texture colour value
- one "str", for setting the previous value to pixel's position
- one "add", for your texture coordinate updating

Your extra instructions may be a compare and a branch. You can avoid them by unrolling your loop 160 times and setting the pc register according to the number of pixels you have to draw in a line.
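As a rough illustration of the unrolled-loop idea, here is a minimal C sketch. It is not code from this thread: `draw_run`, `MAX_RUN`, `TEX_SIZE` and the 16.16 fixed-point texture coordinate are all assumptions made for the example, the unroll depth is 8 rather than 160 to keep it short, and it writes to an ordinary byte buffer rather than GBA VRAM. The `switch` stands in for setting the pc register: execution enters the unrolled steps at the case matching the pixel count and falls through, so there is no per-pixel compare or branch.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_RUN = 8, TEX_SIZE = 64 };  /* hypothetical sizes for the sketch */

/* One unrolled step: load a texel, store it, advance the texture coordinate.
   u and du are 16.16 fixed point (an assumption, not from the thread). */
#define STEP() do { *dst = tex[(u >> 16) & (TEX_SIZE - 1)]; \
                    dst += stride; u += du; } while (0)

/* Draw `count` pixels by jumping into the middle of an unrolled sequence. */
void draw_run(uint8_t *dst, int stride, const uint8_t *tex,
              uint32_t u, uint32_t du, int count)
{
    assert(count >= 0 && count <= MAX_RUN);
    switch (count) {            /* entry point depends on the pixel count */
    case 8: STEP();             /* every case falls through to the next   */
    case 7: STEP();
    case 6: STEP();
    case 5: STEP();
    case 4: STEP();
    case 3: STEP();
    case 2: STEP();
    case 1: STEP();
    case 0: break;
    }
}
```

For example, `draw_run(buf, 1, tex, 0, 2u << 16, 5)` copies every second texel into the first five bytes of `buf` and leaves the rest untouched.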

Hope this helps

#4042 - tepples - Sun Mar 16, 2003 9:22 pm

regularkid wrote:
But, for the life of me I can't figure out Carmack's drawing code! He must be doing something crazy to get the speed he needs. The GBA port of wolfenstein runs very smoothly (I'm estimating 30-60 fps)

A 30fps frame rate means drawing the screen every other frame. Because there are 280,896 cycles in a frame, you have about 500,000 cycles to draw the screen after taking into account game logic, audio mixing, and the like. I haven't seen Wolf3d for GBA, but I know Doom cheats by 1. using low detail horizontally and 2. using the bottom few scanlines for a status bar. This means it's rendering only about 120x128, or 15360 pixels per screen.
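To make that budget concrete, the arithmetic can be sketched in a few lines of C. The 280,896 cycles per frame, the roughly 500,000-cycle drawing budget and the 120x128 view are the figures quoted above; the function name `cycles_per_pixel` is made up for this example, and integer division rounds the result down.

```c
#include <assert.h>

enum { CYCLES_PER_FRAME = 280896 };   /* figure quoted in the post */

/* Whole cycles available per pixel for a given drawing budget and view size. */
long cycles_per_pixel(long budget, int width, int height)
{
    assert(width > 0 && height > 0);
    return budget / ((long)width * height);
}
```

With a 500,000-cycle budget, `cycles_per_pixel(500000, 120, 128)` gives 32 cycles per pixel for the reduced view, against 13 for a full 240x160 screen.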

Quote:
Does anyone have any idea how Carmack is able to draw so much and keep a good framerate? Or does anyone have any suggestions for speeding up the drawing part of a wolfenstein-like raycasting engine, since that is the most costly part?

Are you sure that the texture mapping part is the slowest? When I played FaceBall 2000 on the Super NES, it slowed down when walls were far away from me, indicating that the slowdown was from the raycasting process itself rather than drawing the results to the screen. I've glanced at the Wolf3d source code, and I saw comments in there that even for simple worlds made of square prisms, a BSP would be faster than raycasting.

Torlus: If you claim that 'ldr', 'str', and 'add' are enough for a tmapper's inner loop, then what about updating the pointer to the destination pixel (another add)? And what about the fact that 'ldr' takes 3 cycles and 'str' takes 2? And will this whole unrolled loop fit in IWRAM along with an audio mixer, etc?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#4044 - regularkid - Sun Mar 16, 2003 9:58 pm

tepples:

good point. I forgot that I wouldn't have to draw the bottom part of the screen due to the status bar.

Thanks for the cycle numbers (280896 per frame). Can you point me to a document that shows what each instruction takes (ldr = 3, str = 2)? That would help me a lot! Thanks
_________________
- RegularKid

#4053 - tepples - Mon Mar 17, 2003 3:17 am

regularkid wrote:
Can you point me to a document that shows what each instruction takes

Try the ARM documentation section in http://www.gbadev.org/docs.php

General rules:

any non-executed conditional instruction (ARM only): 1 cycle
branch: 3 cycles
ldr: 3 cycles plus wait states
ldm: 2+n cycles plus wait states
str: 2 cycles plus wait states
stm: 1+n cycles plus wait states
swp (ARM only): 4 cycles
mul: 2 to 5 cycles depending on second(?) operand
mla: one more than mul
mull (ARM only), mlal (ARM only): one more than mul or mla
other instructions: 1 cycle
write to r15 (PC): 2 extra cycles
shifting a register by the contents of a register: 1 extra cycle
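As a hedged example of using these rules, the following C sketch adds up the cost of one pass through an inner loop from its instruction mix. It assumes 0 wait states and uses only the figures listed above; `loop_cycles` and the constant names are invented for the illustration.

```c
#include <assert.h>

/* Cycle costs from the general rules above, assuming 0 wait states. */
enum { CYC_LDR = 3, CYC_STR = 2, CYC_ALU = 1, CYC_BRANCH = 3 };

/* Cycles for one loop iteration, given how many of each instruction it has. */
int loop_cycles(int n_ldr, int n_str, int n_alu, int n_branch)
{
    assert(n_ldr >= 0 && n_str >= 0 && n_alu >= 0 && n_branch >= 0);
    return n_ldr * CYC_LDR + n_str * CYC_STR
         + n_alu * CYC_ALU + n_branch * CYC_BRANCH;
}
```

By this estimate an unrolled ldr/str/add step costs `loop_cycles(1, 1, 1, 0)`, or 6 cycles. A second add for the destination pointer makes it 7, and a rolled loop with a compare and a taken branch on top of that comes to 11.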

Another reminder: If you're using cycle-timed code, run it in 0 wait state memory. IWRAM is the only writable 32-bit 0 wait state memory.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#4056 - Maddox - Mon Mar 17, 2003 6:25 am

Dear regularkid,
Carmack used compiled slivers to draw the slices of the screen in Wolfenstein3D. That is to say that he compiled code from graphics so that it would draw itself. Since this code was made for 286/386's with no cache, this was the best way to go. It's very fast. Yes, I have done this on GBA but I'm not sure if it's faster than an IWRAM ARM loop streaming stuff off of the ROM. Plus it bloats the size of the ROM if you have sh*tloads of graphics. Heh, I made a sprite compiler -- I kick *ss!

Dear tepples,
I believe Torlac, or whatever, must have had in mind that one could use the post-increment addressing mode of the ARM instruction set, such as:

Code:
strh r0, [r1], #2    @ store the halfword in r0 at [r1], then add 2 to r1


Just my two cents.
_________________
You probably suck. I hope you're is not a game programmer.

#4080 - regularkid - Tue Mar 18, 2003 1:34 am

Maddox,

Cool. That sounds fast. Can you please explain the compiled code technique? I'm not sure I understand. Does he create separate code functions for each possible wall slice? That would seem wasteful? Thanks!
_________________
- RegularKid

#4132 - Maddox - Thu Mar 20, 2003 6:29 am

regularkid,
At present I do not think this technique has much merit on GBA. In Wolfenstein3D, Carmack or whoever allocated code buffers and generated code for each size of each sliver, I believe. Remember when you resized the view in Wolf3D and it said it was doing some calculations? Part of those calcs were recompiling the slivers. There aren't that many textures in Wolf3D, if you noticed, and they had 2MB of XMS to work with. If only there were 2MB of memory on GBA! Screw you, Nintendo!
So to answer your question: yes, they did make separate functions for every sliver of every texture. Someone correct me if I'm wrong.
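The general idea can be sketched in C, with one big caveat: this is a table-driven stand-in, not Wolfenstein 3D's code and not real code generation. An actual compiled sliver would be a run of machine instructions with the source and destination offsets baked in. Here a precomputed per-height table plays that role, so the scaling math is done once per height, although the draw loop keeps its per-pixel overhead. The names `Sliver`, `compile_sliver` and `run_sliver`, the 64-texel column and the 160-pixel maximum are assumptions for the example, and the sketch specializes by height only.

```c
#include <assert.h>
#include <stdint.h>

enum { TEX_H = 64, MAX_H = 160 };   /* hypothetical texture and screen heights */

/* Stand-in for a compiled sliver: the source texel for each output row. */
typedef struct {
    int     height;
    uint8_t src[MAX_H];
} Sliver;

/* "Compile" a sliver for one on-screen height (done once, e.g. at startup). */
void compile_sliver(Sliver *s, int height)
{
    assert(height > 0 && height <= MAX_H);
    s->height = height;
    for (int y = 0; y < height; y++)
        s->src[y] = (uint8_t)(y * TEX_H / height);  /* nearest texel above */
}

/* "Run" the sliver: no scaling math is left in the drawing loop. */
void run_sliver(const Sliver *s, const uint8_t *texcol, uint8_t *dst, int stride)
{
    for (int y = 0; y < s->height; y++)
        dst[y * stride] = texcol[s->src[y]];
}
```

Generating one such table (or, in the original technique, one routine) per height is what costs memory and setup time, which matches the recalculation pause described above when the view is resized.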

-King Maddox
_________________
You probably suck. I hope you're is not a game programmer.