#153945 - simonjhall - Tue Apr 08, 2008 5:23 pm
I'm writing a perspective-correct software renderer at the moment and I'm trying to render a lot of surfaces that are at a distance and at an angle to the camera. I'm in need of some filtering! Bilinear ain't hacking it, and I don't really think trilinear'll make too much difference.
Any help?
So my regular pixel putting loop looks like this
Code: |
/* inner loop of the span renderer: 1/z, u/z and v/z are interpolated
   linearly in screen space, and the true u,v recovered per pixel */
while (width > 0)
{
    float z = 1.0f / one_over_z;
    float u = u_over_z * z;
    float v = v_over_z * z;
    output_pixel = texture[(int)v][(int)u];
    one_over_z += d_one_over_z_dx;
    u_over_z += d_u_over_z_dx;
    v_over_z += d_v_over_z_dx;
    width--;
} |
Bilinear would involve
Code: |
int low_u = (int)floorf(u);
int low_v = (int)floorf(v);
float u_ratio = u - low_u;
float v_ratio = v - low_v;
float one_minus_u_ratio = 1.0f - u_ratio;
float one_minus_v_ratio = 1.0f - v_ratio;
/* ...and then the final pixel is the weighted blend of the four
   surrounding texels: */
output_pixel = (texture[low_v][low_u] * one_minus_u_ratio + texture[low_v][low_u + 1] * u_ratio) * one_minus_v_ratio
             + (texture[low_v + 1][low_u] * one_minus_u_ratio + texture[low_v + 1][low_u + 1] * u_ratio) * v_ratio; |
What do I do for anisotropic filtering? I can't seem to find a lot online about how to code this, like where do I put my samples? I know it's dependent on the angle of the surface that I'm trying to render.
Ta guys :-)
EDIT: not sure if I've put this in the right forum - is graphics appropriate?
_________________
Big thanks to everyone who donated for Quake2
#153948 - kusma - Tue Apr 08, 2008 5:37 pm
For anisotropic filtering, you should basically find the view-direction in texture-space, and sample along the 2d-line that generates. It's pretty damn expensive to do in software, and I can't help feeling that it's not quite the right thing to do here. Perhaps the mipmap-variation where you store each possible combination of power-of-two reduction over both axes (or lazy-evaluate it and cache the result to save memory) is better and gives good enough results? Also, keep in mind that anisotropic filtering is an improvement on bilinear, not a replacement for it.
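In code, the line-sampling idea boils down to something like this - an untested sketch, where sample_bilinear() stands in for the bilinear fetch from your earlier post and the derivative parameters are assumed to come from your rasterizer:
Code: |
/* untested sketch: average several bilinear taps along the longer
   footprint axis in texture space */
extern float sample_bilinear(float u, float v);

float sample_aniso(float u, float v,
                   float dudx, float dvdx,   /* footprint along screen x */
                   float dudy, float dvdy,   /* footprint along screen y */
                   int taps)
{
    /* walk along whichever screen axis stretches texture space more */
    float lu = dudx, lv = dvdx;
    if (dudy * dudy + dvdy * dvdy > dudx * dudx + dvdx * dvdx)
    {
        lu = dudy;
        lv = dvdy;
    }

    float sum = 0.0f;
    for (int i = 0; i < taps; i++)
    {
        float t = (i + 0.5f) / taps - 0.5f;   /* -0.5 .. +0.5 */
        sum += sample_bilinear(u + t * lu, v + t * lv);
    }
    return sum / taps;
} |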
#153962 - simonjhall - Tue Apr 08, 2008 6:38 pm
Ah, just the person I was hoping would reply!
I'm sort of getting the problem that's in http://en.wikipedia.org/wiki/Image:Anisotropic_compare.png just not quite that severe; that horizontal smearing effect.
I'll definitely try mip mapping again tomorrow (I had a dabble a few months back), and especially those non-uniform mip map levels to see what happens. I figured AF'd be pretty expensive, and it'd definitely chew the memory system a bit. I was just going to try all the options, then pick the one that gives the best quality/performance ratio.
So...if you were to code this, what would you do? Ta mate :-)
PS: what other options do I have to improve my texture mapped results other than mip maps/bilinear/trilinear/AF?
_________________
Big thanks to everyone who donated for Quake2
#153973 - kusma - Tue Apr 08, 2008 8:32 pm
Well, that effect is basically due to mip-mapping; sampling more times along the axis that stretches the texels the most allows you to pick a higher resolution mipmap without getting aliasing. It does not apply without mipmapping.
Quote: |
I'll definitely try mip mapping again tomorrow (I had a dabble a few months back), and especially those non-uniform mip map levels to see what happens.
|
I don't want to sound like a critic here, but do you know why your artifacts appear, or are you just randomly stabbing at the problem? Your explanation leads me to believe the latter, but please correct me if I'm wrong. If you're not mipmapping already, anisotropic filtering shouldn't help - at least not in theory :P
If you ARE already using mipmapping, then the non-uniformly resampled textures can make some sense to try. Keep in mind that you need to take the non-uniform scaling into account when picking mip-level.
Quote: |
I figured AF'd be pretty expensive, and it'd definitely chew the memory system a bit. I was just going to try all the options, then pick the one that gives the best quality/performance ratio.
So...if you were to code this, what would you do? Ta mate :-)
|
First, I would try to avoid steep angles when mip-mapping. This might not always be possible, though - especially not in interactive environments. Second, I don't think I'd try anisotropic filtering when performance is an issue - it's simply too expensive. Bilinear filtering is in itself pretty damn expensive, at least when implemented the traditional way. All these techniques multiply the number of memory-fetches by quite a lot, and texture-swizzling to improve locality is often quite expensive in software, due to most CPUs' lack of bit-interleaving operations.
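If you do go down the swizzling route, the usual software substitute for the missing bit-interleave instruction is the magic-number bit spread. A sketch, assuming the texture is stored in Morton order:
Code: |
/* spread the low 16 bits of x so they occupy the even bit positions */
static unsigned part1by1(unsigned x)
{
    x &= 0x0000ffff;
    x = (x | (x << 8)) & 0x00ff00ff;
    x = (x | (x << 4)) & 0x0f0f0f0f;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}

/* Morton (Z-order) texel address: neighbouring texels in u AND v end
   up near each other in memory, which is the point of the swizzle */
static unsigned morton2d(unsigned u, unsigned v)
{
    return part1by1(u) | (part1by1(v) << 1);
} |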
Quote: |
PS: what other options do I have to improve my texture mapped results other than mip maps/bilinear/trilinear/AF? |
I'm a sucker for per-pixel effects. dot3-bumpmapping and embm look ace, imo. And they're both feasible, even on GBA ;)
#153982 - simonjhall - Tue Apr 08, 2008 11:12 pm
kusma wrote: |
Well, that effect is basically due to mip-mapping; sampling more times along the axis that stretches the texels the most allows you to pick a higher resolution mipmap without getting aliasing. It does not apply without mipmapping. |
M'kay, I'll check it out with and without mip mapping and see what happens. Isn't the whole point of mip-mapping the improvement in texture-cache performance, not the improvement in image quality? Shouldn't I (in an ideal world) be increasing the number of texture samples the further away the pixel I'm writing out is from the camera? (as in more Z, more samples?)
Quote: |
I don't want to sound like a critique here, but do you know why your artifacts appear, or are you just randomly stabbing at the problem? |
Stabby stabby stab stab!
Quote: |
If you ARE already using mipmapping, then the non-uniformly resampled textures can make some sense to try. Keep in mind that you need to take the non-uniform scaling into account when picking mip-level. |
Yeah, I was wondering how you'd pick which mip-level to use, as I've only done it before on regular half-sized/quarter-sized/etc-sized mip images. Can't be too hard, right? I guess I just need to check out my U and V step values to infer what to do...
Btw I guess it's obvious that graphics isn't what my speciality is - thanks for the help!
_________________
Big thanks to everyone who donated for Quake2
#153993 - kusma - Wed Apr 09, 2008 1:20 am
simonjhall wrote: |
Isn't the whole point of mip-mapping the improvement in texture-cache performance, not the improvement in image quality? Shouldn't I (in an ideal world) be increasing the number of texture samples the further away the pixel I'm writing out is from the camera? (as in more Z, more samples?)
|
Yes, but it's also a matter of image quality. Since one usually uses a box-filter to calculate a mipmap level from the base texture, you end up actually doing more samples.
Quote: |
Yeah, I was wondering how you'd pick which mip-level to use, as I've only done it before on regular half-sized/quarter-sized/etc-sized mip images. Can't be too hard, right? I guess I just need to check out my U and V step values to infer what to do...
|
It's actually quite simple. For uniformly scaled miplevels, "just" take log2(max(max(abs(dudx), abs(dudy)), max(abs(dvdx), abs(dvdy)))) at each pixel. Yeah, so choosing mipmap level per pixel might be a bit expensive, but I'm sure you can use this info to find your own way of doing a per-polygon mipmap selection.
For non-uniformly scaled miplevels, just lookup your table based on
log2(max(abs(dudx), abs(dudy))) and log2(max(abs(dvdx), abs(dvdy))).
If it wasn't for that damn perspective correction, the per-pixel mipmap level would be the same as the per-polygon mipmap level, but because of the division, dudx, dudy, dvdx and dvdy change from each pixel to the next. (or usually, from each 2x2 pixel group to the next, but that's a whole different story)
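In code, the selection boils down to something like this - an untested sketch in plain C; clamp the results to the levels you actually have:
Code: |
#include <math.h>

/* per-pixel level for uniform mipmaps, straight from the formula above */
int mip_level(float dudx, float dudy, float dvdx, float dvdy)
{
    float du = fmaxf(fabsf(dudx), fabsf(dudy));
    float dv = fmaxf(fabsf(dvdx), fabsf(dvdy));
    return (int)floorf(log2f(fmaxf(du, dv)));  /* clamp to [0, max level] */
}

/* separate levels per axis for the non-uniformly scaled variant */
void rip_levels(float dudx, float dudy, float dvdx, float dvdy,
                int *lod_u, int *lod_v)
{
    *lod_u = (int)floorf(log2f(fmaxf(fabsf(dudx), fabsf(dudy))));
    *lod_v = (int)floorf(log2f(fmaxf(fabsf(dvdx), fabsf(dvdy))));
} |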
Oh, and in case you're not aware of this quite commonly used terminology (I'm sure you are, but here you go anyway):
dudx is the amount the u coordinate (texture space "x") changes between two pixels in the x-direction in screenspace. dudy is the same, but in y-direction. dvdx and dvdy are the same, but for the v coordinate (texture space "y"). Also keep in mind this commonly used texture space definition:
Code: |
t=1, v=4 .-------------------.
         |    |    |    |    |
         |----+----+----+----|
         |    |    |    |    |
         |----+----+----+----|
         |    |    |    |    |
         |----+----+----+----|
         |    |    |    |    |
t=0, v=0 `-------------------'
         s=0             s=1
         u=0             u=4
|
As you can see, <s,t> are the normalized texture coordinates: they range from <0,0> to <1,1> across the texture, while <u,v> are un-normalized and range from <0,0> to <w,h>. As you probably understand, the mipmap level is calculated from the <u,v>-set, as the texture size would otherwise have to be taken into account.
#153997 - silent_code - Wed Apr 09, 2008 2:59 am
i hope i'm not wrong with my understanding, but:
- mipmapping should reduce pixel oversampling (often seen in nds games as high frequency color changes due to distant objects' texture samples - and thus colors - changing too fast). i fail to see any memory benefits when you have multiple different sized instances of one image in memory... let's not talk about the special cases, where you have only a few low res instances in memory. nds games tend not to have great view distances anyway. ;^)
- af is - to my knowledge - as kusma already wrote, rather an improvement to mipmapping, or better: trilinear filtering. you could still do it without the mips, but i don't know if that would pay off in the end. well, in general i'd say it's a way to minimize undersampling... and maybe oversampling, too. man, you got me thinking right now! i'll check that asap. ;^D
did you think about rip mapping? it's like a computationally less expensive, although limited, in-between step towards af. although it would require more mip levels... :^S
i guess anything beyond trilinear mapping would be insane, though. but actually i wish you'll manage to pull it off anyway. :^D
well, i just wonder what you want to do with that neat little piece of software when it's done? :^)
greets!
ps: nice infos kusma (mip level formula!)
#154005 - kusma - Wed Apr 09, 2008 9:27 am
silent_code wrote: |
i fail to see any memory benefits when you have multiple different sized instances of one image in memory...
|
While the visual quality is definitely a good property of mipmapping, you also have the benefit of better cache-locality; if less memory is required for a texture-surface, fewer cache-lines will have to be read into the cpu. This applies only to cached architectures, but you CAN in theory benefit from it even on non-cached architectures like GBA by having an explicit TCM/IWRAM texture cache. That's however quite tricky to get efficient.
Quote: |
trilinear filtering. you could still do it without the mips, but i don't know if that would pay off in the end. well, in general i'd say, it's a way to minimize undersampling... and maybe oversampling, too. man, you got me thinking right now! i'll check that asap. ;^D
|
uuuh... trilinear filtering is by definition linearly interpolating between two different mip-map levels based on the fractional part of the LOD (the equation I gave was basically the LOD-formula - it might have an off-by-one or something, now that I think about it). So it makes no sense to talk about trilinear filtering without mipmapping.
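In other words, something like this sketch, where sample_bilinear_mip() is a placeholder for a bilinear fetch from a single level (taking u,v in base-level texel units and doing the per-level scaling itself):
Code: |
extern float sample_bilinear_mip(int level, float u, float v);

/* trilinear as defined above: blend the two nearest mip levels on
   the fractional part of the LOD */
float sample_trilinear(float lod, float u, float v)
{
    int level = (int)lod;             /* assumes lod already clamped >= 0 */
    float frac = lod - (float)level;
    float a = sample_bilinear_mip(level, u, v);
    float b = sample_bilinear_mip(level + 1, u, v);
    return a + (b - a) * frac;
} |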
Quote: |
did you think about rip mapping? it's like a computationally less expensive, although limited, inbetween step towards af. although it would require more mip levels... :^S
|
Indeed, "rip mapping" is a commonly used term for non-uniform scaled mipmaps, and it was the name I was looking for but couldn't remember ;)
#154006 - simonjhall - Wed Apr 09, 2008 9:44 am
@ kusma
Hang on, I thought UVs went from 0-1 (hence the normalised thing), and ST co-ordinates went from 0-width and 0-height?
And cool on that big ol' log formula. However I obviously don't want to do that per-pixel. The maxes I can do easily enough but the log2 looks a bit of an arse! Are there any cheats that I can do to avoid doing this computation too frequently?
I reckon I can get away with just doing a per-poly mipmap selection as my geometry is unlikely to cross mipmap boundaries too frequently...
Just to clarify, if I'm going for non-uniform mipmaps do I select them like
the u dimension: log2(max(abs(dudx), abs(dudy)))
the v dimension: log2(max(abs(dvdx), abs(dvdx)))
?
Oh, and I completely forgot that mip-maps were resampled offline - I was definitely having a special moment there! So I guess it is effectively the same as having more samples per pixel as the Z increases.
@ silent_code
rip mapping is the same as non-uniform mip mapping, right? As in you've got your big texture (say) 512x512, then rather than having 256x256, 128x128, 64x64 etc you've got 512x256, 256x512, 256x256, 256x128 etc, right?
I'm trying to use this piece of tech in some image processing software, and the results are going to be machine-analysed, which is why quality's an issue. I would do this on the GPU, but that'd mean I'd have to wait until the end of the frame (and use a shader to do the image processing), which is much too late.
I think you're right about anything more than trilinear being tough. I was in the shower earlier trying to figure out the asm for bilinear/trilinear and was kicking myself (in the shower) for considering more samples per pixel! Nearest-neighbour seems a whole lot easier in comparison ;-)
_________________
Big thanks to everyone who donated for Quake2
#154010 - kusma - Wed Apr 09, 2008 11:22 am
simonjhall wrote: |
Hang on, I thought UVs went from 0-1 (hence the normalised thing), and ST co-ordinates went from 0-width and 0-height?
|
No, that's not how it's usually defined. Check figure 3.10 in the OpenGL 2.0 specification if you don't believe me. Of course, this is just commonly used terminology. What you call your coordinates is completely up to you - I'm just trying to keep a somewhat consistent terminology so we don't talk in circles :)
Quote: |
And cool on that big ol' log formula. However I obviously don't want to do that per-pixel. The maxes I can do easily enough but the log2 looks a bit of an arse!
|
How so? Remember that what you really want to do (as long as there's no trilinear filtering) is floor(log2(x)). For fixed-point numbers, log2() is just a clz-instruction with an input-shift on an ARM9 (there's an x86 instruction as well, but I don't remember its name right now), and you can do it easily with two iterations of binary search and a 256-element lookup-table for the rest.
For floating point it should be a matter of picking the exponent-bits from the binary representation - if a fast log2() function isn't already available, that is.
Or how about clamping the delta-values to some range, and looking up the most significant bits in a LUT to decide miplevels?
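For instance (an untested sketch - the float version assumes IEEE-754 and a positive, normalised input; the fixed-point version uses the GCC spelling of clz, so treat the intrinsic as an assumption):
Code: |
#include <stdint.h>
#include <string.h>

/* floor(log2(x)) for a positive, normalised IEEE-754 float: just read
   the exponent field. The memcpy keeps it strict-aliasing safe. */
static int floor_log2f(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)((bits >> 23) & 0xff) - 127;
}

/* the clz route for 16.16 fixed point: 31 - clz(x) gives the top set
   bit, minus the 16 fraction bits */
static int floor_log2_fx16(uint32_t x)    /* x != 0 */
{
    return 15 - __builtin_clz(x);
} |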
Quote: |
Are there any cheats that I can do to avoid doing this computation too frequently?
|
Sure. Most perspective correct mappers use scanline subdivision to reduce the amount of RCPs, and you could pick the miplevel for each span, I guess.
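Something along these lines, reusing the names from your inner loop (an untested sketch; the divide happens twice per span here, but reusing each span's end values as the next span's start gets it down to one):
Code: |
#define SPAN 8                          /* pixels between true divides */

while (width > 0)
{
    int run = width < SPAN ? width : SPAN;

    /* perspective-correct u,v at both ends of the span */
    float z0 = 1.0f / one_over_z;
    float u0 = u_over_z * z0;
    float v0 = v_over_z * z0;
    float z1 = 1.0f / (one_over_z + run * d_one_over_z_dx);
    float u1 = (u_over_z + run * d_u_over_z_dx) * z1;
    float v1 = (v_over_z + run * d_v_over_z_dx) * z1;

    /* plain linear steps inside the span - and du,dv is exactly what
       a per-span miplevel could be picked from */
    float du = (u1 - u0) / run;
    float dv = (v1 - v0) / run;

    for (int i = 0; i < run; i++)
    {
        output_pixel = texture[(int)v0][(int)u0];
        u0 += du;
        v0 += dv;
    }

    one_over_z += run * d_one_over_z_dx;
    u_over_z += run * d_u_over_z_dx;
    v_over_z += run * d_v_over_z_dx;
    width -= run;
} |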
Quote: |
I reckon I can get away with just doing a per-poly mipmap selection as my geometry is unlikely to cross mipmap boundaries too frequently...
|
The Quake software rasterizer used something like z-distance only, and it worked out quite well there, so I guess you're right. All in all, you just have to try it out and see how it looks. Stabby stab stab stab ;)
Quote: |
Just to clarify, if I'm going for non-uniform mipmaps do I select them like
the u dimension: log2(max(abs(dudx), abs(dudy)))
the v dimension: log2(max(abs(dvdx), abs(dvdx)))
?
|
Apart from the typo, yes. It's not supposed to be dvdx twice.
Quote: |
Oh, and I completely forgot that mip-maps were resampled offline - I was definitely having a special moment there! So I guess it is effectively the same as having more samples per pixel as the Z increases.
|
Absolutely, yeah. You can do SOME filtering online if it makes sense, but usually just pre-filtering everything works well.
#154017 - simonjhall - Wed Apr 09, 2008 1:57 pm
kusma wrote: |
No, that's not how it's usually defined. |
Word.
Quote: |
For floating point it should be a matter of picking the exponent-bits from the binary representation - if a fast log2() function isn't already available, that is. |
Kewl. I'll look into this. Yeah I guess this makes sense and I should be able to do this in just a couple of instructions.
Quote: |
Sure. Most perspective correct mappers use scanline subdivision to reduce the amount of RCPs, and you could pick the miplevel for each span, I guess. |
Cool. I was thinking about this but have yet to get round to coding it as I'm stuck in meetings all day today! Again, it's all about how good it looks. If it doesn't make too much of a difference to the image quality (even at dramatic angles) then I'll do this! Also depends how much faster I can make it without the divides, I guess. Might only save me one clock per loop...hmm...
Ugh, I'm so bored today! I have zero motivation to doing anything. I did just go and get my shoes reheeled though, so I guess that's exciting. Ugh.
_________________
Big thanks to everyone who donated for Quake2
#154020 - silent_code - Wed Apr 09, 2008 3:23 pm
yepp, you're right about the rip maps. ;^D
@kusma: i know af works like that, but not knowing too much about the implementation, i was just wondering if it would contribute anything to unmipped textures. that's because all my old software renderers didn't have texturing - i kind of skipped that chapter in favor of learning to use hw acc. ;^D
aaaand i misunderstood parts of your first two posts. that and simon's posts got me thinking i had missed something about af. to me it has always been a better trilin. then i just thought... not a good idea when it's 3 in the morning and you just happen to be going to bed. ;^)
EDIT: ok, i tested it on the pc and af without mipmapping does *reduce* image quality. basically, it introduces border texel artifacts due to oversampling (because of the lack of mipmapping, i guess). when disabled, the image looks as expected. i might temporarily upload some images showing these artifacts if needed.
#154121 - simonjhall - Fri Apr 11, 2008 4:46 pm
kusma wrote: |
(or usually, 2x2 pixel group, but that's a whole different story) |
I forgot to ask you - what did you mean by this?
I'm implementing the rip-maps (rip-raps, cos I think that sounds cooler) right now and have re-read this post about nine times :-)
_________________
Big thanks to everyone who donated for Quake2
#154126 - silent_code - Fri Apr 11, 2008 7:29 pm
iirc, persp. corr. isn't computed for all pixels, but for small pixel blocks, e.g. 2x2 pixels. i haven't done anything in that field for years, so i don't quite remember enough to tell you how you could benefit from this... thinking about memory access, it makes more sense to me to do it per scanline and make the needed subdivisions occur every few (2, 4, 8, whatever) pixels.
i imagine one could use a list (read: array) for the blocks, and after precomputing the correction values per block, the scanline filler could access those values. again, i don't know about memory access. such a list could fit into the cache for medium to small triangles (depending on screen resolution, as usual), but small triangles don't need p.c.
i hope i'm not totally wrong this time. as i said, it's been a while. ;^)
hope kusma will clarify things and post about the effect on memory access etc. :^)
that's a rather interesting topic. i feel a little bit warm on the inside when writing about software rendering... :^D
#154130 - kusma - Fri Apr 11, 2008 8:13 pm
The 2x2 comment was basically that you need the derivative of the perspective-corrected texture coordinates (not the deltas before perspective correction), and calculating this analytically is a bitch - and things get even more complicated when doing indirect texture-mapping. So instead of trying to do that, graphics hardware makes fragments from pixels in 2x2 pixel groups "meet" in the pipeline, and assigns all four pixels the difference in sampling-coordinates as their derivative. This mechanism also makes it possible to take the derivative of any function in a fragment shader. The down-side is that 2x2 pixel groups get the same derivative values, and hence the same texture-lod.
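In miniature, and borrowing the mip_level() sketch from earlier, it looks something like this (u[2][2] and v[2][2] are assumed to hold the quad's perspective-corrected coordinates, laid out [y][x] - a sketch, not how any particular chip does it):
Code: |
/* difference inside the 2x2 quad stands in for the true derivative */
float dudx = u[0][1] - u[0][0];     /* right pixel minus left pixel */
float dudy = u[1][0] - u[0][0];     /* bottom pixel minus top pixel */
float dvdx = v[0][1] - v[0][0];
float dvdy = v[1][0] - v[0][0];

/* all four pixels share this one level - the shared texture-lod
   mentioned above */
int lod = mip_level(dudx, dudy, dvdx, dvdy); |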
#155235 - silent_code - Sun Apr 27, 2008 12:28 am
hey, simon!
i was wondering in what direction you went with the renderer?
greetings! :^D
#157284 - simonjhall - Wed May 21, 2008 6:59 pm
Alright, I'm back on this bad boy full time now rather than the messing around that I was doing before. I really wanna give more background info etc... but I can't because of the confidential nature of the game etc.
Anyway, the kind of textures that I'm using are quite small and are likely to always be small-ish. However there's gonna come a time when I need to use big textures, but I've only got a small amount of memory dedicated to my 'live' texture store (with my entire texture collection living elsewhere in memory).
I can see three texturing scenarios:
1) I've got a handful of small textures - their combined size is less than the live texture store, therefore they can all be loaded upfront into this memory and I can texture directly from it
2) I've got lots of small textures - no one texture is larger than the live texture store so I can pull them in on an LRU basis (like I do in Quake ;-)
3) I've got a number of large textures - but at least one texture is larger than my live store. I'm gonna have to pull in small blocks of the full texture and evict them based on an LRU scheme.
Any other scenarios?
The beef that I have with #3 is that it's gonna be well slower compared to the first two - with them I can get the texel that I want in just a handful of clock cycles, but with the third I'm going to have to compute which block of my texture that texel lies in, then see if that block exists within the cache. Assuming it does, I can pick out the texel. If not, then I've got to evict an old block, load in a new one, and then pull out the texel. I've got to do this scheme *per texel* too, right? (this would normally all be done in hardware...)
This is obviously a pretty phat bit of code just to fetch one texel. Are there any other ways of doing this? Are there any assumptions that I can make about a triangle/scanline/block of pixels?
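The fetch I have in mind would look something like this (all numbers hypothetical, and it's direct-mapped rather than true LRU, so the hit path is just a shift, a mask and a compare):
Code: |
#include <stdint.h>

#define BLOCK_SHIFT 5                      /* hypothetical 32x32 blocks */
#define CACHE_SLOTS 64

typedef struct
{
    int      tag;                          /* resident block id, -1 = empty */
    uint16_t texels[1 << (2 * BLOCK_SHIFT)];
} slot_t;

static slot_t cache[CACHE_SLOTS];

extern void fetch_block(int block_id, slot_t *slot);  /* the slow path */

static uint16_t read_texel(int u, int v, int blocks_per_row)
{
    int bu = u >> BLOCK_SHIFT;
    int bv = v >> BLOCK_SHIFT;
    int block_id = bv * blocks_per_row + bu;

    slot_t *s = &cache[block_id % CACHE_SLOTS];
    if (s->tag != block_id)                /* miss: evict and reload */
    {
        fetch_block(block_id, s);
        s->tag = block_id;
    }
    int lu = u & ((1 << BLOCK_SHIFT) - 1);
    int lv = v & ((1 << BLOCK_SHIFT) - 1);
    return s->texels[(lv << BLOCK_SHIFT) + lu];
} |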
One way I've been thinking about: given the current scanline, output a list of all the texels I'm going to need to compose the output pixels, and then whilst I'm doing my rasterisation for the next scanline I can be reading in the texels for the previous one - no caching involved.
Anyone got any things to try/suggest/whatever?
/random thoughts...
_________________
Big thanks to everyone who donated for Quake2
#157296 - silent_code - Wed May 21, 2008 9:11 pm
as an extension to your current approach:
you're surely doing perspective correction once in a while (every so many pixels and scanlines). maybe you could find out which "pixel blocks" (about the "blocks" / tiles: think of how the power vr renders stuff [if i'm not totally wrong here], only you use the PCT pixels and scanlines as blocks) would need which texture during that step.
then, you could render those pixel blocks in an order that would require a minimum change in textures (so that a texture fragment [a texture pixel block, that may or may not represent the whole texture or just a part of it that fits the cache] wouldn't have to be fetched more than once for a given primitive.)
i haven't tried it out, as i've not gotten into sw rendering for quite a while, but you might be able to pull this off. after all, the x360 has quite a similar problem with high resolutions and AA, because the "render buffer" memory is only 10mb. ;^D
as most of the time, this is just a friendly suggestion. it would be nice to read if it helped *anything* at all. ;^D
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.
#157306 - simonjhall - Wed May 21, 2008 10:26 pm
I am actually doing all my rendering into small tiles (like you mention, roughly 128x128) as I can't fit either the frame buffer or the depth buffer into memory. But that does imply that there can only be so many texels needed within one of these tiles for a given triangle, so maybe there's something I can do with that...
Hmm...
In fact, re-reading what you said, I think the thing about rendering the tiles in an order that somehow guaranteed that I don't need to switch out a texture cache block would be just plain mega. I'll have to think about that!
Really, I don't want to be 99% sure that the texel I want is in memory (as that would require me to do a check per access) so some kind of guarantee would be just great!
Hmm! Good stuff. Any more ideas?
(oh and the 360 thing - a graphics programmer that I work with used to do 360 and said the same thing, and mentioned that when you use AA "you need three tiles" or something!)
_________________
Big thanks to everyone who donated for Quake2
#157688 - simonjhall - Tue May 27, 2008 9:12 pm
Anyone ever tried the large model data set from Stanford University?
http://graphics.stanford.edu/data/3Dscanrep/
I've been staring at spinning teapots for a week and feel a need for a change...but they're a little high poly for my taste. Although the file format doesn't look too hard to load.
Actually, does anyone know how a modern 3D card will handle rendering a model with a shit load of faces? Some of those models have millions of vertices and I've never tried drawing something like that with OpenGL before.
_________________
Big thanks to everyone who donated for Quake2
#157692 - silent_code - Tue May 27, 2008 9:21 pm
i've seen some of them being used in research, so i guess it's possible. although i'll have to check it out some day, as it's really interesting. :^)
but i honestly don't know if you'd sort for vertex cache hits or overdraw with the really huge models. i guess the former, as they don't look like they'd have too much overdraw, right? (the me is stupid. [because this sounds soooo random!])
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.
#157697 - DensitY - Tue May 27, 2008 9:51 pm
I don't use OpenGL on PC (generally Direct3D). For millions of vertices, depending on the scene, I'd stuff it all into one vertex buffer, set up an index buffer indexing the vertices for the triangles, and just render it out. The only time you need to stop to do anything else is if there are state or texture changes. Overall, most modern 3D cards will just draw that much data reasonably well - a lot of games have scenes with up to a million triangles now. So long as you can fit it into video memory, it just renders.
modern 3d cards are amazing things.
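In OpenGL terms (I use Direct3D, so treat this as a rough translation, not my actual code - vertex_t, verts, indices and the counts are whatever your loader gives you), the setup is roughly:
Code: |
/* one big vertex buffer + one index buffer, then a single draw call */
GLuint vbo, ibo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, num_verts * sizeof(vertex_t),
             verts, GL_STATIC_DRAW);

glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, num_indices * sizeof(GLuint),
             indices, GL_STATIC_DRAW);

/* ...set the vertex format up, then one call draws the lot: */
glDrawElements(GL_TRIANGLES, num_indices, GL_UNSIGNED_INT, 0); |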
#157715 - kusma - Wed May 28, 2008 12:50 am
I'm sorry, but I have PROPER real-world datasets, not just university-thesis bullcrap :)
#157724 - simonjhall - Wed May 28, 2008 7:43 am
Can you share? ;-)
But yeah how is scanning a bunny a thesis? I can just imagine it when you go for an interview:
"so what did you do for your thesis?"
"oh, I spent years developing a bunny scanner. now we can scan as many bunnys as we like in high-res, so that we can render them back out again. it was a reeeeaally hard degree."
"ok... next?"
_________________
Big thanks to everyone who donated for Quake2