#156011 - Lazy1 - Tue May 06, 2008 8:40 am
So I'm looking at libmpeg2 and its mpeg2dec example...
I want to rip my eyes out...
To me, example code is no good unless I end up understanding what it's doing and why; copy-paste is not an option :/
How do all of you handle such code?
Maybe I should look at ffmpeg/libavcodec instead, but I'm not sure if that's suitable for the DS.
#156015 - sgeos - Tue May 06, 2008 8:49 am
I'd rather have a bad example to work from than no example at all. I guess I usually end up heading to Google. Answers tend to turn up if I'm really interested in solving the problem.
-Brendan
#156017 - simonjhall - Tue May 06, 2008 9:13 am
Ah I'm pissing around with that library at the moment and I've got it rigged up into our framework at work. So if you have any questions I can hopefully help you...however I'm not too comfortable with the 'api' so don't ask too much!
Btw have you tried debugging it? It's written in macros and gcc doesn't generate debugging information for functions which are entirely made of macros...
_________________
Big thanks to everyone who donated for Quake2
#156019 - sgeos - Tue May 06, 2008 9:24 am
simonjhall wrote: |
Btw have you tried debugging it? It's written in macros and gcc doesn't generate debugging information for functions which are entirely made of macros... |
Macro languages have their place, but pasting a macro language layer on top of something like C is definitely wrong...
Good luck? It seems like documentation must live somewhere...
-Brendan
#156021 - simonjhall - Tue May 06, 2008 9:35 am
It looks like they really want to be using templates but instead do the whole thing with macros. Makes profiling a bit of a bitch too, since there's no debug line information and I can't figure out which lines of code are the most expensive! Instead I get useless info like "24000 clocks spent on this line". Nice.
#156023 - Lazy1 - Tue May 06, 2008 11:59 am
simonjhall wrote: |
Ah I'm pissing around with that library at the moment and I've got it rigged up into our framework at work. So if you have any questions I can hopefully help you...however I'm not too comfortable with the 'api' so don't ask too much!
|
Thanks, I'll keep that in mind while trying to dissect and understand the example.
I did however give ffmpeg/libavcodec a go, and it does in fact compile and run on the DS. Unfortunately the output is garbage and I get a bunch of decode errors that aren't there on the PC.
Libavcodec does seem a bit bloated for a streaming app though so I'll just keep hacking around with libmpeg2, maybe some day I'll dump a single frame of video!
#156025 - simonjhall - Tue May 06, 2008 12:04 pm
libmpeg2 works ok on the DS, it's just a bit too slow to be usable...
#156026 - Lazy1 - Tue May 06, 2008 12:05 pm
Unless I'm mistaken, Moonshell uses libmpeg2 and that seems fast enough. Besides, I'm only looking for 12-18 fps.
Maybe there is a faster mpeg decoding library?
EDIT:
DURRRR!
I missed the samples entirely...
Again, insomnia and coding do not mix.
#156033 - Lazy1 - Tue May 06, 2008 2:10 pm
Well, it actually works!
With video displayed: 9-10 FPS
Just decoding: 18-23 FPS
I'm guessing it's the colour conversion functions slowing it down, I'll take a look at that later for optimizations.
Still, even at 9FPS that's faster than any remote view application so far. (though I'm playing from disk)
If I can squeeze 15 FPS out of this then maybe I'll get around to a screen capture/scale/encode app for remote play goodness :D
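To put the numbers above in per-frame terms, here is a rough sketch (the helper names are made up for illustration). Pairing the low figures together, 18 FPS decode-only is about 55.6 ms per frame and 9 FPS with display is about 111.1 ms, so conversion plus display seems to cost roughly as much as the decode itself. A 15 FPS target leaves about 66.7 ms per frame in total.

```c
#include <stdint.h>

/* Microseconds available per frame at a given rate (fps must be nonzero). */
uint32_t frame_time_us(uint32_t fps)
{
    return 1000000u / fps;
}

/* Time left per frame for colour conversion and display once decoding is
 * paid for. decode_fps is the measured decode-only rate. A negative result
 * means the decoder alone is already too slow for the target. */
int32_t conversion_budget_us(uint32_t target_fps, uint32_t decode_fps)
{
    return (int32_t)frame_time_us(target_fps) - (int32_t)frame_time_us(decode_fps);
}
```

With the measured decode-only range of 18-23 FPS, conversion_budget_us(15, 18) and conversion_budget_us(15, 23) give roughly 11 ms and 23 ms, so the conversion and blit would need to shrink from about 55 ms to somewhere in that range.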
#156035 - silent_code - Tue May 06, 2008 2:14 pm
-> awesome! <-
#156040 - Lazy1 - Tue May 06, 2008 4:11 pm
It seems the libmpeg2 version in svn has ARM optimizations. Unfortunately it seems broken: it doesn't decode anything and mpeg2_parse returns STATE_INVALID.
Really annoying... so close but so far.
#156057 - strager - Tue May 06, 2008 8:55 pm
Lazy1 wrote: |
With video displayed: 9-10 FPS
Just decoding: 18-23 FPS
I'm guessing it's the colour conversion functions slowing it down, I'll take a look at that later for optimizations.
[...]
If I can squeeze 15 FPS out of this then maybe I'll get around to a screen capture/scale/encode app for remote play goodness :D |
Have you considered parallel processing, assigning the colour conversion (YUV => RGB*, right?) to the ARM7? The ARM7 can write to VRAM banks C and D (if I recall), and you can swap them to the GPU to render.
* Are you converting YUV (the native format) to RGB888 to RGB555? If so you can cut out the middle conversion and save lots of cycles by changing the algorithm a little bit.
Perform some profiling and see if you can move some sensitive code parts and data structures to ITCM and DTCM, respectively. This really helps, especially with access to very commonly accessed data and code.
Lazy1, if you can extract the ARM code and the C code it replaces, an able person may be able to see the problem here and patch it. Of course, you could always ask on the libmpeg2 forums (if such a place exists) or look at earlier revisions of the source code.
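A minimal sketch of the "skip the RGB888 step" idea, going straight from YUV to 5-bit channels. The assumptions here are not from the thread: the usual BT.601 integer coefficients, 4:2:0 planes packed with no padding (stride equal to width), and the DS colour layout with red in bits 0-4, green in 5-9 and blue in 10-14 (bit 15 left clear). The function names are made up for illustration and this is not libmpeg2's own converter.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a fixed-point channel value (8 fractional bits on top of an 8-bit
 * range) straight down to 5 bits, clamping to 0..31. Shifting by 11 instead
 * of 8 is what removes the intermediate RGB888 value. */
static int to5(int v)
{
    if (v < 0)
        return 0;
    v >>= 11;
    return v > 31 ? 31 : v;
}

/* Convert one pixel. 1024 is the rounding term for the 11-bit shift. */
uint16_t yuv_to_rgb555(int y, int u, int v)
{
    int c = 298 * (y - 16);
    int d = u - 128;
    int e = v - 128;
    int r = to5(c + 409 * e + 1024);
    int g = to5(c - 100 * d - 208 * e + 1024);
    int b = to5(c + 516 * d + 1024);
    return (uint16_t)(r | (g << 5) | (b << 10));
}

/* Convert a whole 4:2:0 frame: each chroma sample covers a 2x2 luma block. */
void yuv420_to_rgb555(const uint8_t *yp, const uint8_t *up, const uint8_t *vp,
                      int width, int height, uint16_t *dst)
{
    assert(width % 2 == 0 && height % 2 == 0);
    for (int row = 0; row < height; row++) {
        const uint8_t *yrow = yp + row * width;
        const uint8_t *urow = up + (row / 2) * (width / 2);
        const uint8_t *vrow = vp + (row / 2) * (width / 2);
        for (int col = 0; col < width; col++)
            dst[row * width + col] =
                yuv_to_rgb555(yrow[col], urow[col / 2], vrow[col / 2]);
    }
}
```

The per-pixel multiplies are still there, so a further step might be to compute the chroma terms once per 2x2 block or to replace them with lookup tables. Whether that wins on the DS would need profiling.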
#156107 - simonjhall - Tue May 06, 2008 11:34 pm
From the profiling that I've done I spend the majority of my CPU time doing the YUV->RGB32 conversion, so I reckon you may too. If you're feeling lazy1 (heh), try just drawing the Y (which is at full resolution) and skipping the colour conversion altogether. Also, make sure that you've got dithering turned off as that's gonna cost you too...
I'm not 100% sure that you can compare my timings to yours though, as my hardware has floating-point goodness :-)
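A rough sketch of the "just draw the Y" shortcut: each luma byte becomes a grey 15-bit pixel by copying its top five bits into all three channels. The function names are hypothetical, the plane is assumed to be packed with no padding, and the output layout is assumed to be 5 bits per channel with red lowest.

```c
#include <stdint.h>

/* Turn one luma sample into a grey 15-bit pixel (same value in R, G and B). */
uint16_t luma_to_grey555(uint8_t y)
{
    int g = y >> 3;
    return (uint16_t)(g | (g << 5) | (g << 10));
}

/* Convert a full-resolution luma plane; the chroma planes are never read. */
void luma_plane_to_grey555(const uint8_t *yp, int width, int height,
                           uint16_t *dst)
{
    int count = width * height;
    for (int i = 0; i < count; i++)
        dst[i] = luma_to_grey555(yp[i]);
}
```

MPEG luma nominally spans 16-235 rather than 0-255, so the picture will probably look a little washed out this way. For a speed test that should not matter.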
Man, I spent all day watching Star Trek IV rolling over and over just trying to make it crash :-)
#156166 - Lazy1 - Wed May 07, 2008 11:29 am
I'm still not too happy about the decode speed even without the conversion/display.
I'll have to look around for some more optimizations, and it seems I have a lot of reading to do since I have never worked with anything but RGB. :D
#156169 - simonjhall - Wed May 07, 2008 11:59 am
Here's a post that I did a while ago when I first had a bash with this mpeg library - it's got timing info in it pulled straight from a DS. I'm doing YUV->RGB16 in it btw. Seems the majority of the time is spent doing the idct bit (inverse discrete cosine transform).
http://forum.gbadev.org/viewtopic.php?p=132123#132123
#156263 - Lazy1 - Thu May 08, 2008 5:09 am
Interesting...
I also found that the ARM optimized motion compensation did not help noticeably, unless somehow it's not activated.
Do you know of any optimized idct functions for ARM?
Maybe this is a good time to start learning ARM assembly. (for the 9th time)
What really sucks though is I'll have to do this under Windows. I really prefer Linux for dev, but Linux just isn't for gaming.
Another thought is a "turbo" mode where I just upscale 128x96 video and jitter it using hardware.
That'll look like shit but maybe allow higher framerates/bitrates for games where fine details don't matter.
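For the hardware upscale, the DS affine backgrounds take their scale as 8.8 fixed-point steps through the source image per screen pixel, so 0x100 means 1:1. A small sketch of working out the value (the helper name is made up, and which background and registers to use is left out on purpose):

```c
#include <stdint.h>

/* Source step per screen pixel in 8.8 fixed point.
 * A step below 0x100 stretches the source; dst_size must be nonzero. */
int16_t affine_step_8_8(int src_size, int dst_size)
{
    return (int16_t)((src_size << 8) / dst_size);
}
```

Going from 128x96 to 256x192 gives a step of 0x80 on both axes, meaning half a source pixel per screen pixel.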
EDIT:
Mod - rename thread?
#156292 - Miked0801 - Thu May 08, 2008 4:53 pm
Isn't IDCT just a table lookup? Or am I forgetting something?
#156343 - tepples - Thu May 08, 2008 10:52 pm
IDCT == inverse discrete cosine transform
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#156359 - Miked0801 - Thu May 08, 2008 11:30 pm
I know what it is, but I am pretty sure that all the methods I've seen for using it (JPEG decompressors) used a table lookup for the IDCT step.
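For reference, a naive 8-point 1-D IDCT looks something like the sketch below. It evaluates f(x) = 1/2 * sum over u of C(u) * F(u) * cos((2x+1) * u * pi / 16), with C(0) = 1/sqrt(2) and C(u) = 1 otherwise. The cosines do come from a table, but every output sample still needs eight multiply-accumulates, and an 8x8 block needs this for every row and then every column. This is only an illustration in 12-bit fixed point with made-up names, not the routine libmpeg2 uses; real decoders use factored fast algorithms.

```c
#include <stdint.h>

/* cos(k*pi/16) for k = 0..8, scaled by 4096 (12 fractional bits). */
static const int32_t COS_Q12[9] = {
    4096, 4017, 3784, 3406, 2896, 2276, 1567, 799, 0
};

/* Look up cos(n*pi/16) for any n >= 0 using the symmetry of cosine. */
static int32_t cos_q12(int n)
{
    n %= 32;
    if (n > 16)
        n = 32 - n;              /* cos(2*pi - a) == cos(a) */
    if (n > 8)
        return -COS_Q12[16 - n]; /* cos(pi - a) == -cos(a) */
    return COS_Q12[n];
}

/* Naive 1-D inverse DCT over 8 coefficients. */
void idct8_naive(const int16_t in[8], int16_t out[8])
{
    for (int x = 0; x < 8; x++) {
        int32_t sum = 0;
        for (int u = 0; u < 8; u++) {
            /* For u == 0 the cosine is 1 and C(0) = 1/sqrt(2) = 2896/4096. */
            int32_t basis = (u == 0) ? 2896 : cos_q12((2 * x + 1) * u);
            sum += in[u] * basis;
        }
        /* Divide by 2 * 4096 with rounding that is symmetric around zero. */
        int32_t r = sum >= 0 ? (sum + 4096) / 8192 : -((-sum + 4096) / 8192);
        out[x] = (int16_t)r;
    }
}
```

A block with only a DC coefficient comes out as a flat run of identical samples, which is a quick sanity check for any replacement routine.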
#156363 - strager - Thu May 08, 2008 11:56 pm
Lazy1 wrote: |
Do you know of any optimized idct functions for ARM?
Maybe this is a good time to start learning ARM assembly. (for the 9th time) |
If you provide the source code, I may be able to rewrite some of the IDCT functions (in particular, mpeg2_idct_add_c) in ARM ASM for speed optimizations. (I could just blindly "test" from libmpeg2 directly, but it'd be nice to test on the actual target.) Does the code work in an emulator? If so that'd be really awesome. =] Hopefully I'll be able to somewhat accurately profile the code, so I don't produce slower code than what GCC generates. At least a "time it took to decode" would be nice (and perhaps all that would be needed if simply changing the code).
What optimization level (-O flag) are you using, by the way? You might be able to use -O3 to speed things up even further (or cause the code to crash).
#156367 - Lazy1 - Fri May 09, 2008 12:09 am
I'll have to look at my code closely again since for some reason the videos are green on hardware.
They work fine in an emulator though.
#156413 - Lazy1 - Fri May 09, 2008 2:49 pm
You know what might be even better?
YouTube.
It's definitely possible if you have a server running to transcode the video into say MPEG-4 then stream it to the DS.
We know the DS can play XVID at 10 frames per second at 256x192 and that's good enough for streaming video.
The only thing that would be nice is maybe a boost in WiFi speed; I get about 40 KB/s when I'm less than 5 ft from my AP.