#108515 - sectionboy - Thu Nov 09, 2006 8:34 pm
Hi, I need some help with gba_nds_lib and non-ASCII long filenames. Basically, the lib works fine with Chinese filenames if I fread/fwrite the file using its short name. But when I want to get the long name, the lib returns some strange chars. It works flawlessly with ASCII filenames, long or short. Does anyone have any idea?
PS: Chinese chars here are nothing but chars with values above 127 (thus non-ASCII), with two chars representing one Chinese character.
#108517 - tepples - Thu Nov 09, 2006 8:48 pm
Are the characters encoded in UTF-8?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#108541 - sectionboy - Thu Nov 09, 2006 11:36 pm
Yes, you are right. Thanks! But... it seems to return only the lower byte of each 2-byte UTF8 code :(
#108551 - chishm - Fri Nov 10, 2006 12:49 am
gba_nds_fat is deliberately limited to 8 bits per character, despite VFAT's 16 bits per character. This is to remain compatible with the char-based strings used by the standard file functions (fopen, chdir, etc.).
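To illustrate the symptom described above, here is a minimal sketch of what narrowing a 16-bit long-filename character to a char does. It is not taken from the gba_nds_fat source; the function name `narrow_lfn` and its signature are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, NOT the actual gba_nds_fat code.
 * Copies a NUL-terminated string of 16-bit units into a char buffer,
 * keeping only the low 8 bits of each unit.
 * Returns the number of chars written, excluding the terminator. */
size_t narrow_lfn(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t i = 0;

    if (dst_size == 0)
        return 0;

    while (src[i] != 0 && i + 1 < dst_size) {
        /* The high byte is discarded here. */
        dst[i] = (char)(unsigned char)(src[i] & 0xFF);
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

With this kind of narrowing, an ASCII character such as 'A' (0x0041) survives unchanged, while a Chinese character such as U+4E2D comes out as the single byte 0x2D, which matches the "strange chars" seen for long names.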
_________________
http://chishm.drunkencoders.com
http://dldi.drunkencoders.com
#108572 - sectionboy - Fri Nov 10, 2006 4:05 am
Thank you for clearing that up for me. Will you add 2-byte character support in the new libfat?
#108588 - chishm - Fri Nov 10, 2006 8:47 am
Most likely not, as I am even more restricted in the interfaces that I can provide.
#108624 - Dwedit - Fri Nov 10, 2006 4:35 pm
I've just rewritten many pieces of the library to try to support wide characters, but I haven't tested it, or even compiled it yet. It will probably spit out tons of compiler errors, and may have bugs. Want a copy?
Note that short file names are still ASCII only; only long names get Unicode. So don't call the functions that operate on short file names. (Why do those functions even exist anyway?)
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."
#108627 - sectionboy - Fri Nov 10, 2006 6:00 pm
Thank you all!
@chishm: What a pity. Is there any practical reason you don't want to do that?
@Dwedit: Thank you again for your kind offer. But I don't have any knowledge about the FAT format or flash memory; I am just using you guys' wonderful work and enjoying programming my puppy book reader.
Keep up the good work!
#108656 - chishm - Sat Nov 11, 2006 12:34 am
Dwedit:
Are you talking about functions exported from gba_nds_fat that operate only on short filenames, specifically FindFirstFile and FindNextFile? Those exist because they only need a 13-character buffer to hold the output filename, which will uniquely identify the file. All functions in libfat operate on the long filename if it exists.
sectionboy:
I am no longer maintaining gba_nds_fat. libfat is restricted to the interfaces provided by libc in DevkitPro, which all use 8-bit character strings.
#108668 - tepples - Sat Nov 11, 2006 3:47 am
Is not a UTF-8 encoded string an "8-bit string"? Or which part of the C standard would this violate?
#108683 - chishm - Sat Nov 11, 2006 6:05 am
tepples wrote: |
Is not a UTF-8 encoded string an "8-bit string"? Or which part of the C standard would this violate? |
None, I suppose. I must admit, I am not overly knowledgeable about locales in POSIX/C. If you can tell me how to convert the 16-bit Unicode characters in a long filename to a UTF-8 encoded string (and back again) using standard functions provided in libc or other parts of DevkitPro, I may add an option for UTF-8 encoded Unicode string support. This will, however, double the space required to internally store a path and triple the buffer length needed for strings passed to/from external functions.
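The "double" and "triple" figures can be worked out as follows. This is only a sketch of the arithmetic, not libfat code; both helper names are hypothetical. It relies on the fact that a 16-bit code unit in the Basic Multilingual Plane needs at most 3 bytes in UTF-8, and that a surrogate pair (2 units) needs 4 bytes, so 3 bytes per unit is a safe upper bound.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Bytes needed to store n_units 16-bit characters internally:
 * twice what an 8-bit-per-character path needs. */
size_t utf16_storage_bytes(size_t n_units)
{
    return n_units * sizeof(uint16_t);
}

/* Worst-case bytes for the same name as a NUL-terminated UTF-8 string:
 * up to 3 bytes per 16-bit unit, plus 1 for the terminator. */
size_t utf8_worst_case_bytes(size_t n_units)
{
    return n_units * 3 + 1;
}
```

For a 255-unit name, that gives 510 bytes of internal storage and a 766-byte UTF-8 buffer, against 255 and 256 bytes respectively for plain 8-bit names.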
#108731 - tepples - Sat Nov 11, 2006 6:25 pm
chishm wrote: |
If you can tell me how to convert the 16-bit Unicode characters in a long filename to a UTF-8 encoded string (and back again) using standard functions provided in libc or other parts of DevkitPro |
It's not too hard to write your own UTF-8 encoding functions. However, you will need to watch out for "surrogates", or UTF-16 code words in 0xD800 through 0xDFFF, and combine each surrogate pair into a single code point before converting it to UTF-8. Permissively licensed code for reading and writing UTF-8 strings can be found in the Allegro library or in Unicode.org's examples.
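A hand-written encoder along these lines might look like the sketch below. It is not the Allegro or Unicode.org code, and the names `utf8_put` and `utf16_to_utf8` are hypothetical. Two choices here are assumptions rather than anything stated in the thread: an unpaired surrogate is replaced with U+FFFD, and a too-small destination buffer is reported by returning `(size_t)-1`. Surrogates occupy 0xD800 through 0xDFFF, with high surrogates in 0xD800..0xDBFF and low surrogates in 0xDC00..0xDFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point (at most 0x10FFFF) as UTF-8.
 * Returns the number of bytes written to out (1 to 4). */
static size_t utf8_put(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert a NUL-terminated UTF-16 string to a NUL-terminated UTF-8 string.
 * Returns the number of bytes written excluding the terminator,
 * or (size_t)-1 if dst is too small. */
size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    while (*src != 0) {
        uint32_t cp = *src++;
        unsigned char tmp[4];
        size_t len, i;

        if (cp >= 0xD800 && cp <= 0xDBFF && *src >= 0xDC00 && *src <= 0xDFFF) {
            /* High surrogate followed by low surrogate: combine the pair. */
            uint32_t low = *src++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            /* Unpaired surrogate: substitute U+FFFD (an assumption). */
            cp = 0xFFFD;
        }

        len = utf8_put(cp, tmp);
        if (n + len + 1 > dst_size)
            return (size_t)-1;
        for (i = 0; i < len; i++)
            dst[n++] = (char)tmp[i];
    }

    if (n + 1 > dst_size)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

For example, the UTF-16 sequence 0x0041, 0x4E2D, 0xD83D, 0xDE00 encodes to the 8 bytes 41 E4 B8 AD F0 9F 98 80. The reverse direction (UTF-8 back to UTF-16) follows the same bit layout and is omitted here.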
Quote: |
I may add an option for UTF-8 encoded Unicode string support. This will, however, double the space required to internally store a path |
Isn't the space used to store the nickname in the DS firmware doubled as well?
#108770 - chishm - Sun Nov 12, 2006 12:19 am
tepples wrote: |
Quote: | I may add an option for UTF-8 encoded Unicode string support. This will, however, double the space required to internally store a path |
Isn't the space used to store the nickname in the DS firmware doubled as well? |
Yes, but a firmware nickname is 13 characters long, versus 256 characters for a long filename.
#108776 - Lick - Sun Nov 12, 2006 12:49 am
O gee.. 512 BYTES!
[/stupid comment] =P
_________________
http://licklick.wordpress.com
#108826 - chishm - Sun Nov 12, 2006 6:39 am
Lick wrote: |
O gee.. 512 BYTES!
[/stupid comment] =P |
Multiplied by however many directory entries are needed to parse the paths supplied to a function (up to 4 directory entries total, if you assume 2 paths to a function and 1 directory entry each for iterating the path and returning the found file). This is all placed on the 16KB of stack space given to the ARM9. So that's 4 x 512 = 2048 bytes, or about 13% of the stack, used simply for storing filenames in a library function called by the user's code.
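The stack budget can be checked with a few lines of arithmetic. The constants below restate the figures from this thread (512 bytes per filename, 4 directory entries, 16KB of stack); the identifier names are made up for illustration and do not come from libfat.

```c
#include <stddef.h>

/* Figures quoted in the thread; names are hypothetical. */
enum {
    LFN_BUFFER_BYTES = 512,        /* 256 characters x 2 bytes each */
    MAX_DIR_ENTRIES  = 4,          /* 2 paths x 2 directory entries */
    ARM9_STACK_BYTES = 16 * 1024   /* 16KB stack */
};

/* Total stack bytes spent on filename buffers in the worst case. */
size_t filename_stack_bytes(void)
{
    return (size_t)LFN_BUFFER_BYTES * MAX_DIR_ENTRIES;
}

/* The same figure in tenths of a percent of the stack. */
unsigned filename_stack_permille(void)
{
    return (unsigned)(filename_stack_bytes() * 1000 / ARM9_STACK_BYTES);
}
```

This works out to 2048 bytes, or 125 per mille (12.5%), which rounds to the 13% quoted.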
I've seen many people (including myself) have bugs appear due to stack overflows, and they aren't easy to trace (without a good debugger). I'm trying not to contribute to that problem.
I may end up having to malloc the space required instead.
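As a rough sketch of that alternative (not actual libfat code; the struct and function names are hypothetical), the filename buffer can live on the heap so that only a pointer sits on the stack:

```c
#include <stdint.h>
#include <stdlib.h>

enum { LFN_MAX_UNITS = 256 };  /* assumed long-filename limit, in 16-bit units */

/* Hypothetical per-directory-entry scratch space. Only the pointer
 * occupies stack space when this struct is a local variable. */
typedef struct {
    uint16_t *name;
} DirEntryScratch;

/* Allocate the filename buffer. Returns 0 on success, -1 on failure. */
int scratch_init(DirEntryScratch *e)
{
    e->name = malloc(LFN_MAX_UNITS * sizeof *e->name);
    return e->name != NULL ? 0 : -1;
}

/* Release the buffer; safe to call more than once. */
void scratch_free(DirEntryScratch *e)
{
    free(e->name);
    e->name = NULL;
}
```

The trade-off is that every caller now has to handle allocation failure and remember to free the buffer on every exit path.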