gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

C/C++ > infinite_fgets()

#120815 - Dracker - Tue Mar 06, 2007 6:48 am

I've written a function called infinite_fgets() to make file input a lot easier.

It seems I've let a bug sneak in somewhere, because it seems that sometimes a program can hang in infinite_fgets()'s loop and never get out. It also appears that this causes a memory leak, as when left in this state, the DS screens eventually blank.

I can't find any bugs in here - can you? I'm fairly certain the bug is in infinite_fgets() but it is possible that it's elsewhere.

Code blocks seem to screw up indentation, not to mention lack of highlighting. A much prettier version is available at http://pastebin.ca/383227


Code:
#include <stdio.h>
#include <stdlib.h>
#include "infinite_fgets.h"

/*
 * Basically, this function works like fgets except it handles
 * an indefinite length input line.  Reads until it hits \n or EOF.
 * Any line of text returned ends with \0.
 *
 * None of the lines returned contain/end with a newline!
 *
 * Returns NULL on error (malloc returns NULL, etc) and 0 if
 * EOF is reached before reading any input.  If end-of-line
 * or some other non-EOF termination is reached before reading
 * any input, returns NULL.
 */
char* infinite_fgets(FILE * stream) {
   char *buffer;
   short int i;
   size_t size = sizeof(char);
   i = fgetc(stream);           /* read in one character */
   if ( i == EOF ) { /* EOF is -1, cannot store in char */
      return 0;
      /* This retval must be handled by whatever called this */
   } else if (   i == '\0'   || /* Null */
         i == NULL   || /* Null */
         i == '\255'   || /* Null */
         i == '\r'       || /* carriage return */
         i == '\n') return NULL;  /* Line Feed */

   buffer = malloc(size); /* size = sizeof(char) at this point */
   if (buffer == NULL) { /* malloc error */
      return NULL;
   }
   *buffer = (char)i; /* set first character */
   for(;;) {
      i = fgetc(stream);   /* read in one character */
      if (   i == EOF   ||
         i == '\0'   || /* Null */
         i == NULL   || /* Null */
         i == '\255'   || /* Null */
         i == '\r'       || /* carriage return */
         i == '\n') break;  /* Line Feed */

      buffer = realloc(buffer,size+sizeof(char));
      /* Buffer is now 1 char longer than its contents */
      if (buffer == NULL) { /* realloc error */
         return NULL;
      } /* END IF */
      size+=sizeof(char);
      buffer[size-1] = (char)i;
      /* Add the character to the buffer
       * This assumes sizeof(char) == 1
       * Fortunately, this is true for all
       * platforms I use this on */

   } /* END FOR */
   buffer = realloc(buffer,size+sizeof(char));
   /* Make room for null termination */
   if (buffer == NULL) { /* realloc error */
      return NULL;
   } /* END IF */
   buffer[size] = '\0'; /* Null-terminate the string */
   return buffer;
} /* END INFINITE_FGETS */


The .h file only includes a prototype.
_________________
--Drack
Onyx DS Lite, M3 Perfect Lite, Flashme V7, Flashed using SocketMe
Gamecube DOL-001 with Qoob Pro
Wii, not modded.

#120830 - keldon - Tue Mar 06, 2007 11:08 am

I would say to restart the code to read with a maximum value first. Once you have done that then modify it to then allocate more memory using the maximum length each time you have ran out of memory. Then create a final buffer (the one you will return) using the length that you have identified and deallocate chunks used to compute all of this.

Also try to rely less on having every line commented to understand the code; if correctly factored the code requires much less commenting.

#120836 - KayH - Tue Mar 06, 2007 12:23 pm

"realloc" has the potential for a lot of errors!

1) if realloc fails it returns NULL but the old buffer is still there! This can cause a mem leak if you not free them.
2) if the size given is zero, than the mem is freed and the old pointer is still valid (this can also cause problems)

Please read the docu for realloc carefull! You may also rethink your design.
I would suggest to malloc space in portions of e.g. 80 chars (depending on your environment). If your app try to exceed this you malloc new 80 chars. With additional method you can combine both or more strings later ...
(as keldon alread suggest)

HTH
KaY

#120840 - tepples - Tue Mar 06, 2007 1:33 pm

The C standard states that sizeof(char) is always 1 by definition, even on word-addressed DSPs and the like where char == short == int == a 32-bit type. (Only Java makes char equivalent to unsigned short.)

And yes, it's best to realloc() in chunks larger than 1 byte. You could have it start at size = 64 bytes, and then do bufferSize += bufferSize / 2; every time size threatens to exceed bufferSize.

And yes, it's best to hold the result of realloc() in a temporary variable and then copy it over buffer only once you know it has succeeded.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#120850 - Dracker - Tue Mar 06, 2007 7:18 pm

Ok, how about this as a revision, to address those flaws?

pastebin: http://paste.biz/paste-852.html

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "infinite_fgets.h"

/*
 * Basically, this function works like fgets except it handles
 * an indefinite length input line.  Reads until it hits \n or EOF.
 * Any line of text returned ends with \0.
 *
 * None of the lines returned contain/end with a newline!
 *
 * Returns NULL on error (malloc returns NULL, etc) or if
 * no input is read before end of line/file
 */

char * infinite_fgets(FILE * stream) {
   char *buffer;
   size_t strsize = 1;
   short int i;
   size_t bufsize = 10;
   i = fgetc(stream);
   if (   i == EOF        || /* no meaningful data before end of string */
      i == '\0'   ||
      i == '\255'   ||
      i == '\r'       ||
      i == '\n') return NULL;
   
   buffer = malloc(bufsize);
   if (buffer == NULL) { /* malloc error */
      return NULL;
   }
   buffer[strsize-1] = (char)i;
   buffer[strsize] = '\0';
   strsize++;
   for(;;) {
      i = fgetc(stream);
      if (   i == EOF   || /* end of string */
         i == '\0'   ||
         i == '\255'   ||
         i == '\r'       ||
         i == '\n') {
         break;
      }
      
      if (strsize == bufsize - 1) { /* buffer is exactly full */
         char * tempbuffer = malloc(bufsize * 2);
         if (tempbuffer == NULL) {
            free(buffer);
            return NULL;
         }
         strncpy(tempbuffer,buffer,bufsize);
         free(buffer);
         buffer = tempbuffer;
         bufsize*=2;
      }
      buffer[strsize-1] = (char)i;
      buffer[strsize] = '\0';
      strsize++;
   } /* END FOR */

   char * retval = malloc(strlen(buffer)+1);
   /* strlen does not include space for the '\0' */
   if (retval == NULL) {
      free(buffer);
      return NULL;
   }
   strncpy(retval,buffer,strlen(buffer)+1);
   /* strncpy will copy the '\0' */
   free(buffer);
   return retval;
}

_________________
--Drack
Onyx DS Lite, M3 Perfect Lite, Flashme V7, Flashed using SocketMe
Gamecube DOL-001 with Qoob Pro
Wii, not modded.


Last edited by Dracker on Tue Mar 06, 2007 7:49 pm; edited 1 time in total

#120851 - Dracker - Tue Mar 06, 2007 7:35 pm

Oh, and I found the original bug I was looking for.

It seems "return 0" and "return NULL" are equivalent when return type is char * - so I could not distinguish between EOF and other things that would cause a NULL return.

The infinite loop was not the one in infinite_fgets() but the one reading the file. Without knowing that it had hit EOF it kept on reading the EOF character and malloc'ing 1 byte over and over, hence the mem leak.

I should have just used feof(stream) instead of rolling my own retval.
_________________
--Drack
Onyx DS Lite, M3 Perfect Lite, Flashme V7, Flashed using SocketMe
Gamecube DOL-001 with Qoob Pro
Wii, not modded.

#120852 - sajiimori - Tue Mar 06, 2007 7:40 pm

Um... ouch.

Factor out the pseudo-implementation of std::vector into its own module, then write infinite_fgets in terms of the container.

Or just use cin >> std::string, which already works.

#120854 - KayH - Tue Mar 06, 2007 7:55 pm

cin would require cpp, it seems it is only c yet

I have to agree sajiimori cin and vector would be better, but if you want to use your own method, than it seems ok so far (at least for me :-) ).
An implementation of a vector is probably also nice for further reusage, good point sajiimori!

you do not need strlen(), as you already have the length: strsize

#120856 - tepples - Tue Mar 06, 2007 8:21 pm

sajiimori wrote:
Or just use cin >> std::string, which already works.

But doesn't using iostream add 200 KB even to Hello World?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#120858 - sajiimori - Tue Mar 06, 2007 9:07 pm

Maybe in a sucky environment or with bad settings.

#120861 - Dracker - Tue Mar 06, 2007 9:21 pm

Nice catch on the strlen() - fixed in my sources now.

As for CPP stuff, I am keeping infinite_fgets() entirely in C.

Thanks for the help everyone.
_________________
--Drack
Onyx DS Lite, M3 Perfect Lite, Flashme V7, Flashed using SocketMe
Gamecube DOL-001 with Qoob Pro
Wii, not modded.

#120893 - tepples - Wed Mar 07, 2007 2:45 am

sajiimori wrote:
Maybe in a sucky environment or with bad settings.

Then both MinGW and devkitARM are sucky environments with bad default settings.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#120965 - sajiimori - Wed Mar 07, 2007 7:49 pm

Well yeah, MinGW does make suckily large binaries, doesn't it? Maybe there's a setting to make it behave well, but I haven't found it.

As for devkitARM, the default setting is -O0 with exceptions and RTTI, no? I don't blame it for using generous defaults, but if you don't specify the settings you want, then you can't really complain.

Then again, I haven't really used devkitARM, so maybe it has the same problems as MinGW.

#121009 - wintermute - Thu Mar 08, 2007 4:09 am

sajiimori wrote:
Well yeah, MinGW does make suckily large binaries, doesn't it? Maybe there's a setting to make it behave well, but I haven't found it.


MinGW makes pretty large binaries for C++, C binaries are comparable to msvc in most cases.

iostream is pretty heavy everywhere afaik, you just don't notice because mostly you use it on a machine with an operating system and shared libraries.

MinGW can't make use of shared libraries for C++ code since the ABI and name mangling aren't compatible between different compilers.
_________________
devkitPro - professional toolchains at amateur prices
devkitPro IRC support
Personal Blog