#73513 - sgeos - Sun Feb 26, 2006 5:01 am
Can anyone recommend any books on writing a compiler? I'm specifically looking for books that cover OO concepts related to compilers.
I want to write a compiler for a simple OO language that includes coroutine support.
-Brendan
#73520 - sajiimori - Sun Feb 26, 2006 7:56 am
OO features are actually pretty easy to implement after you build a complete non-OO language. It's generally a matter of translating the OO constructs into more fundamental ones. For instance, in C++, vtables are just arrays of function pointers and methods are just functions that take an invisible first argument.
Coroutines are even easier. Allocate a stack for each one, and swap out sets of registers when changing contexts.
The dragon book is a good starting point.
#73539 - keldon - Sun Feb 26, 2006 2:11 pm
If you are at uni, your library would have an entire section on compiler writing. I know that mine does.
There is also a programming problem which translates into a compiler which automatically creates a compiler from code; which I found interesting. But just check your library if you're at uni. Your local library may even have one - or a local university may also allow open access to the public.
#73592 - sajiimori - Sun Feb 26, 2006 9:18 pm
Quote: |
There is also a programming problem which translates into a compiler which automatically creates a compiler from code |
Uh. What?
#73600 - keldon - Sun Feb 26, 2006 9:33 pm
sajiimori wrote: |
Quote: | There is also a programming problem which translates into a compiler which automatically creates a compiler from code | Uh. What? |
I'm trying to link to their slides. They mentioned it at the NWERC programming competition, but their server is sloppy at the moment.
http://www.nada.kth.se/contest/nwerc/2005/ <-- when it does decide to work; there was a presentation they did on a saturday where they spoke about their amazing database optimisations that makes theirs something like 2000 times as fast as postgresql. And they also mentioned the 'compiler for free', which I think is in their slides.
However it will not teach you how to write your own compiler anyway; I just thought that it was interesting to know.
#73644 - sgeos - Mon Feb 27, 2006 4:42 am
sajiimori wrote: |
The dragon book is a good starting point. |
From what I've gathered, the dragon book is standard, but it's dated. The only advantage is has over other books is that it covers parsing in depth. Other books don't cover parsing much anymore. How much do I want to learn parsing? If I want to make an interperted scripting language, would a strong knowledge of parsing be especially handy?
-Brendan
#73652 - sajiimori - Mon Feb 27, 2006 5:52 am
Well Brendan, if you're writing your first compiler, chances are you don't need a book on cutting-edge optimization techniques for hyperthreaded processors or whatever else. :)
About parsing, it's important to understand how it works. You should write at least one by hand.
For embedded interpreted languages, you'll want to compile the source to bytecodes, so the parsing will happen as a preprocessing step.
#73655 - sgeos - Mon Feb 27, 2006 6:05 am
First compiler. The dragon book may be good to start with, although this is the current book I'm thinking about buying:
Programming Language Pragmatics, Second Edition (Paperback)
by Michael L. Scott
Paperback: 912 pages
Publisher: Morgan Kaufmann; 2 edition (November 7, 2005)
Language: English
ISBN: 0126339511
These are a couple of the other top hits on Amazon:
Engineering a Compiler (Hardcover)
by Keith Cooper, Linda Torczon
Hardcover: 801 pages
Publisher: Morgan Kaufmann (September 2003)
Language: English
ISBN: 155860698X
Advanced Compiler Design and Implementation (Hardcover)
by Steven Muchnick
Hardcover: 856 pages
Publisher: Morgan Kaufmann; 1st edition (August 1, 1997)
Language: English
ISBN: 1558603204
Is the dragon book my best bet, or do I want to get one of the above books (or something else) instead? (I'd rather not drop $300 on compiler books =)
-Brendan
#73665 - sajiimori - Mon Feb 27, 2006 7:52 am
Honestly, since you're not doing anything especially advanced, it doesn't matter much which book you buy. Just get one that covers lexical analyzing, parsing, symbol tables, and emitting.
#73671 - sgeos - Mon Feb 27, 2006 9:48 am
Sweet. In that case I'll probably buy by price. =)
BTW, how is polymorphism handled?
-Brendan
#73677 - keldon - Mon Feb 27, 2006 10:36 am
sgeos wrote: |
Sweet. In that case I'll probably buy by price. =)
BTW, how is polymorphism handled?
-Brendan |
I thought (guessed)it involved name mangling.
Check this link
Also you might want to check out this link: http://www.cs.jhu.edu/~scott/cw/lectures/ . And maybe check google scholar too. The best thing about using the universities is that they recommend books and have a defined course for you to learn and their courseworks provide a means to test your ability.
#73685 - sgeos - Mon Feb 27, 2006 12:38 pm
Thanks for the links!
keldon wrote: |
The best thing about using the universities is that they recommend books and have a defined course for you to learn and their courseworks provide a means to test your ability. |
A university course would be fantastic, but I'm in Japan right now. Completing a language and a compiler strikes me as a good way to test one's ability. (Rewritting the compiler in that language seems even better. =) How long did it take the compiler writers out there to finish your first compiler from the time you picked up a book?
Note to self and others:-Brendan
#73714 - sajiimori - Mon Feb 27, 2006 7:47 pm
Name mangling is for reusing function names. For runtime polymorphism, it depends on what you need. C++ uses vtables.
#73767 - sgeos - Tue Feb 28, 2006 1:48 am
Thanks. A quick search on vtables turned up:
http://cpptips.hyperformix.com/cpptips/vtables
Which gives me more or less an idea of how to handle multiple inheritance.
-Brendan
#73823 - gladius - Tue Feb 28, 2006 6:49 am
If you want to build a self-hosting compiler that goes down to assembler, it will take a good while on your first try :). I wrote a simple pascal compiler a ways back and it took me about 4 months of part-time hacking to get it self-hosting.
If you compile to C or an existing bytecode (or a nice one you make), then it would be quite a bit simpler. Plus that way you don't have to worry about doing assembly optimization.
I can also recommend the Dragon book, it covers the fundamentals quite well.
#73845 - sgeos - Tue Feb 28, 2006 9:17 am
I'll probably compile to C. How long do you suspect that would take?
-Brendan
#73892 - sajiimori - Tue Feb 28, 2006 8:00 pm
I found it easy to emit native code on my first try -- but it was really slow native code! :)
How hard it is to emit C largely depends on how direct the translation is, which depends on the design of your language.
#73940 - sgeos - Wed Mar 01, 2006 2:08 am
Slow is fine to start. I'm looking to create a language that is easier for me to program in- even if it runs a little slower.
-Brendan
#73943 - sajiimori - Wed Mar 01, 2006 2:21 am
Have you thought much about what your language will be like? I've always been interested in programming language design.
#73994 - sgeos - Wed Mar 01, 2006 1:28 pm
No function prototypes. Having to modify a declaration in two places sucks.
Cookie based coroutines. Do you prefer the prefix or postfix version?
Code: |
// prefix
cookie mycookie@routine;
mycookie@routine(x,y,z); // pass parameters on the first call
mycookie@routine(); // no parameters on subsequent calls
@routine(x,y,z); // compiler creates a default cookie for you
@routine(); // compiler passes default cookie
// postfix
cookie routine@mycookie;
routine(x,y,z)@mycookie; // pass parameters on the first call
routine()@mycookie; // no parameters on subsequent calls
routine(x,y,z)@; // compiler creates a default cookie for you
routine()@; // compiler passes default cookie
routine(x,y,z); // implicit default
routine(); // implicit default |
The ability to use only the interface of another class. (Simple implementaion being all methods are virtual)
Code: |
class data_heavy
{
private:
type bigarray[];
public:
type getCell(type pX, type pY);
};
class algorithmic_data mimics data_heavy
{
public:
type getCell(type pX, type pY); // I don't store anything
}; |
Special keywords for decorator classes.
Code: |
class horrible_stench decorates critter
{
} |
Undeclared methods are implicitly created as:
Code: |
type method(params)
{
return host.method(params);
} |
I hate having to declare 300 "return host.method(params);" functions. host is a keyword. =) Naturally, not calling the host function is perfectly legal. (My zombie_critter decorator might always return 0 for max HP).
I've got more on coroutine memory sharing, and grouping but I'll type that up later.
-Brendan
#74033 - sajiimori - Wed Mar 01, 2006 9:42 pm
Are you sure you want a special syntax for coroutines? Why not simply:
Code: |
Coroutine myCoroutine(startingFunction);
myCoroutine(x, y, z); // Pass parameters on the first call.
myCoroutine(); // No parameters on subsequent calls.
|
If you make your coroutine objects look like regular function objects from the outside, you can change the implementation without affecting client code, e.g. if the cost of allocating a stack becomes too great and you have to switch to finite state machines or something else. Also, generic code can then operate on both coroutines and function objects.
I'm not sure I follow the "default cookie" idea, but if it's a piece of global state, that's scary and seemingly unnecessary. It sounds like having a "default int" that you operate on if you don't give your int a name.
The idea about only using the interface of another class seems like a way to avoid defining abstract classes. I'd recommend sticking with abstract classes.
The decorator idea sounds good -- I also hate defining tons of forwarding methods. Additionally, a "lift" keyword would be nice to expose particular methods of private data members, while still hiding the majority of its interface:
Code: |
class Inner
{
public:
void methodA();
void methodB();
void methodC();
};
class Outer
{
public:
lift i.methodB;
private:
Inner i;
};
void test()
{
Outer o;
o.methodB(); // calls o.i.methodB()
}
|
Edit: Check out the "Nice" language for lots of ideas. Maybe it's even what you're looking for!
Edit again: "Lift" should also let you specify a new name for the method, so something like Anim::numFrames() can be converted to the more specific numAnimFrames() in the outer class.
Last edited by sajiimori on Thu Mar 02, 2006 2:38 am; edited 1 time in total
#74063 - sgeos - Thu Mar 02, 2006 1:53 am
sajiimori wrote: |
I'm not sure I follow the "default cookie" idea, but if it's a piece of global state, that's scary and seemingly unnecessary. |
It's local. Basically the compiler inserts a local cookie for you and uses that.
I like the cookie idea because it allows multiple instance of a coroutine within the same stack level. Treating coroutines as objects could work:
Code: |
coroutine CoroutineType
{
// Shared data
type x;
type y;
externalReturn myCoroutine(params)
{
// constructor
}
internalReturn routineA()
{
}
internalReturn routineB()
{
}
}
CoroutineType myCoroutine(params); // create
myCoroutine(); // first call, default function
myCoroutine.routineA(); // first call, explicit function
myCoroutine(); // subsequent calls |
Your syntax is probably better.
-Brendan
#74066 - sajiimori - Thu Mar 02, 2006 2:59 am
Ah, then it's like having a default local int for when you don't want to give your int a name. :) It still sounds odd to me, but I think that's a side-effect of the differences in the ways we think of coroutines.
It sounds like you consider a "coroutine" the combination of some code (a function) and also the state of execution of that code (a cookie object), whereas I consider a "coroutine" just the state of execution, with no strict association with any particular code.
If the code and state are inextricably tied together, you could refer to a coroutine by the function name rather than the object name, so anonymous coroutine objects make sense.
However, it's important that a coroutine object is not tied to any particular function or set of functions. They have to start execution somewhere, but after that, they should be free to go anywhere and yield from anywhere. Otherwise you couldn't write utilities that are reusable across coroutines.
For instance, if I want a function called walkTo() that doesn't return until the character gets to the destination, that function should be usable from any coroutine. I don't want to write a whole class just for walkTo(), either -- I should be able to write it as a normal function that happens to contain a call to yield().
Code: |
bool walkTo(Creature* me, Vector3 dest)
{
while(!me->collide(dest))
{
me->walkToward(dest);
yield();
if(me->walkFailed())
return false;
}
return true;
}
|