This guide is about object-oriented and prototype-based programming. It covers general knowledge about the object-oriented paradigm and tries to clear up some misconceptions programmers may have about OOP.
As the name suggests, object-oriented development uses objects to create programs. You may ask yourself: what is an object? It may seem like a stupid question, but it is not. I'm sure many programmers (even experienced ones) don't have a clear definition in their head.
Well, I will give you a simple one: objects are the runtime entities.
- They are the living things when your program runs (a.k.a. instances).
- They have an internal state (instance variables).
- They can respond to certain messages (their behavior).
- Their state can be mutated via message sending.
You might ask: what is message sending?
Well, it is simple: method invocation.
var x = new Foo();
x.bar(); // We send the message `bar` to the object `x`.
An important notion to grasp is the difference between a function call (à la Lisp) and message sending.
In a non-object-oriented language like C, we have function calls, like so:
#include <stdio.h>
void bar(void) { printf("calling `bar`\n"); }
int main(void) {
    bar(); // function call.
    return 0;
}
In non-OOP languages, a function definition provides two things:
- a signature;
- an implementation.
In the C program above, when we call `bar`, we know the behavior beforehand.
However, in OOP languages, methods split the signature from the implementation, i.e. given a signature we can't know for sure which implementation (the body of the method) will be executed.
This is possible because object-oriented languages provide subtype polymorphism, which lets us delay the resolution of the implementation until runtime.
Here's an example of message sending:
class Main {
    static class A {
        void foo() { System.out.println("A::foo"); }
    }
    static class B extends A {
        @Override
        void foo() { System.out.println("B::foo"); }
    }
    public static void main(String[] args) {
        A b1 = new B(); // b1 is a `B` at runtime but statically typed as an `A`.
        b1.foo(); // Will print `B::foo`.
    }
}
Here, given the method `foo` introduced by `A`, when we send the message `foo` to an instance `b1` typed by `A`, `B::foo` will be executed. In other words, even though we know the signature of `foo`, we can't be sure which code will be executed. The signature `foo` isn't bound to a particular implementation. How does Java know which function to call? Well, this is done at runtime (late binding), typically through an indirection table called the virtual function table (vft).
late binding: the binding, or the "function retrieval", is done late, a.k.a. at runtime, a.k.a. during the program execution.
NOTE: virtual function tables are irrelevant to understanding OOP concepts.
In any bad school in the universe, they will give you the following half-assed definition:
Polymorphism: "poly" is for "many" and "morphism" for "forms", thus "many forms".
However, this definition is pretty useless considering there are many kinds of polymorphism. At school and in OOP, when we talk about "polymorphism" we mainly refer to subtype polymorphism.
Subtype polymorphism is the late binding I previously talked about, i.e. method overriding. This kind of polymorphism is resolved at runtime (a.k.a. which implementation to invoke?).
If you limit the definition of subtype polymorphism to methods, it becomes simpler to manage (IMO). Otherwise, we would need to talk about classes: given a class `A` and its set of direct and indirect children `Cs`, a variable typed by `A` might refer to an instance of any child (direct or indirect) of `A`. However, I don't like to talk about classes since, at the end of the day, the unpredictable behavior of method invocation comes from overriding.
Here's a non-exhaustive list of other kinds of polymorphism:
- parametric polymorphism ~ a.k.a. generics
- ad hoc polymorphism ~ a.k.a. overloading (the most misused and badly understood among the community).
An interesting fact about parametric and ad hoc polymorphism is that they are both static polymorphism, which means they are resolved at compile time.
IMPORTANT: subtyping doesn't equal inheritance.
IMPORTANT2: in most mainstream languages, subtyping and inheritance coincide (nominal typing).
This is not true for all languages, e.g. OCaml uses structural typing for its objects.
Inheritance is only the mechanism by which we reuse code. Many non-OOP languages have a comparable feature, like Haskell (through typeclasses) or Rust (through traits). Even some old languages use implicit inheritance, where two classes that look alike are implicitly related to each other.
This distinction is blurred since, in most OOP languages, in order to achieve a subtyping relationship between two classes we must use inheritance, i.e. the inheritance of methods and attributes is the consequence of subtyping.
To conclude this chapter, here's the main idea to keep in mind:
The object-oriented paradigm is all about objects and message sending.
- Objects: living entities when the program runs, a.k.a. your instances.
- Message sending: method invocation where the signature isn't bound to a particular implementation.
Here I will present the metamodel described in this article: DucourPrivat. This part is hard to understand and to explain clearly. I recommend reading the original article and this guide a couple of times to fully absorb these new concepts.
class A
    fun foo: Int do return 1
end
class B
    super A
    redef fun foo do return 2
end
class C
    super A
    redef fun foo do return 3
end
This code snippet is written in NIT; it is fairly easy to understand.
The thing I want you to focus on is the method `foo`. We see in class `B` that `foo` is redefined over its ancestor `A::foo`. Same thing for `C::foo`.
In an informal way, we can say that `B::foo`, `C::foo` and `A::foo` are related to each other in some way. In addition, we can say that `A::foo` acts as the "parent" of `B::foo` and `C::foo`, i.e. they are equivalent-ish.
Well, that's because the class `A` introduces a global property called `foo`, to which `B::foo` and `C::foo` are linked. It is the first occurrence of this property in the hierarchy, thus the global one.
Here are some useful definitions:
- Property -- anything declared or defined inside a class; this includes attributes, method definitions and redefinitions (overriding), etc.
- Global property -- a property introduced by a class for the first time.
- Local property -- any property defined in a class. Local properties are linked to the class where they are defined and they belong to a global property. Every global property is also local to the class that introduces it.
Given the definition of local property, we can say that `B::foo` and `C::foo` are local properties of `B` and `C` respectively. Finally, these local properties belong to `A::foo`.
Here's a recap:
class A
    # Global + Local property of `A`.
    fun foo: Int do return 1
end
class B
    super A
    # Local property of `B`, belongs to `A::foo`
    redef fun foo do return 2
end
class C
    super A
    # Local property of `C`, belongs to `A::foo`
    redef fun foo do return 3
end
If we were to create a class model (a.k.a. metamodel) of all of this:
class Class
    var specializes: Array[Class] # Its parents
    var introduces: Array[GlobalProperty]
    var defines: Array[LocalProperty]
end
class GlobalProperty
end
class LocalProperty
    var belongs_to: GlobalProperty
    var redefines: nullable LocalProperty
end
Here, `LocalProperty::redefines` is a nullable field since the first introduction of a property does not redefine any other local property.
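To make the metamodel a bit more concrete, here is a minimal sketch of it in JavaScript, populated with the `A`, `B`, `C` hierarchy from above. The constructor shapes and the variable names (`fooGlobal`, `aFoo`, `classA`, ...) are my own assumptions for illustration, not something taken from the article.

```javascript
// Hypothetical JavaScript rendering of the metamodel; all names are illustrative.
class GlobalProperty {
  constructor(name) {
    this.name = name;
  }
}

class LocalProperty {
  constructor(belongsTo, redefines = null) {
    this.belongsTo = belongsTo; // the global property this definition is attached to
    this.redefines = redefines; // null for the first introduction
  }
}

class Class {
  constructor(name, specializes = []) {
    this.name = name;
    this.specializes = specializes; // its parents
    this.introduces = [];           // global properties introduced here
    this.defines = [];              // local properties defined here
  }
}

// A introduces the global property `foo` and defines its first local property.
const fooGlobal = new GlobalProperty("foo");
const classA = new Class("A");
const aFoo = new LocalProperty(fooGlobal);
classA.introduces.push(fooGlobal);
classA.defines.push(aFoo);

// B and C redefine `foo`: new local properties, same global property.
const classB = new Class("B", [classA]);
const bFoo = new LocalProperty(fooGlobal, aFoo);
classB.defines.push(bFoo);

const classC = new Class("C", [classA]);
const cFoo = new LocalProperty(fooGlobal, aFoo);
classC.defines.push(cFoo);

console.log(bFoo.belongsTo === cFoo.belongsTo); // true
console.log(aFoo.redefines);                    // null
```

Only `A` has something in `introduces`; `B` and `C` merely add entries to `defines` that point back to the same global property.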
Overloading isn't bad per se, but most people don't know about the metamodel I described earlier, nor about variance.
Variance in OOP languages describes how subtyping between types behaves. We can find three kinds of variance:
- Invariance: no variance at all.
- Covariance: subtyping follows the same direction as inheritance.
- Contravariance: subtyping goes in the opposite direction of inheritance.
Variance is useful to understand generic subtyping rules and method redefinition rules. NOTE: we will not talk about variance on generics.
class Food
end
class Grass
    super Food
end
class Animal
    fun eat(food: Food) do ...
end
class Cow
    super Animal
    super Food
    # Covariant redefinition: the parameter type is narrowed from `Food` to `Grass`.
    redef fun eat(food: Grass) do ...
end
var c1 = new Cow
c1.eat(new Grass) # OK!
# c1.eat(new Cow) # Compile Error : eat needs Grass not Cows
((Animal)c1).eat(new Cow) # COMPILE OK but will fail at runtime : TypeCast error.
If our language supports covariant method redefinition, then the global property `Animal::eat` can be redefined with a subtype of `Food`.
Covariant method redefinition is useful to model real-world scenarios, like the animals on a farm.
Covariance breaks subtyping safety, since we can upcast any class to one of its parents until we hit the global property with looser argument requirements.
This is caused by the fact that `Cow::eat` is linked to the global `Animal::eat`, which accepts any kind of `Food` instance.
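JavaScript has no static types, so the compile-time half of the story can't be shown there, but the runtime failure can be sketched. In the snippet below, the explicit `instanceof` guard is my stand-in for the check a language with covariant redefinition would insert, and the helper `feed` is a hypothetical piece of code that only knows about `Animal`. Unlike in the example above, `Cow` does not also inherit from `Food`, since a JavaScript class has a single parent.

```javascript
class Food {}
class Grass extends Food {}

class Animal {
  eat(food) {
    return "eating some food";
  }
}

class Cow extends Animal {
  // Stand-in for the covariant redefinition eat(food: Grass).
  eat(food) {
    if (!(food instanceof Grass)) {
      throw new TypeError("Cow::eat needs Grass");
    }
    return "eating grass";
  }
}

// Code written against Animal believes any Food is acceptable.
function feed(animal, food) {
  return animal.eat(food);
}

console.log(feed(new Animal(), new Food())); // "eating some food"
console.log(feed(new Cow(), new Grass()));   // "eating grass"

let failure = null;
try {
  feed(new Cow(), new Food()); // fine for an Animal, rejected by a Cow
} catch (e) {
  failure = e;
}
console.log(failure instanceof TypeError); // true
```

The caller did nothing wrong according to the contract of `Animal::eat`; the error only appears because the receiver happens to be a `Cow` at runtime.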
class C
end
class D
    super C
    fun didi do ...
end
class A
    fun foo(d: D) do ...
    # returns an instance of D
    fun bar: D do ...
end
class B
    super A
    redef fun foo(d: C)
    do
        # Here we can't call `super` since
        # it would break the signature of `B`'s
        # superclass.
        ...
    end
    # Returns an instance of C
    redef fun bar: C do ...
end
var b1 = new B
var a1 = new A
a1.foo(new D) # OK
b1.foo(new D) # OK since D is subtype of C
b1.foo(new C) # OK. So far so good..., but not for so long
var d1 = ((A)b1).bar # Statically correct
d1.didi # ERROR ABORT : `d1` isn't an instance of `D`.
Here, contravariance is safe for input (argument) types; as return types, however, it is not safe.
We've seen that covariance on argument types and contravariance on return types are not safe. They may be useful, but most programming languages choose to keep method redefinition invariant, at least on argument types.
So what is the problem with overloading?
Well, since most programming languages are invariant over their redefinitions AND support overloading, this code works, but in an unexpected way:
class Food
end
class Grass
    super Food
end
class Animal
    fun eat(food: Food)
    do
        print "eating {food}..."
    end
end
class Cow
    super Animal
    super Food
    fun eat(food: Grass)
    do
        print "eating grass happily, because I'm a cow"
    end
end
var c1 = new Cow
c1.eat(new Grass) # OK, as expected
c1.eat(new Cow) # OK, What???
Since overloading is supported, `eat(Grass)` and `eat(Food)` do not belong to the same global property. They are two distinct global properties with the same name.
Moreover, the dispatch is done statically at compile time, i.e. `eat` is disambiguated at compile time, a.k.a. this is not subtype polymorphism.
NOTE: let me be clear on something... This is not unexpected behavior; however, it is unintuitive for most students and programmers. Most people forget about those rules and relearn them 30 times.
Overloading is mainly shit and not useful. What you really want from a class-based language is covariant redefinition or multimethods, like CLOS has. Some languages offer virtual types, which emulate covariant redefinition.
Among mainstream programming languages, we find two types of OOP: class-based and prototype-based.
Class-based languages use classes to describe the content and behavior of your objects. In most statically typed languages, classes do not live outside the compilation process.
Well, this is a lie. To be more precise, classes may live during the runtime; however, they are mostly used to store the virtual function table and other implementation details.
What I'm trying to say is that the concept of a "class" only exists during your development and compilation. At the end of the day, only objects live at runtime.
This is kind of another lie, since we could argue that some languages, like Ruby, unify classes with objects (for dynamic metaprogramming). But let things be simple for a while. Metaprogramming in Ruby will be addressed in another guide.
So! Classes are static beings. Once you compile your program, they disappear. They are blueprints for your objects. This is the case for many languages like C++, NIT, Eiffel and OCaml (maybe).
Remember at the beginning when I talked about OOP? Well, I wrote that OOP is only two things:
- objects;
- message sending.
Well, that is all there is to a prototype-based language.
Just kidding, but seriously... Prototype-based languages are only living objects.
SO...
For the rest of this guide we will talk exclusively about prototype-based languages, because they are simple and beautiful IMO.
In this section I will talk about prototype-based languages, how they work and how they relate to JavaScript.
Before we talk about JavaScript, we must talk about some characteristics of proto-based languages.
- They are mostly dynamic; in fact I think they are all dynamic (well, I'm not aware of any prototype-based language which is static). This is due to the fact that we have no classes at all. Thus, we can't infer at compile time how our objects will be laid out or how they will behave (it's really dynamic).
- Prototype-based languages must provide a way to create objects ex nihilo.
- Most proto-based languages adopt an "everything is an object" philosophy.
- Method lookup is implemented with delegation or some kind of concatenation.
- Every object has a prototype chain.
- Every object can be the prototype of another object.
And no more characteristics of proto-based languages come to my mind.
The expression ex nihilo means "out of nothing", thus without a constructor or a class. Prototype-based languages always provide a way to create objects this way, via a keyword or special syntax. In JavaScript, this is done via object literals.
var object1 = {
    "attr1" : 1,
    "attr2" : 2,
    ...
}
Here `object1` is a living object created at runtime from the curly-brace object literal syntax.
NOTE: Lua has a similar syntax.
Personally, I would prefer calling it instance-based, since everything must be created from a living object or ex nihilo. That being said, prototyping is somewhat related to the prototype design pattern, where an initial object is created as a "default" schema so that new instances can be cloned from it and customized to their needs.
Both the cloned objects and the prototypes live their own lives and may be mutated.
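The cloning flavor of the pattern can be sketched with a plain shallow copy. This is only an illustration under my own assumptions: `defaultPerson` is a hypothetical template object, and `Object.assign` is just one of several ways to copy properties.

```javascript
// Hypothetical "default" object used as a template.
const defaultPerson = { age: 0, name: "default" };

// Shallow copy: the clone gets its own copy of each property.
const clone = Object.assign({}, defaultPerson);
clone.name = "louis";

// Both objects keep living and can be mutated independently.
defaultPerson.age = 1;

console.log(clone.name);         // "louis"
console.log(clone.age);          // 0 (copied before the template changed)
console.log(defaultPerson.name); // "default"
```

Note that a copy does not see later changes made to the template, which is the main difference with delegation through a prototype chain.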
In a language where we don't have any notion of class, you must use some of your objects as "references" from which you derive other objects.
Here's an example in JavaScript without the `new` keyword:
// my "reference" ~= prototype ~= class
var person = {
    age: 0,
    name: "default",
}
var p1 = {}
p1.__proto__ = person
console.log(p1.name) // prints "default"
p1.name = "louis"
console.log(p1.name) // prints "louis"
console.log(person.name) // prints "default"
As you can see, I created two objects:
- `person` serves as a "reference", a.k.a. a prototype;
- `p1` is an "instance" of `person` and "inherits" its behavior.
In other words, a prototype is how you reuse behavior among objects in an object-oriented language which has no classes.
It's just stupid simple; it is literally the only natural way to achieve some classification and reusability with objects only.
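Assigning to `__proto__` works, but it is a legacy accessor. The same relation can be set up with the standard `Object.create` and inspected with `Object.getPrototypeOf`. A minimal sketch, where `personProto` and `p2` are names I made up for the illustration:

```javascript
const personProto = { age: 0, name: "default" };

// Create an object whose prototype is `personProto`.
const p2 = Object.create(personProto);

console.log(Object.getPrototypeOf(p2) === personProto);        // true
console.log(p2.name);                                          // "default" (found on the prototype)
console.log(Object.prototype.hasOwnProperty.call(p2, "name")); // false

// Writing creates an own property that shadows the inherited one.
p2.name = "louis";
console.log(p2.name);          // "louis"
console.log(personProto.name); // "default"
```

The write on `p2` never touches the prototype: reads delegate upward, writes land on the receiver.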
With the `__proto__` field we can build a chain of prototypes that simulates class inheritance, like so:
var A = {
    foo: function() {
        console.log("A::foo")
    },
}
var B = {
    bar: function() {
        console.log("B::bar")
    },
}
B.__proto__ = A
var C = {
    x: 0
}
C.__proto__ = B
// We create a new "instance" of `C`
var c1 = {}
c1.__proto__ = C
var proto = c1.__proto__
// This loop prints the content of each prototype (parent) in the prototype chain:
// `C`, then `B`, then `A`, and finally `Object.prototype`.
while (proto != null) {
    console.log(proto, ", ")
    proto = proto.__proto__
}
The `__proto__` field used in the previous example is what JavaScript uses for method lookup. When you send a message to an object, like `age`, the targeted object is searched for a property called `age`; if the property is not found there, the lookup goes up the prototype chain.
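The lookup rule can be written out by hand. The sketch below is only a model of what the engine does for us, not how any engine actually implements it; `findOwner`, `base` and `derived` are hypothetical names.

```javascript
// Hypothetical helper: returns the object in the chain that owns `name`, or null.
function findOwner(obj, name) {
  let current = obj;
  while (current !== null) {
    if (Object.prototype.hasOwnProperty.call(current, name)) {
      return current;
    }
    current = Object.getPrototypeOf(current); // go one step up the chain
  }
  return null;
}

const base = { age: 0 };
const derived = Object.create(base);
derived.name = "louis";

console.log(findOwner(derived, "name") === derived);              // true
console.log(findOwner(derived, "age") === base);                  // true
console.log(findOwner(derived, "toString") === Object.prototype); // true
console.log(findOwner(derived, "missing"));                       // null
```

The walk stops at the first object that owns the property, which is why a property on `derived` hides one of the same name further up.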
Newer versions of JavaScript come with the keyword `class`, but keep in mind that JavaScript is a prototype-based language, thus it has no classes. As the Mozilla documentation says, the notion of `class` is mostly syntactic sugar for "simpler" code.
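A quick way to convince yourself is to poke at what a `class` declaration produces. In this sketch, `Person` and `louis` are made-up names:

```javascript
class Person {
  constructor(name) {
    this.name = name;
  }
  greet() {
    return "hi, " + this.name;
  }
}

const louis = new Person("louis");

// The "class" is a function, and its methods live on a plain prototype object.
console.log(typeof Person);                                                   // "function"
console.log(Object.getPrototypeOf(louis) === Person.prototype);               // true
console.log(Object.prototype.hasOwnProperty.call(louis, "greet"));            // false
console.log(Object.prototype.hasOwnProperty.call(Person.prototype, "greet")); // true
console.log(louis.greet());                                                   // "hi, louis"
```

So `louis.greet()` is resolved by the same prototype-chain lookup as before; the `class` keyword only changes how the chain is written down.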
The keyword `this` refers to the calling context, and not necessarily to the current object instance like in Java or other OOP languages.
By default, JavaScript comes with a global object, called `window` in the browser or `global` in Node. This global object is also known as the global context.
If you enter a function without binding the context, like when calling a top-level function, then `this` refers to the global object (in non-strict mode):
function foo() {
    return this;
}
foo() == window; // true
Every function call must have a receiver. If no receiver is prefixed to the call, like `<receiver>.foo`, then JavaScript implicitly binds `this` to the global object before calling the function. This is similar to implicitly adding `this` before every function call: `<this>.foo`.
It is pretty standard in most OO languages to have a global object and an implicit `this` bound to every function call; this is the case for Ruby and NIT too.
You might think this is weird behavior, but the nature of OOP requires that every message is sent to a receiver. JavaScript is no exception to the rule, since everything is an object (even functions).
In fact, JavaScript's `this` is no different from that of any mainstream language. The awkwardness of `this` comes from the fact that functions aren't bound to any receiver by default.
If we compare JavaScript with Ruby:
class Toto
    def foo
        return self.method(:bar)
    end
    def bar
        return self
    end
end
t = Toto.new
f = t.foo()
p f.call() == t # true
p f.receiver == t # true
And the JavaScript equivalent:
var t = {
    foo: function() {
        return this.bar // even if I prefix `bar` by `this`, the function object returned
                        // by `foo` will still be unbound to `t`.
    },
    bar: function() {
        return this
    }
}
var f = t.foo()
f() == t // false
f() == global // true
Unlike in Ruby, we see that function objects do not belong to any receiver whatsoever. As for Ruby, the method named `method` returns a `Method` instance. We could try to simulate JavaScript's behavior:
class Toto
    def foo
        return self.method(:bar).unbind
    end
    def bar
        return self
    end
end
t = Toto.new
f = t.foo()
p f.call() == t # Error: undefined method `call` for `UnboundMethod`
As you can see, Ruby is less permissive in its calling rules.
To simulate Ruby's behavior in JavaScript, we need to call `bind`, like so:
var t = {
    foo: function() {
        return this.bar.bind(this)
    },
    bar: function() {
        return this
    }
}
var f = t.foo()
f() == t // true
f() == global // false
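`bind` is not the only way to give a function its receiver. As a side note, `call` and `apply` pick the receiver for a single invocation, which makes the "every message needs a receiver" idea quite explicit. A small sketch with made-up names (`whoAmI`, `receiver`, `other`):

```javascript
function whoAmI() {
  return this;
}

const receiver = { label: "receiver" };

// `call` and `apply` choose the receiver for a single invocation.
console.log(whoAmI.call(receiver) === receiver);  // true
console.log(whoAmI.apply(receiver) === receiver); // true

// `bind` returns a new function whose receiver is fixed once and for all.
const bound = whoAmI.bind(receiver);
const other = { label: "other" };
console.log(bound.call(other) === receiver); // true: the bound receiver wins
```

Once a function is bound, later attempts to supply another receiver are ignored, which is the closest JavaScript gets to Ruby's `Method` objects.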
The metamodel I previously presented is not really useful in the context of prototype-based languages, since property lookup is done by traversing the entire prototype chain. In addition, objects in dynamic languages can add, remove and override any inherited property without respecting any contract or semantics previously established by their ancestors. Having a clean metamodel to describe the relation between properties becomes less important, but it is still relevant for understanding prototype-based languages.
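Here is a small sketch of that freedom, with hypothetical objects `parent` and `child`. Nothing forces an override to look like the property it replaces:

```javascript
const parent = {
  greet: function () {
    return "hello from parent";
  },
};
const child = Object.create(parent);

console.log(child.greet()); // "hello from parent"

// Override: nothing forces the new property to behave like the inherited one.
child.greet = 42;
console.log(typeof child.greet); // "number"

// Remove the override: lookup falls back to the prototype again.
delete child.greet;
console.log(child.greet()); // "hello from parent"

// Properties can also be added to the prototype after the fact.
parent.bye = function () {
  return "bye";
};
console.log(child.bye()); // "bye"
```

In metamodel terms, `child.greet = 42` is a "local property" that shares a name with the inherited one but honors none of its signature.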
TODO