The Crack Programming Language Guide

This is the Manual for version 0.3 of the Crack Programming Language. Version 0.3 is a fairly complete programming language. We believe the language to be usable for real work at this time, but there are still a number of important features that remain to be added (including generics, exceptions, annotations and partial closures). While the compiler is generating debug information at this time, only a limited subset of it is currently usable (you can get a stack trace in gdb).

But that said, if you want to get in on the ground floor of a new scripting language that is C-like, fast, and interfaces well with C and C++ code, Crack is the language and this version is still pretty close to the ground floor.

So without further caveats - let's do some Crack!

Overview

If you're a seasoned programmer, here is a quick profile to help orient you to Crack:

Major Influences

C, C++, Java, Python

Syntax

C-style, curly-brace

Typing

Static, strong (with some implicit conversion)

Compiler

JIT native compiled (at runtime)

Paradigms

Object oriented, procedural

Garbage Collection

Reference counted objects, programmer controlled

OO Features

Virtual functions, overloaded functions, multiple inheritance

Crack has been developed on Linux x86 and x86-64. It is highly questionable whether it will build under any other platform. Portability will play a bigger role in future versions of the language.

Installation

See the INSTALL file for the latest installation instructions.

Hello World

Here's the Crack "hello world" program:

    #!/usr/local/bin/crack
    import crack.io cout;
    cout `hello world!\n`;

If you write this as a script and "chmod u+x" it, when you run it you should see "hello world!" written to the terminal.

The first line is the standard Unix "#!" line. It tells the kernel to execute the script by passing its full name as an argument to the "/usr/local/bin/crack" program.

The second line imports the "cout" variable from the crack.io module. Like C++, Crack uses "cin", "cout" and "cerr" for its standard input, output and error streams. "cout" and "cerr" are both "formatters," which means that they support the use of the back-tick operator for formatting.

The third line actually uses the back-tick operator to print some text. We won't go into too much detail on this operator right now - suffice it to say that this line is roughly equivalent to "cout.format('hello world!\n');". The "\n" at the end of the string translates into a newline (the ASCII "LF" character, character code 10).

Expressions consisting of a value followed by back-tick quoted text and code are called "interpolation expressions."

Comments

Crack permits the use of C, C++ and shell style comments:

    /* C Style comment */
    // C++ style comment
    # shell style comment

For code that you hope to get a lot of re-use out of, we recommend the convention of Doxygen-style doc-comments for classes, functions and global variables:

    /** C-style doc-comment */
    /// C++ style doc-comment
    ## shell-ish doc-comment

These currently get treated the same as any other comments. However, future versions of Crack will parse them and store them with the meta-data for the code, permitting the easy extraction of reference documentation from the source.

Variables and Types

Like most languages, Crack allows you to define variables. But unlike most other scripting languages, Crack is statically typed so you must specify (or imply) a type for all variables.

    # define the variable i and initialize it to 100.
    int i = 100;

You can also define variables using the more terse ":=" operator, which derives the type from that of the value:

    i := 100;           # equivalent to "int i = 100;"
    j := uint32(100);   # equivalent to "uint32 j = 100;"

If you don't specify an initializer for a variable, the default initializer will be used. For the numeric types, this is zero. For bool, it's false. For complex types (which we'll discuss later), a null pointer is used.
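
As a minimal sketch of these defaults (the variable names are only for illustration; "null" and the "is" operator used in the last check are covered later, under Classes):

    import crack.io cout;

    int count;     # no initializer, so count starts at zero
    bool ready;    # defaults to false
    String name;   # aggregate type, so this is a null reference

    cout `count is $count\n`;
    if (!ready)
        cout `ready is false\n`;
    if (name is null)
        cout `name is null\n`;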

Built-in Types

The Crack language defines the following set of built-in types - these can be expected to exist in every namespace without requiring an explicit import:

void

The "void" type - this only exists so you can have a function that doesn't return anything. Bad things will happen if you try to define void variables.

byte

An 8-bit unsigned integer (like C's unsigned char)

bool

A boolean. Values are true and false, which are built-in variables.

int32

A 32-bit signed integer.

uint32

A 32-bit unsigned integer.

int64

A 64-bit signed integer.

uint64

A 64-bit unsigned integer.

float32

A 32-bit floating point.

float64

A 64-bit floating point.

int

An integer of the C compiler's default int-size for the platform (this is an alias to either int32 or int64).

uint

An unsigned integer of the C compiler's default unsigned int-size for the platform (this is an alias to either uint32 or uint64).

float

A floating point of the C compiler's float size for the platform (this is an alias to either float32 or float64).

byteptr

A pointer to an array of bytes (roughly like C's char*)

voidptr

A pointer to anything (like C's void*). All high level classes can implicitly convert to voidptr.

array[class]

The low-level array type. You should generally avoid using this in favor of high-level data structures (see crack.container). Low-level arrays are not memory-managed, and they don't do memory management of their elements.

This is Crack's only existing generic datatype - to use it, you specialize it with another class type, for example: array[int]

VTableBase

The base class used for all classes that can have virtual functions (more on this later).

Object

The implicit base class of all classes that don't define base classes (extends VTableBase)

String

An immutable, memory managed string of bytes.

StaticString

This is a String whose buffer can point to read-only memory.

Class

The type of class objects themselves. Crack classes exist at runtime as well as compile time. See Classes are Variables.

Of these, the byte, bool, int, uint and float types (including all variations of int, uint and float) are primitives. These types are notable in that they are copy-by-value and consume no memory external to the scope in which they are defined.

The byteptr, voidptr and array types are classified as primitive pointer types. These are essentially memory addresses: they are copied by value, but the memory they reference is not.

Primitive types, primitive pointer types, and the void type are all classified as low-level types. They are distinguished from the higher level aggregate types by naming convention: low-level types will always be all lower case (and digits), high-level types (at least the ones in the standard libraries) will always begin with an upper-case character. You may not currently subclass low-level types (see Inheritance); this restriction will be lifted in a future version of Crack.

High level or aggregate types are first class objects: variables of this type are pointers to allocated regions of memory large enough to accommodate the state data defined for the type. They can be extended to create other high-level types through sub-classing (more on this later).

Type names in Crack are very simple. They are either a single word or "array[ other-type-name ]". The latter form, though currently used only for arrays, will eventually be expanded as the instantiation mechanism for generic types, similar to generics in Java.

Integer Constants

Integer constants can be defined as an integer value in the code:

    int i = 100;
    int j = -1;

Integer constants are "adaptive", which means that they will convert to whatever type is required by their usage as long as the value is within the range of values for that type (for example, "uint u = -1" would be illegal).
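
To make the "adaptive" behavior concrete, here is a small sketch in which the same constant is used with several types (the variable names are hypothetical, and the commented-out line shows the illegal case mentioned above):

    byte b = 200;      # fine: 200 fits in an 8-bit unsigned byte
    int64 big = 200;   # the same constant adapts to int64
    uint u = 200;      # ... and to uint
    # uint bad = -1;   # illegal: -1 is out of range for an unsigned type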

Integer constants can be defined using hexadecimal, octal and binary notation as well as the default decimal notation. Examples follow:

    int x = 0x1AF;  # hex constant
    int o1 = 0117, o2 = 0o117; # both the same octal constant, c-style and
                               # normalized notation.
    int b = 0b10110; # binary constant

Implicit Conversion

In certain cases, types will automatically convert to other types. Most types will implicitly convert to boolean, allowing pretty much anything to be used as the condition in an if or while statement.

Aggregate types will implicitly convert to voidptr.

Numeric types will implicitly convert between one another as long as there is no risk of precision loss. In cases where there is a risk of precision loss, you can use explicit construction to force a conversion - truncating the value if necessary.

    # implicit conversions
    byte b;
    int32 i32 = b;
    uint32 u32 = b;
    int64 i64 = i32;
    i64 = u32;
    uint64 u64 = u32;
    float32 f32 = b;
    float64 f64 = i32;
    f64 = u32;
    
    # explicit conversions
    i32 = int32(i64);
    b = byte(f32);
    i64 = int64(u64);

Strings

Most programming languages support strings of characters, which are usually implemented as some kind of array. Crack strings are strings of bytes: you can embed any kind of byte values you want in them, and there are no assumptions about encoding.

String constants are sequences of bytes enclosed in single or double quotes (which are equivalent forms):

    String s = "first string";
    t := 'second string';

String constants are actually instances of the "StaticString" class - they're just like strings except that since their buffers are constants, they don't try to deallocate them on destruction.

As in the other C-like languages, string constants (both single and double quoted) can have escape sequences in them. We've dealt with one of these already ("\n"). The full list is:

\t

ASCII Tab character (9).

\n

ASCII newline character (10).

\a

ASCII alarm character (7).

\r

ASCII carriage return (13).

\b

ASCII backspace (8).

\x XX

Two digit hex character value (examples: "\x1f", "\x07")

\ OOO

1 to 3 character octal character value. (examples: "\0", "\141")

\ literal-newline

If you put a backslash in front of the end of the line in a string, the newline is ignored. This allows you to wrap large strings across multiple lines.
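
The following sketch exercises a few of these escapes. It assumes that two string constants can be compared with "==" in the same way that argv elements are compared later in this guide; the variable names are only for illustration.

    import crack.io cout;

    String s = 'name:\tcrack\n';    # tab and newline escapes
    cout `$s`;

    # hex 41 is "A" and octal 141 is "a", so these should be the same bytes
    if ('\x41\141' == 'Aa')
        cout `escapes match\n`;

    # a backslash at the end of the line continues the string on the next line
    String t = 'one long \
    string';

Note that the code in this guide is indented for display; we'd expect any whitespace at the start of the continued line to become part of the string.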

Control Structures

Crack 0.3 supports three control structures: the "if/else" statement, the "while" statement and the "for" statement. "if" runs code blocks depending on whether a condition is true or false:

    import crack.io cout;
    if (true)
        cout `true is true\n`; 
    else
        cout `something is wrong\n`;

The code above will always print out "true is true".

If we wanted to do something a little more useful, we could have used it to check the command line argument:

    import crack.sys argv;
    import crack.io cout;
    
    if (argv.count() > 1 && argv[1] == 'true')
        cout `arg is true\n`;
    else
        cout `arg is false\n`;

There's a lot of new stuff going on here: first of all, we're importing the "argv" variable from crack.sys. This variable contains the program's command line arguments.

count() is a method (a function attached to a value called "the receiver") that returns the number of items in argv. argv[1] accesses item 1 of the argument list (indexes are zero-based, so item 1 is the second element of the sequence).

The "&&" is a short-circuit logical and: it returns true if both of the expressions are true, but it won't evaluate the second expression unless the first is true. This is important in this case, because if we were to check argv[1] in a case where argv had fewer than two elements, a fatal error would result.

There is also a "||" operator which is a short-circuit logical or. It returns true if either expression is true but does not evaluate the second expression if the first is true.
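
As a sketch of the short-circuit or, the following check (our own example, using "help" as a hypothetical argument value) is safe for the same reason: argv[1] is only evaluated when the first expression is false, which means there is at least one argument.

    import crack.sys argv;
    import crack.io cout;

    # argv[1] is only evaluated when argv.count() < 2 is false
    if (argv.count() < 2 || argv[1] == 'help')
        cout `no argument given or help requested\n`;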

The if statement need not be accompanied by an else:

    if (argv.count() > 1 && argv[1] == 'true')
        cout `arg is true\n`;
    cout `this gets written no matter what the args are\n`;

The code in an if or an else can either be a single statement, or a sequence of statements enclosed in curly braces:

    if (argv.count() > 1 && argv[1] == 'true') {
        cout `arg is true\n`;
        cout `and so are you!\n`;
    }

You can also chain if/else blocks:

    argCount := argv.count();
    if (argCount > 2)
        cout `more than one arg\n`;
    else if (argCount > 1)
        cout `just one arg\n`;
    else
        cout `no args.\n`;

Note that blocks of code in curly braces can include the definitions of new variables that are only visible from within that block. Each block is a namespace that inherits definitions from the outer namespace. The top-level code in the file is the module namespace.
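
A minimal sketch of block scoping (the names "outer" and "inner" are just for illustration):

    import crack.io cout;

    int outer = 1;
    if (outer == 1) {
        int inner = outer + 1;   # the block can see outer from the enclosing namespace
        cout `inner is $inner\n`;
    }
    # "inner" is not visible here: it only exists inside the block above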

The while statement

The while statement repeatedly executes the same code block while the condition is true. For example, we could iterate over the list of arguments with the following code:

    import crack.sys argv;
    import crack.io cout;
    
    uint i;
    while (i < argv.count()) {
        cout `argv $i: $(argv[i])\n`;
        ++i;
    }

Note that the code in the while is enclosed in curly braces. In general, the code managed by a control structure can either be a single statement, or a group of statements enclosed in curly braces. The if statement works the same way.

This example also introduces the primary feature of the back-tick operator: variable interpolation. A dollar sign followed by a variable name formats the variable. A dollar sign followed by a parenthesized expression formats the value of the expression.
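
Here is a small sketch showing both forms of interpolation side by side (the variable x is hypothetical):

    import crack.io cout;

    x := 3;
    # $x formats a variable, $(...) formats the value of an expression
    cout `x is $x and x squared is $(x * x)\n`;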

The for statement

Crack supports two different flavors of "for": C-style and "for-each" style (as used in Python, JavaScript, Java and Perl).

The C-style for statement looks like a while statement, but the parenthesized form at the beginning consists of three parts:

an initializer, which is executed prior to anything else in the loop.
a condition, used in the same way as the condition in a while loop.
a post-loop clause, which is called at the end of every iteration before the condition is evaluated.

So for example, the more concise way to express the loop above would be:

    for (uint i; i < argv.count(); ++i)
        cout `argv $i: $(argv[i])\n`;

Notice that the first section is a variable definition. This is allowed. The semantics are similar to C++: the variable is defined for the scope of the loop.

Any one (or all) of the sections of the parenthesized form after the "for" keyword can be omitted. So this would be the equivalent of a "while" statement:

    for (;expr;) stmt

Likewise, an infinite loop could be written as:

    for (;;) stmt

The for statement also supports a for-each style usage:

    String argVal;
    for (argVal in argv)
        cout `arg is $argVal\n`;

In the example above, argVal will be set to each element of argv in succession. There is also a ":in" variation on the "in" keyword that defines the variable for the scope of the loop, so we could have omitted the argVal definition and done this:

    for (argVal :in argv)
        cout `arg is $argVal\n`;

The value on the right side of the "in" keyword (in this case argv) must conform to an iteration protocol:

It must provide an iter() method that returns an object conforming to the iterator protocol (an "iterator type").
The iterator type must have an elem() method that returns the element referenced by the iterator.
The iterator type must have a next() method that forwards the iterator to the next element.
The iterator type must convert to bool, converting to true if the iterator is valid, false if not (if it has run out of elements).

There is no way to explicitly identify a pair of types as conforming to this protocol: an object that implements all of these methods automatically conforms and can be used as the target of a "for-in" statement.
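
To illustrate the protocol, this sketch writes out by hand roughly what a "for-in" loop over argv does. It uses only the four operations listed above; the code that the compiler actually generates may differ.

    import crack.sys argv;
    import crack.io cout;

    iter := argv.iter();           # get an iterator from the sequence
    while (iter) {                 # the iterator converts to bool
        argVal := iter.elem();     # the element currently referenced
        cout `arg is $argVal\n`;
        iter.next();               # move on to the next element
    }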

"for-in" creates an underlying iterator variable, but hides it. It is often desirable to have access to the iterator. For example, you may want to modify a sequence as you iterate over it and the sequence might have a mutator that accepts an iterator.

For this use case, there is also a "for-on" variation of this statement:

    for (iter :on argv) {
        argVal := iter.elem();
        cout `arg is $argVal\n`;
    }

As with "in", the "on" keyword has a ":on" variation, used in the example above, which defines the iterator variable for the scope of the loop.

Functions

Functions let you encapsulate common functionality. They are defined with a type name, an argument list, and a block of code, just like in C:

    int factorial(int val) {
        if (val <= 1)
            return 1;
        else
            return val * factorial(val - 1);
    }

Also note that Crack supports recursion: you can call a function from within the definition of that function.

You can define a function that doesn't return a value by using the special "void" type:

    void printInt(int i) {
        cout `$i\n`;
    }

Primitive types (such as "int") are always passed "by value": the system makes a copy of them for the function. Arguments of high-level types are passed as references, so you can modify the objects that they reference from within the function.

Multiple functions can share the same name as long as their arguments differ: this feature is called overloading. For example, rather than "printInt" above, we could have defined a print function for multiple types:

    void print(int64 i) {
        cout `int $i\n`;
    }
    
    void print(uint64 u) {
        cout `uint $u\n`;
    }
    
    void print(String s) {
        cout `String $s\n`;
    }

The compiler chooses a function using a two-pass process: the first pass attempts to find a match based on the argument types without any conversions. The second pass attempts to find a match applying conversions whenever possible.

The general order of resolution in both passes is:

search for a match in the current namespace by order of definition.
repeat the search in each of the parent namespaces.

So for example, if we called print() with a uint64 parameter, the resolver would check the first print, then check the second print, find a match and use print(uint64 u). If we called it with an int32 parameter, the resolver would try all three functions, and not find a match. It would then repeat the search with conversion enabled and immediately match the first function, because int32 can implicitly convert to int64.
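
The two cases just described can be sketched as follows. The example repeats two of the print() overloads so that it stands alone; the variable names are hypothetical.

    import crack.io cout;

    void print(int64 i) { cout `int $i\n`; }
    void print(uint64 u) { cout `uint $u\n`; }

    uint64 big = 5;
    int32 small = 5;
    print(big);     # first pass: exact match on print(uint64 u)
    print(small);   # no exact match, second pass converts int32 to int64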

We mentioned searching across namespaces: functions can be defined in most block contexts, including within other functions:

    void outer() {
        void inner(int i) {
            cout `in inner\n`;
        }
        
        inner(100);
    }
    
    # we can't call "inner()" from here...

If there were another function, "inner(uint u)" defined in the same scope as outer(), the resolver would consider inner(int i) prior to inner(uint u).

Note that it is currently an error to use variables defined in the outer function in the inner function:

    void outer() { int a; int inner() { return a; } } # DOESN'T WORK

Attempting to do this will result in a compile-time error.

[it should be noted that a future version of Crack will support this partially, without assignment, and will also support a limited form of closure]

Operators

Crack supports the complete set of C operators. As in C++, they can be used with non-numeric types through Operator Overloading.

Comparison Operators

Comparison operators compare two values and return a boolean.

==

True if the two values are equal. 1 == 1 is true.

!=

True if the two values are not equal. 1 != 2

>

True if the left value is greater than the right value. 2 > 1

<

True if the left value is less than the right value. 1 < 2

>=

True if the left value is greater than or equal to the right value. 2 >= 2

<=

True if the left value is less than or equal to the right value. 2 <= 2

is

True if the object on the left is identical (not merely equal) to the object on the right. This isn't defined for numbers, only for aggregates and primitive pointer types. It essentially checks for the equivalence of the pointers.

Basic Arithmetic

All integer and floating point types support the basic arithmetic operators.

+

Add two values. 2 + 2 == 4

-

Subtract two values. 4 - 2 == 2

/

Divide one value by another. 6 / 3 == 2

*

Multiply one value by another. 2 * 3 == 6

Unary plus and minus are also supported, so we can say:

    x := 2;
    y := 10 + -x;  # y is 8
    z := 10 - +x;  # z is also 8

Unary plus and minus are special when applied to constants. In this case, they create a new constant, preserving their adaptiveness.

Bitwise Operators

All of the integer types support the following bitwise operations:

&

Bitwise and. 5 & 4 == 4

|

Bitwise or. 5 | 4 == 5

<<

Shift all bits left by the amount specified on the right. 1 << 2 == 4

>>

Shift all bits right by the amount specified on the right. 4 >> 2 == 1. If this is done on a signed operand, this is an arithmetic shift. Arithmetic shifts preserve the sign of the value shifted.

^

Exclusive or. 5 ^ 6 == 3

Augmented Assignment

All of the binary operators but the comparison operators can be used with assignment to do C-like "augmented assignment." In general, the augmented expression x op= y is equivalent to x = x op y.

So specifically, the augmented assignment operators are: +=, -=, /=, *=, &=, |=, >>=, <<=, ^=.
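
A short sketch of a few of these operators applied in sequence (the variable x is just for illustration; the comments show the value after each statement):

    x := 10;
    x += 5;    # same as x = x + 5, so x is 15
    x *= 2;    # x is 30
    x >>= 1;   # x is 15
    x &= 4;    # 15 & 4 is 4
    x |= 1;    # 4 | 1 is 5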

The Ternary Operator

Like the C family of languages, Crack supports the "ternary operator." The ternary operator evaluates to one expression or another based on the results of a boolean expression:

    a = b ? c : d;

If b is true, a will be set to c and d will not be evaluated. If false, a will be set to d and c will not be evaluated.

c and d may be of different types. If they are, the result will be of the same type as c if d is of a type derived from c's type (see Inheritance below) and the same type as d if c is of a type derived from d.

If the types of c and d do not share a common ancestor, the compiler will first attempt to convert d to c's type. That failing, it will attempt to convert c to d's type. If none of this works, it will finally give an error.
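
As a simple sketch in which both results have the same type (the variable names are hypothetical), the ternary operator can be used to pick the larger of two values:

    import crack.io cout;

    int a = 3, b = 7;
    max := (a > b) ? a : b;   # a > b is false, so max is set to b
    cout `max is $max\n`;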

Precedence

Operator precedence is the same as in C. So the rules are (highest precedence first):

() [] . ++ -- (right side operators)
++ -- + - ! ~ (left side operators)
* / % (binary operators)
+ - (binary operators)
<< >> (binary shift operators)
< <= > >= (binary comparison operators)
== != (binary equality check operators)
& (bitwise and)
^ (bitwise exclusive or)
| (bitwise or)
&& (logical and)
|| (logical or)
?: Ternary operator
= += -= *= /= %= &= |= ^= <<= >>= (assignment operators)

Classes

Classes are a feature of object oriented programming languages that combine a set of data variables with a set of special functions called "methods." As a simple example of a class, consider the representation of an x, y graphics coordinate:

    import crack.lang Writer;
    import crack.io cout, Formatter;

    class Coord {
        int x, y;
        
        oper init(int x0, int y0) : x = x0, y = y0 {}
        oper init() {}

        void writeTo(Writer out) {
            Formatter(out) `Coord($x, $y)`;
        }
    }

This class has two "instance variables:" x and y. These get bundled together in a package whenever we create an instance of the class.

The "oper init" syntax creates a constructor, which is a special function that gets called when an instance of the class is created. The constructor performs basic initialization of all of the instance variables. The second "oper init", the one without arguments, is called the "default constructor." As in C++, default constructors get generated automatically if the class has no other defined constructors. If the class does define constructors, and you want a default constructor, you have to specify one explicitly as we've done above.

We can create an instance of Coord like so:

    c := Coord(3, 4);

Alternately, we can use a more C-like syntax:

    Coord c = {3, 4};

Both of these are just different syntactic flavors of the same thing: in both cases we're defining a variable "c" that is a reference to a Coord object that is allocated on the heap. The system initializes this variable by:

Allocating memory large enough to accommodate a Coord object
Calling the appropriate "oper init" function for the construction arguments ("3, 4" in the examples above).
Assigning the address of the newly created Coord object to c.

Note that all variables of class types are references - they behave very much like pointers in C. So if we were to initialize one variable from another, both variables would refer to the same object:

    c := Coord(3, 4);
    d := c;
    c.y = 5; # d.y is now also 5

This is different from the way that the primitive types behave. Primitive types are always passed "by value." So:

    c := 100;
    d := c;
    c = c + 1; # c is now 101, d is still 100

We can tell if two variables are references to the same object using the special is operator:

    c := Coord(1, 2);
    d := c;
    e := Coord(1, 2);
    if (c is d)
        cout `this will always be printed\n`;
    if (c is e)
        cout `this will never be printed\n`;

Note that identity (the property tested by the is operator) in Crack is a different concept from equality (as tested by the == operator). Two objects have the same identity if their underlying references are equal. However, references to two different objects may still be equal if they have the same state (as determined by the cmp() method). In the example above, it might be reasonable to expect that c and e are equal, since they both have values (x = 1, y = 2), although in fact they would not be unless Coord implemented a cmp() method which provided this logic. The cmp() method provided by Object is simply an identity check.

There is a special constant, "null" which allows you to clear these kinds of variables so that they don't reference any object.

    # initialize c to null, then set it conditionally
    Coord c = null;
    if (positive)
        c = Coord(1, 1);
    else
        c = Coord(-1, -1);

You can use the is operator on null values:

    void drawImage(Coord pos, Image img, Coord size) {
        if (size is null)
            copyImage(pos, img);
        else
            stretchImage(pos, img, size);
    }

For classes derived from Object, null values are always treated as false:

    Coord c = null;
    if (!c)
        cout `this will always be printed\n`;

Our Coord class also has a writeTo() method. Implementing writeTo() lets a class control how its instances are written using the back-tick operator. For example:

    cout `$(Coord(10, 20))\n`; # prints "Coord(10, 20)" to standard output.

writeTo() uses the instance variables x and y. One characteristic of methods is that instance variables and other methods can be used without qualification (you don't need a "self" or "this" variable, although this is possible, see below). As another example, we could define a method to give us the square of the distance from the origin as follows:

    int distOrgSquared() {
        return x * x + y * y;
    }

We could then add this information to our writeTo() method:

    void writeTo(Writer out) {
        Formatter(out) `Coord($x, $y) [dist squared = $(distOrgSquared())]`;
    }

Methods also have a special variable called "this". Just as in C++, this refers to the object that the method has been called on. In traditional Object-Oriented parlance, this object is called "the receiver."

We could have rewritten distOrgSquared() as follows:

    int distOrgSquared() {
        return this.x * this.x + this.y * this.y;
    }

The this variable is mainly useful for passing the receiver to other functions.

Classes are Variables

In addition to being compile-time entities, Classes are also variables that can be accessed at runtime. They are of type Class. So, for example, we can do this:

    class Foo {}
    Class foo2 = Foo;
    if (foo2.isSubclass(Object))
        cout `Foo is an Object\n`;

Constructors

We mentioned the "oper init" functions earlier. These are called constructors. In Java and C++, constructors are defined using a function that looks like the class name. In the interests of providing uniform syntax for all special methods, Crack uses the "oper" keyword to introduce overloaded operators and special methods, including the constructors and destructors.

Constructor definitions have some special syntax. The return type can be omitted, and you can provide an initializer list for member variables and base classes.

In the example above, we defined two constructors:

    oper init(int x0, int y0) : x = x0, y = y0 {}
    oper init() {}

In the first case, the initializer list initializes the x and y member variables from the arguments x0 and y0. Note that the initializers are specified using assignment syntax: "x = x0" instead of the construction syntax that C++ would have used: "x(x0)".

The construction syntax can be used, too, but it has a different meaning. Construction syntax means "construct the variable with the given arguments." Assignment syntax means "initialize the variable from the given value."

So, for example, "x(x0)" would be equivalent to "x = int(x0)", which is also perfectly legal. The uses for these two types of syntax become more obvious when we deal with members that are themselves class instances.

For example, let's say that we want to define a line segment:

    class LineSegment {
    
        # two coordinates
        Coord c0, c1;
        
        ## Construct from two coordinates.
        oper init(Coord initC0, Coord initC1) : 
            c0 = initC0,  
            c1 = initC1 {
        }
        
        ## Construct from raw x and y values
        oper init(int x0, int y0, int x1, int y1) :
            c0(x0, y0),
            c1(x1, y1) {
        }
    }

In the first constructor, we're using the assignment syntax because we want to bind the objects passed in (initC0 and initC1) to the c0 and c1 variables. If we had instead used construction syntax:

    oper init(Coord initC0, Coord initC1) : 
        c0(initC0),  
        c1(initC1) {
    }

the compiler would have tried to find a Coord constructor that accepts another Coord object as an argument. Since there is no such constructor, we would have gotten an error. We could have instead done this:

    oper init(Coord initC0, Coord initC1) : 
        c0(initC0.x, initC0.y),  
        c1(initC1.x, initC1.y) {
    }

This would have called the two-argument constructor twice and created two new Coord objects for c0 and c1. There's an important difference between this and the assignment syntax we started with: with the assignment syntax, c0 and c1 become references to the objects that were passed in. If we did this:

    Coord c0 = {10, 10}, c1 = {20, 20};
    ls := LineSegment(c0, c1);
    c0.x = 20;  # ls.c0.x is now also 20.

changing c0.x in this case also changes the value within ls because ls's c0 is the same object as the caller's c0. If we had instead used the construction syntax, ls would have had its own copies of the Coord objects, and changing c0's x value wouldn't have had any effect on ls.

If you don't specify an initializer for one of your instance variables, the constructor will initialize the variable based on whatever initializers you gave it in the instance variable definition. So, for example, if we wanted coordinates to default to "-1, -1" for some reason, we could have done this:

    class Coord { int x = -1, y = -1; }

As with ordinary variables, null or zero is used if no initializers are specified (null for aggregate types, zero for primitive numeric types).

Initializers are not necessarily run in the order that you specify them: they are run in the order of member definition. So in our examples above, if we had specified an initializer list of ": y = y0, x = x0", x still would have been initialized first.

You can define as many constructors as you want as long as their arguments have different types. This is another example of overloading: the compiler can tell the difference between them from their argument types.

The default constructor is the constructor without any arguments. If you don't define any constructors in your class, the compiler will attempt to generate a default constructor for you - it will generate a constructor that initializes the members with their variable initializers, using their default constructors if there were no initializers.

In future versions of Crack, if a class defines no constructors, it will attempt to inherit all of the constructors of the base classes (see Inheritance).

Inheritance

One important property of object-oriented programming languages is inheritance: the ability to create a new class by extending an existing class. Crack supports inheritance with a syntax similar to that of C++. Let's say that we wanted a coordinate like in our last example, only we also wanted it to have a name. We could create a new class for this:

    class NamedCoord {
        int x, y;
        String name;
    }

but then we'd have to write everything that we wanted to reuse over again in the new class. And every time we fixed a bug in Coord, we'd have to fix the same bug in NamedCoord. Inheritance provides a better way to reuse code:

    class NamedCoord : Coord {
        
        String name;
        
        oper init(int x, int y, String name0) : Coord(x, y), name = name0 {}
        
        void writeTo(Writer out) {
            Formatter(out) `NamedCoord($x, $y)`;
        }
    }

In the example above, we're creating a new class called NamedCoord that is derived from Coord. It will inherit all of Coord's instance variables and methods. We call Coord NamedCoord's base class. NamedCoord is a subclass or derived class of Coord.

In addition to allowing reuse of code, inheritance also has the advantage that instances of the derived class can be used in situations that call for an instance of the base class. So if we had a function that accepted a Coord, we could pass it a NamedCoord:

    void drawLine(Coord c0, Coord c1) { ... }
    
    NamedCoord c1 = {1, 2, 'c1'}, c2 = {3, 4, 'c2'};
    drawLine(c1, c2);

Note that this is not conversion: instances of NamedCoord are already instances of Coord. As such, function calls passing classes derived from argument types will match in the first resolution pass.

One of the first things we have to deal with in creating NamedCoord is Coord's constructor. Note that in the new initializer list, we have an entry for the base class as well as for the name variable. If we hadn't specified a base class initializer, the compiler would have used Coord's default constructor, if there was one.

Like member initializers, base classes are initialized in the order in which they are defined. All base class initializers are run before any of the instance variable initializers for the class. Consider the following example:


    import crack.io cout;
    
    class A {
        oper init(String name) { cout `initializing $name\n`; }
        oper init() {}
    }
    
    class B : A {
        A a1, a2;
        
        # the order of initializers is ignored.
        oper init() : a2('a2'), a1('a1'), A('base class') {}
    }
    
    # create a temporary instance of B; the initializers print as they run.
    B();

This will print the following:

    initializing base class
    initializing a1
    initializing a2

Going back to our NamedCoord example, we also defined another writeTo() method:

    void writeTo(Writer out) {
        Formatter(out) `NamedCoord($x, $y)`;
    }

We did this because Coord's writeTo() method writes out "Coord($x, $y)". We want to write "NamedCoord($x, $y)".

Sometimes you want to call the base class version of a function that is overridden in the derived class. Most often this is used to extend the base class functionality. Crack lets you do this by qualifying the method with the class name. For example, we could have instead overridden writeTo() like this:

    void writeTo(Writer out) {
        out.write('Named');
        Coord.writeTo(out);
    }

Multiple Inheritance

Crack supports multiple inheritance: you can have any number of base classes. With the exception of the special VTableBase class, it is an error to inherit from the same base class multiple times, even indirectly. Eventually, Crack will support this using virtual base classes like in C++.

Destructors

In addition to "oper init" constructors, Crack classes can have destructors. These are called by Object.oper release() when an object's reference count drops to zero. They can also be called explicitly by objects implementing their own memory management strategies.

You can implement the destructor for a class by defining an "oper del" method:

    class Noisy {
        oper del() { cout `Noisy object deleted\n`; }
    }
    
    Noisy x;  # Prints a message when x goes out of scope.

After calling the user defined code, oper del automatically calls oper release on all of the instance variables that have an oper release method (see Reference Counting). It then automatically calls the oper del method of each of its base classes. In both cases, these calls are in reverse order of initialization: first the instance variables in the reverse order that they are defined, then the base classes in the reverse order that they are listed.

Because of all of this automatic destruction, most oper del methods don't need to have any user code at all - everything takes care of its own cleanup. If you don't define an oper del method, the compiler will generate one by default.

The only cases where you really need to define an oper del method are in the case of certain external consequences: for example, a File object might want to make sure that its file descriptor is closed upon destruction.
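
As a sketch of that kind of cleanup, the class below closes a raw file descriptor when it is destroyed. FdHolder is a made-up name, and close() is imported from the C library using the mechanism described in Primitive Bindings; the library name is the Linux one used elsewhere in this guide:

    import "libc.so.6" close;
    int close(int fd);

    class FdHolder {
        int fd;

        oper init(int fd0) : fd = fd0 {}

        # user code runs first; the automatic cleanup of members and base
        # classes follows.
        oper del() { close(fd); }
    }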

It should be noted that an object must do nothing to change its own reference count during processing of oper del, such as assigning it to an external variable, or inserting it into an external collection. If you do this, the object will still be deleted and the external reference will be invalid. Future versions of Crack will have some degree of protection against this, but for now - don't do it.

Static and Final Methods

By default, all methods in a class derived from VTableBase are virtual. Since Object is derived from VTableBase, that means that in practice most of the methods that you write are virtual.

This isn't always what you want. For one thing, virtual methods are relatively expensive compared to normal functions. Calling them involves looking up their addresses at runtime from the vtable, which may be overkill if you have no intention of ever overriding them. Furthermore, because the vtable is associated with an instance, a virtual method cannot be called on a null value. Finally, there are situations where you may want a function to be scoped to a class, but it isn't really a method: it doesn't need to access this.

To deal with these cases, Crack defines two special built-in annotations: @static and @final. (See Annotations for a more general description of this feature)

@static and @final can be used in front of a method definition to give it special status. @final indicates that the method should not be virtual. This makes the method more like a normal function: no vtable lookup is required, and the receiver can be null. Using @final prevents the method from ever being overridden. If you try to override it in a derived class, you will get an error.

@static effectively makes the method a normal function scoped to the class. As a result, the function can be called without a receiver.

For example, let's return to our Coord class:

    class Coord {
        
        # ... methods defined above omitted ...
        
        ## Returns true if the coordinate is non-null and not the origin.
        @final bool isNonOrigin() {
            return !(this is null) && (x || y);
        }
        
        ## Returns the length of an x, y position, whether or not it's 
        ## wrapped in a Coord object.
        @static int length(int x, int y) {
            # let's pretend we have an integer square root function
            return squareRoot(x * x + y * y);
        }
    }
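
Here's a sketch of how these two methods might be called. We're assuming that a static method is invoked by qualifying it with the class name, in the same way as the cast() function described later:

    import crack.io cout;

    Coord c = null;

    # fine even though c is null: a final method needs no vtable lookup.
    if (!c.isNonOrigin())
        cout `c is null or at the origin\n`;

    # no receiver is required for a static method.
    len := Coord.length(3, 4);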

The Special Base Classes

There are three "special" base classes in Crack:

Object
VTableBase
FreeBase

The first two are available from any Crack code; FreeBase must be explicitly imported from crack.lang.

Object is the default base class for all other classes. If you don't specify any base classes, your class will implicitly be derived from Object. (That's not entirely true: there is a bootstrapping mode in which classes have no default base class, but that's another story.)

Object supports a general set of functionality that is applicable to most types, including:

Reference counting.
Boolean conversion.
Formatting.
Comparison operators.

VTableBase is the base class for all classes with a vtable, which is the implementation mechanism of virtual functions. It is a special class that is defined by the compiler, and it has no special contents other than a hidden vtable pointer instance variable.

Object is derived from VTableBase, so by default most methods in Object and all of its derived classes are virtual.

FreeBase is a base class that can be used in cases where you don't want to be derived from Object (like when defining a class that mirrors a C structure). FreeBase does not support virtual functions, memory management, or anything you don't put into your derived class. If you're going to use it, you should at minimum figure out how to deal with memory management.

There are situations where you are given an instance of a base class but you suspect or know that it is really an instance of a derived class. Like C++, Crack lets you typecast a base class to a derived class using cast() and unsafeCast().

Typecasting is generally discouraged in object-oriented paradigms. However, there are certain situations where it is necessary, and others where it is just the easiest way to get something done. Consider the case of containers:

    import crack.container Array;
    
    # create an array of coordinates
    coords := Array();
    coords.append(Coord(1, 2));
    coords.append(Coord(3, 4));

We've stored a couple of Coord objects in the array, but we can't use these directly because Array stores an array of objects:

    # gives an error because there is no drawLine(Object, Object) function.
    drawLine(coords[0], coords[1]);

This is the same problem that early versions of Java had - it will be fixed in a later version of Crack through the introduction of generics. But for now we can work around this with a type cast (in Crack 0.3, we also have a weak form of generics implemented using macros, see Generic Containers):

    drawLine(Coord.cast(coords[0]), Coord.cast(coords[1]));

The cast() function is defined for all classes that derive from VTableBase (including all classes derived from Object). If you attempt to cast an object to a type that it is not an instance of, the program will abort with a (fairly useless) class cast error.

For classes not derived from VTableBase, you can use unsafeCast():

    import crack.lang FreeBase;
    class Rogue : FreeBase {}
    
    FreeBase f = Rogue();
    Rogue r = Rogue.unsafeCast(f);

Unlike cast(), unsafeCast() does no checking whatsoever - the programmer is responsible for ensuring that the object is of the type that they are casting it to. If it's not, unsafeCast() will happily deliver a reference to an invalid object.

For classes derived from VTableBase, you can verify prior to doing an unsafeCast() in the same way that cast() does, by looking at the associated class object:

    Foo obj;
    Coord c = null;
    if (obj.class.isSubclass(Coord))
        c = Coord.unsafeCast(obj);

Every object derived from VTableBase has a special class attribute - it's like an instance variable, only you can't assign it. It is implemented using a virtual function. The class attribute returns the object's class (recall that classes are also values that exist at runtime). So we could also do something like this:

    Coord c;
    if (c.class is Coord)
        cout `this will always get printed\n`;

Note that you usually don't want to use the is operator to check the class because it's usually acceptable for the class to be either the same as the class you are checking for or derived from the class you are checking for. Use isSubclass() instead.
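
To illustrate the difference, here's a sketch using the NamedCoord class from the Inheritance section:

    import crack.io cout;

    Coord c = NamedCoord(1, 2, 'home');

    # an identity check on the class fails for derived classes...
    if (c.class is Coord)
        cout `not printed: the class is NamedCoord\n`;

    # ...while isSubclass() accepts them.
    if (c.class.isSubclass(Coord))
        cout `printed: NamedCoord is derived from Coord\n`;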

Interfaces

As an interim solution to accommodate the lack of virtual base classes, Crack 0.3 provides an interface concept similar to that of Java. In fact, the interface implementation makes it suitable for use as a limited form of virtual base class, but we'll mainly focus on the interface use case.

The crack.exp.ann module provides two special annotations: @interface and @implements (for more general information see Annotations).

To use them, define an interface class:

    @import crack.exp.ann interface, implements;
    import crack.exp.error err;
    
    @interface Drawable {
        void draw(Canvas canvas) {
            err.do()
                `Drawable.draw() not implemented in $(this.class.name)\n`;
        }
    }

In the example above, we're defining a Drawable interface that provides a draw() method that generates a runtime error if called.

Now let's create a flavor of our LineSegment class that implements it:

    
    class DrawableLineSegment : LineSegment @implements Drawable {
        void draw(Canvas canvas) {
            canvas.drawLine(c0, c1);
        }
        
        oper init(Coord initC0, Coord initC1) : LineSegment(initC0, initC1) {}
    }

We can now pass DrawableLineSegment to any function that accepts Drawable objects without having to worry about doing anything special to accommodate reference counting. That is to say, we can do something like this:

    void myFunction(Drawable x, Canvas canvas) {
        x.draw(canvas);
    }

Of course, we could do this with normal inheritance, too, but that doesn't solve the problem for classes that we don't control (like those provided by an external module), or for cases where we want an object to implement more than one interface.

This approach has some limitations. The @interface and @implements annotations are really just syntactic sugar for class definitions. @interface generates a class definition from the syntax that follows it and injects three special methods: oper bind(), oper release() and getIFaceNameObject() (where IFaceName is the name of the interface, Drawable in the example above). The getIFaceNameObject() methods have a return type of Object and they exist to provide an instance of Object for the oper bind() and oper release() methods to delegate to. The getIFaceNameObject() methods defined for an interface always return null because interfaces are never derived from Object: they are derived from VTableBase.

@implements does something similar with the implementation class: for every interface class name that follows it (you can have a comma separated list of them) it generates a getIFaceNameObject() method that simply returns "this". Implementation classes must therefore derive from Object, either indirectly or directly, and they must do so explicitly (so "class A @implements IFace" would be illegal, you'd have to use "class A : Object @implements IFace" instead).

These annotations are a very weak form of syntactic sugar. As such:

errors are likely to be non-obvious
there is no verification that all abstract methods are implemented at compile time (if we forgot to implement draw(), we wouldn't know about it until runtime)
the methods that they rely on for obtaining an Object instance are constructed from the class name, and subject to non-obvious naming conflicts with similarly named methods defined by the programmer.

Versions of Crack after 0.3 will most likely continue to provide some form of the @interface and @implements annotations (hopefully a refined form) but will ultimately implement them in terms of virtual inheritance.

Special Methods

Certain methods have special meaning within the language or the standard libraries.

In the descriptions below, (final) designates methods that are declared with the @final annotation in crack.lang.Object. As explained in Static and Final Methods, this indicates that the method is non-virtual - even if the class derives from VTableBase, the method will not be turned into a virtual method. As such, the method cannot be overridden. It may also be invoked using a null value as the receiver.

oper init

A constructor.

oper del

The destructor.

bool toBool()

(final) If this method is defined, instances of the class can be implicitly converted to bool (see Implicit Conversion). Object implements this.

This will be replaced with the more general "oper to type" form in a future version of the language.

bool isTrue()

Returns true if the object is "true" when converted to a boolean. This is a virtual function defined in Object that is called for non-null values by toBool(). It allows derived classes to easily override conversion to bool.

int cmp(Object other)

Compares the object with another object. Returns a value greater than zero if the receiver is greater than other, a value less than zero if it is less than other, and zero if the two objects are equal.

If you implement this, all of the normal comparison operators ("==", "!=", "<", ">", "<=" and ">=") will work for you.

void writeTo(Writer writer)

Write the receiver to writer. This is used to allow the object to write itself in its most natural representation - whatever that means for the object type.

void format( type object)

This method is used by the back-tick operator to format objects of specific types in specific ways. See The Formatter Interface.
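
As a sketch of how cmp() ties in with the comparison operators, a cut-down Coord could order itself by x and then by y. The ordering rule here is an arbitrary choice for illustration, and the cast assumes that other is always a Coord:

    import crack.io cout;

    class Coord {
        int x, y;

        oper init(int x0, int y0) : x = x0, y = y0 {}

        ## Order by x first, then by y.
        int cmp(Object other) {
            o := Coord.cast(other);
            if (x != o.x)
                return x - o.x;
            return y - o.y;
        }
    }

    # Object's "<" is implemented as "cmp(other) < 0", so this uses our cmp().
    if (Coord(1, 2) < Coord(1, 3))
        cout `(1, 2) sorts before (1, 3)\n`;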

Operator Overloading

The oper keyword originated as a short form of the "operator" keyword in C++ which is designed to allow you to define your own implementation of the operators (e.g. "+", "-", ">" ...).

The following operators can be overloaded:

oper +( type other)

Binary plus.

oper -()

Unary negate.

oper -( type other)

Binary minus.

oper *( type other)

Binary multiply.

oper /( type other)

Binary divide.

oper %( type other)

Binary remainder.

oper []( type index)

Array element access.

oper []=( type index, type value)

Array element assignment.

oper --()

Unary pre-decrement (post-decrement, pre-increment and post-increment don't exist yet, not sure why this one does).

oper !()

Unary boolean negate.

oper ~()

Unary bitwise negate.

oper ==( type other)

(final) Binary "equals." Object implements this as "cmp(other) == 0".

oper !=( type other)

(final) Binary "not equals." Object implements this as "cmp(other) != 0".

oper <( type other)

(final) Binary "less than." Object implements this as "cmp(other) < 0".

oper <=( type other)

(final) Binary "less than or equal to." Object implements this as "cmp(other) <= 0".

oper >( type other)

(final) Binary "greater than." Object implements this as "cmp(other) > 0".

oper >=( type other)

(final) Binary "greater than or equal to." Object implements this as "cmp(other) >= 0".

oper |( type other)

Bitwise or.

oper &( type other)

Bitwise and.

oper <<( type other)

Left shift.

oper >>( type other)

Right shift.

oper ^( type other)

Exclusive or.

oper op =( type other)

(all Augmented Assignment operators). If these exist, they will be called when the augmented assignment operators are used. If they are not defined, they can still be used if the type defines the simple operator that it is based on. So for example, if a class defines "oper +", you can use the += operator on an instance without defining it.

As in the last section, final means the method is non-virtual for Object derivatives and cannot be overridden.

The primitive types mostly have intrinsic implementations of the operators.

Certain operators can not be overloaded:

The ternary operator.
The short-circuit and and or operations ("&&" and "||")
The assignment operator ("=").
The define operator (":=").
The "is" operator.
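
As a sketch, here is a cut-down Coord that defines binary plus. We are assuming that the return type is written in front of oper, as it is for ordinary methods:

    class Coord {
        int x, y;

        oper init(int x0, int y0) : x = x0, y = y0 {}

        ## Component-wise sum; returns a new Coord.
        Coord oper +(Coord other) {
            return Coord(x + other.x, y + other.y);
        }
    }

    a := Coord(1, 2);
    b := Coord(3, 4);
    c := a + b;   # calls a's "oper +" with b: c.x is 4, c.y is 6
    a += b;       # no "oper +=" is defined, so this falls back on "oper +"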

Method Resolution in Classes

When resolving an overloaded method, Crack uses the same rules as for normal function resolution: check each method in each namespace in the order defined, then do the same in the parent namespaces. If no result is found, repeat with conversions.

For classes, "parent namespaces" are the base classes. So if we have:

    class Base {
        void func(B b) {}
        void func(A a) {}
    }
    
    class Derived : Base {
        void func(A a) {}
    }

when we try to resolve func(val), the compiler will check:

Derived.func(A a)
Base.func(B b)
Base.func(A a)

This is somewhat problematic because, in the case above, if B is derived from A we probably don't want to override the more specific func(B) when we override the more general func(A), but that's what will happen because Derived.func(A) will match calls to func with B as an argument.

This results in even more weirdness when we deal with Base as an abstract interface:

    Derived().func(B());   # calls Derived.func(A)
    Base base = Derived();
    base.func(B());        # calls Base.func(B)!

For these reasons, method resolution will change in a future version of Crack so that overrides will not be checked as part of the method set in the override's context - they will only be checked in the base class where they were first defined.

Modules

We've been making casual use of the import statement throughout this document. The import statement is used to import symbols from modules. For example, we've used it to import the global variable cout from the crack.io module:

    import crack.io cout;

The general format of the import statement is:

    import module-name name-list;

module-name is a dot-delimited module name. name-list is a comma separated list of functions, variables and classes defined in the module that you wish to import into the current namespace.

Module names correspond directly to directory and file names in the Crack "library path." When resolving a module name, the system:

checks to see if the module has already been loaded. If so, it just uses the existing module information.
loads the parent module (if the parent module is not found, this is not an error).
splits the name up by periods and concatenates all but the last part of the name into a relative directory path. The last part of the name becomes the filename. Example: "foo.bar.baz" -> path = "foo/bar", filename = "baz".
for every directory in the Crack library path, searches for a subdirectory matching the path and the filename with a ".crk" extension.
when it finds the file, compiles it and then executes the module top-level code (everything that's not in a function).

So for example, to load the crack.lang module for the first time we:

First try to load the crack module.
Search the library path for "crack/lang.crk"
Compile and execute the file.

The crack library path is specified with the "-l" option values on the command line. By default, the executor inserts the $PREFIX/lib/crack$VERSION path and the current directory into the beginning of the search path.
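
To make the lookup concrete, here is a sketch of a small module and a script that uses it. The module name foo.bar.baz and the greet() function are made up for this example, and we're assuming that the directory containing "foo" is on the library path:

    # contents of foo/bar/baz.crk
    import crack.io cout;

    # top-level code: executed when the module is first loaded.
    cout `loading foo.bar.baz\n`;

    void greet() { cout `hello from baz\n`; }

and the script that imports it:

    # contents of main.crk
    import foo.bar.baz greet;

    greet();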

Variables defined in the module top-level are not released until program termination. Cleanups are called in the reverse order of definition.

The Formatter Interface

As we've shown, the back-tick operator allows us to do formatted output of static data, variables and expression values:

    int a;
    cout `a = $a, a + 1 = $(a + 1)\n`;

Expressions of this form are called "interpolation expressions," because they interpolate values into format strings. The interpolation expression above is equivalent to the following code:

    if (cout) {
        cout.format('a = ');
        cout.format(a);
        cout.format(', a + 1 = ');
        cout.format(a + 1);
        cout.format('\n');
    }

The cout variable is defined in crack.io as an instance of Formatter. Interpolation expressions are not limited to use with Formatter: they can be used on any object that supports conversion to boolean and has format() methods for all of the values in the expression. For example, we could create our own formatter that could be used in the expression above:

    class SumOfInts {
        int total;
        
        ## ignore static strings.
        void format(StaticString s) {}
        
        ## make integer formatting add the value to the sum.
        void format(int val) { total = total + val; }
    }
    
    SumOfInts sum;
    sum `a = $a, a + 1 = $(a + 1)\n`;
    
    # sum.total is 2a + 1

More often when doing this, you'll want to derive from Formatter and extend its functionality:

    import crack.io Formatter;
    
    ## Formatter that encloses strings in quotes.
    class StrQuoter : Formatter {

        oper init(Writer w) : Formatter(w) {}

        ## implemented so we don't quote StaticString
        void format(StaticString s) { rep.write(s); }
        
        ## Write strings wrapped in quotes.
        void format(String s) {
            rep.write('"');
            rep.write(s);
            rep.write('"');
        }
    }
    
    String s = 'string value';
    
    # wrap standard output's underlying writer with our formatter and use it 
    # to format the value.
    StrQuoter(cout.rep) `value is $s\n`;

Note that we had to reimplement format(StaticString) in the example above. The static content in an interpolation expression is of type StaticString, like all string constants in Crack. If we had not defined format(StaticString), the normal format method would have been used for the "value is " string (this is because of the current resolution order of methods: it will be changed in a future version of the language so that we don't have this problem).

You can create your own Formatter objects given a Writer object. There are already a few specializations of this class in the crack.io module.

StringFormatter allows you to construct a string using a formatter:

    import crack.io StringFormatter;
    
    f := StringFormatter();
    f `some text`;
    s := f.createString();  # s == "some text"

Reference Counting

Reference counting is a simple form of memory management. Every object is assigned a reference count, which is essentially the number of other objects or variables referencing the object. When a new reference is added, the reference count is increased. When a reference is removed, the reference count is decreased. When the reference count drops to zero, the destructor is called and the object's memory is released.

Crack's reference counting mechanism is actually implemented in the language as part of the implementation of Object in the crack.lang module. The compiler uses two special hooks - the "oper bind" and "oper release" methods - to notify an object when a reference is being added (by calling "oper bind") and released (by calling "oper release"). These methods are implicitly non-virtual: they cannot be overridden by a derived class, do not make use of the vtable and therefore they can be safely applied to null objects.

It is possible to implement the bind and release methods in classes derived from FreeBase or VTableBase to implement your own memory management. For example, the Wrapper class in crack.exp.bindings uses its oper release to always free the Wrapper instance when it is released, allowing it to essentially exist in the scope in which it is defined. Note that if you were to pass such an object out of that scope, the results would be undefined.

For efficiency, Crack does not bind and release every time you might expect: for one thing, objects passed as function arguments are not bound and released for the function call - we know that the external caller has a reference to these objects. The called function can simply borrow them.

Crack also has the notion of "productive" and "non-productive" expressions. A productive expression is one that produces a reference. A non-productive expression simply borrows an existing reference. Variable references are always non-productive. Functions returning values are (almost) always productive.

The compiler will call "oper bind" when assigning a non-productive value to a reference, or when returning a non-productive value. It will call "oper release" when a variable goes out of scope or when a productive temporary value is cleaned up. In general, temporaries get cleaned up at the end of the outermost expression. For the "&&" and "||" operators, temporaries get cleaned up for the secondary expression prior to cleanup of outer expressions.

There's one thing you need to be aware of about reference counting: the mechanism is susceptible to the problem of reference cycles - this is when an object directly or indirectly references itself. When this happens, the entire cycle of objects can become unfreeable, potentially resulting in a memory leak. This is because each object retains a reference from the last object in the sequence, so even when all external references are removed, none of the objects will drop to a reference count of zero.

There's currently no good way around this: you just have to be aware that if you create a reference that can introduce a cycle, you'll need to take certain remedial measures to avoid leaking the objects. This is typically accomplished by breaking the cycles at some point, normally during the destruction of some external object that references the cycle without participating in it.
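
Here's a minimal sketch of a two-object cycle and one way of breaking it by hand. The Node class is made up for this example:

    class Node {
        Node next = null;
    }

    a := Node();
    b := Node();
    a.next = b;
    b.next = a;   # a and b now reference each other: a cycle.

    # Without this, neither reference count could reach zero after a and b
    # go out of scope, and both objects would leak.
    a.next = null;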

Primitive Bindings

Crack allows you to directly import and call functions from shared libraries. A special variation of the import statement allows you to import symbols from a shared library:

    # import malloc() and free()
    import "libc.so.6" free, malloc;

After doing this, it is necessary to provide declarations of the functions you've imported:

    byteptr malloc(uint size);
    void free(byteptr val);

You can then use them like any other function:

    mem := malloc(100);
    free(mem);

Many C functions require special arguments like pointers to integers. These can be passed using primitive arrays:


    # import "free()"
    import "libc.so.6" free;
    void free(voidptr mem);

    # import C function "void doSomething(int *inOutVal)"
    import "somelib.so" doSomething;
    void doSomething(array[int] inOutVal);
    
    # call it
    v := array[int](1);
    v[0] = 100;
    doSomething(v);
    
    # clean up our array
    free(v);

The crack.exp.bindings module defines an Opaque class. This can be used for structures returned from C functions that contain no user-serviceable parts. For example:

    import crack.exp.bindings Opaque;
    import "libFoo.so" Foo_Create, Foo_Destroy;
    class Foo : Opaque {}
    
    # declare the imported functions (signatures assumed for this example)
    Foo Foo_Create();
    void Foo_Destroy(Foo obj);
    
    # create a Foo instance, then destroy it.
    foo := Foo_Create();
    Foo_Destroy(foo);

Opaque doesn't attempt to free the object, so it is important that you manage the object correctly yourself.

Sometimes C functions want to accept a function pointer to use as a callback. You can get this effect by defining a function and using a parameter type of voidptr for the callback parameter:

    import "libFoo.so" Foo_SetCallback;
    void Foo_SetCallback(Foo obj, voidptr callback);
    
    void myCallback(Foo obj) { cout `callback called\n`; }
    Foo_SetCallback(Foo_Create(), myCallback);

This won't work for overloaded functions: the compiler won't be able to tell which overload to use.

Crack's current approach to bindings is not without its problems:

This whole business is extremely platform dependent. There's no guarantee that the shared libraries that you're importing will have the same names on other platforms, or that the functions you define will not be implemented as macros. See Extensions for information on a more portable approach.
You can't currently import global variables.

Annotations

Annotations are a meta-programming mechanism: they essentially allow you to extend the Crack compiler in Crack. The annotation system is very much a work in progress: much remains to be done with it. That said, you can still use it to do some very impressive things.

As a simple example, let's say that we want to define a macro that expands to write a marker to standard output. In module myann we could define:

    import crack.compiler CrackContext;

    void writeMarker(CrackContext ctx) {
        # the built-in annotations @FILE and @LINE refer to the current 
        # filename and line number (something like "myann.crk" and 7, in this 
        # case).
        ctx.inject(@FILE.buffer, @LINE, 'cout `marker\\n`;'.buffer);
    }

From another module, we can use the macro by doing a compile-time import of it and invoking the annotation:

    @import myann writeMarker;
    
    # equivalent to 'cout `marker\n`;'
    @writeMarker

In myann, we define the writeMarker() function, which accepts a CrackContext object as its sole argument. CrackContext is an interface to the compiler. We can use it to inject text, read and inject tokens, set callbacks from the compiler, emit errors and warnings and create other annotations.

CrackContext and all of the other objects defined in crack.compiler are fairly primitive: they are defined before the basic types (String and Object) even exist. As such, all of these interfaces must be built up from the primitive types. That's why we pass @FILE.buffer and 'cout `marker\\n`;'.buffer to inject() instead of just @FILE and 'cout `marker\\n`;'. CrackContext.inject() accepts a byteptr argument. The buffer instance variable is a byteptr that points to the raw character data of the string.

The first and third arguments to inject() must also be null terminated. That means that the data must end with a zero byte value (this is the native way that strings are represented in C). All static strings in Crack happen to be null terminated, so we can easily do this with a static string. If we wanted to do the same with a constructed string we'd have to do something to guarantee that the string is null terminated (converting it to a CString would work, but there are often cheaper ways to accomplish this).

Using an annotation requires some special syntax. First, we import it using the @import statement. @import is just like import except that it imports the module at compile time into the annotation namespace. Symbols in the annotation namespace can be invoked with the syntax "@" symbol. symbol must refer to a function accepting a CrackContext object as its argument.

An annotation is invoked at compile time immediately after tokenization of the symbol.

Here's a more complicated example involving reading tokens. In this example, we process a simple variable definition and generate a getter and a setter for it:

    import crack.compiler CrackContext;
    import crack.io StringFormatter;

    void attr(CrackContext ctx) {
        # get the next token, assume it is a type
        type := ctx.getToken();
        if (!type.isIdent()) ctx.error(type, 'Identifier expected'.buffer);
        
        # get the attribute
        var := ctx.getToken();
        if (!var.isIdent()) ctx.error(var, 'Identifier expected'.buffer);
        
        # get a closing semicolon
        tok := ctx.getToken();
        if (!tok.isSemi())
            ctx.error(tok,
                      'semicolon expected after attribute definition'.buffer
                      );
        
        # get the token text - note that like all other strings in this 
        # interface, these are of type byteptr, not String
        typeText := type.getText();
        varText := var.getText();
        
        # format and inject code (note that we add a null character at the end)
        StringFormatter fmt = {};
        fn := @FILE; ln := @LINE; fmt `
            $typeText __$varText;
            
            $typeText get_$varText() { return __$varText; }
            void set_$varText($typeText val) { __$varText = val; }
\0`;
        ctx.inject(fn.buffer, ln, fmt.createString().buffer);
    }

In addition to injecting full strings, you can also store single tokens: see the documentation in the header files in the compiler directory of the source tree for details.

When doing injection, one thing to keep in mind is that single tokens and blocks of injected text will be parsed in reverse order of how they were inserted: so if we inject block A and then B, the parser will read the contents of B and then of A. The tokens within an injected block are not in reverse order (the inject function takes care of this) but the blocks themselves are.
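
As a small sketch of this ordering rule (the annotation name defineTwo is hypothetical; inject() is used in the same way as in the examples above), if we want the parser to see the definition of a before the definition of b, the block for b has to be injected first:

    import crack.compiler CrackContext;

    void defineTwo(CrackContext ctx) {
        # injected first, so it is parsed last
        ctx.inject(@FILE.buffer, @LINE, 'int b = a + 1;'.buffer);

        # injected last, so it is parsed first
        ctx.inject(@FILE.buffer, @LINE, 'int a = 1;'.buffer);
    }

After @defineTwo, the parser should read "int a = 1;" followed by "int b = a + 1;", so the reference to a in the second definition is valid.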

Inspecting Parser State

CrackContext provides two functions for inspecting parser state: these are getScope() and getParseState(). getScope() returns one of SCOPE_MODULE, SCOPE_FUNCTION or SCOPE_CLASS depending on whether the innermost enclosing context is within a module, function or class.

getParseState() currently only returns an indicator as to whether the parser is in the "base state." A return value of STATE_BASE indicates that we are not currently in the middle of parsing a definition or statement. For example, in the following code:

    int x;
    for (x = 0; x < 100; ++x)
        cout `$x\n`;

Annotations found immediately before the int, for and cout words and after the final semicolon would find the parser in the base state. Annotations anywhere else would not be in the base state.

These functions can be used in conjunction with parser callbacks to ensure that an annotation used to modify a definition or statement precedes the right kind of syntactic construct.

Here's an example of an annotation that verifies that it has been used in the base state of a class:

    import crack.compiler CrackContext, SCOPE_CLASS, STATE_BASE;

    void checkBaseState(CrackContext ctx) {
        if (ctx.getScope() != SCOPE_CLASS ||
            ctx.getParseState() != STATE_BASE
            )
            ctx.error('checkBaseState can only be used in a class definition '
                       'outside of a nested method or instance variable '
                       'definition.'.buffer
                      );
    }

Registering Parser Callbacks

The addCallback() and removeCallback() methods allow you to register and deregister parser callbacks. The parser calls these callbacks at certain points in the subsequent code.

Like annotations, callbacks are called with a CrackContext object as their only argument.

The following callbacks are available:

PCB_FUNC_DEF

Called immediately after the opening parenthesis of the parameter list of a function definition.

PCB_FUNC_ENTER

Called immediately after the opening curly brace of a function definition.

PCB_FUNC_LEAVE

Called immediately before the closing curly brace of a function definition.

PCB_CLASS_DEF

Called immediately after the "class" keyword in a class definition or forward declaration.

PCB_CLASS_ENTER

Called immediately after the opening brace of a class definition.

PCB_CLASS_LEAVE

Called immediately prior to the closing brace of a class definition.

PCB_VAR_DEF

Called immediately after the assignment operator, semicolon, or comma following a variable definition.

PCB_EXPR_BEGIN

Called immediately before parsing the outermost expression in a statement or the initialization clause of a for loop.

PCB_CONTROL_STMT

Called immediately after the keyword signifying a control statement (while, for or if).

As stated earlier, we can combine parser state with a callback to ensure that an annotation is placed in a certain location in the code. The following code creates a funcMod annotation that can only be used prior to a function definition:


    @GenericObjArray(CallbackArray, Callback);
    CallbackArray callbacks;

    void deregisterCallbacks(CrackContext ctx) {
        for (cb :in callbacks)
            ctx.removeCallback(cb);
    }

    void good(CrackContext ctx) {
        deregisterCallbacks(ctx);
    }
    
    void bad(CrackContext ctx) {
        deregisterCallbacks(ctx);
        ctx.error('@funcMod can only be used before a function '
                   'definition'.buffer
                  );
    }
    
    void funcMod(CrackContext ctx) {
        if (ctx.getScope() != SCOPE_CLASS ||
            ctx.getParseState() != STATE_BASE
            )
            ctx.error("@funcMod annotation can not be used here (it must "
                       "precede a function definition in a class body)".buffer
                      );
    
        callbacks.append(ctx.addCallback(PCB_FUNC_DEF, good));
        callbacks.append(ctx.addCallback(PCB_CLASS_DEF, bad));
        callbacks.append(ctx.addCallback(PCB_EXPR_BEGIN, bad));
        callbacks.append(ctx.addCallback(PCB_CONTROL_STMT, bad));
    }

Function Flags

The parser contains a function flag register that can be used by an annotation to set flags on function definitions that follow it. For example, we could implement the @static built-in annotation as follows:

    import crack.compiler CrackContext, FUNCFLAG_STATIC;
    
    void static(CrackContext ctx) {
        ctx.setNextFuncFlags(FUNCFLAG_STATIC);
    }

The only other function flag supported at this time is FUNCFLAG_FINAL.
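
By analogy, a minimal sketch of an annotation that marks the next function as final might look like this. The name myFinal is hypothetical, and we are assuming that FUNCFLAG_FINAL can be imported from crack.compiler in the same way as FUNCFLAG_STATIC:

    import crack.compiler CrackContext, FUNCFLAG_FINAL;

    # hypothetical stand-in for the built-in @final annotation
    void myFinal(CrackContext ctx) {
        ctx.setNextFuncFlags(FUNCFLAG_FINAL);
    }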

Error Contexts

Sometimes it is useful to cause the parser to print additional information with an error message. For example, if you get a syntax error in a macro, you would ideally like to see the location within the macro that caused the error as well as the location where the macro was invoked.

You can do this with error contexts. Error contexts are string messages that are printed on the lines following the original error message. There can be as many lines of error context as you want; they are stored as a stack, so the most recently pushed context is printed first.

Error contexts are pushed using CrackContext.pushErrorContext(). They can be removed from the stack either by calling CrackContext.popErrorContext() or by putting back a special token of type TOK_POPERRCTX. The latter causes the error context to be popped the next time the token is read, which is useful if your annotation injects a sequence of tokens and then returns - the error context will remain active until after your sequence of tokens is parsed.

As an example, let's say that we have registered beginClass() and endClass() as the PCB_CLASS_ENTER and PCB_CLASS_LEAVE callbacks. We could add the message "in a class!" to the error message with the following code:

    void beginClass(CrackContext ctx) {
        ctx.pushErrorContext('in a class!'.buffer);
    }
    
    void endClass(CrackContext ctx) {
        ctx.popErrorContext();
    }

An annotation can register other annotations. For example, if we wanted to create an annotation to define our beginClass() and endClass() annotations, we could do this:

    void registerAnnotations(CrackContext ctx) {
        ctx.storeAnnotation('beginClass'.buffer, beginClass);
        ctx.storeAnnotation('endClass'.buffer, endClass);
    }

Like all annotations, these will exist throughout the scope of the module.

When registering an annotation, it is often useful to associate the annotation with user data. For example, the following is a very weak macro facility: @record binds the next token to an identifier, and when the identifier is later used as an annotation it expands to that token:

    import crack.compiler CrackContext, Token;

    void _playback(CrackContext ctx) {
        tok := Token.unsafeCast(ctx.getUserData());
        ctx.putBack(tok);
    }
    
    void record(CrackContext ctx) {
        # get the identifier (error checking omitted)
        ident := ctx.getToken();
        
        # get the token to attach to it
        val := ctx.getToken();

        # register the identifier as an annotation, store 'val' as user data.        
        ctx.storeAnnotation(ident.getText(), _playback, val);
        
        # VERY IMPORTANT: need to bind a reference to the user data so it 
        # doesn't get deallocated
        val.oper bind();
    }

We could use it like this:

    @record var 'some string';
    String x = @var;

Note that we perform oper bind() on the value token after storing it with the annotation. This prevents the token from being garbage-collected when record() terminates. User data is a voidptr, not an object, and therefore it can not be managed. We also have to use an unsafeCast() to restore it to a Token.

We can access existing annotations with CrackContext.getAnnotation(). Annotation objects contain the following access methods:

    voidptr getUserData();
    byteptr getName();
    voidptr getFunc();

We've demonstrated obtaining and injecting tokens with the getToken() and putBack() methods. We can also create our own tokens:

    import crack.compiler CrackContext, Token, TOK_STRING;

    # annotation to convert the next token to a string
    void stringify(CrackContext ctx) {
        tok := ctx.getToken();
        ctx.putBack(Token(TOK_STRING, tok.getText(), tok.getLocation()));
    }

Usage would be as follows:

    String x = @stringify test;

Details of the annotation interface can be found in the header files in the crack/compiler directory installed in your include tree.

Built-in Annotations

Crack defines the following special built-in annotations:

@final

Marks a class method as final.

@static

Marks a class method as static. (See Static and Final Methods above)

@FILE

Expands to a string containing the current filename.

@LINE

Expands to an integer containing the current line number.

The Standard Annotations

The crack.exp.ann module defines a set of standard annotations. Two of them are @interface and @implements, which are documented above in Interfaces. The others allow you to define macros. Crack macros are similar to C preprocessor macros with some improvements.

To define a macro for use within a module, simply use the @define annotation:

    @import crack.exp.ann define;
    
    @define max(Type) {
        Type max(Type a, Type b) {
            return a > b ? a : b;
        }
    }
    
    # expand the macro to define max() for int
    @max(int)
    
    int a = 3, b = 5;
    x := max(a, b);   # x becomes 5

When defining a macro, the identifier after the @define annotation is the macro name. The argument list follows, then the body of the macro enclosed in curly braces.

Defining a macro creates a new annotation with the macro name. Macro annotations parse an argument list that follows them and then expand the macro with the arguments.

When a macro is expanded, all of the tokens within the curly braces are expanded into the location of the macro. All macro arguments are expanded to the values passed in for those arguments.

Macros also set error contexts for the span of their expansion - if you get a syntax error in the body of the macro, the error message will report the location of the error within the definition of the macro, followed by the location of the annotation that expanded the macro.

As in C macros, you can do argument concatenation and stringification. A single dollar sign in a macro stringifies the following argument; a double dollar sign concatenates the argument with the adjacent tokens.

For example, if we wanted to define a macro to produce command-line flags, we might do something like this:

    @define flag(Type, name) {
        Type flag_$$name = Type$$Flag($name);
        flags.addFlag(flag_$$name);
    }
    
    @flag(int, port);

This would generate the following code:

    int flag_port = intFlag('port');
    flags.addFlag(flag_port);

By default, macros are limited to the modules that they are defined in. To make a macro available to other modules, you have to export it:

    @import crack.exp.ann define, export, exporter;
    
    # Indicates that the module will be exporting macros.
    # What this actually does is to import various utility functions from 
    # crack.exp.ann needed to do an export.
    @exporter
    
    @define myMacro(a, b) { ... }
    @export myMacro

We can then import the macro as an annotation into another module:

    @import mymodule myMacro;
    
    @myMacro(1, 2)

Extensions

In addition to the approach for calling C functions identified in Primitive Bindings, Crack supports a formal extension concept allowing you to load specially crafted shared libraries as if they were Crack modules.

When importing a module, if the executor is unable to find a native Crack module in the search path, and if there is a shared library with that name on the search path, and if the shared library defines an initialization function corresponding to the module name, the shared library is loaded as an extension module, and the initialization function is called at load time.

Initialization functions must have a name corresponding to the canonical name of the module. The convention used is to join all of the elements of the canonical name with underscores and then append "_init" to the result. So for the extension module foo.bar.baz, the initialization function would be named foo_bar_baz_init.

Initialization functions should be C++ functions declared as 'extern "C"'. They accept a parameter of type crack::ext::Module *. The role of the initialization function is to populate the module object with the module's interface. To describe it another way, the initialization function configures everything that is necessary to import the module at compile time.

For example, let's say that we want to provide a binding for a C function called myfunc():

    void myfunc(char *data) { printf("my data: %s", data); }

Our init function (again, assuming the module is foo.bar.baz) would be as follows:

    #include <crack/ext/Module.h>
    #include <crack/ext/Type.h>
    #include <crack/ext/Func.h>

    extern "C" void foo_bar_baz_init(crack::ext::Module *mod) {
        // define a function returning void
        crack::ext::Func *func =
            mod->addFunc(mod->getVoidType(), "myfunc", (void *)myfunc);
        // add the "data" argument, parameters are the type and the
        // parameter name
        func->addArg(mod->getByteptrType(), "data");
    }

The details of the extension interface are documented in the header files in the crack/ext directory installed with your distribution. To compile your extension, you'll want to:

Build it into a shared library, linked with whatever support code it needs (usually the code it's wrapping)
Add the include directory for crack headers to the include path (usually something like /usr/local/include/crack-0.3)
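
Once the shared library has been built and placed on the module search path under a name matching foo.bar.baz, a script should be able to use it like an ordinary module. The following is only a sketch under that assumption:

    import foo.bar.baz myfunc;

    # constant strings are null terminated, so their buffer can be passed
    # directly to a function expecting a char *
    myfunc('hello from crack\n'.buffer);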

The Extension Generator

Creating extensions by hand is way too much work. In the interests of simplifying it we have the crack.exp.extgen module. This defines a @generateExtension annotation that hijacks the compiler to generate a source file defining the extension. Its usage is best illustrated by example:

    @import crack.exp.extgen generateExtension;
    
    # generate an extension that will be imported as 'foo.bar'
    @generateExtension foo.bar {
    
        # name of the source file to generate.
        @filename "foobar.cc"
        
        # code to inject into the extension source file
        @inject '#include "FooBar.h"\n';
        
        # function accepting a char *
        void puts(byteptr string);
        
        # function accepting a float and an integer pointer
        int hairy(float val, array[int] result);

        # opaque type and a function that uses it.        
        class MyType;
        void myFunc(MyType val);
        
        # constants (must be integer or float types)
        const int
            # with no value it will be set to the value of the C constant
            SAME_NAME_AS_C_CONSTANT,
            
            # when defined from an identifier, the identifier is the C 
            # constant
            SIMPLE_NAME = LESS_AESTHETIC_NAME_OR_VALUE,
            
            # when assigned from an integer, the value is the integer
            ONE_HUNDRED = 100,
        
            # a string value can be used to indicate a complex expression
            FIFTY = 'ONE_HUNDRED >> 1';
    }

Threading

As it stands, Crack 0.3 is written with little regard for threads. You can attempt to use the normal threading libraries, if you like, but you're likely to run into some problems. In particular, you should be aware that the reference counting mechanism is not thread-safe, so memory management will most likely fail in really hard to debug ways if you share lots of objects between threads.

This will be remedied in a future version through the introduction of atomic operations, which will allow the reference counting mechanism to be implemented safely - with some cost to performance of threaded applications.

Debugging

Crack has only minimal support for debugging in 0.3. If your program seg-faults or aborts, you can at least get a sparse stack-trace by running it under a fairly recent version of GDB (7.0 or later).

Appendix 1 - Libraries

Crack comes with its own (sparse but growing) set of standard libraries. These are loosely organized as the modules under crack and crack.exp. crack.exp includes modules that are designated as "experimental." These have interfaces that are likely to change as Crack matures. The other libraries are also changeable, but should be frozen by version 1.0.

Special Language Types

Strings and Buffer Types

Strings are derived from Buffer. A buffer is just a byteptr (stored in the instance variable buffer) and a size (stored in the instance variable size).

The memory that a buffer references is presumed to be read-only, but nothing in the language enforces this. For cases where a writable buffer is desired, use a class derived from WriteBuffer. WriteBuffer inherits from Buffer, but the read-only requirement is relaxed: it is legal to modify the buffer of a WriteBuffer.

Most output functions accept a Buffer; most input functions accept a WriteBuffer. For cases where you want a WriteBuffer that cleans up after itself, use a ManagedBuffer.

Constant strings are actually of type StaticString. This is a class for strings with buffers that are not to be deleted (because static strings can reference data in read-only memory segments). Constant strings are always guaranteed to be null terminated, making them suitable for use with low-level C functions defined in extensions and primitive bindings. This is not true of strings or buffers in general; see CString for a more general solution.

Like any object, a buffer can be converted to a bool. A buffer converts to true if it has a non-zero size.

Any two buffers can be compared - a byte-for-byte comparison is performed. If one buffer is identical to the beginning of the other, but is shorter, the shorter buffer is "less than" the larger buffer.
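
A short sketch of these two rules, using constant strings (which are buffers). It assumes that the usual comparison operators are available on strings, as the == comparisons elsewhere in this guide suggest:

    import crack.io cout;

    String empty = '';
    if (empty)
        cout `not reached: an empty string has a size of zero\n`;
    else
        cout `an empty string converts to false\n`;

    # 'abc' is identical to the beginning of 'abcd' but is shorter, so it 
    # should compare as less than 'abcd'
    if ('abc' < 'abcd')
        cout `abc sorts before abcd\n`;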

The String class is one of the few types that don't need to be imported. Buffer and its other descendants can be imported from crack.lang.

CString

When dealing directly with C functions expecting null terminated strings, we often need to convert a Crack string to a null-terminated value. CString is one way to do so. Given an arbitrary buffer type, CString will create a copy of the buffer and ensure that there is a null terminator at buffer[size] (note that the size of a CString excludes the byte for the null terminator).

This allows us to safely do something like this:

    import runtime strcpy;
    import crack.lang CString;
    
    String userInput = getUserInput();
    data := byteptr(userInput.size + 1);
    
    # data is already a byteptr, so it is passed directly
    strcpy(data, CString(userInput).buffer);

ManagedBuffer

A managed buffer is a WriteBuffer that manages its underlying buffer memory. These are designed for use with IO operations. The typical use case is to read into it and then convert it to a string, either by copying the underlying buffer or by passing ownership of it to the new string:

    # create a 1K managed buffer.
    ManagedBuffer buf = {1024};
    
    # read it, store the size of what we read
    buf.size = cin.read(buf);
    
    # convert it to a string and take ownership of the buffer (the second 
    # argument of the constructor is "takeOwnership")
    String s = {buf, true};

Note that Reader includes a utility method that does this more intelligently (it only transfers ownership if the amount read is at least 75% of the buffer size):

    # 's' is a string.
    s := cin.read(1024);

SubString

The SubString class provides a lightweight string that references the buffer of an existing string. You can create one by providing the constructor with an existing string, a start position and a size:

    import crack.lang SubString;
    
    s := SubString('this is a test', 10, 4);

SubString is derived from String, so you can use it in the same way that you would a string without the cost of managing and copying a portion of the buffer.

MixIn

The MixIn class allows you to do multiple inheritance safely - without risk of breaking reference counting. The MixIn approach is a stand-in for virtual inheritance, which Crack 0.3 does not yet support.

MixIn defines bind and release operators and a virtual function called getObject() that knows how to get the Object instance that the class is derived from. This allows bind and release to operate on a reference to the MixIn (which is not derived from Object) by converting it to an Object with a getObject() method implemented by the concrete class.

For example, we could use MixIn to define a pure interface like this:

    import crack.lang MixIn;

    class MyIface : MixIn {
        void doSomething() { die('MyIface.doSomething() not implemented.'); }
    }
    
    class MyConcrete : Object, MyIface {
        
        # we have to define getObject() to satisfy a MixIn.
        Object getObject() { return this; }
        
        # implement the interface.
        void doSomething() { cout `something happened\n`; }
    }

Unfortunately, because a class can only exist once in another class's inheritance tree (except for VTableBase), you can only do this trick once in a single class's ancestor list. For example, if Iface1 derives from MixIn, and Iface2 also derives from MixIn, it is illegal to define a MyConcrete derived from both Iface1 and Iface2 because they share the MixIn ancestor.

Because of this single-inheritance restriction, the MixIn strategy is really more useful as a pattern than as a base class. To make a class a MixIn, do something like this:

    class MyIface : VTableBase {
        
        Object getMyIfaceObject() {
            _die('MyIface.getMyIfaceObject() not implemented.');
            return null;
        }
    
        oper bind() { if (!(this is null)) getMyIfaceObject().oper bind(); }
        oper release() {
            if (!(this is null)) getMyIfaceObject().oper release();
        }
        
        void doSomething() { die('MyIface.doSomething() not implemented.'); }
    }
    
    class MyConcrete : Object, MyIface {
        
        Object getMyIfaceObject() { return this; }
        
        void doSomething() { cout `something was done\n`; }
    }

This trick will allow you to define interfaces and other types of mix-ins that can be safely inherited from in arbitrary ways.

Note that all of this trickery is temporary: later versions of Crack will support virtual base classes in a form that will make this functionality a part of the language.

Containers

Containers provide support for the basic sequential and mapping data structures expected from a modern programming language. Crack's container library is crack.container. All container classes are derived from Container.

Crack containers all store objects of type Object. You can store derived types, but in order to do anything useful with them after retrieval you pretty much need to cast them to the type that you expect them to be.

For example, if we have an array of objects of type Foo with method bar(), we need to do something like this to call bar on the zeroth element:

    Foo.cast(arr[0]).bar();

Java-like generics will be supported in a later version of the language, and you'll be able to have an Array[Foo].

All containers have a count() method that returns the number of elements in the container. They also convert to boolean based on whether or not they are empty:

    c := createContainer();
    if (c)
        cout `container has $(c.count()) elements\n`;
    else
        cout `container is empty\n`;

Iterators

Iterators are like iterators in Java. All container types have an iter() method which returns an Iterator object for the container.

Iterators allow you to iterate over the set of objects in a container one at a time. As an example, consider an array:

    Array arr = createArray();
    
    i := arr.iter();
    while (i.nx())
        cout `got $(i.elem())\n`;

arr.iter() creates an iterator over the array initialized at the first item. The nx() method forwards the iterator to the next element except when it is called for the first time. It returns true if the iterator is valid after being forwarded. elem() returns the element associated with the iterator.

The nx() method is a stop-gap solution - it exists because Crack doesn't have a for statement yet. nx() lets us do iteration in a while loop. Don't expect it to stick around for very long.

You can also use the more permanent next() method - this forwards the iterator regardless of whether it is pointing to the first element or not.
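
For example, a loop using next() might look like the following sketch. It assumes that an iterator converts to false once it has moved past the last element, as the bi-directional iterator example under Doubly-Linked Lists does:

    i := arr.iter();
    while (i) {
        cout `got $(i.elem())\n`;
        
        # always forwards the iterator, even on the first pass
        i.next();
    }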

Arrays

Arrays are a collection that preserves element order with fast random-access assignment and retrieval. Unlike their low-level counterparts, they are safe for general use. They manage both the size of the underlying array and the reference counts of the elements.

Construct an array with the expected number of elements:

    import crack.container Array;
    Array arr = {10};  # 10 element array

The number of elements specified is the array capacity - this is the number of elements that can be added before it is necessary for the array to reallocate its underlying, low-level array.

Like other containers, arrays have a count() method that returns the actual number of elements in the array. At the time of construction, the count is zero.

To add an element to an array, use the append() method:

    arr.append('first element');
    arr.append('second element');
    cout `count = $(arr.count())\n`;  # writes "count = 2"

Note that append() will occasionally be an O(n) operation: if we don't have capacity for a new element, append() will reallocate the array to one twice as big.

To get an element in an array, use the bracket operator:

    cout `elems: $(arr[0]), $(arr[1])\n`;

You can also replace an existing element in an array using element assignment:

    arr[1] = 'new second element';

Note that this only works to replace an element: arr[2] = 'something' would result in a runtime error.

Finally, like all containers, you can iterate over the elements of an array:

    i := arr.iter();
    while (i.nx())
        cout `elem: $(i.elem())\n`;

Linked Lists

The List class implements a linked list. Linked lists aren't very good for random access, but they do support constant time insert and append, making them preferable to arrays in certain cases.

The List interface looks very much like the Array interface:

    import crack.container List;
    
    List l;
    
    # add some elements
    l.append('first');
    l.append('second');
    
    # iterate over them
    i := l.iter();
    while (i.nx())
        cout `got element $(i.elem())\n`;

Random access lookup, delete and insert are all available, although they are all O(n) operations:

    cout `$(l[1])\n`;       # prints "second"
    l.delete(0);            # deletes 'first'
    l.insert(0, 'first');   # reinserts 'first'

You can also push a new element onto the front of the list (equivalent to insert(0, ...) but terser and slightly faster):

    l.pushHead('zero');

pushTail() is also available as a synonym for append().

Likewise, you can pop items off the head of the list:

    elem := l.popHead();  # after the last operation, elem == 'zero'

popTail() can not be implemented efficiently: if you need this, use a DList.

Doubly-Linked Lists

The DList class implements a doubly-linked list. DList provides some additional functionality over List and can facilitate faster operations, but at the cost of higher memory usage (each DList element requires an additional pointer).

DList supports inserting and deleting the element referenced by an iterator:

    DList dl;
    fillList(dl);

    i := dl.iter();
    while (i.nx()) {
    
        # delete the 'delete me' string from the list
        if (i.elem() == 'delete me')
            dl.delete(i);
        
        # insert 'inserted' before 'before me'
        else if (i.elem() == 'before me')
            dl.insert(i, 'inserted');
    }

After insert, the iterator points to the new element. After delete, it points to the element after the deleted element.

DList also supports a bi-directional iterator:

    # get a bi-directional iterator - an argument of "true" causes 
    # the iterator to be initialized to the end of the list.  "false" would 
    # start it at the beginning.
    i := dl.bidiIter(true);
    while (i) {
        cout `$(i.elem())\n`;
        i.last();
    }

You can use a bi-directional iterator everywhere you can use a normal iterator.

In addition to the popHead() function, DList supports a popTail() function:

    DList dl = {};
    dl.append('first');
    dl.append('second');
    elem := dl.popTail();  # elem == 'second'

RBTree Maps

A "map" or an "associative array" is a container that stores a set of values each indexed by a key. Both keys and values can be arbitrary objects. The only kind of map that Crack 0.3 supports is red-black trees. Red-black trees provide a natural sort order, and provide O(log n) insertion and lookup with constant time rebalancing.

To create an RBTree map:

    import crack.container RBTree;
    
    map := RBTree();

To add elements:

    map['first'] = '1';
    map['second'] = '2';
    map['third'] = '3';

To access them:

    cout `$(map['first'])\n`;  # writes "1"

Iteration is the same as for arrays, except that the elements are KeyVal objects. As their name suggests, KeyVal objects have key and val attributes:

    i := map.iter();
    while (i.nx()) {
        kv := KeyVal.cast(i.elem());
        cout `map[$(kv.key)] = $(kv.val)\n`;
    }

Algorithms (Sort Functions)

The crack.exp.algorithm module includes two sorting algorithms for arrays. For example:

    import crack.container Array;
    import crack.exp.algorithm quickSort, insertionSort;
    Array arr = getAnArray();
    
    # try out both kinds of sort
    quickSort(arr);
    insertionSort(arr);

Sorting relies on the comparison functions of Object, which in turn rely on the overridable cmp() method.
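
To sort objects of your own class, you would override cmp() so that it returns a negative, zero or positive value. The sketch below is illustrative only: the Task class is hypothetical, and the signature int cmp(Object other) is an assumption about Object's interface rather than something stated in this section:

```crack
import crack.container Array;
import crack.exp.algorithm quickSort;

# hypothetical class that sorts by its 'priority' field
class Task {
    int priority;
    oper init(int priority0) : priority = priority0 {}

    # assumed signature: negative if this < other, zero if equal,
    # positive if this > other
    int cmp(Object other) {
        # this sketch assumes 'other' is always a Task
        o := Task.cast(other);
        return priority - o.priority;
    }
}

# getAnArray() stands in for code that returns an Array of Task objects
Array tasks = getAnArray();
quickSort(tasks);
```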

Generic Containers

"Generics" is a language feature that allows you to create classes that are parameterized by other classes. It is particularly useful for containers. For example, rather than just an "array", we typically want to be able to create an "array of strings." Generic containers have the following advantages over standard containers:

Type safety. An "array of strings" will produce an error at compile time if you attempt to put something that's not a string in it.
No need for typecasting.
Improved performance. A lot of things that standard containers need to do at runtime, through virtual functions, can instead be done at compile time.
Support for primitive types. All of the standard containers require you to wrap primitive values in aggregate types derived from Object.

Crack 0.3 does not yet support generics. However, it does provide a weak substitute for generic containers in the form of container macros (see The Standard Annotations for complete info on macros). All of the standard container types (Array, List, DList and TreeMap) have generic implementations.

For example, if we wanted to build a TreeMap allowing us to look up strings based on integer keys, we could do this:

    import crack.lang die, cmp;
    import crack.io cout, Formatter;
    @import crack.exp.cont.treemap GenericTreeMap;
    
    int cmp(int a, int b) { return a - b; }
    @GenericTreeMap(IntStringMap, int, String);
    
    IntStringMap vals = {};
    vals[0] = 'test';
    for (item :in vals)
        cout `$(item.key): $(item.val)\n`;

@GenericTreeMap is a macro that expands to a class definition for (in this case) the IntStringMap class, which has keys of type int and values of type String. Note that we need to import some other things to be able to use it (die, cmp, cout and Formatter). The requirements of each of the generic container types are identified in the comments at the beginning of their source files.

Arrays are a little different in that there are primitive and object based varieties. Here's an example of the creation of an array of strings:

    # imports required for generic arrays
    import crack.lang die, free;
    @import crack.exp.cont.array GenericArray, GenericObjArray;
    
    import crack.io cout;
    
    @GenericObjArray(StringArray, String);
    
    StringArray strings = {};
    strings.append('first');
    strings.append('second');
    for (s :in strings)
        cout `$s\n`;

Note that we have to import both GenericArray and GenericObjArray. GenericObjArray is actually implemented in terms of GenericArray.

If we wanted to create an array of primitives (for example, of floating point values), we would use @GenericPrimArray:

    import crack.lang die, free;
    @import crack.exp.cont.array GenericArray, GenericPrimArray;

    @GenericPrimArray(FloatArray, float);

GenericList and GenericDList are also provided in the crack.exp.cont.list module.

Macro-based generic containers are not as good as real generic containers:

They require explicit import of support objects.
They require explicit instantiation. (With generic types, we could just use Array[String], for example, instead of having to declare it somewhere).
If the same container type is used in more than one module, the same code will be generated in multiple places, resulting in an unnecessarily large runtime footprint.

Support for true generics is planned for Crack 0.4.

Error Handling

Future versions of Crack will have real C++/Java-like exception handling. But for version 0.3, we at least have the error module. The error module lets you record an error in a manner that can be controlled by the callers. The default action is to abort.

Here's an example of how we might use it to verify a parameter:

    import crack.exp.error err;
    
    Object checkForNull(String context, Object val) {
        if (val is null)
            err.do() `$context got a null argument`;
        return val;
    }

err.do() returns a formatter that writes to a writer stored in the err object. When this formatter is destroyed, it will call err.finish() which calls abort(), terminating the program, if err.fatal is true. If err.fatal is false, err.gotError is set to true and the program continues.

To prevent a function from aborting the program when an error occurs, we can use the ErrorCatcher class:

    import crack.io cout;
    import crack.exp.error err, ErrorCatcher;
    
    void myFunc(Object arg) {
        catcher := ErrorCatcher();
        checkForNull("myFunc", arg);
        if (catcher.gotError())
            cout `We got a null and don't want to die: $(catcher.getText())\n`;
    }

When you create an ErrorCatcher, it sets err.fatal to false and sets err.writer to an internal StringWriter, causing the error handler to have no visible effect. The caller can then check the catcher to see if there was an error, and obtain the error text from the catcher's getText() method. When the catcher is destroyed, it restores the error handler to its previous state.

It is not advisable to use this mechanism for errors that can conceivably indicate something other than a programming error. For example, it would not be advisable for a file interface to call a fatal error for a file not found. Better to use a lightweight return code and let the caller decide whether this represents a programming error.
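
As a minimal sketch of the return-code approach, the hypothetical function below (findArg is not part of any Crack module) reports "not found" through its return value instead of calling err.do(), leaving the caller to decide whether that is an error:

```crack
import crack.io cout;
import crack.sys argv;

# hypothetical helper: returns the index of 'flag' in argv, or -1 if
# it is absent.  Absence is an expected outcome, not a programming error.
int findArg(String flag) {
    uint i = 1;
    while (i < argv.count()) {
        if (argv[i] == flag)
            return int(i);
        i++;
    }
    return -1;
}

# the caller decides what a missing flag means
if (findArg('--verbose') < 0)
    cout `running quietly\n`;
```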

As a convenience, the crack.exp.error module also provides a strerror() function. strerror() returns the error text of the last error that occurred in the C library.

Program Control (sys)

crack.sys contains symbols for interacting with the executable context. It defines the exit() function and argv variable.

The exit() function can be used to terminate the program at any point with the given exit code:

    import crack.sys exit;

    # terminate with the normal code.
    exit(0);

The argv variable is a StringArray instance that allows you to access the command line arguments:

    import crack.sys argv;
    import crack.io cout;

    # write out all of the args except the program name (argv[0])
    uint i = 1;
    while (i < argv.count()) {
        cout `arg $i = $(argv[i++])\n`;
    }

StringArray is a special purpose class just for storing arg lists without having to do annoying casting. You can use it as a general purpose string sequence if you like, but it will be supplanted by Array[String] once Crack supports generics (post 0.3).

Input/Output

The crack.io module contains crack's basic input/output interfaces. The Reader and Writer classes are the core interfaces for reading and writing data. These make use of the Buffer hierarchy and are implemented using the mix-in pattern so that they can be combined with each other and other classes.

The crack.io module defines three global variables: cin, cout, and cerr. These are wrappers around the system standard input, output and error streams. cin is a kind of Reader (an FDReader, to be precise) and cout and cerr are both Formatters. Formatter, in turn, implements Writer, so we can use these streams to illustrate the Reader and Writer interfaces.

Writer has one salient method called write(). This method writes a buffer to the underlying output object:

    cout.write('some data');

Since String is derived from Buffer, we can use the write() method to write a constant string. Classes implementing Writer must implement the write() method.

Reader provides two methods: an efficient method that reads into a WriteBuffer and a heavier form that reads a buffer of specified size and returns a string.

    ManagedBuffer buf = {1024};  # ManagedBuffer is derived from WriteBuffer
    bytesRead := cin.read(buf);  # read up to 1024 bytes into 'buf'

read(WriteBuffer) doesn't modify the size of its input buffer. It does return the number of bytes read. It is the responsibility of the caller to change the size of the buffer if desired.

We could have done something similar with the heavy-weight form:

    String data = cin.read(1024);

This is more expensive than read(WriteBuffer) because it actually allocates a buffer and a String object.

We could implement the most basic functionality of the cat command like this:

    ManagedBuffer data = {4096};
    while (data.size = cin.read(data)) {
        cout.write(data);
        data.size = 4096;
    }

cin, cout and cerr are implemented using FDReader and FDWriter. These classes implement Reader and Writer to read from and write to a file descriptor. You can obtain a file descriptor from any of the low-level IO functions on your system ("open()", for example).

The crack.io module also provides the Formatter class. See The Formatter Interface for details on how to use this.

Line Readers

The crack.readers module provides a LineReader class that lets you read from a reader a line at a time:

    import crack.readers LineReader;
    
    reader := LineReader(cin);
    String line = null;
    while (line = reader.next()) {
        cout `got line: $line\n`;
    }

The next() method returns null when there is no more data to read.

Files and Directories

crack.exp.file contains two classes that provide a basic interface for manipulating files as objects, including obtaining file information and simple reading and writing.

FileInfo

A FileInfo object can be constructed from a String, which should contain a full path name (the file does not have to exist). The class will provide methods for obtaining information on a file (if it exists) and manipulating file and path information. If the file exists, information will include size, owner and group information, file permissions, and the like. Regardless of file existence, path operations such as glob matching, basename, and dirname will be available.

Currently, however, the only implemented methods are matches, for matching a file name against a pattern such as '*.txt', basename to return the file name portion of a full path (optionally stripping extension), and dirname for returning just the directory portion of a full path.

Example:

    import crack.io cout;
    import crack.exp.file FileInfo;

    fi := FileInfo('/etc/resolv.conf');
    if (fi.matches('*.conf'))
        cout `it's a conf file!\n`;
    basename := fi.basename(true);   // strip extension
    // basename contains the string 'resolv'
    basename = fi.basename(false);   // don't strip extension
    // basename contains the string 'resolv.conf'
    dirname := fi.dirname();
    // dirname contains '/etc'

File

The File object can be used for reading and writing to files on disk. It extends FileInfo and so all informational functionality available with that class is also available to File.

A File should be constructed with the full path to the file and the mode to open it in, which is either 'r' or 'w' for read or write, respectively. If the file is opened in read mode, a LineReader is automatically created which allows for reading the file line by line via the nextLine method. Note that the trailing newline is included in the return value.

Read Example:

 
    f := File('/etc/resolv.conf', 'r');
    String lineBuf = null;
    lineNum := 1;
    while (lineBuf = f.nextLine()) {
        # lineBuf already ends with a newline
        cout `$lineNum: $lineBuf`;
        lineNum++;
    }

When opened in write mode, a file may be written to by passing a String to the write method:

    
    f := File('/tmp/myfile.txt', 'w');
    f.write("lorem ipsum rock");

Note that there is currently no append mode: if the file already exists, it will be overwritten upon opening in write mode.

File objects may be closed with the close method. This is done automatically when the File object destructor runs (if it hasn't been closed manually).
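
The following sketch combines the calls described above: it writes a file, closes it explicitly, and then reopens it for reading. The path is only an example:

```crack
import crack.io cout;
import crack.exp.file File;

# write a line and close the file before reopening it
out := File('/tmp/myfile.txt', 'w');
out.write('lorem ipsum rock\n');
out.close();

# read the contents back line by line
src := File('/tmp/myfile.txt', 'r');
String line = null;
while (line = src.nextLine())
    cout `$line`;
src.close();
```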

Regular Expressions

Crack provides a regular expression library in the form of crack.exp.regex. You use it by creating a Regex object:

    import crack.io cin, cout;
    import crack.exp.readers LineReader;
    import crack.exp.regex Regex, Match;

    rx := Regex('(\\w+)=(.*)');
    String line = null;
    LineReader src = {cin};
    while (line = src.next()) {
        Match m = rx.search(line);
        if (m)
            cout `got $(m.group(1)) = $(m.group(2))\n`;
    }

The Regex.search() method returns a Match object if there was a matching substring and null if there wasn't. The Match object allows you to get the entire text of the matched substring, the start and end positions of it, and also allows you to get this information for parenthesized subgroups (employed in the example above).

We use values of 1 and 2 to access the subgroups: the zeroth subgroup is the text of the entire match.

To get the start and end position of the matched substring:

    uint startPos = m.begin();
    uint endPos = m.end();

Likewise, we can do this with subgroups:

    uint start1 = m.begin(1);
    uint end1 = m.end(1);

Note again that the zeroth subgroup (m.begin(0)) is equivalent to the entire group, so we start with group 1.

We can also use named subgroups. For example, we could have written our expression like this:

    rx := Regex('(?P<var>\\w+)=(?P<val>.*)');

And then the output statement would look like this:

    cout `got $(m.group('var')) = $(m.group('val'))\n`;

Likewise, the begin() and end() functions can be used with group names.

Named subgroups can greatly enhance the readability of regular expression code.
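
As a short sketch of begin() and end() with group names (the input string and variable names are illustrative):

```crack
import crack.io cout;
import crack.exp.regex Regex, Match;

rx := Regex('(?P<var>\\w+)=(?P<val>.*)');
Match m = rx.search('name=crack');
if (m) {
    # positions of the 'val' subgroup within the searched string
    uint valStart = m.begin('val');
    uint valEnd = m.end('val');
    cout `$(m.group('val')) spans $valStart to $valEnd\n`;
}
```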

crack.exp.regex is a wrapper around the PCRE library (perl-compatible regular expressions). As the name suggests, the regular expression syntax supported by this library is very close to the syntax supported by perl 5.

Math

The crack.math library provides access to functions and constants from the C standard math library. A description can be found on Wikipedia or a more complete reference at Dinkumware.

Since Crack allows function overloading, only one function name exists for both float32 and float64 values. E.g. there is no sinf function for float32 values. The long double type in C is not currently supported.
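
The sketch below illustrates the idea: the same name, sqrt, is called with both floating point widths and the overload is chosen from the argument type. The explicit float32() and float64() conversions and the result types are assumptions made for the sake of the example:

```crack
import crack.math sqrt;

float32 narrow = float32(2);
float64 wide = float64(2);

# the same function name resolves to a different overload for each type
r32 := sqrt(narrow);   # float32 overload (no separate sqrtf)
r64 := sqrt(wide);     # float64 overload
```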

The library provides:

trigonometric functions

sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, asinh, acosh, atanh.

logarithmic/exponential functions

exp, exp2, expm1, log, log10, log1p, log2, logb, modf, hypot.

power and absolute value functions

pow, sqrt, cbrt, fabs, hypot.

gamma and error functions

erf, erfc, lgamma, tgamma.

integer functions

ceil, floor, nearbyint, rint, round, trunc, nextafter.

library error handling functions

errno, setErrno, testexcept, clearexcept.

other functions

fdim, fmod, remainder, copysign, fpclassify, isfinite, isinf, isnan, isnormal, sign, atoi, atof, usecs.

floating point classification and error condition constants

HUGE_VAL, INFINITY, NAN, FP_INFINITE, FP_NAN, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, FP_ILOGB0, FP_ILOGBNAN, ALL_EXCEPT, INVALID, DIVBYZERO, OVERFLOW, UNDERFLOW.

mathematical constants

E, LOG2E, LOG10E, LN2, LN10, LNPI, PI, PI_2, PI_4, PI1, PI2, SQRTPI2, SQRT2, SQRT3, SQRT1_2, SQRTPI, GAMMA.

Below is a simple example:

    import crack.math sqrt, pow, sin, cos, PI;
    
    float length(float x, float y) {
        # x*x + y*y would perform better but would be less illustrative
        return sqrt(pow(x, 2) + pow(y, 2));
    }
    
    len := length(10, 5);
    x := cos(30.0 / 180 * PI) * len;
    y := sin(30.0 / 180 * PI) * len;

Bindings

The crack.exp.bindings library provides some facilities to ease integration with C libraries. It is often possible to model C types and functions using Crack classes and functions, provided that you can remove some of the Crack facilities (such as garbage collection and vtables).

Opaque

The Opaque class is used as the base class for types where the Crack code does not need to access instance variables. In these cases, the C code must define functions to obtain and manipulate the object. More information on Opaque can be found in Primitive Bindings.

Wrapper

Wrapper is the base class for types where C code requires the address of a primitive value (typically for an output parameter).

The crack.exp.bindings module currently defines two wrappers:

IntWrapper

Used when C code calls for a pointer to a single integer.

ByteptrWrapper

Used when C code calls for the address of a char * (or byteptr in Crack).

You can also derive from Wrapper directly to implement your own pointer functionality.

It is important to note that Wrapper and Opaque are not derived from Object, and as such they have no garbage collection. Opaque objects should be associated with real objects that know how to clean them up using their low-level functions. Wrapper objects must not be assigned to other variables, otherwise they will be garbage collected twice. Values referenced by wrappers conform to whatever lifecycle arrangements are provided by the low-level APIs that they are obtained from.

Networking

The crack.net module includes support for TCP/IP socket level programming.

A socket is a communication endpoint for TCP/IP communications. Sockets can either be connection channels, allowing you to send data to a matching socket elsewhere on the network, or listeners which wait for new connections.

We create a socket as an instance of the Socket class:

    import crack.net sockconsts, Socket;
    
    s := Socket(sockconsts.AF_INET, sockconsts.SOCK_STREAM, 0);

This creates a "stream" (TCP) socket for a reliable connection over the INET protocol.

With a little more code, we can create a very basic server program:

    import crack.net sockconsts, Socket, SockAddrIn;
    import crack.exp.error err;
    
    Socket srv = {sockconsts.AF_INET, sockconsts.SOCK_STREAM, 0};
    
    # allow the socket's port to be immediately reused after the program 
    # terminates (useful for debugging and server restarts)
    srv.setReuseAddr(true);

    # bind to port 1900 on all interfaces
    if (!srv.bind(SockAddrIn(sockconsts.INADDR_ANY, 1900)))
        err.do() `error binding socket`;
    
    # queue up to 5 connections
    srv.listen(5);
    
    # accept loop
    while (true) {
        
        # accept a new client connection
        accepted := srv.accept();
        if (!accepted)
            err.do() `error accepting new connection`;
        
        # read up to 1K from the new connection
        data := accepted.sock.read(1024);
        
        # write it back to the new connection (we could have also used the 
        # send() and recv() functions)
        accepted.sock.write(data);
    }

A client script might look like this:

    import crack.net sockconsts, SockAddrIn, Socket;
    import crack.io cout, ManagedBuffer;
    import crack.exp.error err, strerror;
    
    Socket s = {sockconsts.AF_INET, sockconsts.SOCK_STREAM, 0};
    
    # connect to localhost (127.0.0.1)
    if (!s.connect(SockAddrIn(127, 0, 0, 1, 1900)))
        err.do() `error connecting: $(strerror())`;
    
    # this time, we'll use send and recv
    if (s.send('some data', 0) < 0)
        err.do() `error sending: $(strerror())`;
    
    # receive from the socket
    ManagedBuffer buf = {1024};
    int rc;
    if ((rc = s.recv(buf, 0)) < 0)
        err.do() `error receiving: $(strerror())`;
    buf.size = uint(rc);
        
    cout `got $(String(buf))\n`;

When writing a server, you usually want to manage multiple connections. You can use the Poller class for this. Poller wraps the POSIX poll interface, which allows you to wait for an event on a set of Pollable objects (Socket is derived from Pollable, so you can use this to manage sockets).

Poller allows you to add any number of Pollables and the events you care about on them. You can then wait for an event (or a timeout) using the wait() method. Here is an example of a simple server program (it just echoes back whatever it reads from a client) written using Poller:

    import crack.net sockconsts, Poller, PollEvent, Socket, SockAddrIn;
    import crack.container DList;
    import crack.io cout;
    import crack.exp.error err, strerror;
    
    Socket srv = {sockconsts.AF_INET, sockconsts.SOCK_STREAM, 0};
    
    # allow the socket's port to be immediately reused after the program 
    # terminates (useful for debugging and server restarts)
    srv.setReuseAddr(true);

    # bind to port 1900 on all interfaces
    if (!srv.bind(SockAddrIn(sockconsts.INADDR_ANY, 1900)))
        err.do() `error binding socket`;
    
    # queue up to 5 connections
    srv.listen(5);

    # our list of clients    
    DList clients = {};

    # remove a client from the list
    void removeClient(Socket client) {
        # stop at the matching element, or when the iterator is exhausted
        i := clients.iter();
        while (i && !(client is i.elem()))
            i.next();
        
        if (!i)
            err.do() `unable to remove client: $(client.fd)\n`;
        else
            clients.delete(i);
    }
    
    # accept loop
    while (true) {
        Poller p = {};

        # add the server socket and all of the client sockets        
        events := sockconsts.POLLIN | sockconsts.POLLERR;
        p.add(srv, events);
        i := clients.iter();
        while (i.nx())
            p.add(Socket.cast(i.elem()), events);
        
        # wait indefinitely for the next event
        rc := p.wait(null);
        if (rc < 0)
            err.do() `error during poll: $(strerror())`;
        
        PollEvent evt = null;
        while (evt = p.nx()) {
            cout `got event\n`;
            # if this is the server, do an accept.  Otherwise do a read.
            if (evt.pollable is srv) {
                accepted := srv.accept();
                if (!accepted)
                    err.do() `error accepting new connection`;
                
                clients.append(accepted.sock);
            } else {
                # read some data, write it back (what we should really do is 
                # add sockconsts.POLLOUT to the events that this client is 
                # listening for, and then write when the socket becomes ready 
                # to write, but in the interests of simplifying the example 
                # we'll just write back to it)
                client := Socket.cast(evt.pollable);
                data := client.read(1024);
                if (!data) {
                    cout `removing client $(client.fd)\n`;
                    removeClient(client);
                } else {
                    client.write(data);
                }
            }
        }
    }

GTK

crack.exp.gtk is an experimental (and currently very minimal) GTK module. It currently provides classes for the following GTK objects:

Toplevel

A top-level (window manager level) window.

Tooltips

Used to tie fly-over help text to widgets.

Button

A pushbutton widget.

Entry

A single-line text entry field.

HBox

A container that arranges child widgets horizontally in a row.

VBox

A container that arranges child widgets vertically in a column.

App

Models an application with an init function and a main loop.

Here's a sample program:


    import crack.exp.gtk App, Button, Entry, Handlers, HBox, Label, Toplevel,
        VBox;
    import crack.sys argv;
    import crack.io cout;
    
    App app = {};
    
    # this is going to be our window.
    class MyWindow : Toplevel {

        # create a bunch of child widgets.        
        Label lbl = {'Your Name:'};
        Entry dataEnt = {};
        Button doneBtn = {'Close'};
        VBox v1 = {false, 10};
        HBox h1 = {false, 10};

        # print out all of the data collected by the window.
        void getData() {
            cout `Entry text: $(dataEnt.getText())\n`;
        }
        
        # handler for the button class - print out the data and quit.
        class DoneBtnHandler : Handlers {
            # note that this creates a reference cycle - we need to do 
            # something to break the cycle if we care about memory leaks.
            MyWindow win = null;
            oper init(MyWindow win0) : win = win0 {}
            bool onClicked() {
                win.getData();
                app.quit();
                return false;
            }
        }
        
        oper init() {
            # arrange all of the widgets
            add(v1);
            v1.add(h1);
            h1.add(lbl);
            h1.add(dataEnt);
            v1.add(doneBtn);

            # set the done handler.            
            doneBtn.setHandlers(DoneBtnHandler(this));
            doneBtn.handleClicked();

            # show all of the widgets            
            lbl.show();
            dataEnt.show();
            doneBtn.show();
            v1.show();
            h1.show();
            show();
        }
    }
    
    app.init(argv);
    MyWindow win = {};
    app.main();

Crack's GTK libraries are really a toy implementation or proof of concept. You can probably implement minimal programs with them, and you can implement larger programs if you care to augment the library.

Appendix 2 - Migration

Crack guarantees backwards compatibility between tertiary release numbers (so 0.2.1 should be compatible with 0.2). We will adhere to this policy until we release 1.0, at which point we will begin guaranteeing backwards compatibility until the next major version number (2.0).

Until this time, the language and all of its libraries are considered to be "in flux:" anything can change between minor version numbers.

That said, we want to make early adoption tolerable. If you want to avoid code breakage when upgrading, we recommend two strategies:

Bind your toplevel scripts to a specific version of crack. You can do this by adding the version number to the binary in your '#!' line:
```
        #!/usr/local/bin/crack0.3
```
You will want to install your own modules under the module tree for the version that they were developed for. For example, if you create module "foo.bar", you would typically install it somewhere like /usr/local/lib/crack-0.3/foo/bar.
Read the migration section of the manual for the changes involved in upgrading to the new version.
We will attempt to provide migration facilities, both in the form of documentation and migration warnings.

The remainder of this appendix describes the things you need to know when upgrading between versions.

From 0.2 to 0.3

The following words are now keywords, and are no longer available for use as identifiers: in, on, for.
The semantics of aggregate variable definitions without an explicit initializer have changed. Uninitialized variables used to initialize to an object created with the default constructor; they now initialize to null. So for example in this code:
```
        class A {}
        A x;
```
x would have been assigned to an instance of A in version 0.2; in version 0.3 it would be assigned to null.
The -m flag can be used to help you identify places where your code is likely to be affected by this. It generates a warning where you rely on default initialization (presumably expecting the 0.2 behavior of default construction) and also where you assign the variable to null explicitly (since null is now the default and does not need to be specified).
We recommend that you run all code that you are upgrading with this option and audit all sites where one of these warnings is triggered.
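
As a minimal sketch of the kind of change such an audit leads to (the variable names are illustrative), a definition that relied on the 0.2 behavior can be given an explicit empty initializer, the same = {} form used for containers elsewhere in this manual:

```crack
class A {}

A x = {};   # explicitly default-constructed, as 0.2 did implicitly
A y;        # null in 0.3
```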