The Perlang Language

Draft: The Perlang Language

This is a more in-depth page describing the Perlang language. It tries to cover all features currently implemented. If you are impatient and just want to see some examples of what Perlang can look like in action, the quickstart page might be a better start for you.

This text is currently a draft, which we provide as a "public preview". Once the document describes all aspects of Perlang in a satisfactory way, we will remove the "draft" status.

Language features

The top-level scope

Similar to script-based languages like JavaScript, Ruby and Python, a program in Perlang does not necessarily have to consist of a class or function (which is the case in languages like Java and C). This is because of the existence of a top-level scope. You can write statements in this scope, and they will be executed when the program is executed:

// printing-from-the-top-level-scope.per
print "Printing from the top-level-scope";

You can also declare variables in this scope and refer to them later in your program. It makes sense to think of the top-level scope as an "implicit main method" or even an "implicit class", if you come from a background in other languages where this way of thinking makes sense to you.

// defining-a-variable.per
var a = 1;
print a;

Variables

We already cheated a bit and defined a variable in the previous section, but let's look at this in a bit more depth now. Variables can be defined in two ways: with explicit or implicit ("inferred") typing.

// two-types-of-variables.per
var a = 1;
var b: int = 2;

print a;
print b;

The above variable declarations are the same in essence. However, do not be deceived; Perlang is not a dynamic language even though it supports constructs like var a = 1. The following example illustrates this point further:

// invalid-reassignment-to-typed-variable.per
var a: int = 1;
a = "foo";

If you save the above and try to run it, you'll get an error like this:

Error at 'a': Cannot assign 'ASCIIString' to 'int' variable

This is because once a variable is declared, the type of this variable (explicitly or implicitly defined) is stored. The Perlang typechecker uses this information to ensure the type-wise correctness of your program, much like any other statically typed language.

Integer types

Perlang currently supports the following integer types. Their usage is demonstrated below.

  • int — 32-bit, signed (-2,147,483,648 to 2,147,483,647)
  • uint — 32-bit, unsigned (0 to 4,294,967,295)
  • long — 64-bit, signed (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
  • ulong — 64-bit, unsigned (0 to 18,446,744,073,709,551,615)
  • bigint — Dynamic number of bits, signed. No upper size limit.

Automatic type inference

When using var without an explicit type annotation, the compiler automatically selects the smallest integer type in which the literal value fits, but never smaller than 32 bits, i.e. int or uint. The following program illustrates this:

// integer-types.per
var i = 2147483647;
var u = 4294967295;
var l = 9223372036854775807;
var x = 1231231230912839019312831232;

print i.get_type();
print u.get_type();
print l.get_type();
print x.get_type();

// Expected output:
// perlang.Int32
// perlang.UInt32
// perlang.Int64
// perlang.BigInt

As can be seen above, the compiler will determine the appropriate size (from int and upwards) to use as the type as values grow larger. Here are some other examples:

var a = 1;                    // int   (fits in 32 bits)
var b = -1;                   // int
var c = 2147483648;           // uint  (too large for int, fits in 32-bit unsigned)
var d = -2147483649;          // long  (too large for int)
var e = 4294967296;           // long  (too large for uint, fits in 64-bit signed)
var f = 9223372036854775808;  // ulong (too large for long)
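
The rule above can also be written down as a small lookup. The following Python sketch is only an illustration of the rule as described in this section, not the Perlang compiler's actual implementation; the names INTEGER_TYPES and infer_integer_type are hypothetical, and the fallback to bigint for values that fit no fixed-size type is an assumption based on the examples above.

```python
# Illustrative sketch of the inference rule described above.
# This is NOT Perlang's compiler code; all names here are hypothetical.

# Candidate types in the order they are tried, with inclusive value ranges.
INTEGER_TYPES = [
    ("int", -2**31, 2**31 - 1),
    ("uint", 0, 2**32 - 1),
    ("long", -2**63, 2**63 - 1),
    ("ulong", 0, 2**64 - 1),
]


def infer_integer_type(value: int) -> str:
    """Return the name of the first type whose range contains value."""
    for name, lowest, highest in INTEGER_TYPES:
        if lowest <= value <= highest:
            return name
    # No fixed-size type fits, so fall back to the arbitrary-size type.
    return "bigint"


# The literals from the examples above, paired with the inferred type.
for literal in (1, -1, 2147483648, -2147483649, 4294967296, 9223372036854775808):
    print(literal, infer_integer_type(literal))
```

Trying the signed type before the unsigned type of the same width is what makes 1 an int rather than a uint, while 2147483648 skips int and lands on uint.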

Compile-time assignment checks

Assigning an int to a long is perfectly permissible, since such assignments can be performed without data loss. In other words, the following code compiles without errors:

var i: int = 12345;
var v: long = i;

On the other hand, trying to assign a long to an int variable will fail, because the value cannot be guaranteed to fit into the smaller variable size:

var v: long = 8589934592;
var i: int = v; // Error: Cannot assign long to int variable

In many languages, the above behaviour can be overridden using explicit casts, like int i = (int)v in C, C# and Java. In Perlang, such explicit casting is currently not supported.

Alternative literal formats

Integer literals can be written in decimal, hexadecimal, octal, or binary notation. When applicable, underscores can be used as digit separators for improved readability:

// integer-literal-formats.per
var year_of_quake = 1996;
var vga_framebuffer = 0xA0000;
var file_mode = 0o755;
var high_nibble_set = 0b11110000;
var one_megabyte = 1_048_576;

print year_of_quake;
print vga_framebuffer;
print file_mode;
print high_nibble_set;
print one_megabyte;

// Expected output:
// 1996
// 655360
// 493
// 240
// 1048576
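
If the expected output above looks surprising, the place-value arithmetic behind it can be spelled out. The snippet below is plain Python, used here only as a calculator; Python happens to accept the same 0x, 0o and 0b prefixes and the same underscore separators, which makes it convenient for cross-checking. The variable names simply mirror the Perlang example.

```python
# Place-value arithmetic behind the expected output above (plain Python).
vga_framebuffer = 10 * 16**4                 # 0xA0000: digit A (10) in the 16**4 position
file_mode = 7 * 8**2 + 5 * 8**1 + 5 * 8**0   # 0o755
high_nibble_set = 2**7 + 2**6 + 2**5 + 2**4  # 0b11110000: the four highest bits of a byte
one_megabyte = 2**20                         # 1_048_576

print(vga_framebuffer, file_mode, high_nibble_set, one_megabyte)

# Cross-check against Python's own literal syntax for the same notations.
print(vga_framebuffer == 0xA0000)
print(file_mode == 0o755)
print(high_nibble_set == 0b11110000)
print(one_megabyte == 1_048_576)
```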

Floating-point types

Perlang supports two floating-point types, based on the IEEE 754 standard:

  • float — 32-bit, single-precision (~7 significant decimal digits)
  • double — 64-bit, double-precision (~15–17 significant decimal digits)

Automatic type inference

Floating-point literals without a suffix default to double. To get a float instead, append the f suffix to the literal. The d suffix can also be used explicitly, to indicate double precision:

// floating-point-types.per
var d = 1.0;    // When no suffix is specified, floating-point literals will use
                // 'double'
var e = 1.0d;   // Explicit 'd' suffix can also be used
var f = 1.0f;   // Explicit 'f' suffix forces single precision ('float')

print d.get_type();
print e.get_type();
print f.get_type();

// Expected output:
// perlang.Double
// perlang.Double
// perlang.Float

Explicit type annotations

You can also declare floating-point variables with an explicit type, like with integer types:

// floating-point-explicit-types.per
var d: double = 3.14159265358979;
var f: float = 3.14f;

print d;
print f;

// Expected output:
// 3.14159265358979
// 3.14

Precision considerations

Because of the way floating-point numbers are represented in memory, not all decimal values can be stored exactly. double offers higher precision than float and is typically preferred unless memory usage or interoperability with single-precision data is a concern.

Assigning an integer to a float or double variable is permitted, but may lose precision for large values. For example, a 32-bit integer assigned to a float may not be represented exactly, since float can only represent integers without data loss in the range −2²⁴+1 to 2²⁴−1. Assigning a long (64-bit integer) to a float is not allowed and will produce a compile-time error; assigning it to a double is permitted, but may likewise lose precision for values outside the range −2⁵³ to 2⁵³.
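
The precision limits mentioned above are properties of IEEE 754 itself, so they can be observed in any language. Here is a small Python sketch (not Perlang code) that round-trips integers through a 32-bit float using the standard struct module; the helper name to_float32 is our own invention for this example. Python's built-in float is a 64-bit double, so it is used directly for the second half.

```python
import struct


def to_float32(value: int) -> float:
    """Round-trip value through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# 2**24 is still exact as a 32-bit float, but 2**24 + 1 is not.
print(to_float32(2**24) == 2**24)          # True
print(to_float32(2**24 + 1) == 2**24 + 1)  # False

# Python's float is a 64-bit double: 2**53 is exact, 2**53 + 1 is not.
print(float(2**53) == 2**53)               # True
print(float(2**53 + 1) == 2**53 + 1)       # False
```

In both failing cases the value is silently rounded to the nearest representable neighbour, which is why such assignments are described above as potentially losing precision.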

Floating-point values cannot be assigned to integer variables — the following will produce a compile-time error:

var d: double = 3.14;
var i: int = d; // Error: Cannot assign double to int variable

Like with integers, the above can be overridden using explicit casts in other languages, but in Perlang this is currently not supported.

Strings

This section reflects the currently implemented functionality, not the full intended functionality. For more details, see #370 Implement string interface with smart handling of string literals.

String literals in Perlang are enclosed in double quotes ("string-value"). The easiest way to use strings is to rely on type inference (i.e. "implicit typing"). The example below also shows the actual type used under the hood:

// string-types.per
var s1 = "Hello, World";
var s2 = "こんにちは、世界"; // "Hello World" in Japanese

print s1;
print s2;
print;

print s1.get_type();
print s2.get_type();

// Expected output:
// Hello, World
// こんにちは、世界
//
// perlang.ASCIIString
// perlang.UTF8String

The above example can also be rephrased to use the generic string type explicitly, like this. As illustrated, the actual types used are still the same:

// string-type-agnostic.per
var s1: string = "hello, world";
var s2: string = "こんにちは、世界";

print s1.get_type();
print s2.get_type();

// Expected output:
// perlang.ASCIIString
// perlang.UTF8String

String types

The examples above illustrate usage of three of the most common string types in Perlang: string, ASCIIString and UTF8String. Here is a brief explanation of how they work.

  • string is the most generic type. It can be used for code saying "I can accept any kind of string". Note that retrieving the string length, or indexing the string based on character position is not possible with this type. This is because for some string types (most notably UTF8String), such operations are not easily supported. If indexing the string is required, you must convert it to ASCIIString or UTF16String, by calling as_ascii() or as_utf16() respectively.

  • ASCIIString is the underlying type used for strings which contain only ASCII characters. It's an efficient type for such content, using one byte per character.

  • UTF8String is used for string literals containing non-ASCII content. It uses the UTF-8 encoding, which is space-efficient for both ASCII and non-ASCII content, but has the significant disadvantage of using varying length for each character (code point). Each individual code point can be between 1 and 4 bytes, per RFC 3629.

  • UTF16String is never used implicitly for string literals, but can be used for programs that need to index a string based on position. It supports the full Unicode range, but code points outside the BMP (Basic Multilingual Plane) are encoded as two 16-bit code units, known as a surrogate pair. To create a UTF16String, call as_utf16() on an existing string.
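
To make the size differences between these encodings a bit more concrete, the following Python sketch (not Perlang code) counts UTF-8 bytes and UTF-16 code units for a few sample characters. It only illustrates the encodings themselves and says nothing about how Perlang stores strings internally; the helper names utf8_bytes and utf16_units are hypothetical.

```python
# Byte and code-unit counts in UTF-8 and UTF-16 (plain Python illustration).

def utf8_bytes(text: str) -> int:
    """Number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


samples = [
    "a",           # U+0061, ASCII
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u3053",      # U+3053, Hiragana letter ko
    "\U0001F600",  # U+1F600, an emoji outside the BMP
]

for character in samples:
    print(f"U+{ord(character):04X}", utf8_bytes(character), utf16_units(character))
```

The first three characters need 1, 2 and 3 bytes in UTF-8 but a single UTF-16 code unit each; the emoji needs 4 bytes in UTF-8 and a surrogate pair (two code units) in UTF-16.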

All these types can be used explicitly in variable declarations:

// string-explicit-types.per
var s1: string = "hello";
var s2: ASCIIString = "hello";
var s3: UTF8String = "héllo";
var s4: UTF16String = "hello".as_utf16();

Concatenation

Perlang currently does not support Ruby/C#-style string interpolation. For more details, see #295 Support string interpolation.

Strings can be concatenated using the + operator. Perlang also supports concatenating strings directly with numeric types, without requiring an explicit conversion:

// string-concatenation.per
var s1: string = "Hello";
var s2: string = "world";
var i: int = 2026;

print s1 + ", " + s2 + "!";
print "The year is " + i;

// Expected output:
// Hello, world!
// The year is 2026

Comparison

Strings are compared by value using the == operator:

// string-comparison.per
var s1: string = "hello";
var s2: string = "hello";
var s3: string = "world";

print s1 == s2;
print s1 == s3;

// Expected output:
// true
// false

Note that comparisons are currently "dumb"; they perform a character-by-character comparison and take neither Unicode normalization nor different locales into consideration. For example, the following strings are "semantically equivalent" (café vs cafe + combining acute accent (U+0301)), but will be compared as non-equal with our current operator:

// string-comparison-unicode.per
print "caf\u00e9" == "cafe\u0301";

// Expected output:
// false
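
For readers unfamiliar with the problem, here is what a normalization-aware comparison looks like in a language that ships with Unicode normalization support. This is Python, using its standard unicodedata module; it is shown only to illustrate the concept and does not imply that Perlang offers (or will offer) a similar API.

```python
import unicodedata

precomposed = "caf\u00e9"  # "é" as a single code point (U+00E9)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (U+0301)

# A plain code-point-by-code-point comparison treats them as different.
print(precomposed == decomposed)  # False

# After normalizing both sides to the same form (NFC here), they compare equal.
normalized_left = unicodedata.normalize("NFC", precomposed)
normalized_right = unicodedata.normalize("NFC", decomposed)
print(normalized_left == normalized_right)  # True
```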

String length

The .length property is available on ASCIIString and UTF16String. For non-indexable strings like UTF8String, this information is not available.

// string-length.per
var ascii: ASCIIString = "hello";
var utf16: UTF16String = "こんにちは".as_utf16();

print ascii.length;
print utf16.length;

// Expected output:
// 5
// 5

As with UTF8String, the .length property is not available on string, so attempting to access it on such objects will result in compilation errors.

The char type

The char type represents a single character. Character literals are enclosed in single quotes:

// char-literal.per
var c1: char = 'A'; // ASCII character
var c2: char = 'ø'; // Unicode character

print c1;
print c2;

// Expected output:
// A
// ø

Unicode support

char supports characters from the Unicode Basic Multilingual Plane (BMP), covering code points U+0000 to U+FFFF. This includes most characters used in the world's writing systems.

The char type in Perlang is 16 bits wide: wider than C and C++ (8-bit char), but narrower than languages like Rust, which use a 32-bit character type. This is the same width as C# and Java, and is a deliberate compromise: 16 bits is enough to represent the vast majority of the world's writing systems, without doubling the memory footprint of character data compared to a 32-bit type. The trade-off is that characters outside the BMP (most notably emoji) cannot be stored in a single char. They can still be used in Perlang via UTF8String and UTF16String; they just cannot be used as char literals.
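
Whether a given character fits in a 16-bit char comes down to a single comparison against U+FFFF. The Python sketch below (not Perlang code) illustrates the check; the function name fits_in_16_bit_char is hypothetical.

```python
def fits_in_16_bit_char(character: str) -> bool:
    """True if the single code point in character lies within the BMP."""
    return ord(character) <= 0xFFFF


print(fits_in_16_bit_char("A"))           # True  (U+0041)
print(fits_in_16_bit_char("\u00f8"))      # True  (U+00F8, "ø")
print(fits_in_16_bit_char("\u3057"))      # True  (U+3057, Hiragana "し")
print(fits_in_16_bit_char("\U0001F600"))  # False (U+1F600, an emoji)
```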

Escape sequences

The following escape sequences are supported in char literals:

  • '\n' — LF (Line feed/newline)
  • '\r' — CR (Carriage return)
  • '\t' — TAB
  • '\0' — NUL character

Hexadecimal ('\x1B') and octal ('\033') escape sequences are not supported. The '\0' null sequence is the only exception to the octal rule, included for convenience.

Usage in switch statements

char values can be used as the expression in switch statements, including with range conditions. See the switch statements section for examples.

Functions

Top-level functions are currently defined using the fun keyword. Here's a simple example of how a function can be defined and called:

// defining-and-calling-a-function.per
fun foo(): void {
  print "foo";
}

foo();

Many functions take one or more parameters. Here's an example of how such a function can be defined and called:

// defining-and-calling-a-function-with-parameters.per
fun greet(name: string, age: int): void {
  print "Hello " + name + ". Your age is " + age;
}

// Expected output: Hello John Doe. Your age is 42
greet("John Doe", 42);

The last example is interesting in a different way as well. It illustrates a language feature which Perlang shares with other languages like Java, C# and JavaScript: being able to concatenate string and int values without any conversions. Other languages like Ruby and Python are more strict in this regard, requiring an explicit conversion to String/str.

I imagine the reason for this to be the dynamic nature of these languages. In a dynamic language, it is not certain that a particular variable or parameter has a given type, so forcing the user to call i.to_s makes quite a bit of sense. By doing so, you ensure that the operation will do what the user expected. What would happen if you tried to concatenate an integer and a random DTO/model instance? Such an operation does not make much sense, so forcing the user to call model.to_s if they really want to do that makes the code more explicit and clear to the reader.

As a compiled, statically typed language, Perlang can make compile-time guarantees that implicit coercions like this will succeed — or produce a compilation error if they cannot. This is consistent with the behavior of our statically typed friends — Java and C#.

Interestingly enough, JavaScript wants to be different - it is indeed a dynamic programming language, but it still supports concatenation of arbitrary (non-numeric) objects. For example, doing new Object() + new Object() gives you the string [object Object][object Object]. To have a custom representation of the object being used in this case, you implement a custom toString() method for the object in question.

switch statements

Like many languages in the C family, Perlang supports switch statements for branching based on a value. The following types can be used as the switch expression: int, char, string, and enum types. A default branch can be added to handle values not matched by any case.

A notable feature is support for range conditions using the .. operator, which allows matching a contiguous range of values in a single case. The example below shows both range conditions and "regular", single conditions.

// switch-statement.per
fun classify(i: int): void
{
    switch (i) {
        case 1..5:
            print "one to five";
        case 6..8:
        case 9:
        case 10:
            print "six to ten";
        case 11:
            print "eleven";
        default:
            print "other";
    }
}

// Expected output:
// one to five
// six to ten
// eleven
// other
classify(3);
classify(7);
classify(11);
classify(42);

Multiple cases can share the same branch by listing them consecutively without a body between them. Note that unlike C and C++, there is no implicit fallthrough between cases — each case is independent. Because of this, there is no need to use the break keyword in this context.

Switch statements can also branch based on char values:

// switch-statement-char.per
fun classify_char(c: char): void
{
    switch (c) {
        case 'a'..'c':
            print "a to c";
        case 'd'..'f':
            print "d to f";
        case 'q':
            print "special case";
        default:
            print "other";
    }
}

// Expected output:
// a to c
// d to f
// special case
// other
classify_char('b');
classify_char('e');
classify_char('q');
classify_char('し');

Classes

Perlang supports defining user-defined classes with instance methods, static methods, constructors, and fields. Fields are immutable by default; use the mutable keyword to allow reassignment. Inheritance is not yet supported.

Here is a simple example of a user-defined class with a constructor, a private field, and an instance method:

// user-defined-class.per
public class Greeter
{
    private name_: string;

    public constructor(name: string)
    {
        this.name_ = name;
    }

    public say_hello(): void
    {
        // Both the fully qualified "this.name_" form and the shorter "name_" form can be used here
        // to refer to the field
        print("Hello, " + name_ + "!");
    }
}

// Expected output: Hello, World!
var greeter = new Greeter("World");
greeter.say_hello();

Static methods

Classes can also define static methods, which can be called directly on the class without instantiating it first:

// user-defined-class-with-static-method.per
public class Greeter
{
    public static say_hello(): void
    {
        print("Hello World from static class method");
    }
}

// Expected output: Hello World from static class method
Greeter.say_hello();

Destructors

Like in C++, a class can define a destructor. The destructor will always be called when the object goes out of scope. This is different from languages like Java and C#, where you have less control over when an object will actually be destroyed.

// user-defined-class-with-destructor.per
public class Greeter
{
    private name_: string;

    public constructor(name: string)
    {
        this.name_ = name;
    }

    public destructor()
    {
        print("Goodbye from " + this.name_ + "!");
    }

    public say_hello(): void
    {
        print("Hello, " + this.name_ + "!");
    }
}

// Expected output:
// Hello, World!
// Goodbye from World!
var greeter = new Greeter("World");
greeter.say_hello();

The existence of null

The concept of null, a reference that points to no object, is well known from languages like C, Java, and C#, but is a common source of runtime errors [2]. Perlang includes null primarily for interoperability with other ecosystems, but deliberately restricts its use. Consider the following program:

// defining-and-calling-a-function-with-null-parameter.per
fun greet(name: string, age: int): void {
  print "Hello " + name + ". Your age is " + age;
}

// Expected error: [line 7] Error at 'greet': Null parameter detected for 'name'
greet(null, 42);

Running this program gives you an error like this:

[line 7] Error at 'greet': Null parameter detected for 'name'

The path we have chosen here is to let null exist as a concept in Perlang, mainly for interoperability with C, C++ and other languages that use null references extensively. Making it impossible to use null would significantly limit the ability to e.g. use existing C libraries. Hence, we have decided to include null in the language.

But: we deliberately restrict the use of null in an attempt to steer the user to better constructs, when possible. Whenever null is encountered, a compiler warning is emitted. By default, all compiler warnings are considered errors [1], which is why you get the above error whenever you try to use null.

Now, including null in the language but making it impossible to use would be kind of pointless. What we have instead is a mechanism to demote this warning from an error to an actual warning:

$ perlang -Wno-error=null-usage defining-and-calling-a-function-with-null-parameter.per
[line 7] Warning at 'greet': Null parameter detected for 'name'
[line 3] Operands must be numbers, not string and null

As can be seen, the previous Error at 'greet' has now turned into a slightly more friendly Warning at 'greet'. However, we then get a runtime error (the "line 3" output), because "Hello " + name is not a valid operation when name is null.

The reason these errors seem to come in the "wrong order" in terms of line numbers is that the compilation and analysis phase of the program happens first, as a separate stage, before the actual execution of the program begins. In other words, all compilation warnings for a program appear before any runtime errors are emitted.

The standard library

The standard library is in a very early stage of development. It is currently being rewritten from C# to C++, and more functionality is being added to it.

The future

The future is not set (John & Sarah Connor, Terminator 2: Judgment Day)

There is currently no roadmap for exactly "when" and "if" various features will be implemented in the language and its standard library. Your best bet for now is to look at the milestones in the GitLab repo, where various ideas are roughly categorized into projected releases, depending on when we imagine that they may be introduced into the language.

Footnotes

1: Treating warnings as errors by default is a deliberate, conscious design decision, in an attempt to ensure that a codebase is not littered with numerous minor errors: errors which are really there, but which the developers have learned to ignore. It is our experience that this can too easily become the case when warnings are ignored by default.

2: Tony Hoare, who invented the null reference in the ALGOL W programming language, famously called it his "billion-dollar mistake": "It was the invention of the null reference in 1965. [...] I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years." (QCon London, 2009)