Gyoji Programming Language
Help Wanted!
This is a work in progress, and contributions are welcome. If you are interested in learning about compilers or contributing to the project, the projects listed below can be worked on somewhat independently of other work. This is intended to be a very accessible project for a dedicated enthusiast. Building compilers is very fun and rewarding and I'd love to have the help.
If you are interested, please contact me through GitHub by raising an issue or a pull request. E-mail is ok too, but I don't really like spam, so I'm going to discourage that form of communication for this project.
Support unsafe vs safe
One of the most important safety features of Rust is that some code constructs, such as dereferencing raw pointers, are identified as 'unsafe'. The syntax currently supports this idea, but it is not yet enforced by the semantic processing. We should add support for 'unsafe' by tracking when we are inside an 'unsafe' block and identifying which constructs are potentially unsafe, so that any such construct appearing outside an 'unsafe' block can be flagged as an error.
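One way such a check could work is sketched below in Python rather than in the compiler's own code. It walks a syntax tree, tracks how many 'unsafe' blocks enclose the current node, and reports any potentially unsafe construct found at depth zero. The tuple-based tree shape and the node kinds `unsafe_block` and `deref_raw` are hypothetical, chosen only for illustration.

```python
# Hypothetical tree shape: (kind, line, children). Not the real Gyoji syntax tree.
UNSAFE_KINDS = {"deref_raw"}  # assumed set of constructs that require 'unsafe'


def check_unsafe(node, depth=0, errors=None):
    """Collect (line, kind) for unsafe constructs outside any unsafe block."""
    if errors is None:
        errors = []
    kind, line, children = node
    if kind in UNSAFE_KINDS and depth == 0:
        errors.append((line, kind))
    # Entering an unsafe block raises the depth for everything nested in it.
    child_depth = depth + 1 if kind == "unsafe_block" else depth
    for child in children:
        check_unsafe(child, child_depth, errors)
    return errors


program = ("block", 1, [
    ("deref_raw", 2, []),          # outside 'unsafe': flagged
    ("unsafe_block", 3, [
        ("deref_raw", 4, []),      # inside 'unsafe': allowed
    ]),
])
print(check_unsafe(program))
```

Only the dereference on line 2 is reported; the one nested inside the unsafe block is accepted.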
Default initialization of everything
We should make sure that everything is initialized before being read. This may involve default-initializing data for each variable declaration to ensure that it is in some well-defined state prior to being used.
This should be true both for primitive types as well as composite types like classes to ensure that we never enter an invalid state.
Forced Array Bounds checking
All array access must be bounds-checked. This can take the form of inserting a bounds-check in the compiler, but this is problematic because if a bounds-check fails, what should you do? You could panic() like in Rust, but this is a cop-out.
Instead, this project aims to perform static analysis to ensure that each potential index value falls within the allowed bounds of the array. This can be done with "abstract interpretation" of the MIR to find the min and max values of variables used in indices. This comes at the cost of making sure that arrays always have a compile-time known size, so this won't work for general dynamically allocated data, but will work for the majority of common cases.
The idea is the following. Given the code below, an abstract interpretation algorithm can easily detect that the minimum value of 'a' is -2 and the maximum value is 7. It can then reject the program based on that fact and provide a message to the user pointing out the specific line that sets the lower bound.

    i32 a;
    i32[10] array;
    // a is now unbounded.
    if (x) {
        a = -2;    // a minimum is -2
    }
    else {
        a = 7;     // a maximum is 7
    }
    // Combined, a is bounded by -2 <= a <= 7
    array[a];      // This is an invalid access because a can be out of bounds.
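The range reasoning in this example can be sketched as a tiny interval analysis. This is a Python illustration under simplifying assumptions, not the project's MIR pass: values are tracked as closed intervals, branches are merged with a join, and array indices are assumed to run from 0 to size - 1. The names `Interval`, `const`, and `check_index` are made up for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def join(self, other):
        """Smallest interval covering both operands (used where branches merge)."""
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def within(self, lo, hi):
        return lo <= self.lo and self.hi <= hi


UNBOUNDED = Interval(float("-inf"), float("inf"))


def const(value):
    """Interval for an assignment of a compile-time constant."""
    return Interval(value, value)


def check_index(index, array_size):
    """Return None if every possible index is valid, else an error message."""
    if index.within(0, array_size - 1):
        return None
    return f"index range [{index.lo}, {index.hi}] exceeds [0, {array_size - 1}]"


a = UNBOUNDED                        # declared but not yet assigned
then_branch = const(-2)              # a = -2
else_branch = const(7)               # a = 7
a = then_branch.join(else_branch)    # value of 'a' after the if/else
print(a)
print(check_index(a, 10))
```

A real implementation would also record which statement produced each bound, so that the error can point at the line that sets the offending lower bound.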
Class constructors/destructors/methods
As it stands, classes aren't fully functional with methods, constructors, or destructors. This project involves creating the concept of a class method and setting up initialization and class member calling structures. Ordinary function calls are already supported, so this is mainly turning the syntax of method calls into ordinary function calls where the first parameter is a reference to the object, whose members can be accessed through the member access operator.
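The method-call rewrite described above can be sketched as a small transformation. This is a Python illustration only: the tuple representation of a call and the `Type::method` naming scheme are hypothetical and are not taken from the compiler.

```python
def lower_method_call(receiver, receiver_type, method, args):
    """Rewrite receiver.method(args) as a plain call with the receiver first."""
    function_name = f"{receiver_type}::{method}"  # hypothetical naming scheme
    # The object is passed by reference as an explicit first argument.
    return ("call", function_name, [("ref", receiver)] + list(args))


lowered = lower_method_call("p", "Point", "translate", ["dx", "dy"])
print(lowered)
```

After this step, the rest of the compiler only needs to handle an ordinary function call.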
An important consideration for constructors is that we cannot provide access to the "this" pointer in a way that would allow it to be accidentally leaked. We also need to enforce that, in the constructor, every variable is written BEFORE it is read. This is not guaranteed in C++, and poses some problems in terms of "half-initialized" objects and objects entering an unknown or invalid state.
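The write-before-read rule can be sketched for the simplest case, a constructor body with no branches or loops. This is a Python illustration with a hypothetical statement format; a real check would need to run over the control-flow graph so that every path is covered.

```python
def check_constructor(statements):
    """Check a straight-line constructor body.

    statements -- ("write", name) or ("read", name) tuples in execution order
    """
    written = set()
    errors = []
    for position, (action, name) in enumerate(statements):
        if action == "write":
            written.add(name)
        elif name not in written:
            errors.append(f"statement {position}: '{name}' is read before it is written")
    return errors


body = [("write", "x"), ("read", "y"), ("write", "y"), ("read", "x")]
print(check_constructor(body))
```

Here the read of 'y' is rejected because it comes before the write, while the read of 'x' is accepted.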
Variable scope rules
As it stands, variable scopes are not handled correctly in the sense that variables can go out of scope without being "undeclared". This poses a problem for both the constructor/destructor guarantees as well as the borrow checker. This needs to be resolved in order for the language to provide the kind of guarantees that we want.
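A minimal sketch of the missing bookkeeping, written in Python for illustration: a stack of scopes records each declaration and emits an explicit "undeclare" event when a scope is left. The class and event names are hypothetical, and undeclaring in reverse declaration order is an assumption borrowed from how C++ destroys local variables.

```python
class ScopeTracker:
    def __init__(self):
        self.scopes = [[]]   # innermost scope is last
        self.events = []     # ("declare" | "undeclare", name) in order

    def enter(self):
        self.scopes.append([])

    def declare(self, name):
        self.scopes[-1].append(name)
        self.events.append(("declare", name))

    def leave(self):
        # Undeclare in reverse declaration order (assumed, as in C++).
        for name in reversed(self.scopes.pop()):
            self.events.append(("undeclare", name))


tracker = ScopeTracker()
tracker.enter()
tracker.declare("a")
tracker.declare("b")
tracker.leave()
print(tracker.events)
```

The "undeclare" events are the points where destructor calls could be inserted and where the borrow checker could end the lifetime of a variable.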
Syntax Testing and verification
In order to make the system as robust as possible, it is important to have a suite of tests to verify the behaviour of the system. Specifically, it would be great to have a set of example files written in the language, covering both "good" and "bad" examples of syntax, that can be presented for compilation to verify that they do or do not compile as expected.
The purpose of this project is to build a test-suite for the parser and syntax tree so that we have good confidence that the data in the parse tree accurately represents the source-code.
This project would mainly involve creating new files (probably under the "test" directory) and building test jigs to verify their parse behavior in various scenarios.
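A test jig of this kind could look roughly like the following Python sketch. The "good" and "bad" subdirectory layout is hypothetical, and the compiler is represented by a `compiles` callable supplied by the caller, since how the real compiler is invoked is not specified here. The demo at the end uses a stand-in that accepts any file containing the text "ok".

```python
import tempfile
from pathlib import Path


def run_syntax_suite(root, compiles):
    """Return (path, expected, actual) for every file that behaved unexpectedly.

    root     -- directory with hypothetical 'good' and 'bad' subdirectories
    compiles -- callable taking a Path and returning True if the file compiled
    """
    failures = []
    for subdir, expected in (("good", True), ("bad", False)):
        for path in sorted(Path(root, subdir).glob("*")):
            actual = compiles(path)
            if actual != expected:
                failures.append((path, expected, actual))
    return failures


# Demo: b.txt sits in 'bad' but the stand-in compiler accepts it.
with tempfile.TemporaryDirectory() as root:
    for subdir, name, text in (("good", "a.txt", "ok"), ("bad", "b.txt", "ok")):
        Path(root, subdir).mkdir(exist_ok=True)
        Path(root, subdir, name).write_text(text)
    failures = run_syntax_suite(root, lambda p: "ok" in p.read_text())
    summary = [(p.name, expected, actual) for p, expected, actual in failures]
print(summary)
```

In a real jig, `compiles` would wrap an invocation of the compiler and report whether it succeeded.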
Website
The purpose of this project is to communicate about the project and its goals and hopefully encourage more interest in using and participating in the project.
Specifically, highlight the nice attributes of the language such as:
- High-quality compiler using C-like syntax.
- Like a half-way point between C and C++. Sort-of like C++ before people got carried away with it.
- Borrow-checker of similar quality to Rust.
- Minimalist in terms of dependencies.
- General-purpose in the sense that it makes very few assumptions about the platform, libraries, and architecture of the target.
- Not syntax-compatible with C, but should be very familiar to C and C++ developers.
Type Conversion and casting
As it currently stands, there are no facilities for converting types to one another except for a very primitive "widening" that allows integers to be promoted to larger sizes. This is safe, but is unrealistically restrictive.
Type cast operations should be introduced to allow, for example, certain conversions from signed to unsigned integer types and between floating-point and integer types. Each of these conversions should be handled carefully so that the authors of programs can know precisely how the conversions will be handled. For example, in a conversion from a signed to an unsigned value, the sign information may be lost, and it's important to be specific about what the conversion will do to the values it manipulates, both in a numeric sense and in a 'bitwise' sense. The same applies to sign extension and truncation.
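To make the numeric versus 'bitwise' distinction concrete, here is one possible set of definitions sketched in Python, assuming two's-complement representation with wrapping. This illustrates the kind of precision needed; it is not a statement of the semantics the language will adopt, and the function names are made up for the example.

```python
def to_unsigned(value, bits):
    """Reinterpret a signed value as unsigned: the bit pattern is preserved."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits):
    """Reinterpret a bit pattern as a signed two's-complement value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def truncate(value, bits):
    """Keep only the low 'bits' bits; higher bits are discarded."""
    return value & ((1 << bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """Widen a bit pattern, copying the sign bit into the new high bits."""
    return to_unsigned(to_signed(value, from_bits), to_bits)


print(to_unsigned(-1, 8))              # same bits, different number
print(to_signed(0xFF, 8))
print(hex(truncate(0x1234, 8)))
print(hex(sign_extend(0x80, 8, 16)))
```

Under these definitions the signed 8-bit value -1 and the unsigned value 255 share one bit pattern, which is exactly the kind of detail a cast specification has to spell out.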
Borrow Checker
The purpose of this project is to provide a Rust-like borrow checker using the base logic outlined in the Polonius project. The algorithm is sound, and can be implemented on this MIR to provide safety guarantees in a manner similar to Rust.
Bootstrapping
The purpose of this project is to build a minimal version of the compiler using the language itself. Of course, this assumes that there is enough of the language working that we can express all of the complicated things it takes to build the language, so this is a pretty late-stage project.