Building a Minimalist C Compiler in C/C++ - Part 1 (Symbol Table)
The purpose of this tutorial series is to construct a simple C compiler. We will build the compiler using C++.
As the first step, we are going to implement a Symbol Table. A Symbol Table is a data structure that a compiler maintains to store information about the entities it encounters, such as identifiers, objects and function names. The information stored for each entity may include its type, value and scope. In this first phase, we will construct a Symbol Table that maintains a list of hash tables (also known as Scope Tables), where each hash table holds the symbols encountered in one scope of the source program.
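To make the "list of hash tables" idea concrete, here is a rough sketch. It assumes std::unordered_map as a stand-in for the hand-written hash table that this series builds, and it stores only a type string per symbol. The names ScopeStack, enterScope, exitScope, declare and lookup are hypothetical and are not the classes defined in this tutorial.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in: each scope maps a symbol name to its type.
using Scope = std::unordered_map<std::string, std::string>;

class ScopeStack {
    std::vector<Scope> scopes;  // back() is the innermost scope

public:
    ScopeStack() { enterScope(); }  // start with the global scope

    void enterScope() { scopes.emplace_back(); }

    void exitScope() {
        if (scopes.size() > 1) scopes.pop_back();  // never drop the global scope
    }

    // Returns false if the name already exists in the current scope.
    bool declare(const std::string &name, const std::string &type) {
        return scopes.back().emplace(name, type).second;
    }

    // Searches from the innermost scope outward; returns "" if not found.
    std::string lookup(const std::string &name) const {
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return "";
    }
};
```

A declaration in an inner scope shadows an outer one with the same name, and leaving the inner scope makes the outer declaration visible again.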
We will implement the following four classes.
SymbolInfo
Parameter
ScopeTable
SymbolTable
SymbolInfo
This class holds the information about a symbol encountered in the source program. It has the following members:
name - Name of the symbol. This is a string.
type - Type of the symbol. This is also a string.
next - Pointer to the next SymbolInfo object, which we need in order to implement chaining to resolve collisions in the hash table. This is of type SymbolInfo *.
size - Size of the symbol. If the symbol is a variable, its size will be 0. If it's an array, its size will be the number of elements in the array. Lastly, if it's a function, its size will be -1, for our convenience. This is an int.
offset - Each variable will have a local offset (index) with respect to its scope. This is also an int.
defined - Indicates the state of a function. A function's name is inserted into the Symbol Table when it is declared, with defined initially set to false. After the function has been defined, we set it to true. This is a bool.
global - Indicates whether a variable or array is global. This is also a bool.
parameterList - For function names, we need to store the parameters (names and types) as well. We do that using this vector. This is of type vector<Parameter> (we will define the class Parameter shortly).
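The role of the next pointer can be sketched as follows. This is a minimal illustration of separate chaining, not the ScopeTable of this series: Node is a hypothetical, stripped-down struct holding only a name and a next pointer, and bucketOf is an assumed hash function chosen for brevity. The bucket count must be greater than zero.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, stripped-down node: only the fields needed to show chaining.
struct Node {
    std::string name;
    Node *next;
};

// Assumed hash function; any function mapping a name to a bucket index works.
std::size_t bucketOf(const std::string &name, std::size_t bucketCount) {
    std::size_t h = 0;
    for (unsigned char c : name) h = h * 31 + c;
    return h % bucketCount;
}

class ChainedTable {
    std::vector<Node *> buckets;  // each entry is the head of a chain

public:
    explicit ChainedTable(std::size_t bucketCount) : buckets(bucketCount, nullptr) {}

    // The table owns its nodes, so copying is disabled to avoid double deletes.
    ChainedTable(const ChainedTable &) = delete;
    ChainedTable &operator=(const ChainedTable &) = delete;

    ~ChainedTable() {
        for (Node *head : buckets) {
            while (head != nullptr) {
                Node *dead = head;
                head = head->next;
                delete dead;
            }
        }
    }

    // Walks the chain in the name's bucket; returns nullptr if absent.
    Node *find(const std::string &name) const {
        Node *cur = buckets[bucketOf(name, buckets.size())];
        while (cur != nullptr && cur->name != name) cur = cur->next;
        return cur;
    }

    // Prepends a new node to its bucket's chain; rejects duplicates.
    bool insert(const std::string &name) {
        if (find(name) != nullptr) return false;
        Node *&head = buckets[bucketOf(name, buckets.size())];
        head = new Node{name, head};
        return true;
    }
};
```

When two names land in the same bucket, the newer node simply points to the older one through next, so a lookup only has to walk that one chain.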
We create two files for our SymbolInfo class: SymbolInfo.h and SymbolInfo.cpp. Initially, SymbolInfo.h will look like this:
#ifndef SYMBOLTABLE_SYMBOLINFO_H
#define SYMBOLTABLE_SYMBOLINFO_H
#include <bits/stdc++.h>
#include "Parameter.h"
using namespace std;
class SymbolInfo {
private:
    string name, type;
    SymbolInfo *next;
    int size, offset;
    bool defined, global;
    vector<Parameter> parameterList;
};
#endif //SYMBOLTABLE_SYMBOLINFO_H
After that, we add the following constructors to our class.
SymbolInfo(); // Empty constructor
SymbolInfo(const string &name, const string &type); // Constructor for Variable
SymbolInfo(const string &name, const string &type, int size); // Constructor for Array
SymbolInfo(const string &name, const string &type, const vector<Parameter> &parameterList); // Constructor for Function
SymbolInfo(const SymbolInfo &symbolInfo); // Copy Constructor
We also add the following destructor to our class.
~SymbolInfo(); // Destructor
Now, SymbolInfo.h will look like this:
#ifndef SYMBOLTABLE_SYMBOLINFO_H
#define SYMBOLTABLE_SYMBOLINFO_H
#include <bits/stdc++.h>
#include "Parameter.h"
using namespace std;
class SymbolInfo {
private:
    string name, type;
    SymbolInfo *next;
    int size, offset;
    bool defined, global;
    vector<Parameter> parameterList;
public:
    SymbolInfo(); // Empty constructor
    SymbolInfo(const string &name, const string &type); // Constructor for Variable
    SymbolInfo(const string &name, const string &type, int size); // Constructor for Array
    SymbolInfo(const string &name, const string &type, const vector<Parameter> &parameterList); // Constructor for Function
    SymbolInfo(const SymbolInfo &symbolInfo); // Copy Constructor
    ~SymbolInfo(); // Destructor
};
#endif //SYMBOLTABLE_SYMBOLINFO_H
We also add getters and setters for each private member, along with three helper methods (isVariable, isArray and isFunction) that classify a symbol by its size.
// name
const string &getName() const;
void setName(const string &name);
// type
const string &getType() const;
void setType(const string &type);
// next
SymbolInfo *getNext() const;
void setNext(SymbolInfo *next);
// size
int getSize() const;
void setSize(int size);
// offset
int getOffset() const;
void setOffset(int offset);
// defined
bool isDefined() const;
void setDefined(bool defined);
// global
bool isGlobal() const;
void setGlobal(bool global);
// parameterList
vector<Parameter> getParameterList() const;
void setParameterList(const vector<Parameter> &parameterList);
bool isVariable() const;
bool isArray() const;
bool isFunction() const;
Finally, SymbolInfo.h will look like this:
#ifndef SYMBOLTABLE_SYMBOLINFO_H
#define SYMBOLTABLE_SYMBOLINFO_H
#include <bits/stdc++.h>
#include "Parameter.h"
using namespace std;
class SymbolInfo {
private:
    string name, type;
    SymbolInfo *next;
    int size, offset;
    bool defined, global;
    vector<Parameter> parameterList;
public:
    SymbolInfo(); // Empty constructor
    SymbolInfo(const string &name, const string &type); // Constructor for Variable
    SymbolInfo(const string &name, const string &type, int size); // Constructor for Array
    SymbolInfo(const string &name, const string &type, const vector<Parameter> &parameterList); // Constructor for Function
    SymbolInfo(const SymbolInfo &symbolInfo); // Copy Constructor
    ~SymbolInfo(); // Destructor
    // name
    const string &getName() const;
    void setName(const string &name);
    // type
    const string &getType() const;
    void setType(const string &type);
    // next
    SymbolInfo *getNext() const;
    void setNext(SymbolInfo *next);
    // size
    int getSize() const;
    void setSize(int size);
    // offset
    int getOffset() const;
    void setOffset(int offset);
    // defined
    bool isDefined() const;
    void setDefined(bool defined);
    // global
    bool isGlobal() const;
    void setGlobal(bool global);
    // parameterList
    vector<Parameter> getParameterList() const;
    void setParameterList(const vector<Parameter> &parameterList);
    bool isVariable() const;
    bool isArray() const;
    bool isFunction() const;
};
#endif //SYMBOLTABLE_SYMBOLINFO_H
Next, we implement the constructors, the destructor and the other methods. We add the following code to SymbolInfo.cpp.
Constructors’ Implementations
#include "SymbolInfo.h"
SymbolInfo::SymbolInfo() : SymbolInfo("", "") {}
SymbolInfo::SymbolInfo(const string &name, const string &type) : SymbolInfo(name, type, 0) {}
SymbolInfo::SymbolInfo(const string &name, const string &type, int size) {
    this->name = name;
    this->type = type;
    next = nullptr;
    this->size = size;
    offset = 0; // default until setOffset() is called, so it is never read uninitialized
    defined = false;
    global = false;
}
SymbolInfo::SymbolInfo(const string &name, const string &type, const vector<Parameter> &parameterList) : SymbolInfo(name, type, -1) {
    this->parameterList = parameterList;
}
SymbolInfo::SymbolInfo(const SymbolInfo &symbolInfo) {
    name = symbolInfo.name;
    type = symbolInfo.type;
    next = symbolInfo.next; // shallow copy: both objects point to the same next node
    size = symbolInfo.size;
    offset = symbolInfo.offset;
    defined = symbolInfo.defined;
    global = symbolInfo.global;
    parameterList = symbolInfo.parameterList;
}
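The constructors above funnel every construction path through the three-argument constructor, so that one constructor is responsible for giving each field a value. As an alternative sketch, assuming C++11 or later, default member initializers can give every field a starting value regardless of which constructor runs. SymbolSketch and the two-field Parameter struct below are hypothetical, condensed stand-ins for illustration only; they are not the classes of this series.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the tutorial's Parameter class.
struct Parameter {
    std::string name;
    std::string type;
};

// Condensed, hypothetical variant of SymbolInfo.
class SymbolSketch {
    std::string name, type;
    SymbolSketch *next = nullptr;
    int size = 0;    // 0: variable, >0: array length, -1: function
    int offset = 0;  // always initialized, even if no setter is called
    bool defined = false;
    bool global = false;
    std::vector<Parameter> parameterList;

public:
    SymbolSketch() = default;

    // Variable (size omitted) or array (size given).
    SymbolSketch(std::string name, std::string type, int size = 0)
        : name(std::move(name)), type(std::move(type)), size(size) {}

    // Function: delegates with size -1, then stores the parameters.
    SymbolSketch(std::string name, std::string type, std::vector<Parameter> params)
        : SymbolSketch(std::move(name), std::move(type), -1) {
        parameterList = std::move(params);
    }

    SymbolSketch *getNext() const { return next; }
    int getSize() const { return size; }
    int getOffset() const { return offset; }
    bool isDefined() const { return defined; }
    bool isGlobal() const { return global; }
    std::size_t parameterCount() const { return parameterList.size(); }
};
```

With this layout, adding a new field only requires giving it an initializer at its declaration, rather than touching each constructor body.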
Destructor Implementation
SymbolInfo::~SymbolInfo() {
    // We do nothing :(
}
Methods’ Implementations
const string &SymbolInfo::getName() const {
    return name;
}
void SymbolInfo::setName(const string &name) {
    this->name = name;
}
const string &SymbolInfo::getType() const {
    return type;
}
void SymbolInfo::setType(const string &type) {
    this->type = type;
}
SymbolInfo *SymbolInfo::getNext() const {
    return next;
}
void SymbolInfo::setNext(SymbolInfo* next) {
    this->next = next;
}
int SymbolInfo::getSize() const {
    return size;
}
void SymbolInfo::setSize(int size) {
    this->size = size;
}
int SymbolInfo::getOffset() const {
    return offset;
}
void SymbolInfo::setOffset(int offset) {
    this->offset = offset;
}
bool SymbolInfo::isDefined() const {
    return defined;
}
void SymbolInfo::setDefined(bool defined) {
    this->defined = defined;
}
bool SymbolInfo::isGlobal() const {
    return global;
}
void SymbolInfo::setGlobal(bool global) {
    this->global = global;
}
vector<Parameter> SymbolInfo::getParameterList() const {
    return parameterList;
}
void SymbolInfo::setParameterList(const vector<Parameter> &parameterList) {
    this->parameterList = parameterList;
}
bool SymbolInfo::isVariable() const {
    return size == 0;
}
bool SymbolInfo::isArray() const {
    return size > 0;
}
bool SymbolInfo::isFunction() const {
    return size == -1;
}
Finally, SymbolInfo.cpp will look like this:
#include "SymbolInfo.h"
SymbolInfo::SymbolInfo() : SymbolInfo("", "") {}
SymbolInfo::SymbolInfo(const string &name, const string &type) : SymbolInfo(name, type, 0) {}
SymbolInfo::SymbolInfo(const string &name, const string &type, int size) {
    this->name = name;
    this->type = type;
    next = nullptr;
    this->size = size;
    offset = 0; // default until setOffset() is called, so it is never read uninitialized
    defined = false;
    global = false;
}
SymbolInfo::SymbolInfo(const string &name, const string &type, const vector<Parameter> &parameterList) : SymbolInfo(name, type, -1) {
    this->parameterList = parameterList;
}
SymbolInfo::SymbolInfo(const SymbolInfo &symbolInfo) {
    name = symbolInfo.name;
    type = symbolInfo.type;
    next = symbolInfo.next; // shallow copy: both objects point to the same next node
    size = symbolInfo.size;
    offset = symbolInfo.offset;
    defined = symbolInfo.defined;
    global = symbolInfo.global;
    parameterList = symbolInfo.parameterList;
}
SymbolInfo::~SymbolInfo() {
    // We do nothing :(
}
const string &SymbolInfo::getName() const {
    return name;
}
void SymbolInfo::setName(const string &name) {
    this->name = name;
}
const string &SymbolInfo::getType() const {
    return type;
}
void SymbolInfo::setType(const string &type) {
    this->type = type;
}
SymbolInfo *SymbolInfo::getNext() const {
    return next;
}
void SymbolInfo::setNext(SymbolInfo* next) {
    this->next = next;
}
int SymbolInfo::getSize() const {
    return size;
}
void SymbolInfo::setSize(int size) {
    this->size = size;
}
int SymbolInfo::getOffset() const {
    return offset;
}
void SymbolInfo::setOffset(int offset) {
    this->offset = offset;
}
bool SymbolInfo::isDefined() const {
    return defined;
}
void SymbolInfo::setDefined(bool defined) {
    this->defined = defined;
}
bool SymbolInfo::isGlobal() const {
    return global;
}
void SymbolInfo::setGlobal(bool global) {
    this->global = global;
}
vector<Parameter> SymbolInfo::getParameterList() const {
    return parameterList;
}
void SymbolInfo::setParameterList(const vector<Parameter> &parameterList) {
    this->parameterList = parameterList;
}
bool SymbolInfo::isVariable() const {
    return size == 0;
}
bool SymbolInfo::isArray() const {
    return size > 0;
}
bool SymbolInfo::isFunction() const {
    return size == -1;
}
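As a usage sketch, the function below shows how a later phase of a compiler might branch on the three predicates. To stay self-contained, it uses a hypothetical MiniSymbol struct that mirrors only the name, type and size convention of SymbolInfo; describe is likewise an illustrative name and not part of this series.

```cpp
#include <string>

// Hypothetical, condensed stand-in for SymbolInfo.
struct MiniSymbol {
    std::string name, type;
    int size;  // 0: variable, >0: array length, -1: function

    bool isVariable() const { return size == 0; }
    bool isArray() const { return size > 0; }
    bool isFunction() const { return size == -1; }
};

// Builds a short human-readable description of a symbol.
std::string describe(const MiniSymbol &s) {
    if (s.isFunction()) return "function " + s.name + " of type " + s.type;
    if (s.isArray()) return "array " + s.name + "[" + std::to_string(s.size) + "] of type " + s.type;
    if (s.isVariable()) return "variable " + s.name + " of type " + s.type;
    return "invalid symbol " + s.name;  // a size below -1 matches no predicate
}
```

Note the last branch: a size below -1 satisfies none of the three predicates, so code that calls setSize with arbitrary values may want to guard against that case.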