Frontend

The frontend module builds the Intermediate Representation from a source file.

Language Definition

Languages are defined in the Frontend by creating a module containing a file named _languagename_ast.yaml, which describes the tokens of the language. For example, if you are writing a frontend module to implement Fortran, you would write a file called _fortran_ast.yaml in a module called Fortran. The directory structure will look like this:

Fortran/
   __init__.py
   _fortran_ast.yaml
   fortran_parser.py
   fortran_lexer.py

The fortran_parser and fortran_lexer files implement respectively the parser and lexer for the language. However, these files are not mandatory. Any kind of parser may be used, as long as it produces the yaCF Intermediate Representation. See the Intermediate Representation section for further information.

Currently, the parsers implemented in yaCF are written following the structure of the pycparser project. If you want to follow this structure, you can use the tool _ast_gen.py in the Frontend directory to create the Python classes for the Intermediate Representation, reading the token list from the _fortran_ast.yaml file.
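As a rough illustration of what a generator such as _ast_gen.py does, the sketch below builds one Python class per node description. It is not the actual tool: the NODE_SPEC dictionary stands in for the contents of the YAML file (whose real schema is not shown in this section), and the names GeneratedNode and generate_classes are hypothetical.

```python
# Hypothetical node description: node name -> ordered field names.
# This dict is a stand-in for the YAML file; the real schema may differ.
NODE_SPEC = {
    "Program": ["name", "body"],
    "Assignment": ["target", "value"],
}


class GeneratedNode:
    """Stand-in base class for the generated node classes."""

    fields = ()

    def __init__(self, *values):
        if len(values) != len(self.fields):
            raise TypeError(
                f"{type(self).__name__} expects {len(self.fields)} values"
            )
        for field, value in zip(self.fields, values):
            setattr(self, field, value)


def generate_classes(spec):
    """Create one subclass of GeneratedNode per entry in the description."""
    return {
        name: type(name, (GeneratedNode,), {"fields": tuple(fields)})
        for name, fields in spec.items()
    }


classes = generate_classes(NODE_SPEC)
assign = classes["Assignment"]("x", 42)
```

Generating the classes from a description keeps the list of node kinds in one place, which is the main reason to follow this structure when adding a language.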

Details of the language definitions already implemented can be found in the documentation of the C and OpenMP modules.

Frontend languages

The C and Omp modules implement the parsing of C and OpenMP, respectively.

The C parser has been inherited from the pycparser project. Some modifications have been made in order to adapt it to the purpose of the yaCF project.

New languages will be added as modules inside the Frontend module.

Parsing a file

The file Parse from the Frontend module contains the functions required to parse a file, returning the Intermediate Representation.

The main entry point is the parse_source function.

Passing a source to this function produces an AST composed of Frontend.IRNode.IRNode nodes. (XXX: IRNode is now in Common; this should be Frontend.Common.IRNode.IRNode.) However, this AST is not the Intermediate Representation. It must be converted with the transform method of Frontend.InternalRepr.AstToIR, which adds the additional information needed by the compiler. See the Intermediate Representation section for detailed information.

Intermediate Representation

Note

The correct term is not Internal Representation, but Intermediate Representation. Expect the class to be renamed in future versions.

The Intermediate Representation (IR) of a compiler is the set of data structures used to represent language specific entities in a machine-readable format. As the compiler analyses the code, this IR is modified, and additional ones might be created, while the compiler increases its knowledge of the original source, or modifies the program to generate the destination code.

Currently, yaCF is intended to be a tool for building high-level source-to-source code transformations, so it has no low-level capabilities.

In yaCF, the base structure of the IR is the Abstract Syntax Tree (AST). It is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code.

The syntax is abstract in the sense that it does not represent every detail that appears in the real syntax. For instance, grouping parentheses are implicit in the tree structure, and a syntactic construct such as an if-condition-then expression may be denoted by a single node with two branches (one for the condition and another for the expression).

This makes abstract syntax trees different from concrete syntax trees, traditionally called parse trees, which are often built by a parser as part of the source code translation and compiling process (despite a perhaps unintuitive naming). Once built, additional information is added to the AST by subsequent processing, e.g., semantic analysis.
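The point about implicit grouping can be seen in any AST. The sketch below uses Python's standard ast module purely because it is readily available; it is not the yaCF parser, and the expressions are made up for the illustration.

```python
import ast

# The parentheses in the source do not become a node of their own:
# they only decide which operator ends up at the root of the tree.
grouped = ast.parse("(a + b) * c", mode="eval").body
plain = ast.parse("a + b * c", mode="eval").body

# Root operator and the type of the operand that holds the nested operation.
grouped_shape = (type(grouped.op).__name__, type(grouped.left).__name__)
plain_shape = (type(plain.op).__name__, type(plain.right).__name__)
```

In the first tree the multiplication is the root and the addition is its left operand; in the second the addition is the root and the multiplication is its right operand. Neither tree contains a node for the parentheses.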

yaCF uses multilevel ASTs as its IR. This means that during some translation operations, the AST is annotated with additional information. These annotations are stored as additional attributes on the node objects, which is possible in Python by using the setattr() function.
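A minimal sketch of this kind of annotation follows. The AnnotatedNode class is a stand-in, not the yaCF node class, and "depth" is used only as an example annotation name.

```python
class AnnotatedNode:
    """Stand-in for an AST node; not the yaCF IRNode class."""

    def __init__(self, name):
        self.name = name


node = AnnotatedNode("x")

# Attach an annotation whose name is chosen at run time.
setattr(node, "depth", 2)

# hasattr() tells whether a given pass has already annotated the node.
annotated = hasattr(node, "depth")
missing = hasattr(node, "sequence")
```

Because the attribute name is a plain string, each translation pass can add its own annotations without changing the node class definition.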

[IR-0] Base AST

The base structure of the yaCF IR is an abstract node, defined by the class Frontend.IRNode.IRNode. A node represents a construct of the code; e.g., there is a For node in the C parser, representing the for statement of the C language.

After parsing, a node has the following attributes:

  • Attributes: Store information directly related to the node (for example, function name)
  • Child node: A single node contained by the current node. For example, the Identifier node of a Declaration
  • List of child nodes: When the current node contains several child nodes.
  • Parent: Reference to the direct ancestor node. It is None for the first element of the translation unit.
  • Coord: Stores the coordinates (file + line number) where the current node is located in the original source file (information needed in order to track syntactic errors)

Two methods, childrens and show, are also added for convenience.
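The node layout described above could look roughly like the sketch below. It is not the real Frontend.IRNode.IRNode: the class name, the constructor arguments, the field names and the method signatures are all assumptions made for the illustration.

```python
import sys


class SketchNode:
    """Illustrative node; the real class is Frontend.IRNode.IRNode."""

    def __init__(self, kind, name=None, child=None, items=None, coord=None):
        self.kind = kind            # kind of construct, e.g. "Decl"
        self.name = name            # plain attribute of the node
        self.child = child          # a single child node, or None
        self.items = items or []    # a list of child nodes
        self.parent = None          # stays None for the root of the tree
        self.coord = coord          # (file, line) in the original source
        for sub in self.children():
            sub.parent = self

    def children(self):
        """Return the direct children: the single child, then the list."""
        single = [self.child] if self.child is not None else []
        return single + list(self.items)

    def show(self, buf=sys.stdout, offset=0):
        """Write an indented dump of the subtree rooted at this node."""
        label = self.kind if self.name is None else f"{self.kind}: {self.name}"
        buf.write(" " * offset + label + "\n")
        for sub in self.children():
            sub.show(buf, offset + 2)


ident = SketchNode("ID", name="i", coord=("loop.c", 3))
decl = SketchNode("Decl", name="i", child=ident, coord=("loop.c", 3))
unit = SketchNode("FileAST", items=[decl])
```

Calling unit.show() writes the tree to standard output, one node per line, indented by nesting level.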

[IR-1] Parsed Source

The kinds of nodes for a specific language are defined in the description file of its Frontend package (see Frontend for more information). The AST structure is built while the parser processes the original source: for each code structure that the parser wants to represent, it creates an instance of the corresponding node class. This class is a descendant of Frontend.IRNode.

This is the first level of the intermediate representation, also called AST.

Wherever we refer to AST in the yaCF documentation, we are denoting this first level of the tree.

[IR-2] Symbol information

The information in the IRNode is not enough for some modules of the compiler; therefore, nodes need to be annotated with additional information. The Frontend.InternalRepr.AstToIR class contains the methods required to add semantic information to the AST. This second level is referred to as the IR within the documentation. (XXX: we need to look for another name.)

The Frontend.InternalRepr.AstToIR class implements a Flyweight pattern, so only one instance of the class exists for each IR.

Whenever the class is instantiated, the reference of the first node is checked, and, if it has been previously created, instead of building a new instance, it returns the previously created one.

Implementation details can be found in the docstrings of the Frontend.InternalRepr.AstToIR.__new__() and Frontend.InternalRepr.AstToIR.__init__() methods.
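One possible way to obtain this behaviour in Python is sketched below. It is not the actual AstToIR code: the class name, the cache keyed by the identity of the first node, and the _ready flag are assumptions made for the illustration.

```python
class SketchAstToIR:
    """Illustrative flyweight; not the real Frontend.InternalRepr.AstToIR."""

    _instances = {}  # id of the first node -> instance already built for it

    def __new__(cls, writer, ast):
        key = id(ast)
        instance = cls._instances.get(key)
        if instance is None:
            instance = super().__new__(cls)
            instance._ready = False
            cls._instances[key] = instance
        return instance

    def __init__(self, writer, ast):
        # Python calls __init__ even when __new__ returned a cached object,
        # so guard against initialising the same instance twice.
        if self._ready:
            return
        self.writer = writer
        self.ast = ast  # keeps the tree alive, so its id is not reused
        self._ready = True


tree_a = object()  # stand-ins for the first node of two different trees
tree_b = object()

first = SketchAstToIR(str, tree_a)
second = SketchAstToIR(repr, tree_a)
other = SketchAstToIR(str, tree_b)
```

Here first and second are the same object because they were requested for the same tree, while other is a separate instance. In this sketch the arguments of the second request are ignored; whether the real class behaves the same way is not stated here.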

The __init__ method of the AstToIR class requires two arguments: the Writer class to be used, and the non-annotated AST. The __str__ method of Frontend.IRNode.IRNode is replaced with the fast_write method, which uses the Writer specified during instantiation. This allows using the Python str function to pretty-print the intermediate representation.
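The replacement of __str__ can be sketched as follows. Only the name fast_write comes from the text above; DemoNode, DemoWriter, its write method and install_writer are hypothetical, and the real Writer interface may differ.

```python
class DemoNode:
    """Stand-in for the node class."""

    def __init__(self, name):
        self.name = name


class DemoWriter:
    """Hypothetical writer that turns a node back into source text."""

    def write(self, node):
        return f"int {node.name};"


def install_writer(node_class, writer_class):
    """Make str(node) delegate to an instance of writer_class."""
    writer = writer_class()

    def fast_write(self):
        return writer.write(self)

    # str() looks __str__ up on the type, so it is set on the class,
    # not on individual node objects.
    node_class.__str__ = fast_write


install_writer(DemoNode, DemoWriter)
text = str(DemoNode("counter"))
```

Setting the method on the class means every node, including nodes created later, is printed through the same writer.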

The annotation method creates the symbol table of the AST and updates the parent link of each node. This is required to allow parts of the Intermediate Representation to be added and removed dynamically.

After the symbol table is created, the following attributes are added to nodes of ID type (identifiers in the C language):

  • sequence: Position relative to the first node of the IR.
  • depth: Depth of scope at which this identifier was declared.

To all nodes:

  • __str__: Method to write the node again as it was in the original source.
  • The parent attribute is updated with the current status of the tree.

Symbol Table

The symbol table of yaCF is not a conventional symbol table, in the sense that it is not created during parsing but can be recreated at any point of the process by updating the IR. This poses a significant design challenge, because common assumptions about scope cannot be made.
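The annotation pass and the rebuildable symbol table can be sketched together. This is not the yaCF implementation: the ScopeNode class, the node kinds ("Decl", "ID", "Compound"), the pre-order numbering used for sequence and the scope rule (a Compound node opens a scope) are all assumptions.

```python
class ScopeNode:
    """Stand-in node with a kind, an optional name and a list of children."""

    def __init__(self, kind, name=None, items=None):
        self.kind = kind
        self.name = name
        self.items = items or []
        self.parent = None


def annotate(root):
    """Rebuild parent links, sequence/depth annotations and the symbol list."""
    counter = 0
    scopes = [{}]   # innermost scope last; name -> depth of its declaration
    table = []      # (name, depth) for every declaration, in source order

    def visit(node, parent):
        nonlocal counter
        node.parent = parent
        position = counter
        counter += 1
        if node.kind == "Compound":
            scopes.append({})
        if node.kind == "Decl":
            depth = len(scopes) - 1
            scopes[-1][node.name] = depth
            table.append((node.name, depth))
        elif node.kind == "ID":
            node.sequence = position
            # Depth of the innermost visible declaration, or None if undeclared.
            node.depth = next(
                (s[node.name] for s in reversed(scopes) if node.name in s), None
            )
        for sub in node.items:
            visit(sub, node)
        if node.kind == "Compound":
            scopes.pop()

    visit(root, None)
    return table


use_n = ScopeNode("ID", "n")
block = ScopeNode("Compound", items=[ScopeNode("Decl", "i"), use_n])
tree = ScopeNode("FileAST", items=[ScopeNode("Decl", "n"), block])

first_table = annotate(tree)
first_pass = (use_n.sequence, use_n.depth)

# Edit the tree (shadow n inside the block), then rebuild everything.
block.items.insert(0, ScopeNode("Decl", "n"))
second_table = annotate(tree)
second_pass = (use_n.sequence, use_n.depth)
```

Because the pass keeps no state between calls, running it again after the edit yields a symbol list and annotations that reflect the modified tree: the identifier n now resolves to the declaration inside the block.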

[IR-3] Backend-specific information

Each backend might add specific information to the intermediate representation, thus creating a new level of the intermediate representation.

This level is referenced as [IR-3].XXX, where XXX is the backend name. For example, the IR of the CUDA backend is called [IR-3].Cuda.

Config variables for parser

Some config variables are available for the parser generator. These variables are located in the config.py file.

Variable        Description
LEX_OPTIMIZE    Optimize the ply lexer. True indicates that optimization is enabled.
YACC_OPTIMIZE   Optimize the ply parser. True indicates that optimization is enabled.
YACC_DEBUG      Debug mode. This value is False by default. To see the grammar file productions, set this variable to True.
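A config.py fragment with these variables might look like the sketch below. Only the default of YACC_DEBUG is stated above; the values shown for the two optimization flags are assumptions.

```python
# config.py (sketch)

# Optimize the ply lexer. Assumed value: the default is not stated above.
LEX_OPTIMIZE = True

# Optimize the ply parser. Assumed value: the default is not stated above.
YACC_OPTIMIZE = True

# Debug mode. False by default; set to True to see the grammar productions.
YACC_DEBUG = False
```

In ply, flags like these typically correspond to the optimize argument of lex.lex() and to the optimize and debug arguments of yacc.yacc(). Whether yaCF forwards them in exactly this way is an assumption.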