We propose a new language, llc (La Laguna C), to increase HPC programming productivity. llc is a high-level parallel language [Dorta:2006:BSL] where parallelism is expressed through compiler directives that follow the OpenMP syntax. The performance of the MPI and hybrid MPI+OpenMP code generated by the llc compiler has been studied in previous works [Reyes:2009:AHM]. Preliminary results for automatic GPU code generation were also presented in [Reyes:2010:ACGG]. .. [XXX the previous reference does not appear]
With llc and its compiler, llCoMP, we do not aim to increase the diversity of HPC languages and programming environments. On the contrary: we present an approach to ease development across different architectures, filling the gap between system engineers and HPC developers. We do not plan to set a new standard for parallel programming, but to demonstrate how OpenMP, with an additional set of extensions, could be a language suitable for a wide range of platforms.
llCoMP is a source-to-source compiler that translates C code annotated with llc directives into high-level parallel code.
The need to add llc directives to OpenMP arose when we tackled the problem of mixing programming models, i.e., we not only want to produce code for SPMD machines on shared memory, but also SIMD code for GPU accelerators and SPMD programs for distributed memory machines.
Although theoretically we could produce multiplatform code using only OpenMP clauses, this would require a compiler capable not only of translating a syntactically correct OpenMP program to several destination platforms, but also of migrating the OpenMP programming model to whatever programming model is required by the target platform. This would involve a tremendous development effort and, although some attempts exist (see the Related Work section), the problem has not been solved adequately.
With the additional llc directives we can obtain more information from the programmer, without needing to extract complex information from the user code, while keeping portability across different platforms.
llc is intended to be a platform-neutral language. However, during the design of the current revision we have focused on a small set of platforms. Although these platforms differ from one another, future target platforms might have specific details that would require additional information from the user, thus forcing the addition of new clauses to the language.
Taking this into account, the current version of llc has been designed with the following platforms in mind:
To specify the destination code, the user has to specify the llc target directive, in a similar fashion to the Barcelona proposal [Ayguade:2009:PEO]. .. [XXX Does not appear; I have not included it in publications]. See the llc directives section for more information.
llc has been designed with the extension of the OpenMP language in mind. However, although its syntax is similar to OpenMP's, in order to clarify the differences between the programming models, OpenMP pragmas and/or functions must be ignored by llc, specifically inside llc regions.
Users with an OpenMP code may go a step further to increase performance by changing OpenMP constructs into their equivalent llc ones. Although it would be possible to reuse the same constructs as OpenMP, we have chosen not to do so. This way, users will know, just by reading or writing the source code, that the underlying programming model is different.
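As an illustration, a minimal sketch of such a change follows, using the FOR-skeleton syntax presented later in this section (the array v, the bound n and the data-movement clauses are hypothetical, chosen only for the example):

    /* OpenMP version */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += v[i];

    /* Equivalent llc version: the loop is enclosed in an llc region and the
       construct is renamed, making the change of programming model explicit */
    #pragma llc region copy_in(v) copy_out(sum)
    {
        #pragma llc for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += v[i];
    }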
A compiler supporting the llc language must accept all constructs defined in XXX.
llc does not aim to be a new language that replaces OpenMP, nor a new frontend for OpenCL development. What we intend with this language definition is to explore different techniques for transparent code generation and performance portability. If we used an existing language, such as the aforementioned OpenMP, we would be too restricted in the scope of transparent code generation. However, mainstream languages or compilers could extract some ideas from llc towards improving the use cases of their implementations.
Rather than focusing on thread parallelism (as the OpenMP programming model does), the llc programming model focuses on memory coherence. While an OpenMP region represents a number of threads running a set of parallel constructs, an llc region represents a particular piece of code that could be run in a different memory space, and might even be on a different device or host.
An llc conformant region has the following characteristics:
It is a block of code inside a block of statements.
It is a SESE (single-entry, single-exit) block (Ref XXX).
It does not contain references to global variables nor pointers, unless they are declared using the syntax specified at XXX.
It does not assume a particular operating system or filesystem location.
The model assumes that a variable is unitary, unless another size is specified using the syntax at XXX.
The current version does not support region nesting.
Function calls within a region must be annotated using the proper syntax. A non-annotated function call within a region will raise a semantic error.
The code within an llc region may be transformed in whatever way the compiler or the runtime requires. This means, as stated before, that no particular destination platform can be assumed. Common C99 datatypes are automatically converted into the most suitable equivalent on the destination platform. A compliant compiler is not guaranteed to find a compatible equivalent for user/system defined datatypes.
In contrast to OpenMP regions, an llc region does not guarantee that the code inside it is run in parallel. If the user wants to specify that a particular block of code is to be parallelized, a parallel-skeleton clause must be specified for that particular block.
llc defines four parallel skeletons suitable to appear within an llc region:
# The FOR skeleton: Following the OpenMP programming model definition, it means that XXX.
## The FOR skeleton assumes that the following for loop is to be run in a SIMD fashion; thus, SIMD-programming restrictions apply.
## Clauses: only FOR-specific clauses from OpenMP, plus llc clauses.
Example:
#pragma llc region copy_in(a, b) copy_out(b)
{
    int sum;
    #pragma llc for reduction(+:sum)
    for (int i = 0; i < 10; i++) {
        sum += i;
        #pragma llc fcall in(a) out(b)
        call_to_function(a[i], b[i]);
    }
}
# The TASK skeleton
#pragma llc region copy_in(a, b) copy_out(b)
{
    int numblocks = NBLOCKS;
    for (int i = 0; i < numblocks; i++) {
        #pragma llc task in(a, b, i) inout(c)
        create_block(a[i], b[i], c[i]);
    }
    for (int i = 0; i < numblocks; i++) {
        /* Notice how the fcall construct is not needed here, as the task construct holds all the information */
        #pragma llc task in(a, b, i) inout(c)
        compute_block(a[i], b[i], c[i]);
    }
    #pragma llc taskwait /* Waits until all tasks have finished */
}
# The MAP/REDUCE skeleton
#pragma llc region copy_in(info_matrix) copy_out(wanted_info)
{
    #pragma llc map
    {
        /* Code to search for info within an element of info_matrix
           (info_matrix_size: number of elements in info_matrix) */
        for (int i = 0; i < info_matrix_size; i++) {
            if (info_matrix[i]) {
                wanted_info->data = info_matrix[i];
            }
        }
    }
    #pragma llc reduction
    {
        /* Code to glue the results */
        wanted_info->data = data;
        wanted_info->next = wanted_info;
    }
}
## Code preceded by the MAP pragma will be outlined and used as an outlined function.
## A REDUCE function must be specified within the REDUCTION clause.
# The PIPELINE skeleton
## Whatever
What we propose is that, after completing the parallelization of her code using OpenMP directives, the programmer can add complementary llc directives. Using these directives, she will increase the portability of her code to a wider variety of platforms, without writing platform-specific code. This adds an additional iteration to the incremental parallelization of the code, thus still preserving the semantics of the serial and OpenMP versions of the code.
In order to simplify readability, sometimes the user will need to swap some existing OpenMP clauses or directives into their equivalent llc ones. To maintain the possibility of incremental parallelization, an llc-compliant driver will define the _LLC macro. This macro is defined to have the decimal value yyyymm, where yyyy and mm are the year and month designations of the OpenMP API version that the implementation supports. .. [XXX this remains unclear: clarify]
This gives llc developers the possibility of building compilation rules that declare either an OpenMP directive or its llc equivalent, depending on which compiler is being used.
Note that, if this macro is the subject of a #define or an #undef preprocessing directive, the behaviour is unspecified.
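For example, an llc-compliant build could use this macro to select between constructs at compile time; a minimal sketch, assuming a hypothetical loop over an array v with bound n:

    #ifdef _LLC
        /* llc-compliant driver: use the llc construct */
        #pragma llc for reduction(+:sum)
    #else
        /* plain OpenMP compiler: fall back to the OpenMP construct */
        #pragma omp parallel for reduction(+:sum)
    #endif
    for (int i = 0; i < n; i++)
        sum += v[i];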
What is an llc thread? What is its relation to OpenMP threads?
llc regions / zones.
An OpenMP region (as defined in the OpenMP 3.1 standard) is all code encountered during a specific instance of the execution of a given construct. A region includes any code in called routines, as well as any implicit code introduced by the OpenMP implementation.
It is important to note that, during the execution of an OpenMP program, a construct may give rise to many regions.
An active parallel region is a parallel region that is executed by a team consisting of more than one thread.
llc regions
statement ::= PRAGMA LLC TARGET {llc_region_clauses}*
llc_region_clauses ::= DEVICE ( string_literal )
                       NAME ( string_literal )
                       IF ( expression )
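For instance, a directive matching this grammar could be written as follows (the device string, region name and condition are hypothetical values, not fixed by the language definition):

    #pragma llc target device("cuda") name("matmul_region") if(n > 1024)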
OpenMP Parallel regions:
statement ::= PRAGMA OMP PARALLEL {omp_parallel_clauses}*
              PRAGMA OMP PARALLEL FOR {omp_parallel_clauses}* iteration_statement

statement ::= PRAGMA OMP PARALLEL FOR {omp_for_clause}* NL {llc_directives}* NL iteration_statement

llc_directives ::= PRAGMA LLC {llc_clauses}
llc_clauses ::= REDUCTION TYPE ( type )
                WEIGHT ( w )
                RESULT ( p, n {, p, n}* )
The llc weight clause is aimed at providing additional information to the compiler driver to perform proper load balancing among the iterations. The parameter is an integer value. .. [XXX (unless you want to change it)]
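A minimal sketch combining these annotations, following the grammar above (the loop, the function heavy_work and the weight value are hypothetical; the integer is assumed here to express the relative cost of the annotated loop):

    #pragma omp parallel for reduction(+:sum)
    #pragma llc weight(10)
    for (int i = 0; i < n; i++)
        sum += heavy_work(v[i]);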