Scheduler Specific
- Polish synchronization
  - Make sure that locking is not excessive
  - Make sure that variables are not declared volatile unless necessary
- Find the right criteria for renaming
  - Performance close to out-of-place for codes with anti-dependencies
  - No performance impact for codes without anti-dependencies
- Look into the performance of loop reversal (ACCUMULATOR)
- Implement a work stealing policy which does not significantly deteriorate data reuse
- Consider lock-less implementations of data structures (hash, list)
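
As an illustration of the lock-less list item, the following is a minimal C11 sketch, not PLASMA code. The names `node_t`, `lf_list_t`, `lf_list_push` and `lf_list_take_all` are hypothetical. It assumes many producers push nodes and a consumer detaches the whole list at once, which sidesteps the ABA problem that a per-node pop would have to handle.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type: one entry per queued task. */
typedef struct node {
    struct node *next;
    int task_id;
} node_t;

/* Hypothetical lock-less list: a single atomic head pointer. */
typedef struct {
    _Atomic(node_t *) head;
} lf_list_t;

static void lf_list_init(lf_list_t *l)
{
    atomic_init(&l->head, (node_t *)NULL);
}

/* Push a node at the head without taking a lock.
 * On CAS failure, `old` is refreshed with the current head and we retry. */
static void lf_list_push(lf_list_t *l, node_t *n)
{
    node_t *old = atomic_load_explicit(&l->head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &l->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the entire list in one atomic exchange.
 * The caller then owns the returned chain (most recent push first). */
static node_t *lf_list_take_all(lf_list_t *l)
{
    return atomic_exchange_explicit(&l->head, (node_t *)NULL,
                                    memory_order_acquire);
}
```

Whether such a structure actually outperforms a short critical section depends on contention, so it would need to be measured against the existing locked implementation.
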
Imminent
- Implement new functionality using the dynamic scheduler
  - "communication avoiding" QR/LQ factorization
  - matrix inversion using Cholesky, LU and QR
Short Term
- GPU support
  - development of CUDA kernels at the granularity of thread block per tile
  - integration of GPU kernels with the dynamic scheduler
- Modular and extensible system for testing, timing and autotuning
  - simple template for adding a new routine
  - small number of executables (ideally one)
  - code generation for multiple precisions
Long Term
- Mechanism for multithreaded library interoperability
  - sharing of threads between, e.g., PLASMA and multithreaded BLAS
- Development of a front-end for coding linear algebra algorithms
  - SMPSs-like extensions for annotating tasks and simplifying task queuing
  - extensions for easily accessing elements of matrices stored in tile format
- Development of SVD and eigenvalue routines
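
To make the tile-format access item concrete, here is a small C sketch of what such an accessor might compute. It is an assumption-laden illustration, not PLASMA's actual layout or API: the type `tile_matrix_t` and the functions `tile_offset` and `tile_elem` are hypothetical, tiles are assumed square (`nb` x `nb`), the matrix dimensions are assumed to be multiples of `nb`, and both the tiles and the elements within each tile are assumed to be stored in column-major order.

```c
#include <stddef.h>

/* Hypothetical descriptor for a matrix stored tile by tile. */
typedef struct {
    double *data; /* contiguous storage, one nb*nb block per tile */
    int m, n;     /* matrix dimensions (assumed multiples of nb)  */
    int nb;       /* tile size                                    */
} tile_matrix_t;

/* Map global element (i, j) to its offset in the tile layout. */
static size_t tile_offset(const tile_matrix_t *A, int i, int j)
{
    size_t nb = (size_t)A->nb;
    size_t mt = (size_t)A->m / nb;                 /* tiles per tile column */
    size_t ti = (size_t)i / nb, tj = (size_t)j / nb; /* tile coordinates    */
    size_t ii = (size_t)i % nb, jj = (size_t)j % nb; /* position in tile    */
    return (tj * mt + ti) * nb * nb + jj * nb + ii;
}

/* Convenience accessor returning a pointer to element (i, j). */
static double *tile_elem(const tile_matrix_t *A, int i, int j)
{
    return A->data + tile_offset(A, i, j);
}
```

A front-end extension could hide this arithmetic behind ordinary `A(i, j)` notation, leaving the division and modulo to generated code.
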