A compiler for parallel execution of numerical Python programs on graphics processing units
Fall 2012
Modern Graphics Processing Units (GPUs) are providing breakthrough performance for numerical computing at the cost of increased programming complexity. Current programming models for GPUs require that the programmer manually manage the data transfer between CPU and GPU. This thesis proposes a...
-
1993
Technical report TR93-07. A "first" implementation of the Modular Smalltalk object-oriented programming language is presented. The implementation includes an object-oriented parser, object-oriented representation for code fragments and an object-oriented C-code generator, all implemented in...
-
Compiler-Only Code Generation for Performant and Modular Matrix-Multiplication Micro Kernels Using Matrix Engines
Fall 2021
General Matrix-Matrix Multiplication (GEMM) is used widely in many high-performance application domains. In many cases, these applications repeatedly execute their matrix-multiplication subroutine, as is the case in the implementation of a particle-physics simulator or the repeated convolutions...
-
Fall 2023
A new microprocessor within a given processor architecture may introduce performance-improving features that either can only be accessed through novel instructions or require new code-generation techniques to be beneficial. In response, compilers must be extended/improved to make use of these new...
-
Fall 2009
This thesis introduces FlowGSP, a general-purpose sequence mining algorithm for flow graphs. FlowGSP ranks sequences according to the frequency with which they occur and according to their relative cost. This thesis also presents two parallel implementations of FlowGSP. The first implementation...
-
Fall 2023
The presence of control-flow divergence in loops can hinder or even prevent auto-vectorization, a compiler transformation that exploits the parallelism enabled by Single-Instruction Multiple-Data (SIMD) instructions. A solution is to linearize control flow through the use of predicated execution....
-
Implementation of Path Profiling in the Low-Level Virtual-Machine (LLVM) Compiler Infrastructure
2010
Technical report TR10-05. Profiling monitors a program's execution flow via the insertion of counters at key points in the program. Profiling information can then be used by a compiler's optimization passes to increase the performance of frequently executed sections of code. This document...
-
Fall 2010
Heterogeneous computing platforms that use GPUs and CPUs in tandem for computation have become an important choice for building low-cost high-performance computing platforms. The computing ability of modern GPUs surpasses what CPUs can offer for certain classes of applications. GPUs can deliver...
-
Fall 2012
This thesis makes improvements to the process of ahead-of-time feedback-directed optimization (FDO) in compiler design. It examines multiple aspects of FDO from profile collection and representation through to the performance evaluation of FDO code transformations. Two guiding principles knit the...