Using SIMD Registers and Instructions to Enable Instruction-Level Parallelism in Sorting Algorithms

Amaral, Nelson; Niewiadomski, Robert; Furtak, Timothy

doi:doi:10.7939/R3GB2D

This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.

View

Download

Communities and Collections

Computing Science, Department of / Technical Reports (Computing Science)

Usage

220 views
1143 downloads

Using SIMD Registers and Instructions to Enable Instruction-Level Parallelism in Sorting Algorithms

Author(s) / Creator(s)
Technical report TR07-02. Most contemporary processors offer some version of Single Instruction Multiple Data (SIMD) machinery -- vector registers and instructions to manipulate data stored in such registers. The central idea of this paper is to use these SIMD resources to improve the performance of the tail of recursive sorting algorithms. When a recursive computation reaches a set threshold, data is loaded into the vector registers, manipulated in-register, and the result stored back to memory. Four implementations of sorting with two different SIMD machinery -- x86-64's SSE2 and G5's AltiVec -- demonstrate that this idea delivers significant performance improvement. The improvements provided by the tail optimization of sorting are orthogonal to the gains obtained through empirical search for a suitable sorting algorithm. When integrated with the Dynamically Tuned Sorting Library (DTSL), this new code generation strategy improves the performance of DTSL by up to 18%. Performance of d-heaps is similarly improved by up to 35%. | TRID-ID TR07-02
Date created

2007
Subjects / Keywords
Type of Item

Report
DOI

https://doi.org/10.7939/R3GB2D
License

Attribution 3.0 International

Language
- English