Publications
- Overview
- J.S. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, and S. Yalamanchili, “Keeneland: Bringing heterogeneous GPU computing to the computational science community,” IEEE Computing in Science and Engineering, 13(5):90-5, 2011. http://dx.doi.org/10.1109/MCSE.2011.83
- University of Tennessee - Matrix Algebra on GPU and Multicore Architectures (MAGMA):
- Georgia Tech - Keeneland System Software (exploratory and deployment path)
- "Lynx: Dynamic Instrumentation System for Data-Parallel Applications on GPGPU-based Architectures," Naila Farooqui, Andrew Kerr, Greg Eisenhauer, Karsten Schwan, Sudhakar Yalamanchili, ISPASS-2012, April 1-3, 2012, New Brunswick, NJ
- "Pegasus: Coordinated Scheduling for Virtualized Accelerator-based systems," Vishakha Gupta, Karsten Schwan, Niraj Tolia, Vanish Talwar, Parthasarathy Ranganathan, ATC 2011
- "Shadowfax: Scaling in Heterogeneous Cluster Systems via GPGPU Assemblies," Alexander Merritt, Vishakha Gupta, Abhishek Verma, Ada Gavrilovska, Karsten Schwan, VTDC 2011
- Vishakha Gupta, Ada Gavrilovska, Karsten Schwan, Harshvardhan Kharche, Niraj Tolia, Vanish Talwar, and Parthasarathy Ranganathan. GViM: GPU-accelerated virtual machines. In Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing, HPCVirt '09, pages 17–24, New York, NY, USA, 2009. ACM. (doi:10.1145/1519138.1519141)
PUBLICATIONS
- Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators, H. Ltaief, S. Tomov, R. Nath, and J. Dongarra, submitted to IEEE Transactions on Parallel and Distributed Computing, 2010.
- Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms Through Hybrid GPU-Based Computing, Stanimire Tomov, Rajib Nath, and Jack Dongarra, accepted in Parallel Computing, July 2010.
- From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming, Peng Du, Rick Weber, Piotr Luszczek, Stanimire Tomov, Gregory Peterson, Jack Dongarra, submitted to Parallel Computing, August 2010.
- Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs, Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Samuel Thibault, and Stanimire Tomov, Nvidia GPU Gems.
- Dense Linear Algebra on Accelerated Multicore Hardware, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, and Stanimire Tomov, in High Performance Scientific Computing: Algorithms and Applications, Editors Michael W. Berry, Efstratios Gallopoulos, Ananth Grama, Bernard Philippe, Alex Pothen, and Yousef Saad, 2011.
- LU Factorization for Accelerator-based Systems, Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Mathieu Faverge, Julien Langou, Hatem Ltaief, Stanimire Tomov, submitted to the 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 2011), June 27-30, 2011, Sharm El-Sheikh, Egypt.
- QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, S. Tomov, IPDPS 2011, Anchorage, AK, May 2011.
- Autotuning GEMMs for Fermi, J. Kurzak, S. Tomov, J. Dongarra, submitted to SC11, November 2011.
- Linear Algebra Libraries for High-Performance Computing: Scientific Computing with Multicore and Accelerators, J. Kurzak and J. Dongarra, submitted to SC11, November 2011.
- A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures, M. Horton, S. Tomov, and J. Dongarra, submitted to 2011 Symposium on Application Accelerators in High Performance Computing, July 19-21, 2011, Knoxville, TN.
- "Quantifying NUMA and contention effects in multi-GPU systems," Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units.
- G. Diamos, A. Kerr, S. Yalamanchili, and N. Clark, “Ocelot: A Dynamic Optimizing Compiler for Bulk Synchronous Applications in Heterogeneous Systems,” IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques, September 2010.
- N. Farooqui, A. Kerr, G. Diamos, S. Yalamanchili, and K. Schwan, “A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot,” Proceedings of Fourth Workshop on General-Purpose Computation on Graphics Processing Units, March 2011.
- A. Kerr, G. Diamos, and S. Yalamanchili, “GPU Application Development, Debugging, and Performance Tuning with GPU Ocelot,” GPU Computing Gems, vol. 2, 2011.



