Publications

  • Overview
  1. J.S. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, and S. Yalamanchili, “Keeneland: Bringing heterogeneous GPU computing to the computational science community,” IEEE Computing in Science and Engineering, 13(5):90-5, 2011. http://dx.doi.org/10.1109/MCSE.2011.83
  • University of Tennessee - Matrix Algebra on GPU and Multicore Architectures (MAGMA):
    1. List of publications
  • Georgia Tech - Keeneland System Software (exploratory and deployment path)
  1. "Lynx: Dynamic Instrumentation System for Data-Parallel Applications on GPGPU-based Architectures," Naila Farooqui, Andrew Kerr, Greg Eisenhauer, Karsten Schwan, Sudhakar Yalamanchili, ISPASS-2012, April 1-3, 2012, New Brunswick, NJ
  2. "Pegasus: Coordinated Scheduling for Virtualized Accelerator-based systems," Vishakha Gupta, Karsten Schwan, Niraj Tolia, Vanish Talwar, Parthasarathy Ranganathan, ATC 2011
  3. "Shadowfax: Scaling in Heterogeneous Cluster Systems via GPGPU Assemblies," Alexander Merritt, Vishakha Gupta, Abhishek Verma, Ada Gavrilovska, Karsten Schwan, VTDC 2011
  4. Vishakha Gupta, Ada Gavrilovska, Karsten Schwan, Harshvardhan Kharche, Niraj Tolia, Vanish Talwar, and Parthasarathy Ranganathan. GViM: GPU-accelerated virtual machines. In Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing, HPCVirt '09, pages 17–24, New York, NY, USA, 2009. ACM. (doi:10.1145/1519138.1519141)

 

PUBLICATIONS

 

  1. Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators, H. Ltaief, S. Tomov, R. Nath, and J. Dongarra, submitted to IEEE Transactions on Parallel and Distributed Computing, 2010.

  2. Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms Through Hybrid GPU-Based Computing, Stanimire Tomov, Rajib Nath, and Jack Dongarra, accepted in Parallel Computing, July 2010.

  3. From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming, Peng Du, Rick Weber, Piotr Luszczek, Stanimire Tomov, Gregory Peterson, Jack Dongarra, submitted to Parallel Computing, August 2010.

  4. Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs, Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Samuel Thibault, and Stanimire Tomov, Nvidia GPU Gems.

  5. Dense Linear Algebra on Accelerated Multicore Hardware, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, and Stanimire Tomov, in High Performance Scientific Computing: Algorithms and Applications, Editors Michael W. Berry, Efstratios Gallopoulos, Ananth Grama, Bernard Philippe, Alex Pothen, and Yousef Saad, 2011.

  6. LU Factorization for Accelerator-based Systems, Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Mathieu Faverge, Julien Langou, Hatem Ltaief, Stanimire Tomov, submitted to the 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 2011), June 27-30, 2011, Sharm El-Sheikh, Egypt.

  7. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, S. Tomov, IPDPS 2011, Anchorage, AK, May 2011.

  8. Autotuning GEMMs for Fermi, J. Kurzak, S. Tomov, J. Dongarra, submitted to SC11, November 2011.

  9. Linear Algebra Libraries for High-Performance Computing: Scientific Computing with Multicore and Accelerators, J. Kurzak and Jack Dongarra, submitted to SC11, November 2011.

  10. A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures, M. Horton, S. Tomov, and J. Dongarra, submitted to the 2011 Symposium on Application Accelerators in High Performance Computing, July 19-21, 2011, Knoxville, TN.

  11. "Quantifying NUMA and contention effects in multi-GPU systems", Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units.

  12. Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems, Vishakha Gupta, Karsten Schwan, Niraj Tolia, Vanish Talwar, Parthasarathy Ranganathan, ATC 2011.

  13. Shadowfax: Scaling in Heterogeneous Cluster Systems via GPGPU Assemblies, Alexander Merritt, Vishakha Gupta, Abhishek Verma, Ada Gavrilovska, Karsten Schwan, VTDC 2011.

  14. G. Diamos, A. Kerr, S. Yalamanchili, and N. Clark, “Ocelot: A Dynamic Optimizing Compiler for Bulk Synchronous Applications in Heterogeneous Systems,” IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques, September 2010.

  15. N. Farooqui, A. Kerr, G. Diamos, S. Yalamanchili, and K. Schwan, “A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot,” Proceedings of Fourth Workshop on General-Purpose Computation on Graphics Processing Units, March 2011.

  16. A. Kerr, G. Diamos, and S. Yalamanchili, “GPU Application Development, Debugging, and Performance Tuning with GPU Ocelot,” GPU Computing Gems, vol. 2, 2011.