Cross-scale Efficient Tensor Contractions for Coupled Cluster Computations Through Multiple Programming Model Backends

SciDAC 2
Ibrahim, K. Z., Epifanovsky, E., Williams, S. W., & Krylov, A. I. (2016). Cross-scale Efficient Tensor Contractions for Coupled Cluster Computations Through Multiple Programming Model Backends. LBNL. - Report Number: LBNL-1005853
Title: Cross-scale Efficient Tensor Contractions for Coupled Cluster Computations Through Multiple Programming Model Backends
Authors: K. Ibrahim, E. Epifanovsky, S. Williams, A. Krylov
Abstract: Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular, and their parallelization has previously been achieved through dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed-memory environment in a scalable and energy-efficient manner. We achieve up to 240x speedup compared with the best optimized shared-memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, BlueGene/Q) and on a heterogeneous GPU-CPU system (Cray XK7). Because the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives as the molecular system grows, we adopt two radically different parallelization approaches for handling load imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
Note: SC Office of Advanced Scientific Computing Research (SC-21)
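
Illustrative sketch (not part of the report): the abstract notes that tensor contractions are based on matrix-matrix multiplication. The pure-Python example below shows that reduction for one dense contraction, C[i][j][a][b] = sum over k, l of A[i][j][k][l] * B[k][l][a][b], by flattening index pairs into matrix rows and columns. All function names here are hypothetical and are not Libtensor's API. The symmetry and block structure that the abstract says such libraries exploit are deliberately left out, and the triple-loop `matmul` merely stands in for a DGEMM call.

```python
from itertools import product


def make_tensor(n, f):
    """Build an n x n x n x n tensor as nested lists from f(p, q, r, s)."""
    return [[[[f(p, q, r, s) for s in range(n)] for r in range(n)]
             for q in range(n)] for p in range(n)]


def contract_direct(A, B, n):
    """Reference: C[i][j][a][b] = sum over k, l of A[i][j][k][l] * B[k][l][a][b]."""
    return make_tensor(
        n,
        lambda i, j, a, b: sum(A[i][j][k][l] * B[k][l][a][b]
                               for k, l in product(range(n), repeat=2)),
    )


def matricize(T, n):
    """Flatten index pairs: T[p][q][r][s] becomes M[p*n + q][r*n + s]."""
    return [[T[p][q][r][s] for r, s in product(range(n), repeat=2)]
            for p, q in product(range(n), repeat=2)]


def matmul(X, Y):
    """Plain matrix product, standing in for an optimized DGEMM call."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in X]


def contract_via_matmul(A, B, n):
    """Same contraction, computed as one (n^2 x n^2) matrix product."""
    M = matmul(matricize(A, n), matricize(B, n))
    return make_tensor(n, lambda i, j, a, b: M[i * n + j][a * n + b])


# Small integer-valued tensors so the two results can be compared exactly.
n = 3
A = make_tensor(n, lambda i, j, k, l: i + 2 * j - k + 3 * l)
B = make_tensor(n, lambda k, l, a, b: k * l + a - b)
same = contract_direct(A, B, n) == contract_via_matmul(A, B, n)
```

Here `same` compares the direct summation with the matricized product. Flattening (i, j), (k, l), and (a, b) into single indices is what allows a dense contraction to be handed to a matrix-multiplication kernel.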


Disclaimer: This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor the Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or the Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or the Regents of the University of California.

Copyright: Unless explicitly noted otherwise in the text, this manuscript has been authored by an author at Lawrence Berkeley National Laboratory under one of the following three contracts with the U.S. Department of Energy (or its predecessor agencies), depending on the date of the publication:

  • Contract No. DE-AC02-05CH11231—2005 to the present
  • Contract No. DE-AC03-76SF00098—1982-2005
  • Contract No. W-7405-ENG-48—1943-1982

The U.S. Government retains, and the publisher, by accepting the article for publication, acknowledges, that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. Government purposes.
