(Slide 1: Introduction to High Performance Computing)
The purpose of this presentation is to provide an overview of High Performance Computing (HPC).
(Slide 2: Presentation Outline)
For this video, I will present some examples of applications that use HPC, describe what an HPC computer is, discuss shared memory parallel programming and distributed memory parallel programming, and then give my concluding remarks.
(Slide 3: Examples of Applications needing HPC)
Some important applications that use HPC are:
car crash testing,
diesel engine design,
jet engine design,
financial industry applications, and
artificial intelligence (AI), including deep learning.
For all these applications, speed is critical.
Instead of building a prototype car and crashing it, the car industry simulates car crashes using HPC machines. This is much less expensive than building prototypes and allows the testing of many designs in a short amount of time.
HPC is also used in the design of diesel engines. Cummins has a complex application program that simulates an entire running diesel engine. Using HPC, they can perform one simulation in under 4 hours. This allows them to optimize power, cost, efficiency, pollution, etc. over thousands of design parameters.
There are also complex application programs that simulate a running jet engine. These programs require HPC to run in a timely manner, which allows jet engine designers to test many different designs and to optimize over design parameters.
Genomic mapping calculations are not possible without HPC.
HPC is critical for the financial industry for high speed trading where there are thousands of transactions performed every second. Using worldwide bond trading data, bond firms use HPC to predict future bond prices in a matter of a few hours. This information is then used to buy and sell bonds.
Today, Artificial Intelligence and Deep Learning are active areas of research. For both, HPC is essential for processing very large training sets in a reasonable amount of time. Speech recognition is one Deep Learning application that also requires HPC.
(Slide 4: What is an HPC Computer?)
The design of HPC machines is continually changing and evolving. There are many designs of HPC computers, but most contain hundreds to hundreds of thousands of compute nodes. Each compute node has one or more processors sharing a common memory, and each processor has multiple cores. Compute nodes are interconnected via a high-speed communication network, and they may contain accelerators, such as an Nvidia GPU or an Intel Xeon Phi coprocessor, for added speed.
(Slide 5: Moore’s Law)
Moore’s Law is commonly used to predict the performance of future HPC machines. Moore’s Law is not a law but an observation made by Intel co-founder Gordon Moore in 1965. He noticed that the number of transistors on integrated circuits had doubled roughly every two years since their invention. Moore’s observation has generally continued to hold to the present day.
(Slide 6: Amdahl’s Law)
Amdahl’s Law governs the speedup of a program run on p processors, assuming no parallelization overhead. Suppose an application takes 100 hours to run on a single processor, of which 95 hours can be perfectly parallelized and 5 hours cannot be parallelized. Then the program execution time using p processors is 5 + 95/p hours, so the speedup = (serial time)/(parallel time) = 100/(5 + 95/p), which approaches 20 as p approaches infinity! Because of this, for many years people thought parallelization had limited usefulness. Thankfully, for most applications, as the problem size increases, the fraction of time spent in serial execution approaches 0.
(Slide 7: HPC Programming)
Special programs are needed to take advantage of the multiple processors in HPC computers. Since each compute node has a shared memory, one can use shared memory parallelization techniques, for example OpenMP. To use multiple compute nodes, one must use distributed memory parallelization techniques, for example MPI which stands for Message Passing Interface. MPI may also be used to parallelize programs for shared memory computers.
(Slide 8: Shared Memory Parallel Programming)
The common method used to write parallel programs for shared memory computers is to use OpenMP. This is accomplished by inserting OpenMP directives or pragmas into serial Fortran, C or C++ programs. For example, the “parallel do” directive tells the compiler that different parallel threads can execute different iterations of the loop. The following slide shows one way to parallelize a = b + c where a, b and c are one-dimensional arrays.
(Slide 9: OpenMP: a = b + c example)
In the following Fortran program, the “parallel do” directive starts a parallel region and tells the compiler to parallelize the following loop. The “end parallel do” directive ends the parallel region. The iterations of the do loop will be divided among OpenMP threads and executed in parallel.
(Slide 10: Distributed Memory Parallel Programming)
Programs for distributed memory computers usually use MPI routines to pass messages between processes; for example, one inserts calls to mpi_send and mpi_recv into a Fortran, C, or C++ program. To write an MPI program, one must first decide how to distribute the data and computation among processes to provide good load balancing and to minimize message-passing time.
(Slide 11: MPI example: sum = dot product(a,b))
This example shows how to compute the global dot product of two one-dimensional arrays a and b distributed across p MPI processes. Each MPI process executes the displayed program. After the local dot products have been calculated, the value of the global dot product is placed on all p processes by calling mpi_allreduce.
(Slide 12: Concluding Remarks)
In conclusion, HPC is a critical technology for many important applications.
OpenMP is used to write parallel programs for shared memory machines. A comprehensive OpenMP tutorial is available from Lawrence Livermore National Laboratory.
MPI is used to write parallel programs for both shared memory and distributed memory machines. A comprehensive MPI tutorial is available from Lawrence Livermore National Laboratory.
I would like to thank the College of Liberal Arts and Sciences for their support of this project.