Graduate Level Educational Competencies for Computational Science Overview
The Ohio Supercomputer Center is working under a contract from TeraGrid to help define the educational competencies associated with undertaking the use of leadership-class computing (graduate level and beyond) in science and engineering research. The intent is to draft competencies and lead discussions across the high performance computing community to come to a consensus on the topics that should be incorporated into graduate programs so that a new generation of researchers is prepared to take advantage of the extraordinary computational power at the leadership level.
The intent of the introductory competencies is to provide all students with an overview of all of the major code development and optimization techniques so that they understand the full range of issues that may arise, can begin to analyze the codes being used in their science or engineering discipline, and can follow best practices for code development so that their work effectively contributes to the advancement of the science.
Prerequisites
 The specification of the competencies assumes that students have come from undergraduate backgrounds with introductory computational science skills
 These could come from a formal program such as the minor program in computational science at the Ralph Regula School of Computational Science (RRSCS) collaborative institutions (see the Undergraduate Competencies tab above)

This and other related programs generally require the following courses for undergraduates:
 Introduction to modeling and simulation
 An introductory programming class
 Numerical methods
 A discipline-oriented modeling and simulation course (Computational X: biology, chemistry, engineering, environmental science, physics, etc.)
 Electives in parallel computing and/or scientific visualization
 This assumes that students interested in graduate computing will start with these skills or acquire them before taking courses to enhance their skills with the competencies in this document
Area 1: Intermediate-Level Scientific Computing

Prereqs:
 Introduction to modeling and simulation, introductory programming (any language), numerical methods, one discipline-oriented computational science course.
 This course will use either Fortran or C/C++ and will emphasize topics commonly encountered in scientific computing/computational science. It primarily addresses serial computing competencies and is a prerequisite to the HPC/parallel computing area.
 This course includes programming skills at an intermediate level and focuses specifically on scientific computing skills.

Ability to contribute code in the programming language of greatest importance in the research
domain (Fortran or C):
 Students will understand basic language features and concepts
 Students will be able to understand code written by others
 Students will be able to use input/output functions effectively
 Students will complete a project to implement a new code and add a new capability to an existing code
 Students will be introduced to user-defined types, structures, classes, or similar mechanisms in their primary HPC programming language
 Students will be introduced to the concepts of object-oriented design and programming
 Example Activity: Students will write a few small programs using common sort and search algorithms

Students will be introduced to basic debugging techniques:
 Students will recognize common runtime errors and program failures and will be able to describe some common mistakes that could cause them
 Students will be introduced to debugging strategies and rules of thumb
 Students will understand capabilities of commonly available symbolic debugging tools such as gdb
 Students will understand some common memory problems and how to fix them, for example, array overruns, invalid pointers, mismatched parameters in function calls
 Example Activity: Students will follow a script to start a debugging session using a symbolic debugger on a sample code containing one or more common errors. The script will demonstrate useful debugging features such as setting break or watch points and inspecting variable values

Understand number representation:
 Students will understand the uses of floating point and integer arithmetic
 Students will understand the range limits of different size integers and will know how to select the appropriate size
 Students will understand the range, precision, and memory requirements of single and double precision floating point numbers and will know how to select the appropriate precision
 Students will be able to explain the concepts of floating point representation, including range, precision, overflow, and underflow
 Example Activity: Students will do some simple exercises to illustrate the differences among the data types, including allocating large arrays and checking their sizes; experimentally finding the largest representable number (approximately); and for floating point numbers experimentally finding both the smallest representable positive number (approximately) and the machine precision

Understand numerical errors:
 Students will be able to describe various kinds of numerical errors (roundoff, overflow, underflow)
 Students will be able to describe absolute and relative error
 Students will discuss error propagation
 Students will understand loss of significance and methods to avoid loss of significance
 Students will be able to describe the effect of problem conditioning or sensitivity on the correctness of a computed solution
 Example Activity: Students will experiment with solving systems of linear equations involving well-conditioned and ill-conditioned matrices

Students will be introduced to software engineering best practices:
 Students will understand how to define software requirements
 Students will be introduced to the concepts of software design
 Students will be introduced to the concepts of unit and regression testing
 Students will understand the purpose and concepts of source control
 Students will use source control for a class project
 Students will include useful comments in their code
 Students will format code consistently
 Students will understand and use the build process
 Example Activity: Students will review the software design process with a case study of a computational science code

Ability to identify efficient file formats:
 Students will understand the difference between text and binary file formats and the uses and limitations of each
 Students will be able to estimate output file sizes and storage space requirements for the results of a simulation in their area of interest
 Example Activity: Students will do textbook-style exercises to estimate storage requirements and data transfer times for storing and moving large files in both text and binary formats

Create a program that uses at least one widely used numerical library:
 Students will demonstrate how to include the appropriate header files
 Students will demonstrate how to call the library routines, pass in arguments, and use the output values
 Students will demonstrate how to compile, link and run the program
 Students will understand the effect of environment variables or other settings on library performance
 Students will be able to discuss library quality: differences between excellent and questionable libraries
 Example Activity: Students will compare the performance of naively coded matrix multiply to that of a well-crafted library implementation for a variety of matrix sizes

Students will be introduced to the concepts of algorithm complexity:
 Students will understand at the overview level the concept of algorithm complexity and will know how some important algorithms compare to each other
 Students will be aware of big-O notation for algorithm complexity
 Example Activity: Students will do textbook problems on complexity analysis for common linear algebra and search algorithms

Students will be introduced to serial program optimization concepts and techniques:
 Students will understand basic compiler optimization options and how to use them
 Students will experiment with compiler optimization flags to see the effect on performance of selected codes
 Students will understand the dangers of using compiler optimizations on floating-point code, where common optimizations can change results
 Students will use timing functions or profiling tools to measure the time spent in different sections of their code
 Students will understand at an overview level the concepts of efficient cache utilization including unit stride and cache reuse
 Students will explore these concepts by examining sample codes that have different memory access patterns and measuring the execution time of the different versions. Sample code may include matrix-matrix multiplication
 Example Activity: Students will experiment with simple example codes as outlined above

Students will understand verification and validation principles:
 For this competency, students may use a simple model from their field of study or an example provided by the instructor
 Students will develop test cases to verify that their programs correctly implement their models
 Students will validate their models using one or more of the following approaches: a) identify simple or analytical problems that can be used to validate a new method; b) make a statistical comparison to experimental data; c) use a statistical comparison to previous approaches or other models
 Example Activity: Students will experiment with a simple model as outlined above and write a brief report explaining their model, their approach to verification and validation, and the results

Students will be introduced to Monte Carlo methods:
 Students will describe applications of Monte Carlo models with examples
 Students will discuss algorithms for Monte Carlo methods
 Example Activity: Students will write a simple program to compute pi using Monte Carlo integration
Area 2: Intro to High Performance Scientific Computing

Prereqs - Intermediate-Level Scientific Computing:
 This area will use either Fortran or C/C++ and addresses HPC and parallel computing topics
 The emphasis is on concepts rather than expertlevel parallel programming
 The course is intended for all users of HPC systems, not just developers
 Students should understand at a conceptual level the various forms of parallel computing and parallel programming models
 This course involves concepts that are to a large degree independent of specific technologies

Students will be introduced to parallel architectures and execution models:
 Students will understand the concepts of the Single-Instruction-Multiple-Data (SIMD) execution model
 They will understand at a conceptual level the architectures associated with this model (streaming and vector processors, including GPUs)
 Students will understand the concepts of the Multiple-Instruction-Multiple-Data (MIMD) execution model
 They will understand at a conceptual level the architectures associated with this model (multicore processors, clusters)
 Students will understand the concepts of the Single-Program-Multiple-Data (SPMD) execution model
 They will understand that SPMD is a subset of the MIMD model used by MPI, UPC, and Co-Array Fortran applications
 Example Activity: Students will take a brief quiz on this terminology

Students will be introduced to memory models for parallel programming:
 Students will understand the concepts of shared and distributed memory and their advantages and disadvantages
 Students will be aware of the differences in programming approaches for shared and distributed memory models
 Students will be introduced to the concept of a global address space with distributed memory
 Example Activity: Students will take a brief quiz on this terminology

Students will understand the principles of how to match algorithms, applications, and
architectures:
 Students will be introduced to the concepts of data parallelism, functional parallelism, and task-level parallelism (embarrassingly parallel problems)
 Students will understand which types of algorithms parallelize well, at what granularity they are parallelized, and how this relates to different architectures
 Students will understand what algorithmic and program constructs work well with each of the execution models introduced (SIMD, MIMD, SPMD)
 Example Activity: Students will do textbook-type problems, in which they will identify appropriate models and target architectures for different types of problems

Students will understand the concept of application scalability:
 Students will be able to define weak scalability and give examples of problems that are weakly scalable
 Students will be able to define strong scalability and give examples of problems that are strongly scalable
 Students will be aware of the factors that limit scalability
 Students will understand the limits of parallelism as described by Amdahl's law
 Example Activity: Students will do textbook exercises to compute speedup, serial fraction, etc.

Students will understand code performance metrics:
 Students will know how to measure, interpret, and report the performance of their code
 Students will know how to measure, interpret, and report speedup as the number of processors is increased
 Example Activity: Students will run a sample parallel program on varying numbers of processors and will calculate speedup

Students will be introduced to parallel programming methods and concepts:
 Students will be able to identify parallelism in an application and discuss approaches for exploiting it
 Students will be introduced to the concept of message passing and its implementation using MPI
 Students will be introduced to the concept of multithreading and its implementation using OpenMP
 Students will be introduced to the concept of vectorization and will understand how to take advantage of it using compiler flags
 Students will be introduced to the concepts of streaming and manycore accelerators such as GPUs
 Students will be aware of various concurrency issues, the problems they can cause, and some mechanisms for avoiding them, including race conditions, deadlocks, critical sections, and data dependencies
 Students will discuss the generation of parallel random number streams
 Example Activity: Students will review and run MPI and OpenMP example programs on a parallel computer system

Data Intensive Computing:
 Students will be able to define data intensive computing and explain how it differs from traditional high performance computing
 Students will describe an application, preferably related to their field of study, that involves data intensive computing
 Students will be able to transfer a dataset from one system to another using a parallel data transfer method
 Students will understand how data intensive computing impacts algorithm design
 Example Activity: Students will experiment with file transfer tools such as GridFTP

Data management:
 Students will be familiar with the file compression and archiving tools available on their systems
 Students will know what tool to use to decompress a file in any of the formats commonly used in the HPC community
 Students will understand how metadata helps make sense of data and will recognize at least one metadata format, possibly XML
 Example Activity: Students will read and write HDF5 files with a provided application and will use HDF5 commands to inspect the files

Understanding fault tolerance:
 Students will be able to explain the concept of mean time between failures (MTBF)
 Students will be able to explain the concept of fault tolerance
 Students will identify some causes of failures in HPC systems
 Students will understand why fault tolerance is a bigger issue on extremely large scale systems than on smaller systems
 Example Activity: Students will do textbook exercises to compute MTBF of computer system components

Students will understand basic scientific visualization concepts:
 Students will understand the difference between scientific visualization and information visualization
 Students will understand basic techniques to visualize scalar fields
 Students will understand basic techniques to visualize vector fields
 Example Activity: Students will display a two-dimensional scalar dataset using a widely available visualization tool, for example, MATLAB, Excel, or VisIt
Specialty Area 1: HPC Software Development

Prereqs - Intro to High Performance Scientific Computing:
 This area is intended for researchers who develop HPC software for their own use or community use
 Students coming out of this course should understand the topics of the first two areas at a mastery level in addition to mastery of the competencies listed here
 There is some redundancy between the competencies listed for this area and those above; this area is intended to provide more depth
 This area includes more advanced programming skills, software engineering practices, and parallel programming
 Students should understand and have a working knowledge of alternative approaches to parallel programming and how they relate to current and emerging parallel programming models

Advanced programming skills
 Students will be able to use advanced language features that support scientific computing
 Students will be able to create and use userdefined types, structures, classes, or similar mechanisms in their primary HPC programming language
 Students will be familiar with objectoriented design and programming
 Students will demonstrate the use of gdb or some other commonly available debugger, including setting breakpoints, stepping, and examining the contents of variables

Students will understand and follow software engineering best practices:
 Students will define software requirements for one or more of their class projects
 Students will be required to produce a software design for one or more of their class projects
 Students will use source control for all their class projects and will understand the capabilities of the source control tool they are using
 Students will create a makefile for each of their programs or use some other method for one-step builds
 Students will develop unit and regression test cases for their programs
 Students will be introduced to testing frameworks

Students will demonstrate techniques for understanding and interpreting existing code:
 Students will identify the main data structures in a sample code and the relationships among them
 Students will analyze the syntactic structure of the code
 Students will profile code to see call chain and logic
 Students will identify calls to external libraries

Students will be proficient in serial program optimization:
 Students will correctly use compiler flags to optimize their code
 They will understand the benefits and limitations of the compiler optimizations
 Students will understand the dangers of using compiler optimizations on floating-point code, where common optimizations can change results
 Students will use timing functions and code profiling tools such as gprof to measure the time spent in different sections of their code
 Students will understand the concepts of efficient cache utilization including unit stride and cache reuse
 Example Activity: Students will explore these concepts by modifying sample code to have different memory access patterns and measuring the execution time of the different versions. Sample code may include matrix-matrix multiplication.

Students will use parallel concepts/algorithms in developing software:
 Students will complete a class project using each of these models: SIMD, MIMD, SPMD
 Students will discuss the architectural concepts associated with each of these models and give a current example of each
 Students will discuss the algorithmic and program constructs that work well with each
 Students will understand which types of algorithms parallelize well, at what granularity they are parallelized, and how this relates to different architectures
 Students will understand various concurrency issues, the problems they can cause, and how to handle them properly, including race conditions, deadlocks, critical sections, and data dependencies

Students will demonstrate skill in application scaling:
 Students will be able to define weak and strong scalability and give examples of algorithms or applications exhibiting each
 Students will be able to identify parallelism in an application and decide on the best approach to exploit it
 Students will describe several factors that limit scalability
 Students will complete a class project in which they measure speedup of their code and present the information in a meaningful way
 Students will explain the limits of parallelism as described by Amdahl's law and give examples

Students will be able to write a parallel program using MPI:
 Students will understand the concepts of message passing and communicators
 Students will know how to use point-to-point communication and collective communication functions
 Students will be able to explain the semantics of blocking and nonblocking send and receive operations
 Students will be able to explain the standard, buffered, synchronous, and ready communication modes
 Students will recognize common errors that can cause a program to deadlock and will suggest ways to correct them
 Students will understand why a program that works correctly with one MPI implementation may fail when run with another one
 Students will be introduced to techniques for communicating noncontiguous data or mixed data types
 Students will be introduced to virtual topologies in MPI
 Students will understand how data distribution affects communication loads (data movement)
 Students will investigate the use of nonblocking and one-sided communication for overlapping computation and communication

Students will be able to write a program using OpenMP:
 Students will use OpenMP for loop-level parallelization in an example program
 Students will understand what types of data dependencies inhibit loop parallelization
 Students will be introduced to additional OpenMP constructs, including parallel regions and the sections worksharing construct
 Students will understand the concepts of private, shared and reduction variables in OpenMP and will be able to correctly determine when to use each attribute
 Students will be introduced to synchronization mechanisms available in OpenMP
 Students will recognize race conditions and use synchronization to prevent errors
 Students will understand the relationship between threads and cores and how OpenMP relates to both
 Students will understand OpenMP scheduling options and will experiment with them in a class exercise

Students will be introduced to hybrid or mixed-mode MPI-OpenMP:
 Students should know why and when it is appropriate to mix MPI and OpenMP
 Students will be able to explain the benefits and pitfalls of mixing MPI and OpenMP
 Students will create and run a program that combines MPI and OpenMP parallelization techniques
 Students will be able to explain the threading models in the MPI standard and the strengths and pitfalls of each

Students will learn techniques for load balancing:
 Students will be introduced to data distribution strategies such as blocked partitioning, cyclic partitioning, graph partitioning, etc.
 Students will understand the effect of data distribution on load balance
 Students will know how to measure load imbalance
 Students will understand static and dynamic load balancing

Students will be introduced to frameworks for large-scale parallel code development:
 Students will demonstrate the use of a code development framework or integrated development environment, preferably one created for high performance computing. Potential candidates include the Eclipse Parallel Tools Platform (PTP), Microsoft Visual Studio and Intel Cluster Studio.

Students will be introduced to manycore programming:
 Students will explain the difference between manycore and multicore computing
 Students will describe the characteristics, benefits, and programming challenges of at least one manycore device
 Students will run a sample CUDA or OpenCL (or similar language) program on a GPU or other manycore device and will compare execution time to that of a CPU version of the same computation

Students will be able to write a simple Partitioned Global Address Space (PGAS) parallel
program:
 Students will understand the fundamental concepts of PGAS: partitioned global memory, threads, affinity and nonlocal access, collective operations, and owner-computes
 Students will be introduced to a PGAS language or library such as UPC, Co-Array Fortran, X10, Chapel, SHMEM, or Global Arrays

Students will understand how to implement checkpointing:
 Students will be able to explain the purpose and concepts of checkpointing
 Students will understand the challenges of checkpointing a parallel code
 Students will add checkpointing to an existing HPC code

Students will demonstrate skill in debugging parallel programs:
 Students will be proficient in the use of a parallel debugger, for example TotalView, DDT, or the debugging capabilities of an IDE they are using
 Students will be aware of some common types of bugs in parallel programs and will understand how to find and correct them
 Students will understand how concurrency issues outlined earlier (data dependencies, race conditions, deadlocks) manifest as bugs

Students will be able to improve the efficiency of a program:
 Students will know how to improve efficiency by improving data locality and memory access patterns
 Students will know how to improve efficiency by finding hotspots in their code and optimizing small sections of code
 Students will understand the factors that affect the performance of a parallel program
 Students will be introduced to techniques for efficient scheduling: detecting load imbalance, appropriately sized tasks, etc.
 Students will be introduced to techniques for reducing communication overhead such as message coalescing and latency hiding

Students will know how to use performance analysis tools:
 Students will understand how to use performance analysis tools such as gprof, IPM and TAU to find performance problems in their code
 Students will be able to interpret the output of performance analysis tools
 Students will be able to use profiling tools such as OProfile, which report hardware performance counter data, to measure CPU performance metrics such as cache misses

Students will understand HPC workflows and know how to automate them:
 Students will learn the basics of shell scripting, Perl, Python or some other language used for scripting
 Students will create a script to automate a simple HPC workflow
 Students will be introduced to workflow construction tools such as Kepler and Eclipse

Students will understand parallel I/O:
 Students will understand how to implement parallel I/O, for example using MPI I/O, HDF5, or ADIOS
 Students will create a program that uses MPIIO or some other library to perform parallel I/O
 Students will understand the effect of network topology on parallel file I/O
 Students will investigate leveraging parallel filesystems for data-intensive applications