High performance computing

Course contents

Introduction

  1. Trends in the evolution of computing systems that motivate distributed (parallel) computing
  2. Examples of compute-intensive problems from science
  3. Classification of parallel systems (SIMD, MISD...). A short history
  4. Modern high-performance systems: from SSE through multithreading to clusters

Parallelism

  1. Types of parallelism: by data; by function
  2. The problem of automatic parallelization
  3. Speedup, efficiency (Amdahl's law), and scalability of parallel algorithms
  4. Processes and threads (contexts, states)
  5. Priorities (Java, Windows, Unix)
  6. Scheduling
    • Daemons
    • Orphans
    • Zombies
  7. Why threads are often preferable to processes
  8. OS-specific details of process and thread implementations

Multithreading basics

  1. POSIX threads
  2. Java threads (green threads)
  3. Boost threads
  4. Comparison of thread APIs (table of corresponding functions)
  5. Exception handling
  6. Spurious wakeups

Synchronization

  1. Primitives
    • Critical section
    • Mutex (futex)
    • Semaphore
    • Event
    • Interlocked functions (atomic operations)
    • Condition variables
  2. Mutex types
    • Recursive/Non-Recursive
    • Shared (read/write)
    • Spin
    • Named
  3. Synchronization types
    • Coarse-grained
    • Fine-grained
    • Optimistic
    • Lazy
    • Non-blocking (lock-free, illustrated on a linked list)
  4. Hidden synchronization in UI libraries
  5. Peterson's and Lamport's algorithms

Debugging and performance analysis

  1. Common multithreading errors
    • Data race
    • Deadlock (livelock)
    • Lost wakeup (lost signal)
  2. Specific errors
    • Signal handling
    • fork() in multithreaded programs
    • The ABA problem
    • Priority inversion
  3. Design principles for parallel applications
  4. Error detectors
    • Intel Thread Checker
    • Intel Parallel Inspector
    • valgrind (helgrind tool)
  5. Performance analysis tools
    • Intel Thread Profiler
    • valgrind (callgrind and cachegrind tools)

OpenMP

  1. History of the standard
  2. Main functions
  3. Main #pragma directives
    • Thread creation, parallel sections
    • Working with data (shared, private...)
    • Synchronization (critical, atomic)
    • Thread cooperation (reduction, copyin...)
  4. Controlling the number of threads
  5. Use in recursive tasks and loops

Intel TBB

  1. Library structure
  2. Memory allocators
  3. Thread-safe containers
  4. Synchronization primitives
  5. Algorithms
    • parallel_for
    • parallel_reduce
    • parallel_scan
  6. Task trees
  7. TBB scheduler
    • Thread pool control
    • Work stealing

java.util.concurrent

  1. Thread pools
  2. Task control using Future
  3. Thread-safe containers
  4. Synchronization primitives

Parallel algorithms

  1. Parallel algorithms in the STL
  2. Graph algorithms
  3. Sorting
  4. Matrix multiplication
  5. Common approaches to parallelizing algorithms

Distributed systems

  1. Definitions. Goals. Architectural principles
    • Openness
    • Transparency
    • Scalability
  2. Software solutions
    • Distributed operating systems
    • Network operating systems
    • Middleware
  3. The client-server model and its variants
  4. Decentralized (hostless) systems. Multi-agent systems
  5. Component interaction
    • Protocols (TCP, UDP...)
    • Remote Procedure Call
    • Remote Method Invocation
    • Message passing
  6. Types of clusters
    • High availability
    • Load balancing
    • High reliability
    • Compute clusters

Compute clusters and MPI

  1. History and purpose of the standard
  2. Common program structure
  3. Message passing
    • Blocking
    • Non-Blocking
    • Persistent (deferred) requests
    • Deadlocks
  4. Process interaction
    • Groups and communicators
    • Collective operations
    • Virtual topologies
  5. Profilers
    • Jumpshot
    • Intel® Trace Analyzer and Intel® Trace Collector

Consensus

  1. Linearizability
  2. Atomic registers
  3. Consensus number
    • Consensus number of RMW registers
    • Universality of CAS operations
  4. Atomic snapshots of registers
    • Lock-free algorithm
    • Wait-free algorithm

Patterns of parallel programming

  1. Task-based decomposition
  2. Geometric decomposition
  3. Recursive data
  4. Programming models
    • SPMD
    • Parallel loops
    • Boss/Worker
    • Pipeline
  5. Parallel algorithms in the STL

GRID-systems

  1. The GRID concept
  2. Scope of applicability
  3. Architectural requirements
  4. Cloud computing

Additional topics

  1. Continuous integration systems (maven + continuum)
  2. RPM building for different architectures (mock)
  3. Application servers (WSDL + XSD + SOAP...)
  4. Lambda expressions (C++)
  5. Compiler optimizations