L9 OpenCL

Parallel programming with OpenCL-1.2

Objectives
  • Learn parallel programming with OpenCL.
  • Know what (not) to expect from parallel programming.
  • Understand heavy multithreading and how it is mapped to the hardware.
  • Measure OpenCL code performance, locate and solve bottlenecks.
  • Write efficient OpenCL code.
Depending on the hardware available, exercises are run on multi-core CPUs or on nVidia or ATI GPUs.
Course environment
  • One Windows PC for every two trainees, with one of:
    • Intel OpenCL SDK (needs a recent CPU, Core i3 or better, and Windows 7)
    • nVidia SDK (needs a recent workstation-class nVidia graphics card)
    • ATI SDK (needs a recent workstation-class ATI graphics card)
Note:  For on-site training sessions, contact us to check the configuration required for the PCs used during hands-on labs.
Pre-requisites
  • Good knowledge of the C language

First day
Introduction to OpenCL
  • History
    • OpenCL 1.0
    • OpenCL 1.1
    • OpenCL 1.2
    • OpenCL/EP (Embedded Profile)
  • Design goals of OpenCL
    • CPUs, GPUs and GPGPUs
    • Data-parallel and Task-parallel
    • Hardware related and portable
  • Terminology
    • Host / Device
    • Memory model
    • Execution Model
The OpenCL Architecture
  • The OpenCL Architecture
    • Platform Model
    • Execution Model
    • Memory Model
    • Programming Model
  • The OpenCL Software Stack
  • Example
Exercise:  Installation and test of the OpenCL SDK
The OpenCL Host API
  • Platform layer
    • Querying and selecting devices
    • Managing compute devices
    • Managing computing contexts and queues
    • The host objects: program, kernel, buffer, image
Exercise:  Write a platform discovery and analysis program (displaying CPUs, GPUs, versions...)
  • Runtime
    • Managing resources
    • Managing memory domains
    • Executing compute kernels
Exercise:  Write an image loader program, transferring image to/from compute devices
  • Compiler
    • The OpenCL C programming language
    • Online compilation
    • Offline compilation
Second day
The Basic OpenCL Execution Model
  • How code is executed on hardware
    • Compute kernel
    • Compute program
    • Application queues
  • OpenCL Data-parallel execution
    • N-dimensional computation domains
    • Work-items and work-groups
    • Synchronization and communication in a work-group
    • Mapping global work size to work-groups
    • Parallel execution of work-groups
Exercise:  Compile and execute a program to square an array on the platform computing nodes
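As an illustration of how a global work size maps onto work-groups, the square exercise can be sketched in plain C without any OpenCL SDK. This is only a host-side emulation under stated assumptions: the names `round_up_global`, `square_item` and `run_square` are hypothetical, the loops stand in for work-items that a real device would run in parallel, and in OpenCL C the index would come from `get_global_id(0)` inside a `__kernel` function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round the problem size up to a multiple of the
   work-group size. In OpenCL 1.x an explicit local size must divide the
   global size evenly, so hosts commonly pad the global size like this. */
size_t round_up_global(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}

/* What a single work-item does. In OpenCL C this would be the body of a
   __kernel function and gid would come from get_global_id(0). */
static void square_item(size_t gid, const float *in, float *out, size_t n)
{
    if (gid < n)                 /* padding work-items do nothing */
        out[gid] = in[gid] * in[gid];
}

/* Plain-C emulation of a 1-D NDRange: the outer loop walks work-groups,
   the inner loop walks the work-items of one group. On a real device
   these iterations run in parallel, in no guaranteed order.
   Returns the number of work-groups that were "launched". */
size_t run_square(const float *in, float *out, size_t n, size_t local_size)
{
    size_t global_size = round_up_global(n, local_size);
    size_t num_groups = global_size / local_size;

    for (size_t group = 0; group < num_groups; ++group)
        for (size_t lid = 0; lid < local_size; ++lid)
            square_item(group * local_size + lid, in, out, n);

    return num_groups;
}
```

With 5 elements and a work-group size of 4, the global size is padded to 8, two work-groups are emulated, and the bounds check in `square_item` keeps the three padding work-items from touching memory.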
The OpenCL Programming Language
  • Restrictions from C99
  • Data types
    • Scalar
    • Vector
    • Structs and pointers
    • Type-conversion functions
    • Image types
Exercise:  Rewrite the square program to use vector operations
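A possible shape for the vector rewrite, again emulated in plain C so that it runs without an OpenCL device. The struct `float4_t` and the helpers `mul4` and `square_vec4` are assumptions made for this sketch; OpenCL C has a built-in `float4` type on which `a * b` already multiplies component-wise.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for OpenCL C's built-in float4 vector type (assumption for
   this host-only sketch). */
typedef struct { float x, y, z, w; } float4_t;

/* Component-wise multiply, which OpenCL C writes simply as a * b. */
static float4_t mul4(float4_t a, float4_t b)
{
    float4_t r = { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
    return r;
}

/* Each loop iteration plays the role of one work-item that handles four
   floats at once; n4 is the number of 4-element vectors in the arrays. */
void square_vec4(const float4_t *in, float4_t *out, size_t n4)
{
    assert(in != NULL && out != NULL);
    for (size_t gid = 0; gid < n4; ++gid)
        out[gid] = mul4(in[gid], in[gid]);
}
```

Compared with the scalar version, the number of work-items drops by a factor of four, so the array length has to be a multiple of four or the tail needs separate scalar handling.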
  • Required built-in functions
    • Work-item functions
    • Math and relational
    • Input/output
    • Geometric functions
    • Synchronization
  • Optional features
    • Atomics
    • Rounding modes
Exercise:  Write and execute an image manipulation program (Blur filter)
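One way to picture the blur exercise is a 3x3 box filter in which every output pixel is computed independently, which is what makes it a natural fit for one work-item per pixel. The sketch below is plain C rather than an OpenCL kernel, assumes an 8-bit grayscale image stored row by row, and uses hypothetical names (`clamp_int`, `blur_pixel`, `box_blur3x3`); the course's own filter may differ.

```c
#include <assert.h>

/* Clamp v into [lo, hi]; used to repeat edge pixels at the border,
   similar in spirit to a clamp-to-edge image sampler. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* What one work-item at (x, y) would compute: the integer mean of the
   3x3 neighbourhood around the pixel. */
static unsigned char blur_pixel(const unsigned char *src,
                                int width, int height, int x, int y)
{
    int sum = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int sx = clamp_int(x + dx, 0, width - 1);
            int sy = clamp_int(y + dy, 0, height - 1);
            sum += src[sy * width + sx];
        }
    }
    return (unsigned char)(sum / 9);
}

/* Emulates a 2-D NDRange of width x height work-items with two loops.
   src and dst must not overlap, since every pixel reads its neighbours. */
void box_blur3x3(const unsigned char *src, unsigned char *dst,
                 int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dst[y * width + x] = blur_pixel(src, width, height, x, y);
}
```

Because each work-item reads nine neighbouring pixels, adjacent work-items load overlapping data, which is the access pattern that local-memory optimisations later target.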
Third day
Advanced OpenCL Execution modes
  • Profiling
Exercise:  Enhance the image manipulation program to measure kernel computation time
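For the timing itself, OpenCL reports event timestamps as 64-bit nanosecond counters: with `CL_QUEUE_PROFILING_ENABLE` set on the command queue, `clGetEventProfilingInfo` can return `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` for a kernel's event. The small helper below only covers the arithmetic on those two values; `elapsed_ms` is a hypothetical name, and fetching the timestamps from a real event is left out so that the sketch runs without an OpenCL SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a pair of nanosecond timestamps (as OpenCL event profiling
   reports them) into an elapsed time in milliseconds. */
double elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);   /* END must not precede START */
    return (double)(end_ns - start_ns) / 1.0e6;
}
```

The difference measures device execution of the kernel only; time spent queued or transferring data is reported by other events and counters.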
  • The OpenCL Memory Model
    • Global Memory
    • Local Memory
    • Private Memory
  • OpenCL Task-parallel execution
    • Optional OpenCL feature
    • Native work-items
Exercise:  Simulate the N-Body problem, displaying data using OpenGL
Efficient OpenCL
  • When (not) to use OpenCL
  • Code design guidelines
  • Explicit vectorization
Exercise:  Explore vectorization on an image rotation kernel
  • Memory latency and access patterns
    • ALU latency
    • Using local memory
Exercise:  Enhance the Blur filter program to investigate memory optimizations
  • Synchronizing threads
  • Warps/Wavefronts, work groups, and GPU cores