We live in an era where most computers possess multiple computing units, and where parallelization is key. In particular, GPGPUs (General-Purpose Graphics Processing Units) are built for massive parallelism, and they have risen to prominence as they are now used for many scientific tasks, such as physics and biological simulations, statistical inference, and machine learning.
In this crash course we will focus on CUDA as well as several related GPU programming interfaces, including OpenMP GPU offloading and Python APIs. Through concrete examples we will describe the principles at the core of a successful parallelization attempt.
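To give a first taste of the programming model discussed throughout the course, here is a minimal sketch of a CUDA program (the kernel name `vecAdd` and the array sizes are illustrative choices, not taken from the text): each GPU thread handles one array element, which is the simplest expression of massive parallelism.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements; parallelism comes from
// launching roughly as many threads as there are elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overshoot
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;              // one million elements
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    // Managed (unified) memory keeps the sketch short: the runtime
    // migrates data between host and device as needed.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;                              // threads per block
    int blocks = (n + threads - 1) / threads;       // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(a, b, c, n);        // launch on the GPU
    cudaDeviceSynchronize();                        // wait for the kernel

    printf("c[0] = %f\n", c[0]);                    // expect 3.0

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch configuration is the CUDA-specific part: it tells the runtime how many thread blocks to spawn, and each thread computes its own index from `blockIdx`, `blockDim`, and `threadIdx`.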