2.1 Data Parallelism
2.2 CUDA C Program structure
2.3 A vector addition kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Calling Kernel Functions
2.7 Compilation
2.1 Data Parallelism
- Explain why data parallelism is the primary source of scalability in parallel programs. How does it allow performance to improve with newer hardware generations?
- Compare and contrast the computational dependencies in image blurring versus color-to-grayscale conversion. Why is one inherently more parallelizable than the other?
- What is the difference between task parallelism and data parallelism?
- Given a modern GPU with thousands of CUDA cores and a CPU with 16 cores, how would you allocate work for a color-to-grayscale conversion task? How does your strategy change if you were performing a task-parallel molecular dynamics simulation?
- In a hybrid parallel program that employs both data parallelism and task parallelism, what are the five major factors that determine the efficiency of workload distribution across different processing units?
- Suppose you are designing a real-time video processing system that applies both grayscale conversion and motion tracking. Which type of parallelism should be used for each operation, and why?
- What are the potential challenges in ensuring correctness in a data-parallel grayscale conversion algorithm when implemented on a massively parallel system?
2.2 CUDA C Program structure