2.1 Data Parallelism
2.2 CUDA C Program structure
2.3 A vector addition kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Calling Kernel Functions
2.7 Compilation
2.1 Data Parallelism
- Explain why data parallelism is the primary source of scalability in parallel programs. How does it allow performance to improve with newer hardware generations?
- Compare and contrast the computational dependencies in image blurring versus color-to-grayscale conversion. Why is one inherently more parallelizable than the other?
- What is the difference between task parallelism and data parallelism?
- Given a modern GPU with thousands of CUDA cores and a CPU with 16 cores, how would you allocate work for a color-to-grayscale conversion task? How does your strategy change if you were performing a task-parallel molecular dynamics simulation?
- In a hybrid parallel program that employs both data parallelism and task parallelism, what are the five major factors that determine the efficiency of workload distribution across different processing units?
- Suppose you are designing a real-time video processing system that applies both grayscale conversion and motion tracking. Which type of parallelism should be used for each operation, and why?
- What are the potential challenges in ensuring correctness in a data-parallel grayscale conversion algorithm when implemented on a massively parallel system?
2.2 CUDA C Program structure