Thread Lab – Multithreading Using pthreads
In this lab, you will implement several versions of a code which computes a histogram of a set
of random numbers. While seemingly simple, there are many possible optimizations. Also,
this is a key kernel in a variety of algorithms (eg. radix sort).
More specifically, you are given 4 tasks that help you through accelerating the computation
of a histogram using multiple threads. Completing all the tasks is mandatory. Please note
that for each task, you must follow its guideline.
You are given a tarball that contains the following files:
A skeleton of the lab is provided for you. util.c spawns NTHREADS threads for you, and
calls the various kernels which you are to implement. You should leave this file untouched.
You will only edit the specified sections in thread.c. For each kernel (task), we describe an
idea of how to parallelize that you should follow, along with some hints. thread.h contains the
function headers, constants and global variables. You can use Makefile to compile or turn in
• Task 1: The given code divides the data into blocks, one for each thread. However,
there is a race! Use a global lock to have a thread-safe function.
• Task 2: One problem with the previous implementation is that many threads access
the lock at the same time. One possible solution is to have one lock per histogram
bucket. Add a bucket-specific lock and modify the thread routine.
• Task 3: The X86 ISA (and others) have a special instruction to perform a single
increment. Use atomic increment and modify the thread routine.
• Task 4: For this task, you should eliminate fine-grain synchronization to get speedup.
One way to do this is using the local histogram-based method. You can also try your
own optimizations. The goal is to get the fastest implementation.
A more detailed description of each task is given in thread.c.
You should make sure that each of the tasks produce correct result. You will receive 10 points
for each of the tasks 1, 2, and 3 if you complete correctly and 20 points for task 4. To get the
correctness grade, you should follow the guidelines given to you. The TAs will manually check
your codes to see whether you have followed them. You are given a runtime range for each of
the tasks. It’s based on the past submissions for this lab and just for your own information.
You’re not obliged to fall in that range.
Your performance will be evaluated based on the performance of task 4. The performance
point will be added only if you have the correct result, so be sure to prioritize the correctness
over performance. The sequential version runs at 125ms, you will get the performance grade
based on the speedup you can achieve. The points will be assigned as follows:
• Higher than 4× speedup: 50% + 10% (bonus)
• Range A runtime (2× ≤ speedup ≤ 4×): 50%
• Range B runtime (1× < speedup < 2×): 30%
• No faster than the sequential version: 0%