1 Purpose and Background
This project is designed to give you experience in writing multi-threaded programs by implementing a simplified MapReduce-style wordcount application. By working on thisproject:
• You will learn to write multi-threaded code that correctly deals with race conditions.
• You will carry out a simple performance evaluation to examine the performance impact of (i) the degree of parallelism in the mapper stage and (ii) the size of the shared buffer which the two stages of your application will use to communicate.
The wordcount application takes as input a text file and produces as output the counts for all uniquely occurring words in the input file arranged in an alphabetically increasing order. We will assume that the words within our input files will only contain letters of the English alphabet and the digits 0-9 (i.e., no punctuation marks or other special characters). Our wordcount will consist of two stages. The first stage, called “mapper,”reads individual words and produces key-value pairs of the form (word, 1). The second stage, called the “reducer” consumes these, groups them by key, and sums up the counts in each group to produce the final output. Notice how the first stage can be parallelized. That is, the mapper stage can consist of multiple mapper threads, each working on its own separate portion of the input file. The reducer stage only contains a single thread which runs concurrently with the mapper threads. The communication between the mapper threads and the reducer thread occurs via a shared in-memory FIFO buffer.
Figure 1 summarizes these ideas.
2 Starter code: unsynchronized word-count
We are providing you starter code that implements a preliminary version of wordcount that works correctly in the absence of multi-threading. To appreciate this, you should confirm that this code is able to pass the “serialize” test in which accesses to the buffer occur in a strictly serial manner (i.e, an operation is issued only upon the completion of the previous operation). However, our implementation is not thread-safe. That is, it has race conditions. You will need to enhance this code to create a thread-safe version of wordcount.
3What you need to implement
Take the time to skim all the starter source code files. All your implementation will be confined to the files helper.c and helper . h. Specifically, you need to consider enhancing the following functions (recall that the provided versions of these functions work correctly in the absence of multi-threding). Though we have implemented the buffer_ create function for you, you can enhance it to make any kind of initialization.
●state_ t★buffer_ create (int capacity): creates an in-memory buffer with the specified capacity in bytes). Returns a pointer to state_ t, see definition in helper . h. This function will be called once at the beginning, you can do any kind of initialization in it based on your implementation.
●enum buffer_ status buffer_ send(state_ t *buffer, void* data): writes data to the buffer. In case the buffer is full, the function waits till the buffer has space to write the new data. Returns BUFFER_ SUCCESS for successfully writing data to the buffer, CLOSED_ ERROR if the buffer is closed, and BUFFER_ ERROR on encountering any other error. The size of data can be found out using the function get_ msg_ size (data).
●enum buffer_ status buffer_ receive (state_ t★buffer, void** data):Reads data from the given buffer and stores it in the function’s input parameter, data (Note that it is a double pointer). This is a blocking call i.e, the function only returns on a successful completion of receive In case the buffer is empty, the function waits till the buffer has some data to read.
●enum buffer_ status buffer_ close (state_ t★buffer): closes the buffer and informs (you may think of giving signal) all the blocking send/ receive calls to return with CLOSED_ ERROR. Once the buffer is closed, send/ receive operations will return CLOSED_ ERROR. Returns BUFFER_ SUCCESS if close is successful, CLOSED_ ERROR if the buffer is already closed, and BUFFER_ ERROR for other errors.
●enum buffer_ status buffer_ destroy (state_ t★buffer) Frees all the memory allocated to the buffer , using own version of sem fags The caller is responsible for calling buffer_ _close and waiting for all threads to finish their tasks before calling buffer_ dest roy Returns BUFFER_ SUCCESS if destroy is successful, DESTROY_ ERROR if buffer_ destroy is called on an open buffer,and BUFFER_ ERROR in any other error case
4 Programming rules
You are not allowed to take any of the following approaches to complete the assignment:
●Spinning in a polling loop to implement blocking calls.
●Sleeping instead of using condition-wait.
●Trying to change the timing of your code to overcome race conditions.
●Using global variables.
You are only allowed to use the pthreads library, the standard C library functions (e.g.,malloc/ free), and the provided starter code. If you think you have a valid reason to use some code outside of these, please contact the course staff to determine its eligibility.