
NAME: Brian Be
EMAIL: bebrian458@gmail.com
ID: 204612203

// Slip days
I will be using one of my 2 remaining slip days.

// Brief Overview of each file
lab2_list.c
	A C program that tests the power and the pitfalls of multithreading when updating one or more
	shared linked lists. The operations exercised include insertion, deletion, and key lookup.
SortedList.c
	A C module that implements the insertion, deletion, and lookup routines used by
	lab2_list.c.
SortedList.h
	A header file that declares all of the functions and external variables for the
	SortedList module (an interface sketch follows this file list).
Makefile
	default:	compiles all programs
	profile:	runs the tests under a profiling tool to generate an execution profiling report
	tests:		runs over 200 specified test cases to generate CSV results
	graphs:		uses the supplied CSV results to create graphs with the provided scripts
	dist:		creates the deliverable tarball
	clean:		deletes all generated programs and output from this Makefile

lab2b_1.png
	throughput vs number of threads for mutex and spin-lock synchronized list operations.
lab2b_2.png
	mean time per mutex wait and mean time per operation for mutex-synchronized list operations.
lab2b_3.png
	successful iterations vs threads for each synchronization method.
lab2b_4.png
	throughput vs number of threads for mutex synchronized partitioned lists.
lab2b_5.png
	throughput vs number of threads for spin-lock-synchronized partitioned lists.
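
For reference, the SortedList interface declared in SortedList.h looks roughly like the sketch below;
it follows the usual spec for this lab, so treat the exact names and yield flags as assumptions rather
than a verbatim copy of the header.

	/* Illustrative sketch of the SortedList.h interface (assumed names). */
	struct SortedListElement {
		struct SortedListElement *prev;
		struct SortedListElement *next;
		const char *key;            /* a NULL key marks the list head */
	};
	typedef struct SortedListElement SortedList_t;
	typedef struct SortedListElement SortedListElement_t;

	void SortedList_insert(SortedList_t *list, SortedListElement_t *element);
	int  SortedList_delete(SortedListElement_t *element);
	SortedListElement_t *SortedList_lookup(SortedList_t *list, const char *key);
	int  SortedList_length(SortedList_t *list);

	/* set by lab2_list.c to force yields inside critical sections */
	extern int opt_yield;
	#define INSERT_YIELD 0x01
	#define DELETE_YIELD 0x02
	#define LOOKUP_YIELD 0x04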

// Questions
Q 2.3.1 - Cycles in the basic list implementation:
Where do you believe most of the cycles are spent in the 1 and 2-thread list tests?
Why do you believe these to be the most expensive parts of the code?
Where do you believe most of the time/cycles are being spent in the high-thread spin-lock tests?
Where do you believe most of the time/cycles are being spent in the high-thread mutex tests?

I think that most of the cycles in the 1 and 2-thread tests are spent in the list operations themselves.
With so few threads there is very little contention, so there is little to no wait time or context switching.
The list operations, by contrast, are expensive: each one must iterate through many nodes and perform many
string comparisons for key matching.
In the high-thread spin-lock tests, I think most of the cycles are spent on wasteful spinning in the while
loop as each thread repeatedly tests the lock (see the sketch below).
In the high-thread mutex tests, I think most of the cycles are spent on context switching as threads go to
sleep and their registers must be saved and restored.
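
As a concrete illustration, a minimal test-and-set spin-lock of the kind described above could look like
the sketch below, using the GCC __sync atomic builtins (the variable and function names are assumptions,
not necessarily those in lab2_list.c):

	static int spin_lock = 0;       /* 0 = free, 1 = held (assumed name) */

	void lock(void) {
		/* Atomically set the lock to 1 and test its old value.
		 * If the old value was 1, another thread holds the lock,
		 * so keep burning CPU cycles here until it is released. */
		while (__sync_lock_test_and_set(&spin_lock, 1))
			;
	}

	void unlock(void) {
		__sync_lock_release(&spin_lock);    /* atomically set back to 0 */
	}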

Q 2.3.2 - Execution Profiling:
Where (what lines of code) are consuming most of the cycles when the spin-lock version of the list exerciser is run with a large number of threads?
Why does this operation become so expensive with large numbers of threads?

Looking at the gperftools profile, the lines that test the spin-lock consume most of the cycles.
Each spin-lock check makes a thread spin until it acquires the lock, or until it gets switched out for
another thread. The more threads there are, the more contention for the lock, and as a result the lower
the chance for each thread to acquire it, leaving threads spinning for a long time. This means that many
CPU cycles are wasted on constant spinning.

Q 2.3.3 - Mutex Wait Time:
Look at the average time per operation (vs # threads) and the average wait-for-mutex time (vs #threads).
	Why does the average lock-wait time rise so dramatically with the number of contending threads?
	Why does the completion time per operation rise (less dramatically) with the number of contending threads?
	How is it possible for the wait time per operation to go up faster (or higher) than the completion time per operation?

The more threads compete for a single lock, the lower the chance that any given thread acquires it on its
next attempt, so each thread has to wait longer and longer to grab the lock.
The completion time counts the elapsed wall time for the whole run; it does not add up each thread's
individual waiting. While one thread sleeps on the mutex, its wait clock keeps running, but another thread
is meanwhile doing useful work on the CPU. Because the per-thread wait times overlap in wall time, the
summed wait time per operation can grow much faster (and higher) than the completion time per operation.
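
For context, the per-thread wait time in question can be measured by timing the lock acquisition itself,
roughly as sketched below with clock_gettime and CLOCK_MONOTONIC (the accumulator name is an assumption):

	#include <pthread.h>
	#include <time.h>

	static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;
	static long long total_wait_ns;     /* summed across operations (assumed name) */

	void timed_lock(void) {
		struct timespec start, end;
		clock_gettime(CLOCK_MONOTONIC, &start);
		pthread_mutex_lock(&list_mutex);    /* thread may sleep here */
		clock_gettime(CLOCK_MONOTONIC, &end);
		/* The gap between requesting and obtaining the mutex is the wait
		 * time; safe to accumulate here because we now hold the mutex. */
		total_wait_ns += (end.tv_sec - start.tv_sec) * 1000000000LL
		               + (end.tv_nsec - start.tv_nsec);
	}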


Q 2.3.4 - Performance of Partitioned Lists:
Explain the change in performance of the synchronized methods as a function of the number of lists.
Should the throughput continue increasing as the number of lists is further increased? If not, explain why not.
It seems reasonable to suggest the throughput of an N-way partitioned list should be equivalent to the throughput of a single list with fewer (1/N) threads. Does this appear to be true in the above curves? If not, explain why not.

The performance of the synchronized methods increases as the number of lists increases, because there is
less contention between threads (meaning less waiting) and fewer iterations per sublist to insert or
delete a node.
There will come a point where there are so many lists that the chance of contention is close to 0, and
each thread already works full time at maximum throughput; adding more lists beyond that cannot help, so
there is a limit to the increase in throughput.
No, the throughput of an N-way partitioned list should not be equivalent to the throughput of a single
list with fewer (1/N) threads. A partitioned list reduces contention for each sublist's lock and also
reduces the number of nodes each thread must traverse to insert or delete one. With a single list, every
thread must acquire the one lock in turn, which slows down performance.
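
A sketch of the partitioning this answer assumes: each key is hashed with the djb2 function cited under
Research, and the hash modulo the number of lists selects the sublist, each of which is guarded by its
own mutex. The struct and variable names here are illustrative, not the exact ones in lab2_list.c.

	#include <pthread.h>

	#define NLISTS 4        /* illustrative; the real count comes from the command line */

	struct sublist {
		pthread_mutex_t lock;       /* one lock per sublist cuts contention */
		/* ... head of this sublist's sorted list ... */
	};
	static struct sublist lists[NLISTS];

	/* djb2 string hash (http://www.cse.yorku.ca/~oz/hash.html) */
	static unsigned long hash(const char *str) {
		unsigned long h = 5381;
		int c;
		while ((c = *str++))
			h = ((h << 5) + h) + c;     /* h = h * 33 + c */
		return h;
	}

	/* A given key always maps to the same sublist, so threads working
	 * on different sublists never contend for the same lock. */
	static struct sublist *pick_sublist(const char *key) {
		return &lists[hash(key) % NLISTS];
	}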

// Research
Used the TA PowerPoint slides and Arpaci-Dusseau's chapters as references for the lock code.
Used http://www.tek-tips.com/viewthread.cfm?qid=1259127 for ideas on generating random keys.
Used Stack Overflow to check C syntax.
Used http://www.cse.yorku.ca/~oz/hash.html as the source for the hash function.
