The goal of this project is to use your understanding of the parallel computing resources in a multicore
microprocessor to optimize two fully functional applications. The applications are Matrix Multiply and
K-Means Clustering.
For a functional description of the applications, please refer to:
http://en.wikipedia.org/wiki/Matrix_Multiplication
http://en.wikipedia.org/wiki/K-Means
The code optimization techniques you may want to consider include:
• Cache blocking
• OpenMP pragma-based optimizations
o omp parallel
o omp for
o omp atomic
o reduction (a clause on omp parallel / omp for)
• Intrinsics Programming
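As an illustration of how two of the techniques above can be combined, the following is a minimal sketch of a cache-blocked matrix multiply parallelized with omp parallel for, plus a correctness check that uses a reduction clause. The function names, the row-major flat-vector layout, and the block size parameter bs are assumptions made for this sketch; the provided matrix_mul.cpp may be organized differently, and a good block size depends on the cache sizes of the target machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// All names below are hypothetical. Matrices are square, row-major, and stored
// in flat vectors: element (i, j) of an n x n matrix m is m[i * n + j].

// Reference triple loop, used only to check the blocked version.
void matmul_naive(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Cache-blocked multiply. The work is split into bs x bs tiles so that the
// tiles of A, B and C being touched can stay resident in cache. The parallel
// loop runs over row-blocks of C, so no two threads write the same element
// and no omp atomic is needed.
void matmul_blocked(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c, int n, int bs) {
    assert(bs > 0);
    std::fill(c.begin(), c.end(), 0.0f);
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < n; ii += bs)
        for (int kk = 0; kk < n; kk += bs)
            for (int jj = 0; jj < n; jj += bs) {
                const int i_end = std::min(ii + bs, n);
                const int k_end = std::min(kk + bs, n);
                const int j_end = std::min(jj + bs, n);
                for (int i = ii; i < i_end; ++i)
                    for (int k = kk; k < k_end; ++k) {
                        const float aik = a[i * n + k];
                        // Unit-stride inner loop over j: a candidate for
                        // compiler auto-vectorization or hand-written intrinsics.
                        for (int j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}

// Sum of absolute element-wise differences, accumulated with a reduction
// clause so each thread keeps a private partial sum.
double abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    double err = 0.0;
    const long long len = static_cast<long long>(x.size());
    #pragma omp parallel for reduction(+ : err)
    for (long long i = 0; i < len; ++i) err += std::fabs(x[i] - y[i]);
    return err;
}
```

Built without OpenMP support, the pragmas are ignored and the code runs serially with the same results. Because blocking changes the order of floating-point additions, a comparison against the naive version on real data should generally use a small tolerance rather than exact equality.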
Grading Criteria
• 30% – Correctness – Correctness of the results (program output)
• 30% – Performance
o matrix_mul/omp/matrix_mul.cpp (Matrix Multiply):
Achieve at least a 2x speedup over the current OpenMP version (measured as the SUM of the
run times of all test cases in matrix_mul_03.dat)
o kmeans/omp_kmeans.c (K-means):
Achieve at least a 5x speedup over the current OpenMP version (measured as the SUM of the
run times of all tests)
• 30% – Write-up – For each performance optimization explored, describe clearly:
o How the optimization achieves its speedup
o What the expected speedup is
o What the observed speedup is
o An explanation of any difference between the expected and observed speedups
• 10% – Code quality – Good coding practices and well commented code
Guidelines for the write-up:
At least one 8.5×11-inch page per optimization. The write-up should include:
• Optimization goal:
o Which hardware resource is the optimization targeting (cache, SIMD, multicore)?
o What is the specification of the hardware you are optimizing for?
• Optimization process:
o Data considerations
o Parallelization considerations
• Optimization results:
o Performance before optimization
o Performance after optimization
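The before and after measurements, and the expected versus observed speedup asked for in the grading criteria, could be computed along the following lines. This is only a sketch under stated assumptions: the helper names are hypothetical, and Amdahl's law is used here as one common model for an expected multicore speedup, not something the assignment prescribes.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: wall-clock seconds taken by one call to fn.
template <typename F>
double time_seconds(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Observed speedup: run time before the optimization divided by run time after.
double observed_speedup(double before_s, double after_s) {
    assert(before_s > 0.0 && after_s > 0.0);
    return before_s / after_s;
}

// Expected speedup under Amdahl's law (an assumed model): parallel_fraction is
// the share of the original run time that the optimization parallelizes, and
// threads is the number of cores it is spread across.
double amdahl_speedup(double parallel_fraction, int threads) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0 && threads > 0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}
```

For example, if 90% of the original run time is in a loop that is parallelized across 4 cores, this model predicts a speedup of 1 / (0.1 + 0.9 / 4), roughly 3.1x. A gap between that figure and the measured ratio is what the write-up is asked to explain (memory bandwidth limits, load imbalance, and threading overhead are typical candidates). Timing each configuration several times and reporting the spread makes the comparison more trustworthy.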
The three teams with the fastest implementations will present the techniques they attempted in a
10-minute presentation during the project review session.