
QUESTION

Homework 1 CS 520 Fall 2014 (each question has 25 points) Total 100 points 1. Devise formulas for the functions that calculate my_first_i and...

We have implicitly assumed that each call to Compute_next_value(), or to a similar function, requires roughly the same amount of work as every other call, which is why a block partition among the cores was adequate. How would you change your partition approach from the preceding question if the call with i = k requires k + 1 times as much work as the call with i = 0? For example, if the first call (i = 0) requires 2 milliseconds, the second call (i = 1) requires 4, the third (i = 2) requires 6, and so on. Show an approach that achieves better load balancing than the block partition, using p = 5 as the total number of cores and n = 23 (i = 0, 1, 2, ..., 22) as an example. (Hint: try to come up with a revised approach that improves on simple cyclic assignment, and compare the total work among the different cores.)

Try to write pseudo-code for the tree-structured global sum illustrated in the slide. We can use C’s bitwise operators to implement the tree-structured global sum. To see how this works, it helps to write down the binary (base 2) representation of each of the core ranks and note the pairings during each stage. From the table we see that during the first stage each core is paired with the core whose rank differs in the rightmost (first) bit. During the second stage, the cores that continue are paired with the core whose rank differs in the second bit, and during the third stage they are paired with the core whose rank differs in the third bit. Thus, if we have a binary value bitmask that is 001₂ for the first stage, 010₂ for the second, and 100₂ for the third, we can get the rank of the core we are paired with by “inverting” the bit in our rank that is nonzero in bitmask. This can be done using the bitwise exclusive or (^) operator. Implement this algorithm in pseudo-code using the bitwise exclusive or and the left-shift operator.

Derive formulas for the number of receives and additions that core 0 carries out using
a. The original pseudo-code for a global sum (every other core sends its partial sum to core 0, and core 0 does all the summation), and
b. The tree-structured global sum.
c. Make a table showing the numbers of receives and additions carried out by core 0 when the two sums are used with 2, 4, 8, ..., 1024 cores.
