TIC -- Lecture 6: Clustered Architectures

1. Who are you?
   Select one: Ali, Charlotta, Dan, ErikB, ErikH, Fredrik, Guillaume, HåkanS, HåkanZ, HenrikJ, HenrikL, Johann, Lars, Magnus, MartinK, MartinT, Mats, Mikael, Per, Simon, Tomas, Zhonghai, Zoran

2. Which of the following statements are true for paper P4 (Canal)?
   1 Inter-cluster bypasses are cheaper than local bypasses.
   2 For good-performing schemes the number of communication buses between clusters is not important.
   3 Balance is more important than communication latency for the simulated tests.
   4 Assuming every instruction has its registers allocated in its local cluster, the simple RMBS scheme would perform best.
   5 According to the authors, new steering schemes will be needed in the future (because of higher latencies).

3. Which of the following statements are true for paper P5 (Parcerisa)?
   1 In current technology the total latency of a communication is dominated by contention delay.
   2 In a 4-cluster 8-issue system with a partially synchronous ring, the average number of copy instructions executed every cycle is about 0.5 if the IPC is 3.
   3 In the baseline steering heuristic the second criterion is used more often than it is used in the extended heuristic.
   4 When forwarding a message, each router in the network chooses the first neighbor it finds that minimizes the path length.
   5 In the 8-cluster topologies studied in the paper, nodes with 4 neighbors require larger buffers than nodes with 3 neighbors.

4. Which of the following statements are true for paper P6 (Tseng)?
   1 When going from only one read and write port per bank to two read and write ports per bank, the extra write port adds more to the area than the read port (due to its larger decoder area).
   2 Only one extra pipeline stage is needed (Arbitrate), but the solution requires extra logic in the issue stage.
   3 The number of logical register ports increases quadratically with issue width.
   4 Control logic is a crucial issue in avoiding bank conflicts; therefore it must be designed without adding severe complexity and latency to the pipeline.
   5 Bank conflicts can be significantly reduced using both read sharing and the bypass network.

5. Rate paper P5
   1 Was it easy to read the paper?
     Select one: no -- not at all / no -- only marginally so / neutral / yes -- to some extent / yes -- very much
   2 Is the paper technically sound (for the time it was written)?
     Select one: no -- not at all / no -- only marginally so / neutral / yes -- to some extent / yes -- very much
   3 How do you rate the overall presentation?
     Select one: bad / not good / average / pretty good / very good
   4 Any short suggestions for improvements?

6. Submit at least two issues to discuss at the meeting
   1 Issue 1
   2 Issue 2
   3 Issue 3
   4 Issue 4

Please print a copy of your form and bring it to the next meeting.

<eh@it.uu.se>