UU | IT | DoCS | eh
TIC -- Lecture 4: Finding the threads

1. Who are you?
   Options: Ali, Charlotta, Dan, ErikB, ErikH, Fredrik, Guillaume, HåkanS, HåkanZ, HenrikJ, HenrikL, Johann, Lars, Magnus, MartinK, MartinT, Mats, Mikael, Per, Simon, Tomas, Zhonghai, Zoran

2. Which of the following statements are true for paper P0 (Collins)?
   1. Most static loads are delinquent loads
   2. SPEC MCF has mostly capacity misses
   3. The main thread spawns most speculative slices
   4. A tunable maximum-spawn-depth register controls overly aggressive speculative precomputation
   5. SP performance gets better with more thread contexts
   6. SP improves performance of applications with chains of dependent loads
   7. The performance of GZIP does not improve because the memory access pattern varies too much
   8. The authors claim that SP also improves branch prediction
   9. The method uses run-time profiling to find p-slices and triggers

3. Check the TRUE statements for P1 (Tullsen):
   1. The fetch policy ICOUNT used in this paper makes sure that the threads fetch instructions in a round-robin fashion
   2. A single thread with poor memory utilization can make other threads execute slower since it occupies resources in the processor
   3. A pipeline flush on a long-latency stall is especially preferable when an SMT processor executes one thread with poor memory utilization and another thread with good memory utilization
   4. Stalling slow threads makes them stay longer on the machine and hence makes the overall performance of the system worse
   5. According to the paper, a pipeline flush initiated by a long-latency stall always clears the rename registers

4. Which of these statements about P2 (Steffan) are true?
   1. Thread-Level Speculation (TLS) needs compiler support
   2. TLS can hurt any locality optimizations present in the applications
   3. TLS does not need any special hardware
   4. The "ownership required buffer" (ORB) stores all cache lines that are in the speculatively shared state
   5. Allowing multiple writers to shared cache lines proved to be very efficient in the TLS setting
   6. The proposed TLS implementation proved to be more efficient for multi-node architectures

5. Rate paper P0
   1. Was it easy to read the paper?
      Options: no -- not at all / no -- only marginally so / neutral / yes -- to some extent / yes -- very much
   2. Is the paper technically sound (for the time it was written)?
      Options: no -- not at all / no -- only marginally so / neutral / yes -- to some extent / yes -- very much
   3. How do you rate the overall presentation?
      Options: bad / not good / average / pretty good / very good
   4. Any short suggestions for improvements?

6. Submit at least two issues to discuss at the meeting
   1. Issue 1
   2. Issue 2
   3. Issue 3
   4. Issue 4

Please print a copy of your form and bring it to the next meeting.
<eh@it.uu.se>