TIC -- Lecture4: Finding the threads

UU | IT | DoCS | eh

1. Who are you?

2. Which of the following statements are true for paper P0 (Collins)?

1 Most static loads are delinquent loads

2 SPEC MCF has mostly capacity misses

3 The main thread spawns most speculative slices

4 A tunable maximum-spawn-depth register controls overly aggressive speculative precomputation

5 SP performance gets better with more thread contexts

6 SP improves performance of applications with chains of dependent loads

7 The performance of GZIP does not improve because the memory access pattern varies too much

8 The authors claim that SP also improves branch prediction

9 The method uses run-time profiling to find p-slices and triggers

3. Check the TRUE statements for P1 (Tullsen):

1 The fetch policy ICOUNT used in this paper makes sure that the threads fetch the instructions in a round robin fashion

2 A single thread with poor memory utilization can make other threads execute slower since it occupies resources in the processor

3 It is especially when an SMT processor executes one thread with poor memory utilization and another thread with good memory utilization that it is preferable to perform a pipeline flush on a long-latency stall

4 Stalling slow threads makes them stay longer on the machine and hence makes the overall performance of the system worse

5 A pipeline flush initiated by a long-latency stall according to the paper always clears the rename registers

4. Which of these statements about paper P2 (Steffan) are true?

1 Thread-Level Speculation (TLS) needs compiler support

2 TLS can hurt any locality optimizations present in the applications

3 TLS does not need any special hardware

4 The "ownership required buffer" (ORB) stores all cache lines that are in the speculatively shared state

5 Allowing multiple writers to shared cache lines proved to be very efficient in the TLS setting

6 The proposed TLS implementation proved to be more efficient for multi-node architectures

5. Rate paper P0

1 Was it easy to read the paper?

2 Is the paper technically sound (for the time it was written)?

3 How do you rate the overall presentation?

4 Any short suggestions for improvements?

6. Submit at least two issues to discuss at the meeting

1 Issue 1

2 Issue 2

3 Issue 3

4 Issue 4

Please print a copy of your form and bring it to the next meeting

<eh@it.uu.se>