TIC -- Lecture3: CMT and SMT


1. Who are you?

2. Check the TRUE statements for P0 (Eggers):

1 When it selects threads to fetch, the ICOUNT feedback technique tries to minimize stalls on loads

2 2.8 fetching fetches sixteen instructions in total from two threads

3 A fine-grained multiprocessor executes instructions from a new thread every cycle

4 Shared caches are the main bottleneck for the SMT

5 To achieve the same performance, the described CMPs need a larger die area than the proposed SMT

6 The single-thread performance of an SMT is not degraded in comparison with an ordinary superscalar processor

7 Because of the increased complexity of the fetch unit, the pipeline is extended by two stages

3. Check the TRUE statements for P1 (CMP Piranha):

1 The Piranha CPU is in-order and single-issue

2 Its L1 caches are write-through

3 The aggregate size of the L1 caches is about the same as the L2 cache size

4 There is no inclusion between L2 and L1

5 Therefore all L1 caches must snoop all transactions

6 The L2 cache implementation is interleaved on some lower address bits

7 The L2 cache is logically shared

8 The L2 cache is highly associative

9 Extra DRAM has been added to hold the directory information

4. Which of the following statements are true for Lecture3 paper P2 (Hammond)?

1 A pure architectural comparison speaks in favor of CMP over SMT

2 CMPs will improve single-thread performance

3 CMPs will improve "throughput computing" performance

4 Instruction window size determines the number of instructions that can be examined in the search for ILP

5 CMP designs are simpler to validate than O-O-O ILP monsters

6 SMTs require more ports on the L1 caches than CMP designs

5. Which of the following statements are true for Lecture3 paper P3 (Sasanka)?

1 The energy-delay product (i.e. energy divided by performance) is a good metric if the user desires a fixed amount of performance or is constrained by a fixed amount of energy

2 The energy efficiency of a CMP would be reduced if it is impossible to increase its IPC linearly with the number of cores, for example if bus contention is high

3 CMP consistently shows better EPI than SMT. The better EPI is due to the less or equally complex cores (when compared to the SMT) and more total resources (due to multiple cores)

4 A four-threaded SMT core capable of fetching 8 instructions per cycle gives the same performance as a CMP with four cores

5 Both SMT-2 and CMP-2 running MPGenc_MPGdec achieve a minimum EPI with a fetch width of six at the performance point with SMT-2 at the highest frequency

6. Rate paper P1

1 Was it easy to read the paper?

2 Is the paper technically sound (for the time it was written)?

3 How do you rate the overall presentation?

4 Any short suggestions for improvements?

7. Submit at least two issues to discuss at the meeting

1 Issue 1

2 Issue 2

3 Issue 3

4 Issue 4

Please print a copy of your form and bring it to the next meeting.

<eh@it.uu.se>
