Opening |
8:45-9:50 |
Opening |
8:50-9:00 |
UPMARC Presentation
Bengt Jonsson |
Session 1 |
09:00-09:45 |
Tractable Refinement Checking for Concurrent Objects (abstract) (slides)
Efficient implementations of concurrent objects such as semaphores,
locks, and atomic collections are essential to modern computing.
Yet programming such objects is error prone: in minimizing the
synchronization overhead between concurrent object invocations,
one risks the conformance to reference implementations — or in formal
terms, one risks violating observational refinement. Testing this refinement
even within a single execution is intractable, limiting existing approaches
to executions with very few object invocations. We develop a polynomial-time
(per execution) approximation to refinement checking. The approximation is
parameterized by an accuracy k ∈ N representing the degree to which refinement
violations are visible. In principle, more violations are detectable as k increases,
and in the limit, all are detectable. Our insight for this approximation arises from
foundational properties on the partial orders characterizing the happens-before
relations between object invocations: they are interval orders, with a well defined
measure of complexity, i.e., their length. Approximating the happens-before relation
with a possibly-weaker interval order of bounded length can be efficiently implemented
by maintaining a bounded number of integer counters. In practice, we find that
refinement violations can be detected with very small values of k, and that our
approach scales far beyond existing refinement-checking approaches.
This is joint work with Ahmed Bouajjani, Michael Emmi, and Jad Hamza.
Constantin Enea |
09:45-10:30 |
Analysing and Optimising Parallel Snapshot Isolation (abstract)
Modern Internet services often rely on geo-replicated databases that provide consistency models for transactions weaker than serialisability. We investigate a promising consistency model of Parallel Snapshot Isolation (PSI), which weakens the classical snapshot isolation in a way that allows more efficient geo-replicated implementations:
- We give a declarative specification to PSI that does not refer to implementation-level concepts and thus allows reasoning about the behaviour of PSI databases more easily.
- Using our specification, we propose a criterion for checking when an application using a PSI database exhibits only serialisable behaviours.
- We additionally propose a criterion for checking when a set of transactions executing on PSI can be chopped into smaller pieces without introducing new behaviours, thus improving performance.
This is joint work with Andrea Cerone (IMDEA), Giovanni Bernardi (IMDEA) and Hongseok Yang (Oxford).
Alexey Gotsman |
Coffee break (10:30-11:00) |
Session 2 |
11:00-11:45 |
Scalable consistency for replicated data
(abstract) (slides)
Replicating modifiable data is a challenge in large-scale distributed systems, as it suffers from a fundamental tension between availability and data consistency.
Eventual consistency sidesteps the problem of availability under network partitioning, but remains ad-hoc, error-prone, and difficult to apply for programmers.
In this talk, we will review replicated data types that require no blocking synchronization: an update can execute immediately, irrespective of network latency, faults, or disconnection; this approach is highly scalable and fault-tolerant.
In discussing several use cases, we will see benefits and limitations of this approach and outline recent results.
Related project: https://syncfree.lip6.fr
Annette Bieniusa |
11:45-12:30 |
TSO-CC: Consistency-directed Cache Coherence for TSO (abstract) (slides)
The cache coherence protocol is a hardware mechanism present in shared
memory multiprocessors, that ensures that memory operations are made
visible to not only the processor that has initiated it, but also to
other processors. Traditional coherence protocols are designed for the
strictest consistency model, sequential consistency (SC). When they
are used for processors that support relaxed memory consistency
models, such protocols turn out to be unnecessarily strict at the cost
of scalability (in terms of per core storage), which poses a problem
with increasing number of cores in today’s CMPs.
Because of the wide adoption of Total Store Order (TSO) and its
variants in x86 processors, we propose TSO-CC, a cache coherence
protocol for the TSO memory consistency model. TSO-CC relies on
self-invalidation and detection of potential acquires using timestamps
to satisfy the TSO memory consistency model lazily. Our results show
that TSO-CC achieves average performance comparable to a conventional
(MESI directory protocol), while TSO-CC’s storage overhead per cache
line scales logarithmically with increasing core count. We will
conclude by outlining the challenges of verifying such
consistency-directed (and also conventional) coherence protocols.
Vijay Nagarajan |
Lunch Break (12:30-14:00) |
Session 3 |
14:00-14:45 |
Memory Subsystem & Consistency Verification in gem5 (abstract)
Daryl Stewart |
14:45-15:30 |
Callbacks: taking care of races in self-invalidation coherence protocols (abstract)
Cache coherence protocols based on self-invalidation allow a simpler design compared to traditional invalidation-based protocols, by relying on data-race-free (DRF) semantics and applying self-invalidation on racy synchronization points ex- posed to the hardware. Their simplicity lies in the absence of invalidation traffic, which eliminates the need to track readers in a directory, and reduces the number of transient protocol states. With the addition of self-downgrade these protocols can become effectively directory-free. While this works well for race-free data, unfortunately, lack of explicit invalidations compromises the effectiveness of any synchronization that re- lies on races. This includes any form of spin waiting, which is employed for signaling, locking, and barrier primitives.
In this work we propose a new solution for spin-waiting in these protocols, the callback mechanism, that is simpler and more efficient than explicit invalidation. Callbacks are set by reads involved in spin waiting, and are satisfied by writes (that can even precede these reads). To implement callbacks we use a small (just a few entries) directory-cache structure that is intended to service only these “spin-waiting” races. This directory structure is self-contained and is not backed up in any way. Entries are created on demand and can be evicted without the need to preserve their information. Our evaluation shows a significant improvement both over explicit invalidation and over exponential back-off, the state-of-the-art mechanism for self-invalidation protocols to avoid spinning in the shared cache.
Stefanos Kaxiras |
Coffee break (15:30-16:00) |
Session 4 |
16:00-16:30 |
Fast&Furious: A Tool for Detecting Covert Racing (abstract)
Existing multi-threaded applications perform synchronization either in
an explicit way, e.g., making use of the functionality
provided by synchronization libraries or in an implicit or ``covert''
way, e.g., using shared variables. Unfortunately, the
implicit synchronization constructs are prone to errors and difficult
to detect.
This talk presents a tool that is able to detect implicit
synchronization in multi-threaded applications. The detection is
performed by ensuring that during the execution of an application
under a memory model that provides sequential consistency for
data-race-free applications (SC for DRF), every read returns the same
value as if running under sequential consistency. If the previous
condition is not fulfilled by the execution, the application has data
races, which may be intended to perform implicit synchronization.
We analyze the applications in the Splash2 benchmark suite with the
presented tool and we detect data races in 7 out of the 14 Splash2
applications. These data races perform implicit synchronization in all
but one of the applications (6 out of 7). We analyze these implicit
synchronization constructs and discuss their correctness.
Alberto Ros |
16:30-17:15 |
Owicki-Gries for weak memory consistency
(abstract) (slides)
This talk concerns the adaptation of the Owicki-Gries method for verifying concurrent programs to the release-acquire weak memory model. We show that the standard Owicki-Gries non-interference check is unsound even for TSO, but that a slightly stronger check is not only sound for the release-acquire memory model, but is also actually useful for verifying a number of intricate examples.
Viktor Vafeiadis |
17:15-17:45 |
A Generic Process Calculus Approach to Relaxed-Memory Consistency (abstract) (slides)
Memory models may be specified operationally or axiomatically. Psi
calculi provide a unified framework that supports formal verification
for a range of specification styles. We have instantiated psi calculi
to obtain a bespoke process calculus, which we have used to model the
MESI cache coherence protocol at different levels of abstraction. Our
modeling language conveniently supports both a monolithic and a
distributed view of the protocol. Sequential consistency of the
protocol is established through a simulation proof.
Tjark Weber |
Dinner (19.30) at Peppar Peppar |