Consistency Models and the WWW

Arnold Pears
arnoldp@DoCS.uu.se
Uppsala University
SWEDEN


  1. Introduction
    The basic memory or data consistency problem arises in parallel computer architecture design. The two common architectures used for such machines are shown in Figures 1 and 2. There are several interesting issues, including how to program such computers, and what implications such designs have for maintaining "consistent" data in the caches and main memories of all the participating computers.

    The problem occurs when the processing elements (PEs) of the machine are able to communicate with a shared logical or physical memory space independently of each other. Because a PE cannot observe the memory transactions of the other PEs, its cache can hold a value that was loaded from memory but has since been changed by another PE. Consistency protocols are sets of hardware rules and operating system algorithms that try to minimise occurrences of the situations that cause cache data to become inconsistent with the data in other caches and in the main memory store.
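
    The situation can be made concrete with a small sketch. The following Python fragment is purely illustrative (the class and address names are invented for this example): it models two processing elements with private caches and no consistency protocol at all, so a cached copy simply goes stale when the other PE writes.

    # Minimal sketch (not any real protocol): two processing elements with private
    # caches over a shared memory, showing how a cached copy goes stale when no
    # consistency protocol is in place. All names are illustrative.

    class Memory:
        def __init__(self):
            self.cells = {}

        def load(self, addr):
            return self.cells.get(addr, 0)

        def store(self, addr, value):
            self.cells[addr] = value

    class PE:
        """A processing element with a private cache and no coherence support."""
        def __init__(self, name, memory):
            self.name = name
            self.memory = memory
            self.cache = {}          # addr -> value, filled on first load

        def read(self, addr):
            if addr not in self.cache:           # cache miss: fetch from memory
                self.cache[addr] = self.memory.load(addr)
            return self.cache[addr]              # a cache hit may return a stale value

        def write(self, addr, value):
            self.cache[addr] = value             # write to own cache ...
            self.memory.store(addr, value)       # ... and through to memory

    mem = Memory()
    p0, p1 = PE("PE0", mem), PE("PE1", mem)

    p0.read(0x10)        # PE0 caches the initial value 0
    p1.write(0x10, 42)   # PE1 updates the location
    print(p0.read(0x10)) # PE0 still sees 0: its cached copy is now inconsistent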

    Figure 1 shows the standard design of a shared memory parallel computer, in which all the processing elements have access to a single shared data space that is used to store the program and the data (program variables and dynamic data structures). Such machines have been popular since the early 1980s, and the problems of creating scalable designs have been widely studied.

    A large number of models for how the caches and the memory should be kept the same (consistent) have been proposed. These protocols have been discussed in a number of IEEE and other computer science research journals and are too complicated to present in full here. The important thing to realise about them is that they all aim to make the cache and memory contents the same for all relevant addresses whenever this is necessary for program correctness. The difference between the protocols lies in the cost of creating that consistency: some schemes enforce consistency all the time, while others (more efficient ones) operate only when absolutely required.


    Figure 1: Shared Data Space Parallel Architecture

    The structure shown in Figure 2 is also a popular computer design. Here, however, the memory attached to each processing element is private and can only be accessed by the attached processor. Other processors must communicate with the owner of a memory address to access its contents, which means that local copies can be requested and stored by any of the nodes. This is shown in the diagram, where all the caches store a value obtained from a single node's memory. In this type of computer the consistency of the data is often the responsibility of the program.

    One variant of this second design implements a logical shared address space on top of the physically distributed hardware that is depicted. In this type of design, called Distributed Shared Memory (DSM), the memory subsystem, the operating system and the program cooperate to make sure that data is consistent at the times when consistency is necessary for program correctness.
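
    As a rough illustration of this second organisation, the sketch below (again purely illustrative, with an assumed "ownership by address interleaving" rule that is not taken from the text) models nodes that own disjoint parts of the address space and cache copies of remote addresses on demand; keeping those copies consistent is exactly the job the DSM layer has to do.

    # Illustrative sketch only: a handful of nodes, each owning part of the address
    # space. A read of a remote address is satisfied by asking the owning node and
    # caching the reply locally. Names and the ownership rule are assumptions.

    class Node:
        def __init__(self, node_id, nodes):
            self.node_id = node_id
            self.nodes = nodes        # shared list of all nodes in the machine
            self.local_mem = {}       # the portion of the address space this node owns
            self.cache = {}           # cached copies of remote addresses

        def owner(self, addr):
            return self.nodes[addr % len(self.nodes)]   # assumed interleaved ownership

        def read(self, addr):
            home = self.owner(addr)
            if home is self:
                return self.local_mem.get(addr, 0)
            if addr not in self.cache:                  # first remote read: fetch a copy
                self.cache[addr] = home.local_mem.get(addr, 0)
            return self.cache[addr]

        def write(self, addr, value):
            home = self.owner(addr)
            if home is self:
                self.local_mem[addr] = value            # other nodes' cached copies go stale here
            else:
                home.write(addr, value)

    nodes = []
    for i in range(4):
        nodes.append(Node(i, nodes))

    nodes[0].write(8, 7)       # address 8 is owned by node 0 (8 % 4 == 0)
    print(nodes[3].read(8))    # node 3 fetches and caches a copy: 7
    nodes[0].write(8, 9)
    print(nodes[3].read(8))    # still 7: keeping this copy consistent is the DSM system's job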


    Figure 2: Private Data Space Parallel Architecture

  2. Consistency
    One major problem in implementing protocols that ensure data consistency in any architecture is keeping track of where all the cached copies of a memory location might be. This problem is illustrated in Figure 1, where the yellow memory location is duplicated in two caches. If either of the processors in the figure updates that variable, the values stored in the yellow locations will no longer be the same. The caches and memory can be made the same again, but doing so takes time and requires information about where the copies are located.

    There are two major methods for achieving consistency between the caches and the memory: either the caches can be updated so that they hold the new value of the memory location (update consistency), or the value of the address held in the cache can be discarded (invalidation consistency). Each of these relies on being able to locate all the places where a copy of the address is stored.
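
    The two options can be sketched with a simple directory that records which caches hold a copy of each address. The code below is an illustration of the idea only, not a description of any particular machine's protocol; all names are invented for the example.

    # Hedged sketch of update vs. invalidation consistency, using a directory that
    # tracks which caches hold a copy of each address.

    class DirectoryMemory:
        def __init__(self):
            self.cells = {}
            self.sharers = {}                 # addr -> set of caches holding a copy

        def load(self, addr, cache):
            self.sharers.setdefault(addr, set()).add(cache)
            return self.cells.get(addr, 0)

        def store(self, addr, value, writer, policy="invalidate"):
            self.cells[addr] = value
            for cache in self.sharers.get(addr, set()):
                if cache is writer:
                    continue
                if policy == "update":        # update consistency: push the new value
                    cache.lines[addr] = value
                else:                         # invalidation consistency: discard the copy
                    cache.lines.pop(addr, None)
            if policy == "invalidate":
                self.sharers[addr] = {writer}

    class Cache:
        def __init__(self, memory):
            self.memory = memory
            self.lines = {}

        def read(self, addr):
            if addr not in self.lines:
                self.lines[addr] = self.memory.load(addr, self)
            return self.lines[addr]

        def write(self, addr, value, policy="invalidate"):
            self.lines[addr] = value
            self.memory.store(addr, value, writer=self, policy=policy)

    mem = DirectoryMemory()
    c0, c1 = Cache(mem), Cache(mem)
    c0.read(0x10); c1.read(0x10)              # both caches now share address 0x10
    c0.write(0x10, 5, policy="invalidate")    # c1's copy is discarded
    print(c1.read(0x10))                      # miss, re-fetched from memory: 5
    c0.write(0x10, 6, policy="update")        # c1's copy is refreshed in place
    print(c1.read(0x10))                      # hit, already holds the updated value: 6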

    Another major problem is the question: "what can the computer do while the caches and memory are being brought into a consistent state?". In some cases nothing; in other cases the computer can continue to execute the program, knowing that the use of possibly incorrect values will not cause a problem. This situation can be exploited to increase the efficiency of program execution.

    An Overview of Consistency Models

    To obtain performance improvements, relaxed consistency protocols try to place fewer constraints on the allowable interleavings of data operations. In other words, they enforce consistency throughout the memory system only when required. Figure 3 shows the execution associated with three write operations, a barrier and a read operation under the SC, PC, WC and FC protocols. SC stalls the architecture to prevent processors from reading stale values. Adding write pipelines in PC allows processing elements to overlap execution with consistency activity; this overlap between consistency hardware activity and program execution is shown using two layers in the diagram. In WC three layers of concurrent activity are possible: the top layer is program execution, the second is processor consistency hardware, and the third is network protocol activity.

    Figure 3: Efficiency of Consistency Schemes

    Processor Consistency (PC)
    Of the three major approaches to relaxed cache consistency, processor consistency is the most restrictive. Processor consistency uses write pipelining to overlap writes to memory with processor execution, which is particularly beneficial when memory latency is high. As defined by Goodman [Good89], the writes issued by a single processor are performed in order (as in sequential consistency) and are observed in that order by all processors; however, the observed interleaving of write accesses issued by two different processors may differ from processor to processor.
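
    A minimal sketch of the write pipelining idea, assuming a simple FIFO write buffer per processor, is given below. It is not a full PC implementation; it only shows that the processor continues without stalling while its writes drain to memory strictly in issue order.

    # Rough illustration of write pipelining: the processor keeps executing while
    # buffered writes drain to memory in the order they were issued.

    from collections import deque

    class PipelinedWriter:
        def __init__(self, memory):
            self.memory = memory
            self.write_buffer = deque()     # FIFO preserves the processor's issue order

        def write(self, addr, value):
            self.write_buffer.append((addr, value))   # issue and continue, no stall

        def drain_one(self):
            """Performed later by the memory system, one write per step, in order."""
            if self.write_buffer:
                addr, value = self.write_buffer.popleft()
                self.memory[addr] = value

    memory = {}
    p = PipelinedWriter(memory)
    p.write("x", 1)
    p.write("y", 2)      # issued immediately after x; the processor never stalled
    p.drain_one()
    print(memory)        # {'x': 1}: other processors see x before y, never y before x
    p.drain_one()
    print(memory)        # {'x': 1, 'y': 2}
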
    Weak Consistency (WC)
    Weak consistency recognizes that program correctness only depends on the temporal order of critical data accesses. As originally defined by Dubois et al. [Dubo86], weak consistency enforces the following restrictions:
    1. accesses to synchronization variables are sequentially consistent,
    2. no access to a synchronization variable is issued by a processor before all previous data accesses have been performed, and
    3. no access is issued by a processor before a previous access to a synchronization variable has been performed.
    A program executing on a weakly consistent memory model will appear sequentially consistent (SC) if
    1. there are no competing accesses;
    2. synchronization is visible to the memory system hardware.
    Weak consistency is less restrictive than processor consistency because it recognizes that a parallel program can define its own consistency requirements using synchronization primitives. Weak consistency also takes advantage of write pipelining, and processors only need to stall when a synchronization point is reached.
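
    The three rules above can be illustrated with a toy model in which ordinary data accesses may be left outstanding, while a synchronization access acts as a fence that first completes everything issued before it. This is a sketch under that simplified model, not real consistency hardware; all names are illustrative.

    # Toy model of the weak consistency rules listed above.

    from collections import deque

    class WeaklyConsistentProcessor:
        def __init__(self, memory):
            self.memory = memory
            self.outstanding = deque()        # data accesses issued but not yet performed

        def data_write(self, addr, value):
            # Ordinary data accesses can be left outstanding (not yet performed).
            self.outstanding.append((addr, value))

        def synchronize(self, lock_name):
            # Rule 2: the sync access waits until all previously issued data
            # accesses have been performed, so the queue is drained first.
            while self.outstanding:
                addr, value = self.outstanding.popleft()
                self.memory[addr] = value
            # Rule 1: accesses to synchronization variables themselves are kept
            # sequentially consistent (modelled here as an immediate update).
            # Rule 3 holds trivially: synchronize() completes before returning,
            # so no later access is issued while a sync access is pending.
            self.memory[lock_name] = "released"

    memory = {}
    p = WeaklyConsistentProcessor(memory)
    p.data_write("a", 1)
    p.data_write("b", 2)
    print(memory)            # {}: ordinary writes may still be in flight
    p.synchronize("lock0")
    print(memory)            # {'a': 1, 'b': 2, 'lock0': 'released'}: all visible at the sync point
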
    Release Consistency (RC)
    Release consistency, defined by Gharachorloo et al. [Ghar90], builds on weak consistency in that it uses knowledge about data operations to maintain coherence only when necessary. RC categorises each program access as an Acquire, a Release, or a Non-Synchronizing access. An Acquire delays only future accesses, while a Release delays until all previous accesses have been performed. This allows limited overlap of execution with critical sections, which was not possible under the more restrictive WC model.
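
    The acquire/release distinction can be sketched using the same simplified "outstanding write queue" model as in the weak consistency example: only a release has to wait for earlier accesses, while an acquire merely gates the accesses that follow it. Names are illustrative.

    # Sketch of the acquire/release distinction under the simplified queue model.

    from collections import deque

    class ReleaseConsistentProcessor:
        def __init__(self, memory):
            self.memory = memory
            self.outstanding = deque()

        def data_write(self, addr, value):
            self.outstanding.append((addr, value))

        def acquire(self, lock_name):
            # An acquire only delays *future* accesses: it does not have to wait
            # for earlier writes, so the outstanding queue is left untouched.
            self.memory[lock_name] = "held"

        def release(self, lock_name):
            # A release must wait until all previous accesses have been performed.
            while self.outstanding:
                addr, value = self.outstanding.popleft()
                self.memory[addr] = value
            self.memory[lock_name] = "free"

    memory = {}
    p = ReleaseConsistentProcessor(memory)
    p.data_write("before", 1)
    p.acquire("lock0")          # earlier write may still be outstanding: overlap is allowed
    p.data_write("inside", 2)
    p.release("lock0")          # everything issued before the release is now visible
    print(memory)               # {'lock0': 'free', 'before': 1, 'inside': 2}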

  3. Consistency for the WWW
    Why does the WWW need consistency?

    In many situations WWW information is of a transient and time-critical nature: the relevance of stock market data, weather reports or news presentations on the WWW is short-lived. In addition, users want access to the most up-to-date versions of WWW content. Current approaches to reducing bandwidth and access delay for Internet content are based on duplicating content in proxies and browser caches, and a large portion of this activity is both invisible to and uncontrollable by the end user. This presents a significant problem for users who rely on the Web for time-critical data.

    Applying Consistency to the WWW

    How are traditional parallel computer consistency protocols relevant to developing consistency approaches for the WWW?

    Consider how caches are used, and how consistency protocols (algorithms) work to make the contents of the caches and the original item consistent with one another. If we draw on the ideas of updating and invalidating copies that have been developed over the last 20 years for multiprocessor computers, can this knowledge help us solve the problems that arise when we try to distribute and cache WWW information using proxies and browser caches?

    To do this we have to look at how the existing algorithms might be applied to maintaining and replicating information in an address space that is constructed in a more "ad hoc" manner. We can view URLs as addresses and use the caches in proxies and browsers to replicate these items for local use.
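
    A hedged sketch of this analogy is given below: URLs play the role of addresses and a proxy cache holds local copies. Because an origin server cannot, in general, reach out and invalidate or update every proxy and browser copy, the sketch falls back on a per-entry time-to-live combined with a version check against a simulated origin (loosely analogous to HTTP revalidation). It is an illustration of the idea, not an implementation of the actual HTTP caching rules.

    # Toy proxy cache: URLs as addresses, copies kept fresh by expiry plus a
    # version check against a simulated origin. Everything here is illustrative.

    import time

    class Origin:
        """Stands in for an origin web server holding the authoritative copies."""
        def __init__(self):
            self.documents = {}          # url -> (version, body)

        def publish(self, url, body):
            version, _ = self.documents.get(url, (0, None))
            self.documents[url] = (version + 1, body)

        def fetch(self, url, cached_version=None):
            version, body = self.documents[url]
            if cached_version == version:
                return version, None     # like "not modified": the cached copy is still valid
            return version, body

    class ProxyCache:
        def __init__(self, origin, ttl_seconds=60.0):
            self.origin = origin
            self.ttl = ttl_seconds
            self.entries = {}            # url -> (version, body, expires_at)

        def get(self, url):
            now = time.time()
            entry = self.entries.get(url)
            if entry and now < entry[2]:
                return entry[1]                                   # fresh copy: serve locally
            cached_version = entry[0] if entry else None
            version, body = self.origin.fetch(url, cached_version)
            if body is None:
                body = entry[1]                                   # revalidated: reuse stored body
            self.entries[url] = (version, body, now + self.ttl)
            return body

    origin = Origin()
    origin.publish("/news", "headline v1")
    proxy = ProxyCache(origin, ttl_seconds=0.0)   # ttl of zero forces revalidation on each request
    print(proxy.get("/news"))                      # headline v1
    origin.publish("/news", "headline v2")
    print(proxy.get("/news"))                      # revalidation detects the change: headline v2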

    However, the architecture that we are confronted with is not completely analogous to either of the two hardware models shown in Figures 1 and 2. In fact the architecture is rather more complex, since several layers of caching introduce the notion of a cache hierarchy in the WWW. Other problems arise as well.

    An overview of the WWW architecture is shown in Figure 4, and the cache hierarchy used in the remaining discussion is presented in Figure 5. The combination of these two allows us to discuss the problem of developing consistency in the WWW.

    Figure 4: WWW Architecture

    Traditional caching approaches attempt to gain performance improvements based on two observations about program behaviour: temporal locality (a recently accessed item is likely to be accessed again soon) and spatial locality (items near a recently accessed item are likely to be accessed next).


    Figure 5: WWW Cache Hierarchy

  4. References
    [Dubo86] M. Dubois, C. Scheurich, and F. A. Briggs, "Memory access buffering in multiprocessors", in 13th International Symposium on Computer Architecture, pp. 434-442, June 1986.

    [Ghar90] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, "Memory consistency and event ordering in scalable shared-memory multiprocessors", Computer Architecture News, 18(2A), pp. 15-26, June 1990.

    [Good89] J. R. Goodman, "Cache consistency and sequential consistency", Technical Report 61, SCI Committee, March 1989.

    [Pear00] A. Pears, "The Forgetful Cache Consistency Protocol", Proc.