Weak Ordering (Memory Consistency Models)

Weak ordering is one of the relaxed memory consistency models and is used to address the memory consistency problem in multi-processor systems.

Introduction

While sequential consistency is fairly easy to implement in uni-processor systems, achieving it in a multi-processor system often requires performance trade-offs. The sequential consistency model enforces a strict ordering on all memory accesses, which restricts many of the techniques used to achieve instruction-level parallelism. As a result, consistency models that relax the memory access ordering restrictions have been introduced; weak ordering is one such model.

Conditions for Weak Ordering

Weak ordering relies on the assumption that[1]

  1. The program under consideration is properly synchronized.
  2. There are no data races.
Data Race

A data race is defined as

"Simultaneous accesses by multiple threads to a single location in which at least one of the accesses is a write."[1]

A data race usually results in non-deterministic outcomes and therefore must be avoided to achieve a synchronized program. A properly synchronized program is assumed to be free of all data races, so it is safe to reorder memory accesses within the critical sections of such a program; synchronization is required only at a few necessary points.

Classification of Memory Accesses

For the weak ordering consistency model, therefore, all memory accesses are classified as either “ordinary” or “synchronization” accesses.[2] Ordinary accesses are subject only to relaxed ordering.

For a synchronization access, all reads, writes, and other synchronization accesses before the synchronization point must complete before the synchronization access can be issued. Similarly, all reads, writes, and other synchronization accesses after the synchronization point can be issued only after the synchronization access under consideration has completed. Thus all memory accesses on either side of a synchronization point are strictly ordered. In a pipelined execution, this means that any instructions after the synchronization point that have already been fetched must be flushed from the pipeline, while all memory accesses before the synchronization point must be globally performed before the synchronization access can be issued.

Therefore, weak ordering can be defined as follows:

“In a multiprocessor system, storage accesses are weakly ordered if (1) accesses to global synchronizing variables are strongly ordered, (2) no access to a synchronizing variable is issued by a processor before all previous global data accesses have been globally performed, and if (3) no access to global data is issued by a processor before a previous access to a synchronizing variable has been globally performed”[3]

Implementation[1]

While instructions like test-and-set, fetch-and-op, load-linked, and store-conditional implicitly declare synchronization accesses, for regular memory operations to be treated as synchronization accesses, memory barriers or labels have to be inserted carefully.

The example below illustrates synchronization using labelling. The store in S2 is marked as a synchronizing access using the label sync, which implies that it will be issued only after all previous accesses have completed, including the store in S1. Similarly, in processor P1, none of the memory accesses after S3 can issue until the synchronizing load in S3 has completed. Hence it is guaranteed that the load in S6 will see the latest value.

/* Program code:
     P0:            P1:
     flag = 2;      while (!newflag);
     newflag = 1;   print flag;
*/

Processor P0                        Processor P1
S1:        st &flag, #2;            S3: prev: (sync) ld R1, &newflag;
S2: (sync) st &newflag, #1;         S4:       sub R1, R1, #1;
                                    S5:       bnz R1, prev;
                                    S6:       ld R2, &flag;

Weak Ordering vs Other Consistency Models

Sequential Consistency

While weak ordering provides considerably higher performance than the sequential consistency model, it requires that all synchronization accesses be identified and properly communicated to the underlying hardware.

Release Consistency

The release consistency model is even more relaxed: it distinguishes two types of synchronization accesses, acquire and release. An acquire is a read operation (it can also be a read-modify-write) that is performed to gain access to a set of variables. A release is a write operation (or a read-modify-write) that grants another processor permission to access some variables.[4] Under release consistency, operations that precede an acquire access can be reordered after it; this simply results in a critical section with more instructions and does not affect the correctness of the program. Similarly, operations that follow a release access can be reordered before it without violating correctness. So, although release consistency performs better than weak ordering, it adds complexity for the programmer, as the acquire and release operations have to be labelled separately in addition to the synchronization accesses.

References

  1. ^ a b c Solihin, Yan (17 November 2015). Fundamentals of Parallel Multicore Architecture. Chapman and Hall/CRC. ISBN 9781482211184.
  2. ^ Mankin, Jenny. "CSG280: Parallel Computing Memory Consistency Models: A Survey in Past and Present Research". CiteSeerX 10.1.1.490.6750.
  3. ^ Dubois, Michel; Scheurich, Christoph; Briggs, Fayé A. (1988-02-01). "Synchronization, Coherence, and Event Ordering in Multiprocessors". Computer. 21 (2): 9–21. doi:10.1109/2.15. ISSN 0018-9162.
  4. ^ Parallel Computer Architecture: A Hardware/Software Approach (The Morgan Kaufmann Series in Computer Architecture and Design). Morgan Kaufmann. 1998. ISBN 9781558603431.