Memory hierarchy

Overview

Memory hierarchy.png

The memory hierarchy consists of storage modules (on-chip caches such as L1 and L2, off-chip caches and RAM) and the modules that connect them, called interconnects, such as bus, p2p, ring and switch. Marss simulates all of these modules: caches, RAM and interconnects. It provides two basic module types: Controllers and Interconnects. Controllers represent data storage modules such as L1 and L2 caches. Interconnects provide standard methods to connect various types of controllers. Every memory access is made through a MemoryRequest object, which carries information such as the address, request type and core-id. Controllers and interconnects communicate via temporary message objects. Any type of delay, such as accessing a cache or sending a message, is simulated with an event-callback mechanism that is handled by the global memory hierarchy object.

Memory Request

All memory operations in caches, interconnects and CPUs are identified by a MemoryRequest object, which is defined in the ptlsim/cache/memoryRequest.(h|cpp) files. When a CPU needs to access memory, it initializes a MemoryRequest object with information such as the physical address, core-id, thread-id and memory access type. The following code snippet shows how a core initializes a memory request.

MemoryRequest and Message
  1. /* First grab a free memory request from request pool */
  2. MemoryRequest* request = memoryHierarchy->get_free_request();
  3.  
  4. /* Initialize the request fields */
  5. request->init(coreid, threadid, physaddr, robid, sim_cycle, is_instruction, 
  6.     rip, uop.uuid, memory_op_type);
  7.  
  8. /* Set request completion signal */
  9. request->set_coreSignal(signal_ptr);
  10.  
  11. /* Send request to memory hierarchy to access the cache */
  12. bool l1_hit = memoryHierarchy->access_cache(request);

To optimize performance, the Memory Hierarchy keeps a pool of memory request objects that are reused. To create a new memory request, a core obtains a MemoryRequest object with the get_free_request() function, as shown in line 2. The pool also helps with debugging: when dump_all_info() is called, it dumps the MemoryRequest pool, showing both the objects that are in use and those that are free. When a request is freed its contents are not destroyed, so users can inspect free MemoryRequest objects to study past memory requests.
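The pooling behaviour described above can be illustrated with a minimal sketch. This is not the Marss implementation: PooledRequest, RequestPool, free_request() and count_in_use() are hypothetical names, and only get_free_request() mirrors the function named in the text. The point it shows is that freeing an entry only clears its in-use flag, so the old contents remain available for inspection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MemoryRequest; only a few fields are modelled.
struct PooledRequest {
    std::uint64_t physaddr = 0;
    int coreid = -1;
    bool in_use = false;
};

// Hypothetical fixed-size pool that reuses entries without clearing them.
class RequestPool {
public:
    explicit RequestPool(std::size_t size) : entries_(size) {}

    // Return the first free entry, or nullptr if the pool is exhausted.
    PooledRequest* get_free_request() {
        for (PooledRequest& e : entries_) {
            if (!e.in_use) {
                e.in_use = true;
                return &e;
            }
        }
        return nullptr;
    }

    // Mark the entry free; its fields are left untouched for later inspection.
    void free_request(PooledRequest* req) { req->in_use = false; }

    // Count the entries that are currently handed out.
    std::size_t count_in_use() const {
        std::size_t n = 0;
        for (const PooledRequest& e : entries_) n += e.in_use ? 1 : 0;
        return n;
    }

private:
    std::vector<PooledRequest> entries_;
};
```

A debugging dump in this sketch would simply walk entries_ and print every entry together with its in_use flag, free entries included.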

Once a free request is obtained, its parameters, such as coreid, threadid, physical address, operation type and whether it is an instruction or data access, are set with the init(..) function, as shown in line 5. In line 9 the core sets a callback signal that is triggered when the memory request completes. Once all this information is set, we call the access_cache function to send the memory request to the caches (line 12). It returns 'true' on an L1 cache hit and 'false' otherwise. On an L1 cache miss, the core signal is triggered once the L1 cache has been filled, indicating that the request is complete.

Signal - Event - Callbacks

Marss provides a very basic event-based simulation model to simulate delays. Simulating an event involves three components. The first is a Signal object, which defines the type of event and which function to call when it is triggered. The second is a callback function, which is called when the event is triggered. The third is adding the event to the event queue so that the callback function executes after a specific delay.

Signal

Signal.png

A signal is an object of type Signal, which is defined in the ptlsim/lib/superstl.h file.

Signal sample1;

Before using a signal object, you must initialize it by connecting a callback function with the Signal::connect function.

sample1.connect(&callback_fn);

To connect a member function of a C++ class, use the signal_mem_ptr function as shown below:

sample1.connect(signal_mem_ptr(*this, &Class::callback_fn));

We provide a macro, SET_SIGNAL_CB, that makes it easy to assign a callback function to a Signal and also to set the name of the signal (useful for debugging).

SET_SIGNAL_CB("cache1", "_sample", sample1, &Class::callback_fn);
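To make the idea of connecting a member function concrete, here is a minimal sketch of a signal-like class. It is not the Signal from superstl.h: SimpleSignal, bind_member, fire and DemoCache are hypothetical names, and std::function is used here only for brevity. bind_member plays the role that signal_mem_ptr plays in the text, packaging an object and one of its member functions into a plain callback.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical simplified signal; the real Signal is defined in superstl.h.
class SimpleSignal {
public:
    using Callback = std::function<bool(void*)>;

    void set_name(std::string name) { name_ = std::move(name); }
    const std::string& name() const { return name_; }

    // Attach the function that should run when the signal fires.
    void connect(Callback cb) { callback_ = std::move(cb); }

    // Invoke the connected callback; an unconnected signal reports failure.
    bool fire(void* arg) const { return callback_ ? callback_(arg) : false; }

private:
    std::string name_;
    Callback callback_;
};

// Hypothetical helper: binds an object and a member function into a callback.
// The object is captured by reference, so it must outlive the signal.
template <typename T>
SimpleSignal::Callback bind_member(T& obj, bool (T::*fn)(void*)) {
    return [&obj, fn](void* arg) { return (obj.*fn)(arg); };
}

// Hypothetical class whose member function is used as a callback.
struct DemoCache {
    int hits = 0;
    bool cache_hit_cb(void* /*arg*/) {
        ++hits;
        return true;
    }
};
```

Giving each signal a name, as SET_SIGNAL_CB does, pays off when a log or an assertion needs to say which signal misbehaved.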

Callbacks

A callback function takes a single argument of type ‘void*’ and returns a boolean. A callback function prototype is shown below:

bool callback_function (void *arg)

A return value of ‘true’ indicates that the function executed correctly, and ‘false’ indicates a failure. The Marss event execution model always expects a ‘true’ return value from a callback function; if it receives ‘false’, an assertion fails and the simulation crashes. The only argument to the function is a generic pointer that can point to any object. If you want to pass more than one argument, put the arguments into a structure and pass a pointer to an object of that structure. Having a single argument keeps event callbacks clean and simple.
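The advice about bundling several arguments into a structure can be sketched as follows. FillArgs and fill_callback are hypothetical names used only for illustration; the callback itself follows the prototype given above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bundle of arguments; not a Marss type.
struct FillArgs {
    std::uint64_t line_addr;
    int requester_id;
    bool handled;
};

// Follows the prototype from the text: one void* in, bool out.
bool fill_callback(void* arg) {
    if (arg == nullptr) return false;  // report failure instead of dereferencing
    FillArgs* args = static_cast<FillArgs*>(arg);
    args->handled = true;              // record that the callback ran
    return true;
}
```

The caller owns the FillArgs object and must keep it alive until the callback has run, which matters once the call is deferred through an event queue.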

Events

An event combines two activities: first, adding an event (a reference to a signal) to the event queue in MemoryHierarchy, and second, executing the event by calling the callback function associated with the signal. To add an event, use the ‘add_event’ function of the MemoryHierarchy class, as shown below:

void MemoryHierarchy::add_event(Signal &sig, int delay, void *arg);

Here, ‘sig’ is a signal object whose callback function will be called after ‘delay’ cycles with ‘arg’ as an argument. For example,

memoryHierarchy_->add_event(cache_hit, 1, request);

The code above adds a ‘cache_hit’ event that is executed after 1 cycle with ‘request’ as the argument.
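The delayed execution described above can be sketched with a small queue keyed by the cycle at which each callback is due. MiniEventQueue, clock() and now() are hypothetical names, not the MemoryHierarchy implementation, and the callback is passed directly rather than through a Signal to keep the sketch short. Running a delay of 0 immediately is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical miniature of the add_event mechanism.
class MiniEventQueue {
public:
    using Callback = std::function<bool(void*)>;

    // Schedule cb(arg) to run 'delay' cycles from now.
    void add_event(Callback cb, int delay, void* arg) {
        if (delay == 0) {  // sketch assumption: zero delay runs right away
            run(cb, arg);
            return;
        }
        events_.emplace(now_ + static_cast<std::uint64_t>(delay),
                        Event{std::move(cb), arg});
    }

    // Advance one cycle, then run every event that has become due.
    void clock() {
        ++now_;
        while (!events_.empty() && events_.begin()->first <= now_) {
            Event ev = std::move(events_.begin()->second);
            events_.erase(events_.begin());
            run(ev.cb, ev.arg);
        }
    }

    std::uint64_t now() const { return now_; }

private:
    struct Event {
        Callback cb;
        void* arg;
    };

    // A callback that returns false is treated as a fatal error.
    static void run(const Callback& cb, void* arg) {
        bool ok = cb(arg);
        assert(ok && "event callback must return true");
        (void)ok;
    }

    std::multimap<std::uint64_t, Event> events_;  // ordered by due cycle
    std::uint64_t now_ = 0;
};
```

Because the multimap is ordered by due cycle, events scheduled for the same cycle run in the order in which they were added.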

Controllers

A controller object represents a data storage and handling module such as a cache. A controller does not have to be a cache; it can also maintain a global directory for cache coherence in a multicore design. Normally a controller contains a queue that stores all pending memory requests. Each controller receives memory requests from an interconnect. A controller distinguishes two basic types of interconnect: upper and lower. An upper interconnect represents a connection to controllers at a higher level of the memory hierarchy (towards the CPU core). A lower interconnect represents a connection to controllers at a lower level of the memory hierarchy (towards RAM). A controller can be connected to any number of interconnects, depending on its design. For example, a simple cache controller supports two upper interconnects and one lower interconnect, three in total, so an L2 cache can be connected to L1-I and L1-D via upper interconnects and to L3 via the lower interconnect.

Currently Marss has two types of Controllers for Cache:

  1. Basic/Simple cache controller: It implements cache storage and a queue for all pending requests, which includes CPU requests and local evictions. It supports write-back and write-through policies through the two modules wb_cache and wt_cache respectively. The basic/simple cache controller is implemented in the ptlsim/cache/cacheController.(h|cpp) files.
  2. Coherent cache controller (ptlsim/cache/coherentCache.(h|cpp)): It implements cache storage and a pending request queue, and provides an interface to the coherence logic (ptlsim/cache/coherenceLogic.h). All decisions related to cache hits, misses and evictions are taken by the coherence logic module. On a cache access from either a local or a remote CPU, this controller looks up the data array to check whether the cache line is present and passes the result to the coherence logic module, which acts based on the operation type and on whether the access hit or missed. Marss currently provides two coherence logic modules: MESI and MOESI.

All controllers use event-based simulation to simulate the delay of cache accesses, eviction handling and sending messages to other caches. The general design concept in Marss caches is that every event uses a pending request queue entry as the argument to its callback function. Each pending request therefore goes through various states, and Marss simulates the delay of moving from state A to state B via the event-callback mechanism.

Cache request flow.jpg
The image shows how a cache access is handled when a request is received from the upper interconnect. The number shown on each edge represents the delay before the next block executes. The cache controller follows this flow diagram to simulate a cache access from the upper interconnect. For coherent caches, the hit/miss decision is made by the CoherenceLogic module. This example shows how users can transform basic flow diagrams into simulator module code in Marss.
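The idea of moving a pending request entry from state to state through delayed callbacks can be sketched as below. Everything here is hypothetical: ToyCacheController, PendingEntry, the EntryState values and the delay constants are illustrative and are not taken from the flow diagram or from the Marss controllers. The sketch only shows the pattern of passing the queue entry as the callback argument.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Assumed delays in cycles; chosen for illustration only.
constexpr int kAccessDelay = 1;
constexpr int kHitDelay = 2;
constexpr int kMissDelay = 5;

// Hypothetical states of a pending request.
enum class EntryState { Queued, Hit, Miss, Done };

struct PendingEntry {
    std::uint64_t line_addr;
    EntryState state;
};

// Hypothetical toy controller with its own tiny event queue.
class ToyCacheController {
public:
    explicit ToyCacheController(std::set<std::uint64_t> resident)
        : lines_(std::move(resident)) {}

    // A request arrives from the upper interconnect.
    void handle_request(PendingEntry* entry) {
        entry->state = EntryState::Queued;
        schedule(&ToyCacheController::access_cb, kAccessDelay, entry);
    }

    // Advance one cycle and run the callbacks that are due.
    void clock() {
        ++now_;
        while (!events_.empty() && events_.begin()->first <= now_) {
            Event ev = events_.begin()->second;
            events_.erase(events_.begin());
            bool ok = (this->*ev.handler)(ev.arg);
            assert(ok);
            (void)ok;
        }
    }

private:
    using Handler = bool (ToyCacheController::*)(void*);
    struct Event {
        Handler handler;
        void* arg;
    };

    void schedule(Handler h, int delay, void* arg) {
        events_.emplace(now_ + static_cast<std::uint64_t>(delay), Event{h, arg});
    }

    // Look up the line, record hit or miss, and schedule completion.
    bool access_cb(void* arg) {
        PendingEntry* entry = static_cast<PendingEntry*>(arg);
        bool hit = lines_.count(entry->line_addr) != 0;
        entry->state = hit ? EntryState::Hit : EntryState::Miss;
        schedule(&ToyCacheController::finish_cb, hit ? kHitDelay : kMissDelay, entry);
        return true;
    }

    // Complete the request; after a miss the line becomes resident.
    bool finish_cb(void* arg) {
        PendingEntry* entry = static_cast<PendingEntry*>(arg);
        lines_.insert(entry->line_addr);
        entry->state = EntryState::Done;
        return true;
    }

    std::set<std::uint64_t> lines_;
    std::multimap<std::uint64_t, Event> events_;
    std::uint64_t now_ = 0;
};
```

Each edge of a flow diagram maps onto one schedule() call, and each block maps onto one callback that updates the entry's state.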


Interconnects

An interconnect module, as the name suggests, provides a communication link between two or more controllers. Marss provides various types of interconnects: point-to-point, bus, split-phase bus and switch. An interconnect module simulates the delay of sending a message from one controller to another. It may or may not have a queue to store pending requests. A very simple type of interconnect is point-to-point (ptlsim/cache/p2p.(h|cpp)), which sends a message from one controller to another with a zero-cycle delay. A bus interconnect incorporates a per-controller queue in which the requests from each controller are stored; each request is broadcast to all connected controllers in serial order.

Interconnects with pending request queues are implemented much like controllers, using event-callbacks to simulate delay. They also use a pending request entry as the callback function argument and follow the same design philosophy. Communication between an interconnect and a controller is also done via ‘signal’ objects, but these signals are not executed through the ‘add_event’ mechanism.
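The difference between a point-to-point link and a bus can be sketched as follows. ToyController, ToyP2P and ToyBus are hypothetical names, requests are reduced to integer ids, and the round-robin choice of which queue to serve next is an assumption of this sketch, not a statement about the Marss bus.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical receiver; stands in for a controller.
struct ToyController {
    std::vector<int> received;  // request ids seen so far
    void receive(int request_id) { received.push_back(request_id); }
};

// Point-to-point link: forwards to the other endpoint with no added delay.
class ToyP2P {
public:
    ToyP2P(ToyController& a, ToyController& b) : a_(a), b_(b) {}

    void send(const ToyController& sender, int request_id) {
        ToyController& dest = (&sender == &a_) ? b_ : a_;
        dest.receive(request_id);
    }

private:
    ToyController& a_;
    ToyController& b_;
};

// Bus: one queue per attached controller, one broadcast per step.
class ToyBus {
public:
    // Attach a controller and return its port index.
    std::size_t attach(ToyController& c) {
        ports_.push_back(Port{&c, {}});
        return ports_.size() - 1;
    }

    void enqueue(std::size_t port, int request_id) {
        ports_.at(port).queue.push_back(request_id);
    }

    // Serve the next non-empty queue in round-robin order (assumed policy)
    // and broadcast its head request to every other controller.
    bool step() {
        for (std::size_t i = 0; i < ports_.size(); ++i) {
            std::size_t idx = (next_ + i) % ports_.size();
            if (ports_[idx].queue.empty()) continue;
            int id = ports_[idx].queue.front();
            ports_[idx].queue.pop_front();
            for (std::size_t j = 0; j < ports_.size(); ++j) {
                if (j != idx) ports_[j].controller->receive(id);
            }
            next_ = (idx + 1) % ports_.size();
            return true;
        }
        return false;  // nothing was pending
    }

private:
    struct Port {
        ToyController* controller;
        std::deque<int> queue;
    };
    std::vector<Port> ports_;
    std::size_t next_ = 0;
};
```

In a timed model, each call to step() would be driven by an event whose delay represents the bus latency, whereas the point-to-point send needs no event at all.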

Gluing Everything

Now that you understand the basic components that make up the Memory Hierarchy, it is time to see how they are connected to each other to serve memory requests from the core. Controllers are connected through the various types of interconnects in a configuration file. More details on configuration files and how to write them are covered in the Marss configuration documentation.

Once all connections are defined, it is each controller's responsibility to keep track of the interconnects it is connected to. For example, in a single-core machine the L2 cache is connected to the L1-I, L1-D and L3/RAM controllers using three different interconnects. It is the L2 cache controller's responsibility to route each memory request to the correct controller. The same goes for interconnects: each interconnect must know how to handle and route incoming memory requests. All communication between interconnects and controllers is done via signals. As mentioned earlier, each signal callback function takes only one argument; for communication between controllers and interconnects, a Message object is passed as that argument.

Message

A Message object is a short-lived object that contains information such as the sender, destination, origin and memory request, which a controller or interconnect uses to handle a memory request or response. Once the request has been sent, the Message object is freed, so the receiving module is responsible for storing any information it needs from the Message. A Message object also contains an 'arg' variable, which can point to any object and can be used to pass module-specific information. For example, the MESI L2 cache uses the 'arg' variable to send the updated cache line state to the L1 cache.
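The rule that the receiver must copy what it needs before the Message goes away can be sketched as below. ToyMessage, ToyL1, LineState and send_fill are hypothetical names; the fields are a simplification of those listed in the text and do not reproduce the actual Marss Message declaration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical line states, used only as the module-specific payload.
enum class LineState { Invalid, Shared, Exclusive, Modified };

// Hypothetical, simplified message.
struct ToyMessage {
    const void* sender;
    std::uint64_t request_addr;
    bool has_data;
    void* arg;  // module-specific payload, here a LineState*
};

// Receiver that copies what it needs while the message is still alive.
class ToyL1 {
public:
    struct Saved {
        std::uint64_t addr;
        LineState state;
    };

    // Signal-style callback: the single argument is the message.
    bool handle_message_cb(void* arg) {
        const ToyMessage* msg = static_cast<const ToyMessage*>(arg);
        Saved s;
        s.addr = msg->request_addr;
        s.state = msg->arg ? *static_cast<const LineState*>(msg->arg)
                           : LineState::Invalid;
        saved_.push_back(s);  // keep a copy, not a pointer into the message
        return true;
    }

    const std::vector<Saved>& saved() const { return saved_; }

private:
    std::vector<Saved> saved_;
};

// Sender side: the message lives only for the duration of the call.
bool send_fill(ToyL1& l1, const void* sender, std::uint64_t addr, LineState state) {
    ToyMessage msg{sender, addr, true, &state};
    return l1.handle_message_cb(&msg);
}
```

Storing a pointer to the message or to its 'arg' payload instead of a copy would leave the receiver with a dangling pointer once send_fill returns.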
