CMU-SAFARI/RamulatorSharp

Ramulator#: A fast and lightweight DRAM simulator

Ramulator# (pronounced as ramulator sharp) is a fast and flexible memory subsystem simulator implemented in C# and it can easily run on Linux, OS X, and Windows. The simulator contains the implementation of a wide array of commercial, as well as academic, DRAM standards. It is a predecessor of Ramulator. Therefore, the high-level design Ramulator# is very similar to that of Ramulator. The main difference is that it is implemented in the C# language whereas Ramulator is written in C++. To understand how Ramulator# models a DRAM-based memory system, we refer the readers to the original article that provides further details on the design.

DDR Standards

Ramulator# can be configured to run with an array of DDR3 and LPDDR3 standards (src/Mem/DDR3DRAM.cs). Additional standards can be easily added to Ramulator#.

Academic Proposals

Ramulator# supports many recent academic proposals:

Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-Subarray Data Movement in DRAM by Chang et al. in HPCA 2016.
ChargeCache: Reducing DRAM Latency by Exploiting Row Access Locality by Hassan et al. in HPCA 2016.
The Blacklisting Memory Scheduler: Achieving High Performance and Fairness at Low Cost by Subramanian et al. in ICCD 2014.
RowClone: Fast and Energy-Efficient in-DRAM Bulk Data Copy and Initialization by Seshadri et al. in MICRO 2013.
A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM by Kim et al. in ISCA 2012.
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior by Kim et al. in MICRO 2010.
ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers by Kim et al. in HPCA 2010.
[Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems] (https://users.ece.cmu.edu/~omutlu/pub/parbs_isca08.pdf) by Mutlu et al. in ISCA

Getting Started

By default, Ramulator# uses Mono (an open-source implementation of Microsoft's .NET framework) for compilation and runtime on the installed system. It has been tested on Windows, Linux, and OS X. Users can also choose to use Visual Studio as the development tool.

Installation and Compilation

$ git clone [RamulatorSharp-repo-url]
$ cd RamulatorSharp; make

If the compilation completes without errors, the binary is located in bin/sim.exe.

Running Ramulator#?

To run some sample simulations:

$ ./test_runs.sh

The script provides examples on running various system configurations, such as a single-core system to a quad-core system with different memory configurations. It generates results in the json format under the folder results.

Configuring Ramulator#

Ramulator# can be easily run with various configurations. Academic proposals and other DRAM standards can be selected using a simple configuration file (.cfg). We provide sample configuration files inside the configs/ folder which demostrate the usage of LISA and ChargeCache mechanisms.

How does Ramulator# simulate an application?

At a high level, Ramulator# simulates a core running a single application by consuming a trace. A trace consists of a list of memory requests and number of CPU instructions generated from an actual application running on a real system. We provide several sample traces collected from real applications running on a real Linux system. They locate under traces. In a trace file, each line represents a memory request with the following format:

<num-cpuinst> <addr-read> <addr-writeback>: The line has three tokens. The first token represents the number of CPU (i.e., non-memory) instructions before the memory request, and the second token is the decimal address of a read. The third token is the decimal address of the writeback request, which is the dirty cache-line eviction caused by the read request before it. Note the memory requests in our traces refer to cache miss requests generated from the L2 cache.

Ramulator# consumes a workload file through the -workload flag. A workload file contains a list of traces to run and a workload means a set of applications to run on the system. Each line specifies a particular workload to run, which can consist of one to many trace file names. The number of trace files written on each line essentially represents the number of applications or cores to run. For example, in workloads/4core_mix, the second line shows trace.forkset-rc.gz trace.bootup.1B-rc.gz tpcc64.gz 462.libquantum.gz, which has 4 different trace file names. This means that the simulator should be configured with 4 cores (using -N 4) to consume these 4 traces. It is also the first workload in the trace file. As the first line of the file specifies the locations where the trace files locate. So by specifying -workload workloads/4core_mix 1, the simulator simulates the first workload (line 2) in the trace file.

GUI support for visualizing DRAM command sequence

Ramulator# provides a nice GUI tool to visual DRAM commands sent from the memory controller to the various DRAM structures (i.e., channels, ranks, and banks). The authors have found it extremely useful for debugging or understanding the performance of implementations that can affect the ordering of the commands. The following two image are the screenshot from using the GUI tool to show the commands that have been issued to the 8 different DRAM banks within a specific channel and rank.

To enable the GUI tool, add the flag -gfx.gui_enabled true to the running command or to the config file. The GUI tool works on both Windows and Linux, but is not rendering properly on OS X at the momemnt.

Contributors

License

Released under a BSD (3-clause) license

CMU-SAFARI / RamulatorSharp

Ramulator#: A fast and lightweight DRAM simulator

Getting Started

GUI support for visualizing DRAM command sequence

Contributors

License

About

Languages