ECE 6745 Section 5: SRAM Generators
In this section, we will be learning about SRAM generators. Small memories can be easily synthesized using flip-flop or latch standard cells, but synthesizing large memories can significantly impact the area, energy, and timing of the overall design. ASIC designers often use SRAM generators to "generate" arrays of memory bitcells and the corresponding peripheral circuitry (e.g., address decoders, bitline drivers, sense amps) which are combined into what is called an "SRAM macro". These SRAM generators are parameterized to enable generating a wide range of SRAM macros with different numbers of rows, columns, and column muxes, as well as optional support for partial writes, built-in self-test, and error correction. Similar to a standard-cell library, an SRAM generator must generate not just layout but also all of the necessary views to capture logical functionality, timing, geometry, and power usage. These views can then by used by the ASIC tools to produce a complete design which includes a mix of both standard cells and SRAM macros. We will first see how to use the open-source OpenRAM memory generator to generate various views of an SRAM macro. Then we will see how to use SRAMs in our RTL designs. Finally, we will put the these two pieces together to combine synthesizable RTL with SRAM macros and push the composition through the ASIC toolflow.
The first step is to access ecelinux
. Use Microsoft Remote Desktop to
log into a specific ecelinux
server. Then use VS Code to log into the
same specific ecelinux
server. Once you are at the ecelinux
prompt,
source the setup script, source the GUI setup script, clone this
repository from GitHub, and define an environment variable to keep track
of the top directory for the project.
% source setup-ece6745.sh
% source setup-gui.sh
% mkdir -p $HOME/ece6745
% cd $HOME/ece6745
% git clone git@github.com:cornell-ece6745/ece6745-sec05-sram sec05
% cd sec05
% export TOPDIR=$PWD
1. OpenRAM Memory Generator
Just as with standard-cell libraries, acquiring real SRAM generators is a complex and potentially expensive process. It requires gaining access to a specific fabrication technology, negotiating with a company which makes the SRAM generator, and usually signing multiple non-disclosure agreements. The OpenRAM memory generator is based on the same "fake" 45nm technology that we are using for the Nangate standard-cell library. The "fake" technology is representative enough to provide reasonable area, energy, and timing estimates for our purposes. Let's take a look at how to use the OpenRAM memory generator to generate various views of an SRAM macro.
An SRAM generator takes as input a configuration file which specifies the
various parameters for the desired SRAM macro. Create a configuration
file with the following content using your favorite text editor. You
should name your file SRAM_32x128_1rw_cfg.py
and it should be located in
the directory shown below.
The configuration file should look like this:
use_conda = False
num_rw_ports = 1
num_r_ports = 0
num_w_ports = 0
word_size = 32
num_words = 128
num_banks = 1
words_per_row = 4
write_size = 8
tech_name = "freepdk45"
process_corners = ["TT"]
supply_voltages = [1.1]
temperatures = [25]
route_supplies = True
check_lvsdrc = False
output_path = "SRAM_32x128_1rw"
output_name = "SRAM_32x128_1rw"
instance_name = "SRAM_32x128_1rw"
In this example, we are generating a single-ported SRAM which has 128
rows and 32 bits per row for a total capacity of 4096 bits or 512B. This
size is probably near the cross-over point where you might transition
from using synthesized memories to SRAM macros. OpenRAM will take this
configuration file as input and generate many different views of the SRAM
macro including: schematics (.sp
), layout (.gds
), a Verilog
behavioral model (.v
), abstract logical, timing, power view (.lib
),
and a physical view (.lef
). These views can then be used by the ASIC
tools.
You can use the following command to run the OpenRAM memory generator.
It will take about 4-5 minutes to generate the SRAM macro. You can see the resulting views here:
% cd $TOPDIR/asic/build-sram/00-openram-memgen/SRAM_32x128_1rw
% ls -1
SRAM_32x128_1rw.gds
SRAM_32x128_1rw.lef
SRAM_32x128_1rw.sp
SRAM_32x128_1rw_TT_1p1V_25C.lib
SRAM_32x128_1rw.v
SRAM_32x128_1rw.html
You can find more information about the OpenRAM memory generator on the project's webpage here:
Or in this research paper:
- M. Guthaus et. al, "OpenRAM: An Open-Source Memory Compiler", Int'l Conf. on Computer-Aided Design (ICCAD), Nov. 2016. (https://doi.org/10.1145/2966986.2980098)
The following excerpt from the paper illustrates the microarchitecture used in the single-port SRAM macro.
The functionality of the pins are as follows:
clk
: clockWEb
: write enable (active low)OEb
: output enable (active low)CSb
: whole SRAM enable (active low)ADDR
: addressDATA
: read/write data
Notice that there is a single address, and a single read/write data bus. This SRAM macro has a single read/write port and only supports executing a single transaction at a time. The following excerpt from the paper shows the timing diagram for a read and write transaction.
Prof. Batten will explain this timing diagram in more detail, especially the important distinction between a synchronous read SRAM and a combinational read register file. Take a few minutes to look at the behavioral verilog. See if you can see how this models a synchronous read SRAM.
You can take a look at the generated transistor-level netlist for a single bit-cell like this:
% cd $TOPDIR/asic/build-sram/00-openram-memgen/SRAM_32x128_1rw
% less -p " cell_1rw " SRAM_32x128_1rw.sp
Now let's use Klayout look at the actual layout produced by the OpenRAM memory generator.
% cd $TOPDIR/asic/build-sram/00-openram-memgen/SRAM_32x128_1rw
% klayout -l $ECE6745_STDCELLS/klayout.lyp SRAM_32x128_1rw.gds
In Klayout, you can show/hide layers by double clicking on them on the right panel. You can show more of the hierarchy by selecting Display > Increment Hierarchy or less of the hierarchy by selecting Display > Decrement Hierarchy from the menu.
Take a quick look at the .lib
file and the .lef
file for the SRAM
macro.
% cd $TOPDIR/asic/build-sram/00-openram-memgen/SRAM_32x128_1rw
% less SRAM_32x128_1rw_TT_1p1V_25C.lib
% less SRAM_32x128_1rw.lef
2. SRAM RTL Models
Now that we understand how an SRAM generator works, let's see how to
reference an SRAM in Verilog RTL. Our basic SRAMs are located in the
sim/sram
subdirectory.
Take a look the interface of the SRAM in SRAM.v
.
module sram_SRAM
#(
parameter p_data_nbits = 32,
parameter p_num_entries = 256,
// Local constants not meant to be set from outside the module
parameter c_addr_nbits = $clog2(p_num_entries),
parameter c_data_nbytes = (p_data_nbits+7)/8 // $ceil(p_data_nbits/8)
)(
input logic clk,
input logic reset,
input logic port0_val,
input logic port0_type,
input logic [c_addr_nbits-1:0] port0_idx,
input logic [(p_data_nbits/8)-1:0] port0_wben,
input logic [p_data_nbits-1:0] port0_wdata,
output logic [p_data_nbits-1:0] port0_rdata
);
The SRAM model is parameterized by the number of words and the bits per word, and has the following pin-level interface:
port0_val
: port enableport0_type
: transaction type (0 = read, 1 = write)port0_idx
: which row to read/writeport0_wben
: write byte enablesport0_wdata
: write dataport0_rdata
: read data
Now look at the implementation of the SRAM. You will see a generate if statement which uses the parameters to either (1) instantiate a specific SRAM macro or (2) instantiate a generic SRAM. It is critical that the name of the specific SRAM macro matches the name generated by OpenRAM. In this discussion section we will be generating an SRAM with 128 words each of which is 32 bits, and we can see that this SRAM macro is already included.
Let's run the tests for the specific SRAM we will be using in this discussion section.
% mkdir -p $TOPDIR/sim/build
% cd $TOPDIR/sim/build
% pytest ../sram/test/SRAM_test.py -k test_direct_32x128 -s
You can run the random test like this:
3. SRAM Minion Wrapper RTL
SRAMs use a latency sensitive interface meaning a user must carefully manage the timing for correct operation (i.e., set the read address and then exactly one cycle later use the read data). In addition, the SRAM cannot be "stalled". To illustrate how to use SRAM macros, we will create a latency insensitive minion wrapper around an SRAM which enables writing and reading the SRAM using our standard memory messages. The following figure illustrates our approach to implementing this wrapper:
Here is a pipeline diagram that illustrates how this works.
cycle : 0 1 2 3 4 5 6 7 8
msg a : M0 Mx
msg b : M0 Mx
msg c : M0 M1 M2 M2 M2
msg d : M0 M1 q q M2 # msg c is in skid buffer
msg e : M0 M0 M0 M0 Mx
cycle M0 M1 [q ] M2
0: a
1: b a a # a flows through bypass queue
2: c b b # b flows through bypass queue
3: d c # M2 is stalled, c will need to go into bypq
4: e d c #
5: e dc # d skids behind c into the bypq
6: e d c # c is dequeued from bypq
7: e d # d is dequeued from bypq
8: e e # e flows through bypass queue
Take a closer look at the SRAM minion wrapper we provide you.
To use an SRAM, simply include sram/SRAM.v
, instantiate the SRAM, and
set the number of words and number of bits per word. Here is what the
instantiation of the SRAM looks like in the wrapper.
`include "sram/SRAM.v"
...
sram_SRAM#(32,128) sram
(
.clk (clk),
.reset (reset),
.port0_idx (sram_addr_M0),
.port0_type (sram_wen_M0),
.port0_val (sram_en_M0),
.port0_wben (sram_wben_M0),
.port0_wdata (memreq_msg_data_M0),
.port0_rdata (sram_read_data_M1)
);
We can run a test on the SRAM minion wrapper like this:
Here is what the trace output should look like.
1r > ( (). ) > .
2r > ( (). ) > .
3: > ( (). ) > .
4: wr:00:00000000:0:55fceed9 > (wr(). ) > .
5: wr:01:00000004:0:5bec8a7b > (wr()# ) > #
6: # > (# ()# ) > #
7: # > (# ()wr) > wr:00:0:0:
8: # > (# ()# ) > #
9: # > (# ()# ) > #
10: # > (# ()# ) > #
11: # > (# ()wr) > wr:01:0:0:
12: wr:02:00000008:0:b1aa20f1 > (wr(). ) > .
13: wr:03:0000000c:0:a5b6b6bb > (wr()# ) > #
14: # > (# ()# ) > #
15: # > (# ()wr) > wr:02:0:0:
The first write transaction takes a single cycle to go through the SRAM minion wrapper, but then the response interface is not ready on cycles 5-6. The second write transaction is still accepted by the SRAM minion wrapper and it will end up in the bypass queue, but the later transactions are stalled because the request interface is not ready. No transactions are lost.
Let's now use an interactive simulator to generate the pickled Verilog and a test bench for pushing through the ASIC flow.
4. ASIC Front-End Flow with SRAM Macros
Now we will push our SRAM minion wrapper through the ASIC front-end flow. We have already generated the SRAM macro using OpenRAM at the beginning of the discussion section. We want to move the key generated files to make them easier to use by the ASIC tools.
% cd $TOPDIR/asic/build-sram/00-openram-memgen/SRAM_32x128_1rw
% cp SRAM_32x128_1rw_TT_1p1V_25C.lib ../SRAM_32x128_1rw.lib
% cp *.gds *.lef *.v ..
We need to convert the .lib
file into a .db
file using the Synopsys
Library Compiler (LC) tool.
% cd $TOPDIR/asic/build-sram/00-openram-memgen
% lc_shell
lc_shell> read_lib SRAM_32x128_1rw.lib
lc_shell> write_lib SRAM_32x128_1rw_TT_1p1V_25C_lib \
-format db -output SRAM_32x128_1rw.db
lc_shell> exit
As always, we start by using four-state RTL simulation to further verify our design.
% mkdir -p ${TOPDIR}/asic/build-sram/01-synopsys-vcs-rtlsim
% cd ${TOPDIR}/asic/build-sram/01-synopsys-vcs-rtlsim
% vcs -sverilog -xprop=tmerge -override_timescale=1ns/1ps -top Top \
+vcs+dumpvars+waves.vcd \
+incdir+${TOPDIR}/sim/build \
${TOPDIR}/sim/build/SRAMMinion_noparam__pickled.v \
${TOPDIR}/sim/build/SRAMMinion_noparam_sram-rtl-random_tb.v
% ./simv
Now we can use Synopsys DC to synthesize the logic which goes around the SRAM macro.
% mkdir -p ${TOPDIR}/asic/build-sram/02-synopsys-dc-synth
% cd ${TOPDIR}/asic/build-sram/02-synopsys-dc-synth
% dc_shell-xg-t
dc_shell> set_app_var target_library "$env(ECE6745_STDCELLS)/stdcells.db ../00-openram-memgen/SRAM_32x128_1rw.db"
dc_shell> set_app_var link_library "* $env(ECE6745_STDCELLS)/stdcells.db ../00-openram-memgen/SRAM_32x128_1rw.db"
dc_shell> analyze -format sverilog ../../../sim/build/SRAMMinion_noparam__pickled.v
dc_shell> elaborate SRAMMinion_noparam
dc_shell> create_clock clk -name ideal_clock1 -period 2.0
dc_shell> compile
dc_shell> write -format ddc -hierarchy -output post-synth.ddc
dc_shell> write -format verilog -hierarchy -output post-synth.v
dc_shell> report_area -hierarchy
dc_shell> report_timing -nets
dc_shell> exit
We are basically using the same steps we used in the ASIC front-end flow
section. Notice how we must point Synopsys DC to the .db
file generated
by the OpenRAM memory generator so Synopsys DC knows the abstract logical
and timing views of the SRAM.
If you look for the SRAM module in the synthesized gate-level netlist, you will see that it is referenced but not declared. This is what we expect since we are not synthesizing the memory but instead using an SRAM macro.
We can use fast-functional gate-level simulation to simulate the gate-level netlist integrated with the Verilog RTL models for the SRAMs.
% mkdir -p $TOPDIR/asic/build-sram/03-synopsys-vcs-ffglsim
% cd $TOPDIR/asic/build-sram/03-synopsys-vcs-ffglsim
% vcs -sverilog -xprop=tmerge -override_timescale=1ns/1ps -top Top \
+delay_mode_zero \
+vcs+dumpvars+waves.vcd \
+incdir+${TOPDIR}/sim/build \
${ECE6745_STDCELLS}/stdcells.v \
../00-openram-memgen/SRAM_32x128_1rw.v \
../02-synopsys-dc-synth/post-synth.v \
${TOPDIR}/sim/build/SRAMMinion_noparam_sram-rtl-random_tb.v
% ./simv
4. ASIC Back-End Flow with SRAM Macros
Now we can use Cadence Innovus to place the SRAM macro and the standard cells, and then automatically route everything together.
% mkdir -p $TOPDIR/asic/build-sram/04-cadence-innovus-pnr
% cd $TOPDIR/asic/build-sram/04-cadence-innovus-pnr
As in the ASIC back-end flow section, we need to create two files before
starting Cadence Innovus. Use VS Code to create a file named
constraints.sdc
.
The file should have the following constraint:
Now use VS Code to create a file named setup-timing.tcl
.
The file should have the following content:
create_rc_corner -name typical \
-cap_table "$env(ECE6745_STDCELLS)/rtk-typical.captable" \
-T 25
create_library_set -name libs_typical \
-timing [list "$env(ECE6745_STDCELLS)/stdcells.lib" \
"../00-openram-memgen/SRAM_32x128_1rw.lib" ]
create_delay_corner -name delay_default \
-library_set libs_typical \
-rc_corner typical
create_constraint_mode -name constraints_default \
-sdc_files [list constraints.sdc]
create_analysis_view -name analysis_default \
-constraint_mode constraints_default \
-delay_corner delay_default
set_analysis_view -setup analysis_default -hold analysis_default
Notice that we are including the .lib
file generated by the OpenRAM
memory generator. Now let's start Cadence Innovus, load in the design,
and complete power routing just as in the previous ASIC back-end section.
We recommend working with a partner on this section to avoid
overloading the ecelinux servers by running so many copies of Cadence
Innovus as the same time.
% cd $TOPDIR/asic/build-sram/04-cadence-innovus-pnr
% innovus
innovus> set init_mmmc_file "setup-timing.tcl"
innovus> set init_verilog "../02-synopsys-dc-synth/post-synth.v"
innovus> set init_top_cell "SRAMMinion_noparam"
innovus> set init_lef_file "$env(ECE6745_STDCELLS)/rtk-tech.lef \
$env(ECE6745_STDCELLS)/stdcells.lef \
../00-openram-memgen/SRAM_32x128_1rw.lef"
innovus> set init_gnd_net "VSS"
innovus> set init_pwr_net "VDD"
innovus> init_design
innovus> floorPlan -d 175 175 4.0 4.0 4.0 4.0
Now let's place the design.
You should be able to see the SRAM macro as a black box in the middle of the layout and then various standard cells placed around the perimeter. Now let's go ahead and do some simple power and signal routing.
innovus> globalNetConnect VDD -type pgpin -pin VDD -inst * -verbose
innovus> globalNetConnect VSS -type pgpin -pin VSS -inst * -verbose
innovus> sroute -nets {VDD VSS}
innovus> routeDesign
innovus> extractRC
You can now see all of the routing between the standard cells and the SRAM macro. We can now go ahead and add filler cells.
Let's now save the final merged GDS file.
innovus> streamOut post-pnr.gds \
-merge "$env(ECE6745_STDCELLS)/stdcells.gds \
../00-openram-memgen/SRAM_32x128_1rw.gds" \
-mapFile "$env(ECE6745_STDCELLS)/rtk-stream-out.map"
Notice how we need to provide the GDS for the SRAM macro when doing the final GDS merge. Finally we can look at some timing and area reports.
innovus> report_timing -late -path_type full_clock -net
innovus> report_timing -early -path_type full_clock -net
innovus> report_area
While the design should meet the setup time constraint it will likely fail the hold time constraint. A more advanced place-and-route script will include specific steps for fixing hold time violations.
Let's exit Cadence Innovus.
Then we can use Klayout to take a look at the final layout.