ECE 6745 Tutorial 8: ASIC Automated Flow
The previous tutorials demonstrated how to take a design from RTL to layout both by manually entering commands for each tool and by writing flow scripts. Flow scripts can help automate the process, but copying and modifying these flow scripts for every design is tedious and error-prone. An agile hardware design flow demands automation to simplify rapidly exploring the area, energy, and timing design space of one or more designs. In this tutorial we will introduce a simple tool called pyhflow which takes as input a set of step templates and a design YAML file and generates appropriate flow scripts. This tutorial assumes you have already completed the tutorials on Linux, Git, and Verilog.
The following diagram illustrates the five primary tools we will be using in ECE 6745 along with a few smaller secondary tools. Notice that the ASIC tools all require various views from the standard-cell library. Before starting this tutorial, you must complete the ASIC standard-cell tutorial, the ASIC front-end flow tutorial, and the ASIC back-end flow tutorial.
- We write our RTL models in Verilog, and we use the PyMTL framework to test, verify, and evaluate the execution time (in cycles) of our design. This part of the flow is very similar to the flow used in ECE 4750. Once we are sure our design is working correctly, we can then start to push the design through the flow.
- We use Synopsys VCS to compile and run both 4-state RTL and gate-level simulations. These simulations help us to build confidence in our design as we push our designs through different stages of the flow. From these simulations, we also generate waveforms in .vcd (Value Change Dump) format, and per-net average activity factors stored in .saif format. These activity factors will be used for power analysis. Gate-level simulation is a valuable tool for ensuring the tools did not optimize something away which impacts the correctness of the design, and also provides an avenue for obtaining a more accurate power analysis than RTL simulation. While static timing analysis (STA) analyzes all paths, GL simulation can also serve as a backup to check for hold and setup time violations (chip designers must be paranoid!).
- We use Synopsys Design Compiler (DC) to synthesize our design, which means to transform the Verilog RTL model into a Verilog gate-level netlist where all of the gates are selected from the standard-cell library. We need to provide Synopsys DC with abstract logical and timing views of the standard-cell library in .db format. In addition to the Verilog gate-level netlist, Synopsys DC can also generate a .ddc file which contains information about the gate-level netlist and timing, and this .ddc file can be inspected using Synopsys Design Vision (DV). We will also use Synopsys DC to generate a .sdc file which captures timing constraints which can then be used as input to the place-and-route tool.
- We use Cadence Innovus to place-and-route our design, which means to place all of the gates in the gate-level netlist into rows on the chip and then to generate the metal wires that connect all of the gates together. We need to provide Cadence Innovus with the same abstract logical and timing views used in Synopsys DC, but we also need to provide Cadence Innovus with technology information in .lef and .captable format and abstract physical views of the standard-cell library also in .lef format. Cadence Innovus will generate an updated Verilog gate-level netlist, a .spef file which contains parasitic resistance/capacitance information about all nets in the design, and a .gds file which contains the final layout. The .gds file can be inspected using the open-source Klayout GDS viewer. Cadence Innovus also generates reports which can be used to accurately characterize area and timing.
- We use Synopsys PrimeTime (PT) to perform power analysis of our design. We need to provide Synopsys PT with the same abstract logical, timing, and power views used in Synopsys DC and Cadence Innovus, but in addition we need to provide switching activity information for every net in the design (which comes from the .saif file), and capacitance information for every net in the design (which comes from the .spef file). Synopsys PT puts the switching activity, capacitance, clock frequency, and voltage together to estimate the power consumption of every net and thus every module in the design, and these estimates are captured in various reports.
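To make the last item more concrete, the following Python sketch applies the textbook first-order model of dynamic switching power, P = alpha * C * V^2 * f, to a single net. It only illustrates how the four quantities combine; it is not Synopsys PT's actual algorithm, which also draws on the power views of the standard-cell library. The function name and all numeric values are hypothetical.

```python
def dynamic_power_watts(activity, cap_farads, vdd_volts, freq_hz):
    """First-order switching power of one net: alpha * C * V^2 * f.

    Assumption: activity is the average number of 0->1 transitions
    on the net per clock cycle.
    """
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

# Hypothetical numbers: 10% activity, 2 fF net, 1.1 V supply, 1 GHz clock
p = dynamic_power_watts(0.1, 2e-15, 1.1, 1e9)
print(f"{p * 1e6:.3f} uW")
```

Summing such per-net estimates over all of the nets in a module is the intuition behind the per-module numbers in the power reports.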
Extensive documentation is provided by Synopsys and Cadence for these ASIC tools. We have organized this documentation and made it available to you on the public course webpage:
The first step is to access ecelinux. Use VS Code to log into a specific ecelinux server and then use Microsoft Remote Desktop to log into the same server. Once you are at the ecelinux prompt, source the setup script, source the GUI setup script, clone this repository from GitHub, and define an environment variable to keep track of the top directory for the project.
% source setup-ece6745.sh
% source setup-gui.sh
% mkdir -p $HOME/ece6745
% cd $HOME/ece6745
% git clone git@github.com:cornell-ece6745/ece6745-tut08-asic-auto tut08
% cd tut08
% export TOPDIR=$PWD
1. Testing, Simulation, and Translation
As in the previous tutorial, our goal is to characterize the area, energy, and timing for the sort unit from the Verilog tutorial using the ASIC tools. As a reminder, the sort unit takes as input four integers and a valid bit and outputs those same four integers in increasing order with the valid bit. The sort unit is implemented using a three-stage pipelined, bitonic sorting network and the datapath is shown below.
Before using the ASIC tools we need to verify that our design passes all of our tests and use an interactive simulator to drive our design-space exploration.
1.1. Testing the Sort Unit
As always, we can use our Python-based testing framework combined with the Verilator two-state RTL simulator to verify our design's functionality. We need to use the --test-verilog and --dump-vtb command line options to generate Verilog test benches which can then be used for four-state RTL simulation, fast-functional gate-level simulation, and back-annotated gate-level simulation.
% mkdir -p $TOPDIR/sim/build
% cd $TOPDIR/sim/build
% pytest ../tut3_verilog/sort --test-verilog --dump-vtb
Verify the test benches have been generated.
1.2. Evaluating the Sort Unit
We can use the provided interactive simulator to run two experiments each with a different dataset.
% ../tut3_verilog/sort/sort-sim --impl rtl-struct --input random --stats --translate --dump-vtb
% ../tut3_verilog/sort/sort-sim --impl rtl-struct --input zeros --stats --translate --dump-vtb
Verify the Verilog RTL and test benches have been generated.
% cd $TOPDIR/sim/build
% ls SortUnitStruct__p_nbits_8__pickled.v
% ls SortUnitStruct__p_nbits_8_sort-sim-rtl-struct-random_tb.v
% ls SortUnitStruct__p_nbits_8_sort-sim-rtl-struct-zeros_tb.v
2. pyhflow For Generating Flows
pyhflow is based on the idea of step templates which are located in the asic/steps directory.
% cd $TOPDIR/asic/steps
% tree
.
├── 01-synopsys-vcs-rtlsim
│   └── run
├── 02-synopsys-dc-synth
│   ├── run
│   └── run.tcl
├── 03-synopsys-vcs-ffglsim
│   └── run
├── 04-cadence-innovus-pnr
│   ├── run
│   ├── run.tcl
│   └── setup-timing.tcl
├── 05-synopsys-vcs-baglsim
│   └── run
├── 06-synopsys-pt-pwr
│   ├── run
│   └── run.tcl
└── 07-summarize-results
    ├── run
    └── summarize-results
Each step is a directory with a run script and possibly other scripts. The key difference from the scripts we used in the previous tutorials is that these scripts are templated using the Jinja2 templating system:
So for example, the Synopsys DC run.tcl script is templated based on the design name and the target clock period as follows.
analyze -format sverilog $env(TOPDIR)/sim/build/{{design_name}}__pickled.v
elaborate {{design_name}}
create_clock clk -name ideal_clock1 -period {{clock_period}}
The {{ }} directive is the standard syntax for template variable substitution using Jinja2. As another example, the RTL, FFGL, and BAGL simulation scripts are all templated by the list of tests and evaluations.
The {% %} directive is the standard syntax for more complex templating using Jinja2.
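The following Python sketch illustrates both ideas using only the standard library. It is a stand-in for illustration, not Jinja2 itself and not how pyhflow is implemented: the render_vars helper is hypothetical and handles only simple {{name}} placeholders, and the expansion over the list of tests is written as an ordinary Python loop to show the kind of output a {% %} loop over tests would produce.

```python
import re

def render_vars(template, params):
    """Replace each {{name}} placeholder with str(params[name])."""
    def lookup(match):
        return str(params[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lookup, template)

# Key/value pairs as they might come from a design YAML file
params = {
    "design_name": "SortUnitStruct__p_nbits_8",
    "clock_period": 0.7,
}

template = (
    "elaborate {{design_name}}\n"
    "create_clock clk -name ideal_clock1 -period {{clock_period}}\n"
)
rendered = render_vars(template, params)
print(rendered)

# Expanding a list of tests into one run_sim line per test
tests = [
    "SortUnitStruct__p_nbits_8_test_basic",
    "SortUnitStruct__p_nbits_8_test_stream",
]
sim_lines = [f"run_sim {test}" for test in tests]
print("\n".join(sim_lines))
```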
The pyhflow program takes as input a design YAML file which specifies:
- what steps make up the flow
- key/value pairs for variables to substitute into scripts
- list of tests
- list of evals
Take a look at the provided design YAML file for the sort unit.
% cd $TOPDIR/asic/designs
% cat tut08-sort.yml
steps:
- 01-synopsys-vcs-rtlsim
- 02-synopsys-dc-synth
- 03-synopsys-vcs-ffglsim
- 04-cadence-innovus-pnr
- 05-synopsys-vcs-baglsim
- 06-synopsys-pt-pwr
- 07-summarize-results
src_dir : ../../../sim/build
design_name : SortUnitStruct__p_nbits_8
clock_period : 0.7
dump_vcd : true
tests:
- SortUnitStruct__p_nbits_8_test_basic
- SortUnitStruct__p_nbits_8_test_stream
- SortUnitStruct__p_nbits_8_test_dups
- SortUnitStruct__p_nbits_8_test_sorted
- SortUnitStruct__p_nbits_8_test_random_8
evals:
- SortUnitStruct__p_nbits_8_sort-sim-rtl-struct-random
This design YAML file specifies the generated flow should use all seven steps. Currently the only parameters are the source directory, design name, and the clock period. We run RTL sim, FFGL sim, and BAGL sim on all tests and evals, but we only do energy analysis on the evals. The evals usually come from running an interactive simulator like sort-sim. All pyhflow does is use the YAML file to figure out what to substitute into the templated steps and then copy the run scripts into the current working directory. You can also override parameters on the pyhflow command line.
2.1. Running ASIC Flow with One Test
Let's go ahead and use pyhflow to generate the flow scripts for the sort unit.
% mkdir -p $TOPDIR/asic/build-tut08-sort
% cd $TOPDIR/asic/build-tut08-sort
% pyhflow --one-test ../designs/tut08-sort.yml
The --one-test command line option tells pyhflow to only include the first test and no evals in the flow scripts. This is a useful way to get started with a single test and reduces the overall runtime of the flow. Once we know that everything works with one test we can circle back and regenerate the flow scripts with all of the tests and evals.
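The effect of this option on the lists from the design YAML file can be sketched in Python as follows. This only mirrors the behavior described above (keep the first test, drop all evals); the select_one_test function and the dictionary layout are hypothetical and are not taken from the pyhflow source.

```python
def select_one_test(config):
    """Return a copy of config with only the first test and no evals."""
    trimmed = dict(config)
    trimmed["tests"] = config["tests"][:1]
    trimmed["evals"] = []
    return trimmed

# A subset of the values from the sort unit design YAML file
config = {
    "design_name": "SortUnitStruct__p_nbits_8",
    "clock_period": 0.7,
    "tests": [
        "SortUnitStruct__p_nbits_8_test_basic",
        "SortUnitStruct__p_nbits_8_test_stream",
        "SortUnitStruct__p_nbits_8_test_dups",
    ],
    "evals": ["SortUnitStruct__p_nbits_8_sort-sim-rtl-struct-random"],
}

one = select_one_test(config)
print(one["tests"], one["evals"])
```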
Let's see how the step template has been filled in for the Verilog RTL simulation step.
% cd $TOPDIR/asic/build-tut08-sort
% cat 01-synopsys-vcs-rtlsim/run
...
run_sim SortUnitStruct__p_nbits_8_test_basic
Notice how the name of the first test has been filled in. Let's also see how the step template has been filled in for the Synopsys DC synthesis step.
% cd $TOPDIR/asic/build-tut08-sort
% cat 02-synopsys-dc-synth/run.tcl
...
analyze -format sverilog $env(TOPDIR)/sim/build/SortUnitStruct__p_nbits_8__pickled.v
elaborate SortUnitStruct__p_nbits_8
create_clock clk -name ideal_clock1 -period 0.7
Notice how the name of the source Verilog RTL file, the top-level module name, and the clock period have all been filled in.
After generating a flow, we always recommend explicitly running at least the first two steps to ensure there are no errors. You can run the four-state RTL simulation as follows.
% cd $TOPDIR/asic/build-tut08-sort
% ./01-synopsys-vcs-rtlsim/run
Make sure the step can find the source files and passes the test. Then run synthesis as follows.
% cd $TOPDIR/asic/build-tut08-sort
% ./02-synopsys-dc-synth/run
Carefully look at the output from the synthesis step (also stored in the run.log file). Look for the output after Running PRESTO HDLC for any warnings to ensure that all of your Verilog RTL is indeed synthesizable. Scan through the rest of the logs to ensure there are no worrying warnings or errors.
Once you have explicitly run the first two steps to ensure there are no errors, you can run the remaining steps.
% cd $TOPDIR/asic/build-tut08-sort
% ./03-synopsys-vcs-ffglsim/run
% ./04-cadence-innovus-pnr/run
% ./05-synopsys-vcs-baglsim/run
% ./06-synopsys-pt-pwr/run
% ./07-summarize-results/run
2.2. Running ASIC Flow with All Tests and Evals
If all looks good, then you can regenerate the flow scripts with all of the tests and evals. pyhflow will also create a run-flow script which will run all of the steps in sequence for you, but only use this if you are confident there are no errors!
% cd $TOPDIR/asic/build-tut08-sort
% pyhflow ../designs/tut08-sort.yml
% ./run-flow
timestamp = 2025-02-19 22:14:30
design_name = SortUnitStruct__p_nbits_8
clock_period = 0.7
rtlsim = 7/7 passed
synth_setup_slack = 0.0089 ns
synth_num_stdcells = 461
synth_area = 785.498 um^2
ffglsim = 7/7 passed
pnr_setup_slack = 0.0683 ns
pnr_hold_slack = 0.0102 ns
pnr_num_stdcells = 609
pnr_area = 930.202 um^2
baglsim = 7/7 passed
SortUnitStruct__p_nbits_8_sort-sim-rtl-struct-random
- exec_time = 106 cycles
- exec_time = 74.5750 ns
- power = 2.9100 mW
- energy = 0.2170 nJ
The final step summarizes the results and includes the following information.
- Timestamp
- Design name
- Clock period in ns
- Four-state RTL simulation results
- Synthesis setup slack in ns
- Synthesis num stdcells
- Synthesis area in um^2
- Fast-functional gate-level simulation results
- Place-and-route setup slack in ns
- Place-and-route hold slack in ns
- Place-and-route num stdcells
- Place-and-route area in um^2
- Back-annotated gate-level simulation results
- For each evaluation
  - Execution time in cycles
  - Execution time in ns
  - Power in mW
  - Energy in nJ
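As a quick sanity check on the summary shown earlier, the reported energy is consistent with multiplying the reported power by the reported execution time in ns. The Python sketch below redoes that arithmetic with the values for the random evaluation; the function name is hypothetical.

```python
def energy_nj(power_mw, exec_time_ns):
    """Energy in nJ from average power in mW and execution time in ns.

    mW * ns = 1e-3 W * 1e-9 s = 1e-12 J = 1 pJ, so divide by 1000 for nJ.
    """
    return power_mw * exec_time_ns / 1000.0

# Power and execution time taken from the summary for the random evaluation
e = energy_nj(2.9100, 74.5750)
print(f"{e:.4f} nJ")
```

The printed value should match the energy line in the summary for the same evaluation.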
For the results to be valid, the following must be true:
- all four-state RTL simulations pass
- all fast-functional gate-level simulations pass
- all back-annotated gate-level simulations pass
- place-and-route setup slack is positive
- place-and-route hold slack is positive
If your design does not meet timing after synthesis but does meet timing after place-and-route then these are still valid results. It just means Synopsys DC was conservative and/or Cadence Innovus did a good job further optimizing the design.
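The conditions above can be expressed as a small check. The following Python sketch assumes the summary values have already been collected into a dictionary keyed by the names shown in the summary output; the helper functions and the dictionary layout are hypothetical and not part of the provided flow. Note that the synthesis setup slack is deliberately not checked, in line with the paragraph above.

```python
def parse_passed(text):
    """Turn a string like '7/7 passed' into (num_passed, num_total)."""
    passed, total = text.split()[0].split("/")
    return int(passed), int(total)

def results_valid(summary):
    """Apply the five validity conditions to a summary dictionary."""
    for key in ("rtlsim", "ffglsim", "baglsim"):
        passed, total = parse_passed(summary[key])
        if passed != total:
            return False
    return summary["pnr_setup_slack"] > 0 and summary["pnr_hold_slack"] > 0

# Values copied from the sort unit summary
summary = {
    "rtlsim": "7/7 passed",
    "ffglsim": "7/7 passed",
    "baglsim": "7/7 passed",
    "synth_setup_slack": 0.0089,
    "pnr_setup_slack": 0.0683,
    "pnr_hold_slack": 0.0102,
}
print(results_valid(summary))
```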
2.3. Debugging Issues
Every step logs its output to a run.log file. The four-state RTL simulation, fast-functional gate-level simulation, and back-annotated gate-level simulation save the output of each simulation in a separate log file and also save a VCD file.
If you pass two-state RTL simulation but fail four-state RTL simulation then the likely cause is either Verilog syntax that Synopsys VCS does not like (e.g., using signals before they are declared), mishandling register initialization, not handling inputs which are X correctly, and/or not forcing the outputs to always be known values (i.e., you cannot produce Xs on the outputs of your block after reset). Use Surfer and start from the observable error and work backwards through your design in both space and time. If you are seeing Xs in the output, work backwards to try and figure out where these Xs are coming from.
If you pass two-state RTL simulation and four-state RTL simulation but fail fast-functional gate-level simulation then the likely cause is some kind of synthesis issue such as inferred latches, parts of your design being optimized away incorrectly, or an issue in the Synopsys DC TCL script. While you can look at waveforms for gate-level simulation it is not fun; it might be best to carefully look through the run.log file for the synthesis step first.
If you pass two-state RTL simulation, four-state RTL simulation, and fast-functional gate-level simulation, and your design has positive setup and hold slack after place-and-route, but you fail back-annotated gate-level simulation, then the likely cause is that the Cadence Innovus TCL script has an issue. While you can look at waveforms for gate-level simulation it is not fun; it might be best to carefully look through the run.log file for the place-and-route step first.
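The debugging order described above amounts to finding the first simulation stage that fails and starting there. The Python sketch below captures that idea; the stage keys reuse the names from the summary output, while the triage function and the hint strings are hypothetical paraphrases of the advice in this section.

```python
# Simulation stages in flow order, each paired with a paraphrased hint
STAGES = [
    ("rtlsim", "use waveforms to trace Xs back through the design"),
    ("ffglsim", "read the run.log file for the synthesis step"),
    ("baglsim", "read the run.log file for the place-and-route step"),
]

def triage(passed):
    """Return (stage, hint) for the first failing stage, else None."""
    for stage, hint in STAGES:
        if not passed[stage]:
            return stage, hint
    return None

# Hypothetical outcome: RTL sim passes but fast-functional GL sim fails
result = triage({"rtlsim": True, "ffglsim": False, "baglsim": False})
print(result)
```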
2.4. Interactive Debugging
You can use Synopsys DV to look at the synthesis results as follows.
% cd $TOPDIR/asic/build-tut08-sort
% design_vision-xg
design_vision> set_app_var target_library "$env(ECE6745_STDCELLS)/stdcells.db"
design_vision> set_app_var link_library "* $env(ECE6745_STDCELLS)/stdcells.db"
You can use the following steps to open the .ddc file generated during synthesis.
- Choose File > Read from the menu
- Open the 02-synopsys-dc-synth/post-synth.ddc file
However, we don't usually find using Synopsys DV to be that helpful.
It is far more helpful to use Cadence Innovus to debug your physical design including highlighting modules and paths to see where they are located in the block. You can reload the design into Cadence Innovus as follows.
You can use the Amoeba workspace to help visualize how modules are mapped across the chip. Choose Windows > Workspaces > Amoeba from the menu. However, we recommend using the design browser to help visualize how modules are mapped across the chip. Here are the steps:
- Choose Windows > Workspaces > Design Browser + Physical from the menu
- Hide all of the metal layers by pressing the number keys
- Browse the design hierarchy using the panel on the left
- Right click on a module, click Highlight, select a color
You can use the following steps in Cadence Innovus to display where the critical path is on the actual chip.
- Choose Timing > Debug Timing from the menu
- Click OK in the pop-up window
- Right click on first path in the Path List
- Choose Highlight > Only This Path > Color
Finally, you can use Klayout to capture a screen shot demonstrating that you have successfully taken a design from RTL to layout.
% cd $TOPDIR/asic/build-tut08-sort
% klayout -l $ECE6745_STDCELLS/klayout.lyp 04-cadence-innovus-pnr/post-pnr.gds
2.5. Key Reports
Here is a list of key reports.
- 02-synopsys-dc-synth/timing.rpt
- 02-synopsys-dc-synth/area.rpt
- 02-synopsys-dc-synth/resources.rpt
- 04-cadence-innovus-pnr/timing-setup.rpt
- 04-cadence-innovus-pnr/timing-hold.rpt
- 04-cadence-innovus-pnr/area.rpt
- 06-synopsys-pt-pwr/*-summary.rpt
- 06-synopsys-pt-pwr/*-detailed.rpt
You can use the synthesis resources report to determine if Synopsys DC has used DesignWare components to optimize parts of your design. You can learn more about all of the DesignWare components here:
You can use the place-and-route timing setup report to determine the critical path in your design and the delay of every component along this critical path. You can use the place-and-route area report to determine the area of every module in your design. You can use the detailed power reports to determine the power consumption of every module in your design.
3. To Do On Your Own
Now you can use what you have learned so far to push the GCD unit through the ASIC automated flow. First, run all of the tests for the GCD unit.
Then run three interactive simulations for our evaluation.
% ../tut3_verilog/gcd/gcd-sim --impl rtl --input random --stats --translate --dump-vtb
% ../tut3_verilog/gcd/gcd-sim --impl rtl --input small --stats --translate --dump-vtb
% ../tut3_verilog/gcd/gcd-sim --impl rtl --input zeros --stats --translate --dump-vtb
Then you can use pyhflow to push the GCD unit through the flow.
% mkdir -p ${TOPDIR}/asic/build-tut08-gcd
% cd ${TOPDIR}/asic/build-tut08-gcd
% pyhflow ../designs/tut08-gcd.yml
% ./01-synopsys-vcs-rtlsim/run
% ./02-synopsys-dc-synth/run
% ./03-synopsys-vcs-ffglsim/run
% ./04-cadence-innovus-pnr/run
% ./05-synopsys-vcs-baglsim/run
% ./06-synopsys-pt-pwr/run
% ./07-summarize-results/run
Or use the run-flow script to run the entire flow with a single command.
% mkdir -p ${TOPDIR}/asic/build-tut08-gcd
% cd ${TOPDIR}/asic/build-tut08-gcd
% pyhflow ../designs/tut08-gcd.yml
% ./run-flow
Compare the energy of the three evaluations. Look at the following reports to understand the critical path and area breakdown.
% cd ${TOPDIR}/asic/build-tut08-gcd
% cat 04-cadence-innovus-pnr/timing-setup.rpt
% cat 04-cadence-innovus-pnr/area.rpt
Use the Cadence Innovus GUI to highlight the datapath vs control modules. Use klayout to look at the final layout. Then try to push the clock period lower to see if the block can operate at a higher clock frequency while still meeting timing and passing all of the tests.
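When sweeping the clock period, it can help to translate each period into the corresponding clock frequency. The following Python sketch does that conversion; the list of candidate periods is hypothetical, and only the 0.7 ns value comes from the sort unit example in this tutorial.

```python
def period_ns_to_freq_mhz(period_ns):
    """Convert a clock period in ns to a clock frequency in MHz."""
    return 1000.0 / period_ns

# Hypothetical sweep of target clock periods
for period_ns in (1.0, 0.7, 0.5):
    freq_mhz = period_ns_to_freq_mhz(period_ns)
    print(f"{period_ns:.1f} ns -> {freq_mhz:.1f} MHz")
```

A lower clock period only counts as an improvement if the regenerated flow still reports positive place-and-route setup and hold slack and all simulations still pass.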