Speed Optimization with PlanAhead

Memo #42 - William Mallard, August 2011.

To get Simulink designs to run above 300MHz on a ROACH, it is typically necessary to manually constrain placement of primitives on the FPGA fabric. This guide attempts to explain the process.


Compile design with BeeXPS

Compiling with BeeXPS first is not strictly necessary, but the BeeXPS workflow automates a number of steps that would be tedious to do manually. So, use BeeXPS to compile your design for 200MHz. It's fine if the compile fails during PAR (place and route). If it doesn't make it to PAR, there's probably something wrong with your design, and you should fix it before continuing.

Import design into PlanAhead

Start PlanAhead.

You'll probably want to do something like this:

$ export XILINX=/opt/Xilinx/11.1/ISE
$ export PATH=$PATH:$XILINX/bin/lin64
$ /opt/Xilinx/11.1/PlanAhead/bin/planAhead -m64

Import the netlist and constraints generated by BeeXPS:

File -> New Project

(1) Create a New PlanAhead Project

Click Next.

(2) Project Name

Pick a name and location for your new PlanAhead project. Click Next.

(3) Design Source

Select "Import synthesized (EDIF or NGC) netlist". Click Next.

(4) Import Netlist

Set "Netlist file" to "$YOUR_SIMULINK_PROJECT_PATH/XPS_ROACH_base/implementation/system.ngc". Click Next.

(5) Choose a Part and a Floorplan Name

It should automatically detect the following:

Product Family: Virtex5
Choose Part: xc5vsx95tff1136-1

Pick a name for your first new floorplan. Click Next.

(6) Import Constraints

Add "$YOUR_SIMULINK_PROJECT_PATH/XPS_ROACH_base/implementation/system.ucf" to the "Constraints files" list. Click Next.

(7) New Project Summary

Click Finish.
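
Before starting the wizard, it can save a round trip to confirm that BeeXPS actually left both input files on disk. The following is a minimal sketch, not part of the original workflow: the function name check_beexps_outputs is made up for illustration, and it takes the Simulink project path ($YOUR_SIMULINK_PROJECT_PATH above) as its only argument.

```shell
# Hypothetical helper: report whether the netlist and constraints files
# that the import wizard asks for exist under a Simulink project path.
check_beexps_outputs() {
    impl_dir="$1/XPS_ROACH_base/implementation"
    status=0
    for f in system.ngc system.ucf; do
        if [ -f "$impl_dir/$f" ]; then
            echo "found:   $impl_dir/$f"
        else
            echo "missing: $impl_dir/$f"
            status=1
        fi
    done
    # Non-zero exit status if either file is absent.
    return "$status"
}

# Usage: check_beexps_outputs "$YOUR_SIMULINK_PROJECT_PATH"
```

If either file is reported missing, the BeeXPS compile most likely stopped before synthesis finished.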

Manually floorplan the design

  1. Hit Ctrl+F, and in the "Find" dialog box:
    1. Click "More" once to search by type and name.
    2. For type, select "Block Memory" for BRAMs or "Block Arithmetic" for DSP48s.
    3. For name, use a regular expression that captures the names of blocks that should be placed together.
    4. Click "OK".
  2. In the "Find Results" window:
    1. Select some primitives.
    2. Right-click your selection.
    3. Click "Draw Pblock".
  3. On the "Device" window, draw a Pblock rectangle around a region where you would like these primitives to be placed.
  4. A "New Pblock" dialog box will pop up.
    1. Pick a name for your Pblock.
    2. Select the types of primitives you'd like this Pblock to constrain.
    3. Make sure "Assign selected instances" is checked.
    4. Click "OK".
  5. In the "Pblock Properties" window:
    1. Select the "Statistics" tab.
    2. Verify that your Pblock does not exceed 100% utilization.
  6. Repeat this process as needed.

You can also draw empty Pblocks, and then assign them primitives as you go. You can even create an empty Pblock with no associated rectangle, and add primitives and a rectangle later. To learn more, right-click on the "Device" window and play with the menu options. It should be pretty straightforward.
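
As an illustration of the name search in step 1, a regular expression can be tried out against a list of instance names before typing it into the "Find" dialog. This is only a sketch: the instance names below are invented, `grep -E` uses POSIX extended syntax, and PlanAhead's own regular expression dialect may differ in details.

```shell
# Hypothetical instance names; real ones come from your own netlist.
names='fft0/stage1/bram0
fft0/stage1/bram1
fft0/stage2/bram0
fft1/stage1/bram0'

# Print the names that match the extended regular expression in $1.
match_names() {
    printf '%s\n' "$names" | grep -E "$1"
}

# Select every stage-1 BRAM of either FFT so they can share one Pblock.
match_names '^fft[01]/stage1/bram[0-9]+$'
```

With the invented names above, the pattern selects the three stage-1 BRAMs and skips the stage-2 one.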

General placement strategy

Unfortunately, there is no "recipe" for getting your design to meet timing at high speeds. It's an iterative process that you develop an intuition for with practice. Here's the general strategy.

When PAR fails, load your timing reports in Timing Analyzer, and use the results to guide your placement.

In most situations, placing BRAMs and DSP48s works well. PAR is pretty good at placing auxiliary logic, and trying to do it by hand typically leads to suboptimal results. Only place individual LUTs if you have a specific reason for doing so (e.g., routing status bits around the perimeter of your FFT).

Locking down individual BRAM and DSP48 locations does not work well; it's usually best to constrain groups of them to Pblock areas. When packing things into Pblocks, it often helps to leave some free space. Filling Pblocks to ~70% seems to work well, but it really depends on the situation. Sometimes 100% packing is necessary and works fine.
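
To make the ~70% guideline concrete: the number of sites a Pblock needs is the primitive count divided by the target fill fraction, rounded up. A minimal sketch in shell integer arithmetic follows. The function name pblock_sites and the example of 20 BRAMs are illustrative and not taken from this memo.

```shell
# Hypothetical helper: minimum number of sites a Pblock must contain so
# that COUNT primitives fill it to at most PCT percent.
pblock_sites() {
    count=$1
    pct=$2
    # Ceiling division: ceil(count * 100 / pct)
    echo $(( (count * 100 + pct - 1) / pct ))
}

pblock_sites 20 70    # 20 BRAMs at 70% fill: prints 29
pblock_sites 20 100   # fully packed: prints 20
```

The result can be compared against the site counts shown in the "Statistics" tab when sizing the rectangle.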

Modify timing constraints

In the "Constraints" window, select:

Constraints -> Clk period -> Basic period

Change the four NET PERIOD constraints for "adc[01]clk_[pn]" to the desired clock period.

If f is the desired FPGA clock frequency in MHz, the period in ns should be 1000/f, truncated to three decimal places.
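
The period calculation can be sketched in shell as follows. This assumes an integer frequency in MHz, and the function name period_ns is made up for illustration. It works in picoseconds so that integer division performs the truncation, which a rounding format such as "%.3f" would not.

```shell
# Hypothetical helper: clock period in ns for an integer frequency in MHz,
# truncated (not rounded) to three decimal places.
period_ns() {
    f_mhz=$1
    # 1000/f ns equals 1000000/f ps; integer division truncates.
    ps=$(( 1000000 / f_mhz ))
    printf '%d.%03d\n' $(( ps / 1000 )) $(( ps % 1000 ))
}

period_ns 300   # prints 3.333
period_ns 350   # prints 2.857
period_ns 400   # prints 2.500
```

Because truncation can only shorten the period, the resulting constraint is never looser than the requested frequency.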

Compile floorplanned design

  1. Tools -> Run Implementation
    1. Pick a name for this (floorplan, clock speed) combination.
      • If it doesn't work, you'll have to turn down the clock and/or modify your floorplan, so it's a good idea to pick your "Run Name" accordingly.
    2. Set "Pblock" to "(None)".
    3. Set "Strategy" to "ISE Defaults".
      • Other "optimizations" are rarely better and are often worse.
    4. Click "OK".
  2. When it prompts you to "Save changes before Launching?", it's usually a good idea to say "Yes".
  3. Your design should start compiling.
    • You can monitor progress in the "Implementation Run Properties" window.
  4. In the "Design Runs" window, wait for "Progress" to hit 100%. The "Status" column should read "PAR done!"
    • If your timing score is 0, then you've met timing. Move on to the next step.
    • If your timing score is not 0, then check your timing report to figure out why.

Generate a bitstream

  1. In the "Design Runs" window, select a successful design run.
  2. Right click it, and select "Run bitgen ...".
  3. In the Run Bitgen dialog box, click "OK".
  4. In the "Design Runs" window, wait for "Progress" to hit 100%.
    • The "Status" column should read "Bitgen Complete!"

Generate a borph executable

We define the following abbreviations:

SPP = Simulink project path
PPP = PlanAhead project path
PPN = PlanAhead project name
FPN = Floorplan name
DRN = Design Run name

Locate a mkbof executable:


Locate your new bitstream:


Locate your core info table:


The easiest method is to:

  1. cd to the same directory as the bitstream
  2. copy core_info.tab to this directory
  3. run:
    mkbof -o $FPN.bof -s core_info.tab -t 3 $FPN.bit

You should now have a BORPH executable for ROACH.
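
The three steps above can be wrapped in a small shell function. This is a sketch under stated assumptions: the function name make_bof is invented, the bitstream directory and the core_info.tab path must be supplied by you since they depend on your project, and mkbof is assumed to be on your PATH. The mkbof arguments are copied unchanged from step 3.

```shell
# Hypothetical wrapper around the three steps above.
#   $1 = directory containing the bitstream
#   $2 = path to the original core_info.tab (must not already be
#        the copy inside $1, or cp will refuse to overwrite it)
#   $3 = floorplan name (FPN); the bitstream is expected at $1/$3.bit
make_bof() {
    bit_dir=$1
    core_info=$2
    fpn=$3
    # Step 2: copy the core info table next to the bitstream.
    cp "$core_info" "$bit_dir/core_info.tab" || return 1
    # Steps 1 and 3: run mkbof from the bitstream directory, in a
    # subshell so the caller's working directory is left unchanged.
    (
        cd "$bit_dir" || exit 1
        mkbof -o "$fpn.bof" -s core_info.tab -t 3 "$fpn.bit"
    )
}

# Usage: make_bof /path/to/bitstream/dir /path/to/core_info.tab "$FPN"
```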
