Jim Hogan, of Vista Ventures LLC and an industry luminary in the semiconductor space, just published his analysis of the emulation space on Deepchip yesterday. The complete analysis includes four separate pieces:
Hogan's core thesis is that emulation is becoming more ubiquitous.
From interactions with customers, that's evident. Ten years ago, emulation and FPGA prototyping were the exception -- now they are almost the rule for chip development teams. This trend is reflected in vendor results. In quarterly earnings calls, Mentor's Wally Rhines recently characterized the last two years of emulation growth as "really remarkable" -- and Cadence stated that 2011-2013 hardware sales, led by Palladium XP, grew 90% over the 2008-2010 period. Of course, Synopsys recently acquired EVE.
Not only are more teams using these solutions -- but additional groups within chip development teams are starting to as well. Many of these new groups are trying to leverage FPGA boards and open software to address their needs.
Some are doing this to validate IP and subsystems. Many are doing this to address the exploding demand for pre-silicon software development. Firmware engineers need pre-silicon models. Often RTL is the only model accurate enough for their needs or even available. FPGA boards run RTL fast -- and they are cost-effective to deploy for firmware development, once you get them working and integrated with your simulations. But FPGA boards are difficult to use, especially so when connecting to software, for models, test benches and debug.
These engineers could leverage emulation, if only it could meet their budget and productivity needs. These users need solutions that:
- Connect FPGA boards with host-based open software, such as SystemC/C/C++
- Reduce the significant challenges debugging RTL IP in FPGA boards
There's a new segment of emulation that addresses these needs. We're calling this new segment desktop emulation -- and it connects commodity FPGA boards together with C to deliver affordable emulation for individual use.
We're happy that Hogan included our desktop emulation solution, Semu, in his analysis. It starts at $9,500 and works with standard, low-cost Xilinx-based development boards. It is aimed at users who have been trying to leverage FPGA boards for applications that benefit from emulation:
- Pre-silicon firmware development with RTL IP
- High-speed validation of IP and subsystems
If you would like to learn more, please download the Semu datasheet, or contact Bluespec for a demo:
Interested in the rapid prototyping and architectural exploration of algorithms and architectures? Please come join former MIT professor Rishiyur S. Nikhil, PhD, CTO of Bluespec, in Santa Clara, CA on November 2nd for an in-person, live seminar and lunch. He will be presenting a case study in the rapid prototyping and architectural exploration of the H.264 video algorithm.
What you'll learn in this hour-and-a-half seminar:
- Algotecture: the critical role of architecture for optimal implementation of hardware algorithms (architecture + algorithm)
- Rapid architectural exploration: making rapid changes for performance, power and area, while still delivering optimal results across many tradeoff points
- Prototyping: the benefits of early FPGA prototyping for architectural exploration
At the conclusion of Nikhil's presentation, the complete video compression algorithm will be demonstrated on a low-cost FPGA emulation system. Lunch will be served.
Click here to register and get details:
Schedule on November 2:
| Time | Event |
| --- | --- |
| 10:30a-10:45a | Welcome Reception |
| 10:45a-12:15p | Seminar and Demo |
| 12:15p-1:00p | Lunch |
About Rishiyur S. Nikhil
Rishiyur S. Nikhil is co-founder and CTO of Bluespec, Inc., which develops tools that dramatically improve correctness, productivity, reuse and maintainability in the design, modeling and verification of digital designs (ASICs and FPGAs). Earlier, from 2000 to 2003, he led a team inside Sandburst Corp. (later acquired by Broadcom) developing Bluespec technology and contributing to 10Gb/s enterprise network chip models, designs and design tools. Prior to that, he was at Cambridge Research Laboratory (DEC/Compaq), including one and a half years as Acting Director. He was a professor of Computer Science and Engineering at MIT. He has led research teams, published widely, and holds several patents in functional programming, dataflow and multithreaded architectures, parallel processing, compiling, and EDA. He received his Ph.D. and M.S.E.E. in Computer and Information Sciences from the Univ. of Pennsylvania, and his B.Tech in EE from IIT Kanpur.
In order to validate increasingly complex designs and deliver firmware earlier, emulation and FPGA prototyping are becoming imperatives for chip development. These hardware execution platforms deliver orders-of-magnitude higher speeds than simulation, which is too slow to handle the increased complexity and software content of today's designs. But when these platforms are connected to test benches or models on a host workstation, the performance can often significantly miss expectations, putting into question both the investment in this approach and project success. In order to avoid this, design teams need to architect and optimize their co-emulation environments to avoid the ramifications of Amdahl's Law. Transaction-level co-emulation and synthesizable testbenches are key tools for avoiding the bottlenecks that can kill emulation performance.
Emulation & Amdahl's Law
Amdahl's Law teaches us that the overall speedup of a system is limited by the portion of that system that is actually improved, and by how much. With emulation and FPGA prototyping, the bottlenecks are typically the host and the co-emulation link. While an emulator can run at MHz+ speeds, host-based simulation and the co-emulation link can severely impact the effective performance achieved. In a case study we performed of the co-emulation of an Ethernet switch test bench and DUT (Case Study: Using Synthesizable Transactors & Testbenches to Avoid Emulation Performance Disasters), the effective performance ranged from a low of 47 kHz to a very high 49 MHz, a dynamic range of roughly 1000X. Where a given setup landed in that range depended on how the testbench was partitioned across the co-emulation link and on the abstraction level of the transactors.
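To make the Amdahl's Law point concrete, here is a minimal Python sketch. The function name, the fractions, and the 10,000X acceleration factor are illustrative assumptions, not figures from the case study.

```python
def amdahl_speedup(accelerated_fraction, acceleration):
    """Overall speedup when only `accelerated_fraction` of the original
    run time is made `acceleration` times faster (Amdahl's Law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if acceleration <= 0:
        raise ValueError("acceleration must be positive")
    remaining = 1.0 - accelerated_fraction  # part left on the host and link
    return 1.0 / (remaining + accelerated_fraction / acceleration)


# Hypothetical scenario: the DUT moves into an emulator that runs it
# 10,000X faster, while the rest of the run time stays on the host.
for dut_fraction in (0.50, 0.90, 0.99):
    overall = amdahl_speedup(dut_fraction, 10_000)
    print(f"DUT share of run time {dut_fraction:.0%}: overall speedup {overall:.1f}X")
```

Under these assumptions, if 10% of the original run time stays on the host, the overall speedup cannot reach 10X no matter how fast the emulator is. That is why the part left outside the emulator deserves as much attention as the DUT.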
We've heard multiple first-emulation-experience anecdotes from teams seeing disappointing improvements over simulation. While you might get lucky simply dropping the DUT into emulation, typically you'll need to analyze and optimize performance to achieve your expectations -- and, in the process, you'll likely need to rework the test bench partitioning, transactors, and consider other items such as system reset/configuration and memory architectures.
Transaction-Level Co-Emulation
Co-emulation links have limited throughput and considerable latency, which can bottleneck performance if:
- Models or a testbench, running on a host workstation, have tightly coupled interactions with a DUT running in emulation. This can happen if the interface across the co-emulation link is running at too low a level (for example, running a low-level, signal-level interface between host and emulator). It can also happen if there are round-trip, latency sensitive interactions between host and emulator.
- Co-emulation communication, from host to emulation, requires too much bandwidth. For example, transmitting uncompressed, high-resolution, high-frame rate video across the co-emulation link can easily swamp a link's bandwidth.
Transaction-level co-emulation, using higher-level communication abstractions that are ideally latency decoupled, can help with both of these. Take, for example, a testbench on a host connected to an emulation-resident DUT that comprises an AMBA AXI-based SoC. The testbench connects to an AXI port, over which the switch traffic is sent and received. One option would be to just bring the AXI interface across the co-emulation link, which would require cycle-by-cycle communication, with very tight latency coupling, at the switch interface level. This would result in terrible performance. Alternatively, one could send each AXI transaction, or even multiple transactions, as a single transmission, including both the data for the AXI transaction and the parameters for this transaction (e.g. address/etc.). From a performance standpoint, this second approach is far preferable.
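The difference between the two options can be sketched with a simple cost model in Python. This is a rough illustration only: the function, the 50 MHz emulator clock, the 20 microsecond link round trip, and the 10,000-cycle transaction size are all hypothetical values chosen for the example, and the model assumes the emulator stalls for one full round trip per message.

```python
def effective_dut_hz(emulator_hz, cycles_per_message, link_round_trip_s):
    """Effective DUT clock rate when the emulator must wait one link
    round trip after every `cycles_per_message` DUT cycles."""
    compute_s = cycles_per_message / emulator_hz  # time spent running the DUT
    return cycles_per_message / (compute_s + link_round_trip_s)


EMULATOR_HZ = 50e6         # hypothetical emulator clock
LINK_ROUND_TRIP_S = 20e-6  # hypothetical host <-> emulator round trip

# Signal-level interface: host and emulator synchronize on every DUT cycle.
signal_level = effective_dut_hz(EMULATOR_HZ, 1, LINK_ROUND_TRIP_S)

# Transaction-level interface: one message carries enough work (for
# example, a burst of AXI transactions) to keep the DUT busy for
# 10,000 cycles before the next synchronization.
transaction_level = effective_dut_hz(EMULATOR_HZ, 10_000, LINK_ROUND_TRIP_S)

print(f"signal-level:      {signal_level / 1e3:.1f} kHz")
print(f"transaction-level: {transaction_level / 1e6:.1f} MHz")
```

In this model the per-message round trip dominates when it is paid every cycle and is amortized when it is paid once per transaction. A latency-decoupled design, where the host streams transactions ahead without waiting for each response, would hide even more of the round trip than this blocking model shows.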
What's required to communicate across the co-emulation link with transactions? You need synthesizable transactors running on the emulator that take these high-level transactions from/to the host and convert them to low-level DUT interfaces. Transactors can be very complex, requiring considerable time-to-develop and significant verification efforts, because they are typically done in RTL. Because of the types of designs (complex control/protocol) typically required for synthesizable transactors, C/C++ and SystemC don't offer much in terms of abstraction over RTL. In contrast, Bluespec BSV offers an effective, high-level approach for the development of transactors, and has been used in many, many transactor designs as a high-level alternative to RTL.
Synthesizable Test Benches & System Models
In addition to transactors, test benches can be re-partitioned, even fully migrated, into emulation. As well, System Models required for your test bench, such as a disk drive model, or real lab equipment that would typically only be used with FPGA prototyping, such as an Ethernet packet tester, can be modeled and synthesized for even higher performance. If you were developing these for simulation, you would write them at a high-level using transaction-level SystemVerilog/'e'/C++/SystemC. With these tools available, why would you write them in RTL? But, with emulation and FPGA prototyping, you are stuck with RTL, unless you use a powerful high-level, synthesizable modeling environment like Bluespec.
With over 50 universities in the Bluespec University Program, it is not surprising to stumble upon fascinating new projects and courses using Bluespec. But, it is surprising when it is in your backyard, at MIT, where Bluespec technology was originally invented.
Peggy Aycinena, the editor of EDACafe, did a nice piece on July 17th, MIT: towards the 1000-core processor. This article is about a project that came as a surprise to many of us here. Here is a great quote by Prof. Srini Devadas from the article:
Also, our chip is being designed completely using Bluespec, which we want to shout from the rooftop of Stata! The primary designers involved swear there are huge benefits from using Bluespec versus coding directly – with the Verilog RTL done by Bluespec.
And we’re taking the academic attitude: Forget legacy designs! We’re not using any third-party IP. The RTL is all truly MIT home grown. We’re proving that a small group of 4 students can do this huge design – assuming it works, of course!
In a sense, we're not surprised by this kind of response -- Bluespec designers talk about Bluespec this way, and report similar results. Whether it's designing complex IP, exploring architectures, accelerating algorithms, expressing networks-on-chip, constructing highly parameterized IP, or putting models/testbenches/transactors into emulation/FPGAs, Bluespec designers can do it faster, with fewer errors, and don't want to go back.
Recently, Twitter user @avsm tweeted about a BSV-based MIPS model booting BSD UNIX at the University of Cambridge in the UK. Add to that Bluespec's own ARM Cortex ISS models that boot Linux and are included in its family of Synthesizable Virtual Platforms (SVP) for firmware development. Think Virtual Platform, but running in FPGAs blazingly fast, so they can integrate with RTL IP and still run at MHz speeds -- speed + accuracy.
Then add in a whole bunch more, including x86, PowerPC, SPARC, Alpha and Itanium models. These models run the gamut from architectural pipelined to ISSes. Most of them boot real OSes, and do it running really fast in FPGAs.
Given the breadth of real processor architectures booting real OSes in FPGAs, it's clear that no other HLS solution comes close!
Creating highly configurable IP generators can be complex and time-consuming. Simple parameterization, such as designing for configurable bit widths, is easy. What gets really hard is parameterizing behavior and structure: features, architectures, and even microarchitectures. Doing this is not only complex, but it inevitably means having to leverage and integrate multiple tool environments such as Verilog, Tcl and Perl.
BSV, with its control-adaptive, extreme parameterization, makes building highly parameterized IP easy -- allowing you to parameterize on almost any dimension: features, modules, functions, architectures and micro-architectures.
Michael Papamichael used this power to win the 2011 IEEE MEMOCODE design contest. In five short weeks, he built two configurable NoC models that ran really fast on FPGAs because they were synthesizable. His paper outlines this amazing, five-short-weeks accomplishment.
Even more impressive is what he managed to pull together in the midst of all his other work this past academic year. Based partly on his entry from last year's IEEE MEMOCODE design contest, Michael just launched CONNECT, a configurable, FPGA-friendly Network-on-Chip (NoC) generator. This is a really impressive IP generator -- configurable on many dimensions, including the choice of up to seven different network topologies -- line, ring, double-ring, star, mesh, torus and fully-connected. And, if one of these topologies doesn't work, there's an option to design your own fully customized topology. But, don't take our word for it -- you can try it out yourself through his very cool web portal:
On January 27th, CYBERNET SYSTEMS hosted their Bluespec User Group Meeting. In addition to presentations by Bluespec and CYBERNET, customer and industry presentations were made by Hitachi, TOPS Systems and Vennsa. The following are some pictures from the event:

Miyajima-san Kickoff

Venue

Presentation by Hitachi
CYBERNET SYSTEMS will be hosting a Bluespec User Group Meeting in Tokyo, Japan on January 27th, 2012. Details can be found on the Bluespec User Group Meeting website. In addition to presentations from users and Cybernet, Bluespec VP of Marketing, George Harper, will be presenting How BSV Enables Early Modeling, Verification & Emulation.
EETimes editor Brian Fuller has been traveling the country in a Chevy Volt seeking out stories about innovation. Along with his brother Kirk, who is the trip videographer, Brian stopped by Bluespec headquarters on Friday, October 7th. After a tour of the facility, Brian had a nice conversation about Bluespec with CEO Charlie Hauck, CTO Rishiyur S. Nikhil, VP Marketing George Harper, and VP Sales Gerry Desmond.

Afterward, Brian took the team out to the Chevy Volt for test drives. You can check out the video of the Chevy Volt below:
George Harper was invited to blog about his industry perspectives for EETimes' EDA Designline. His first blog, Planned Human Obsolescence, has been posted.