Wednesday, November 22, 2017

Resetting reset handling

I will make a bold statement. You are probably doing your reset wrong in your RTL code. Yep, that's right. There are of course plenty of ways to reset things the wrong way. Asynchronous vs synchronous is one example. Active high or active low is another.

But that's not why we're here today. Instead I will talk about a bad coding style that is so prevalent that I feel it needs some mentioning. Consider the following verilog code for a component that is most likely pretty worthless in practice, but will serve well as our example:


module rst_all
  (input            clk,
   input            rst,
   input            valid_i,
   input      [3:0] data_i,
   output reg       valid_o,
   output reg [3:0] data_o);

   reg [2:0]  count;
   reg [3:0]  data_r;

   always @(posedge clk) begin
      if (rst) begin
         data_o  <= 4'd0;
         data_r  <= 4'd0;
         valid_o <= 1'b0;
         count   <= 3'd0;
      end else begin
         if (valid_i)
           count <= count + 1;
         data_r  <= data_i;
         data_o  <= data_r + count;
         valid_o <= valid_i;
      end
   end
endmodule


All our registers are being reset, as can be seen in the following screenshot from the elaborated code in Vivado.

Now, as good RTL designers we want to minimize the load on the reset network, so we start looking for things that don't really need to be reset. Neither data_o nor data_r needs to be reset if we only look at the data when valid_o is asserted. Let's remove them and try again:


module rst_bad
  (input            clk,
   input            rst,
   input            valid_i,
   input      [3:0] data_i,
   output reg       valid_o,
   output reg [3:0] data_o);

   reg [2:0]  count;
   reg [3:0]  data_r;

   always @(posedge clk) begin
      if (rst) begin
         //data_o  <= 4'd0;
         //data_r  <= 4'd0;
         valid_o <= 1'b0;
         count   <= 3'd0;
      end else begin
         if (valid_i)
           count <= count + 1;
         data_r  <= data_i;
         data_o  <= data_r + count;
         valid_o <= valid_i;
      end
   end
endmodule



Whoa... what just happened? The reset inputs are gone, but we suddenly have two new muxes in the design and clock enable pins on the flip-flops. And what's worse, we still have the same load on the reset net. How is this possible?!?! Well, the thing is that we haven't specified what to do with the data_r and data_o signals when rst is asserted. Therefore, according to verilog rules (the same thing applies to VHDL designs too), they must keep their old values. This is most likely not what we wanted. Still, I see this design pattern all the time, and no one is warning about it. What should we do then? One way is to replicate the assignments to data_r and data_o in the reset section as well, but that's pretty awkward. The real solution is much simpler, but will probably cause heart attacks among conservative RTL designers.
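
To spell out what the tool sees, here is the data_r part of rst_bad rewritten with the implicit "hold" branch made explicit. This is just a minimal sketch for illustration, and the module name is mine:

module rst_bad_data_r
  (input            clk,
   input            rst,
   input      [3:0] data_i,
   output reg [3:0] data_r);

   always @(posedge clk) begin
      if (rst)
         data_r <= data_r;   // implicit in rst_bad: keep the old value
      else
         data_r <= data_i;
   end
endmodule

That keep-the-old-value path is exactly the extra mux (or clock enable) in front of the flip-flop, and rst is still driving it, which is why the reset load didn't go down. Now, on to the solution.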

We put the reset handling at the end of the process. Oh yes, I just did! And no one can stop me! This is what it looks like, by the way:


module rst_good
  (input            clk,
   input            rst,
   input            valid_i,
   input      [3:0] data_i,
   output reg       valid_o,
   output reg [3:0] data_o);

   reg [2:0]  count;
   reg [3:0]  data_r;

   always @(posedge clk) begin
      if (valid_i)
        count <= count + 1;
      data_r  <= data_i;
      data_o  <= data_r + count;
      valid_o <= valid_i;

      if (rst) begin
         //data_o  <= 4'd0;
         //data_r  <= 4'd0;
         valid_o <= 1'b0;
         count   <= 3'd0;
      end
   end
endmodule

and the generated design looks like this.

Problem solved!

A few additional notes:
  1. I have put together a FuseSoC core with the source and some notes on how to simulate and build the designs mentioned here in a git repo.
  2. This was an example with synchronous resets, but the same thing is true for asynchronous resets.
  3. I have no idea why Vivado chooses to instantiate the muxes, however, as it could just use the CE port directly. Other tools do that. My guess is that CE is maybe always active high, and the mux serves as an inverter.
  4. If you only target FPGA and the reset is just a power-on reset, you don't need a reset at all. Just provide an init value if you must (see the sketch after this list). Don't be afraid. It works, and will save some load on your reset net.
  5. There are many, many more things to say about resets, but we stop here for today.
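
As a sketch of the init-value approach from note 4, here is the example design again with no reset logic at all, only declaration-time init values. Most FPGA synthesis tools map these to the power-on state of the flip-flops. The module name is mine and this is only an illustration, not a drop-in recommendation:

module rst_none
  (input            clk,
   input            valid_i,
   input      [3:0] data_i,
   output reg       valid_o,
   output reg [3:0] data_o);

   // Init values instead of reset logic. FPGA tools typically turn
   // these into the power-on contents of the registers.
   reg [2:0] count  = 3'd0;
   reg [3:0] data_r = 4'd0;

   initial begin
      valid_o = 1'b0;
      data_o  = 4'd0;
   end

   always @(posedge clk) begin
      if (valid_i)
        count <= count + 1;
      data_r  <= data_i;
      data_o  <= data_r + count;
      valid_o <= valid_i;
   end
endmodule

This of course only makes sense when configuration load is your reset. If the design needs to be resettable at runtime, stick with the rst_good style above.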

Wednesday, August 30, 2017

Happy sixth birthday FuseSoC

Today FuseSoC is turning six years old. That is probably something like 35 in software years. It has had a colourful past with some breakups and an identity crisis, but has now settled down and realized that it will not change the world in the way it once thought it would. It has spawned a few child projects which are not yet able to handle themselves in the world and still need their loving parent project. Being 35 also means that we can expect a FuseSoC middle-age crisis in a few years where it will try to reinvent itself in a desperate attempt to appear youthful. All in all, it's pretty similar to its author.

As with most software, there is no birth certificate, but we will use the date of the first commit to the repo of what would become FuseSoC as the birth date. So what really happened on that fateful day that would become forever etched into history as the day when everything changed? According to Wikipedia, it wasn't a happy day overall, but none of those events are really related to FuseSoC.

As so often, we need to go back further in time and take a look at the events leading up to this day. It all started with the OpenRISC Reference Platform System on Chip version 2, or ORPSoCv2. This project was a combination of RTL code for the OpenRISC CPU together with a bunch of peripheral controller cores, drivers, example applications and miles of makefiles to build everything together into FPGA images that could be loaded onto a few select boards for running OpenRISC-based systems. Despite having one of the least sexy names ever, it was widely used by most people who dealt with OpenRISC and seems to still be in use by some people. But it wasn't without flaws. Due to the tightly integrated nature of the project, everyone who wanted to add support for a new FPGA board or add some extra peripheral driver ended up with their own version of the project, each with their own bugs and features. Fixes were rarely submitted back upstream to the main ORPSoCv2 repo. Also, the RTL code for the CPU and peripheral controllers consisted of copies of other repositories, which quickly started to diverge from their upstream counterparts. Again, none of that code was submitted upstream. There were also other issues with regard to scalability that started to show when more features were added. In short, it was time for something new.

I started sketching out what I would like to see in a successor, and then started implementing ORPSoCv3. Just like ORPSoCv2 this was a system of makefiles calling into other makefiles, but with a major difference. Instead of storing copies of cores, the upstream versions were fetched when they were requested in a SoC, to avoid all the code duplication. After some time, I was ready to present my work in progress to the world. At that time I was working for the company that owned and maintained OpenCores. The git hype had already started to sweep through the software landscape and I had been trying to convince my co-workers that we needed to start making OpenCores support git instead of just SVN. I never managed to convince them, but at least I got them to set up a git server where I could put my project as a trial. Except for three or four outdated clones of other OpenRISC-related projects, I had the only git repo at the now defunct git.opencores.org. On August 30 2011 I made the first commit.

The better part of the coming year was spent on writing makefiles calling other makefiles until one day I had enough and decided that I would never in my life write another makefile calling other makefiles. It was time to kill my darling. I started a new implementation in Python with the lessons learned and soon got to a state where I wanted to present it to the world. As I only had this one git repo and no real understanding of git workflows, my instinct was to clean out the old repo and just push the new code in. Unfortunately I never figured out how to get rid of the first commit, which resulted in this sequence of commits:

Even after the Python migration, FuseSoC (or ORPSoCv3 really) was still storing a lot of RTL code in the repo. It was a shaky relationship, and in August 2013 there was an inevitable separation of tool and RTL code. The RTL code went into a new project called orpsoc-cores. There wasn't any crying involved and both parties realized that it was best to go separate ways. A day later, the first version, ORPSoC 3.0, was released.

Life went on, new features were added, bugs were fixed, ORPSoCv3 became older and fatter, but it became more and more evident that ORPSoC really didn't have anything to do with OpenRISC. Everything that was OpenRISC-specific had already moved to the orpsoc-cores repository and ORPSoCv3 was really a dependency manager and build system for any RTL code. It was once again time to cut some ties. As usual, names are harder than code, and I spent some time trying to figure out what to call the thing I had created. One of the main alternatives was SoCify, but it turned out someone else had already used that name. In hindsight, I'm really grateful for that. SoCify is a horrible name. The idea of FuseSoC came from the analogy of fusion reactions building something bigger from a number of smaller cores. The minimalist in me also considered FuSoC, which is also pretty bad and sounds a bit like F*** you SoC. I do like FuseSoC though. In February 2014 the big rename was made, the project was moved to its current location and FuseSoC 1.0 was released.

Since then not much has happened. New features are added. Bugs are fixed, reintroduced and fixed again. FuseSoC is getting older and fatter. I'm really grateful for all the help that I have received over the years. According to GitHub, there have been 20 contributors to the code base, but there are also a number of other people who have submitted bugs or contributed to the RTL code in the standard core library. Big thanks to everyone involved.

Birthdays usually involve presents. But what can we give to a project that already has everything? How about a logo and a home page? That's the perfect gift for a six year old and on this big day I can proudly announce the brand new home page (Ssshhh...I have had the domain name since 2014, but don't tell FuseSoC) and the FuseSoC logo.


Happy birthday FuseSoC! I will now leave the word to the millions of users to tell their stories of how FuseSoC has changed their lives.

Monday, August 21, 2017

OSDDI: Director's commentaries

Andrew Back of AB Open and the FOSSi Foundation has been working on this great series of interviews called Open Source Digital Design Insights, in which he has been interviewing some of the great minds of the Free and Open Source Silicon movement (+ me). In the fourth episode the turn has come to me. As I watch the video myself, I realize how quickly time moves in the open source silicon world and how many things have happened since then. I would therefore like to take the opportunity to add some more context as an addendum to the interview.
The interview was made at ORConf 2015, the same day as we publicly announced the Free and Open Source Silicon Foundation. We had been working on this for a year and it was a great feeling to present our ambitions to the world. The first thing that strikes me during this interview is that we hadn't yet embraced the Open Source Silicon epithet ourselves and were still referring to our work as Open Source Hardware.
Another major theme that can use some more explanation is the role of the OpenRISC project nowadays. I believe that most people coming into contact with open source silicon at this time will do so through the RISC-V project. When I started out, the RISC-V project was not yet born and OpenRISC was just about to become a teenager. OpenRISC wasn't the only free ISA around at the time. Most notably there were also free implementations of SPARC (both the LEON and the Sun T1/T2) and Lattice Mico 32 (lm32). OpenRISC was likely the most widely used architecture however, and is still used in some critical infrastructure, which I'm unfortunately not allowed to speak freely of. Despite being widely used, the OpenRISC ISA hasn't been without faults, and already in 2011 we started work on a successor to the OpenRISC 1000 ISA, called OpenRISC 2000. Some of the things we wanted to fix were removal of the branch delay slot, better support for wider instruction lengths, instruction compression, a more modular instruction set, a revised memory model and other things. Unfortunately, we never got around to implementing any of that, as we were a small group and there was barely enough manpower to do all the necessary work on or1k. Turns out, we never needed to, because a year or two after that, RISC-V came along and did all those things that we had planned for or2k - and more. In that regard, we see RISC-V as the spiritual successor to OpenRISC and we are happy to pass the torch to RISC-V for future free ISA development.
So what's the deal with OpenRISC in 2017? Well, it's not seeing as many design starts as it used to do since most new designs are based on RISC-V. My guess is that the ones who make new designs based on OpenRISC do it because they either already have a working OpenRISC environment and have no need to replace that, or because they know that it's a stable code base that has been ASIC-proven numerous times for more than a decade. On the software side we are still pushing to upstream some of the last bits of the toolchains, notably GDB and GCC. There are also some updates and clarifications to the specification, mostly related to the ABI.
I believe that the greatest legacy of OpenRISC will not be the ISA, but the idea and realization of a free and open source silicon ecosystem. A CPU isn't very useful by itself and much of what came out of the OpenRISC project was IP cores, such as peripheral controllers, and a lot of support software. For example, the i2c and ethernet drivers for the controllers that came out of the OpenRISC project have been in the Linux kernel since 2006, which is seven years before the OpenRISC CPU support was added to the kernel. Some of the debug infrastructure that originated from OpenRISC is widely used in RISC-V-based designs. The FOSSi Foundation was born from a group of OpenRISC developers who saw the need for a vendor-independent group to foster the open source silicon ecosystem, regardless of which ISA is currently in vogue. ORConf was originally the OpenRISC conference. We have considered renaming it, but we like the name, so we just have to find a good backronym (the best proposal is still Olof's Rock'n'roll Conference). Even FuseSoC was born as a tool to make it easier to build OpenRISC-based SoCs, and for the first year or two it was still called ORPSoCv3 (OpenRISC Reference Platform System on Chip version 3).
Enough said about OpenRISC. I think the most amazing aspect of the interview is that I did not mention FuseSoC even once. Nope. Not a single mention of FuseSoC in over 8 minutes! And if you think I look a bit like a zombie sloth on heroin in the interview, that's because I usually spend the months leading up to ORConf as Sonic the Hedgehog on amphetamine, so once everyone is seated and the conference starts, that's when I start to relax. It's a lot of work to organize a conference, but I absolutely love doing it and I hope that you will come to visit and enjoy it as well. And if you haven't seen the other entries in the OSDDI series, please watch them now. They really are insights into the world of open source silicon from some of the most knowledgeable people in the field (+ me).

Saturday, August 12, 2017

FuseSoC 1.7

Lock up your wives and daughters! FuseSoC 1.7 has been unleashed on the world. This unstoppable force will organize your HDL dependencies and provide abstractions for your EDA tools without giving you a chance to defend yourself.

Actually, there's not that much new on the surface of this release. Most of the work has been spent on internal refactoring in order to bring in two new major features for the next cycle. The first of these is a separation of the frontend - which handles reading core files, maintains the core database and does dependency resolution - and the backends - which launch the EDA tools. There are several reasons for doing this, but I hope to write more about this specifically in another post. The other major feature is the preparation for a new core description format, called CAPI2. This will be added early in the FuseSoC 1.8 cycle, so expect to read more about this in the future as well. If you are interested in taking an early peek, there's a CAPI2 branch of FuseSoC together with a corresponding branch of fusesoc-cores which is used as a playground for now.

So, onto the actual changes.

Test coverage has now reached 72% of the code base. Unit testing is something I should have done from day one, as it has uncovered plenty of bugs and been a huge help when doing refactoring. So kids, get tested you too!

Failure is always an option, and should be handled with the same loving care as success. FuseSoC now exits with an error code when a build or simulation fails, making it easier for external tools to pick up failures. Also, failing scripts now print out the error code to make it easier to analyze what went wrong. Speaking of things going wrong, the parsing of the core files has been improved to warn about syntax errors instead of leaving the user with a Python stack trace. In general, there have also been many improvements to the logging, so that running with the --verbose option might actually be helpful when debugging strange errors.

There have been a number of improvements in the tool backends, mostly related to parameter passing. The Vivado backend had a bug that prevented passing multiple parameters to the backend. Quartus now supports passing verilog defines on the command line. Parameters are properly escaped before being passed to the backend, which fixes string parameters for some backends. Other than that, ISIM now supports multiple toplevels, which is required for example when simulating Xilinx primitives that require glbl.v as a parallel toplevel. The Vivado flow now works on Windows after discovering that Vivado prefers forward slashes regardless of what the OS uses as path separator. The Icarus backend has been rewritten so that it's easier to rebuild the simulation model from an exported build tree.

In addition to fixes and new features, a few features have been removed, mostly because they made no sense, were broken or turned out to be hard to maintain with little gain. The system-info command is removed, as all details are shown in core-info anyway. The submodule provider was likely broken for a long time without anyone complaining, and was a bad fit for the FuseSoC dependency model, so it has been removed too. There was also a semi-working feature of the verilator backend that aimed to convert files containing verilog `define statements to a corresponding C header file. As there might be users out there actually using this, I added an entry to the FuseSoC migration guide with information on how to replicate this functionality in newer versions of FuseSoC.

Other than that, there are some other bug fixes, like FuseSoC now supports putting IP-XACT files in subdirectories of the core tree. There is also a --version command-line option to show, surprise, the current version of FuseSoC.

That's more or less it. Make sure to upgrade and get prepared for the wild ride that will be FuseSoC 1.8

Peace out!

Tuesday, March 21, 2017

FuseSoC 1.6.1

This is just a quick one. I was asked by fellow FOSSi Foundation founder and OpTiMSoC creator and lead architect Stefan Wallentowitz to do a point release. OpTiMSoC is currently shipping a forked version of FuseSoC, and while doing some spring cleaning they wanted to see if the upstream version could be used instead. As there have recently been some problems with the FuseSoC dependencies, originally caused by Python's sorry excuse for a package manager called pip, this was also a good time to put out a new version. So here it is. Not much has changed on the surface, but there has been some refactoring and preparation that is needed for the upcoming CAPI2 core format, which eventually will supersede the current format for .core files. If you have any thoughts about what the new format should look like, please check out the current CAPI2 draft, and yell if something needs to change. Other than that, the most noteworthy addition is that FuseSoC now comes with unit tests. Not sure I like them though. The first thing that happened is that they pointed out a bunch of bugs that I didn't know about. Oh well. I'll get those fixed eventually. That's all. Have fun with the new release!

Saturday, January 7, 2017

The dream of HDL standard libraries

I was recently asked why RTL code is so unportable and why there aren't any standard components to use for common blocks like RAM and ROM. As this is something that I've been thinking about for a long time, and was part of the reason why I started working on FuseSoC, I started writing a long reply, but realized it was turning into a decent-sized blog post, so I decided to put it here instead. So here are my thoughts on the issue, why it is an issue and some ways to make it more manageable. My view on this is largely FPGA-centric, but much of it is applicable to ASIC as well.

In the RTL world we have no standard libraries. Coming from the software world, this is something that is very much expected to exist. There are plenty of more or less standardized software components and interfaces. A prime example would be the C standard library (libc), which is used to some extent by pretty much all software written in C. It's portable, and the same functions can be used from the smallest microcontrollers to the beefiest server parks. Other programming languages have their own built-in libraries, but most of those internally use libc deep down. So why don't we have the same thing for HDL code? Both VHDL and Verilog have been around long enough for this to materialize. I'm not sure why, but my guess is that part of the reason is that the pool of RTL engineers is much smaller than their SW counterpart and that most RTL projects have been done in isolation at companies that haven't seen the value of agreeing with other companies on a standard interface. Another part of the problem is the general ignorance that is often seen in the digital design field. One of the most commonly used phrases for HDL code is that "it's a hardware description language, not a programming language". Over the years, this has made me more and more annoyed, and I really wish that schools and companies stopped telling this to new engineers. No, it's not a programming language, in the sense that it gets translated to a circuit description (which is not true for simulations) rather than machine code. This, however, doesn't mean that RTL engineers should ignore all the best practices that have been developed for software. Most of the general ideas from software can be applied directly to HDL code as well, some need interpretation and a few don't apply at all. Standard libraries are one of those useful ideas that can be applied directly to HDL code, so again, why don't we have them? Surely, someone else must have thought about this before.

Yes, there have been a lot of attempts to build standard libraries, but it's hard to standardize something that is already implemented in a million ways, and from what I see, no one seems to be interested in using existing solutions. Part of this is due to the fact that there hasn't been an easy way to share code, and this is a big part of why I started with FuseSoC, so that it would be easier to reuse existing cores. I'd say that the constant reimplementation is actually worse than it first looks. Writing these things is not necessarily that difficult. The hard part is a) proving their correctness, which means both good test cases and testing against a multitude of target devices, and b) usability, in the form of documentation and easy integration into other projects. Most of the standard library contenders I have found fall short in both these areas and just contain a code dump.

Just being able to reuse code more efficiently isn't a silver bullet. There are plenty of examples from the software world where similar functionality is being reimplemented over and over again. But it does help. Another issue is the scope. What should go into this standard library? Really small things like registers and muxes? No, those are probably best handled by the HDLs themselves. How about FIFOs, RAMs, ROMs, SERDES, clock domain synchronisers? Yes, these are things that are common enough to be used in a lot of places while still being complex enough to benefit from the increased testing that comes with more users. How about even larger components like UARTs, SPI controllers and caches? While I personally would sacrifice my left pinky (this is a big thing for an Emacs user) to never see another reimplementation of a UART or SPI master, I think they contain too much configurability to be put into a standard library. I am however a big fan of modularity, and I would be happy to see these components being built using elements from the still fictitious standard library. It should also be said that there are some examples of standardized interfaces in the HDL world as well, mostly related to verification, perhaps since this is an area more closely tied to software development. UVM is an example of this for SystemVerilog, and OSVVM would perhaps be its VHDL counterpart.
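
As an illustration of the kind of small-but-common building block I have in mind, a clock domain synchroniser is only a couple of lines of Verilog, yet it is exactly the sort of thing that benefits from shared test cases and documentation. A minimal sketch, with names that are mine and not taken from any existing library:

module sync_2ff
  (input  clk_dst,   // destination clock domain
   input  d_async,   // single-bit signal from another clock domain
   output q_sync);   // synchronised version of d_async

   reg [1:0] sync_r;

   always @(posedge clk_dst)
     sync_r <= {sync_r[0], d_async};

   assign q_sync = sync_r[1];
endmodule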

So let's pretend that a large enough group of developers agrees on a standard library interface. The next problem is portability. Such a library would preferably have both Verilog and VHDL implementations. The main reason for this isn't compatibility, as it is possible in most cases to make the API language-neutral. This of course has the drawback that many of the useful features in each language can't be used, as they have no direct translation in other languages. The main reason is instead that the state of mixed-language simulation is still not very good. Many commercial tools charge a lot of money for this feature (even though the situation has improved quite a bit recently), and the Open Source tools are still not good enough for mixed-language. Where does this leave all the fancy new HDLs such as Chisel or Migen then? In the great tradition of reinventing wheels, of course they have also created their own libraries for these things instead of reusing existing Verilog libraries. To be fair though, it's not as if there were any good libraries to build upon. At least none that could be considered standardized.

If we ignore this for a while and assume we have implemented the library in several languages, or perhaps only need one, the next issue is the target technology. Different FPGA and ASIC technologies use different primitives for things like memories, FIFOs etc. The good news is that many of these constructs can be described in regular HDL code, and the EDA tools will happily convert them to the correct target implementation. This should always be used as the first option if possible. Unfortunately, I see so many cases of engineers generating a vendor-specific IP for the simplest things. This has several drawbacks. First of all, it's not portable. Definitely not among vendors, and in most cases, not even among devices of the same family. For some truly idiotic reason, it's also usually bound to a specific version of the EDA tool it was generated by, which means it becomes a mess of upgrades or downgrades whenever someone is trying to use it with another version of the tool. Depending on the format of the IP core, it can be a pain to use together with version control systems and it often requires manual changes and regeneration whenever the smallest parameter (such as FIFO depth) changes. A classic example of trying to save some time on implementation which instead ends up adding a large cost in maintenance and reuse. Trust me. I've been through this too many times.
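
As an example of letting the tools do the mapping, a simple single-port RAM can be described in plain HDL and most synthesis tools will infer a suitable block RAM or LUT RAM from it. A minimal sketch, where the parameter and port names are mine:

module spram
  #(parameter DATA_WIDTH = 8,
    parameter ADDR_WIDTH = 10)
   (input                       clk,
    input                       we,
    input      [ADDR_WIDTH-1:0] addr,
    input      [DATA_WIDTH-1:0] wdata,
    output reg [DATA_WIDTH-1:0] rdata);

   reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

   always @(posedge clk) begin
      if (we)
         mem[addr] <= wdata;
      rdata <= mem[addr];   // synchronous read helps block RAM inference
   end
endmodule

Compare that to generating a vendor-specific memory IP for the same thing, with all the version-lock and regeneration pain described above.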

Not everything can be described easily, efficiently or even at all with vendor-neutral HDL code. Examples of things that are closely tied to the target technology are clock generation, I/O cells or special macros. The good thing here is that if these features are needed, you likely already know which chip or technology you're targeting and can afford to lose portability. These things are not supposed to be in a standard library anyway. A good practice here is also to try to keep these things close to the top level so that the core functionality can be more easily moved between different targets. Other blocks might be possible to implement in many different technologies, but can't be expressed as pure HDL, since the EDA tools can't map them in a satisfying way. In these cases, there need to be backend-specific implementations, possibly with a pure HDL fallback for some tools. This is a common approach, but it's not really standardized how to select the correct backend. Many RTL engineers use VHDL/Verilog generate statements for this, IP-XACT uses the "view" mechanism to switch between different files and FuseSoC currently uses something similar to IP-XACT, with a more powerful mechanism in the works, inspired by the idea behind Gentoo's use flags.
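
A rough sketch of the generate-statement approach, selecting between a vendor primitive and a pure HDL fallback. The TARGET parameter, the module name and the overall structure are made up for illustration only:

module clock_buffer
  #(parameter TARGET = "GENERIC")
   (input  clk_i,
    output clk_o);

   generate
      if (TARGET == "XILINX") begin : g_xilinx
         // Vendor-specific primitive, here a Xilinx global clock buffer
         BUFG u_bufg (.I(clk_i), .O(clk_o));
      end else begin : g_generic
         // Pure HDL fallback for simulation and other targets
         assign clk_o = clk_i;
      end
   endgenerate
endmodule

IP-XACT views and the FuseSoC mechanisms mentioned above solve the same problem one level up, by selecting which files are fed to the tools instead of branching inside the HDL.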

So to sum it up, I'd love to see more standard libraries, as I think it's a good idea. Unfortunately, it's hard to make this come true, both for technical and political reasons. This doesn't mean we shouldn't try, and as the FOSSi Foundation now plays a role in aiming to foster collaboration and open standards, we might have a better chance than before. This won't happen automatically however, and in the meantime there are a few ways to iteratively improve the situation. Follow these short guidelines and the world will become a much better place:

1. Whenever you start writing new basic functionality, take a look around to see if there is something that can be reused. It might look like more work up-front, but further down the road it will mean less maintenance, better documentation  and hopefully fewer bugs for more people.

2. Use pure HDL whenever possible instead of relying on vendor-specific IP. Again, the benefits will come from less maintenance, and in this case also improved portability.

3. If you decide to write your own code, document it, add testcases and publish it through LibreCores so that other people will find it. This improves the chances that your code will be reused and improved by other people.

4. Use FuseSoC! In most cases it's quite simple to put together a FuseSoC core description file to go together with your component. This makes it easier for other people to reuse your code if they are also using FuseSoC. It also makes it easier for you to reuse other code if that code is already packaged to be used with FuseSoC.

Let's show those softies that we RTL engineers also can collaborate and build awesome things together!