Coreblocks stage 2 roadmap

Planned Coreblocks development, to be completed before the first ASIC prototypes

Micro-architectural performance features

Scale Coreblocks up from simple implementations of its internal modules to high-performance ones. As part of this task, we want to replace the current Load Store Unit (LSU), which performs memory operations in order, with a fully featured one that allows independent operations to be reordered. Later, support for executing multiple instructions per cycle will be introduced in the frontend and backend modules to enable superscalar execution. We would also like to improve the frontend by replacing the current naive branch predictor with a more elaborate one (e.g. gshare or TAGE), and to implement checkpointing so that branch mispredictions can be recovered from with a low penalty. Each task should be documented, with insight into the design decisions taken during implementation.
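
For readers unfamiliar with gshare, the following is a minimal behavioural sketch in Python. It is not the Coreblocks implementation and not hardware description code; it assumes a table of 2-bit saturating counters indexed by the branch address XORed with an N-bit global history register. The class name, table size and method names are hypothetical.

```python
class GsharePredictor:
    """Hypothetical behavioural model of a gshare predictor (illustration only)."""

    def __init__(self, history_bits: int = 8):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0  # global history register: most recent outcome in bit 0
        # One 2-bit saturating counter per entry, starting at "weakly not taken" (1).
        self.counters = [1] * (1 << history_bits)

    def _index(self, pc: int) -> int:
        # gshare: XOR the branch address with the global history.
        # RISC-V instructions are at least 2-byte aligned, so bit 0 of pc is dropped.
        return ((pc >> 1) ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        # Shift the resolved outcome into the global history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


# Train on a branch that alternates taken / not taken.
p = GsharePredictor()
for i in range(40):
    p.update(0x200, i % 2 == 0)
print(p.predict(0x200))  # next outcome in the pattern is "taken"
```

Because the index mixes in recent outcomes, an alternating branch maps to two different counters and becomes predictable, which a purely per-address (bimodal) predictor cannot achieve.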

Milestones:
  • Load/Store Unit with instruction reordering
  • Superscalar execution (40% done)
  • Checkpointing for fast misprediction recovery (80% done)
  • Multi-stage Branch Prediction

Features for rich OS support

While the previous tasks improve existing functionality, this group extends it, particularly with features related to operating system support. We are going to implement Supervisor Mode, which includes support for virtual memory and hardware translation of virtual to physical addresses (MMU). Additionally, we would like to implement a data cache and a floating-point unit, so that the processor is useful not only for simple embedded tasks but also for some scientific computations with acceptable performance.

Milestones:
  • Memory Management Unit (MMU) with translation lookaside buffer (TLB)
  • RISC-V Supervisor Mode implementation
  • Basic data cache
  • Floating Point implementation (50% done)
  • Adapting LiteX, Linux and other ports to the new processor features once the above milestones are implemented, including porting OpenSBI

Finalizing application processor

After implementing the features from "Micro-architectural performance features" and "Features for rich OS support", we would like to make further improvements to the processor. The goal of this task is for our core to be usable as a functional, general-purpose application-class processor, including support for running full multi-core Linux. In particular, we would like to analyse synthesis results and tune our design to increase performance, implement support for the 64-bit RISC-V base ISA (RV64), and add multi-core support with protocols that keep caches coherent and memory consistent.

Milestones:
  • IPC-focused microarchitectural and FPGA-specific performance optimizations (target: IPC > 1.2 on embench-iot)
  • RISC-V 64 support
  • Multi-core operation support
  • Advanced Data Cache with Miss Status Handling Registers (MSHR)
  • TileLink bus interface with cache coherence protocols
  • RISC-V debug interface
  • More advanced automated testing (riscv-dv, design and integration tests), with an FPGA backend
  • Adapting LiteX, Linux and other ports to the new processor features once the above milestones are implemented