Workshop and Tutorial Schedule

All times shown in Pacific Standard Time (UTC−8)

To sign up for workshops and tutorials, please use the registration link on the conference registration page. Some events may require additional registration; see each event for details. All events will send emails to registered participants.

Lunch will not be provided, but we will have a lunch break from 12:15 PM to 1:45 PM.

Date | Type | Time | Title | Room
March 3 | Tutorial | 08:30 AM - 12:00 PM | T1: Introduction to Ryzen™ AI Development Tool Flow | Monterey Bay
March 3 | Tutorial | 01:45 PM - 03:45 PM | T2: Who needs neuromorphic hardware? Deploying SNNs to FPGAs via HLS | Monterey Bay
March 3 | Tutorial | 04:00 PM - 06:00 PM | T3: Hands-on introduction to the Intel FPGA AI Suite | Monterey Bay
March 3 | Tutorial | 08:30 AM - 10:40 AM | T4: Fabric-to-Silicon: Agile Design of Soft Embedded FPGA Fabrics | Cypress 1 & 2
March 3 | Tutorial | 10:45 AM - 12:15 PM | T5: RapidStream Pro: Timing Optimization for FPGA Accelerators through Integrated Design Partitioning and Pipelining | Cypress 1 & 2
March 3 | Tutorial | 01:45 PM - 03:45 PM | T6: ScaleHLS-HIDA: From PyTorch/C++ to Highly-optimized HLS Accelerators | Cypress 1 & 2
March 3 | Tutorial | 04:00 PM - 06:00 PM | T7: Dynamatic Reloaded: An MLIR-Based Dynamically Scheduled HLS Compiler | Cypress 1 & 2
March 3 | Tutorial | 04:00 PM - 06:00 PM | T8: CEDR: A Holistic Software and Hardware Design Environment for FPGA-Integrated Heterogeneous Systems | Cypress 3 & 4
March 3 | Workshop | 08:30 AM - 03:45 PM | W1: Spatial Machine Learning: Architectures, Designs, EDA, and Software | Cypress 3 & 4
March 3 | Social Hour | 06:00 PM - 07:00 PM | | Plaza

Workshop and Tutorial Details

T1: Introduction to Ryzen™ AI Development Tool Flow

3rd March 2024, 08:30 AM – 12:00 PM PST

Organizers: Andrew Schmidt (AMD), Naveen Purushotham (AMD)

In this tutorial we will describe the AMD machine learning solutions with the Ryzen™ AI platform, discuss the Neural Processing Units (NPUs), and present Riallto, an open-source exploration framework for first-time users of the NPU developed by teams from the AMD Research and Advanced Development group and the AMD University Program. AMD Ryzen AI is the world’s first built-in AI engine on select x86 computers. This dedicated engine is built on the AMD XDNA™ spatial dataflow NPU architecture, which consists of a tiled array of AI Engine processors and is designed to offer lower latency and better energy efficiency. Such processor arrays are also found in the Versal Adaptive SoC, enabling rapid development and evaluation across heterogeneous architectures. This integration optimizes efficiency by offloading specific AI processing tasks such as background blur, facial detection, and eye gaze correction, freeing up CPU and GPU cycles and enhancing system efficiency. With Ryzen AI-powered laptops or mini PCs, you can develop innovative applications spanning creative solutions like media editing and studio effects or productivity solutions like information search, summarization, transcription, and much more. Ryzen AI also caters to the gaming industry, providing a platform to create real-time audio/video effects, image enhancement, NPC agents, RL, and rendering applications.

We will list and explain the supported frameworks, modes, and pre-trained models for AI at the edge as well as in the data center. The Ryzen AI platform consists of a rich set of AI models, optimized NPU designs, tools, libraries, and example designs for AI. We will explain the development environment features: the tool flow, the software stack, and supported deployment platforms. We will cover the various configurations to optimize a model for an application. We will identify high-level libraries and the APIs of the AMD AI Library. In the tutorial we will go through the tool flow using ONNX, which demonstrates the important steps involved in developing applications using these frameworks. Even though this tutorial focuses on the use of AMD XDNA on Ryzen AI processors, the same techniques apply to programming Versal Adaptive SoCs, which can be found in a range of accelerator cards like the Alveo™ V70.

Some basic AI awareness would be an advantage but is not required.

T2: Who needs neuromorphic hardware? Deploying SNNs to FPGAs via HLS

3rd March 2024, 01:45 PM – 03:45 PM PST

Organizers: Jason Eshraghian (University of California Santa Cruz), Fabrizio Ottati (NXP & Politecnico di Torino)

How can we use natural intelligence to improve artificial intelligence? The human brain is a great place to look to improve modern neural networks and reduce their exorbitant energy costs. While we may be far from having a complete understanding of the brain, we are at a point where a set of design principles has enabled us to build potentially more efficient deep learning tools. Most of these are linked back to spiking neural networks (SNNs). In a cruel twist of irony, the neuromorphic hardware that is out there for research and/or commercial use is considerably more expensive (and often less performant) and harder to obtain than a consumer-grade GPU. How can we move towards using low-cost hardware that sits on our desk, or fits in a PCIe slot in our desktops, and accelerates SNNs? FPGAs might be the solution. This tutorial will take a hands-on approach to learning how to train SNNs for hardware deployment on conventional GPUs, and how to run these models on an embedded-class FPGA (AMD Kria KV260) for inference. FPGA inference is achieved using high-level synthesis, employing the AMD Vitis HLS compiler and a dataflow architecture of a deep SNN, with in-hardware testing.
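For readers new to SNNs, the sketch below shows what a single spiking neuron computes. It assumes a leaky integrate-and-fire model with a soft reset, which is one common choice and is not necessarily the neuron model used in the tutorial. The function name and the `beta` and `threshold` parameters are illustrative.

```python
# Illustrative sketch only: the leaky integrate-and-fire model, the soft
# reset, and the parameter names are assumptions, not the tutorial's code.

def lif_neuron(inputs, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    inputs:    sequence of input currents, one per time step
    beta:      membrane decay factor per step (0 < beta < 1)
    threshold: membrane potential at which the neuron emits a spike
    Returns a list of 0/1 spikes, one per time step.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leak the previous potential, then integrate the new input.
        membrane = beta * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane -= threshold  # soft reset: subtract the threshold
        else:
            spikes.append(0)
    return spikes


spikes = lif_neuron([0.5, 0.5, 0.5, 0.0, 0.0])
print(spikes)
```

Because the output is a sparse stream of 0s and 1s rather than dense activations, downstream layers only need to accumulate weights when a spike arrives.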

T3: Hands-on introduction to the Intel FPGA AI Suite

3rd March 2024, 04:00 PM – 06:00 PM PST

Organizers: Rama Venkata (Intel), Cinthya Rosales (Intel), Kevin Drake (Intel), Domi Popoi (Intel)

FPGAs enable low-latency AI pipelines with custom I/O, thousands of high-performance DSP blocks, and optimized memory accesses. The traditional challenges of programming complex FPGA AI algorithms have largely been overcome by the Intel FPGA AI Suite, which takes you straight from a trained model (TensorFlow, PyTorch, etc.) to Intel Quartus FPGA AI IP. See what is new with Intel Programmable Solutions Group FPGAs and AI, and experience hands-on (optional) FPGA AI inference IP generation.

  • Attendees can gain an understanding of how they can use Intel FPGAs to implement AI today with the Intel FPGA AI Suite.
  • Attendees will be led through a hands-on introduction to the features of the Intel FPGA AI Suite and experience the generation of AI inference IP. (Instructions for preinstalling the software suite on attendees’ laptops will be provided by Feb 15th, 2024.)
  • Finally, Intel engineers will demonstrate actual AI inference in hardware using an Intel FPGA board. The expected duration of the entire session is 2 hours, including 20 minutes of Q&A.

T4: Fabric-to-Silicon: Agile Design of Soft Embedded FPGA Fabrics

3rd March 2024, 08:30 AM – 10:40 AM PST


Organizers: Prashanth Mohan, Ken Mai (Carnegie Mellon University)

This tutorial endeavors to introduce students and researchers to an agile silicon-verified embedded Field-Programmable Gate Array (eFPGA) design framework. The primary objective of this framework is to facilitate the creation of customizable eFPGA fabrics, spanning from architectural specification to physical layout. Our methodology adopts a top-down physical design approach [1, 2], thereby significantly streamlining the physical design process of eFPGA fabrics and providing a straightforward mechanism for the push-button physical design of multiple eFPGA fabrics. The utilization of the top-down methodology affords fine-grained Application-Specific Integrated Circuit (ASIC) integration [3] and facilitates agile fabric customization, aspects that are challenging to attain with alternative eFPGA frameworks such as OpenFPGA. The Fabric-to-Silicon framework has been refined by utilizing multiple test chips and we aim to disseminate this framework to the FPGA research community.

The Fabric-to-Silicon framework enables architectural exploration of custom eFPGA fabrics, extending down to the layout level, on open-source Process Design Kits (PDKs) like ASAP7. Additionally, it facilitates the construction of System-on-Chips (SoCs) featuring specialized eFPGA fabrics. The tutorial will include complete design examples of sample eFPGA fabrics to help novices and researchers get started with the framework. Furthermore, a system-level example will be presented in which an eFPGA is integrated into a RISC-V SoC, demonstrating the feasibility of progressing from architectural specification to layout within 24 hours. As part of the tutorial, we will discuss in depth the framework’s capabilities, methodologies for fabric customization, the use of VPR for mapping and verifying designs, and the evaluation of the Power, Performance, and Area (PPA) metrics of the resulting eFPGA fabrics.

[1] Mohan, et al. “Top-down physical design of soft embedded FPGA fabrics.” The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2021.

[2] Mohan, Prashanth. Soft Embedded FPGA Fabrics: Top-down Physical Design & Applications. Diss. Carnegie Mellon University Pittsburgh, PA, 2022.

[3] Mohan, et al. “Hardware redaction via designer-directed fine-grained eFPGA insertion.” 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2021.

T5: Timing Optimization for FPGA Accelerators with RapidStream Pro

3rd March 2024, 10:45 AM – 12:15 PM PST

Organizers: Jason Lau (UCLA, RapidStream Design Automation), Yuanlong Xiao (RapidStream Design Automation), Yutong Xie (RapidStream Design Automation), Jason Cong (UCLA, RapidStream Design Automation), Licheng Guo (RapidStream Design Automation)

This tutorial presents an exploration of RapidStream Pro, the industrial-strength version of our previous AutoBridge/RapidStream/TAPA tool suites, which were recognized with Best Paper Awards at FPGA 2021 and 2022. RapidStream Pro is designed to significantly boost the operating frequency of large-scale FPGA accelerators and streamline the design process through innovative parallel partitioning strategies for placement and routing. Our academic prototypes have demonstrated impressive results in a series of publications (2X frequency and 7X reduction in compile time) and have been used by a wide range of projects.

RapidStream leverages the modular hierarchy and pipeline flexibility inherent in FPGA designs to enhance timing quality. It employs early-stage coarse-grained physical planning for an initial layout, then uses pipeline flexibility to shorten interconnects. This approach alleviates local congestion by distributing the logic evenly using high-level design hierarchy information, and avoids global critical paths by exploiting the inherent pipeline flexibility, thus achieving optimal frequency performance.
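The plan-first, pipeline-afterwards idea can be illustrated with a toy model. The sketch below is not RapidStream's algorithm: the coarse slot grid, the module names, and the rule of one pipeline register per slot boundary crossed are all assumptions made for illustration.

```python
# Toy illustration only: the grid, the module names, and the rule of one
# pipeline register per slot boundary crossed are assumptions, not
# RapidStream's actual algorithm.

def pipeline_stages(floorplan, connections):
    """Return the number of registers to insert on each connection.

    floorplan:   dict mapping module name -> (column, row) slot on a coarse grid
    connections: list of (source, sink) module-name pairs
    """
    stages = {}
    for src, dst in connections:
        (sx, sy), (dx, dy) = floorplan[src], floorplan[dst]
        # Manhattan distance = number of slot boundaries the wire crosses.
        stages[(src, dst)] = abs(sx - dx) + abs(sy - dy)
    return stages


# Hypothetical coarse floorplan: three modules spread over a 2x3 slot grid.
floorplan = {"load": (0, 0), "compute": (1, 0), "store": (1, 2)}
connections = [("load", "compute"), ("compute", "store"), ("load", "store")]

stages = pipeline_stages(floorplan, connections)
for edge, count in stages.items():
    print(edge, "->", count, "register(s)")
```

In this model, modules that land far apart get more registers between them, so no single wire segment has to span several slots in one clock cycle. This is only valid for connections that tolerate added latency, which is the pipeline flexibility the paragraph above refers to.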

T6: ScaleHLS-HIDA: From PyTorch/C++ to Highly-optimized HLS Accelerators

3rd March 2024, 01:45 PM – 03:45 PM PST

Organizers: Hanchen Ye (UIUC), Deming Chen (UIUC)

High-Level Synthesis (HLS) is vital for rapidly developing efficient, high-density hardware accelerators, enabling quick evaluation of different algorithmic choices. The challenge lies in effectively exploring the vast design space of large-scale HLS accelerators, which can lead to sub-optimal solutions if not done well, undermining the productivity benefits of HLS. This tutorial will introduce ScaleHLS-HIDA [HPCA’22, DAC’22, DAC’23, TRETS’23, ISPD’23, ASPLOS’24], an MLIR-based open-source HLS framework that can compile HLS C/C++ or PyTorch models to optimized HLS C/C++ in order to generate high-efficiency RTL designs using downstream tools such as Vitis HLS. Despite being fully automated and able to handle various applications, ScaleHLS-HIDA achieves 1.29x higher throughput than a state-of-the-art RTL-based neural network accelerator on FPGAs. This tutorial aims to enable attendees to use the ScaleHLS-HIDA tool and participate in its development. The system is open sourced at:

T7: Dynamatic Reloaded: An MLIR-Based Dynamically Scheduled HLS Compiler

3rd March 2024, 04:00 PM – 06:00 PM PST

Organizers: Lana Josipović (ETH Zurich), Paolo Ienne (EPFL), Lucas Ramirez (EPFL), Andrea Guerrieri (EPFL)

Dynamatic is an academic, open-source high-level synthesis tool based on the MLIR compiler infrastructure. It produces synchronous dynamically-scheduled circuits from C/C++ code and generates synthesizable RTL. Dynamatic delivers significant performance improvements compared to state-of-the-art commercial HLS tools in specific situations (e.g., applications with irregular memory accesses or control-dominated code). It is customizable and extensible to target different hardware platforms and easy to use with commercial tools (e.g., Vivado, ModelSim). Dynamatic is not new: an LLVM-based version of it has been around for some years. Yet, this tutorial will introduce a brand new version of Dynamatic, reimplemented from scratch to take advantage of MLIR’s modularity and reusability. Our new incarnation of Dynamatic puts more emphasis on the richness of features and user-friendliness, and is now more useful for both HLS designers and HLS researchers.

In this tutorial, we will first discuss when, why, and how dynamic scheduling is beneficial in the context of HLS. Then, the attendees will experience Dynamatic from the HLS user perspective and employ it to produce dynamically scheduled circuits from C++ code. We will then discuss the internal architecture of the Dynamatic compiler and the MLIR infrastructure that we rely on. The attendees will obtain hands-on experience in HLS compiler development by integrating a simple compiler pass into the Dynamatic flow. Finally, we will discuss our plans for the project and summarize different ways that the attendees can use it or contribute to it.

T8: CEDR: A Holistic Software and Hardware Design Environment for FPGA-Integrated Heterogeneous Systems

3rd March 2024, 04:00 PM – 06:00 PM PST


Organizers: Joshua Mack (University of Arizona), Sahil Hassan (University of Arizona), Ali Akoglu (University of Arizona)

As FPGAs are being embedded in all layers of computing infrastructure from edge to HPC scales, system designers continue to explore design methodologies that leverage increased levels of heterogeneity to meet target performance goals or constraints. In line with this, we have developed CEDR, an open-source, unified compilation and runtime framework designed for FPGA-integrated heterogeneous systems. CEDR allows applications, scheduling heuristics, and accelerators to be co-designed in a cohesive manner. This tutorial builds on the educational class conducted on the CEDR framework during ESWEEK’23 with a focus on FPGA-integrated heterogeneous systems. It caters to audiences with diverse backgrounds and varying levels of expertise, providing an opportunity for exploration and study of FPGA-based computing in heterogeneous contexts.

We will start with an overview of CEDR, and then we will explore how CEDR (i) allows naive application developers to utilize FPGA-based acceleration within heterogeneous environments, (ii) enables system designers to sweep hardware compositions and measure their impact on realistic workload scenarios, and (iii) provides a rich environment for resource management developers to design new scheduling policies. Throughout the tutorial, our common goal is lifting the barriers to research and enabling productive application deployment on FPGA-integrated heterogeneous systems.
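To make points (ii) and (iii) concrete, the sketch below shows a toy scheduling heuristic and a two-point hardware sweep. It does not use CEDR's API: the earliest-finish-time policy, the task names, the processing element (PE) names, and the execution times are all hypothetical.

```python
# Toy illustration only: the task names, PE names, and cost table are
# hypothetical, and none of this is CEDR's API.

def earliest_finish_time(tasks, exec_time):
    """Assign each task, in order, to the PE that would finish it soonest.

    tasks:     list of task names in arrival order
    exec_time: dict mapping task -> {pe_name: execution time on that PE}
    Returns (assignment, makespan).
    """
    pes = {pe for costs in exec_time.values() for pe in costs}
    pe_free_at = {pe: 0 for pe in pes}
    assignment = {}
    for task in tasks:
        # Pick the PE with the smallest finish time; ties break by PE name.
        pe = min(sorted(exec_time[task]),
                 key=lambda p: pe_free_at[p] + exec_time[task][p])
        pe_free_at[pe] += exec_time[task][pe]
        assignment[task] = pe
    return assignment, max(pe_free_at.values())


tasks = ["fft_0", "fft_1", "fft_2"]

# Hardware composition A: one CPU core plus one FPGA accelerator.
with_fpga = {t: {"cpu": 8, "fpga": 3} for t in tasks}
# Hardware composition B: CPU only.
cpu_only = {t: {"cpu": 8} for t in tasks}

assignment_a, makespan_a = earliest_finish_time(tasks, with_fpga)
assignment_b, makespan_b = earliest_finish_time(tasks, cpu_only)
print(assignment_a, makespan_a)
print(assignment_b, makespan_b)
```

With these made-up numbers, the third task falls back to the CPU because the accelerator is busy, and removing the accelerator triples the makespan. Swapping in a different policy function or cost table is the kind of experiment the paragraph above describes.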

W1: Spatial Machine Learning: Architectures, Designs, EDA, and Software

3rd March 2024, 08:30 AM – 03:45 PM PST


Organizers: George Constantinides (Imperial College London), Christos-Savvas Bouganis (Imperial College London), Andrew Boutros (University of Toronto)

This workshop aims to take a vertical slice through the stack of parallel, ‘spatial’ accelerators for machine learning, from systems-level work involving multi-FPGA acceleration down to low-level computer arithmetic. By doing so, and particularly by highlighting research that bridges traditional levels of abstraction, we aim both to showcase the most up-to-date developments in this area and to encourage greater collaboration by researchers across the stack. By the end of this workshop, participants will have an understanding of:

  • The latest research in spatial computation for machine learning in arithmetic, high-level EDA support, architectures, and software frameworks.
  • The opportunities for research and collaboration by bridging traditional abstraction layers in the systems, software, and hardware design process for modern machine learning.