Projects
Current projects
Technology-aware 3D Interconnect Architectures for heterogeneous SoCs manufactured in Monolithic 3D Integration
Duration: 01.11.2022 to 31.10.2025
Monolithic 3D integration (M3D) is a disruptive technology for the design of 3D System-on-Chips. In contrast to more conventional 3D integration schemes, M3D permits a very dense integration of vertical interconnects between neighboring tiers. Together with extrinsic heterogeneity, i.e., the combination of tiers with different electrical characteristics, unprecedented opportunities for new architectural designs and extended system functionalities arise.
These benefits have been demonstrated by numerous works addressing processing elements and memories; yet, for on-chip communication architectures such as Networks-on-Chip, only a few related works exist. Furthermore, these works often neglect the significant impact of intrinsic heterogeneity caused by monolithic fabrication, such as process-related transistor degradation on higher tiers, interconnect degradation on lower tiers, or the non-uniform distribution of routing resources among tiers. Finally, previous works primarily exploit wire-length reduction in 3D, but do not consider the extended micro- and macroarchitectural design space.
We want to address all of these shortcomings by analyzing how the characteristics of monolithic 3D integration affect the microarchitecture of individual network components and the architecture of the overall communication infrastructure. Furthermore, we will analyze the impact of these modifications and extended design options on the overall system architecture.
The project will provide four specific contributions to the scientific community:
1) It will provide systematic design guidelines and a set of architectural templates for optimized 3D interconnect architectures addressing extrinsic and intrinsic heterogeneity;
2) it will provide models for formulating Network-on-Chip topology synthesis as an optimization problem;
3) it will provide a toolset for supporting a systematic design space exploration, which accounts for all relevant M3D technology characteristics;
4) it will demonstrate the optimization potential by means of two demonstrators, a Vision-System-on-Chip and a multiprocessor system.
The main outcome of this project will be a deeper understanding of how the disruptive characteristics of monolithic 3D integration can be exploited to improve the interconnect architecture in 3D integrated circuits. This will allow the design of optimized systems that are not supported by current design concepts.
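As an illustration of how the topology-synthesis contribution (item 2 above) could be approached, the following minimal Python sketch casts the problem as a small optimization: cores are assigned to tiers of a monolithic 3D stack and the assignment with the lowest weighted hop cost is chosen, with vertical hops on degraded upper tiers penalized. All core names, traffic volumes, and penalty factors are illustrative assumptions, not the project's actual formulation.

# Toy sketch: Network-on-Chip topology synthesis as a small optimization problem.
from itertools import product

CORES = ["cpu", "dsp", "mem", "accel"]                 # hypothetical cores
TRAFFIC = {("cpu", "mem"): 10, ("dsp", "mem"): 6, ("cpu", "accel"): 4}
TIERS = 2
TIER_PENALTY = {0: 1.0, 1: 1.3}                        # assumed degradation on the upper tier

def cost(assignment):
    """Weighted cost: horizontal hop = 1, vertical hop weighted by tier penalty."""
    total = 0.0
    for (a, b), volume in TRAFFIC.items():
        ta, tb = assignment[a], assignment[b]
        vertical = abs(ta - tb) * TIER_PENALTY[max(ta, tb)]
        total += volume * (1.0 + vertical)
    return total

def synthesize():
    """Exhaustively evaluate all tier assignments and keep the cheapest one."""
    best, best_cost = None, float("inf")
    for tiers in product(range(TIERS), repeat=len(CORES)):
        assignment = dict(zip(CORES, tiers))
        c = cost(assignment)
        if c < best_cost:
            best, best_cost = assignment, c
    return best, best_cost

if __name__ == "__main__":
    assignment, c = synthesize()
    print("best tier assignment:", assignment, "cost:", c)

A real formulation would of course add placement, routing, and technology constraints; the sketch only shows the shape of the optimization problem.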
Hybrid^2-Index Structures for Main Memory Databases
Duration: 01.01.2020 to 31.12.2024
The aim of this project is to speed up index accesses in database management systems (DBMS) in order to improve overall performance. As index accesses are the starting point for all subsequent processing steps of database queries, fast index accesses are key to a superior overall performance of a DBMS. To speed up index accesses, we propose to investigate and develop new hardware/software index structures that realize structure-hybrid indexes, i.e., the combination of static and dynamic indexes, on hybrid shared-memory system architectures consisting of a CPU and an FPGA or GPU as hardware accelerator. Such hybrid^2-indexes have not been considered in the literature so far, so the possibilities of current hybrid shared-memory system architectures are not utilized in an optimal way. Because of the reduced communication costs between CPU and hardware accelerator, many existing design rules for utilizing hardware accelerators must be rethought, especially concerning the complexity of the tasks taken over by the hardware accelerators.

Within this project we will therefore research which static and dynamic index structures can be realized efficiently and with high performance on hybrid systems, and how. Furthermore, we will investigate how to react to changing access patterns by dynamically swapping the index structures used on the hardware accelerators. We expect novel, adaptive structure- and hardware-hybrid index structures that significantly improve the performance of index accesses in DBMS compared to existing traditional systems.
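The structure-hybrid idea can be illustrated with a short Python sketch under simple assumptions (the actual project design may differ): a static, read-optimized sorted array that would be the offload-friendly part for an FPGA or GPU, combined with a small dynamic CPU-side delta that absorbs updates and is periodically merged back.

# Minimal sketch of a structure-hybrid index: static sorted part + dynamic delta.
import bisect

class Hybrid2Index:
    def __init__(self, keys):
        self.static = sorted(keys)   # static, offload-friendly sorted array
        self.delta = {}              # dynamic CPU-side delta: key -> present flag

    def insert(self, key):
        self.delta[key] = True

    def delete(self, key):
        self.delta[key] = False      # tombstone overriding the static part

    def contains(self, key):
        if key in self.delta:        # recent changes win over the static part
            return self.delta[key]
        i = bisect.bisect_left(self.static, key)
        return i < len(self.static) and self.static[i] == key

    def merge(self):
        """Fold the delta into the static part, e.g. when the delta grows large."""
        live = set(self.static) | {k for k, v in self.delta.items() if v}
        dead = {k for k, v in self.delta.items() if not v}
        self.static = sorted(live - dead)
        self.delta.clear()

if __name__ == "__main__":
    idx = Hybrid2Index([3, 7, 11])
    idx.insert(5)
    idx.delete(7)
    print(idx.contains(5), idx.contains(7))   # True False
    idx.merge()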
Completed projects
ADAMANT-II: Adaptive Data Management in Evolving Heterogeneous Hardware/Software Systems
Duration: 01.06.2021 to 30.09.2024
Heterogeneous system architectures consisting of CPUs, GPUs, and FPGAs offer a variety of optimization possibilities for database systems compared to purely CPU-based systems. However, it has been shown that simply mapping existing software concepts one-to-one to non-von-Neumann hardware architectures such as FPGAs is not sufficient to fully exploit their optimization potential. Rather, the new processing capabilities require the design of novel processing concepts, which have to be considered at the planning level of query processing. A basic processing concept was already developed in the first project phase by considering device-specific features in our plug'n'play system architecture. However, more advanced concepts are required to optimally exploit the capabilities of the hardware architectures. While significant speed-ups were achieved at the level of individual operators mapped to GPUs and FPGAs, the performance gain at the level of complete queries was unsatisfactory. Hence, we derived the hypothesis for the second project phase that standard query-mapping approaches, which consider queries at the level of individual operators, are not sufficient to exploit the extended processing features of heterogeneous system architectures.
We will address this shortcoming by researching new processing and query-mapping methods for heterogeneous systems, which question the commonly used granularity level of operators. To this end, we will provide processing entities that encapsulate more functionality than standard database operators and may span multiple hardware devices. Processing entities are thus intrinsically heterogeneous and combine the specific features of individual devices. As a result, our heterogeneous system architecture enables database operations and features that are not available, or cannot be implemented efficiently, in classical database systems.
To explore this extended feature set, we have identified three application domains that are still challenging for classical database systems and which we expect to benefit greatly from heterogeneous system architectures: high-volume data feeds, approximate query processing, and dynamic multi-query processing. The stream-based nature of high-volume data feeds calls for a hardware architecture where processing can be done on the fly, without the need to store data beforehand; hence, FPGAs are a promising hardware platform for processing high-volume data feeds. Furthermore, FPGAs as well as GPUs are good platforms for approximate query processing, as they allow for approximate arithmetic and hardware-accelerated sampling techniques. Dynamic multi-query processing is very challenging from a system-management point of view, as query plans that have performed well for one workload can be inefficient for a different workload. Here, the multi-level parallelism of heterogeneous systems offers better opportunities to handle heavy workloads.
Our aim is to develop new processing concepts that exploit the special characteristics of hardware accelerators in heterogeneous system architectures for classical and non-classical database systems. On the system-management level, we want to research alternative query-modeling concepts and mapping approaches that are better suited to capture the extended feature sets of heterogeneous hardware/software systems. On the hardware level, we will work on how processing engines for non-classical database systems can benefit from heterogeneous hardware and in which way processing engines mapped across device boundaries may provide benefits for query optimization. Our working hypothesis is that standard query-mapping approaches, which consider queries at the level of individual operators, are not sufficient to exploit the extended processing features of heterogeneous system architectures. Likewise, implementing a complete operator on an individual device does not seem to be optimal for exploiting heterogeneous systems. We base these claims on our results from the first project phase, in which we developed the ADAMANT architecture that allows a plug-and-play integration of heterogeneous hardware accelerators. In the second project phase we will extend ADAMANT with the proposed processing approaches and focus on how to utilize the extended feature sets of heterogeneous systems rather than on how to set such systems up.
Duration: 01.01.2021 to 31.12.2023
Heterogeneous system architectures consisting of CPUs, GPUs, and FPGAs offer a wide range of optimization opportunities compared to purely CPU-based systems. However, to fully exploit this optimization potential, it is not enough to transfer existing software concepts unchanged to non-von-Neumann architectures such as FPGAs. Rather, the additional processing capabilities of these architectures call for the design of novel processing concepts, which must already be taken into account when planning query processing. In the first project phase we developed an initial concept for this purpose, which takes device-specific features into account in our plug'n'play architecture. However, we see the need to develop it further in order to exploit the specific properties of the hardware architectures even better. For the second project phase we therefore hypothesize that known approaches for mapping queries at the level of individual operators are not sufficient to exploit the extended processing capabilities of heterogeneous system architectures.
Our goal is therefore to research novel processing concepts and query-mapping methods for heterogeneous systems that depart from the commonly used granularity of individual operators. We will develop processing entities that provide more functionality than individual operators and span multiple devices. These processing entities are intrinsically heterogeneous and combine the specific properties of individual architectures. As a result, our heterogeneous system architecture makes it possible to provide database operations and features that are not available, or cannot be implemented efficiently, in classical database systems.
For demonstration purposes we have identified three use cases that can benefit greatly from heterogeneous system architectures: processing of high-volume data streams, approximate query processing, and dynamic multi-query processing. High-volume data streams require a hardware architecture that allows the data to be processed without prior buffering, and FPGAs are a promising platform for this due to their stream-based processing principle. Furthermore, both FPGAs and GPUs are well suited for approximate query processing, as they enable reduced-precision arithmetic and approximate, hardware-accelerated sampling techniques. Dynamic multi-query processing is very demanding from a system perspective, as variable system loads can reduce the efficiency of previously established query plans. Here, the numerous levels of parallelism in heterogeneous systems enable a better distribution of the system load.
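To give a flavor of the coarser-grained mapping idea, the following Python sketch fuses several query steps into one "processing entity" whose stages carry device placement hints. All names and device tags are illustrative assumptions, not the ADAMANT-II API.

# Sketch of a processing entity spanning multiple devices at a coarser granularity
# than single operators; a real system would dispatch each stage to its device.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    device: str                      # e.g. "FPGA", "GPU", "CPU" (placement hint only)
    fn: Callable[[list], list]

class ProcessingEntity:
    """Fuses filter + aggregate into one schedulable unit."""
    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def run(self, rows: list) -> list:
        for stage in self.stages:
            rows = stage.fn(rows)
        return rows

entity = ProcessingEntity([
    Stage("filter", "FPGA", lambda rows: [r for r in rows if r > 10]),
    Stage("aggregate", "GPU", lambda rows: [sum(rows)]),
])

if __name__ == "__main__":
    print(entity.run([4, 12, 25, 7, 30]))   # [67]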
Adaptive Data Management in Evolving Heterogeneous Hardware/Software Systems
Duration: 01.09.2017 to 31.10.2022
Currently, database systems face two big challenges: First, the application scenarios are becoming more and more diverse, ranging from purely relational to graph-shaped or stream-based data analysis. Second, the hardware landscape is becoming more and more heterogeneous, with standard multi-core Central Processing Units (CPUs) as well as specialized high-performance co-processors such as Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs).
Recent research shows that operators designed for co-processors can outperform their CPU counterparts. However, most approaches focus on single-device processing to speed up individual analyses, without considering the overall system performance. Consequently, they miss the hidden performance potential of parallel processing across all devices available in the system. Furthermore, current research results are hard to generalize and, thus, cannot be applied to other domains and devices.
In this project, we aim to provide integration concepts for diverse operators and heterogeneous hardware devices in adaptive database systems. We work on optimization strategies that exploit not only individual device-specific features but also the inherent cross-device parallelism in multi-device systems. We thereby focus on operators from the relational and graph domains to derive concepts that are not limited to a certain application domain.

To achieve the project goals, interfaces and abstraction concepts for operators and processing devices have to be defined. Furthermore, operator and device characteristics have to be made available to all system layers, such that the software layer can account for device-specific features and the hardware layer can adapt to the characteristics of the operators and data. The availability of device and operator characteristics is especially important for global query optimization in order to find a suitable execution strategy. Therefore, we also need to analyze the design space for query processing on heterogeneous hardware, in particular with regard to functional, data, and cross-device parallelism.

To handle the enormous complexity of the query-optimization design space incurred by this parallelism, we follow a distributed optimization approach in which optimization tasks are delegated to the lowest possible system layer. Lower layers also have a more precise view of device-specific features, allowing them to be exploited more efficiently. To avoid interference between optimization decisions at different layers, a focus is also set on cross-layer optimization strategies. These will incorporate learning-based techniques that evaluate optimization decisions at runtime to improve future optimization decisions. Moreover, we expect that learning-based strategies are best suited to integrate device-specific features not accounted for by the initial system design, as is often the case with the dynamic partial reconfiguration capabilities of FPGAs.
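In its simplest form, the learning-based evaluation of optimization decisions mentioned above could look like the following Python sketch: an epsilon-greedy heuristic whose observed runtimes update per-(operator, device) cost estimates that guide future placement. This is an illustrative assumption, not the project's actual optimizer.

# Sketch of a runtime-feedback cost model for device selection.
import random
from collections import defaultdict

class RuntimeCostModel:
    def __init__(self, devices, epsilon=0.1):
        self.devices = devices
        self.epsilon = epsilon
        self.estimate = defaultdict(lambda: 1.0)   # (op, device) -> mean runtime estimate
        self.count = defaultdict(int)

    def choose(self, op):
        if random.random() < self.epsilon:
            return random.choice(self.devices)      # explore an alternative device
        return min(self.devices, key=lambda d: self.estimate[(op, d)])

    def observe(self, op, device, runtime):
        key = (op, device)
        self.count[key] += 1
        # incremental running mean of observed runtimes
        self.estimate[key] += (runtime - self.estimate[key]) / self.count[key]

if __name__ == "__main__":
    model = RuntimeCostModel(["CPU", "GPU", "FPGA"])
    for _ in range(50):
        dev = model.choose("hash_join")
        simulated = {"CPU": 1.0, "GPU": 0.4, "FPGA": 0.6}[dev] * random.uniform(0.9, 1.1)
        model.observe("hash_join", dev, simulated)
    print(model.choose("hash_join"))   # usually "GPU" after a few observations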
Joint research project: Research and development of a freely configurable, open and dose-saving computed tomography scanner (KIDs-CT) - Sub-project: Processing of detector signals
Duration: 01.10.2017 to 31.03.2021
We design an open-source system that reads out the detector data of a computed tomography scanner, aggregates them hierarchically, and performs signal (pre-)processing. The system consists of industry-standard components. It is the first CT scanner with open-source interfaces and a publicly available system architecture. This opens up unparalleled potential for research and optimization: the (pre-)processing of raw data in close proximity to the signal source increases the signal quality, the amount of data sent over the communication links is reduced, and the combination of (pre-)processing with subsequent image-reconstruction algorithms sharply increases the image quality.
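The hierarchical aggregation principle can be illustrated with a short Python sketch; the module and ring sizes as well as the offset correction are made-up examples, not KIDs-CT specifics.

# Toy illustration: per-channel pre-processing near the source, then hierarchical
# aggregation of detector readings, first per module, then per ring.
def preprocess(samples, offset=2.0):
    """Simple pre-processing close to the detector: subtract a dark offset."""
    return [max(s - offset, 0.0) for s in samples]

def aggregate(channels_per_module, modules_per_ring, readings):
    corrected = preprocess(readings)
    modules = [sum(corrected[i:i + channels_per_module])
               for i in range(0, len(corrected), channels_per_module)]
    rings = [sum(modules[i:i + modules_per_ring])
             for i in range(0, len(modules), modules_per_ring)]
    return modules, rings

if __name__ == "__main__":
    raw = [5.0, 6.5, 4.0, 7.0, 8.0, 3.5, 6.0, 5.5]   # 8 example channel readings
    print(aggregate(channels_per_module=2, modules_per_ring=2, readings=raw))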
Technology-aware Asymmetric 3D-Interconnect Architectures: Templates and Design Methods
Duration: 01.07.2017 to 31.12.2020
New production methods enable the design of heterogeneous 3D-System-on-Chips (3D-SoCs), which consist of stacked silicon dies manufactured in different technologies. In contrast to homogeneous SoCs, this allows the technological characteristics of each die to be adjusted to the specific requirements of the components placed in that layer. Heterogeneous 3D-SoCs provide unprecedented integration possibilities for embedded and high-performance systems. To exploit this potential, powerful, flexible, and scalable communication infrastructures are required. Yet, current interconnect architectures (IAs) tacitly assume a homogeneous multi-layer 3D-SoC and do not consider the influence of different technology parameters at the topology, architectural, and micro-architectural level of the IA.
In this project, we aim to develop architectural templates and design methods for 3D-interconnect architectures for heterogeneous 3D-SoCs. We target two main innovations: First, we will exploit the specific technology characteristics of the individual chip layers in heterogeneous 3D-SoCs; to this end, we will re-evaluate and extend existing approaches for heterogeneous and hybrid 2D-interconnect architectures. Second, we aim at discovering new interaction mechanisms among components, which may be spatially distributed even at the micro-architectural level, to exploit their diverse features when manufactured in different technologies. The combination of these aspects leads to technology-asymmetric 3D-interconnect architectures (TA-3D-IAs), as defined in this proposal for the first time.
The main outcome of the project will be a deeper understanding of TA-3D-IAs as part of heterogeneous 3D-SoCs. Furthermore, we will develop systematic design methodologies and a set of architectural templates for the design of TA-3D-IAs. To this end, we will create a full-fledged simulation framework for analyzing the design space of TA-3D-IAs, which will be capable of accounting for technology-specific parameters for all components of the communication infrastructure. In addition, we will provide reference benchmarks and selected TA-3D-IAs, which will allow other research teams to evaluate and compare their ideas.
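As a flavor of how technology-specific parameters might enter such an analysis, the following Python sketch uses a first-order RC (Elmore-style) wire-delay model with purely illustrative per-layer parameters; it is not taken from the project's simulation framework.

# Sketch: per-layer link-delay estimate from assumed technology parameters.
def link_delay_ps(length_mm, r_per_mm, c_per_mm, driver_r=1000.0, load_c=2e-15):
    """First-order Elmore-style delay of a wire segment, in picoseconds."""
    r_wire = r_per_mm * length_mm
    c_wire = c_per_mm * length_mm
    delay_s = driver_r * (c_wire + load_c) + r_wire * (c_wire / 2 + load_c)
    return delay_s * 1e12

# Hypothetical layers of a heterogeneous stack (values are illustrative only)
layers = {
    "bottom_28nm": {"r_per_mm": 2000.0, "c_per_mm": 2.0e-13},
    "top_65nm":    {"r_per_mm":  900.0, "c_per_mm": 2.4e-13},
}

if __name__ == "__main__":
    for name, params in layers.items():
        print(name, round(link_delay_ps(1.0, **params), 1), "ps per 1 mm link")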
Hardware-acceleration of Semantic Web databases with runtime reconfigurable FPGAs
Duration: 01.10.2014 to 30.06.2017
The relevance of the Semantic Web has increased steadily over recent years, as shown by the growing number of Semantic Web tools and applications that are being developed and used. The main idea of the Semantic Web is to consider the semantics of symbols to enable more precise machine processing. For this purpose, the necessary links between data sets are stored in database systems. The continuously increasing size of these data sets leads to performance issues for traditional databases and even for specialized Semantic Web databases. In the scope of Semantic Web databases, data sets with billions of entries are available, and processing these data sets with software-based solutions is highly time-consuming.

Thus, in this project a hardware/software system will be investigated and developed that offloads time-consuming tasks to a programmable logic chip (FPGA, Field Programmable Gate Array). The hardware acceleration of cost-intensive tasks covers index generation as well as query processing in Semantic Web databases. During query processing, the decision which functions should be mapped to the FPGA is made at runtime. As the mapping of the data path to the basic elements uses partial runtime reconfiguration, an optimal hardware accelerator can be provided for any query.
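In a strongly simplified form, the runtime mapping decision could look like the following Python sketch; the cost figures, operator names, and reconfiguration overhead are assumptions for illustration only.

# Sketch: decide per operator whether to offload to the FPGA, accounting for the
# partial-reconfiguration overhead when the data path is not yet loaded.
def choose_target(op, rows, loaded_ops, est):
    """Return 'fpga' or 'sw' for one operator of a query plan."""
    sw_cost = est[op]["sw_per_row"] * rows
    hw_cost = est[op]["hw_per_row"] * rows
    if op not in loaded_ops:             # data path must be reconfigured first
        hw_cost += est[op]["reconfig"]
    return "fpga" if hw_cost < sw_cost else "sw"

EST = {
    "triple_pattern_join": {"sw_per_row": 1.0, "hw_per_row": 0.2, "reconfig": 5000.0},
    "filter":              {"sw_per_row": 0.3, "hw_per_row": 0.1, "reconfig": 5000.0},
}

if __name__ == "__main__":
    print(choose_target("triple_pattern_join", 100000, set(), EST))  # 'fpga'
    print(choose_target("filter", 1000, set(), EST))                 # 'sw'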
Detection and adaptive prioritization of semi-static data streams and traffic patterns in Network-on-Chips
Duration: 01.04.2014 to 31.12.2016
The aim of this project is the design and implementation of a traffic-adaptive network-on-chip that reduces communication latency in complex manycore systems. Temporally constant communication patterns between functional units are to be detected online, and the corresponding data streams are to be transferred without delay by bypassing the complete router pipeline. Such temporally constant patterns exist for the duration of an application in multifunctional systems, as well as temporarily in manycore processor systems with distributed caches. Prioritization will be applied both to individual semi-static data streams between two functional units and to repeating patterns of semi-static data streams. Traffic-pattern detection is done locally by each router and only accounts for the local routing decisions of all data streams at one router input. This allows the local aggregation of several individual data streams with different destination addresses and virtual-channel identifiers. If several consecutive routers prioritize the same aggregate, a direct point-to-point connection is set up. Depending on the actual traffic patterns, this results in a combination of a packet-switched and a circuit-switched network-on-chip.

The frequency of occurrence, duration, and pattern of semi-static data streams depend not only on the communication characteristics between functional blocks and their location, but also on the routing algorithm used. Therefore, the effect of different deterministic and adaptive routing algorithms on these parameters needs to be evaluated. We also intend to use adaptive routing algorithms to support the formation of aggregates of semi-static data streams. Adaptive and fault-tolerant routing algorithms will further be used to limit the effects of network links that are blocked for non-prioritized data streams due to their exclusive use by semi-static data streams. Non-prioritized data streams need to be rerouted in such a way that prioritized connections can be sustained as long as possible.

The network-on-chip architecture is intended for use in ASIC designs as well as in partially reconfigurable FPGA designs. Performance, energy consumption, and hardware requirements will be evaluated for both design alternatives. At the end of the project, the effectiveness of the network-on-chip architecture will be demonstrated by means of an FPGA-based test system.
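A strongly simplified Python sketch of the local detection idea follows; the threshold and interfaces are illustrative assumptions, not the actual router microarchitecture.

# Sketch: per-input detection of semi-static data streams by counting repeated
# local routing decisions; once a threshold is reached, a bypass is requested.
class StreamDetector:
    def __init__(self, threshold=16):
        self.threshold = threshold
        self.last_port = None
        self.count = 0
        self.bypass = False

    def on_flit(self, out_port):
        # Only the local routing decision (output port) is tracked, so flows with
        # different destinations or virtual channels that take the same output
        # port are naturally aggregated.
        if out_port == self.last_port:
            self.count += 1
        else:
            self.last_port, self.count = out_port, 1
            self.bypass = False
        if self.count >= self.threshold:
            self.bypass = True   # request a point-to-point bypass of the router pipeline
        return self.bypass

if __name__ == "__main__":
    detector = StreamDetector(threshold=4)
    for _ in range(6):
        active = detector.on_flit(out_port=2)
    print("bypass active:", active)   # True after four repeated decisions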