#### **Annual Report** by Prof. Dr. Matthias Fertig

## 2020

This annual report documents my activities in academic teaching, research and development at the Konstanz University of Applied Sciences.

- Preface
- Symposium "Future Computer Hardware" at Heidelberg University
- 63rd Workshop of the Multi-Project-Chip (MPC) Group
- International Symposium on Automation, Information and Computing
- Optimization of Electromagnetic Fourier Simulators
- FMA unit with Posit arithmetic and Quire format
- Performance Analysis of RISC-V processors with SystemVerilog Assertions
- New elective subjects: Programming Lab 'Electromagnetic Simulation', SystemVerilog for Design
- Master's and bachelor's theses, student projects, internships, course offers, publication, third-party funding

#### **Preface**

Dear readers,

2020 turned out to be a very special year for university teaching. While the transition to online teaching proved challenging in some respects, it turned out to be far less complicated than one might have expected, as most content was already available online. That said, those among my colleagues who are still in the habit of using overhead projectors and transparency films did have a more challenging time adjusting. Because the start of the summer term was delayed by one month, there was time to prepare the online courses. In the end, all of my courses were offered as planned. Luckily, Cisco WebEx offers a free three-month trial version, just right for one semester's worth of teaching. Thanks to this preparation, students acclimated to the online format very quickly, enjoyed the flexibility and made extensive use of the chat function to raise questions, much more than is usual in the lecture room, maybe due to a certain degree of anonymity. With online teaching taken care of, I was able to attend to other issues as well, such as coming up with new courses.
During the summer term of 2019 I had started designing a course on 'SystemVerilog for Design' [w3], which I completed during the summer term and summer break. A very highly esteemed colleague introduced me to the idea of self-learning programming labs. Although this is a strange concept, as it lacks the regular contact between professor and students that lectures or tutorials usually afford, it triggered the following valuable idea: to design a course where students are required to develop their own code and then automatically receive feedback from the programming environment about correctness and performance. I thought that this approach might be worth pursuing, which is why I started right away to program such an environment, one that enables students to implement existing algorithms for electromagnetic simulation. A well-defined problem and solution were required, but since I have been working with electromagnetic Fourier simulators for many years, the choice was obvious. The environment therefore supports the BPM, WPM and VWPM algorithms for two-dimensional and three-dimensional models, on CPU and GPU, for scalar and vectorial fields. By selecting the appropriate algorithm, the degree of complexity is adjusted in line with the participants' level of knowledge, so that the course may be offered at bachelor's and master's level. The environment has been tested in two student projects and the course is ready to start in the summer term 2021 in bachelor's and master's programmes. More information on the 'Programming Lab Electromagnetic Simulation' [w3] is available in this report. Together with the 'Silicon Photonics Lab' [w3] and the 'Optics & Photonics' [w3] course, I now offer four elective subjects. I attended the MPC workshop [w3] in February, just before the first lockdown was imposed, and attended to my duties as part of the 'Technical Program Committee' of the ISAIC 2020 [w3] online.
As you will read below, the summer and winter terms 2020 turned out to be very productive indeed, in spite of the rather special circumstances. Enjoy reading.

Cordially yours
Matthias Fertig

#### Future Computer Hardware

Symposium of the Institute of Computer Engineering at Heidelberg University (ZITI)

From 24 April to 24 July 2020 the Chair of Computer Architecture based at the Institute of Computer Engineering at Heidelberg University (ZITI) organised a <u>symposium</u> on the topic of Future Computer Hardware. In the course of nine events, selected speakers from academia and industry presented on the topic and discussed potential innovative hardware solutions, with a special focus on the demands on future generations of computer architecture for applications in the following areas: artificial intelligence, deep learning, big data, internet of things and automotive.

Figure: Systolic array of PEs.

The presentations focused on innovative computer architectures, intelligent sensors and energy-efficient high-performance computing, design and technology optimisation, open-source hardware for artificial intelligence applications, processing-in-memory for big data as well as efficiency, safety and flexibility of adaptive digital circuits. A representative from industry, Dr Michaela Bott from Xilinx Research, gave a talk on FPGA-based computer architectures for deep learning applications. In his talk on Brain Inspired Computing, Dr Johannes Schemmel from the Kirchhoff-Institute for Physics (KIP) discussed the implementation of artificial neural networks and drew attention to how we can look towards nature for high-performance computing solutions.

Figure: Processing element with integrated accelerator (interface) for special applications, e.g. artificial intelligence.

Figure: Simulation of a negative-index waveguide with the VWPM.
I was given the opportunity to present my talk on the topic of Innovative Computing as first speaker in the series. I discussed innovative computer architectures based on the open-source instruction set RISC-V, e.g. systolic arrays of RISC-V processing elements (PEs), highly integrated silicon photonics components for optical interconnect technologies, high-performance computing units based on the posit or IEEE floating-point format, e.g. for artificial intelligence applications, as well as systematic approaches in near-memory and in-memory computing based on universal memory automata (<u>UMA</u>). As part of a general overview of current and future funding opportunities in Germany and Europe I also gave an example of the application of meta-materials in silicon photonics: the implementation of highly efficient waveguides based on negative index materials. The discussion that followed raised interesting questions about the implementation of resonant detectors similar to the ones I used to work with when I was still at IBM Research and Development.

#### 63rd MPC workshop in Mannheim

Organised by the Multi-Project-Chip (MPC) Group Baden-Württemberg

On 20 and 21 February 2020, the MPC group met for its 63<sup>rd</sup> workshop in Mannheim, which was hosted by Hochschule Mannheim University of Applied Sciences. On these two days all thirteen member universities met to exchange ideas about ongoing or completed research projects as well as projects in the areas of microelectronics and integrated circuits. As in previous years there were interesting talks by representatives from academia and industry. In his talk entitled "Centimeters make all the difference – vehicle motion and position sensor", Dr Frey from Bosch GmbH talked about how to determine the position of moving vehicles using sensors.

Figure: Multi-Project-Chip (MPC) group Baden-Württemberg (since 1989).
Further talks by representatives from the partner universities were on open-source hardware (Professor Glesner, TU Darmstadt), stability of operational amplifiers (Professor Zwick, Hochschule Mannheim), GPS and GSM with emulated tracking systems (Frank Wasinski, M.Sc., Hochschule Giessen), hybrid image processing with FPGA (Jannik Maier, M.Sc., Hochschule Ulm), e-paper displays (Andreas Angermaier, B.Sc., Hochschule Offenburg), and electronics from the 3D printer (Alexander Scholz, M.Sc., Hochschule Offenburg and Karlsruhe Institute of Technology).

Figure: Architecture of the Universal Memory Automata (UMA), a concept for near-memory computing architectures.

As a member of the MPC group I was able to attend the workshop and took the opportunity to submit a paper on the theory of Universal Memory Automata (<u>UMA</u>, [1]) for publication [2]. Universal Memory Automata extend the theory of finite and pushdown automata to universal memory concepts and thus enable a systematic implementation of near-memory and in-memory computing approaches. Besides elaborating on the theory, the paper also introduces the development tool VERIGEN [3]. <u>VERIGEN</u> generates synthesizable Verilog code from a UMA specification as well as a simple testing environment and can thus be used to implement, verify and synthesize Universal Memory Automata.

<sup>[1]</sup> http://www-home.htwg-konstanz.de/~MFERTIG/pages/uma.html

<sup>[2]</sup> M.
Fertig, "Universal Memory Automaton and Automated Verilog HDL Code Generation for a Cache Coherency Snooping Protocol", 63rd Workshop of the Multi-Project-Chip (MPC) Group, Mannheim, February 2020, publication pending [pre-print]

<sup>[3]</sup> http://www-home.htwg-konstanz.de/~MFERTIG/files/tools/VERIGEN\_FREE.tar

Conferences

#### ISAIC - 1<sup>st</sup> International Symposium on Automation, Information and Computing

The international symposium on automation, information technology and scientific computing (<u>ISAIC</u>) took place from 2 to 4 December 2020 at Beijing Jiaotong University in Beijing, China. Associated with ISWEE, the traditional conference on water, ecology and environment with a special focus on algorithms and applications, ISAIC provided a forum for the areas of automation, information and computer technology for the first time this year.

Figure: International Symposium on Automation, Information and Computing 2020.

ISAIC 2020 took place online, with presentations held in the form of online meetings. The links were published in the <u>symposium program</u>. I was able to contribute as an *Invited Member* of the <u>Technical Program Committee</u>. <u>ISAIC 2021</u> has been scheduled for 3 to 6 December 2021. It will again be hosted by Jiaotong University on its Beijing campus.

http://www.confisaic.com/
http://www.confisaic.com/#/submissionsGuidelines
http://www.confisaic.com/#/aboutlsaicCommittes

### Electromagnetic Fourier-Simulation

Optimization of accuracy and speed

Figure: Energy-balanced waveguide simulation with the WPM 2020.

Numerical methods are indispensable when it comes to designing and optimizing optical and photonic components, as electromagnetic effects cannot otherwise be made sufficiently tangible and analysed. Since closed analytical solutions are available for a very narrow set of problems only, partial differential equations are solved through finite differences or Fourier analysis.
As a result, discretisation effects occur that strongly affect run time and correctness. Finite difference methods can be used for small problems only due to their extensive memory and run time requirements, which means that Fourier simulators are a promising approach in spite of some limitations. Besides Rigorous Coupled Wave Analysis (RCWA), the Beam [1], Wave [2] and Vector Wave Propagation Method [3, 4] are the standard algorithms for Fourier simulators. They compute field propagation by decomposition into plane waves, the so-called Plane Wave Spectrum (PWS) or Plane Wave Decomposition (PWD). The Vector Wave Propagation Method is the only known method supporting the bi-directional propagation of vector waves across the entire spatial frequency range.

The above-mentioned algorithms show deviations in energy flux, which means that they do not conserve energy unreservedly at boundary surfaces. In this context a link to the index distribution and the modelling of evanescent waves can be demonstrated, which depends on the smoothness of the index distribution in the model. The energy flux at system boundaries needs to be analysed to optimize energy conservation. It appears promising to define energy-conserving building blocks for implementing complex energy-conserving models. In the context of numeric stability, a correlation with the absorbing properties of the model can be shown, as also observed with finite difference methods like the FDTD. References in the literature on energy conservation and numeric stability of BPM, WPM and VWPM are so far available only for the BPM and, to some extent, for the VWPM in [4].

Figure: Influence of evanescent modes and absorption on the relative energy flux over the z-layer index with the WPM.
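The plane-wave decomposition described above can be illustrated with a minimal sketch. It is not the BPM, WPM or VWPM implementation from this report: it assumes a one-dimensional scalar field in a homogeneous medium, uses a naive O(N²) Fourier transform from the Python standard library, and takes the sum of |u|² as a simple stand-in for the energy flux. All function names and parameters are hypothetical. It shows that propagating plane waves only acquire a phase, while evanescent components decay and remove energy from the sampled field.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (O(N^2)), normalised by 1/N."""
    n = len(samples)
    return [
        sum(samples[x] * cmath.exp(-2j * math.pi * m * x / n) for x in range(n)) / n
        for m in range(n)
    ]


def idft(spectrum):
    """Inverse of dft() above (no extra normalisation needed)."""
    n = len(spectrum)
    return [
        sum(spectrum[m] * cmath.exp(2j * math.pi * m * x / n) for m in range(n))
        for x in range(n)
    ]


def propagate(field, dx, dz, wavelength, index=1.0):
    """Propagate a 1D scalar field by dz through a homogeneous medium (assumption)."""
    n = len(field)
    k = 2.0 * math.pi * index / wavelength
    shifted = []
    for m, amplitude in enumerate(dft(field)):
        signed_m = m if m < n // 2 else m - n      # map index to signed frequency
        kx = 2.0 * math.pi * signed_m / (n * dx)   # transverse spatial frequency
        kz = cmath.sqrt(k * k - kx * kx)           # imaginary for evanescent waves
        shifted.append(amplitude * cmath.exp(1j * kz * dz))
    return idft(shifted)


def energy(field):
    """Sum of |u|^2, used here only as a simple proxy for the energy flux."""
    return sum(abs(u) ** 2 for u in field)


n = 32
gauss = [math.exp(-((x - n / 2) / 4.0) ** 2) for x in range(n)]
point = [0.0] * n
point[n // 2] = 1.0

# Coarse sampling (dx = wavelength): every plane wave propagates, energy is kept.
print(energy(propagate(gauss, dx=1.0, dz=5.0, wavelength=1.0)) / energy(gauss))
# Fine sampling (dx = wavelength / 10): a point source excites evanescent waves.
print(energy(propagate(point, dx=0.1, dz=1.0, wavelength=1.0)) / energy(point))
```

The first ratio stays at one up to rounding, the second drops well below one because most spectral components of the point source are evanescent at this sampling.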
Although the complexity of the WPM and <u>VWPM</u> algorithms is quadratic rather than logarithmic as for the BPM algorithm (two-dimensional case), they require much less memory and run time than finite difference methods. Fourier simulators are therefore well suited to computing field propagation in complex optical systems, i.e. systems comprising many individual optical components (camera lenses or other imaging systems with an optical axis). When it comes to simulating such systems, optimization of run time is important. In this context, massively parallel systems such as graphics processing units have proven advantageous (see 2019 annual report).

<sup>[1]</sup> M.D. Feit, J.A. Fleck, "Light propagation in graded-index optical fibers", Appl. Opt. 17, pp. 3990–3998 (1978)

<sup>[2]</sup> K.-H. Brenner, W. Singer, "Light propagation through microlenses: a new simulation method", Appl. Opt. 32, No. 26, pp. 4984–4988 (1993)

<sup>[3]</sup> M. Fertig, K.-H. Brenner, "The Vector wave propagation method (VWPM)", JOSA A, Vol. 27, No. 4, pp. 709–717 (2010)

<sup>[4]</sup> M. Fertig, "Vector Wave Propagation Method", dissertation at the Chair of Optoelectronics, Heidelberg University, 2011

#### ASIC development

Development and verification of an FMA unit using Posit arithmetic and Quire format

RISC-V, an open-source and extendable instruction set, is driving innovative and open hardware solutions. In the area of high-performance computing (HPC) and parallel compute clusters the floating point unit (FPU) decisively impacts performance. Compute power is measured in floating point operations per second, and fast, highly efficient floating point units are indispensable in HPC and artificial intelligence (AI). The IEEE-754 standard defines the floating point format commonly used in scientific computing.
As the IEEE format is unspecific with regard to rounding modes and arithmetic with IEEE-754 numbers violates fundamental mathematical laws (the associative law), implementations on hardware platforms differ and uniform results are not guaranteed. Also, under the IEEE-754 format, most floating point numbers cannot be expressed accurately, which is why a so-called inexact bit is provided. Processing denormalised numbers requires a major hardware effort. Decimal numbers cannot be expressed precisely under IEEE-754, which is why decimal floating point units (DFU) are being developed. The posit format [3] is an innovative floating point format that avoids these issues and offers better code density as well as larger dynamic range and precision.

Figure: Number wheel for 4-bit posits. (Source: [1])

Figure: Structure of the test environment for POSIT-FMA. (Source: [1])

Figure: Distribution of test vectors for parameter k. (Source: [1])

As part of a collaborative master's thesis supervised at Heidelberg University and HTWG Konstanz, a RISC-V 64-bit fused multiply-add (FMA) unit with Quire support for processing floating point numbers in Posit format [1] was developed and verified. The so-called Quire is a posit-specific hardware format for efficiently processing long sequences of floating point values, as used in AI applications, for instance. By comparing this with the results from the 2018 master's thesis on the implementation and verification of an FMA unit with IEEE-754 arithmetic [2] we are now in a better position to determine whether Posit arithmetic has any advantages with regard to implementation in digital systems.
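The violation of the associative law mentioned above is easy to reproduce. The following sketch uses Python floats, which are IEEE-754 double-precision numbers, not posits. The second half only mimics the idea behind a quire-style accumulator, namely summing products exactly and rounding once at the end; it does so with `fractions.Fraction` as an assumed stand-in and says nothing about the hardware implementation in [1]. The function names are hypothetical.

```python
from fractions import Fraction

# Associativity: the same three numbers, grouped differently.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c    # a and b cancel first, so c survives
right = a + (b + c)   # c is lost when b + c is rounded
print(left, right)    # prints: 1.0 0.0


def dot_round_each_step(xs, ys):
    """Dot product with a rounding step after every multiply and add."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def dot_round_once(xs, ys):
    """Dot product accumulated exactly, rounded once at the end (quire-like idea)."""
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)
    return float(acc)


xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
print(dot_round_each_step(xs, ys), dot_round_once(xs, ys))  # prints: 0.0 1.0
```

Deferring the rounding to the end of a long accumulation is what makes the result independent of the summation order in this example.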
<sup>[1]</sup> Christian Melzer, "Design and Verification of a Parametrizable Posit Unit with Fused Multiply-Add and Quire Support", master's thesis at the Department of Computer Architecture of Heidelberg University, 2020

<sup>[2]</sup> Felix Kaiser, "Design and Verification of a RISC-V Conform, Double-Precision Fused Multiply-Add Unit", master's thesis at the Department of Computer Architecture of Heidelberg University, 2018

<sup>[3]</sup> J. Gustafson, I. Yonemoto, "Beating floating point at its own game: Posit arithmetic", Supercomputing Frontiers and Innovations, 4, pp. 71–86, 10.14529/jsfi170206, 2017

#### Performance analysis of microprocessors with SystemVerilog Assertions

In microprocessor development, expected performance should be assessed early on, i.e. during the development process. Basic performance indicators that need to be assessed include, for instance, cycles per instruction (CPI), instructions per cycle (IPC), watts per cycle (WPC) and watts per instruction (WPI). Often, this means that elaborate simulation models are written for performance analysis purposes, which need to be adjusted and verified in the course of the development cycle. It is therefore desirable to gain an idea of performance early on and without great effort in order to be able to optimize the architecture or its implementation. Since ASICs are implemented with hardware description languages such as SystemVerilog, SystemC or VHDL, it seems obvious to include program code for performance analysis in these implementations. Such extensions are already being used in verification (UVM and Specman e). In 2020, a student project examined to what extent SystemVerilog Assertions (SVA) are suited to implementing a process for the efficient performance analysis of a digital system implemented in SystemVerilog (SV).
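The two cycle-based indicators named above, CPI and IPC, reduce to simple ratios once the number of simulated cycles and the number of completed instructions are known. The sketch below assumes a hypothetical list of `(cycle, event_name)` pairs in which a `"retire"` entry marks a completed instruction; this format and all names are illustrative assumptions and not the format used in the project.

```python
def count_retired(events):
    """Count completed instructions in a list of (cycle, event_name) pairs."""
    return sum(1 for _cycle, name in events if name == "retire")


def cpi_and_ipc(total_cycles, retired_instructions):
    """Return (cycles per instruction, instructions per cycle)."""
    if total_cycles <= 0 or retired_instructions <= 0:
        raise ValueError("need at least one cycle and one retired instruction")
    cpi = total_cycles / retired_instructions
    ipc = retired_instructions / total_cycles
    return cpi, ipc


# Hypothetical events observed during ten simulated clock cycles.
events = [
    (1, "fetch"),
    (3, "retire"),
    (4, "stall"),
    (6, "retire"),
    (8, "retire"),
    (10, "retire"),
]
cpi, ipc = cpi_and_ipc(total_cycles=10, retired_instructions=count_retired(events))
print(cpi, ipc)  # prints: 2.5 0.4
```

Keeping other event types such as `"stall"` in the same list is what would later allow correlating them with the retire rate.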
An already existing process was analysed and extended during the project so that, using SVA, it can now document cycle-accurate events occurring in a simulation of the HDL implementation. This extended process is based on the tool *Verilator* to create an events log (.log), which, in a further step that I do not describe here, can be used to derive performance indicators and event correlations for performance analysis. Based on the HDL implementation (.sv), SVAs are generated automatically, which, together with the HDL description of the microprocessor, are used to create a C++ model (.cpp). This model is cycle-accurate and used to simulate compiled code of standardised *SPEC benchmarks* to generate the events log (.log) based on scenarios that are as realistic and representative as possible. The C++ model has the advantage that its run time is shorter by up to two orders of magnitude than a simulation of the SystemVerilog model. This speeds up the process significantly.

Figure: Process flow for events list generation.

An ongoing bachelor's thesis investigates which format is best suited to defining the events that must be monitored, in order to develop a compiler for the required SVAs that monitor and document the processes inside the processing unit. It will also examine which format is best suited to outputting events (.log) in order to make automated analysis possible. If it is possible to automatically document and analyse the status and timing of events for functional units of a microprocessor and to infer performance limitations from event correlations, it may be possible to report on unit interactions and thus on system performance. From this analysis, suggestions for improving the implementation or even the architecture itself may be derived.

# Academic Teaching

#### SystemVerilog for design

Elective subject

SystemVerilog is a standard hardware description language (HDL) for digital systems. It can be used for design and verification.
The course 'SystemVerilog for design' teaches students how to implement digital designs with SystemVerilog. Students will use the Cadence development environment in practical exercises. Cadence is widespread in industry and considered a quasi-standard next to the development environment provided by Synopsys. The next goal is to also use the Synopsys environment so that students are introduced to both environments, thereby preparing them for their professional careers in ASIC design in the best possible way. In addition, Synopsys provides a commercially available design environment for integrated silicon photonics components. The course encompasses lectures and lab units on ASIC development, hardware description at register transfer level, hardware synthesis, static timing analysis as well as engineering change orders. The course is worth six ECTS credits and has been designed for online teaching. The necessary licences for both development tools are available as part of an educational licensing agreement. The course was suggested to the study commission of the Faculty of Electrical Engineering to be offered in the relevant study programmes from summer term 2021.

http://www-home.htwg-konstanz.de/~MFERTIG/pages/sv4syn.html

#### Programming lab 'Electromagnetic Simulation'

Elective subject

Algorithms for computing electromagnetic field distribution and propagation in system models are essential to develop and optimize optical and photonic components. Correctness, run time and memory requirements are key when it comes to the usability of such simulators. The algorithms used approximate the underlying partial differential equations through finite differences (FDTD algorithm) or by using Fourier theory (BPM, WPM, VWPM and RCWA algorithms).
In the <u>Programming lab 'Electromagnetic Simulation'</u>, students will learn to implement the known Fourier methods BPM, WPM and VWPM on microprocessors (CPU) or massively parallel graphics processing units (GPU). A programming environment will be made available which contains one file each with pre-defined prototype functions for two-dimensional and three-dimensional implementations of scalar and vectorial methods. Students will use these files to implement a Fourier method of their choice. The environment is capable of simulating the propagation of scalar or vectorial electromagnetic fields through two-dimensional or three-dimensional models built from various optical components. The <u>programming environment</u> contains a pre-compiled library with reference code to automatically analyse the student program code for correctness and deviation, run time and memory requirements. The environment provides a detailed report to simplify code debugging and make the course a successful experience for students by providing immediate feedback about their individual performance in line with the chosen level of difficulty (algorithm and programming device). This helps students to properly assess their own performance. The course may thus be taught at bachelor's and master's level. The course is worth three ECTS credits. It has been designed as a lab course and can be completed online. The course was suggested to the study commission of the Faculty of Electrical Engineering to be offered in the relevant study programmes from summer term 2021.

#### Master's theses

- [2] N. Weiher, RISC-V Load-Store unit and the virtual memory system design, master's thesis, Institute for Computer Engineering at Heidelberg University, secondary examiner, ongoing.
- [1] C. Melzer, Design and Verification of a Parameterizable Posit Unit with Fused Multiply-Add and Quire Support, master's thesis, Institute for Computer Engineering at Heidelberg University, secondary examiner, 09/2020.
#### Bachelor's theses

- [3] G. Knis, Using SystemVerilog Assertions for Performance Analysis of microprocessors, bachelor's thesis, primary examiner, ongoing.
- [2] K. Eberts, Systematische Betrachtung zur automatisierten Parameterverteilung in Netztopologien, bachelor's thesis, primary examiner, 06/2020.
- [1] O. Höldin, *Implementierung einer Sensorsteuerung über LAN-Anbindung*, bachelor's thesis at Airbus Defense and Space GmbH, Immenstaad, primary examiner, 04/2020.

#### Student projects

- [3] C. Casagranda, Implementierung des BPM-Algorithmus auf einer GPU mit CUDA, student project, ongoing.
- [2] L. Strobel, Laufzeitoptimierung des BPM-Algorithmus, z.B. mit dynamischem Programmieren, student project, ongoing.
- [1] G. Knis, Setting up a performance analysis process with SystemVerilog Assertions, student project, 10/2020.

#### Internships

- [4] D. Brunner, Einbindung eines Referenzsensor in die Sensorproduktion, Bosch GmbH, 08/2020.
- [3] L. Fuchs, Einarbeitung in Innovation Labs und Entwurf einer Angebotsstruktur für Innovation Lab as a Service, Novatec Consulting GmbH, 03/2020.
- [2] L. Strobel, Visualisierung eines MEMS und Fehlererkennung durch CRC, Bosch GmbH, 02/2020.
- [1] M. Heim, Sicherstellung einer langfristigen Ersatzteilversorgung von Zellmodulen am Beispiel des Audi etron, Audi AG, 02/2020.

#### List of courses on offer

<u>Compulsory subjects:</u> Digital Engineering [<u>w3</u>], Electrical Engineering [<u>w3</u>], Electrical Engineering Lab, Electrical Engineering for incoming students

<u>Elective subjects:</u> Programming lab 'Electromagnetic Simulation' (start 2021, [w3]), SystemVerilog for design (start 2021, [w3]), Photonics Lab (since 2018, [w3]), Optics and Photonics (since 2016, [w3])

#### **Publications**

M.
Fertig, "Universal Memory Automaton and Automated Verilog HDL Code Generation for a Cache Coherency Snooping Protocol", MPC Workshop, February 2020, [pre-print]

#### Third-party funding

[1] MPC Group Baden-Württemberg confirmed cost coverage of 11,000 €, 06/2020.

#### **Excursions**

[1] Quantum Computer, IBM Zürich Research Laboratory, Rüschlikon (CH), for top students of the course Digital Engineering, winter term 2019 (cancelled due to the Corona pandemic)

Konstanz, January 2021

Text and layout: Prof. Dr. Matthias Fertig
Translation: Dr. Tullia Giersberg