## Annual Report by Prof. Dr. Matthias Fertig



# 2021

This annual report provides an overview of my activities in academic teaching, research and development at HTWG Konstanz - University of Applied Sciences.

#### Contents

- Universal Memory Automata for SystemVerilog
- Runtime Optimizations of Electromagnetic Fourier Simulators
- Energy Conservation of Electromagnetic Fourier Simulators
- FPGA Board for the Development of Digital Circuits
- Miscellaneous

# Preamble

Dear readers,

The year 2021, too, was characterized by online teaching, and many students benefited from not having to find and rent a flat, which is undoubtedly a problem in Konstanz. I remember one student who was very happy to attend classes online, since this meant that he could also help out in his parents' business while pursuing his studies. Another student attended lectures from his car, while others organized themselves in small groups. What started as a makeshift solution turned into a welcome and flexible way of studying that is becoming increasingly desirable and that modern universities are expected to provide. Especially in the early stages of a degree programme, where conveying knowledge matters more than scientific discourse, online teaching is ideally suited to give new students location-independent orientation and to help them decide for or against a particular university or study programme. I did not notice any of the anonymity that is often cited as an argument against online teaching, and the much-invoked student life on campus does not take place *during* lectures. Today, students are used to communicating and organizing themselves via social media. As digitization advances, the ability to navigate various digital media and virtual working environments is becoming indispensable in professional contexts, too. Employers expect it in addition to subject-specific qualifications.

With the summer term over, I started a "Freistellungssemester" (leave of absence) in September. Many thanks to all of my colleagues who took over my classes in the 2021 winter term. Thank you also to the Faculty and the university's Executive Board for supporting and approving my application. From an organizational point of view, this leave of absence came at an opportune moment, since the nationwide decline in student numbers in electrical engineering and information technology has meanwhile had a lasting impact on the Department of Electrical Engineering and Information Technology. Ultimately, my semester off was universally welcomed.

Since the Executive Board and the Faculty firmly and emphatically impressed upon me that I was not to use my time off to improve existing courses or to develop new ones (I was told that this was to be done during regular duty hours), I was unfortunately unable to update some of my courses as planned or to complete the preparations for my elective subject *SystemVerilog for Verification*. This is regrettable, especially since updated content and teaching concepts as well as new courses keep study programmes up to date and attractive for new students.

The activities during my research semester included:

- further developing the Universal Memory Automata (UMA) for SystemVerilog,
- commissioning an FPGA board (externally funded by the MPC group),
- implementing the BPM and WPM on massively parallel hardware (GPU),
- developing and implementing single-thread optimizations for the BPM and WPM,
- investigating and optimizing energy conservation for the VWPM.

Please read on to learn more about what I did in 2021, both during my semester off and during my regular duty hours. Enjoy!

Sincerely yours, Matthias Fertig

# Universal Memory Automata (update)

#### with SystemVerilog

The Universal Memory Architecture (Fertig 2020, pdf) is implemented using Universal Memory Automata. Previously, the hardware description language Verilog was used for this purpose. Thanks to the updated VERIGEN software package [tgz], Universal Memory Automata can now be implemented using SystemVerilog. This involves a simple formalism that combines extended components of a deterministic finite automaton with language constructs from SystemVerilog [w3]. VERIGEN generates synthesizable Verilog code. Compared to the previous version, this release offers advanced options for specifying and configuring memories, state transitions, hierarchies and so on. The package now also uses the free VERILATOR simulation environment [w3] and thus provides the means for simulation and verification. Several examples included in the environment demonstrate the versatility of the VERIGEN tool.

| Current state | Condition | Next state | Actions |
|---------------|-----------|------------|---------|
| `IDLE` | `DO_IDLE` | `IDLE` | |
| `IDLE` | `DO_LOCK` | `LOCKED` | |
| `LOCKED` | `!DO_UNLOCK` | `LOCKED` | |
| `LOCKED` | `DO_UNLOCK` | `IDLE` | |
| `IDLE` | `START_TWE` | `PTE0` | `twe_active=1'b1;` |
| `PTE0` | `!NEXT_LEVEL` | `PTE0` | `twe_active=1'b1/?DATA_TO_MEM_LVL0/data_to_memory_vld=1'b1; data_from_tlb=POP_VA48_CAM(1'b1, …VA48_PTE_PAGE_BASE_ADDR_WIDTH{1'b0}})` |
| `PTE0` | `IS_TLB_HIT` | `DONE` | `twe_active=1'b1/?SET_PHYS_ADDR_OUT_FROM_TLB/phys_addr_out_vld=1'b1;` |
| `PTE0` | `DO_IDLE` | `IDLE` | |
| `PTE0` | `NEXT_LEVEL` | `PTE1` | `twe_active=1'b1/?DATA_TO_MEM_LVL1;` |
| `PTE1` | `!NEXT_LEVEL` | `PTE1` | `twe_active=1'b1/?DATA_TO_MEM_LVL1/data_to_memory_vld=1'b1;` |
| `PTE1` | `NEXT_LEVEL` | `PTE2` | `twe_active=1'b1/?DATA_TO_MEM_LVL2;` |
| `PTE1` | `DO_IDLE` | `IDLE` | |
| `PTE2` | `!NEXT_LEVEL` | `PTE2` | `twe_active=1'b1/?DATA_TO_MEM_LVL2/data_to_memory_vld=1'b1;` |
| `PTE2` | `NEXT_LEVEL` | `PTE3` | `twe_active=1'b1/?DATA_TO_MEM_LVL3;` |
| `PTE2` | `DO_IDLE` | `IDLE` | |
| `PTE3` | `!NEXT_LEVEL` | `PTE3` | `twe_active=1'b1/?DATA_TO_MEM_LVL3/data_to_memory_vld=1'b1;` |
| `PTE3` | `NEXT_LEVEL` | `WRITE_TLB` | `twe_active=1'b1;` |
| `PTE3` | `DO_IDLE` | `IDLE` | |
| `WRITE_TLB` | `!DO_IDLE` | `DONE` | `twe_active=1'b1/?SET_PHYS_ADDR_OUT/phys_addr_out_vld=1'b1; PUSH_VA48_CAM(1'b1, …data_from_memory[VA48_PTE_PAGE_BASE_ADDR_WIDTH-1:0]})` |
| `WRITE_TLB` | `DO_IDLE` | `IDLE` | |
| `DONE` | `!DO_IDLE` | `DONE` | |
| `DONE` | `DO_IDLE` | `IDLE` | |

Extract from the UMA configuration file for a 3-level Table-Walk-Engine with TLB. It shows the state transitions for TLB read access, the table walk on a TLB miss, and TLB write access with the physical address (development time until first simulation: approx. 1 day).
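The following minimal Python model of the table-walk part of the transition list above may make it easier to read. It is a hypothetical software sketch and not code generated by VERIGEN. The helper names `sig`, `not_sig` and `step` are assumptions, as is the priority rule (entries are checked in list order and the first matching guard wins). Another assumption is that a state with no matching guard is held. Output actions such as `twe_active=1'b1` are left out.

```python
from typing import Callable, Dict, List, Tuple

Signals = Dict[str, bool]
Guard = Callable[[Signals], bool]


def sig(name: str) -> Guard:
    """Guard that is true when the named input signal is asserted."""
    return lambda s: s.get(name, False)


def not_sig(name: str) -> Guard:
    """Guard that is true when the named input signal is deasserted."""
    return lambda s: not s.get(name, False)


# (current state, guard, next state), modeled on the table-walk rows of the
# configuration excerpt. Assumption: checked in list order, first match wins.
TRANSITIONS: List[Tuple[str, Guard, str]] = [
    ("IDLE", sig("START_TWE"), "PTE0"),
    ("PTE0", sig("DO_IDLE"), "IDLE"),
    ("PTE0", sig("IS_TLB_HIT"), "DONE"),        # TLB hit: no table walk needed
    ("PTE0", sig("NEXT_LEVEL"), "PTE1"),
    ("PTE1", sig("DO_IDLE"), "IDLE"),
    ("PTE1", sig("NEXT_LEVEL"), "PTE2"),
    ("PTE2", sig("DO_IDLE"), "IDLE"),
    ("PTE2", sig("NEXT_LEVEL"), "PTE3"),
    ("PTE3", sig("DO_IDLE"), "IDLE"),
    ("PTE3", sig("NEXT_LEVEL"), "WRITE_TLB"),
    ("WRITE_TLB", sig("DO_IDLE"), "IDLE"),
    ("WRITE_TLB", not_sig("DO_IDLE"), "DONE"),  # translation written to the TLB
    ("DONE", sig("DO_IDLE"), "IDLE"),
]


def step(state: str, signals: Signals) -> str:
    """Return the next state. If no guard matches, the state is held, which
    covers the !NEXT_LEVEL and !DO_IDLE self-loops of the excerpt."""
    for current, guard, nxt in TRANSITIONS:
        if current == state and guard(signals):
            return nxt
    return state
```

In this model, a TLB miss walks IDLE → PTE0 → PTE1 → PTE2 → PTE3 → WRITE_TLB → DONE. Asserting `IS_TLB_HIT` in PTE0 leads directly to DONE.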

The following simple examples are available for download [w3]:

- Return Address Stack
- Dual-Port Queue with alternating write and parallel read access
- Parametrized Content Addressable Memory
- Parametrized 3-level Table-Walk-Engine with TLB
- Cache Coherency Engine (MESI)
- Parametrized 1st-Level Cache (AMD Opteron)

Simulation of a 3-level Table-Walk-Engine with Translation-Lookaside Buffer, developed with the Universal Memory Architecture. The point in time of a TLB hit is highlighted.

All examples include a so-called key-lock mechanism that can be linked to the process ID, for instance to enable exclusive access to hardware instances.
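A key-lock of this kind can be sketched in a few lines of Python, assuming that the lock simply stores the process ID (PID) of its owner. The class `KeyLock` and its method names are illustrative assumptions and not the interface generated by VERIGEN. The same applies to the rule that an unlocked instance may be used by any process. The two states correspond to the IDLE and LOCKED states, with the `DO_LOCK` and `DO_UNLOCK` conditions, in the configuration excerpt.

```python
from typing import Optional


class KeyLock:
    """Hypothetical software model of a key-lock bound to a process ID.

    owner is None in the unlocked (IDLE) state and holds the PID of the
    locking process in the LOCKED state.
    """

    def __init__(self) -> None:
        self.owner: Optional[int] = None

    def lock(self, pid: int) -> bool:
        """DO_LOCK: succeeds if the lock is free or already held by pid."""
        if self.owner is None:
            self.owner = pid
        return self.owner == pid

    def unlock(self, pid: int) -> bool:
        """DO_UNLOCK: only the owning process may release the lock."""
        if self.owner == pid:
            self.owner = None
            return True
        return False

    def may_access(self, pid: int) -> bool:
        """Assumption: an unlocked instance is open to every process."""
        return self.owner is None or self.owner == pid
```

While process 7 holds the lock, process 9 can neither lock, unlock nor access the instance. This is the exclusivity the key-lock is meant to provide.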

# Runtime Optimization

#### for the Beam and Wave Propagation Methods

The Fourier methods BPM (Feit & Fleck 1978), WPM (Brenner & Singer 1993) and VWPM (Fertig & Brenner 2010) offer several advantages in terms of runtime and memory requirements over other methods such as RCWA or rigorous methods like FDTD. This is why they are used to simulate electromagnetic field distributions in complex systems (e.g. micro-optics). Because a large number of parameters have to be optimized during the development of such systems, it pays to optimize the runtime so that the required simulations can be carried out as quickly as possible. One way of doing so is to implement the algorithm on massively parallel hardware, for instance a graphics card (GPU). To achieve good results with this setup, the algorithm must contain parallelizable steps. For the scalar BPM and WPM, the parallel implementation is compared with different optimizations of the single-thread implementation. The programming environment is available online as a programme library [w3]. It also offers the means to design one's own algorithms and optimizations [tgz]; to this end, a comprehensive benchmarking environment is provided.
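The difference between the two scalar methods is easiest to see in code. Below is a minimal, deliberately unoptimized single-thread sketch in pure Python. The function names are assumptions and are not taken from the author's library. So are the 1-D transverse grid with periodic boundaries and the exp(+i k_z z) sign convention.

In the BPM step, the diffraction and refraction operators act element-wise on the N samples around two FFTs. In the WPM step, the local propagation constant makes every output sample a sum over all plane waves, which costs O(N²) per step. In both cases, the work per sample is independent of the other samples. This is the kind of parallelizable structure a GPU implementation relies on.

```python
import cmath
import math
from typing import List, Sequence


def fft(a: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Unnormalized recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    sign = 1 if inverse else -1
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spatial_frequencies(n: int, dx: float) -> List[float]:
    """Transverse wave numbers k_x in FFT order for n samples spaced dx apart."""
    return [2 * math.pi * (m if m < n // 2 else m - n) / (n * dx) for m in range(n)]


def bpm_step(field: Sequence[complex], n_profile: Sequence[float],
             n0: float, k0: float, dx: float, dz: float) -> List[complex]:
    """One paraxial split-step BPM step (envelope relative to exp(i k0 n0 z)):
    diffraction in the reference medium n0 in the Fourier domain, then a
    refraction phase screen in real space."""
    n = len(field)
    kx = spatial_frequencies(n, dx)
    spec = [s * cmath.exp(-1j * k * k * dz / (2 * k0 * n0))
            for s, k in zip(fft(field), kx)]
    env = [v / n for v in fft(spec, inverse=True)]
    return [e * cmath.exp(1j * k0 * (nx - n0) * dz) for e, nx in zip(env, n_profile)]


def wpm_step(field: Sequence[complex], n_profile: Sequence[float],
             k0: float, dx: float, dz: float) -> List[complex]:
    """One scalar WPM step: each plane wave is propagated with the local
    k_z(x) = sqrt((k0 n(x))^2 - k_x^2). Direct evaluation costs O(N^2)."""
    n = len(field)
    kx = spatial_frequencies(n, dx)
    amps = [v / n for v in fft(field)]  # plane-wave amplitudes
    out = []
    for j in range(n):
        x = j * dx
        # cmath.sqrt of a negative float is +i*sqrt(|.|), so evanescent waves decay
        kz = [cmath.sqrt((k0 * n_profile[j]) ** 2 - k * k) for k in kx]
        out.append(sum(a * cmath.exp(1j * (k * x + q * dz))
                       for a, k, q in zip(amps, kx, kz)))
    return out
```

Both BPM operators are pure phase factors, so this BPM step preserves Σ|E|² exactly, apart from rounding. One common way to speed up the WPM step, though not necessarily the one used in the technical reports, is to group samples with the same refractive index. Each group then needs only one inverse FFT.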



The environment enables users to test their own algorithms and to configure the benchmarks to match their individual requirements. It also automatically checks the results against the standard algorithm for correctness and compares runtimes across the different optimization levels. The optimization levels for BPM [PDF] and WPM [PDF] are documented in detail in two technical reports.
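The correctness check and runtime comparison described above can be sketched as a small harness. This is a hypothetical Python illustration and not the interface of the published benchmarking environment. The function `benchmark`, the tolerance `tol` and the two DFT kernels are assumptions. The candidate kernel merely caches twiddle factors and stands in for one single-thread optimization level. The harness first verifies it against the reference result and then reports the best-of-`repeats` runtimes.

```python
import cmath
import math
import time
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[Sequence[complex]], List[complex]]


def dft_reference(x: Sequence[complex]) -> List[complex]:
    """Reference kernel: direct DFT that calls exp() for every (m, j) pair."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n))
            for m in range(n)]


def dft_cached(x: Sequence[complex]) -> List[complex]:
    """Candidate kernel: computes the n distinct twiddle factors once and
    looks them up with (m * j) % n, since exp is 2*pi-periodic."""
    n = len(x)
    w = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [sum(x[j] * w[(m * j) % n] for j in range(n)) for m in range(n)]


def _best_time(kernel: Kernel, x: Sequence[complex], repeats: int) -> float:
    """Best wall-clock time of `repeats` runs, in seconds."""
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(x)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(reference: Kernel, candidate: Kernel,
              inputs: List[Sequence[complex]], tol: float = 1e-9,
              repeats: int = 3) -> List[Tuple[int, bool, float, float]]:
    """Per input: (size, candidate matches reference within tol, t_ref, t_cand)."""
    rows = []
    for x in inputs:
        expected, got = reference(x), candidate(x)
        correct = len(got) == len(expected) and all(
            abs(a - b) <= tol for a, b in zip(got, expected))
        rows.append((len(x), correct,
                     _best_time(reference, x, repeats),
                     _best_time(candidate, x, repeats)))
    return rows


if __name__ == "__main__":
    signals = [[complex(math.sin(j), math.cos(3 * j)) for j in range(n)]
               for n in (16, 64, 128)]
    for size, ok, t_ref, t_cand in benchmark(dft_reference, dft_cached, signals):
        print(f"N={size:4d}  correct={ok}  speed-up={t_ref / t_cand:.1f}x")
```

Checking correctness before timing ensures that a faster kernel is only reported if it still reproduces the reference result.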

Runtime behavior of the scalar Wave Propagation Method for inhomogeneous index distributions with and without runtime optimization, compared to an implementation on a massively parallel system (GPU).

- [1] M. Fertig, "Runtime Optimizations for the Split-Step Beam Propagation Method", Technical Report, August 2021, unpublished [PDF]
- [2] M. Fertig, "Runtime Optimizations for the Wave Propagation Method", Technical Report, October 2021, unpublished [PDF]

# Energy Conservation

#### for the Vector Wave Propagation Method

The Vector Wave Propagation Method (Fertig & Brenner 2010) extends the WPM (Brenner & Singer 1993) to vector waves. The originally unidirectional algorithm was extended to bidirectional propagation and evanescent waves (Fertig 2011). Like the BPM and WPM, the VWPM violates energy conservation under certain circumstances. In these scenarios, the correctness of the results is limited with regard to energy and power. Energy conservation is a well-known issue in Fourier-based methods and has remained unresolved until now. For certain applications, it would be desirable to achieve both numerical stability of the method and correct simulation results.



Instability of the z-component of the electric field at the boundary of the waveguide, caused by floating-point arithmetic.

Interpreting the correctness of the results is challenging, since Fourier theory cannot always be reconciled with optical theory. This particularly affects evanescent modes and suggests that it is impossible to achieve correct results for all of the investigated index distributions. A short presentation of the contents and results is available for download [PDF]. The programmes and benchmarks developed for these studies are available as programme libraries and scripts [tgz].

My work on energy conservation analyzes the circumstances under which energy conservation is violated. It also examines the following methods for achieving numerical stability, together with their impact on the correctness of the energy flux:

- approximation of the evanescent boundary,
- modeling waves in evanescent modal spaces,
- accounting for lateral field dependence,
- anti-aliasing filter, average value and Gaussian filters,
- static and adaptive definition of the modal space.
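Two of the listed measures can be illustrated on a sampled 1-D scalar field: a static definition of the modal space and a Gaussian filter. The sketch below is hypothetical. It applies both measures to the angular spectrum, with freely chosen parameters `k_max` and `sigma_k`. It also uses Σ|E|²Δx as a scalar stand-in for the energy flux; for the vector fields of the VWPM, the Poynting vector would be used instead. The relative power change is the quantity to monitor for a violation of energy conservation.

```python
import cmath
import math
from typing import List, Sequence


def to_spectrum(field: Sequence[complex]) -> List[complex]:
    """Normalized plane-wave amplitudes of a periodic field (direct DFT)."""
    n = len(field)
    return [sum(field[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n)) / n
            for m in range(n)]


def from_spectrum(spec: Sequence[complex]) -> List[complex]:
    """Inverse of to_spectrum: superpose the plane waves again."""
    n = len(spec)
    return [sum(spec[m] * cmath.exp(2j * math.pi * m * j / n) for m in range(n))
            for j in range(n)]


def kx_values(n: int, dx: float) -> List[float]:
    """Transverse wave numbers in DFT order."""
    return [2 * math.pi * (m if m < n // 2 else m - n) / (n * dx) for m in range(n)]


def band_limit(field: Sequence[complex], dx: float, k_max: float) -> List[complex]:
    """Static modal space: drop all plane waves with |k_x| > k_max
    (k_max = k0 * n_max removes waves that are evanescent everywhere)."""
    kx = kx_values(len(field), dx)
    spec = to_spectrum(field)
    return from_spectrum([a if abs(k) <= k_max else 0j for a, k in zip(spec, kx)])


def gauss_filter(field: Sequence[complex], dx: float, sigma_k: float) -> List[complex]:
    """Spectral Gaussian filter exp(-k_x^2 / (2 sigma_k^2)): damps high k_x smoothly."""
    kx = kx_values(len(field), dx)
    spec = to_spectrum(field)
    return from_spectrum([a * math.exp(-k * k / (2 * sigma_k ** 2))
                          for a, k in zip(spec, kx)])


def power(field: Sequence[complex], dx: float) -> float:
    """Scalar stand-in for the energy flux through the plane: sum |E|^2 dx."""
    return sum(abs(v) ** 2 for v in field) * dx


def relative_power_change(p_in: float, p_out: float) -> float:
    """Zero if energy is conserved; negative if power was lost."""
    return (p_out - p_in) / p_in
```

Whenever high spatial frequencies are present, both filters remove power by construction. This illustrates the trade-off between numerical stability and a correct energy flux examined in the studies.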

The results show that it is possible in principle to numerically stabilize the energy flux.



Stabilized energy flux for waveguides with an index contrast ranging from 1% to 100%. The waveguide is traversed by a plane wave.

The environment makes it possible to freely combine optimization approaches and to develop one's own algorithms. The included benchmark allows any combination of the measures investigated above for upholding energy conservation.

[1] M. Fertig, "Conservation Law and the VWPM", Presentation, December 2021, unpublished [PDF]

# Activating the FPGA Board

#### to implement and verify digital circuits



The Xilinx KCU-105 Evaluation Board.

In March 2021 I received the Xilinx KCU-105 Evaluation Kit, financed by the MPC group, as well as a desktop computer. Thanks again to the MPC group for its financial support! Commissioning the board had to wait until mid-year because of the ongoing semester.

As a next step, the Universal Memory Architecture (UMA) [w3] is to be synthesized for the FPGA, and the implemented functions are to be verified on the board.

Several bachelor's or master's theses on this topic are planned. The medium-term goal is to implement many of the components of a RISC processor with the Universal Memory Architecture and to verify them on the FPGA. The intention is to make the development approach more flexible and, in the form of the resulting RISC processor, to create a vehicle for research and teaching. The investigation of hardware accelerators integrated into the processor pipeline for computation-intensive and/or networked applications seems especially promising in this regard [PDF].

# Master's theses

[3] N. Weiher, RISC-V Load Store Unit and the Virtual Memory System Design, master's thesis at the Institute of Computer Engineering of Heidelberg University, second reviewer, 04/2021.

[2] T. Bühler, Pipeline Control, Scoreboard and Commit Strategy of a RISC-V Microprocessor, master's thesis at the Institute of Computer Engineering of Heidelberg University, second reviewer, 04/2021.

[1] J. Philipp, Entwicklung einer Instruction-Fetch-Einheit mit Sprungvorhersage für einen RISC-V Prozessor [Development of an instruction fetch unit with branch prediction for a RISC-V processor], master's thesis at the Institute of Computer Engineering of Heidelberg University, second reviewer, 04/2021.

# Practical semester

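[4] A. Büsra, Organisation, Durchführung und Dokumentation von Fahrzeugtests für Pressefahrzeuge [Organization, execution and documentation of vehicle tests for press vehicles], Mercedes-AMG GmbH, 08/2021.

[3] L. Fuchs, Beurteilung von Industrieumrichtern der Leistungsklasse 0.75kW bis 75kW [Assessment of industrial converters in the 0.75 kW to 75 kW power class], Lenze Schmidhauser, 08/2020.

[2] V. Herzog, Ermittlung des Marktpotentials für Logistiklösungen mit Taschensortern [Determining the market potential for logistics solutions with pocket sorters], EMHS GmbH, 08/2021.

[1] M. Kromer, Prüfverfahren zur Schnelltestauswertungen mit Lateral Flow und Disk Reader [Test procedures for evaluating rapid tests with lateral flow and disk reader], DIALUNOX GmbH, 08/2020.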

# Courses

Basic subjects: Digital Engineering [w3], Electrical Engineering 1 [w3], Electrical Engineering Lab [w3]

Elective subjects: Electromagnetic Simulation Lab [w3], ASIC design with SystemVerilog [w3], Photonics Lab [w3], Optics and Photonics [w3]

# Publications, work reports & presentations

M. Fertig, "Conservation Law and the VWPM", Presentation, 12/2021 [pdf]

M. Fertig, "Runtime Optimizations for the Wave Propagation Method", Work Report, 10/2021 [pdf]

M. Fertig, "Runtime Optimizations for the Beam Propagation Method", Work Report, 8/2021 [pdf]



Konstanz, January 2022

Text and layout: Prof. Dr. Matthias Fertig

Translation: Dr. Tullia Giersberg