Received 6 April 2019; revised 1 June 2019; accepted 1 August 2019. Date of publication 8 August 2019; date of current version 11 December 2019.

Digital Object Identifier 10.1109/JPETS.2019.2933250

# Multi-Rate Mixed-Solver for Real-Time Nonlinear Electromagnetic Transient Emulation of AC/DC Networks on FPGA-MPSoC Architecture

# TONG DUAN<sup>®1</sup> (Student Member, IEEE), ZHUOXUAN SHEN<sup>®2</sup> (Member, IEEE), AND VENKATA DINAVAHI<sup>®1</sup> (Senior Member, IEEE)

<sup>1</sup>Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2V4, Canada <sup>2</sup>Department of Electrical Engineering, Tsinghua University, Beijing 100084, China CORRESPONDING AUTHOR: T. DUAN (tduan@ualberta.ca)

This work was supported in part by the Natural Science and Engineering Research Council of Canada (NSERC) and in part by the China Scholarship Council (CSC).

**ABSTRACT** Nonlinear phenomena widely exist in AC/DC power systems, which should be accounted for accurately in real-time electromagnetic transient (EMT) simulation for obtaining precise results for hardwarein-the-loop applications. However, iterative solutions such as the Newton-Raphason method that can precisely obtain the results for highly nonlinear elements, are time consuming and computationally onerous. To fully utilize the time space and optimize hardware computation resources without loss of accuracy, this work proposes a novel multi-rate mixed-solver for AC/DC systems, wherein both iterative and non-iterative solvers with different time-steps are applied to the decomposed subsystems, and the linear solvers are reused within each time-step. The proposed solver and the complete real-time emulation system are implemented on FPGA-MPSoC platform. The real-time results are captured by the oscilloscope and verified with PSCAD/EMTDC and SaberRD for system-level and device-level performance evaluation.

**INDEX TERMS** Electromagnetic transient analysis, field-programmable gate arrays, iterative scheme, multi-processing system-on-chip, multi-rate systems, nonlinear elements, parallel processing, power system simulation, real-time systems.

# I. INTRODUCTION

Nolinearities exist in a wide range of elements in power systems and power electronics circuits, such as the nonlinear v - i characteristic of protective surge arresters, magnetic saturation and hysteresis of transformers, and nonlinear switching phenomena of power converters and power electronics devices. These nonlinearities have significant impact on the system response during normal operation and under faulted conditions. Real-time electromagnetic transient (EMT) simulation is frequently used in hardware-in-the-loop test, which is an efficient and reliable method for verifying the control and power equipment before commissioning [1]–[4]. To precisely emulate the characteristics of the system-wide nonlinear elements, iterative schemes, such as the Newton-Raphson (N-R) method [5], are required for the solution of nonlinear impedance in the conductance matrix

resulting from nodal formulation of system equations. However, the iterative solver can be computational intensive and more time-consuming than the non-iterative linear solver in EMT simulation for large systems.

In modern power systems, the high voltage AC and DC transmission networks co-exist, wherein both may contain nonlinear elements [6]. The hardware emulation of solving nonlinear elements has been evaluated in [7], which provides the nonlinear solver for this work. Since iterative solution of the whole system may involve extremely intensive computation, the complete system can be decomposed into multiple smaller subsystems using the traveling wave latency on transmission lines due to the existence of long-distance distributed transmission lines. The location and the contained nonlinear elements can vary for different subsystems. In fact, there is no need to apply the same step-size for all subsystems [8]

since the size of the simulation time-step is dependent on the changing rate during the transient in a certain subsystem and the accuracy requirement. Therefore, this work proposes a novel multi-rate mixed-solver (MRMS) for the real-time EMT simulation of large-scale AC/DC grids, in which both the iterative solver for nonlinear elements and the conventional non-iterative solver are applied for different subsystems, and the time-step can be different among subsystems to achieve optimum performance for the overall accuracy and computation hardware resource consumption. Different from the variable time-stepping simulation [9] that changes the time-step over the simulation time, the time-step of MRMS is fixed for each subsystem, which is beneficial to the hardware implementation and reuse of linear solvers.

The proposed solver and the complete emulation system are implemented on a hybrid Xilinx<sup>®</sup> UltraScale+ fieldprogrammable gate array (FPGA) [10] and Xilinx<sup>®</sup> Zynq UltraScale+ multi-processing system-on-chip (MPSoC) [11] device. The MPSoC itself integrates the programmable logic (PL) resource and the ARM<sup>®</sup> multi-core processor system (PS) on the same chip. Compared with the solution of using discrete CPU and FPGA on different boards, the singlechip solution provides substantially higher communication bandwidth and coherence between the PS and the PL. The improved overall performance of both sequential and parallel computing by using FPGA-MPSoC platform enables the usage of the iterative method and the detailed models applied in this work. The advantages and features of the proposed multi-rate mixed-solver are the following:

- 1) decomposing the system to apply iterative schemes locally to reduce the computational effort;
- applying multiple time-step sizes for different subsystems according to the accuracy requirements;
- 3) reusing the linear solver among subsystems to reduce the hardware resource consumption.

In this work, an AC/DC system test case composed of two IEEE 39-bus systems and a three-terminal high voltage DC transmission (HVDC) system, is emulated in real-time. The simulation results are captured by the oscilloscope and compared with PSCAD/EMTDC<sup>®</sup> and SaberRD<sup>®</sup>. The rest of the paper is organized as follows: Section II introduces the multi-rate mixed-solver method and implementation principles. Section III presents the modeling scheme of the nonlinear elements applied in this work. Section IV introduces the FPGA and MPSoC devices and the hardware implementation of the case study. Section V presents the simulation results and the validation followed by the conclusion in Section VI.

# II. PROPOSED MULTI-RATE MIXED-SOLVER FOR EMT SIMULATION

In the real-time EMT simulation, the size of simulation timestep is an essential variable that directly determines the timestep dependent parameters and influences the element model selection and computational resource costs. Since the timestep requirements can vary between different subsystems, the multi-rate mixed-solver for real-time EMT simulation is



FIGURE 1. Decomposing a network into separated pure linear and nonlinear network.

proposed to reduce the hardware resource costs and improve the overall accuracy.

## A. MULTI-RATE MIXED-SOLVER

Typically, by applying the KVL and KCL to the network to be solved, the network equation can be derived for timediscretized EMT simulation, which is expressed as follows:

$$\mathbf{Y}\mathbf{v} = i_{eq} \tag{1}$$

where **Y** is the network conductance matrix,  $i_{eq}$  is the equivalent injected current source vector that changes at every timestep, and v is the unknown nodal voltage vector to be solved. For networks that only contain linear elements, **Y** is constant over simulation time. However, if the networks contain non-linear elements, **Y** may change during the simulation process. In such a case, the network can be decomposed into linear and nonlinear networks, in which the linear network only contains linear elements and leave the nonlinear elements as open-circuits, while the nonlinear network only contains nonlinear elements and leave the linear elements as open-circuits, as shown in Fig. 1. The current  $i_c = [i_1, i_2, ..., i_n]^T$  flows from the linear network to the nonlinear network.

The linear network can be solved as:

$$\mathbf{Y}_l v = i_{eq,l} - i_c \tag{2}$$

where  $\mathbf{Y}_l$  and  $i_{eq,l}$  are the linear network conductance matrix and equivalent injected current source vector only considering linear elements.

Nonlinear elements in the nonlinear network can be represented by piecewise linearization [12], N-R, or compensating current source methods [13]. The piecewise linear method uses piecewise linear segments to approximate nonlinear i-vfunctions, wherein the segment of next time-step is determined by the voltage of previous time-step, which may induce the overshoot problem. The N-R method can provide more accurate results by iteratively calculating the conductance matrix within each time-step, which is essential to sensitively respond to system changes. In this work, the N-R method is applied:

$$\mathbf{G}_{nl}(v^{k})v^{k+1} = i_{eq,nl}(v^{k}) + i_{c},$$
(3)

where k is the iteration number,  $v^k = [v_1^k, v_2^k, ..., v_n^k]^T$  is the results of  $k^{th}$  iteration,  $\mathbf{G}_{nl}$  and  $i_{eq,nl}$  are the Jacobian matrix representing conductance and equivalent injected

current source vector only considering nonlinear elements, given by:

$$\mathbf{G}_{nl}(v^{k}) = \begin{bmatrix} \frac{\partial f_{1}(v_{1})}{\partial v_{1}} |_{v^{k}} & \frac{\partial f_{1}(v_{1})}{\partial v_{2}} |_{v^{k}} & \cdots & \frac{\partial f_{1}(v_{1})}{\partial v_{n}} |_{v^{k}} \\ \vdots & \vdots & \vdots \\ \frac{\partial f_{n}(v_{n})}{\partial v_{1}} |_{v^{k}} & \frac{\partial f_{n}(v_{n})}{\partial v_{2}} |_{v^{k}} & \cdots & \frac{\partial f_{n}(v_{n})}{\partial v_{n}} |_{v^{k}} \end{bmatrix}.$$
(4)

where the function  $f_i(v_i)$  represents the nonlinear i - v characteristics for node voltage  $v_i$ . Then the iterative matrix equation for solving the nonlinear network can be derived from (2) and (3) by eliminating  $i_c$ :

$$[\mathbf{Y}_l + \mathbf{G}_{nl}(v^k)]v^{k+1} = i_{eq,l} + i_{eq,nl}(v^k)$$
(5)

Since the iteration times are uncertain and the conductance matrix could be re-factorized, the N-R method could consume more time and resource than piecewise linear method. Thus, it is extremely hard to apply N-R method in large AC/DC systems where the matrix size is large. However, since transmission lines widely exist in AC/DC systems and the line length is usually sufficiently long to guarantee the traveling time is longer than the simulation time-step, the large AC/DC network can be decomposed into *m* subsystems using the traveling-wave line model or frequency-dependent line model (FDLM), as shown below:

$$\begin{bmatrix} \mathbf{Y}_{11} & 0 & \cdots & 0 \\ 0 & \mathbf{Y}_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \mathbf{Y}_{mm} \end{bmatrix} \begin{pmatrix} v_{S1} \\ v_{S2} \\ \vdots \\ v_{Sm} \end{pmatrix} = \begin{pmatrix} i_{eq,S1} \\ i_{eq,S2} \\ \vdots \\ i_{eq,Sm} \end{pmatrix}$$
(6)

where  $\mathbf{Y}_{ii}$  is the conductance matrix of subsystem  $S_i$ ,  $1 \leq S_i$  $i \leq m$ . Assume the first  $m_l$  subsystems are linear networks, and the last  $m_{nl}$  subsystems are nonlinear. These subsystems can be solved concurrently within each time-step. The linear solver only involves one process of solving the matrix equations, whereas the nonlinear solver may need several iterations of such process, which could take several times the latency of processing than that of the linear solver. Therefore, during the processing of nonlinear solver iterations in each time-step, there will be much idle time for linear solver, and as a result, there will be a lot of hardware resource wasted. On the other hand, the transient behaviors of subsystems where transients such as lightning and switching occur need to be adequately modeled and precisely revealed, while the subsystems distant from the transients are only slightly affected by them and thus they do not need very small timestep to capture the system behavior.

Based on the above observations, the multi-rate mixed-solver is proposed: to ensure high accuracy, both the iterative solver for nonlinear elements and the conventional non-iterative linear solver are applied for different subsystems; and to reduce computation resource consumption, the multiple time-step scheme is used and carefully designed for different subsystems. The proposed multi-rate mixedsolver can be formulated as follows:

$$\mathbf{Y}_{ii} v_{Si}^{\Delta t_{l}(i)} = i_{eq,Si}^{\Delta t_{l}(i)}, 1 \leq i \leq m_{l}$$
(7)
$$\mathbf{Y}_{ii} (v_{Si}^{k,\Delta t_{nl}(i)}) v_{Si}^{k+1,\Delta t_{nl}(i)} = i_{eq,Si}^{\Delta t_{nl}(i)} (v_{Si}^{k,\Delta t_{nl}(i)}),$$

$$\mathbf{m}_{i} + 1 \leq i \leq m$$
(8)

where

1

$$\mathbf{Y}_{ii}(v_{Si}^{k,\Delta t_{nl}(i)}) = \mathbf{Y}_{l,i} + \mathbf{G}_{nl,i}(v_{Si}^{k,\Delta t_{nl}(i)})$$
(9)

$$i_{eq,Si}^{\Delta t_{nl}(i)}(v_{Si}^{k,\Delta t_{nl}(i)}) = i_{eq,l,Si}^{\Delta t_{nl}(i)} + i_{eq,nl,Si}^{\Delta t_{nl}(i)}(v_{Si}^{k,\Delta t_{nl}(i)})$$
(10)

$$\Delta t_l(i), \, \Delta t_{nl}(i) \in \{\Delta t_j | 1 \leqslant j \leqslant p\}$$
(11)

Equation (11) denotes that there are *p* different time-steps  $(\Delta t_1, ..., \Delta t_p)$  applied, and subsystem  $S_i$  is assigned time-step  $\Delta t_l(i)$  or  $\Delta t_{nl}(i)$  depending on linear or nonlinear systems. Equations (9) and (10) have the same form as the derived iterative matrix equation (5). After each time-step, the results may need to be exchanged between connected subsystems, thus interpolation is required if the two subsystems use different time-steps. For example, if subsystem  $S_i$  needs the results  $v_{Sj}^{\Delta t_l(j)}$  at simulation time *t* (*t* is exactly integer multiple of  $\Delta t_l(i)$ ) from subsystem  $S_j$  into the data for its own use. In this work, linear interpolation is used:

$$v_{Sj}^{\Delta t_l(j)} \mid_t = v_{Sj}^{\Delta t_l(j)} \mid_{t_1} + \frac{t - t_1}{\Delta t_l(j)} (v_{Sj}^{\Delta t_l(j)} \mid_{t_2} - v_{Sj}^{\Delta t_l(j)} \mid_{t_1}), \quad (12)$$

$$t_1 = rounddown(\frac{t}{\Delta t_l(j)}) \times \Delta t_l(j)$$
(13)

$$t_2 = roundup(\frac{t}{\Delta t_l(j)}) \times \Delta t_l(j)$$
(14)

For the case shown in Fig. 2 as an example, there are two time-steps applied (small time-step  $\Delta t^S$  and large time-step  $\Delta t^L$ ). Within one small time-step, nonlinear subsystem solvers (NSS) perform iterative calculations, while the linear subsystem solver (LSS) with small time-step is reused by subsystems  $S_1^S - S_h^S$  to fully use the time space; and within one large time-step, linear solvers with large time-step are reused



FIGURE 2. Illustration of the multi-rate mixed-solver simulation.

by subsystems  $S_1^L - S_k^L$  and the results at the end of small time-step can be obtained by interpolation between two large time-step results. After each time-step, results of the NSS and LSS with small time-step and the LSS with large time-step are outputted for display respectively and history items are exchanged between adjacent subsystems.

# B. SUGGESTED PROCEDURE FOR TIME-STEP SELECTION

In the proposed multi-rate mixed-solver, the selection of timestep sizes and solver types for different subsystems should be carefully analyzed. Assume there are *m* subsystems **S**  $(S_1, S_2, ..., S_m)$ , and *p* different rates with time-step sizes of  $\Delta \mathbf{T}$  ( $\Delta t_1, \Delta t_2, ..., \Delta t_p$ ) to be selected. Other than the timestep size, reuse of the linear solver for multiple subsystems should also be evaluated. Let  $\mathbf{K} = (K_1, K_2, ..., K_q)$  denotes the used solvers including linear and nonlinear solvers, then the selection can be seen as a mapping  $g : \mathbf{S} \mapsto (\Delta \mathbf{T}, \mathbf{K})$ . The principle of time-step selection is to minimize the total cost including the accuracy and hardware resource consumption while guaranteeing the accuracy requirements, which can be formulated as follows:

$$\min C(g) = \sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{q} [\alpha E(i, j, k) + \beta R(i, j, k)] \cdot g(i, j, k)$$
(15)

s.t. 
$$E(i, j, k) \cdot g(i, j, k) \leq E_{th, i}$$
 (16)

$$\sum_{i=1}^{m} g(i,j,k) \cdot t_k \leqslant \Delta t_j \tag{17}$$

where g(i, j, k) = 1 if  $S_i$  uses  $\Delta t_j$  as time-step, and is calculated by the solver  $K_k$ ; and otherwise g(i, j, k) = 0. E(i, j, k) and R(i, j, k) represent the simulation error and the corresponding resource cost respectively of subsystem  $S_i$  with time-step size of  $\Delta t_i$  using solver  $K_k$ , and they are both nonlinear functions of mapping g. Besides, as indicated in (16), E(i, j, k)g(i, j, k) should not be bigger than the pre-determined threshold error  $E_{th,i}$  of subsystem  $S_i$ , which means if E(i, j) is larger than  $E_{th,i}$  then g(i, j, k) should be equal to zero. Equation (17) indicates that the total calculating time of each solver selected by subsystem  $S_i$  (denoted as  $t_i$ ) should not exceed the selected time-step size, the summation sign means the reuse of solver is taken into consideration.  $\alpha$  and  $\beta$  are scaling factors that unify the two parts of cost. It also should be noted that the number of used solvers q is not a pre-determined constant but a variable of which the optimal value can be solved by (15). However, the equations above are just the principle for time-step selection, because the precise function of E(i, j, k) and R(i, j, k) can only be obtained by experiment and can vary between different systems and different implementation platforms.

## C. DATA-FLOW IN MRMS SIMULATION

The data-flow of the MRMS simulation is illustrated in Fig. 3. In the example, there are two rates with time-step of  $\Delta t_1$  and



FIGURE 3. Data-flow in the proposed multi-rate mixed-solver.

 $\Delta t_2$ , and three solvers named NSS1, LSS1 and LSS2. NSS1 is a nonlinear subsystem solver performing several iterations, after each iteration the voltage v and conductance matrix **G** is updated until  $|(v^k - v^{k-1})/v^k|$  is smaller than the predetermined threshold. LSS1~2 are linear solvers, and LSS1 is reused by subsystems  $S_1$ ,  $S_2$ , and  $S_3$ . For simplicity, Fig. 3 only illustrates the data exchange between subsystem  $S_1$  and  $S_4$ , and the other connections are omitted.

# 1) LINEAR SOLVER REUSE

Linear solver can be reused by subsystems only when they have the same size of input and output as well as the size of conductance matrix. The input contains conductance matrix, equivalent current source, and history values including the local element history items  $hist(t - \Delta t)$  and transmission line history items  $hist(t - \tau)$  maintained by the other side of the transmission line connection.

## 2) HISTORY VALUE UPDATE

History items can be partitioned into two types: local updated and remote updated. Local updated history items are handled inside each subsystem logic, which only computes and stores the values of last time-step. Remote history items are handled by the subsystem of the other side of transmission lines, and the history values should be stored more than just one item depending on how many times  $\Delta t$  is going to be  $\tau$  ( $\tau$  is the transmission delay through transmission lines). Since  $\tau$  is usually not just integer times of  $\Delta t$ , interpolation is introduced to obtain the approximate value of  $hist(t - \tau)$ .

#### 3) STATE CONTROL

All of the states are handled by the state controller, which can remove the state from solver and history item updating and enable the solver to be reused. There are also two types of states to be controlled: solver state and history value state. Solver state indicates which value should be inputted into the solver, and through precise solver state control the solver can be reused within one time-step. History value state indicates which value should be the right history value to be inputted into solver. For example, as shown in Fig. 3, at the time of  $t_1$ , the conductance matrix, equivalent source current and history

value  $hist(t - \Delta t)$  of subsystem  $S_1$  should be inputted into LSS1, while the history value  $hist(t - \tau_{1,4})$  of subsystem  $S_4$  should also be the input.

#### **III. POWER SYSTEM EQUIPMENT MODELS**

The most common types of nonlinear elements in power systems are AC/DC converters, synchronous generators, transformers with saturable inductive parts, and surge arresters with nonlinear resistances. Although transmission lines are strictly not nonlinear, they are also modeled since they are indispensable in a power grid.

# A. AC/DC CONVERTER AND POWER ELECTRONICS DEVICES

The common modular multi-level converter (MMC) structure [14] and its equivalent circuit based on the half-bridge submodule (HBSM) are illustrated in Fig. 4(a) $\sim$ (c). Each HBSM can be equivalenced to a cascaded resistance and voltage source, and the value depends on the switch state and the model applied.

#### 1) SYSTEM-LEVEL MODEL

In this model, the IGBT and diode switching transients are ignored while only the electrical model is presented. The terminal and internal dynamics of an individual HBSM depend on the arm current, capacitor voltage and the IGBTs operating state. The detailed equations for the equivalent circuit can be found in [15], [16].

## 2) DEVICE-LEVEL MODEL

This model focuses on the switching transients during turn-on and turn-off operations, and therefore requires smaller timesteps (typically smaller than  $0.5\mu$ s). The switches within each HBSM are equivalent to voltage sources or current sources depending on turn-on or turn-off operations [17], [18]. And based on the switch equivalence, the equivalent resistance and voltage source value of each HBSM can be derived.

The phase-disposition sinusoidal pulse width modulation (PD-SPWM) method [19] is adopted in this work for attaining desired voltage and power flow characteristics. By proper configurations the system-level controlled variables can be DC voltages or power flows.

## **B. FREQUENCY-DEPENDENT TRANSMISSION LINES**

The commonly used transmission line models in EMT simulation are the traveling-wave line model and frequencydependent line model (FDLM) [20]. The R, L, C, G line parameters in the traveling-wave line model are constant while in FDLM they are functions of frequency and thus can reproduce accurate line behavior over a wider transient range. In this work, the phase-domain FDLM model [21] is applied for all of the lines in the AC/DC grid. The equivalent circuit for transmission line is shown in Fig. 4(d), which divides the system into two correlated parts. The two endpoints of transmission line are interacted through the equivalent current source  $i_k^{hist}$  and  $i_m^{hist}$ , of which the values can be computed



FIGURE 4. Power system elements and equivalent circuit. (a) MMC structure; (b) HBSM structure; (c) System-level equivalent circuit for HBSM; (d) Equivalent circuit of transmission lines; (e) Equivalent circuit for synchronous machine; (f)  $\Psi - I$  characteristic and equivalent circuit of transformer; (g) V - I characteristic and equivalent circuit of surge arrester.

step-by-step. The detailed model representation and hardware emulation can be found in [22].

## C. SYNCHRONOUS MACHINES

The synchronous machine can work as Thévenin voltage source [23], [24] or Norton current source [25], [26] depending on the discretization method. To compare with PSCAD/EMTDC<sup>®</sup>, in this work, the Norton current source model for synchronous machine is utilized. The equivalent circuit and operating process for Norton current source calculation is shown in Fig 4(e). Since the current source



FIGURE 5. Topology of the AC/DC grid test case.

representation uses the terminal voltages at  $t - \Delta t$  to calculate the currents to be injected at time t, a terminating characteristic impedance  $r_0$  is always used to terminate the machine to the network. The exciter control system is attached with the machine to provide a feedback for the field voltage and make the machine model work stably.

# D. TRANSFORMERS AND SURGE ARRESTERS

In this paper, the classical compensation method [27] is used for simplicity while guaranteeing accuracy. Fig. 4(f) shows a typical  $\Psi - I$  curve for a modern high-voltage transformer and the compensating process for a two winding transformer, in which the compensating current source is determined by the  $\Psi - I$  curve function [26]. Since the nonlinear function is complex to implement in hardware, it is usually linearized into segments to approximately represent the curve when the time-step size is limited. Fig. 4(g) shows the V - Icharacteristic of the metal-oxide surge arrester [28] that is commonly used in modern power systems. The voltage region is divided into linear segments, resulting in an equivalent circuit composed of a voltage-dependent resistor and current source. To accurately calculate the location of segment within each time-step, the combination of N-R method and piecewise linear method is used in this paper. It is essentially an iterative method, but the nonlinear function is divided into linear segments instead of a continuous curve, which can achieve high accuracy while simplifying the computation process.

# IV. COMPREHENSIVE REAL-TIME EMULATOR IMPLEMENTATION

An interfaced FPGA-MPSoC platform has been developed for the real-time simulation of the test system. The FPGA-MPSoC hybrid hardware platform enables the integration of

188

FPGA and CPU resources and fast data exchange between boards via proper communication protocols.

# A. TEST SYSTEM

For thorough analysis of the proposed MRMS solver on realtime FPGA-MPSoC emulator, an AC/DC grid composed of two IEEE 39-bus systems [29] and a three-terminal HVDC system was chosen as the circuit topology, as shown in Fig. 5. In each IEEE 39-bus system, 10 generators, 12 transformers, 19 loads and 34 transmission lines are deployed, and the two IEEE 39-bus systems are connected by three AC/DC MMC converter stations that are connected with each other via 3 DC transmission lines. The control of converter C1 is used for DC voltage regulation, while in the converters C2/C3 the active power flow is chosen as the controlled variable. To protect generators, transformers, cables and other devices from overvoltages caused by lightning, short circuit, switching, etc, 6 surge arresters are also installed in the system.

# **B. FPGA-MPSoC IMPLEMENTATION**

The MPSoC ZCU102 board [11] used in this paper is featured with a quad-core ARM<sup>®</sup> Cortex-A53, dual-core Cortex-R5 real-time processors, and a Mail-400 MP2 graphics processing unit (GPU) based on programmable logic fabric. These processors (PS) communicate with the programmable logic (PL) using high-bandwidth Advanced eXtensible Interface (AXI) channels, enabling low-latency data exchange. The hybrid Virtex UltraScale+ FPGA VCU118 board [10] and MPSoC ZCU102 board platform enable the usage of the iterative method and the detailed models applied.

To extend the resource capacity for simulating the large system, the multi-board solution is applied in this work, as shown in Fig. 6, there are totally three FPGA/MPSoC boards (two VCU118 boards and one ZCU102 board) used



FIGURE 6. Hardware emulation of the case study on two FPGA boards and one MPSoC board.

to run the study case. ZCU102 board is the master board connecting with two VCU118 boards and sending instructions to control the behavior of the other two boards. The two VCU118 boards are slave boards, which start or stop to perform simulation under the instruction of the master board. The master MPSoC board has multiple processors which can be used to run sequential computing and state control, whereas the two slave boards have more hardware resources and more communication transceivers which enable larger subsystems to be simulated and faster data exchange between each other.

#### 1) SUBSYSTEM ALLOCATION

There are two IEEE 39-bus systems to be simulated, and each of them are allocated at one single VCU118 FPGA board for simplicity, which also reduces the amount of data to be exchanged between boards. The other three MMC converters are simulated in the ZCU102 MPSoC board to make full use of the APU resources for complex system-level control algorithms. The subsystem allocation for each board is shown in Fig. 5, which is determined by the specific circuit. For example, since almost every generator is connected with a transformer, it is beneficial for solver reuse if every generator-transformer pair is allocated to different individual subsystems, as marked as subsystem  $S_1 \sim S_6$ .

Due to the transmission lines between converters, the three MMC modules are also divided into three subsystems, each of which is composed of equivalent circuit calculation, value-level control and system-level control. Since the system-level control of MMC converter is sequentially calculated and may consume many hardware resources to execute, it is more efficient to implement the control logic including the inner loop and outer loop control in the PS part of MPSoC board. The computation of value-level control is more intensive than system-level control but the tasks can be well paralleled due to the independence of each SM, and thus are performed in the PL part.

## 2) MULTI-RATE MIXED-SOLVER

To perform the EMT simulation on FPGA board, the main complexity is contributed by solving the matrix equation. Firstly, the reuse of matrix solver is discussed. The conductance matrix of most subsystems can be divided into smaller matrices, for example, subsystem  $S_{11}$  contains 8 buses, which will generate a 24×24 matrix to be solved. However, considering the uncoupling function of the transmission line model, the 24×24 matrix can be divided into eight separated  $3 \times 3$  matrices and they can be solved by reuse of a  $3 \times 3$  linear solver (except for buses with surge arresters that require iterative solvers). The subsystems  $S_1 \sim S_6$  composed of a generator and transformer can not be divided, because the two sides of transformer are coupled and thus at least a  $6 \times 6$  matrix is generated. Therefore, the  $6 \times 6$  linear solver can be also reused among these subsystems. Subsystem  $S_7$  and  $S_8$  contain the largest matrix (9×9 and 12×12 respectively) and cannot share the solver with other subsystems, thus consume the longest time to finish calculation within each time-step.

Secondly, multi-rate with four time-step sizes of  $0.2\mu s$ ,  $5\mu$ s,  $10\mu$ s, and  $20\mu$ s is applied among subsystems. Based on the principles of time-step selection, the time-step of the subsystems where transients happen should firstly conform to (17), and then should be as small as possible to meet the accuracy requirements (16) since the hardware resource of VCU118 board is affluent. Therefore, the time-step of  $10\mu s$ is widely used in subsystems of the IEEE 39-bus, because the processing time of most subsystems is just less than 10 $\mu$ s. Subsystem  $S_1 \sim S_6$  can reuse the 6×6 solver between two subsystems to fully occupy the time space. The reuse of  $3 \times 3$  solver is also adopted in subsystem  $S_{11}$  to make full use of the time space when the iterative solver is dealing with buses containing surge arresters. The time-step of subsystem  $S_8$  and subsystem  $S_{12} \sim S_{14}$  (MMC converters) 20 $\mu$ s is set at  $20\mu s$ , by considering the large processing delay of the  $12 \times 12$  matrix solver, the complex control of MMC and the communication latency between boards. The time-step of  $5\mu s$ is applied only in subsystem  $S_{10}$  just for a demonstration of multi-rate, because subsystem  $S_{10}$  has the smallest scale and matrix size. The time-step of  $0.2\mu$ s is adopted for device-level simulation, and considering the limited hardware resources in the MPSoC board only the first SM of MMC C2 is simulated in device-level.

## 3) DATA EXCHANGE

The history values of the two ends of a transmission line are exchanged after each time-step and stored in a FIFO, and the FIFO shift based on the common time-step size, which is the minimum step-size for the system  $(5\mu s)$ . For example, if the time-step of a subsystem is  $10\mu s$ , then the shifted-register will shift 2 to store the data, which makes the same location of the memory in different subsystems store the data generated simultaneously.

If the two ends of a line are computed in different boards, the communication between interfaced boards should be designed. To enable high-speed communication between the three boards, lightweight communication protocol should be used instead of the common TCP/IP protocol that involves too much time cost during connection establishment. Xilinx<sup>®</sup> provides a scalable link-layer communication protocol, Aurora [30], which is open and supported by different type of transceivers such as GTX, GTY, and GTH transceivers. The Xilinx® aurora core can automatically initialize and maintain the channel, and the AXI-4 user interface enables users to conveniently generate and receive data without considering the transmission details and handling transmission states. The communication part of the implementation is shown in Fig. 6, the three boards are connected with each other via 2 lanes. After channel establishment, the aurora core reads data from the RAM and a 64b AXI-4 stream based data is generated by combining the 32b user data and 32b address together. The 32b address is used to identify the user data and put it into the right address of the RAM after receiving.

# V. RESULTS AND VERIFICATION

The example test case described in Section IV is emulated on the three FPGA/MPSoC boards and the results are compared with PSCAD/EMTDC<sup>®</sup> and SaberRD<sup>®</sup> to show the effectiveness of the proposed multi-rate mixed-solver. The APU cores of MPSoC board run at 1.2GHz, while the clock frequency of FPGA boards is set at 100 MHz.

# A. HARDWARE RESOURCE UTILIZATION AND LATENCY

According to the hardware implementation details and subsystem allocation described above, the system-level hardware resource consumption and time-step size are presented in Table 1, in which VCU118-1 represents the version that does not reuse the mixed-solver, and VCU118-2 represents the optimized cost by reusing the linear solvers. Since the two VCU118 boards have nearly the same cost by simulating the same size of circuit topology (IEEE 39-bus system), Table 1 only shows one of them. Four representative types of resources are recorded: lookup table (LUT), flipflops (FF), block RAM (BRAM), and digital signal processing unit (DSP). Through reuse of solver, the logic resource (mainly refers to LUT) cost can be reduced by about 11.3%, and the computing resource (mainly refers to DSP) cost can be reduced by about 13.1%.

The processing latency of different solvers and functions are recorded in Table 2, which indicates that processing latency varies between different subsystems and different solvers. For example, subsystem  $S_{11}$  contains nonlinear surge arresters and subsystem  $S_{10}$  only contains linear elements, although the matrix equations they need to solve are both  $3\times3$ , the average latency has a big difference because the iterative matrix solver consumes about five times latency of the non-iterative matrix solver averagely. Subsystem  $S_8$ consumes the most simulation time because it has the largest matrix ( $12\times12$ ) to solve. It should be noticed that since the

TABLE 1. Hardware resource utilization of the case study.

| Resource  | VCU118-1          | VCU118-2        | ZCU102          |
|-----------|-------------------|-----------------|-----------------|
| LUT       | 867,134 (73.3%)   | 769,148 (65.1%) | 253,798 (92.6%) |
| FF        | 1,027,012 (43.4%) | 914,041 (38.7%) | 452,780 (82.6%) |
| BRAM      | 640 (29.6%)       | 624 (28.9%)     | 416 (45.6%)     |
| DSP       | 6,748 (98.6%)     | 5,864 (85.7%)   | 2,490 (98.8%)   |
| Time-step | 5/10/20µs         | 5/10/20µs       | 0.2/10/20µs     |

TABLE 2. Processing latency of communication and subsystems.

| Subsystem/Element             | Latency    | Subsystem/Element   | Latency        |
|-------------------------------|------------|---------------------|----------------|
| Subsystem 1~6                 | 5.1~5.6 μs | Subsystem 7         | 7.6 μs         |
| Subsystem 8                   | 11.2 μs    | Subsystem 9         | 4.2 μs         |
| Subsystem 10                  | 3.2 µs     | Subsystem 11        | 9.8 μs         |
| Subsystem 12~14               | 14.2µs     | Aurora              | 0.95 μs        |
| Trans. Line                   | 2.35 µs    | Transformer         | 2.15 µs        |
| Generator                     | 1.05 µs    | Surge Arrester      | 4.65 μs        |
| $3 \times 3$ iterative solver | 4.1 μs     | 3×3 linear solver   | $0.71 \ \mu s$ |
| $5 \times 5$ linear solver    | 1.67 μs    | 6×6 linear solver   | 1.81 μs        |
| 9×9 linear solver             | 4.47 μs    | 12×12 linear solver | 7.75 μs        |

hardware-based calculation is running in parallel, the latency is not just the simple addition of processing latency of each element.

The latency of Aurora communication is 0.95  $\mu$ s for data transmission of fifteen 32bit single floating-point data, which includes the transmission latency and the latency of writing and reading data to/from the RAM. Since the three boards use the same clock frequency, the communication latency is almost the same although they use different types of transceivers. The transmission time through fiber is also estimated by end-to-end transmission latency test, which is less than 3 clocks thus is negligible compared to the Aurora core processing latency.

### **B. REAL-TIME EMULATION RESULTS**

To simulate the non-linear behavior of the AC/DC system, the lightning surge at Phase C of AC transmission line 4-14 (between bus 4 and bus 14) in both 39-bus systems and ground fault of both poles at DC line 40-42 (between bus 40 and bus 42) are chosen as the transient test. The results are evaluated by the proposed emulator and PSCAD/EMTDC<sup>®</sup>, in which PSCAD/EMTDC<sup>®</sup> uses constant time-step of  $10\mu$ s and  $20\mu$ s respectively while the proposed emulator uses multiple time-step of  $0.2/5/10/20\mu$ s.

Firstly, the steady state operation results are recorded. As representatives, the DC voltage and power flow of the three converters are used to show the power flow between the two 39-bus systems. As shown in Fig. 7(a), it takes about 0.2sfor capacitor charging before the DC voltages reach at steadystate of 400kV. The results of the proposed emulator marked as MRMS match well with PSCAD/EMTDC® results with  $20\mu$ s, and the difference is less than 3%, which is relatively small considering the large scale of topology and number of nonlinear elements. The power flow change operation is shown in Fig. 7(b), in which the power flow from C1 to C2 changes at simulation time of 2.2s, and the power flow from C1 to C3 changes at 3.0s. The simulation results of MRMS are almost the same as PSCAD/EMTDC<sup>®</sup> at steadystate, but there are some differences during power flow changing operation, because the values outputted by outer and inner loop control will change a lot during power flow change and thus will generate a bigger difference.

The device-level switching transient at the first SM of MMC C2 is also simulated with time-step of  $0.2\mu s$ , as shown in Fig. 8, the results are compared with SaberRD<sup>®</sup>. Since the voltage across the switch  $v_{ce,on}$  and conducting current  $i_{c,off}$  is based on the experimental curve during turn-off and turn-on transient respectively, the corresponding switching results can match with SaberRD<sup>®</sup> well, but the counterpart  $i_{c,on}$  and  $v_{ce,off}$  have some differences (less than 2%) with SaberRD<sup>®</sup> after the gate voltage ( $v_{ge}$ ) changes because they are computed by using the equivalent circuit shown in Fig. 4(c).

Secondly, the transient of lightning surge current is simulated to show the nonlinear behavior of surge arresters and transformers. The standard  $10/350\mu s$  lightning surge



FIGURE 7. Steady-state operation of converters. (a) DC voltage at 3-terminals. (b) Power flow change operation of multi-converters.



FIGURE 8. Device-level switching transients of IGBT1-SM1 in MMC C2. (a)(b) Turn-on transients of SaberRD and MRMS. (c)(d) Turn-off transients of SaberRD and MRMS.

current [31] is applied in this work, given as:

$$I_{LS}(t) = CI_m(t/\tau_1)^k e^{\frac{-t}{\tau_2}} / [1 + (t/\tau_1)^k]$$
(18)

where the coefficient C = 1.075, k = 10; the time constant  $\tau_1 = 19\mu$ s,  $\tau_2 = 485\mu$ s; the maximum value of the surge current  $I_m = 10$ kA. In this simulation, the lightning surge current is applied at exactly 3s of the simulation, and the results without and with surge arresters are shown in Fig. 9(a)~(c) and (d)





FIGURE 9. Lightning transient results. (a)~(c) PSCAD/EMTDC<sup>®</sup> results with 10,  $20\mu$ s time-step and MRMS results without surge arresters deployed. (d) Results with surge arresters installed.

respectively. The peak value and transient details of surge voltage and current without surge arresters installed indicate that changing the time-step value will significantly impact the accuracy. MRMS uses mixed time-step of  $0.2/5/10/20\mu s$ , and



FIGURE 10. Ground fault transient results for MMCs. (a) DC voltage at 3-terminals. (b) Power flow of the three MMCs.

the results are more reasonable than the PSCAD/EMTDC<sup>®</sup> results with 20 $\mu$ s time-step and are close to that of 10 $\mu$ s. From Fig. 9(d) it can be observed that the MRMS results can even show more details than PSCAD/EMTDC<sup>®</sup> with 10 $\mu$ s time-step while in the case without surge arresters installed the MRMS results are between the PSCAD/EMTDC<sup>®</sup> results with 10/20 $\mu$ s time-steps, which indicate that the transient results of MRMS with surge arresters deployed are more reasonable than the results of PSCAD/EMTDC<sup>®</sup>. That is because PSCAD/EMTDC<sup>®</sup> uses the piecewise linear method to deal with the nonlinearity of surge arresters while MRMS uses the iterative solver to solve the nonlinear elements for accuracy, although the nonlinear function is simplified with piecewise linear segments.

Finally, the ground fault transient at 8*s* on the DC link between bus 40 and bus 42 is simulated and the results are shown in Fig. 10. It is observed that the DC voltages of the two links are both impacted by the fault, but converter C1 is impacted more seriously because C1 is not powercontrolled but voltage-controlled. The results are reasonable to prove that the HBSMs used to construct the MMCs are not able to block the fault current, because the diodes in each SM can conduct current even when the gate signals are all pulled down, the blocked state restricts current flow only in one of the two potential directions. However, by turning off all the IGBTs when arm currents exceed the predetermined threshold, the MMCs can recover to normal state after 1*s*.

#### **VI. CONCLUSION**

For the real-time EMT simulation of large AC/DC networks that contain various nonlinear elements, iterative solvers are requisite to acquire precise transient results, which could also consume more hardware computational resources. To optimize the accuracy as well as the resource cost, a novel multi-rate mixed-solver is proposed. In the proposed solver, the power system is decomposed into several subsystems, in which multiple time-steps are applied for different subsystems according to the accuracy requirements; by applying the iterative schemes locally and reusing the linear solver among subsystems, the computational resource consumption is reduced. The AC/DC network composed of two IEEE 39-bus systems and three MMC converters is emulated in real-time on the hybrid FPGA-MPSoC platform. Through the solver reuse, the LUT cost can be reduced by about 11.3%, and the DSP cost can be reduced by about 13.1%. The processing delay of different subsystems and solvers is evaluated, which shows the practicality of multi-rating with  $0.2/5/10/20\mu$ s time-steps applied. The real-time emulation results are captured and compared with PSCAD/EMTDC® and SaberRD®, which demonstrates the effectiveness of the proposed solver. The multi-rate mixed-solver can be used for large AC/DC system simulations that consist of various types of elements with requirements of high accuracy and optimum computation resource consumption. In the future work, the emulation system can be further enlarged with more complicated nonlinear models [32]-[35] for conventional power equipment as well as power electronic apparatus. More types of solvers which are suitable for the subsystem containing specific elements, and more time-step ratings could also be taken into consideration for the system real-time EMT simulation.

#### **VII. APPENDIX**

Parameters of the test system: Base values: 100MVA, 230kV, 60Hz; Synchronous generator and loads: the same as [29]; Transformers: 230kV/230kV, leakage inductance 0.2pu, copper loss 0.004pu, knee voltage (saturation) 1.17pu, magnetizing current (saturation) 2%; MMC:  $V_{dc} = 400$ kV,  $C_{sm} = 2.5$ mF,  $f_C = 2000$ Hz, N = 16,  $L_{arm} = 0.0189$ H; Device-level MMC:  $t_{d,on} = 0.5\mu s$ ,  $t_r = 0.55\mu s$ ,  $t_{d,off} = 4.3\mu s$ ,  $t_f = 0.4\mu s$ ; Surge arrester V-I piecewise points: (0kV, 0kA), (263.82kV, 0.001kA), (317.19kV, 0.1kA), (362.42kV, 2.8kA), (429.89kV, 200kA); Transmission lines: the length of lines in the IEEE 39-bus system is same as [29], and the length of DC lines is 150km, and all of the transmission line parameters are derived using the same tower geometry as in [8].

#### REFERENCES

- G. G. Parma and V. Dinavahi, "Real-time digital hardware simulation of power electronics and drives," *IEEE Trans. Power Del.*, vol. 22, no. 2, pp. 1235–1246, Apr. 2007.
- [2] B. Lu, X. Wu, H. Figueroa, and A. Monti, "A low-cost real-time hardwarein-the-loop testing approach of power electronics controls," *IEEE Trans. Ind. Electron.*, vol. 54, no. 2, pp. 919–931, Apr. 2007.

- [3] M. O. Faruque and V. Dinavahi, "Hardware-in-the-loop simulation of power electronic systems using adaptive discretization," *IEEE Trans. Ind. Electron.*, vol. 57, no. 4, pp. 1146–1158, Apr. 2010.
- [4] M. S. Vekic, S. U. Grabic, D. P. Majstorovic, I. L. Celanovic, and N. L. Celanovic, "Ultralow latency HIL platform for rapid development of complex power electronics systems," *IEEE Trans. Power Electron.*, vol. 27, no. 11, pp. 4436–4444, Nov. 2012.
- [5] L. O. Chua and P. Lin, Computer-Aided Analysis of Electronic Circuits: Algorithms and Computational Techniques. Englewood Cliffs, NJ, USA: Prentice-Hall, 1975, pp. 131–140.
- [6] B. Wu, High–Power Converters and AC Drives. Hoboken, NJ, USA: Wiley, 2006, pp. 3–13.
- [7] Y. Chen and V. Dinavahi, "An iterative real-time nonlinear electromagnetic transient solver on FPGA," *IEEE Trans. Ind. Electron.*, vol. 58, no. 6, pp. 2547–2555, Jun. 2011.
- [8] Y. Chen and V. Dinavahi, "Multi-FPGA digital hardware design for detailed large-scale real-time electromagnetic transient simulation of power systems," *IET Generat., Transmiss., Distrib.*, vol. 7, no. 5, pp. 451–463, May 2013.
- [9] Z. Shen and V. Dinavahi, "Dynamic variable time-stepping schemes for real-time FPGA-based nonlinear electromagnetic transient emulation," *IEEE Trans. Ind. Electron.*, vol. 64, no. 5, pp. 4006–4016, May 2017.
- [10] VCU118 Evaluation Board User Guide (UG1224), Xilinx, San Jose, CA, USA, Oct. 2018.
- [11] ZCU102 Evaluation Board User Guide (UG1182), Xilinx, San Jose, CA, USA, Oct. 2018.
- [12] L. Chua, "Efficient computer algorithms for piecewise-linear analysis of resistive nonlinear networks," *IEEE Trans. Circuit Theory*, vol. 18, no. 1, pp. 73–85, Jan. 1971.
- [13] H. W. Dommel, "Nonlinear and time-varying elements in digital simulation of electromagnetic transients," *IEEE Trans. Power App. Syst.*, vol. PAS-90, no. 6, pp. 2561–2567, Nov. 1971.
- [14] A. Lesnicar and R. Marquardt, "An innovative modular multilevel converter topology suitable for a wide power range," in *Proc. IEEE Bologna Power Tech Conf.*, Bologna, Italy, Jun. 2003, pp. 1–6.
- [15] Q. Song, W. Liu, X. Li, H. Rao, S. Xu, and L. Li, "A steady-state analysis method for a modular multilevel converter," *IEEE Trans. Power Electron.*, vol. 28, no. 8, pp. 3702–3713, Aug. 2013.
- [16] Z. Shen and V. Dinavahi, "Real-time device-level transient electrothermal model for modular multilevel converter on FPGA," *IEEE Trans. Power Electron.*, vol. 31, no. 9, pp. 6155–6168, Sep. 2016.
- [17] Z. Huang and V. Dinavahi, "A fast and stable method for modeling generalized nonlinearities in power electronic circuit simulation and its real-time implementation," *IEEE Trans. Power Electron.*, vol. 34, no. 4, pp. 3124–3138, Apr. 2019. doi: 10.1109/TPEL.2018.2851570.
- [18] C. Liu, R. Ma, H. Bai, Z. Li, F. Gechter, and F. Gao, "FPGA-based realtime simulation of high-power electronic system with nonlinear IGBT characteristics," *IEEE J. Emerg. Sel. Topics Power Electron.*, vol. 7, no. 1, pp. 41–51, Mar. 2019.
- [19] G. Carrara, S. Gardella, M. Marchesoni, R. Salutari, and G. Sciutto, "A new multilevel PWM method: A theoretical analysis," *IEEE Trans. Power Electron.*, vol. 7, no. 3, pp. 497–505, Jul. 1992.
- [20] H. W. Dommel, *EMTP Theory Book*. Portland, OR, USA: Bonneville Power Administration, 1984.
- [21] B. Gustavsen and A. Semlyen, "Simulation of transmission line transients using vector fitting and modal decomposition," *IEEE Trans. Power Del.*, vol. 13, no. 2, pp. 605–614, Apr. 1998.
- [22] Y. Chen and V. Dinavahi, "Digital hardware emulation of universal machine and universal line models for real-time electromagnetic transient simulation," *IEEE Trans. Ind. Electron.*, vol. 59, no. 2, pp. 1300–1309, Feb. 2012.
- [23] S. D. Pekarek, O. Wasynczuk, and H. J. Hegner, "An efficient and accurate model for the simulation and analysis of synchronous machine/converter systems," *IEEE Trans. Energy Convers.*, vol. 13, no. 1, pp. 42–48, Mar. 1998.
- [24] L. Wang and J. Jatskevich, "A voltage-behind-reactance synchronous machine model for the EMTP-type solution," *IEEE Trans. Power Syst.*, vol. 21, no. 4, pp. 1539–1549, Nov. 2006.
- [25] V. Brandwajn and H. W. Dommel, "Interfacing generator models with an electromagnetic transients program," presented at the IEEE Power Eng. Soc. Summer Meeting, Portland, OR, USA, Jul. 1976, Art. no. A76359-0.
- [26] User's Guide: A Comprehensive Resource for EMTDC, Version 4.7, Manitoba HVDC Res. Centre, Winnipeg, Manitoba, MB, Canada, 2010.

# **IEEE Power and Energy Technology Systems Journal**

- [27] J. Liu and V. Dinavahi, "A real-time nonlinear hysteretic power transformer transient model on FPGA," *IEEE Trans. Ind. Electron.*, vol. 61, no. 7, pp. 3587–3597, Jul. 2014.
- [28] IEEE Guide for the Application of Metal-Oxide Surge Arresters for Alternating-Current Systems, IEEE Standard C62.22-2009, Jul. 2009, pp. 1–142.
- [29] PSCAD TM IEEE 39 Bus System, Revision 1, Manitoba HVDC Res. Centre, Winnipeg, MB, Canada, 2010.
- [30] Aurora 64B/66B v11.2: LogiCORE IP Product Guide, Xilinx, San Jose, CA, USA, Apr. 2018.
- [31] Protection Against Lightning Electromagnetic Impulse—Part I: General Principles, IEC Standard 61312-I, Feb. 1995.
- [32] N. R. Tavana and V. Dinavahi, "Real-time nonlinear magnetic equivalent circuit model of induction machine on fpga for hardware-in-the-loop simulation," *IEEE Trans. Energy Convers.*, vol. 31, no. 2, pp. 520–530, Jun. 2016.
- [33] J. Liu and V. Dinavahi, "Detailed magnetic equivalent circuit based real-time nonlinear power transformer model on FPGA for electromagnetic transient studies," *IEEE Trans. Ind. Electron.*, vol. 63, no. 2, pp. 1191–1202, Feb. 2016.
- [34] W. Wang, Z. Shen, and V. Dinavahi, "Physics-based device-level power electronic circuit hardware emulation on FPGA," *IEEE Trans. Ind. Informat.*, vol. 10, no. 4, pp. 2166–2179, Nov. 2014.
- [35] N. Lin and V. Dinavahi, "Dynamic electro-magnetic-thermal modeling of MMC-based DC–DC converter for real-time simulation of MTDC grid," *IEEE Trans. Power Del.*, vol. 33, no. 3, pp. 1337–1347, Jun. 2018.



**TONG DUAN** (S'18) received the B.Eng. degree in electronic engineering from Tsinghua University, Beijing, China, in 2013. He is currently pursuing the Ph.D. degree in electrical and computer engineering with the University of Alberta, Edmonton, AB, Canada. His research interests include real-time simulation of power systems, power electronics, and field-programmable gate arrays.



**ZHUOXUAN SHEN** (S'14–M'18) received the B.Eng. degree in electrical engineering from Jiangsu University, Zhenjiang, China, in 2013, and the Ph.D. degree in electrical and computer engineering from the University of Alberta, Edmonton, AB, Canada, in 2018. He is currently a Post-Doctoral Researcher with Tsinghua University. His research interests include real-time simulation of power systems and power electronics systems.



**VENKATA DINAVAHI** (S'94–M'00–SM'08) received the B.Eng. degree in electrical engineering from the Visveswaraya National Institute of Technology (VNIT), Nagpur, India, in 1993, the M.Tech. degree in electrical engineering from IIT Kanpur, Kanpur, India, in 1996, and the Ph.D. degree in electrical and computer engineering from the University of Toronto, Toronto, ON, Canada, in 2000. He is currently a Professor with the Department of Electrical and Computer

Engineering, University of Alberta, Edmonton, AB, Canada. His research interests include real-time simulation of power systems and power electronic systems, electromagnetic transients, device-level modeling, large-scale systems, and parallel and distributed computing.