# Variable Time-Stepping Modular Multilevel Converter Model for Fast and Parallel Transient Simulation of Multiterminal DC Grid

Ning Lin<sup>®</sup>, Student Member, IEEE, and Venkata Dinavahi<sup>®</sup>, Senior Member, IEEE

Abstract—The efficiency of multiterminal dc (MTDC) grid simulation decreases with an expansion of its scale and the inclusion of accurate component models. Thus, the variable time-stepping scheme is proposed in this paper to expedite the electromagnetic transient computation. A number of criteria are proposed to evaluate the time-step and regulate it dynamically during simulation. Meanwhile, as the accuracy of results is heavily reliant on the switch model in the modular multilevel converter, the nonlinear behavioral model with a greater accuracy is proposed in addition to the classic ideal model, and their corresponding variable time-stepping schemes are analyzed. Circuit partitioning is effective in accelerating the MTDC grid simulation via fine-grained separation of nonlinearities. A subsequent large number of identical circuits enabled a massively parallel implementation on the graphics processing unit, which achieved a remarkable speedup over the CPU-based implementation. The inclusion of variable time-stepping schemes eventually makes the simulation of MTDC grid with highly detailed nonlinear switch models feasible. The results are validated by commercial device-level and system-level simulation tools.

Index Terms—Electromagnetic transients, graphics processing unit (GPU), multiterminal direct-current (MTDC), modular multilevel converter (MMC), nonlinear IGBT model, parallel processing, power system simulation, variable time-stepping (VTS).

# I. INTRODUCTION

T HE modular multilevel converter (MMC) has witnessed a spike in its application in high-voltage direct current (HVdc) transmission, and in the multiterminal dc (MTdc) system which is formed by connecting several HVdc terminals [1], [2]. Electromagnetic transient (EMT) simulation is a major approach for studying the system's performance under various operation conditions [3], [4]; however, the solution becomes very time-consuming when the system scale expands and more accurate models are involved.

Digital Object Identifier 10.1109/TIE.2018.2880671

The power semiconductor switch model is crucial to the MMC accuracy, and several types have been identified [5]. The two-state switch model (TSSM) which is deemed as the detailed model for the insulated-gate bipolar transistor (IGBT) and diode are currently prevalent in EMT tools, but it is not often seen in its original form for a high-level MMC simulation, as it still brings a heavy computational burden to CPUs. Thus, model reductions are necessary, such as the detailed equivalent model which merges all submodules (SMs) in an arm[6], the average value model [7] and its variants [8], [9]; however, under some scenarios such as dc line faults, their accuracy is compromised due to either neglecting the freewheeling diode, or a lack of proper impedance for the ideal switch model.

Meanwhile, the power loss and efficiency of a high power converter, as well as the semiconductor switch stresses induced by voltage, current, and thermal diffusion, are of significant concern in the converter design process [10]–[13]. The EMT simulation of power electronics containing detailed switch models is an effective approach to acquire all of these information under various test conditions without scaling down the converter. Nevertheless, due to the nonlinearity and the high-order nature, devicelevel IGBT/diode models are not considered by current EMT simulators for system-level study, since they require Newton– Raphson (N-R) iterations conducted under a small time-step for convergent results, making it too burdensome for the CPU to calculate even a small time span. Moreover, numerical divergence is frequently encountered when the number of nonlinear devices rises, resulting in early simulation termination.

It is noticed that a fixed time-step (FTS) is not always mandatory for the EMT simulation as at distinct simulation stages, the requirement on the density of results is different. The idea that a sufficiently small time-step is used when the concerned events which trigger severe current or voltage variations occur, and that it should be enlarged when the impact mitigates avoids unnecessary computation and consequently accelerates the simulation. This variable time-stepping (VTS) scheme was successfully applied on a few occasions where the system is relatively small [14]–[16], but its application to a larger nonlinear system such as the MTDC grid with device-level specifics has yet to be explored.

On the other hand, the graphics processing unit (GPU) has begun to draw attention in large-scale ac power system analysis and transient simulation where a high computational speedup is attained. The massive parallelism could potentially help achieve

0278-0046 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Manuscript received July 2, 2018; revised September 24, 2018; accepted November 1, 2018. Date of publication November 15, 2018; date of current version April 30, 2019. This work was supported by the Natural Science and Engineering Research Council of Canada. *(Corresponding author: Ning Lin.)* 

The authors are with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2V4, Canada (e-mail: ning3@ualberta.ca; dinavahi@ualberta.ca).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

a faster simulation speed for the MTDC grid as well. Unlike the highly regular ac system, which has dozens or hundreds of buses, the dc grid has a much smaller number of system-level equipment. Thus, a substantial quantity of identical circuit units is created after partitioning MMC SMs from its arms using coupled voltage-current source pairs.

Therefore, in this paper, VTS schemes are applied to achieve fast simulation of MMC-based MTDC grid on both the CPU and GPU. For the nonlinear behavioral model (NBM), a combination of GPU and the VTS schemes makes system-level simulation involving device-level details viable.

This paper is organized as follows. The proposed VTS schemes are classified in Section II. Section III introduces nonlinear behavioral IGBT/diode model for the MMC. The parallel GPU program is designed in Section IV, followed by Section V where the simulation results and model validation are given. Conclusions are provided in Section VI.

# II. PROPOSED VTS SCHEMES

## A. Event-Correlated Criterion

This criterion is based on events taking place in the system, i.e., transmission line faults, breaker operation, and even a power semiconductor switch's action. Though any change in the state of a component results in perturbation to the system, the impact caused by different components varies. It is dependent on the simulation requirement, e.g., for system-level results, line faults have a far significant impact than a single switch on the grid. But when device-level transients are of concern, the turn-ON and turn-OFF processes become the focus. Therefore, regulation of the time-step is dependent on the system's sensitivity to events related to a certain component, and all these impacts are eventually reflected by the variation in currents or voltages. Therefore, their change rates dv/dt and di/dt are two common criteria for time-step control.

## B. Local Truncation Error (LTE)

The LTE is another general criterion [17] for time-step adjustment due to an extensive distribution of energy storage components such as the inductor and capacitor whose integral i - v relationship needs discretization before being applied for the EMT calculation. One-step integration approximations such as Backward Euler and Trapezoidal rule are the main methods. However, due to their relatively low orders, the estimation has lower precision than the multistep methods, and the error increases along with the integral step. Thus, the LTE is obtained in such a manner that the prediction is first computed by the linear multistep method, and within that time-step, it is compared with the corresponding solution of the nodal matrix equation.

For linear energy storage components, their i - v characteristics can generally be expressed by the following first-order ordinary differential equation:

$$y' = F(t, y(t)) \tag{1}$$

where y can be either inductor current or capacitor voltage. Given a new time-step n + 1, the y value is calculated by integrating both sides

$$y_{n+1} = y_n + \int_{t_n}^{t_{n+1}} F(\tau, y(\tau)) d\tau.$$
 (2)

In EMT simulation, the above equation needs to be discretized. An accurate y can be predicted by the implicit *s*-step Adams–Moulton (AM) Method

$$\bar{y}_{n+1} = y_n + \sum_{m=n-s+1}^{n+1} \left( \int_0^{\Delta \tilde{t}} \psi(\tau) d\tau \right) F(t_m, y_m)$$
 (3)

where  $\Delta \tilde{t}$  is the adaptive time-step, and  $\psi(\tau)$  is the Lagrange interpolating polynomial

$$\psi(\tau) = \prod_{k=n-s+1, k \neq m}^{n+1} \frac{\tau - t_k}{t_m - t_k}.$$
 (4)

On the other hand, solving the circuit nodal equation based on the One-Step integration approximation gives the EMT simulation outcome  $y_{n+1}$ , which has a lower accuracy. Therefore, the relative error  $\epsilon$  is obtained by

$$= \left| \frac{\bar{y}_{n+1} - y_{n+1}}{\bar{y}_{n+1}} \right| \times 100\%.$$
 (5)

The time-step is dynamically adjusted according to  $\epsilon$ , either being reduced to ensure accuracy or enlarged to accelerate the simulation in a predefined manner.

For efficient computation, medium-order AM formulas are adopted to predict the value of the next time-step, e.g., the fourth-order AM is given as

$$\bar{y}_{n+1} = y_n + \frac{\Delta \tilde{t}}{24} (9F_{n+1} + 19F_n - 5F_{n-1} + F_{n-2}).$$
 (6)

#### C. Newton-Raphson Iteration Count

The nodal voltage equation of an EMT system at time instant n can generally be written as

$${}_{n}\mathbf{U}^{k} =_{n} (\mathbf{G}^{-1})^{k} \cdot_{n} \mathbf{J}^{k}$$

$$\tag{7}$$

where k is the iteration count in case the N-R iteration is required for solving a nonlinear system when (7) is computed repeatedly within a time-step until the nodal voltage vector U is deemed convergent if the difference between results of two successive iterations is smaller than the threshold  $\zeta$ , i.e.,

$$\left\|\frac{{}_{n}\mathbf{U}^{k+1}-{}_{n}\mathbf{U}^{k}}{{}_{n}\mathbf{U}^{k+1}}\right\| \leq \zeta.$$
(8)

For a linear system, the nodal equation can be solved instantly and therefore k is constantly 1. Thus, the Newton–Raphson iteration count is a VTS criterion peculiar to nonlinear components. The time-step can be determined according to the number of iterations, e.g., for the nonlinear behavioral IGBT/diode model, the steady state has fewest iterations and consequently the largest applicable time-step, while during transient stages it reduces.

#### D. Hybrid Time-Step Control and Synchronization

A small value is preferred as the default time-step in the EMT program initialization. Once the simulation commences,



Fig. 1. Hybrid FTS-VTS scheme for MTDC grid simulation. (a) System structure. (b) Time instant synchronization.



Fig. 2. MMC Configuration. (a) Three-phase topology. (b) Circuit partitioning to one phase.

the time-step is doubled each time if required until it reaches the upper limit, which may not be two times larger than the second largest value. Similarly, when the time-step is reducing, it follows the same route, i.e., if the current time-step is already maximum, it first steps down to its nearest value; otherwise, the time-step halves until it reaches the lower limit.

In the MTDC grid, a large number of components utilize VTS schemes, which means that each of them will produce an individual time-step  $\Delta t_i$ . The *localized* VTS algorithm is applied, taking the converter as a basic unit, as Fig. 1(a) shows where a hybrid FTS and VTS scheme is adopted. With regard to system-level components, they are computed at a fixed *global* time-step  $\Delta T$ , which is much larger than the maximum VTS value. Since a large disparity exists between them, the FTS system proceeds at a much lower frequency, i.e., it enters the next time-step only when the time instants of all VTS systems reach beyond its current value, as demonstrated in Fig. 1(b). Otherwise, all VTS systems continue individual computations while the FTS system waits for them to finish.

## **III. MODULAR MULTILEVEL CONVERTER MODELS**

Fig. 2(a) shows the configuration of a three-phase MMC, where each phase contains two arms which constitute an inductor and N SMs for attaining N + 1 voltage levels. The two-state resistor is the prevalent switch model in EMT-type tools, but it

falls short of device-level behaviors and accuracy. Thus, the NBM which is able to reveal more information becomes necessary. Regardless of the switch model adopted, the computational burden is heavy considering a high-output voltage level.

Therefore, the MMC circuit is partitioned based on voltagecurrent source coupling which separates all SMs from the arm for faster simulation speed, as shown in Fig. 2(b) where a singlephase MMC is reconfigured. The voltage sources on the main circuit side and the current source on the SM side have one time-step delay, but since the variation of arm currents in two neighboring time-steps is small due to a much higher EMT computation frequency, it can be deemed as constant and, therefore, the accuracy is ensured. In the partitioned configuration, 2N SMs are split from each phase of an (N + 1)-level MMC, meaning the entire three-phase MMC has all its 6N nonlinear SMs partitioned. Meanwhile, the remaining main circuit composed of coupled voltage sources and arm inductors becomes linear. Consequently, one integral (N + 1)-level MMC contains 6N + 1 subsystems, and the admittance matrix corresponding to the original large circuit is mathematically split into the same number of matrices with much smaller dimensions, which can be solved by the GPU or multicore CPU more efficiently.

The other benefit brought by the proposed partitioning scheme is that a large number of subsystems caters to the singleinstruction-multiple-thread (SIMT) architecture of the GPU, whose capability of massively parallel execution is the main factor relied on to gain computational speedup over CPU. Therefore, creating the maximum possible number of subsystems is a major criterion for assessing the efficacy of circuit partitioning. Otherwise, the massive parallelism of GPU will be compromised in addition to solving a still large matrix equation with a long iterative process.

#### A. Two-State Switch Model

The TSSM is the simplest model for power semiconductor switches whose turn-ON and turn-OFF action completes instantaneously with the transition of two distinct states lasting only one time-step. Due to a lack of device specifics, when it is applied in the MTDC grid, system-level results are of interest. Therefore, the switching is not taken as a criterion for time-step control, nor is the N-R iteration for the absence of nonlinearity in the converter. Nonetheless, discrete events are still a criterion for indicating state shift in other components such as the transmission line. The LTE as a general method is applicable to the MMC, as it contains a number of capacitors and six arm inductors. Therefore, in the TSSM-based dc grid, both LTE and events-correlated criterion can be utilized.

Take the partitioned two-node half-bridge SM (HBSM) in Fig. 3 for instance. It has the following matrix equation:

$$\begin{bmatrix} U_1(t) \\ U_2(t) \end{bmatrix} = \begin{bmatrix} G_C + R_1^{-1} & -R_1^{-1} \\ -R_1^{-1} & R_1^{-1} + R_2^{-1} \end{bmatrix} \cdot \begin{bmatrix} I_{\text{Ceq}}(t) \\ J_s(t - \Delta t) \end{bmatrix}$$
(9)

for computing nodal voltages, where  $G_C$  and  $I_{Ceq}$  are the discrete-time companion model of the SM capacitor,  $R_1(R_2)$  represent the equivalent resistance of the upper (or lower)



Fig. 3. Partitioned MMC EMT model.

switch, and  $J_s(t - \Delta t)$  indicates one time-step delay due to circuit partitioning. The calculated capacitor voltage  $U_C(t)$  equivalent to  $U_1(t)$  is then compared with its predicted value in (6) where

$$F_m = \frac{1}{C} (G_C \cdot U_C (t - (n+1-m)\Delta t) - I_{Ceq} (t - (n+1-m)\Delta t))$$
(10)

is universal for the capacitor cases. The subscripts m = (n + 1, n, n - 1, n - 2) denote the values at the current time-step and previous time-steps.

## B. MMC Main Circuit

For various MMC topologies, the main circuit is always the same after partitioning. It contains five nodes after deriving the Norton equivalent circuit, i.e., three nodes on the ac side and the other 2 on the dc side, as shown in Fig. 3. The matrix equation for this universal part is

$$\mathbf{G} = \mathbf{G}_{\text{ext}} + \begin{bmatrix} 2G_{\Sigma} \cdot \mathbf{I}_{3\times3} & [-G_{\Sigma}]_{3\times2} \\ [-G_{\Sigma}]_{2\times3} & (3G_{\Sigma} + G_{C_{C}}) \cdot \mathbf{I}_{2\times2} \end{bmatrix}$$
(11)

$$\mathbf{J} = \begin{bmatrix} -J_{\Sigma A u} + J_{\Sigma A d}, -J_{\Sigma B u} + J_{\Sigma B d}, -J_{\Sigma C u} + J_{\Sigma C d} \\ J_{\Sigma A u} + J_{\Sigma B u} + J_{\Sigma C u} + I_{C_1 eq} - J_{\Sigma A d} - J_{\Sigma B d} - J_{\Sigma C d} \\ - I_{C_2 eq} \end{bmatrix} + \mathbf{J}_{ext}$$
(12)

where matrices  $\mathbf{G}_{\text{ext}}$  and  $\mathbf{J}_{\text{ext}}$  represent the elements contributed by ac and dc grids the MMC connects to,  $G_{C_C}$  and  $I_{Cieq}$  (i = 1, 2) denote the dc bus capacitor in case they exist, and  $G_{\Sigma}$  and  $J_{\Sigma}$  are the companion model of an MMC arm where subscripts u and d stands for the upper and lower arm, respectively. Then, the arm currents can be derived after solving the nodal voltages by (7), as expressed below

$$I_{\rm arm} = \begin{cases} G_{\Sigma}(U_4 - U_k) - J_{\Sigma u}, & (\text{upper arm}) \\ G_{\Sigma}(U_k - U_5) - J_{\Sigma d}, & (\text{lower arm}) \end{cases}$$
(13)



Fig. 4. IGBT/diode nonlinear electrothermal model.

where k = 1, 2, 3 represents the node number on the ac side. For VTS control, the predicted arm currents are calculated in a similar manner by (6) with the history terms of an inductor expressed as

$$F_m = \frac{1}{L} (U_L (t - (n + 1 - m)\Delta t))$$
(14)

where m = (n + 1, n, n - 1, n - 2) are the time-steps.

# C. MMC Nonlinear Behavioral Model

a) Nonlinear IGBT/Diode Behavioral Model: In Fig. 4, the nonlinear electrothermal model of IGBT and its antiparallel diode are given [18], [19]. The current source  $i_{mos}$  accounting for ON and OFF characteristics of the IGBT is its core part, written as

$$i_{\rm mos} = \begin{cases} 0, & (v_{\rm Cge} < V_t) || (v_d \le 0) \\ a_2 v_d^z \cdot (1 - \frac{b_2}{a_2} v_d), & v_d < (y \cdot (v_{\rm Cge} - V_t))^{\frac{1}{x}} \\ \frac{(v_{Cge} - V_t)^2}{a_1 + b_1 \cdot (v_{Cge} - V_t)}, & (\text{others}) \end{cases}$$
(15)

where  $v_d$  and  $v_{Cge}$  are the voltages over itself and  $C_{ge}$ , respectively,  $V_t$  is the conduction threshold voltage, and  $a_1, a_2, b_1, b_2$  are coefficients. The current source  $i_{\text{tail}}$  is responsible for IGBT tailing current during the turn-OFF stage

$$i_{\text{tail}} = \left( \left| \frac{v_{\text{tail}}}{r_{\text{tail}}} - i_{\text{mos}} \right| + \left( \frac{v_{\text{tail}}}{r_{\text{tail}}} - i_{\text{mos}} \right) \right) \cdot \frac{i_{\text{rat}}}{2}$$
(16)

where  $v_{\text{tail}}$  is the voltage over  $r_{\text{tail}}$  and  $i_{\text{rat}}$  is a constant.

The symbol *NLD* represents the diode's exponential static characteristics, which is

$$I_d = I_s \cdot \left( e^{\frac{V_j}{nV_T}} - 1 \right) \tag{17}$$

where  $I_s$  is the saturation current,  $V_j$  is the junction voltage,  $V_T$  the thermal voltage, and n is a coefficient. The remaining parts account for the diode reverse recovery phenomenon.

**b)** Nonlinear Behavioral SM Model: The NBM improves the MMC accuracy. However, as can be seen, the HBSM would have eight nodes since a single IGBT/diode contains five nodes. Moreover, an iterative N-R method is required for the convergent results. Thus, computing this type of SM is significantly time-consuming than the TSSM and other system-level models, which is the reason that it is rarely seen in system-level simulation. The admittance matrix of the NBM-based HBSM is constructed by

$$\mathbf{G}_{8\times8} = \begin{bmatrix} G_C & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} + \begin{bmatrix} \mathbf{G}_{\mathbf{T}5\times5} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} + \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}_{\mathbf{T}4\times4} \end{bmatrix}$$
(18)

and the current contribution takes the form of

$$\mathbf{J} = \begin{bmatrix} I_{Ceq} & \mathbf{0}_{1\times3} & J_s & \mathbf{0}_{1\times3} \end{bmatrix} + \begin{bmatrix} \mathbf{J}_{\mathbf{T}1\times5} & \mathbf{0} \end{bmatrix} + \begin{bmatrix} \mathbf{0} & \mathbf{G}_{\mathbf{T}1\times4} \end{bmatrix}$$
(19)

where  $\mathbf{G}_{\mathbf{T}}$  together with  $\mathbf{J}_{\mathbf{T}}$  are elements in the nonlinear IGBT/diode model.

The full-bridge SM (FBSM)-based MMC is another common topology which has the capability to block the dc fault when all of its IGBTs are turned OFF. Compared with its half-bridge counterpart, it has 13 nodes. Its matrix equation formulation follows the same principle, e.g., the negative terminal of the current source  $J_s$  is taken as the virtual ground and therefore be not counted as an unknown node.

To accelerate the computation of nonlinear SMs, raising the time-step is a common method in traditional FTS simulation, but the achievable time-step cannot be too large in order to attain the twofold goals of keeping the simulation results convergent and maintaining a high accuracy. Considering that the transient state is particularly easy to be subjected to numerical divergence, a time-step of 10 ns is adopted in the FTS simulation so that the switching transients can be captured properly.

In contrast, the VTS schemes realize these two goals simultaneously by adding the proposed algorithms, along with its main benefit of expediting the simulation. As a universal approach to systems which comprise reactive components, the LTE is one choice for time-step regulation, and the procedure is the same as illustrated above. According to (15), the capacitor voltage  $v_{Cge}$  is one of the main factors responsible for the IGBT's transient behavior; thus, it is selected in the LTE scheme for dynamic time-step adjustment. Its EMT simulation outcome based on One-Step integration approximation can be instantly obtained following the solution of SM matrix equation. On the other hand, the prediction is calculated by the fourth-order AM formula (6) and the history terms  $F_m$  (m = (n + 1, n, n - 1, n - 2)) are calculated in the same manner as (10).

Meanwhile, a proper judgment on the events can also be utilized, e.g.,  $v_{Cge}$  is an indicator for switching behavior: when the IGBT turns ON or OFF, it approaches the gate voltage, making dv/dt nonzero; otherwise, its derivative is nearly 0. Therefore, the time-step can be regulated by comparing the dv/dt value with a number of preset thresholds. The N-R iteration count is the most convenient criterion for nonlinear systems. For the solution of the SM matrix (7), the steady state takes the fewest number of iterations, and it is tolerable of a time-step of up to 500 ns, which is set as the upper limit in this paper. On the other hand, the transient stage requires more iterations, and it is prone to divergence if the time-step can be mapped to the number of N-R iterations: the more the number of iterations, the smaller the time-step.



Fig. 5. GPU SIMT architecture for EMT simulation.

Since the basic unit for applying the VTS scheme is one SM, there will be many time-steps for one MMC, which is taken as a VTS system. It is obvious that among all the time-steps produced by various SMs of the same converter, the minimum value should be chosen as the localized time-step for the entire MMC because it satisfies the requirements of convergence and accuracy of all (6N + 1) subsystems created by circuit partitioning, including the linear main circuit.

## IV. MMC GPU KERNEL DESIGN

## A. GPU EMT Simulation

The NVIDIA Tesla V100 GPU (Volta architecture) is used for the circuit simulation. It has 5120 CUDA cores, 16 GB HBM2 memory, and a memory bandwidth of 900 GB/s [20].

The GPU operates in a SIMT mode, i.e., once the kernel—a global function programmed in CUDA C language—is invoked, concurrent computation of a designated number of threads executing the same instructions begins. More specifically, a kernel corresponds to a computational grid, which contains a definitive number of threads grouped in blocks.

The V100 GPU's 7.0 compute capability allows dynamic parallelism to be utilized for time-domain circuit simulation, as Fig. 5 shows. The EMT program is first initiated in the CPU, and since variables defined there cannot be accessed by the GPU directly, they are copied to the global memory via PCIe 3.0 interface. The GPU kernel activated by the CPU is also able to launch its own kernels. The overall MTDC grid is programmed as *kernel*<sub>0</sub>, and within it, *kernel*<sub>1</sub>–*kernel*<sub>N</sub> represent various circuit components. When any of them is launched in *a* blocks each containing *b* threads, an exact  $a \times b$  real components are calculated concurrently. In addition, device functions are also introduced to deal with VTS. At the end of the simulation, all needed data are copied back to the CPU via PCIe 3.0 bus, so that the results can be analyzed.

#### B. VTS MMC Kernel

As the core part of the MTDC grid, the GPU program of the NBM-based MMC with N-R iteration VTS scheme is specified in Fig. 6. The outputs of all functions—regardless device or global—are stored in the GPU global memory so that they can be accessed by other kernels or device functions. The kernels are designed according to the number of functions the



Fig. 6. Nonlinear behavioral SM kernel with VTS scheme.



Fig. 7. MMC-based MTDC grid with wind farm integration. (a) System configuration. (b) MMC controller.

partitioned MMC has. Among them, the SM kernel is the most complex part. The IGBT/diode model is programmed as a GPU device function which could be instantly called by the SM kernel. Their outputs are properly organized according to (18) and (19). The N-R iteration of the matrix (7) repeats until all nodal voltages are convergent, and the final iteration count  $K_{\text{NRi}}$  of that SM is stored in global memory so that it can be read by the VTS module, which first extracts the maximum iteration number  $K_{NR}$  and then produces a proper time increment for the next calculation.

It is noticed that not all threads launched by the same kernel implement exactly identical instructions, e.g., the number of N-R iterations conducted by the SM kernel varies in different SMs, and therefore synchronization of all threads is implemented at the end.

## V. VTS SIMULATION RESULTS AND VALIDATION

## A. System Setup

Fig. 7(a) shows two HVdc links integrated with offshore wind farms (OWFs), and by connecting them, e.g., between buses B3 and B4, an MTDC system is formed.  $MMC_1$  and  $MMC_2$ are rectifiers which provide stable ac voltage for the OWFs while simultaneously converting their energy into dc.  $MMC_3$ and  $MMC_4$  operate as inverters regulating the converter side dc voltage  $V_{dco3}$  and  $V_{dco4}$ , respectively, as the general control scheme in d - q frame for the converter stations in Fig. 7(b) shows. After Inverse Park's Transformation, the control signals  $v_{abc}$  are decoupled and sent to individual MMC inner-loop controllers using phase-shift control (PSC) strategy [21]. The angle reference  $\theta$  is derived from the ac grid on the inverter side, while it is an integral of the targeted frequency on the rectifier side. To curb a rapid current rise in case of dc fault, an dc inductor  $Z_L$ is installed at each pole between the MMC and the dc bus connecting the transmission line, which adopts Bergeron's traveling wave model [22], and transformers are required on the rectifier side for wind energy integration. Each OWF is modeled as an aggregation of 100 doubly fed induction generators (DFIGs). Once the line  $TL_3$  connects both HVdc links, the power exchange between them can be estimated under the circumstance that both inverters have the same dc voltage reference  $V_{dc}^*$ , as the current  $I_{exc}$  is approximately

$$I_{\rm exc} = 2 \frac{I_{\rm dc1} - I_{\rm dc2}}{Z_{TL} + 4Z_L} Z_L \tag{20}$$

where  $Z_{TL}$  is the line impedance, and  $I_{dc1}$  and  $I_{dc2}$  are the currents on the neighboring lines  $TL_1$  and  $TL_2$ .

As can be seen from Fig. 7(a), when applying the proposed hybrid FTS-VTS scheme to the MTDC grid, system-level models including the transmission line and the OWF are calculated with a FTS of 20  $\mu$ s, and the MMCs each of which adopts individual VTS scheme fall within a time-step range of 10 and 500 ns. It should be pointed out that to test the performance of point-to-point HVdc, MMC<sub>1</sub> has an option of connecting to a stiff ac grid, and in this case, the overall converter station is taken as the subsystem VTS<sub>1</sub>.

The VTS simulations are conducted on both the CPU and GPU under the 64-bit Windows 10 operating system on the 2.2 GHz Intel Xeon E5-2698 v4 CPU and 192 GB RAM. The device-level and system-level results are validated by SaberRD and PSCAD/EMTDC, respectively. The former commercial tool has its own inherent VTS scheme and the same maximum time-step of 500 ns is set, while the latter tool as a system-level EMT-type solver determines that its FTS can be much larger and therefore a typical value of 20  $\mu$ s is chosen.

## B. VTS in Device- and System-Level Simulation on CPU

In Fig. 8, all three proposed VTS schemes are tested in simulating a nonlinear single-phase nine-level MMC fed with 8-kV dc bus voltage and switched at 1 kHz. For the convenience of results validation by SaberRD, the IGBT module BSM 300GA 160D is selected, and the maximum time-step is 500 ns. The output voltages are shown on the left, which are virtually the same, although the time-stepping schemes are different. The time-step variation in a zoomed 0.5 ms segment is shown on the right. As can be seen, the proposed three schemes yield different time-steps, but all of them give an apparent regulation in the time-step under different stages. Under steady state, the LTE and N-R iteration count give the maximum time-step, while the event-based criterion can hardly produce a fixed 500 ns. In contrast, the time-step changes dramatically under transient states, when it repeatedly reaches the minimum 10 ns to keep the simulation convergent and capture the transients accurately. The efficiency of the schemes in computing low-level MMCs for a



Fig. 8. VTS schemes for nonlinear MMC simulation (left: output voltage; right: time-steps in zoomed-in area). (a) SaberRDresults. (b) Eventcorrelated criterion. (c) LTE. (d) N-R iteration count.

TABLE I COMPARISON OF VTS SCHEMES' EFFICIENCY ON CPU

|       |          | Speedup |       |       |       |                 |                 |
|-------|----------|---------|-------|-------|-------|-----------------|-----------------|
| Level | SaberRD® | FTS     | Event | LTE   | N-R   | Sp <sub>1</sub> | Sp <sub>2</sub> |
| 5-L   | 247      | 218     | 28.1  | 29.3  | 13.8  | 18              | 16              |
| 7-L   | 367      | 303     | 44.6  | 40.3  | 20.0  | 18              | 15              |
| 9-L   | 608      | 557     | 65.1  | 50.4  | 42.5  | 14              | 13              |
| 17-L  | _        | 653     | 183.6 | 132.9 | 60.1  | _               | 11              |
| 33-L  | -        | 1102    | 555.2 | 305.3 | 142.6 | -               | 8               |

100 ms duration by CPU are summarized in Table I. With circuit partitioning, MMCs having more than nine voltage levels can be computed, which SaberRD is unable to achieve. The N-R iteration method has the highest efficiency, gaining a speedup  $Sp_1$  and  $Sp_2$  of 18 and 16 times over SaberRD and FTS for 5-L MMC, respectively.

In Fig. 9, device-level results are given from the 9-L MMC whose time-step is controlled by N-R iteration count. Fig. 9(a) and (b) gives the IGBT turn-ON and turn-OFF waveforms, which show that the density of points is higher during the transient stage, and it is also varying, meaning that the MMC is computed at a variable frequency. The diode reverse recovery waveforms in Fig. 9(c) also demonstrate the same phenomenon. The power loss eventually induces junction temperature variation, as shown in Fig. 9(d). The temperature of the lower IGBT/diode  $T_{vj2}$  surges to over 100 °C immediately after the converter is started, but it is still within normal operation region. On the other hand, the temperature of upper IGBT/diode  $T_{vj1}$  is much lower, and finally, they all reach around 30 °C. SaberRD verifies these results by showing identical waveforms.



Fig. 9. IGBT/diode NBM VTS control. (a) IGBT turn-ON. (b) IGBT turn-OFF. (c) Diode reverse recovery. (d) Junction temperatures.

| TABLE II                                         |   |
|--------------------------------------------------|---|
| TSSM-BASED HVDC SYSTEM SPEEDUPS WITH 1 S DURATIC | N |
|                                                  |   |
|                                                  |   |

| MMC   | CPU I   | <b>HBSM</b> $t$ | $_{exe}$ /s     | <b>CPU FBSM</b> t <sub>exe</sub> /s |       |                 |  |  |  |
|-------|---------|-----------------|-----------------|-------------------------------------|-------|-----------------|--|--|--|
| Level | FTS VTS |                 | $\mathbf{Sp}_3$ | FTS                                 | VTS   | Sp <sub>4</sub> |  |  |  |
| 51-L  | 23.2    | 0.91            | 25              | 80.3                                | 2.80  | 29              |  |  |  |
| 101-L | 41.6    | 1.53            | 27              | 155.3                               | 5.50  | 28              |  |  |  |
| 201-L | 77.1    | 3.67            | 21              | 310.2                               | 10.63 | 29              |  |  |  |
| 401-L | 149.4   | 5.33            | 28              | 628.7                               | 21.03 | 29              |  |  |  |

Meanwhile, the LTE is suitable for accelerating TSSM-based MMC-HVDC systems employing the most prevalent TSSM. With a relative error of arm currents less than 5%, and the time-step range of 1 to 30  $\mu$ s, speedups around 30 are gained in Table II for the two types of MMCs.

# C. Nonlinear MTDC System Preview on GPU

The proposed VTS-NBM MMC model is also applied in system studies of the  $\pm 100$ -kV 4-terminal dc grid given in Fig. 7(a), and the parameters are listed in the Appendix. In Fig. 10, results of long-term pole–pole fault with a resistance of 1  $\Omega$  occurring at t = 5 s in HVdc Link1 are given. To show the response of a typical point-to-point HVdc system as well as the importance of device-level switch models in determining the fault current, the dc line  $TL_3$  is disconnected, and the ac sides of both converters are supported by a stiff grid with a line-line voltage of 135 kV. Immediately after detecting the fault, all IG-BTs are blocked. However, with HBSM topology, as given in Fig. 10(a), the dc system is still interactive with the ac grid, because the freewheeling diodes are operating as a rectifier. Thus, a residual line-line voltage of 25 kV is observed, and also a dc current of 9.4 kA. The fact that the residual current is dependent on the resistance of the switch leads to various values in PSCAD/EMTDC, while with NBM, the current is definitive. As can be seen, since a fixed ON-state resistance of 0.4 m $\Omega$  yields closer results to that of the nonlinear IGBT/diode pair, it is set as the default in the system-level simulation tool. In Fig. 10(b), the blocking capability of FBSM-MMC enables the dc line-line voltages and currents eventually remain at 0. PSCAD/EMTDC



Fig. 10. HVdc-link1 supported by stiff ac grids pole–pole fault ((a), (b) left: proposed model, right: PSCAD/EMTDC). (a) HBSM-MMC response. (b) FBSM-MMC response. (c) IGBT junction temperatures of HBSM (left) and FBSM (right).

shows similar results, where a distinct fixed switch resistance leads to slightly different results. The IGBT junction temperatures during the protection process are given in Fig. 10(c). With HBSM, the upper IGBT junction temperature  $T_{vj1}$  drops after it is blocked; however, the lower IGBT witnesses a dramatic rise in the junction temperature  $T_{vj2}$  to over 140 °C since its freewheeling diode is conducting a large current, meaning other protection measures are required. On the other hand, the IGBT junction temperatures in FBSM are low since the SM is able to block the fault completely.

As a comparison, dc fault in the HBSM-MMC-based HVdclink1 with the rectifier supported by OWF<sub>1</sub> was also tested and the results are shown in Fig. 11. Before the fault, the ac side voltage of MMC<sub>1</sub> has a peak phase voltage of around 90 kV, which fits with the control target  $V_{gd}^*$ , as given in Fig. 11(a). Then, it reduces to around 0 after the MMC and converters within the DFIG block following the dc fault at t = 5 s. On the dc side, the rectifier has a slight margin over its counterpart in the voltage under normal operation. However, once both MMCs are blocked after the fault, their relationship is reversed: the voltage on the dc side of MMC<sub>3</sub> turns out to be higher since it



Fig. 11. OWF-sustained HVdc-link1 pole-pole fault response by HBSM-MMC (left: proposed model, right: PSCAD/EMTDC). (a) Grid 1 ac voltage. (b) DC voltages. (c) DC currents.

is operating as a rectifier while MMC<sub>1</sub> loses a stiff grid voltage support on its ac side under this scenario, as shown in Fig. 11(b). Fig. 11(c) shows the dc currents.  $I_{dc3}$  rises to around 10 kA after fault while  $I_{dc1}$  decreases to 0 due to the absence of a stiff ac grid. These results are validated by PSCAD/EMTDC simulation as it produces identical trends in the waveforms.

In Fig. 12, the impact of wind speed on MTDC system is shown. Started at t = 12 s, the wind speed at OWF<sub>1</sub> rises linearly from 8 to 11 m/s in 1 s; while the reverse is true for OWF<sub>2</sub>. It is observed that the voltage at Grid 1 maintains stable due to the proper functioning of MMC<sub>1</sub>, and so does the voltage at OWF<sub>2</sub>, both of which are close to sinusoidal waveforms. Due to a stronger wind, the current  $I_{grid1}$  fed by OWF<sub>1</sub> more than doubled, while its counterpart has the opposite trend. As for a single wind turbine, the power of a DFIG at OWF<sub>1</sub> increases from approximately 750 kW to 2.0 MW, while those at  $OWF_2$ has the exact opposite output. The variations in wind speed also affect the power flow in the dc grid, and the dc line voltage as well. The power delivered by  $MMC_1$  and  $MMC_2$  has the same trend to a single DFIG in the respective OWFs, other than the fact that values are 100 times larger, and the power received by the two inverters also exchanged position. Meanwhile, minor perturbations are witnessed in dc voltages, but due to the voltage regulation function of the inverter stations, they are recovered immediately. The dc line currents are also given, which show that on the line  $TL_3$ , the current  $I_{exc}$  is not 0 and its numerical relationship with  $I_{dc1}$  and  $I_{dc2}$  fits well with (20).

Table III summarizes the execution times of different NBMbased MMCs with two time-stepping schemes by the processors, where the CPU execution times with FTS are estimated by running a shorter period using 10 ns as the time-step. Tested under a switching frequency of 200 Hz, the proposed VTS scheme

| TABLE III                                                                                  |
|--------------------------------------------------------------------------------------------|
| EXECUTION TIME ( <i>t</i> <sub>EXE</sub> ) OF A 4-T NBM-BASED DC SYSTEM FOR 0.1 S DURATION |

| MMC   | CPU HBSM t <sub>exe</sub> /s |      | GPU HBSM t <sub>exe</sub> /s |     | Speedup | CPU FBSM t <sub>exe</sub> /s |                 | GPU FBSM t <sub>exe</sub> /s |      |                 | Speedup |     |                 |                    |
|-------|------------------------------|------|------------------------------|-----|---------|------------------------------|-----------------|------------------------------|------|-----------------|---------|-----|-----------------|--------------------|
| Level | FTS                          | VTS  | $\mathbf{Sp}_5$              | FTS | VTS     | $\mathbf{Sp}_{6}$            | $\mathbf{Sp}_7$ | FTS                          | VTS  | $\mathbf{Sp}_8$ | FTS     | VTS | Sp <sub>9</sub> | $\mathbf{Sp}_{10}$ |
| 51-L  | 11539                        | 349  | 33                           | 554 | 23.2    | 24                           | 497             | 25471                        | 571  | 45              | 1181    | 79  | 15              | 322                |
| 101-L | 24442                        | 821  | 30                           | 550 | 22.1    | 25                           | 1106            | 51421                        | 1138 | 45              | 1155    | 133 | 8.7             | 387                |
| 201-L | 50484                        | 1574 | 32                           | 548 | 26.4    | 21                           | 1912            | 94309                        | 2201 | 43              | 1355    | 154 | 8.8             | 612                |
| 401-L | 102084                       | 3177 | 32                           | 574 | 75.0    | 7.7                          | 1361            | 186410                       | 4409 | 42              | 1203    | 197 | 6.1             | 946                |



Fig. 12. MTDC system dynamics with wind farms (left: proposed model, right: PSCAD/EMTDC).

helps the CPU to achieve over 30 times speedup for both HBSM and FBSM MMCs. The GPU gains 6 to 15 times speedup with FBSM-MMC and about 8 to 24 times for HBSM-MMCs with various voltage levels. Therefore, the proposed VTS scheme implemented on GPU is able to attain a dramatic speedup over the CPU with the FTS scheme, e.g., for the two types of MMCs, the speedup  $Sp_7$  and  $Sp_{10}$  could reach approximately 2000 and 1000 times, respectively. It is noticed that with an increase of the MMC voltage level, the speedup that the VTS method could gain decreases, especially in the case of HBSM-based MMC on GPU. The asynchronous switching behaviors of the IGBTs account for this phenomenon. The occurrence of the switching process of an IGBT in any SM will bring down the overall time-step of the entire MMC since the ultimate time-step is determined by the minimum value; therefore, a higher level MMC results in a higher chance of switching process which forces the adoption of a small time-step and consequently leads to a lower speedup.

# VI. CONCLUSION

A VTS MMC model with nonlinear device-level power semiconductor switches was presented in this paper to accelerate the parallel EMT simulation of a detailed MTDC grid on the GPU. The high-order IGBT and diode models were more accurate and reveal information unavailable in the detailed model based on two-state switch representation, but their nonlinearity may lead to an inefficient solution. Thus, circuit partitioning is applied to separate MMC SMs from the arms, which consequently creates a large number of identical circuits corresponding to a smaller matrix dimension. The SIMT mode enables the GPU to conduct massively parallel execution and thus avoids sequential calculations of the partitioned circuits. Meanwhile, several VTS schemes were proposed, and their application scenarios to different MMC models were analyzed. The event-correlated criterion and LTE are general methods regardless of the linearity of the system they apply to, while the N-R iteration count is specific to the nonlinear MMC. A hybrid FTS-VTS scheme was proposed to mitigate the computational burden of the overall system, and the simulation conducted on different processors gained significant speedups compared with the FTS scheme. The execution time of an MTDC system by GPU also indicates that system-level EMT simulation involving highly complex nonlinear device-level IGBT/diode models is feasible when massive parallelism is utilized.

#### **APPENDIX**

The MMC parameters are: rated power 400 MW, ac voltage 135 kV, nominal dc voltage  $V_{dc} = \pm 100$  kV, dc yard inductor 100 mH/1 $\Omega$ , SM capacitor 20 mF, arm inductor 50 mH, MMC voltage level: 201 (when  $V_{dc} = \pm 100$  kV).

DC line TL<sub>1</sub> $\sim$ TL<sub>3</sub> parameters: 10 mΩ/km, 0.1 mH/km, 0.3  $\mu$ F/km, length 100 km.

The BSM 300GA 160D IGBT module:  $v_t = 6.3$  V,  $v_{on} = 0.8$  V, x = 0.97, y = 1.43, z = 1.37,  $a_1 = 0.02$ ,  $b_1 = 0.004$ ,  $c_{ge} = 40$  nF,  $r_{tail} = 1\mu\Omega$ ,  $i_{rat} = 0.05$ ,  $r_g = 5$   $\Omega$ ; freewheeling diode:  $L_d = 10$  pH,  $r_{on} = 10$  m $\Omega$ ,  $v_{on} = 0.7$  V,  $r_{Ld} = 12.8$   $\mu\Omega$ , k = 98741.

#### REFERENCES

- G. P. Adam and B. W. Williams, "Half- and full-bridge modular multilevel converter models for simulations of full-scale HVDC links and multiterminal DC grids," *IEEE J. Emerg. Sel. Topics Power Electron.*, vol. 2, no. 4, pp. 1089–1108, Dec. 2014.
- [2] S. Debnath and M. Chinthavali, "Numerical stiffness based simulation of mixed transmission systems," *IEEE Trans. Ind. Electron.*, vol. 65, no. 12, pp. 9215–9224, Dec. 2018.
- [3] P. T. Lewis, B. M. Grainger, H. A. Al Hassan, A. Barchowsky, and G. F. Reed, "Fault section identification protection algorithm for modular multilevel converter-based high voltage DC with a hybrid transmission corridor," *IEEE Trans. Ind. Electron.*, vol. 63, no. 9, pp. 5652–5662, Sep. 2016.
- [4] M. Amin, A. Rygg, and M. Molinas, "Self-synchronization of wind farm in an MMC-based HVDC system: A stability investigation," *IEEE Trans. Energy Convers.*, vol. 32, no. 2, pp. 458–470, Jun. 2017.
- [5] Working Group B4.57, Guide for the Development of Models for HVDC Converters in a HVDC Grid. CIGRE, 2014.
- [6] U. N. Gnanarathna, A. M. Gole, and R. P. Jayasinghe, "Efficient modeling of modular multilevel HVDC converters (MMC) on electromagnetic transient simulation programs," *IEEE Trans. Power Del.*, vol. 26, no. 1, pp. 316–324, Jan. 2011.
- [7] N. Ahmed, L. Ängquist, S. Norrga, A. Antonopoulos, L. Harnefors, and H. P. Nee, "A computationally efficient continuous model for the modular multilevel converter," *IEEE J. Emerg. Sel. Topics Power Electron.*, vol. 2, no. 4, pp. 1139–1148, Dec. 2014.
- [8] N. R. Chaudhuri, R. Oliveira, and A. Yazdani, "Stability analysis of vectorcontrolled modular multilevel converters in linear time-periodic framework," *IEEE Trans. Power Electron.*, vol. 31, no. 7, pp. 5255–5269, Jul. 2016.
- [9] A. Beddard, C. E. Sheridan, M. Barnes, and T. C. Green, "Improved accuracy average value models of modular multilevel converters," *IEEE Trans. Power Del.*, vol. 31, no. 5, pp. 2260–2269, Oct. 2016.
- [10] A. Christe and D. Dujic, "Virtual submodule concept for fast seminumerical modular multilevel converter loss estimation," *IEEE Trans. Ind. Electron.*, vol. 64, no. 7, pp. 5286–5294, Jul. 2017.
- [11] S. Rodrigues, A. Papadopoulos, E. Kontos, T. Todorcevic, and P. Bauer, "Steady-state loss model of half-bridge modular multilevel converters," *IEEE Trans. Ind. Appl.*, vol. 52, no. 3, pp. 2415–2425, May 2016.
- [12] A. F. Cupertino, J. V. M. Farias, H. A. Pereira, S. I. Seleme, and R. Teodorescu, "Comparison of DSCC and SDBC modular multilevel converters for STATCOM application during negative sequence compensation," *IEEE Trans. Ind. Electron.*, vol. 66, no. 3, pp. 2302–2312, Mar. 2019.
- [13] Y. Zhang, H. Wang, Z. Wang, Y. Yang, and F. Blaabjerg, "Simplified thermal modeling for IGBT modules with periodic power loss profiles in modular multilevel converters," *IEEE Trans. Ind. Electron.*, vol. 66, no. 3, pp. 2323–2332, Mar. 2019.
- [14] J. J. Sanchez-Gasca, R. D'Aquila, W. W. Price, and J. J. Paserba, "Variable time step, implicit integration for extended-term power system dynamic simulation," *Proc. IEEE Power Ind. Comput. Appl. Conf.*, May 7–1, 1995, pp. 183–189.
- [15] J. Yao, T. Wang, and J. Roychowdhury, "An efficient time step control method in transient simulation for DAE system," *Proc. IEEE 21st IEEE Int. Conf. Electron., Circuits Syst.*, Dec. 7–10, 2014, pp. 44–47.

- [16] S. Kumashiro, T. Kamei, A. Hiroki, and K. Kobayashi, "An accurate metric to control time step of transient device simulation by matrix exponential method," in *Proc. IEEE Int. Conf. Simul. Semicond. Process. Devices*, Sep. 7–9, 2017, pp. 37–40.
- [17] L. O. Chua and P. M. Lin, Computer-Aided Analysis of Electronic Circuits: Algorithms and Computational Techniques. Englewood Cliffs, NJ, USA: Prentice-Hall, 1975.
- [18] A. Courtay, "MAST power diode and thyristor models including automatic parameter extraction," *SABER User Group Meeting*, Brighton, U.K., Sep. 1995.
- [19] M. Zhang, A. Courtay, and Z. Yang, "An improved behavioral IGBT model and its characterization tool," *Proc. IEEE Electron Devices Meeting*, Hong Kong, 2000, pp. 142–145.
- [20] NVIDIA Tesla V100 Datasheet. Dec. 2017. [Online]. Available: http:// www.nvidia.com/content/PDF/Volta-Datasheet.pdf
- [21] M. Hagiwara and H. Akagi, "Control and experiment of pulsewidthmodulated modular multilevel converters," *IEEE Trans. Power Electron.*, vol. 24, no. 7, pp. 1737–1746, Jul. 2009.
- [22] H. W. Dommel, "Digital computer solution of electromagnetic transients in single-and multiphase networks," *IEEE Trans. Power App. Syst.*, vol. PAS-88, no. 4, pp. 388–399, Apr. 1969.



Ning Lin (S'17) received the B.Sc. and M.Sc. degrees in electrical engineering from Zhejiang University, Hangzhou, China, in 2008 and 2011, respectively. He is currently working toward the Ph.D. degree in energy systems with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada.

He was a Research Engineer on FACTS and HVDC control and protection in China. His research interests include real-time simulation, parallel simulation of power electronics, and

field-programmable gate arrays.



Venkata Dinavahi (S'94–M'00–SM'08) received the B.Eng. degree in electrical engineering from Visveswaraya National Institute of Technology, Nagpur, India, in 1993, the M.Tech. degree in electrical engineering from the Indian Institute of Technology Kanpur, Kanpur, India, in 1996, and the Ph.D. degree in electrical and computer engineering from the University of Toronto, ON, Canada, in 2000.

He is currently a Professor with the Department of Electrical and Computer Engineering,

University of Alberta, Edmonton, AB, Canada. His research interests include real-time simulation of power systems and power electronic systems, electromagnetic transients, device-level modeling, large-scale systems, and parallel and distributed computing.