The Exponentially Weighted Moving Average (EWMA) algorithm is the simplest discrete-time low-pass filter. It generates an output $y_i$ in the i-th iteration that corresponds to a scaled version of the current input $x_i$ and the previous output $y_{i-1}$.
$$\boxed{y_i=\left \{ \begin{matrix} \; \; \; \; \; \; 0, \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; i=0 \\ \alpha \cdot x_i + (1-\alpha) \cdot y_{i-1}, \; \; \; \; i=1,2.. \end{matrix} \right.}$$
The smoothing factor, $\alpha \in [0,1]$, indicates the normalized weight of the new input in the output. For example, an $\alpha=0.03$ implies that each new input will contribute a 3% to the output, while the previous output will contribute a 97%. The boundary values for the smoothing factor are 0 and 1, which implies $y_i=y_{i-1}$ and $y_i= x_i$, respectively. In the following points, we analyze the algorithm from different points of view.
The EWMA could be considered as an Auto Regressive Moving Average (ARMA) filter because it depends on the history of values from both the input and the output. However, if the EWMA equation is developed, it is possible to represent the current output based only on the contributions of past inputs, i.e. Moving Average (MA) filter.
The impulse response h(t) of its Linear Time Invariant (LTI) system equivalent has an infinite duration, which implies that the transfer function H(z) will have finite duration. If the symbol $*$ represents the convolution operand and $u(n)$ corresponds to the step function, it can be stated:
The EWMA could be considered as an Auto Regressive Moving Average (ARMA) filter because it depends on the history of values from both the input and the output. However, if the EWMA equation is developed, it is possible to represent the current output based only on the contributions of past inputs, i.e. Moving Average (MA) filter.
$$y_i = \alpha\cdot x_i + \alpha(1-\alpha)\cdot x_{i-1} + \alpha(1-\alpha)^2 \cdot x_{i-2} +... = \sum_{j=0}^{i} \alpha(1-\alpha)^{i-j} \cdot x_j = \sum_{j=0}^{i} w_{ij} \cdot x_j$$
In the i-th iteration, the output is a weighted sum of each previous input value $x_j$ with $j \in \{0,i\}$, where the scaling corresponds to an exponentially weighted coefficient $w_{ij}= \alpha \cdot (1- \alpha)^{i-j}$.The impulse response h(t) of its Linear Time Invariant (LTI) system equivalent has an infinite duration, which implies that the transfer function H(z) will have finite duration. If the symbol $*$ represents the convolution operand and $u(n)$ corresponds to the step function, it can be stated:
$$y(n)=x(n)*h(n) \; \; \; with \; \; h(n)=a^n \cdot u(n)=\alpha(1-\alpha)^n \cdot u(n)$$
The EWMA algorithm corresponds to the simplest Infinite Impulse Response (IIR) discrete-time filter. The main advantage that IIR systems have over FIR ones is their implementation efficiency. On the other hand, IIR systems are harder to analyze. In order to simplify the analysis, it is imposed that the system has zero initial conditions. Thus, the 2nd order IIR filter corresponds to:
$$y_i = \frac{1}{a_0} \cdot (a_1 \cdot y_{i-1} + b_0 \cdot x_i + b_1 \cdot x_{i-1} ) \rightarrow H(z)=\frac{Y(z)}{X(z)} = \frac{b_0 \cdot z^{-0} + b_1 \cdot z^{-1}}{a_0 \cdot z^{-0} + a_1 \cdot z^{-1}} = \frac{b_0 + b_1 \cdot z^{-1}}{a_0+a_1 \cdot z^{-1}}$$
In the Fig 1., it is presented the simplified and complete Direct Form 1 (DF1) of the filter. In the case of the EWMA, the coefficients have fixed values in terms of the smoothing factor that correspond to: $a_0=1, \; \; a_1=1-\alpha, \; \; b_0=\alpha, \; \; b_1=0$. Applying those constraints, the transfer function becomes:
The inclusion of the previous outputs in the computation of the current output, $a_j\neq0$, creates a feedback that makes the filter an IIR. Besides, the feedback coefficients have not negative values, $a_j\geq0$; hence, the systems could be unstable. In order to study the stability of the system, the Bounded-Input Bounded-Output (BIBO) criterion is applied, which is satisfied when the Region of Convergence (ROC) includes the unit circle. The EWMA is a casual system; thus, the ROC must include infinity as well. In order that the ROC includes infinity and the unit circle without the poles, all poles must be located within the unit circle of the z-plane, i.e. all poles of the transfer function must have an absolute value smaller than one, which is satisfied when the smoothing factor belongs to the range [0,1].
$$H(z) = \frac{\alpha}{1-(1-\alpha)\cdot z^{-1}} \rightarrow \infty \; \Leftrightarrow \; 1-(1-\alpha) \cdot z^{-1} = 0 \; \Leftrightarrow \; z=1-\alpha $$
$$|z_p |<1 ⇔ |\alpha-1|<1 ⇔ 0<\alpha<2$$
In the Fig 3., the left sub-plot presents the weighting coefficients for the EWMA of length 20 for different values of $\alpha$, namely 0.05 (blue), 0.1 (green) and 0.5 (red). A large smoothing factor gives larger weight to the last inputs, i.e. high reactiveness. On the other hand, a small smoothing factor weights more equally all the history of inputs, i.e. high stability. Besides, the memory on this case could be too long, which implies that the situation that occurred long time ago is still affecting the system (high persistency), and is slow in reacting to condition changes. Therefore, the choice of the optimum smoothing factor is a delicate task that must balance the reactiveness and stability of the system. In Fig 3., the right sub-plot presents the EWMA output for a step function, i.e. the output of the system when the input rises from 0 to 1. In this case, the reactiveness effect is more clearly presented. In fact, for a smoothing factor of 0.05, it is needed almost 100 samples/iterations to converge.
The effect of each input in the final output decay exponentially until the system resolution hides the contribution of very old input values. This idea corresponds to the discrete-time equivalent of the first order exponential decay, which is represented by the time constant $\tau$ that corresponds to the needed time for a 63.21% decay (2dB) in amplitud. The relation between the smoothing factor and the time constant can be expressed with the relation: $\alpha=1-exp(-T/\tau) \rightarrow \tau/T=-1/Ln(1-\alpha)$. After 4-5 time constants, the contribution of the past input to the current output is 98.17%-99.33%; hence, it can be neglected. For example, for $\alpha=0.05$, the constant time corresponds to $\tau=-1/ln(1-0.05)\simeq 20$.
The EWMA can be also studied as a predictor-corrector model, where the current output corresponds to the previous output, $y_i = y_{i-1}$, plus a correction term $\Delta y_i$ that accounts for the behavior that is not modeled in the system, i.e. it is the surprise or generalization error. When $\alpha \rightarrow 0 \Rightarrow \Delta y_i \rightarrow 0 \Rightarrow y_i = y_{i-1}$, there is not suprise/error. This approach can be considered as a simple special case of the Kalman filter.
$$y_i = y_{i-1} + \Delta y_i = y_{i-1} + \alpha \cdot (x_i - y_{i-1} )$$
Apart from the sampling interval, this system has only one tuning parameter, and it requires only one memory register. Hence, it is a low-pass filter that is easy to implement and requires few computation resources. Thus, it would be used for smoothing data where low complexity is needed, e.g. real-time applications.