Marginal and conditional node monitors over the last observation of the data for all vertices of a Bayesian network using the full dataset
Value
A dataframe containing the names of the vertices, the marginal node monitors and the conditional node monitors. It also returns two plots in which vertices with a darker color have a higher marginal z-score or conditional z-score, respectively, in absolute value.
Details
Consider a Bayesian network over variables \(Y_1,\dots,Y_m\) and suppose a dataset \((\boldsymbol{y}_1,\dots,\boldsymbol{y}_n)\) has been observed, where \(\boldsymbol{y}_i=(y_{i1},\dots,y_{im})\) and \(y_{ij}\) is the i-th observation of the j-th variable. Let \(p_n\) denote the marginal distribution of \(Y_j\) after the first \(n-1\) observations have been processed, and let \((d_1,\dots,d_K)\) be the possible values of \(Y_j\). Define $$E_n = -\sum_{k=1}^K p_n(d_k)\log(p_n(d_k)),$$ $$V_n = \sum_{k=1}^K p_n(d_k)\log^2(p_n(d_k))-E_n^2,$$ which are the mean and variance of the score \(-\log(p_n(Y_j))\) under \(p_n\). The marginal node monitor for the vertex \(Y_j\) is defined as $$Z_j=\frac{-\log(p_n(y_{nj}))- E_n}{\sqrt{V_n}}.$$ Higher values of \(Z_j\) can indicate a poor model fit for the vertex \(Y_j\).
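The marginal node monitor can be sketched in Python as follows, assuming the marginal distribution \(p_n\) of \(Y_j\) is already available as a dictionary from values to probabilities. The function name `marginal_node_monitor` is hypothetical and is not the package's own interface. The sketch raises an error when \(V_n = 0\), which happens when all values with positive probability are equally likely and the standardisation is undefined.

```python
import math


def marginal_node_monitor(pmf, observed):
    """Hypothetical helper: standardised log score for one vertex.

    pmf      -- dict mapping each possible value d_k of Y_j to p_n(d_k)
    observed -- the value y_nj actually observed for Y_j
    """
    # Values with zero probability contribute nothing to E_n or V_n.
    probs = [p for p in pmf.values() if p > 0]
    # E_n: expected score -log p_n(Y_j), i.e. the entropy of p_n.
    e_n = -sum(p * math.log(p) for p in probs)
    # V_n: variance of the score, E[log^2 p_n(Y_j)] - E_n^2.
    v_n = sum(p * math.log(p) ** 2 for p in probs) - e_n ** 2
    if v_n <= 1e-12:
        raise ValueError("V_n is zero: the monitor is undefined for this distribution")
    # Standardise the score of the observed value.
    return (-math.log(pmf[observed]) - e_n) / math.sqrt(v_n)


pmf = {"a": 0.7, "b": 0.2, "c": 0.1}
print(marginal_node_monitor(pmf, "c"))
print(marginal_node_monitor(pmf, "a"))
```

With \(p_n = (0.7, 0.2, 0.1)\) over the values a, b, c, observing the rare value c gives \(Z_j \approx 2.13\), while observing the common value a gives a negative score. Because \(E_n\) and \(V_n\) are the mean and variance of the score, \(Z_j\) has mean 0 and variance 1 under \(p_n\).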
The conditional node monitor for the vertex \(Y_j\) is defined as $$Z_j=\frac{-\log(p_n(y_{nj}|y_{n1},\dots,y_{n(j-1)},y_{n(j+1)},\dots,y_{nm}))- E_n}{\sqrt{V_n}},$$ where \(E_n\) and \(V_n\) are computed as above but with respect to the conditional distribution of \(Y_j\) given the other values of the last observation, that is, with \(p_n(d_k)\) replaced by \(p_n(d_k|y_{n1},\dots,y_{n(j-1)},y_{n(j+1)},\dots,y_{nm})\). Again, higher values of \(Z_j\) can indicate a poor model fit for the vertex \(Y_j\).
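A brute-force sketch of the conditional node monitor follows, assuming the network's joint distribution has been tabulated as a dictionary over complete configurations (for example by multiplying the conditional probability tables). Enumerating the joint is only practical for very small networks; exact inference methods such as the junction-tree propagation described by Cowell et al. (2006) would be used in practice. The name `conditional_node_monitor` and the toy network are hypothetical.

```python
import math


def conditional_node_monitor(joint, observation, j):
    """Hypothetical helper: conditional node monitor for vertex j.

    joint       -- dict mapping every configuration (tuple of m values) to its
                   joint probability under the network
    observation -- tuple y_n of the m observed values
    j           -- 0-based index of the monitored vertex
    """
    # Sum the joint over configurations that agree with y_n everywhere
    # except possibly at position j: this gives p(Y_j = d, y_{n,-j}).
    weights = {}
    for config, p in joint.items():
        if all(config[i] == observation[i] for i in range(len(observation)) if i != j):
            weights[config[j]] = weights.get(config[j], 0.0) + p
    # Normalise to the conditional distribution of Y_j given the other values.
    total = sum(weights.values())
    cond = {d: w / total for d, w in weights.items()}
    # Same E_n and V_n as for the marginal monitor, but under cond.
    probs = [p for p in cond.values() if p > 0]
    e_n = -sum(p * math.log(p) for p in probs)
    v_n = sum(p * math.log(p) ** 2 for p in probs) - e_n ** 2
    if v_n <= 1e-12:
        raise ValueError("V_n is zero: the monitor is undefined for this distribution")
    return (-math.log(cond[observation[j]]) - e_n) / math.sqrt(v_n)


# Toy network A -> B with binary variables; joint = P(A) * P(B | A).
joint = {(0, 0): 0.54, (0, 1): 0.06, (1, 0): 0.08, (1, 1): 0.32}
y_n = (0, 1)
print(conditional_node_monitor(joint, y_n, 1))  # monitor for B
print(conditional_node_monitor(joint, y_n, 0))  # monitor for A
```

Here observing B = 1 after A = 0 has conditional probability 0.1 and gives \(Z_j = 3\). For a binary vertex the monitor reduces to \(\sqrt{(1-q)/q}\) in absolute value, where \(q\) is the conditional probability of the observed value, and it is positive when \(q < 1/2\).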
References
Cowell, R. G., Dawid, A. P., Lauritzen, S. L., & Spiegelhalter, D. J. (2006). Probabilistic networks and expert systems: Exact computational methods for Bayesian networks. Springer Science & Business Media.
Cowell, R. G., Verrall, R. J., & Yoon, Y. K. (2007). Modeling operational risk with Bayesian networks. Journal of Risk and Insurance, 74(4), 795-827.