Statistical machine learning and dissolved gas analysis: a review

Piotr Mirowski and Yann LeCun

IEEE Transactions on Power Delivery, vol. 27, no. 4, October 2012

Dissolved gas analysis (DGA) of the insulation oil of power transformers is an investigative tool for monitoring their health and detecting impending failures by recognizing anomalous patterns of dissolved gas concentrations. We treat failure prediction as a simple data-mining task on DGA samples, optionally exploiting the transformer’s age, nominal power, and voltage, and consider two approaches: 1) binary classification and 2) regression of the time to failure. We propose a simple logarithmic transform to preprocess DGA data in order to deal with the long-tailed distributions of concentrations. We review and evaluate 15 standard statistical machine-learning algorithms on that task, and report quantitative results on a small but published set of power transformers and on proprietary data from thousands of network transformers of a utility company. Our results confirm that nonlinear decision functions, such as neural networks, support vector machines with Gaussian kernels, or local linear regression, can theoretically provide slightly better performance than linear classifiers or regressors. Software and part of the data are available at https://github.com/piotrmirowski/DGA.
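The logarithmic preprocessing mentioned in the abstract, combined with per-gas standardization, might look roughly like the sketch below. It is not taken from the released software: the function name log_standardize, the base-10 logarithm, the 1 ppm floor for zero readings, and the toy concentrations are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def log_standardize(samples, floor=1.0):
    """Log-transform, then standardize, each gas column.

    samples: list of rows, each row a list of gas concentrations (ppm).
    floor:   assumed lower clip (ppm) so that log10 is defined for zero readings.
    """
    # The log transform compresses the long right tail of the concentrations.
    logged = [[math.log10(max(v, floor)) for v in row] for row in samples]

    # Per-column mean and population standard deviation of the logged values.
    columns = list(zip(*logged))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns

    # Standardize to zero mean and unit variance per gas.
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in logged]


# Made-up concentrations for illustration only (columns: H2, CH4).
toy = [[10.0, 5.0], [100.0, 50.0], [1000.0, 5.0]]
z = log_standardize(toy)
```

After this step, concentrations that span several orders of magnitude end up on a comparable scale, which is the form in which a classifier or a time-to-failure regressor would typically consume them.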

Comparison of six regression or classification techniques on a simplified 2-D version of the Duval dataset consisting of log-transformed and standardized values of DGA measures.

Paper link: NYU, code, appendix.