June 9, 2023

Statistical methods of interest: Data Assimilation

By Jacopo Paglia, PhD · 2 minute read

Data assimilation combines techniques from mathematics, statistics and computer science to estimate the state of a system from the available information. It was initially developed for applications in meteorology, where scientists face several challenges: large amounts of data, measurements that carry uncertainty, and a system that evolves continuously in time. Applications of data assimilation can now be found in many other fields, such as geoscience, finance, biology and medicine.

Data assimilation allows us to integrate measurements from various sources while accounting for their uncertainty. The goal is either a better estimate of the current state of the system or improved forecasts of its future states. This is typically done by starting with an initial estimate of the state of the system and then updating it as new measurements become available.
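As an illustration of the "initial estimate, then update" cycle, here is a minimal sketch for the simplest case we could think of. It assumes a single static quantity, a Gaussian initial estimate and Gaussian measurement noise with known variance; the function name gaussian_update and all numbers are hypothetical. Each update weights the new measurement by how uncertain the current estimate is relative to the measurement noise.

```python
def gaussian_update(prior_mean, prior_var, measurement, noise_var):
    """Combine a Gaussian estimate with one noisy measurement of the same quantity.

    Assumes the measurement equals the true value plus Gaussian noise
    with known variance noise_var (an illustrative assumption).
    """
    # Weight given to the measurement: close to 1 when the current
    # estimate is much more uncertain than the measurement
    gain = prior_var / (prior_var + noise_var)
    post_mean = prior_mean + gain * (measurement - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


# Hypothetical example: initial estimate 20.0 with variance 4.0,
# followed by three measurements, each with noise variance 1.0
mean, var = 20.0, 4.0
for d in [22.0, 21.5, 21.0]:
    mean, var = gaussian_update(mean, var, d, noise_var=1.0)
    print(f"mean={mean:.3f}, var={var:.3f}")
```

The uncertainty shrinks with every measurement: after the three updates the variance equals 1 / (1/4 + 3/1), which is the same result as combining all the information at once.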

Seen from this perspective, data assimilation has a clear connection with Bayesian theory. Indeed, there is a field of Bayesian statistics that deals specifically with the integration of data, often referred to as Bayesian inversion. Mathematically, the problem is represented by a vector of unknown values (x1,..,xn) that is observed through noisy measurements (d1,..,dm). The aim of Bayesian inversion is to estimate the vector (x1,..,xn) using the measurements together with the laws that regulate the interactions among the elements of the vector and between the vector and the measurements. When building these models, it is therefore extremely important to have a good understanding of what we are modeling and to seek assistance from experts in the field to better understand the laws that govern the system.
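A minimal sketch of Bayesian inversion in its simplest setting, assuming a linear relation d = G x + e between the vector and the measurements, a Gaussian prior on x and Gaussian measurement noise e. Under these assumptions the posterior is Gaussian and available in closed form. The prior covariance encodes how the elements of x interact with each other, and the matrix G encodes how they relate to the measurements. The function name, the matrix G and all numbers are hypothetical.

```python
import numpy as np


def linear_gaussian_inversion(prior_mean, prior_cov, G, d, noise_cov):
    """Posterior of x given d = G x + e,
    with x ~ N(prior_mean, prior_cov) and e ~ N(0, noise_cov)."""
    # Covariance of the predicted measurements
    S = G @ prior_cov @ G.T + noise_cov
    # Gain K = prior_cov G^T S^{-1}, computed with a linear solve
    # rather than an explicit matrix inverse
    K = np.linalg.solve(S, G @ prior_cov).T
    post_mean = prior_mean + K @ (d - G @ prior_mean)
    post_cov = prior_cov - K @ G @ prior_cov
    return post_mean, post_cov


# Hypothetical example: three unknowns whose neighbours are correlated,
# observed through two noisy measurements d1 = x1 + x2 and d2 = x3
prior_mean = np.zeros(3)
prior_cov = np.array([[1.0, 0.5, 0.0],
                      [0.5, 1.0, 0.5],
                      [0.0, 0.5, 1.0]])
G = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
noise_cov = 0.1 * np.eye(2)
d = np.array([2.0, 1.0])

post_mean, post_cov = linear_gaussian_inversion(prior_mean, prior_cov, G, d, noise_cov)
print("posterior mean:", post_mean)
print("posterior std: ", np.sqrt(np.diag(post_cov)))
```

Real applications usually involve nonlinear relations or non-Gaussian noise, where the posterior has no closed form and approximate methods are needed; the linear Gaussian case is shown only because it keeps the structure of the problem visible.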

Bayesian inversion can be divided into three main areas:

Filtering, where the interest is in estimating the current state x_t given all measurements up to time t.
Smoothing, which aims to estimate the state x_t using measurements collected over a time interval that starts before time t and ends after t.
Prediction, which focuses on estimating a future state x_{t+k} given data up to the current time t.
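In standard probabilistic notation, writing d_{a:b} for the measurements collected from time a to time b, the three tasks differ only in which measurements the distribution of the state is conditioned on (s and u below mark the start and end of the smoothing interval, and k is the prediction horizon):

```latex
\begin{aligned}
\text{Filtering:}  &\quad p(x_t \mid d_{1:t}) \\
\text{Smoothing:}  &\quad p(x_t \mid d_{s:u}), \qquad s < t < u \\
\text{Prediction:} &\quad p(x_{t+k} \mid d_{1:t}), \qquad k \ge 1
\end{aligned}
```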

Within each of these three areas there is a large number of algorithms. Of note is the Kalman filter, arguably the most famous filtering algorithm. The choice of algorithm depends on what we are trying to achieve and on the phenomenon we are modeling.
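For readers who want to see the Kalman filter in action, here is a minimal sketch of its textbook scalar form. It assumes a linear state model x_t = a·x_{t-1} + w_t, direct measurements d_t = x_t + v_t, and Gaussian noise with known variances q and r. Each step first predicts the state from the previous estimate, then corrects it with the new measurement. The helper predict_ahead covers the prediction task by applying the dynamics without new data. Function names, parameters and data are hypothetical.

```python
def kalman_filter_1d(measurements, x0, p0, a, q, r):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t and d_t = x_t + v_t,
    with w_t ~ N(0, q) and v_t ~ N(0, r).

    Returns the filtered means and variances, one per measurement.
    """
    means, variances = [], []
    x, p = x0, p0
    for d in measurements:
        # Predict: propagate the previous estimate through the dynamics
        x = a * x
        p = a * a * p + q
        # Update: correct the prediction using the new measurement
        k = p / (p + r)  # Kalman gain
        x = x + k * (d - x)
        p = (1.0 - k) * p
        means.append(x)
        variances.append(p)
    return means, variances


def predict_ahead(x, p, a, q, steps):
    """Predict the state `steps` time steps ahead without new measurements."""
    for _ in range(steps):
        x = a * x
        p = a * a * p + q
    return x, p


# Hypothetical measurements and parameters, chosen only for illustration
data = [1.2, 0.9, 1.4, 1.1, 1.3]
means, variances = kalman_filter_1d(data, x0=0.0, p0=10.0, a=1.0, q=0.05, r=0.2)
print("filtered x_t:", [round(m, 3) for m in means])

x_pred, p_pred = predict_ahead(means[-1], variances[-1], a=1.0, q=0.05, steps=3)
print("3-step prediction:", round(x_pred, 3), "variance:", round(p_pred, 3))
```

Note how the prediction variance grows with each step ahead, since no measurements are available to correct the estimate. Smoothing would require an additional backward pass over the filtered results, which is not shown here.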

One of the many fields where data assimilation can bring value is cybersecurity, where huge amounts of behavioral data are collected every day but are often not used to their full potential. At Praxis Security Labs, we are always looking for innovative solutions for our customers, and data assimilation is one approach we use to help them get the most out of the data they collect.

To learn more about how we can help you make the most of your data, schedule a meeting with our CEO, Kai Roer, or with our Director of Research, Thea Mannix, using the link below.

Contact Praxis Security Labs