# Kalman filtering on gridded data

This notebook gives an example of Met.no data post-processing to correct temperature forecasts based on comparison to observations. The following steps are described:

**Loading required python modules and setting path to Shyft installation**

The setup steps create synthetic data and back-test it, so that we have a known forecast that gives a certain response at the four observation points:

- **Generate synthetic data for temperature observation time-series**
- **Transform observations from set to grid (Kriging)**
- **Create 3 forecast sets for the 1x1 km grid**

The grid-pp steps orchestrate a grid-pp algorithm on the synthetic data from above:

- **Transform forecasts from grid to observation points (IDW)**
- **Calculate the bias time-series using Kalman filter on the difference of observation and forecast set at the observation points**
- **Transform bias from set to grid (Kriging) and apply bias to the grid forecast**

The final steps plot and test the results from the grid-pp steps:

- **Transform the corrected forecast grid to the observation points (IDW)**
- **Plot the results and bias**

## 1. Loading required python modules and setting path to Shyft installation

```
# first you should import the third-party python modules which you'll use later on
# the first line enables that figures are shown inline, directly in the notebook
%matplotlib inline
import os
from os import path
import sys
from matplotlib import pyplot as plt
```

```
# once the shyft_path is set correctly, you should be able to import shyft modules
import shyft
from shyft.hydrology import shyftdata_dir
import shyft.hydrology as api
import shyft.time_series as sts
# if you have problems here, it may be related to having your LD_LIBRARY_PATH
# pointing to the appropriate libboost_python libraries (.so files)
from shyft.hydrology.repository.default_state_repository import DefaultStateRepository
from shyft.hydrology.orchestration.configuration import yaml_configs
from shyft.hydrology.orchestration.simulators.config_simulator import ConfigSimulator
from shyft.time_series import Calendar
from shyft.time_series import deltahours
from shyft.time_series import TimeAxis
from shyft.time_series import TimeSeries
from shyft.time_series import time_shift
from shyft.hydrology import TemperatureSource
from shyft.hydrology import TemperatureSourceVector
from shyft.hydrology import GeoPoint
from shyft.hydrology import GeoPointVector
from shyft.hydrology import bayesian_kriging_temperature
from shyft.hydrology import BTKParameter
from shyft.hydrology import idw_temperature
from shyft.hydrology import IDWTemperatureParameter
from shyft.hydrology import KalmanFilter
from shyft.hydrology import KalmanState
from shyft.hydrology import KalmanBiasPredictor
from shyft.time_series import create_periodic_pattern_ts
from shyft.time_series import POINT_AVERAGE_VALUE as stair_case
```

```
# now you can access the api of shyft with tab completion and help, try this:
#help(api.GeoPoint) # remove the hashtag and run the cell to print the documentation of the api.GeoPoint class
#api. # remove the hashtag, set the pointer behind the dot and use
# tab completion to see the available attributes of the shyft api
```

## Setup: 1. Generate synthetic data for temperature observation time-series

```
# Create time-axis for our synthetic sample
utc = Calendar() # provide conversion and math for utc time-zone
t0 = utc.time(2016, 1, 1)
dt = deltahours(1)
n = 24*3 # 3 days length
#ta = TimeAxisFixedDeltaT(t0, dt, n)
ta = TimeAxis(t0, dt, n) # same as the commented-out line above, but needed for now (we work on aligning them)
# 1. Create the terrain based geo-points for the 1x1km grid and the observations
# a. Create the grid, based on a synthetic terrain model
#    specification of 1 x 1 km
grid_1x1 = GeoPointVector()
for x in range(10):
    for y in range(10):
        grid_1x1.append(GeoPoint(x*1000, y*1000, (x+y)*50)) # z from 0 to 900 m
# b. Create the observation points, for metered temperature
#    reasonably within that grid_1x1, and with elevation z
#    that corresponds approximately to the position
obs_points = GeoPointVector()
obs_points.append(GeoPoint( 100, 100, 10)) # observation point at the lowest part
obs_points.append(GeoPoint(5100, 100, 270 )) # halfway out in x-direction @ 270 masl
obs_points.append(GeoPoint( 100, 5100, 250)) # halfway out in y-direction @ 250 masl
obs_points.append(GeoPoint(10100,10100, 1080 )) # x-y at max, so @1080 masl
# 2. Create time-series having a constant temperature of 20 degC
#    and add them to the synthetic observation set
#    make sure there is some reality, like temperature gradient etc.
ts = TimeSeries(ta, fill_value=20.0, point_fx=stair_case) # 20 degC at z_t = 0 meter above sea-level
# assume a temperature gradient of -0.6 degC/100m, and estimate the other values accordingly
tgrad = -0.6/100.0 # in our case in units of degC/m
z_t = 0 # meter above sea-level
# Create a TemperatureSourceVector to hold the set of observation time-series
constant_bias=[-1.0,-0.6,0.7,+1.0]
obs_set = TemperatureSourceVector()
obs_set_w_bias = TemperatureSourceVector()
for geo_point, bias in zip(obs_points, constant_bias):
    temp_at_site = ts + tgrad*(geo_point.z - z_t)
    obs_set.append(TemperatureSource(geo_point, temp_at_site))
    obs_set_w_bias.append(TemperatureSource(geo_point, temp_at_site + bias))
```
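As a sanity check on the cell above, the expected observation temperatures can be worked out by hand from the base temperature, the gradient and the elevations. The following is a plain-Python sketch that does not use Shyft; the variable names `base_temp`, `elevations`, `expected_obs` and `expected_obs_w_bias` are illustrative and not part of the notebook.

```python
# Plain-Python check of the lapse-rate arithmetic used above (no Shyft needed).
base_temp = 20.0        # degC at z_t = 0 m, same value as fill_value above
tgrad = -0.6 / 100.0    # degC per metre
z_t = 0.0               # reference elevation, metres above sea level
elevations = [10.0, 270.0, 250.0, 1080.0]   # z of the four obs_points
constant_bias = [-1.0, -0.6, 0.7, 1.0]      # same bias list as above

# temperature at each site, without and with the constant bias
expected_obs = [base_temp + tgrad * (z - z_t) for z in elevations]
expected_obs_w_bias = [t + b for t, b in zip(expected_obs, constant_bias)]

for z, t, tb in zip(elevations, expected_obs, expected_obs_w_bias):
    print(f"z={z:6.0f} m  obs={t:6.2f} degC  obs+bias={tb:6.2f} degC")
```

The lowest site comes out at about 19.94 degC without bias, which is the value that should show up as the observation nearest to the lower-left grid corner.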

## Setup: 2. Transform observations with bias to grid using kriging

```
# Generate the observation grid by kriging the observations out to 1x1km grid
# first create idw and kriging parameters that we will utilize in the next steps
# kriging parameters
btk_params = BTKParameter() # we could tune parameters here if needed
# idw parameters, somewhat adapted to the fact that we
# know we interpolate from a grid, with a lot of neighbours around
idw_params = IDWTemperatureParameter() # here we could tune the parameters if needed
idw_params.max_distance = 20*1000.0 # max at 20 km because we search for max-gradients
idw_params.max_members = 20 # for grid, this includes all possible close neighbours
idw_params.gradient_by_equation = True # resolve horizontal component out
# now use kriging for our 'synthetic' observations with bias
obs_grid = bayesian_kriging_temperature(obs_set_w_bias,grid_1x1,ta.fixed_dt,btk_params)
# if we idw/btk back to the sites, we should have something that equals the with_bias:
# we should get close to zero differences in this to-grid-and-back operation
back_test = idw_temperature(obs_grid, obs_points, ta.fixed_dt, idw_params) # note the ta.fixed_dt here!
for bt, wb in zip(back_test, obs_set_w_bias):
    print("IDW Diff {} : {} ".format(bt.mid_point(), abs((bt.ts-wb.ts).values.to_numpy()).max()))
#back_test = bayesian_kriging_temperature(obs_grid, obs_points, ta, btk_params)
#for bt,wb in zip(back_test,obs_set_w_bias):
# print("BTK Diff {} : {} ".format(bt.mid_point(),abs((bt.ts-wb.ts).values.to_numpy()).max()))
```

Output:

```
IDW Diff GeoPoint(100.0,100.0,10.0) : 0.0268954741303169
IDW Diff GeoPoint(5100.0,100.0,270.0) : 0.03278720737025864
IDW Diff GeoPoint(100.0,5100.0,250.0) : 0.056529126054066126
IDW Diff GeoPoint(10100.0,10100.0,1080.0) : 0.004978394987212198
```
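The small differences above come from interpolating the grid back to the observation points. To illustrate the basic idea behind inverse distance weighting, here is a minimal plain-Python sketch. It is not Shyft's `idw_temperature`: it ignores elevation, the temperature gradient, `max_distance` and `max_members`, and the function name `idw_interpolate`, the `power` parameter and the four sample cell values are assumptions made for this example only.

```python
import math

def idw_interpolate(sources, target, power=2.0):
    """Inverse-distance-weighted mean of (x, y, value) sources at target (x, y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in sources:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value              # target coincides with a source point
        w = 1.0 / d ** power          # closer sources get larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# four hypothetical grid cells around an observation point (illustrative values)
cells = [(0.0, 0.0, 19.0), (1000.0, 0.0, 18.7),
         (0.0, 1000.0, 18.7), (1000.0, 1000.0, 18.4)]
near_corner = idw_interpolate(cells, (100.0, 100.0))
centre = idw_interpolate(cells, (500.0, 500.0))
print(near_corner, centre)
```

A target close to one cell is dominated by that cell's value, while a target at equal distance from all cells gets their plain average.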

## Setup: 3. Create 3 forecast sets for the 1x1 km grid

```
# Create a forecast grid by copying the obs_grid time-series
# since we know that idw of them to obs_points will give approx.
# the obs_set_w_bias time-series
# for simplicity, we assume the same forecast for all 3 days
fc_grid = TemperatureSourceVector()
fc_grid_1_day_back = TemperatureSourceVector() # this is previous day
fc_grid_2_day_back = TemperatureSourceVector() # this is fc two days ago
one_day_back_dt = deltahours(-24)
two_days_back_dt = deltahours(-24*2)
noise_bias = [0.0 for i in range(len(obs_grid))] # we could generate white noise ts into these to test kalman
for fc, bias in zip(obs_grid, noise_bias):
    fc_grid.append(TemperatureSource(fc.mid_point(), fc.ts + bias))
    fc_grid_1_day_back.append(
        TemperatureSource(
            fc.mid_point(),
            time_shift(fc.ts + bias, one_day_back_dt) # time-shift the signal back
        )
    )
    fc_grid_2_day_back.append(
        TemperatureSource(
            fc.mid_point(),
            time_shift(fc.ts + bias, two_days_back_dt)
        )
    )
grid_forecasts = [fc_grid_2_day_back, fc_grid_1_day_back, fc_grid ]
```

## grid-pp: 1. Transform forecasts from grid to observation points (IDW)

```
# Now we have 3 simulated forecasts at a 1x1 km grid
# fc_grid, fc_grid_1_day_back, fc_grid_2_day_back
# we start on the grid-pp algorithm itself
# - we know that our forecasts have some degC bias, and we would hope that
#   the kalman filter 'learns' the offset
# as a first step we project the grid_forecasts to the observation points
# making a list of historical forecasts at each observation point.
fc_at_observation_points = [idw_temperature(fc, obs_points, ta.fixed_dt, idw_params)
                            for fc in grid_forecasts]
historical_forecasts = []
for i in range(len(obs_points)): # correlate obs.point and fc using common i
    fc_list = TemperatureSourceVector() # the kalman bias predictor below takes this vector of forecasts
    for fc in fc_at_observation_points:
        fc_list.append(fc[i]) # pick out the forecast for the i'th observation point
        #print("{} adding fc pt {} t0={}".format(i,fc[i].mid_point(),utc.to_string(fc[i].ts.time(0))))
    historical_forecasts.append(fc_list)
# historical_forecasts now contains 3 forecasts for each observation point
```

## grid-pp: 2. Calculate the bias time-series using Kalman filter on the observation set

```
# Create a TemperatureSourceVector to hold the set of bias time-series
bias_set = TemperatureSourceVector()
# Create the Kalman filter having 8 samples spaced every 3 hours to represent a daily periodic pattern
kalman_dt_hours = 3
kalman_dt =deltahours(kalman_dt_hours)
kta = TimeAxis(t0, kalman_dt, int(24//kalman_dt_hours))
# Calculate the coefficients of Kalman filter and
# Create bias time-series based on the daily periodic pattern
for i in range(len(obs_set)):
    kf = KalmanFilter() # each observation location has its own filter and predictor
    kbp = KalmanBiasPredictor(kf)
    #print("Diffs for obs", i)
    #for fc in historical_forecasts[i]:
    #    print((fc.ts-obs_set[i].ts).values.to_numpy())
    kbp.update_with_forecast(historical_forecasts[i], obs_set[i].ts, kta)
    pattern = KalmanState.get_x(kbp.state)
    #print(pattern)
    bias_ts = create_periodic_pattern_ts(pattern, kalman_dt, ta.time(0), ta)
    bias_set.append(TemperatureSource(obs_set[i].mid_point(), bias_ts))
```
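The cell above learns a daily bias pattern of 8 values (one per 3-hour slot) and repeats it over the forecast period. The sketch below illustrates that idea in plain Python with a scalar Kalman filter per slot. It is not Shyft's `KalmanFilter` or `KalmanBiasPredictor`: the random-walk state model, the function names `kalman_bias_update` and `tile_daily_pattern`, the variances `process_var` and `obs_var`, and the constant measured bias of -1.0 degC are all assumptions made for illustration.

```python
def kalman_bias_update(x, p, measured_bias, process_var=0.01, obs_var=1.0):
    """One predict/update step of a scalar Kalman filter with a random-walk state."""
    p = p + process_var                  # predict: uncertainty grows a little
    k = p / (p + obs_var)                # Kalman gain
    x = x + k * (measured_bias - x)      # move the estimate towards the measurement
    p = (1.0 - k) * p                    # updated uncertainty
    return x, p

def tile_daily_pattern(pattern, slot_hours, n_hours):
    """Repeat a daily pattern (one value per slot) as an hourly series."""
    return [pattern[(h // slot_hours) % len(pattern)] for h in range(n_hours)]

slot_hours = 3
n_slots = 24 // slot_hours               # 8 slots per day
state = [(0.0, 1.0)] * n_slots           # (bias estimate, variance) per slot

# feed 30 days of a constant forecast-minus-observation difference of -1.0 degC
for day in range(30):
    state = [kalman_bias_update(x, p, measured_bias=-1.0) for x, p in state]

pattern = [x for x, _ in state]
hourly_bias = tile_daily_pattern(pattern, slot_hours, 72)   # 3 days, hourly
print(pattern[0], len(hourly_bias))
```

With only a few updates the estimate covers just part of the true offset, which is worth keeping in mind when reading a bias learned from three forecasts.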

## grid-pp: 3. Spread the bias at observation points out to the grid using kriging

```
# Generate the bias grid by kriging the bias out on the 1x1km grid
btk_params = BTKParameter()
btk_bias_params = BTKParameter(temperature_gradient=-0.6, temperature_gradient_sd=0.25, sill=25.0, nugget=0.5, range=5000.0, zscale=20.0)
bias_grid = bayesian_kriging_temperature(bias_set, grid_1x1, ta.fixed_dt, btk_bias_params)
# Correct forecasts by applying bias time-series on the grid
fc_grid_improved = TemperatureSourceVector()
for i in range(len(fc_grid)):
    fc_grid_improved.append(
        TemperatureSource(
            fc_grid[i].mid_point(),
            fc_grid[i].ts - bias_grid[i].ts # by convention, the bias time-series is subtracted
        )
    )
```

```
# Check the first value of the time-series. It should be around 20
tx = ta.time(0)
print("Comparison original synthetic grid cell [0]\n\t at the lower left corner,\n\t at t {}\n\toriginal grid: {}\n\timproved grid: {}\n\t vs bias grid: {}\n\t nearest obs: {}"
      .format(utc.to_string(tx),
              fc_grid[0].ts(tx),
              fc_grid_improved[0].ts(tx),
              bias_grid[0].ts(tx),
              obs_set[0].ts(tx)
              )
      )
```

Output:

```
Comparison original synthetic grid cell [0]
at the lower left corner,
at t 2016-01-01T00:00:00Z
original grid: 18.985602994491913
improved grid: 19.740020177551433
vs bias grid: -0.7544171830595209
nearest obs: 19.94
```
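The numbers in this output can be checked by hand: the improved value is the original value minus the bias, and it ends up closer to the nearest observation. A small plain-Python sketch, using only the values printed above (the variable names are illustrative):

```python
# Values copied from the output above
original = 18.985602994491913
bias = -0.7544171830595209
improved = 19.740020177551433
nearest_obs = 19.94

# improved = original - bias (the bias is subtracted)
reconstructed = original - bias
residual_before = abs(original - nearest_obs)
residual_after = abs(improved - nearest_obs)

print("original - bias      :", reconstructed)
print("error before / after :", residual_before, residual_after)
```

The remaining error after correction is roughly 0.2 degC, down from roughly 0.95 degC, so the filter has picked up most, but not all, of the offset near this observation point.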

## Presentation&Test: 1. Transform corrected forecasts from grid to observation points (IDW) to see if we reached the goal of adjusting the forecast

```
# Generate the corrected and the raw forecast sets at the observation points by IDW from the grid
fc_at_observations_improved = idw_temperature(fc_grid_improved, obs_points, ta.fixed_dt, idw_params)
fc_at_observations_raw = idw_temperature(fc_grid, obs_points, ta.fixed_dt, idw_params)
```

## Presentation&Test: 2. Plot the results

```
# Make a time-series plot of temperature sets
import datetime # needed for the x-axis timestamps
for i in range(len(bias_set)):
    fig, ax = plt.subplots(figsize=(20, 10))
    timestamps = [datetime.datetime.utcfromtimestamp(p) for p in obs_set[i].ts.time_axis.time_points]
    # drop the last time point so the x-values match the number of values
    ax.plot(timestamps[:-1], obs_set[i].ts.values, label = str(i+1) + ' Observation')
    ax.plot(timestamps[:-1], fc_at_observations_improved[i].ts.values, label = str(i+1) + ' Forecast improved')
    ax.plot(timestamps[:-1], fc_at_observations_raw[i].ts.values, linestyle='--', label = str(i+1) + ' Forecast (raw)')
    #ax.plot(timestamps, bias_set[i].ts.values, label = str(i+1) + ' Bias')
    fig.autofmt_xdate()
    ax.legend(title='Temperature')
    ax.set_ylabel(r'Temp ($^\circ$C)') # raw string avoids the invalid '\c' escape
```

```
# Make a scatter plot of the grid bias correction at ts.value(0)
import numpy as np # needed for the array of bias values
x = [fc.mid_point().x for fc in bias_grid]
y = [fc.mid_point().y for fc in bias_grid]
fig, ax = plt.subplots(figsize=(10, 5))
temps = np.array([bias.ts.value(0) for bias in bias_grid])
plot = ax.scatter(x, y, c=temps, marker='o', s=500, lw=0)
plt.colorbar(plot).set_label(r'Temp bias correction ($^\circ$C)')
```