================================
Kalman filtering on gridded data
================================

:ref:`setup-shyft`

This notebook gives an example of Met.no data post-processing to correct temperature forecasts based on comparison to observations. The following steps are described:

1. **Loading required python modules and setting path to Shyft installation**

Setup steps is about creating synthetic data, and backtesting those
so that we have a known forecast that gives a certain response at the
four observation points

1. **Generate synthetic data for temperature observation time-series**
2. **Transform observations from set to grid (Kriging)**
3. **Create 3 forecasts sets for the 1x1 km grid**

grid-pp steps is about orchestrating a grid-pp algorithm given
our syntethic data from above

1. **Transform forecasts from grid to observation points (IDW)**
2. **Calculate the bias time-series using Kalman filter on the difference of observation and forecast set at the observation points**
3. **Transform bias from set to grid (Kriging) and apply bias to the grid forecast**

Final steps to plot and test the results from the grid-pp steps

1. **Transform corrected forecasts grid to the observation points (IDW)**
2. **Plot the results and bias**

1. Loading required python modules and setting path to Shyft installation
-------------------------------------------------------------------------

.. code-block:: python

    # first you should import the third-party python modules which you'll use later on
    # the first line enables that figures are shown inline, directly in the notebook
    %matplotlib inline
    import os
    from os import path
    import sys
    from matplotlib import pyplot as plt

.. code-block:: python

    # once the shyft_path is set correctly, you should be able to import shyft modules
    import shyft
    from shyft.hydrology import shyftdata_dir
    import shyft.hydrology as api
    import shyft.time_series as sts

    # if you have problems here, it may be related to having your LD_LIBRARY_PATH
    # pointing to the appropriate libboost_python libraries (.so files)
    from shyft.hydrology.repository.default_state_repository import DefaultStateRepository
    from shyft.hydrology.orchestration.configuration import yaml_configs
    from shyft.hydrology.orchestration.simulators.config_simulator import ConfigSimulator
    from shyft.time_series import Calendar
    from shyft.time_series import deltahours
    from shyft.time_series import TimeAxis
    from shyft.time_series import TimeSeries
    from shyft.time_series import time_shift
    from shyft.hydrology import TemperatureSource
    from shyft.hydrology import TemperatureSourceVector
    from shyft.hydrology import GeoPoint
    from shyft.hydrology import GeoPointVector
    from shyft.hydrology import bayesian_kriging_temperature
    from shyft.hydrology import BTKParameter
    from shyft.hydrology import idw_temperature
    from shyft.hydrology import IDWTemperatureParameter
    from shyft.hydrology import KalmanFilter
    from shyft.hydrology import KalmanState
    from shyft.hydrology import KalmanBiasPredictor
    from shyft.time_series import create_periodic_pattern_ts
    from shyft.time_series import POINT_AVERAGE_VALUE as stair_case

.. code-block:: python

    # now you can access the api of shyft with tab completion and help, try this:

    #help(api.GeoPoint) # remove the hashtag and run the cell to print the documentation of the api.GeoPoint class
    #api. # remove the hashtag, set the pointer behind the dot and use 
        # tab completion to see the available attributes of the shyft api

Setup: 1. Generate synthetic data for temperature observation time-series
-------------------------------------------------------------------------

.. code-block:: python

    # Create time-axis for our syntethic sample
    utc = Calendar() # provide conversion and math for utc time-zone
    t0 = utc.time(2016, 1, 1)
    dt = deltahours(1)
    n = 24*3 # 3 days length
    #ta = TimeAxisFixedDeltaT(t0, dt, n)
    ta = TimeAxis(t0, dt, n) # same as ta, but needed for now(we work on aligning them)

    # 1. Create the terrain based geo-points for the 1x1km grid and the observations

    # a. Create the grid, based on a syntethic terrain model
    # specification of 1 x 1 km
    grid_1x1 = GeoPointVector()
    for x in range(10):
        for y in range(10):
            grid_1x1.append(GeoPoint(x*1000, y*1000, (x+y)*50)) # z from 0 to 1000 m

    # b. Create the observation points, for metered temperature
    # reasonable withing that grid_1x1, and with elevation z
    # that corresponds approximately to the position
    obs_points = GeoPointVector()
    obs_points.append(GeoPoint( 100, 100, 10))  # observation point at the lowest part
    obs_points.append(GeoPoint(5100, 100, 270 )) # halfway out in x-direction @ 270 masl
    obs_points.append(GeoPoint( 100, 5100, 250)) # halfway out in y-direction @ 250 masl
    obs_points.append(GeoPoint(10100,10100, 1080 )) # x-y at max, so @1080 masl

    # 2. Create time-series having a constant temperature of 15 degC
    #    and add them to the syntetic observation set
    #    make sure there is some reality, like temperature gradient etc.
    ts = TimeSeries(ta, fill_value=20.0,point_fx=stair_case)  # 20 degC at z_t= 0 meter above sea-level
    # assume set temp.gradient to -0.6 degC/100m, and estimate the other values accordingly                                                   
    tgrad = -0.6/100.0  # in our case in units of degC/m
    z_t = 0 # meter above sea-level

    # Create a TemperatureSourceVector to hold the set of observation time-series
    constant_bias=[-1.0,-0.6,0.7,+1.0]
    obs_set = TemperatureSourceVector()
    obs_set_w_bias = TemperatureSourceVector()
    for geo_point,bias in zip(obs_points,constant_bias):
        temp_at_site =  ts + tgrad*(geo_point.z-z_t)
        obs_set.append(TemperatureSource(geo_point,temp_at_site))
        obs_set_w_bias.append(TemperatureSource(geo_point,temp_at_site + bias))

Setup 2. Transform observation with bias to grid using kriging
--------------------------------------------------------------

.. code-block:: python

    # Generate the observation grid by kriging the observations out to 1x1km grid
    # first create idw and kriging parameters that we will utilize in the next steps
    # kriging parameters
    btk_params = BTKParameter()  # we could tune parameters here if needed
    # idw parameters,somewhat adapted to the fact that we
    #  know we interpolate from a grid, with a lot of neigbours around
    idw_params = IDWTemperatureParameter() # here we could tune the paramete if needed
    idw_params.max_distance = 20*1000.0 # max at 10 km because we search for max-gradients
    idw_params.max_members = 20 # for grid, this include all possible close neighbors
    idw_params.gradient_by_equation = True # resolve horisontal component out 
    # now use kriging for our 'syntethic' observations with bias
    obs_grid = bayesian_kriging_temperature(obs_set_w_bias,grid_1x1,ta.fixed_dt,btk_params)


    # if we idw/btk back to the sites, we should have something that equals the with_bias:
    # we should get close to zero differences in this to-grid-and-back operation
    back_test = idw_temperature(obs_grid, obs_points, ta.fixed_dt, idw_params)  # note the ta.fixed_dt here!
    for bt,wb in zip(back_test,obs_set_w_bias):
        print("IDW Diff {} : {} ".format(bt.mid_point(),abs((bt.ts-wb.ts).values.to_numpy()).max()))
        
    #back_test = bayesian_kriging_temperature(obs_grid, obs_points, ta, btk_params)
    #for bt,wb in zip(back_test,obs_set_w_bias):
    #    print("BTK Diff {} : {} ".format(bt.mid_point(),abs((bt.ts-wb.ts).values.to_numpy()).max()))

Output:

.. code-block:: none

    IDW Diff GeoPoint(100.0,100.0,10.0) : 0.0268954741303169 
    IDW Diff GeoPoint(5100.0,100.0,270.0) : 0.03278720737025864 
    IDW Diff GeoPoint(100.0,5100.0,250.0) : 0.056529126054066126 
    IDW Diff GeoPoint(10100.0,10100.0,1080.0) : 0.004978394987212198 

Setup 3. Create 3 forecasts sets for the 1x1 km grid
----------------------------------------------------

.. code-block:: python

    # Create a forecast grid by copying the obs_grid time-series
    # since we know that idw of them to obs_points will give approx.
    #  the obs_set_w_bias time-series
    #  for the simplicity, we assume the same forecast for all 3 days

    fc_grid = TemperatureSourceVector()
    fc_grid_1_day_back = TemperatureSourceVector() # this is previous day
    fc_grid_2_day_back = TemperatureSourceVector()  # this is fc two days ago
    one_day_back_dt = deltahours(-24)
    two_days_back_dt = deltahours(-24*2)
    noise_bias = [0.0 for i in range(len(obs_grid))] # we could generate white noise ts into these to test kalman
    for fc,bias in zip(obs_grid,noise_bias):
        fc_grid.append(TemperatureSource(fc.mid_point(),fc.ts + bias ))
        fc_grid_1_day_back.append(
            TemperatureSource(
                    fc.mid_point(),
                    time_shift(fc.ts + bias, one_day_back_dt) #time-shift the signal back
                )
            )
        fc_grid_2_day_back.append(
            TemperatureSource(
                    fc.mid_point(),
                    time_shift(fc.ts + bias, two_days_back_dt)
                )
            )

    grid_forecasts = [fc_grid_2_day_back, fc_grid_1_day_back, fc_grid ]

grid-pp: 1. Transform forecasts from grid to observation points (IDW)
---------------------------------------------------------------------

.. code-block:: python

    # Now we have 3 simulated forecasts at a 1x1 km grid
    # fc_grid, fc_grid_1_day_back, fc_grid_2_day_back
    # we start to do the grid pp algorithm stuff
    # - we know the our forecasts have some degC. bias, and we would hope that
    #   the kalman filter 'learns' the offset
    # as a first step we project the grid_forecasts to the observation points
    # making a list of historical forecasts at each observation point.

    fc_at_observation_points = [idw_temperature(fc, obs_points, ta.fixed_dt, idw_params)\
                                for fc in grid_forecasts]
    historical_forecasts = []
    for i in range(len(obs_points)):  # correlate obs.point and fc using common i
        fc_list = TemperatureSourceVector()  # the kalman bias predictor below accepts TsVector of forecasts
        for fc in fc_at_observation_points:
            fc_list.append(fc[i]) # pick out the fc_ts only, for the i'th observation point
            #print("{} adding fc pt {} t0={}".format(i,fc[i].mid_point(),utc.to_string(fc[i].ts.time(0))))
        historical_forecasts.append(fc_list)
    # historical_forecasts  now cntains 3 forecasts for each observation point

grid-pp: 2. Calculate the bias time-series using Kalman filter on the observation set
-------------------------------------------------------------------------------------

.. code-block:: python

    # Create a TemperatureSourceVector to hold the set of bias time-series
    bias_set = TemperatureSourceVector()

    # Create the Kalman filter having 8 samples spaced every 3 hours to represent a daily periodic pattern

    kalman_dt_hours = 3
    kalman_dt =deltahours(kalman_dt_hours)
    kta = TimeAxis(t0, kalman_dt, int(24//kalman_dt_hours))

    # Calculate the coefficients of Kalman filter and 
    # Create bias time-series based on the daily periodic pattern
    for i in range(len(obs_set)):
        kf = KalmanFilter() # each observation location do have it's own kf &predictor
        kbp = KalmanBiasPredictor(kf)
        #print("Diffs for obs", i)
        #for fc in historical_forecasts[i]:
        #    print((fc.ts-obs_set[i].ts).values.to_numpy())
        kbp.update_with_forecast(historical_forecasts[i], obs_set[i].ts, kta)
        pattern = KalmanState.get_x(kbp.state)
        #print(pattern)
        bias_ts = create_periodic_pattern_ts(pattern, kalman_dt, ta.time(0), ta)
        bias_set.append(TemperatureSource(obs_set[i].mid_point(), bias_ts))

grid-pp: 3. Spread the bias at observation points out to the grid using kriging
-------------------------------------------------------------------------------

.. code-block:: python

    # Generate the bias grid by kriging the bias out on the 1x1km grid
    btk_params = BTKParameter()
    btk_bias_params = BTKParameter(temperature_gradient=-0.6, temperature_gradient_sd=0.25, sill=25.0, nugget=0.5, range=5000.0, zscale=20.0)
    bias_grid =  bayesian_kriging_temperature(bias_set, grid_1x1, ta.fixed_dt, btk_bias_params)
    # Correct forecasts by applying bias time-series on the grid
    fc_grid_improved = TemperatureSourceVector()
    for i in range(len(fc_grid)):
        fc_grid_improved.append(
            TemperatureSource(
                fc_grid[i].mid_point(),
                fc_grid[i].ts - bias_grid[i].ts # By convention, sub bias time-series(hmm..)
            )
        )

.. code-block:: python

    # Check the first value of the time-series. It should be around 15
    tx =ta.time(0)
    print("Comparison original synthetic grid cell [0]\n\t at the lower left corner,\n\t at t {}\n\toriginal grid: {}\n\timproved grid: {}\n\t vs bias grid: {}\n\t nearest obs: {}"
        .format(utc.to_string(tx),
                fc_grid[0].ts(tx),
                fc_grid_improved[0].ts(tx),
                bias_grid[0].ts(tx),
                obs_set[0].ts(tx)
                )
        )

Output:

.. code-block:: none

    Comparison original synthetic grid cell [0]
	 at the lower left corner,
	 at t 2016-01-01T00:00:00Z
	original grid: 18.985602994491913
	improved grid: 19.740020177551433
	 vs bias grid: -0.7544171830595209
	 nearest obs: 19.94

Presentation&Test: 8. Finally, Transform corrected forecasts from grid to  observation points to see if we did reach the goal of adjusting the forecast (IDW)
-------------------------------------------------------------------------------------------------------------------------------------------------------------

.. code-block:: python

    # Generate the corrected forecast set by Krieging transform of temperature model
    fc_at_observations_improved = idw_temperature(fc_grid_improved, obs_points, ta.fixed_dt, idw_params)
    fc_at_observations_raw =idw_temperature(fc_grid, obs_points, ta.fixed_dt, idw_params)

9. Plot the results
-------------------

.. code-block:: python

    # Make a time-series plot of temperature sets
    for i in range(len(bias_set)):
        fig, ax = plt.subplots(figsize=(20, 10))
        timestamps = [datetime.datetime.utcfromtimestamp(p) for p in obs_set[i].ts.time_axis.time_points]
        ax.plot(timestamps[:-1], obs_set[i].ts.values, label = str(i+1) + ' Observation')
        ax.plot(timestamps[:-1], fc_at_observations_improved[i].ts.values, label = str(i+1) + ' Forecast improved')
        ax.plot(timestamps[:-1], fc_at_observations_raw[i].ts.values,linestyle='--', label = str(i+1) + ' Forecast (raw)')
        #ax.plot(timestamps, bias_set[i].ts.values, label = str(i+1) + ' Bias')
        fig.autofmt_xdate()
        ax.legend(title='Temperature')
        ax.set_ylabel('Temp ($^\circ$C)')

.. image:: result_plot.png

.. image:: result_plot2.png

.. image:: result_plot3.png

.. image:: result_plot4.png


.. code-block:: python

    # Make a scatter plot of grid temperature forecasts at ts.value(0)
    x = [fc.mid_point().x for fc in bias_grid]
    y = [fc.mid_point().y for fc in bias_grid]
    fig, ax = plt.subplots(figsize=(10, 5))
    temps = np.array([bias.ts.value(0) for bias in bias_grid])
    plot = ax.scatter(x, y, c=temps, marker='o', s=500, lw=0)
    plt.colorbar(plot).set_label('Temp bias correction ($^\circ$C)')

.. image:: scatterplot.png