TimeSeries

After the successful creation of a TimeAxis, a Shyft TimeSeries can be instantiated. In Shyft, a TimeSeries is seen as a function f(t) that can be evaluated at any time point t. Evaluating the TimeSeries outside the range of its time axis returns NaN. Inside the intervals of the time axis, values are interpolated; how they are interpolated depends on the type (point interpretation) of the TimeSeries.

A concrete time series is instantiated by passing a TimeAxis (ta), a set of values (values), and the desired point interpretation (point_fx). The available point interpretations are POINT_INSTANT_VALUE and POINT_AVERAGE_VALUE, found in shyft.time_series.point_interpretation_policy.

As mentioned above, the time series value f(t) is defined by its numbers and point-interpretation within the TimeAxis total_period, and is NaN outside this definition interval.

From this it follows mathematically that a binary operation between two time series yields a time series whose time axis covers the intersection of the two operands' time axes. The result is therefore also NaN outside the resulting time axis.
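To make this concrete, here is a small plain-Python sketch that computes the period on which the result of a binary operation is defined. The helper `intersection` is hypothetical (not part of Shyft), and each total period is assumed to be a half-open interval [start, end) in seconds:

```python
def intersection(p1, p2):
    """Return the overlap of two half-open periods (start, end), or None if disjoint."""
    start = max(p1[0], p2[0])
    end = min(p1[1], p2[1])
    return (start, end) if start < end else None

HOUR = 3600
p1 = (0, 24 * HOUR)            # e.g. 24 hourly intervals starting at t=0
p2 = (12 * HOUR, 36 * HOUR)    # same length, shifted 12 hours
print(intersection(p1, p2))    # (43200, 86400): only the last 12 hours of p1
```

Outside the returned period at least one operand is NaN, so the result is NaN there as well.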

For all time-series representations, we strongly recommend using un-prefixed SI units.

E.g. W (watt), not MW (megawatt).

This greatly simplifies doing math with formulas, and also works well with the integral and derivative functions. Avoid anything but un-prefixed SI units until the presentation level (UI, or export to third-party system interfaces).

Recall that time in Shyft is simply a number, with the SI unit s (seconds).
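As a small illustration of why this pays off, the plain-Python sketch below (hypothetical numbers, no Shyft calls) stores power in W and time in s, so their product is directly in J; conversion to a prefixed unit such as MWh happens only at the very end, for presentation:

```python
power_w = 250e6        # 250 MW, stored un-prefixed as W (hypothetical value)
duration_s = 3600.0    # one hour, in seconds

# W * s = J: no conversion factors are needed inside the computation.
energy_j = power_w * duration_s

# Convert to a prefixed unit only at the presentation level (1 MWh = 3.6e9 J).
energy_mwh = energy_j / 3.6e9
print(f"{energy_j:.3e} J = {energy_mwh:.1f} MWh")   # 9.000e+11 J = 250.0 MWh
```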

Time series types

POINT_INSTANT_VALUE

A POINT_INSTANT_VALUE time series is linearly interpolated between the start and end points of each interval. The phrase ‘linear between points’ captures an important implication of this interpretation: at least two points are needed to define the line.

This representation is useful for signals that represent a ‘state’, like the observed water level at a point in time.

If we create a 4-interval time axis and input the values [0, 3, 1, 4] as shown below

from shyft.time_series import (
        TimeSeries, TimeAxis, Calendar, point_interpretation_policy
)

ta = TimeAxis(0, 1, 4)
values = [0, 3, 1, 4]

ts_instant = TimeSeries(
        ta=ta,
        values=values,
        point_fx=point_interpretation_policy.POINT_INSTANT_VALUE
        ) 

it produces a time series of the form

            t3_____t4
      t1     /
      /\    /
     /  \  /
    /    \/
t0 /     t2

Evaluating the time series at different time points shows how it interpolates between the points:

print(ts_instant(0))    # 0.0
print(ts_instant(1))    # 3.0
print(ts_instant(1.5))  # 2.0
print(ts_instant(3.5))  # 4.0
print(ts_instant(4.0))  # nan

Note

Note that in the last interval the value is held constant (a flat line), since there is no following point to interpolate towards.
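The evaluation rules above can be sketched in plain Python. This is a re-implementation of the behaviour shown in the printed values, not Shyft's own code; the function name `eval_instant` and the representation of the time axis as a list of n + 1 time points are assumptions for illustration:

```python
import bisect
import math

def eval_instant(times, values, t):
    """Evaluate a POINT_INSTANT_VALUE-like series at time t (sketch, not Shyft).

    times:  the n + 1 time points of the axis (interval starts plus the end of
            the last interval); values: one value per interval start.
    Linear between consecutive points, flat in the last interval, and NaN
    outside the half-open total period [times[0], times[-1]).
    """
    if not (times[0] <= t < times[-1]):
        return math.nan
    i = bisect.bisect_right(times, t) - 1      # interval with times[i] <= t < times[i+1]
    if i == len(values) - 1:
        return float(values[i])                # no next point to interpolate towards
    frac = (t - times[i]) / (times[i + 1] - times[i])
    return values[i] + frac * (values[i + 1] - values[i])

times = [0, 1, 2, 3, 4]
values = [0, 3, 1, 4]
for t in (0, 1, 1.5, 3.5, 4.0):
    print(t, eval_instant(times, values, t))   # 0.0, 3.0, 2.0, 4.0, nan
```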

POINT_AVERAGE_VALUE

A POINT_AVERAGE_VALUE time series has the same value over the whole interval. It is typically used for signals that represent an average or constant value over each interval, such as power produced [W] or water flow [m3/s].

ts_average = TimeSeries(
        ta=ta,
        values=values,
        point_fx=point_interpretation_policy.POINT_AVERAGE_VALUE
        )

which produces a stair-case time series of the form

               t3______t4
     t1_____    |
       |   |    |
       |   |    |
       |   |____|
t0_____|   t2

Evaluating the average time series at the same points as the instant series shows how the two interpretations differ:

print(ts_average(0))    # 0.0
print(ts_average(1))    # 3.0
print(ts_average(1.5))  # 3.0
print(ts_average(3.5))  # 4.0
print(ts_average(4.0))  # nan
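For comparison, a plain-Python sketch of the stair-case rule: the value of the interval containing t is returned unchanged, and NaN outside the total period. As before, `eval_average` and the list-of-time-points representation are illustrative assumptions, not Shyft's API:

```python
import bisect
import math

def eval_average(times, values, t):
    """Evaluate a POINT_AVERAGE_VALUE-like series: constant on each interval (sketch)."""
    if not (times[0] <= t < times[-1]):
        return math.nan                        # outside [times[0], times[-1])
    i = bisect.bisect_right(times, t) - 1      # interval with times[i] <= t < times[i+1]
    return float(values[i])

times = [0, 1, 2, 3, 4]
values = [0, 3, 1, 4]
for t in (0, 1, 1.5, 3.5, 4.0):
    print(t, eval_average(times, values, t))   # 0.0, 3.0, 3.0, 4.0, nan
```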

Inspection functions

There are a few utility functions for inspecting a time series.

TimeSeries(t: int/float/time)

Calling the time series with an int, float or time object evaluates it at that time point and returns the value.

ta = TimeAxis(0, 1, 4)
values = [0, 3, 1, 4]

ts = TimeSeries(
        ta=ta,
        values=values,
        point_fx=point_interpretation_policy.POINT_AVERAGE_VALUE
        )

print(ts(1)) # 3.0

point_interpretation()

Returns the point interpretation of the time series.

print(ts.point_interpretation()) # POINT_AVERAGE_VALUE

size()

Returns the number of intervals in the time series.

print(ts.size()) # 4

time_axis

Returns the time axis of the time series.

print(ts.time_axis) # TimeAxis('1970-01-01T00:00:00Z', 1s, 4)

values.to_numpy()

Returns the values of the time series as a NumPy array.

print(ts.values.to_numpy()) # [0. 3. 1. 4.]

General time series manipulation

For a comprehensive list of available functions, see shyft.time_series.TimeSeries(). Time series in Shyft are treated as mathematical expressions rather than indexed values, which makes manipulating them somewhat different from working with NumPy arrays or pandas Series. We first define a small convenience function that prints the values of a time series.

from shyft.time_series import (
        time, TimeSeries, TimeAxis, POINT_AVERAGE_VALUE as stair_case,
        POINT_INSTANT_VALUE as linear, Calendar,
        FORWARD as d_forward, BACKWARD as d_backward, CENTER as d_center
        )


def show_values(text: str, ts: TimeSeries):
    print(f'{text}:\n {ts.values.to_numpy()}')

Arithmetics

Basic arithmetic with time series is as simple as arithmetic with numbers.


HOUR = time(3600)
values = list(range(24))

ta = TimeAxis(start=time('2021-01-01T00:00:00Z'), delta_t=HOUR, n=len(values))
ts = TimeSeries(ta=ta, values=values, point_fx=stair_case)

show_values('Initial TimeSeries', ts)
# Initial TimeSeries:
# [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
# 18. 19. 20. 21. 22. 23.]

ts_addition = ts + 2
ts_multiplication = ts*2

show_values('TimeSeries + 2', ts_addition)
#TimeSeries + 2:
# [ 2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.
# 20. 21. 22. 23. 24. 25.]

show_values('TimeSeries*2', ts_multiplication)
#TimeSeries*2:
# [ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32. 34.
# 36. 38. 40. 42. 44. 46.]

We can also apply the same arithmetic to two time series:


values.reverse()
ts_reversed = TimeSeries(ta=ta, values=values, point_fx=stair_case)

show_values('TimeSeries + TimeSeries', ts + ts_reversed)
#TimeSeries + TimeSeries:
# [23. 23. 23. 23. 23. 23. 23. 23. 23. 23. 23. 23. 23. 23. 23. 23. 23. 23.
# 23. 23. 23. 23. 23. 23.]

show_values('TimeSeries*TimeSeries', ts*ts_reversed)
#TimeSeries*TimeSeries:
# [  0.  22.  42.  60.  76.  90. 102. 112. 120. 126. 130. 132. 132. 130.
# 126. 120. 112. 102.  90.  76.  60.  42.  22.   0.]

Care is needed when the operands have different time axes. If two time series overlap only partially in time, the result is defined only on the overlapping intervals.


values.reverse()
ta1 = TimeAxis(start=time('2021-01-01T00:00:00Z'), delta_t=HOUR, n=len(values))
# time axis shifted 12 hours
ta2 = TimeAxis(start=time('2021-01-01T12:00:00Z'), delta_t=HOUR, n=len(values))

ts1 = TimeSeries(ta=ta1, values=values, point_fx=stair_case)
ts2 = TimeSeries(ta=ta2, values=values, point_fx=stair_case)
ts_addition = ts1 + ts2

show_values('TimeSeries.time_axis1 + TimeSeries.time_axis2', ts_addition)
#TimeSeries.time_axis1 + TimeSeries.time_axis2:
# [12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32. 34.]

print(f'{"t":15}{"ts1":10}{"ts2":10}{"ts1 + ts2":10}')
for t in map(int, sorted(set(list(ta1.time_points) + list(ta2.time_points)))):
    print(f'{t:<15}{ts1(t):<10}{ts2(t):<10}{ts_addition(t):<10}')

#t              ts1       ts2       ts1 + ts2 
#1609459200     0.0       nan       nan       
#1609462800     1.0       nan       nan       
#1609466400     2.0       nan       nan       
#1609470000     3.0       nan       nan       
#1609473600     4.0       nan       nan       
#1609477200     5.0       nan       nan       
#1609480800     6.0       nan       nan       
#1609484400     7.0       nan       nan       
#1609488000     8.0       nan       nan       
#1609491600     9.0       nan       nan       
#1609495200     10.0      nan       nan       
#1609498800     11.0      nan       nan       
#1609502400     12.0      0.0       12.0      
#1609506000     13.0      1.0       14.0      
#1609509600     14.0      2.0       16.0      
#1609513200     15.0      3.0       18.0      
#1609516800     16.0      4.0       20.0      
#1609520400     17.0      5.0       22.0      
#1609524000     18.0      6.0       24.0      
#1609527600     19.0      7.0       26.0      
#1609531200     20.0      8.0       28.0      
#1609534800     21.0      9.0       30.0      
#1609538400     22.0      10.0      32.0      
#1609542000     23.0      11.0      34.0      
#1609545600     nan       12.0      nan       
#1609549200     nan       13.0      nan       
#1609552800     nan       14.0      nan       
#1609556400     nan       15.0      nan       
#1609560000     nan       16.0      nan       
#1609563600     nan       17.0      nan       
#1609567200     nan       18.0      nan       
#1609570800     nan       19.0      nan       
#1609574400     nan       20.0      nan       
#1609578000     nan       21.0      nan       
#1609581600     nan       22.0      nan       
#1609585200     nan       23.0      nan       
#1609588800     nan       nan       nan       

Since ts2 is not defined during the first 12 hours (intervals 0 to 11 of ts1), the resulting time series is not defined there either. Likewise, the result is undefined during the last 12 hours, where ts1 is not defined.

Mathematical operations

There are many built-in functions to help with manipulating time series in Shyft. These examples are not exhaustive, so please refer to the documentation on shyft.time_series.TimeSeries() for a complete list.

Average

The average function takes a single argument, a time axis. The resulting expression (still a time series) yields the true average over each period of that time axis.

The value of each interval of the resulting time series equals the true average of the non-nan sections of that interval. For the i’th interval, ranging from period(i).start to period(i).end:

\[\begin{split}\begin{align*} TS(t_i) &= \frac{1}{p_{notnan}(i)}\int_{t=period(i).start}^{t=period(i).end}ts(t)\,dt\\ \text{where}&\\ TS &= \text{true average time series}\\ ts &= \text{original time series}\\ p_{notnan}(i) &= \text{the period in seconds where ts(t) is not nan}\\ \end{align*}\end{split}\]
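The formula can be checked with a small plain-Python sketch for a stair-case source series with fixed-length intervals. The function `true_average` is hypothetical and only mirrors the definition above (NaN sections are excluded from both the integral and the divisor p_notnan); it is not how Shyft implements average():

```python
import math

def true_average(starts, dt, values, period_start, period_end):
    """True average of a stair-case series over [period_start, period_end).

    starts: start time of each source interval (all of length dt); values: one per
    interval. NaN sections are left out of both the integral and p_notnan.
    """
    integral = 0.0
    p_notnan = 0.0
    for s, v in zip(starts, values):
        # Overlap (in seconds) of the source interval [s, s + dt) with the target period.
        overlap = min(s + dt, period_end) - max(s, period_start)
        if overlap <= 0 or math.isnan(v):
            continue
        integral += v * overlap
        p_notnan += overlap
    return integral / p_notnan if p_notnan > 0 else math.nan

HOUR = 3600
starts = [i * HOUR for i in range(48)]
values = [float(i) for i in range(48)]
print(true_average(starts, HOUR, values, 0, 24 * HOUR))          # 11.5
print(true_average(starts, HOUR, values, 24 * HOUR, 48 * HOUR))  # 35.5

values[0] = math.nan   # a missing hour is excluded instead of counted as zero
print(true_average(starts, HOUR, values, 0, 24 * HOUR))          # 12.0
```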

Example: we create two time axes, both spanning one week, one with daily and one with hourly resolution. We then create two hourly time series with linearly increasing values, one stair case and one linear, and average both over the daily time axis.

t0 = time('2021-01-01T00:00:00Z')
ta_hourly = TimeAxis(start=t0, delta_t=HOUR, n=24*7)
ta_daily = TimeAxis(start=t0, delta_t=24*HOUR, n=7)

values = [i for i in range(ta_hourly.size())]
ts_stairs = TimeSeries(ta=ta_hourly, values=values, point_fx=stair_case)
ts_linear = TimeSeries(ta=ta_hourly, values=values, point_fx=linear)

show_values('Original stairs', ts_stairs)
#Original stairs:
# [  0.   1.   2.   3.   4. ... 163. 164. 165. 166. 167.]

show_values('Original linear', ts_linear)
#Original linear:
# [  0.   1.   2.   3.   4. ... 163. 164. 165. 166. 167.]

ts_stairs_avg = ts_stairs.average(ta_daily)
ts_linear_avg = ts_linear.average(ta_daily)

show_values('Daily stairs average', ts_stairs_avg)
#Daily stairs average:
# [ 11.5  35.5  59.5  83.5 107.5 131.5 155.5]

show_values('Daily linear average', ts_linear_avg)
#Daily linear average:
# [ 12.   36.   60.   84.  108.  132.  155.5]

Note

A time series created by averaging always has the stair-case point interpretation (POINT_AVERAGE_VALUE), since by definition each of its values is the true average of its interval.

print(f'Point interpretation of ts_stairs_avg: {ts_stairs_avg.point_interpretation()}\n'
      f'Point interpretation of ts_linear_avg: {ts_linear_avg.point_interpretation()}')
#Point interpretation of ts_stairs_avg: POINT_AVERAGE_VALUE
#Point interpretation of ts_linear_avg: POINT_AVERAGE_VALUE

Accumulate

Accumulate takes a time axis as input and returns a new time series where the i’th value is the integral of non-nan fragments from t0 to ti.

\[\begin{split}\begin{align*} TS(t_i) &= \int_{t=t_0}^{t=t_i}ts(t)dt\\ \text{where}&\\ TS &= \text{Accumulated time series}\\ ts &= \text{original time series}\\ \end{align*}\end{split}\]
ts_linear_hourly_acc = ts_linear.accumulate(ta_hourly)
ts_linear_daily_acc = ts_linear.accumulate(ta_daily)

show_values('Hourly accumulation', ts_linear_hourly_acc/HOUR)
#Hourly accumulation:
# [0.00 0.50 2.00 4.50 8.00 12.50
#  ... 
#  13122.0 13284.5 13448.0 13612.5 13778.0 13944.5]

show_values('Daily accumulation', ts_linear_daily_acc/(HOUR*24))
#Daily accumulation:
# [  0.  12.  48. 108. 192. 300. 432.]
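For a linear (POINT_INSTANT_VALUE) series, the integral between two consecutive points is a trapezoid, so the hourly accumulation above can be reproduced with a running sum. This plain-Python sketch (the helper name `accumulate_linear` is hypothetical) only covers the values at the source time points:

```python
def accumulate_linear(values, dt):
    """Integral from t0 to each time point t_i of a piecewise-linear series.

    values[i] is the value at t_i = t0 + i*dt; between points the series is
    linear, so each step adds the area of a trapezoid.
    """
    acc = [0.0]
    for v0, v1 in zip(values, values[1:]):
        acc.append(acc[-1] + 0.5 * (v0 + v1) * dt)
    return acc

HOUR = 3600
values = list(range(24 * 7))                 # hourly values 0..167, as in ts_linear
acc = accumulate_linear(values, HOUR)
print([a / HOUR for a in acc[:6]])           # [0.0, 0.5, 2.0, 4.5, 8.0, 12.5]
print(acc[-1] / HOUR)                        # 13944.5
print(acc[24] / (24 * HOUR))                 # 12.0, the second daily value
```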

Note

Integral operations on Shyft time series use dt in seconds, the SI unit of time. This implies that if the source unit of the time series is W (watt), integrating or accumulating it gives the correct unit W·s = (J/s)·s = J (joule).

Derivative

We can compute the forward, backward or center derivative of a time series. As with accumulate, the operation works in seconds, so the resulting unit follows the standard SI system.

For example, if a time series has the SI unit J (joule), applying the .derivative() function gives J/s (joules per second), i.e. W (watt).

show_values('Daily derivative forward', ts_stairs_avg.derivative(d_forward)*(HOUR*24))
#Daily derivative forward:
# [24. 24. 24. 24. 24. 24.  0.]

show_values('Daily derivative backward', ts_stairs_avg.derivative(d_backward)*(HOUR*24))
#Daily derivative backward:
# [ 0. 24. 24. 24. 24. 24. 24.]

show_values('Daily derivative center', ts_stairs_avg.derivative(d_center)*(HOUR*24))
#Daily derivative center:
# [12. 24. 24. 24. 24. 24. 12.]
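The three variants are ordinary finite differences. The plain-Python sketch below (hypothetical helper name) reproduces the printed daily results from the averaged values 11.5, 35.5, ... The edge handling, where a missing neighbour is replaced by the value itself, is an assumption inferred from the output above, not taken from Shyft's source:

```python
def derivatives(values, dt):
    """Forward, backward and center differences of a series with step dt (seconds).

    Assumed edge handling: a missing neighbour is replaced by the value itself, so
    forward is 0 at the end, backward is 0 at the start, and center is halved at both ends.
    """
    n = len(values)
    nxt = [values[min(i + 1, n - 1)] for i in range(n)]
    prv = [values[max(i - 1, 0)] for i in range(n)]
    forward = [(b - a) / dt for a, b in zip(values, nxt)]
    backward = [(a - b) / dt for a, b in zip(values, prv)]
    center = [(b - a) / (2 * dt) for a, b in zip(prv, nxt)]
    return forward, backward, center

DAY = 24 * 3600
daily_avg = [11.5, 35.5, 59.5, 83.5, 107.5, 131.5, 155.5]
fwd, bwd, ctr = derivatives(daily_avg, DAY)   # per-second slopes (SI)
# Scale to "per day" for display, as the examples above do with *(HOUR*24).
print([round(x * DAY, 9) for x in fwd])   # [24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 0.0]
print([round(x * DAY, 9) for x in bwd])   # [0.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0]
print([round(x * DAY, 9) for x in ctr])   # [12.0, 24.0, 24.0, 24.0, 24.0, 24.0, 12.0]
```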

Integral

We can integrate a time series over a specified time axis. As with the average function, only the non-nan sections of each interval of the time axis are considered.

\[\begin{split}\begin{align*} TS(T) &= \int_{t=T_{\text{start}}}^{T_{\text{end}}}ts(t)dt && \text{if } ts(t) \neq NaN\\ \text{where}&\\ TS &= \text{integrated time series}\\ ts &= \text{original time series}\\ T &= \text{period to integrate over}\\ \text{note: the value computed is for the non-nan sections of the interval}\\ \end{align*}\end{split}\]
show_values('Daily integral', ts_stairs.integral(ta=ta_daily)/(HOUR*24))
#Daily integral:
# [ 11.5  35.5  59.5  83.5 107.5 131.5 155.5]
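Because the integral skips NaN sections, dividing it by the full period length (as done above with HOUR*24) gives the average only when the interval contains no NaN. A small plain-Python sketch (hypothetical helper, stair-case values assumed) illustrates the difference:

```python
import math

HOUR = 3600
DAY = 24 * HOUR

def integral(values, dt):
    """Integral of stair-case values (one per interval of length dt), skipping NaN."""
    return sum(v * dt for v in values if not math.isnan(v))

day = [float(i) for i in range(24)]           # hourly values 0..23, no NaN
print(integral(day, HOUR) / DAY)              # 11.5, equal to the true average

day[23] = math.nan                            # the last hour is missing
not_nan_seconds = HOUR * sum(1 for v in day if not math.isnan(v))
print(round(integral(day, HOUR) / DAY, 3))    # 10.542: the NaN hour acts like zero
print(integral(day, HOUR) / not_nan_seconds)  # 11.0: the true average
```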

Statistics

With the statistics function we can directly compute a given percentile p of the values within each period of a time axis.

show_values('Daily 10 percentile', ts_linear.statistics(ta=ta_daily, p=10))
#Daily 10 percentile:
# [  2.3  26.3  50.3  74.3  98.3 122.3 146.3]

show_values('Daily 50 percentile', ts_linear.statistics(ta=ta_daily, p=50))
#Daily 50 percentile:
# [ 11.5  35.5  59.5  83.5 107.5 131.5 155.5]

show_values('Daily 90 percentile', ts_linear.statistics(ta=ta_daily, p=90))
#Daily 90 percentile:
# [ 20.7  44.7  68.7  92.7 116.7 140.7 164.7]
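The printed numbers match a percentile with linear interpolation between the sorted hourly point values of each day. The plain-Python sketch below reproduces them under that assumption; it is inferred from the output, and the helper `percentile` is hypothetical, not Shyft's implementation:

```python
import math

def percentile(data, p):
    """p-th percentile (0-100) with linear interpolation between sorted samples."""
    xs = sorted(data)
    pos = p / 100 * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

hourly_values = list(range(24 * 7))           # the hourly point values 0..167
for p in (10, 50, 90):
    daily = [percentile(hourly_values[d * 24:(d + 1) * 24], p) for d in range(7)]
    print(p, [round(x, 1) for x in daily])
# 10 [2.3, 26.3, 50.3, 74.3, 98.3, 122.3, 146.3]
# 50 [11.5, 35.5, 59.5, 83.5, 107.5, 131.5, 155.5]
# 90 [20.7, 44.7, 68.7, 92.7, 116.7, 140.7, 164.7]
```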

Utility functions

Here is a small collection of helpful functions when manipulating or extracting information from time series.

Inside

This function creates a new time series indicating whether each value lies inside or outside a defined range. We can set the minimum and maximum of the range, the value to use where the input is NaN, and the values to use inside and outside the range. By default, NaN stays NaN, values inside the range become 1 and values outside become 0.

\[\begin{split}\begin{equation*} TS(t) = \begin{cases} 1, &\text{if } \text{min\_v} \leq ts(t) < \text{max\_v}\\ 0, &\text{otherwise} \end{cases} \end{equation*}\end{split}\]
t0 = time('2021-01-01T00:00:00Z')
ta = TimeAxis(start=t0, delta_t=HOUR, n=10)
values = [i*10 for i in range(ta.size())]
ts = TimeSeries(ta=ta, values=values, point_fx=linear)

show_values('Smaller than 50', ts.inside(min_v=float('nan'), max_v=50))
#Smaller than 50:
# [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]

show_values('Larger than or equal to 50', ts.inside(min_v=50, max_v=float('nan')))
#Larger than or equal to 50:
# [0. 0. 0. 0. 0. 1. 1. 1. 1. 1.]

To leave the range unbounded below or above, we set min_v or max_v to NaN.

show_values('Between 25 and 65', ts.inside(min_v=25, max_v=65, inside_v=10, outside_v=20))
#Between 25 and 65:
# [20. 20. 20. 10. 10. 10. 10. 20. 20. 20.]

Here we check whether values are inside the range 25 to 65, mapping inside values to 10 and outside values to 20.
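The rule in the equation above, including the NaN-means-no-limit convention, can be sketched in plain Python. The parameter names mirror those used in the examples (min_v, max_v, inside_v, outside_v); `nan_v` is an assumed name for the replacement used where the input is NaN, and the function itself is hypothetical, not Shyft's implementation:

```python
import math

def inside(values, min_v=math.nan, max_v=math.nan, nan_v=math.nan,
           inside_v=1.0, outside_v=0.0):
    """Map each value to inside_v if min_v <= v < max_v, else outside_v.

    A NaN bound means 'no limit' on that side; NaN input values map to nan_v.
    """
    out = []
    for v in values:
        if math.isnan(v):
            out.append(nan_v)
            continue
        above_min = math.isnan(min_v) or v >= min_v
        below_max = math.isnan(max_v) or v < max_v
        out.append(inside_v if above_min and below_max else outside_v)
    return out

values = [float(i * 10) for i in range(10)]
print(inside(values, max_v=50))   # [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(inside(values, min_v=25, max_v=65, inside_v=10, outside_v=20))
# [20, 20, 20, 10, 10, 10, 10, 20, 20, 20]
```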

Max/min

These functions return a new time series where each value is the maximum (or minimum) of the given number and the corresponding value of the time series.

\[\begin{split}\begin{equation*} max_v = 10\\ TS(t) = \begin{cases} 10, &\text{if } ts(t) \leq 10\\ ts(t) &\text{if } ts(t) \gt 10 \end{cases} \end{equation*}\end{split}\]
\[\begin{split}\begin{equation*} min_v = 10\\ TS(t) = \begin{cases} ts(t), &\text{if } ts(t) \leq 10\\ 10 &\text{if } ts(t) \gt 10 \end{cases} \end{equation*}\end{split}\]
show_values('Max of 40', ts.max(number=40))
#Max of 40:
# [40. 40. 40. 40. 40. 50. 60. 70. 80. 90.]

show_values('Min of 40', ts.min(number=40))
#Min of 40:
# [ 0. 10. 20. 30. 40. 40. 40. 40. 40. 40.]
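On plain lists the same element-wise rule is a one-liner. The helpers below are hypothetical stand-ins for ts.max(number=...) and ts.min(number=...), and NaN handling is deliberately not modelled in this sketch:

```python
def ts_max(values, number):
    """Larger of each value and number (sketch of ts.max(number=...), no NaN handling)."""
    return [max(v, float(number)) for v in values]

def ts_min(values, number):
    """Smaller of each value and number (sketch of ts.min(number=...), no NaN handling)."""
    return [min(v, float(number)) for v in values]

values = [float(i * 10) for i in range(10)]
print(ts_max(values, 40))  # [40.0, 40.0, 40.0, 40.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]
print(ts_min(values, 40))  # [0.0, 10.0, 20.0, 30.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0]
```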

Time shift

This function shifts the time series in time by dt: forward for a positive dt and backward for a negative dt.

ts_hour_shift = ts.time_shift(HOUR)

show_values('Shifted time series', ts_hour_shift)
#Shifted time series:
# [ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90.]

print(ts.time_axis)
#TimeAxis('2021-01-01T00:00:00Z', 3600s, 10)
print(ts_hour_shift.time_axis)
#TimeAxis('2021-01-01T01:00:00Z', 3600s, 10)

As we can see, the values stay the same, but the time axis has been shifted one hour forward.
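In other words, the shifted series takes at t + dt the value the original had at t. The plain-Python sketch below (hypothetical helper; time points as epoch seconds) shows that only the time points move:

```python
from datetime import datetime, timezone

HOUR = 3600
t0 = int(datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp())   # 1609459200
time_points = [t0 + i * HOUR for i in range(10)]
values = [float(i * 10) for i in range(10)]

def time_shift(time_points, dt):
    """Shift every time point by dt seconds; the values are left untouched."""
    return [t + dt for t in time_points]

shifted = time_shift(time_points, HOUR)
print(datetime.fromtimestamp(shifted[0], tz=timezone.utc).isoformat())
# 2021-01-01T01:00:00+00:00

original = dict(zip(time_points, values))   # time point -> value
moved = dict(zip(shifted, values))
print(moved[t0 + HOUR] == original[t0])     # True
```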