What is Shyft?¶

Shyft is a cross-platform open source LGPL v3 framework for building high-performance operational modelling systems based on large time-series datasets.

The framework was developed at Statkraft in cooperation with the University of Oslo’s Department of Geosciences.

Shyft is used in 24×7 operational environments at Statkraft, where it forms a central part of the data management and modelling infrastructure supporting hydrological forecasting and energy market analysis.

Originally developed as a hydrological modelling toolbox, Shyft has evolved into a complete framework for operational modelling systems, combining high-performance libraries, distributed services, persistent storage, and stable protocols for data exchange between deployments.

The original architectural principle — combining a high-performance C++ core with Python-based orchestration — remains central to the framework. Over time the platform has expanded to support distributed services, enterprise-grade storage, versioned communication protocols, and scalable operational deployments.

Framework Architecture¶

The following diagram illustrates the current architecture of the Shyft framework and how its components interact.

        flowchart TD

    A["Domain Models<br/>Hydrology and Energy Market"]

    B["Python Interfaces<br/>Orchestration and analysis"]

    C["High-performance C++ Core<br/>Algorithms and data structures"]

    D["Distributed Services<br/>Compute nodes and DTSS replicas"]

    E["Persistent Storage<br/>RocksDB + versioned serialization"]

    F["Versioned Protocols<br/>Service and deployment boundaries"]

    G["Controlled Data Ingestion<br/>Collectors and publishers"]

    H["Operational Systems<br/>Forecasting, planning, dashboards"]

    A --> B
    B --> C
    C --> D
    C --> E
    D --> F
    G --> F
    F --> H
    E --> H

What Shyft has become¶

Today Shyft provides the core infrastructure required to design, operate and scale time-series driven modelling systems.

The framework combines:

High-performance modelling libraries implemented in modern C++
Python interfaces for orchestration, scripting and analysis
Scalable distributed microservice architecture supporting both distributed computation and replicated in-memory caching
Enterprise-grade persistent storage based on RocksDB
Backward-compatible object serialization enabling long-term storage stability
Stable versioned protocols for communication between services
Operational data-exchange mechanisms for transferring time-series between deployments
Architectural patterns for running modelling systems in production environments

Many Shyft services support dynamic scaling by allowing worker nodes to register with dispatch or controller services. This enables systems to increase compute capacity simply by adding additional nodes.

Scalable Service Architecture¶

Shyft services are designed to scale horizontally across multiple nodes. Both computational workloads and memory-backed caching can be distributed across service replicas.

        flowchart TD

    A["Dispatcher / controller"]

    B1["Compute node 1"]
    B2["Compute node 2"]
    B3["Compute node N"]

    C1["DTSS replica 1<br/>cache + local evaluation"]
    C2["DTSS replica 2<br/>cache + local evaluation"]
    C3["DTSS replica N<br/>cache + local evaluation"]

    D["Persistent storage<br/>RocksDB"]

    E["Users / applications<br/>models, dashboards, services"]

    A --> B1
    A --> B2
    A --> B3

    D --> C1
    D --> C2
    D --> C3

    E --> C1
    E --> C2
    E --> C3

Persistent Storage¶

Shyft includes high-performance persistent storage components designed for operational modelling systems.

The framework uses RocksDB, an embedded log-structured key–value database widely used in large-scale production systems.

RocksDB provides several important characteristics:

write-ahead logging and crash recovery
checksummed storage ensuring data integrity
efficient storage of large datasets
high throughput for both reads and writes
predictable performance under heavy workloads

These properties make it well suited for time-series workloads and long-running operational systems.

Long-Term Object Compatibility¶

Shyft uses Boost.Serialization to persist complex C++ objects in a backward compatible manner.

Serialized types are explicitly versioned so that objects written by previous releases can be automatically upgraded when read by newer versions of the framework.

Many common scientific file formats such as HDF5, NetCDF, JSON or similar file-based approaches provide convenient data exchange but typically lack strong guarantees regarding schema evolution, backward compatibility, and storage integrity.

By combining RocksDB with versioned Boost.Serialization, Shyft provides a storage foundation designed for long-lived operational datasets.

Operational Data Architecture¶

Controlled Data Ingestion¶

External data sources such as weather forecasts or market data services are often fragile and unsuitable as direct runtime dependencies.

Shyft therefore adopts a boundary-controlled ingestion pattern.

Dedicated collector and publisher processes are responsible for:

network reliability
supplier service availability
retries and recovery
normalization and validation of incoming data

Once data is collected and validated, it is distributed within the Shyft environment where internal services can consume it reliably and at scale.

Reliable Data Exchange Between Deployments¶

Shyft supports native, versioned transfer of time-series data between independent deployments of the Distributed Time Series Service (DTSS).

This allows data to be replicated or exchanged between operational environments while maintaining protocol compatibility.

The communication direction can be configured so that connections are initiated from the lower-security environment. This allows systems in higher-security zones to keep inbound firewall rules closed.

DTSS also supports publishing collections of time-series as queue elements for downstream consumers with both:

technical delivery guarantees
application-level acknowledgement semantics

This mechanism allows datasets to be distributed safely between clusters, namespaces, or data centres.

Foundational Architecture¶

The original Shyft architecture was built around a clear separation between a high-performance C++ core and Python-based orchestration.

This principle remains central to the framework and allows the system to combine computational efficiency with flexible analysis workflows.

../../_images/shyft_architecture.drawio.png

History¶

Shyft was initially created to replace proprietary software solutions that were difficult to modify and slow to adapt to new research findings.

The initial focus was hydrological inflow forecasting for hydropower systems.

In Norway, roughly 40–50% of annual energy production originates from snowpack, making accurate modelling of snow accumulation and melt essential for forecasting water inflow.

Key questions include:

How much water is currently stored in reservoirs and snowpack?
When will water enter the hydropower system?
How will weather conditions influence runoff?

Answering these questions requires efficient modelling of processes such as snow accumulation, groundwater flow and soil moisture dynamics.

These requirements led to the development of efficient time-series processing and distributed computation components that later evolved into the broader Shyft framework.

https://cdn.britannica.com/89/62689-050-66A0068F/water-hydrologic-cycle-land-surface-atmosphere-ocean.jpg