This page describes the data model of the UMH stack - from the message payloads up to database tables.
The Data Infrastructure of the UMH consists of three components: Connectivity, Unified Namespace, and Historian (see also Architecture). Each component has its own standards and best practices, so a consistent data model across
multiple building blocks needs to combine all of them.
If you would like to learn more about our data model & ADRs, check out our learn article.
Connectivity
Incoming data is often unstructured. Our standard therefore allows either conformant data in our _historian schema, or any kind of data in any other schema.
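As an illustration of conformant data, a tag value can be published as a small JSON payload on a hierarchical topic that ends in the _historian schema. The Go sketch below uses the Eclipse Paho MQTT client; the broker address, topic hierarchy, and tag name are placeholders, and the authoritative topic and payload conventions are those of the UMH data contract, so read this as a minimal sketch rather than a reference implementation.

```go
package main

import (
	"encoding/json"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	// Placeholder broker address and client ID; in a UMH installation this
	// would be the MQTT broker of the stack.
	opts := mqtt.NewClientOptions().AddBroker("tcp://localhost:1883").SetClientID("example-publisher")
	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		panic(token.Error())
	}

	// Example payload in the spirit of the _historian schema: a millisecond
	// timestamp plus one or more tag values.
	payload, _ := json.Marshal(map[string]interface{}{
		"timestamp_ms": time.Now().UnixMilli(),
		"temperature":  23.5,
	})

	// Illustrative ISA-95-style topic hierarchy ending in the _historian schema;
	// the actual topic layout is defined by the UMH data contract.
	client.Publish("umh/v1/acme/plant1/packaging/line4/_historian", 1, false, payload).Wait()
	client.Disconnect(250)
}
```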
Our key considerations were:
Event-driven architecture: we only look at changes, reducing network and system load (see the sketch after this list)
Ease of use: we allow any data in, so that OT & IT can process it as they wish
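To make the event-driven point concrete: instead of forwarding every poll of a sensor, a connector can forward a value only when it actually changes. The Go function below is a minimal sketch of that idea; the deadband threshold and the function name are illustrative and not part of any UMH API.

```go
package main

import "math"

// shouldForward reports whether a newly read value differs from the last
// forwarded value by more than the deadband and should therefore be sent
// on to the Unified Namespace. (Illustrative helper, not a UMH API.)
func shouldForward(lastSent, current, deadband float64) bool {
	return math.Abs(current-lastSent) > deadband
}
```

A connector would call this on every poll and publish (and remember the new value) only when it returns true, keeping the load proportional to the rate of change rather than the polling rate.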
Unified Namespace
The UNS employs MQTT and Kafka in a hybrid approach, utilizing MQTT for efficient data collection and Kafka for robust data processing.
The UNS is designed to be reliable, scalable, and maintainable, facilitating real-time data processing and seamless integration or removal of system components.
These elements are the foundation of our data model in the UNS:
Incoming data based on OT standards: data needs to be contextualized here not by IT people, but by OT people.
They want to model their data (topic hierarchy and payloads) according to ISA-95, Weihenstephaner Standard, Omron PackML, Euromap84, or similar standards, and need, e.g., JSON payloads to understand the data more easily.
Hybrid Architecture: combining MQTT’s user-friendliness and widespread adoption in Operational Technology (OT) with Kafka’s advanced processing capabilities.
Topics and payloads cannot be fully interchanged between the two due to limitations in MQTT and Kafka, so some trade-offs need to be made (see the topic-translation sketch after this list).
Processed data based on IT standards: after processing, data is sent to IT systems and needs to adhere to their standards: the data inside the UNS needs to be easy to process, either for contextualization or for storage in a Historian or Data Lake.
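One concrete example of those trade-offs is topic naming: MQTT uses / as a hierarchy separator, while Kafka topic names may only contain letters, digits, periods, underscores, and dashes. A bridge between the two therefore has to translate topic names, roughly as sketched below; this illustrates the general problem and is not the exact rule set of the UMH bridge.

```go
package main

import (
	"fmt"
	"strings"
)

// mqttToKafkaTopic sketches how an MQTT topic can be mapped onto a legal
// Kafka topic name: the leading slash is dropped and the remaining slashes
// become dots. (Illustrative only; a real bridge may apply further rules,
// e.g. for characters Kafka does not allow.)
func mqttToKafkaTopic(mqttTopic string) string {
	trimmed := strings.TrimPrefix(mqttTopic, "/")
	return strings.ReplaceAll(trimmed, "/", ".")
}

func main() {
	fmt.Println(mqttToKafkaTopic("umh/v1/acme/plant1/packaging/line4/_historian"))
	// Output: umh.v1.acme.plant1.packaging.line4._historian
}
```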
Historian
IT best practice: we use SQL and Postgres for easy compatibility, and therefore TimescaleDB
Straightforward queries: we aim for simple SQL queries, so that everyone can build dashboards (see the query sketch after this list)
Performance: because of the time-series nature of the data and the typical workload, the database layout is not fully optimized for usability; instead, we made trade-offs that allow it to store millions of data points per second
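To show what a straightforward query looks like in practice, the Go sketch below reads one hour of values for a single tag through the standard Postgres driver, which TimescaleDB is compatible with. The connection string, table name, and column names are placeholders, not the actual UMH schema, so treat this purely as an illustration of the query pattern.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq" // plain Postgres driver; TimescaleDB speaks the Postgres protocol
)

func main() {
	// Placeholder connection string for the stack's TimescaleDB instance.
	db, err := sql.Open("postgres", "postgres://user:password@localhost:5432/umh?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Placeholder table and column names: a time-series table of tag values.
	// The point is the pattern: plain SQL that any dashboard tool can issue.
	rows, err := db.Query(
		`SELECT timestamp, value
		   FROM tag
		  WHERE name = $1
		    AND timestamp > now() - interval '1 hour'
		  ORDER BY timestamp`,
		"temperature",
	)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var ts time.Time
		var value float64
		if err := rows.Scan(&ts, &value); err != nil {
			log.Fatal(err)
		}
		fmt.Println(ts, value)
	}
}
```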