Data Contracts / API

This page describes how messages flow in the UMH, which message goes where, how it has to be formatted and how you can create your own structures.

1: Historian Data Contract

2: Custom Data Contracts

What are Data Contracts

Data Contracts are agreements that define how data is structured, formatted, and managed when different parts of a Unified Namespace (UNS) architecture communicate. They cover metadata, data models, and service levels to ensure that all systems work together smoothly and reliably.

Simply put, data contracts specify where a message is going, the format it must follow, how it’s delivered, and what happens when it arrives - all based on agreed-upon rules and services. It is similar to an API: you send a specific message, and it triggers a predefined action. For example, sending data to _historian automatically stores it in TimescaleDB, just like how a REST API’s POST endpoint would store data in its database.

Example Historian

To give you a simple example, just think about the _historian schema. Perhaps without realizing it, you have already used the Historian Data Contract by using this schema.

Whenever you send a message to a topic that contains the _historian schema via MQTT, you know that it will be bridged to Kafka and end up in TimescaleDB. You could also send it directly into Kafka, and you know that it gets bridged to MQTT as well.

But you also know that you have to follow the correct payload and topic structure that we as UMH have defined. If there are any issues like a missing timestamp in the message, you know that you could look them up in the Management Console.

These rules ensure that the data can be written into the intended database tables without causing errors, and that the data can be read by other programs, as it is known what data and structure to expect.

For example, the timestamp is an easy way to avoid errors by making each message idempotent (can be safely processed multiple times without changing the result). Each data point associated with a tag is made completely unique by its timestamp, which is critical because messages are sent using “at least once” semantics, which can lead to duplicates. With idempotency, duplicate messages are ignored, ensuring that each message is only stored once in the database.
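The effect of the timestamp on duplicates can be sketched in a few lines of Python. This is only an illustration: the in-memory dictionary and the function name store_once are assumptions made for this example, not how the UMH actually writes to TimescaleDB.

```python
# Illustrative only: an in-memory stand-in for a table keyed by (tag, timestamp).
def store_once(table: dict, tag: str, timestamp_ms: int, value) -> bool:
    """Store a data point unless the same tag and timestamp already exist."""
    key = (tag, timestamp_ms)
    if key in table:
        return False  # duplicate delivery, the stored data does not change
    table[key] = value
    return True


table = {}
first = store_once(table, "temperature", 1732280023697, 42)
second = store_once(table, "temperature", 1732280023697, 42)  # redelivered message
```

Processing the redelivered message leaves the table unchanged, which is what makes "at least once" delivery safe here.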

For more background and the reasoning behind this approach, we recommend our article about Data Modeling in the UNS on our Learn page.

Rules of a Data Contract

Data Contracts can enforce a number of rules. This section provides an overview of the two rules that are enforced by default. The specifics can vary between Data Contracts; therefore, detailed information about the Historian Data Contract and Custom Data Contracts is provided on their respective pages.

Topic Structure

As mentioned in the example, messages in the UMH must follow our ISA-95 compliant structure in order to be processed. The structure itself can be divided into several sections.

flowchart LR
    umh --> v1
    v1 --> enterprise
    enterprise -->|Optional| site
    site -->|Optional| area
    area -->|Optional| productionLine
    productionLine -->|Optional| workCell
    workCell -->|Optional| originID
    originID -.-> _schema["_schema (Ex: _historian, _custom)"]
    _schema -->_opt["Schema dependent context"]
    classDef mqtt fill:#00dd00,stroke:#333,stroke-width:4px;
    class umh,v1,enterprise,_schema mqtt;
    classDef optional fill:#77aa77,stroke:#333,stroke-width:4px;
    class site,area,productionLine,workCell,originID optional;
    enterprise -.-> _schema
    site -.-> _schema
    area -.-> _schema
    productionLine -.-> _schema
    workCell -.-> _schema

Prefix

The first section is the mandatory prefix: umh.v1. It ensures that the structure can evolve over time without causing confusion or compatibility problems.

Location

The next section is the Location, which consists of six parts: enterprise.site.area.productionLine.workCell.originID.

You may be familiar with this structure as it is used by your instances and connections. Here the enterprise field is mandatory.

When you create a Protocol Converter, it uses the Location of the instance and the connection to prefill the topic, but you can add the unused ones or change the prefilled parts.

Schemas

The schema, for example _historian, tells the UMH which data contract to apply to the message. It is specified after the Location section and is highlighted with an underscore to make it parsable for the UMH and to clearly separate it from the location fields.

There is currently only one default schema in the UMH: _historian; for more detailed information, see the Historian Data Contract page.

To add your own custom schemas, you need to add a Custom Data Contract.

Schema Dependent Content

Depending on the schema used, the next parts of the topic may differ. For example, in the _historian schema, you can either attach your payload directly or continue to group tags.

Allowed Characters

Topics can consist of any letters (a-z, A-Z), numbers (0-9), and the symbols - and _. Note that _ cannot be used as the first character of a part in the Location section.

Be careful to avoid ., +, #, or / as these are special symbols in Kafka or MQTT.

Note that our topics are case-sensitive, so umh.v1.ACMEIncorporated is not the same as umh.v1.acmeincorporated.
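The topic rules above can be approximated with a small check. The following is a minimal sketch, not the validator used by the UMH: the function name validate_topic, the regular expressions, and the assumption that parts after the schema use the same character set are all made up for illustration.

```python
import re

LOCATION_PART = re.compile(r"[A-Za-z0-9-][A-Za-z0-9_-]*")  # must not start with "_"
SCHEMA_PART = re.compile(r"_[A-Za-z0-9_-]+")               # e.g. _historian
CONTEXT_PART = re.compile(r"[A-Za-z0-9_-]+")               # assumption: same character set


def validate_topic(topic: str) -> list[str]:
    """Return a list of problems; an empty list means the topic passed these checks."""
    parts = topic.split(".")
    if parts[:2] != ["umh", "v1"]:
        return ["topic must start with the prefix umh.v1"]
    rest = parts[2:]
    # The schema is taken to be the first part that starts with an underscore.
    schema_at = next((i for i, p in enumerate(rest) if p.startswith("_")), None)
    if schema_at is None:
        return ["topic has no schema (a part starting with '_')"]
    location, schema, context = rest[:schema_at], rest[schema_at], rest[schema_at + 1:]
    errors = []
    if not 1 <= len(location) <= 6:
        errors.append("Location needs 1 to 6 parts, starting with enterprise")
    errors += [f"invalid Location part: {p!r}" for p in location
               if not LOCATION_PART.fullmatch(p)]
    if not SCHEMA_PART.fullmatch(schema):
        errors.append(f"invalid schema: {schema!r}")
    errors += [f"invalid part after schema: {p!r}" for p in context
               if not CONTEXT_PART.fullmatch(p)]
    return errors
```

For example, validate_topic("umh.v1.acme._historian") returns an empty list, while a topic without the umh.v1 prefix or with a + inside a Location part returns a description of the problem.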

Payload Structure

A Data Contract can include payload rules. For example, in the Historian Data Contract, you must include a timestamp in milliseconds and a key-value pair.

These requirements are unique to each Data Contract.

Components of a Data Contract

In addition to the rules, a Data Contract consists of individual components. The specifics can vary between Data Contracts; therefore, detailed information about the Historian Data Contract and Custom Data Contracts is provided on their respective pages.

Data Flow Components

As the name implies, a Data Flow Component manages the movement and transformation of data within the Unified Namespace architecture. Data Flow Components can be of three different types: Protocol Converter, Data Bridge, or Custom Data Flow Component. All are based on BenthosUMH.

Protocol Converter

You have probably already created a Protocol Converter and are familiar with its purpose: get data from different sources into your instances. You format the data into the correct payload structure and send it to the correct topics. When you add a Protocol Converter, the Management Console uses the configuration of the underlying Connection and instance to automatically generate most of the configuration for the Protocol Converter.

Data Bridges

Data Bridges are placed between two components of the Unified Namespace, such as Kafka and MQTT, and allow messages to be passed between them. The default Data Bridges are the two between MQTT and Kafka for the _historian schema, and the bridge between Kafka and the database. Each Data Bridge is unidirectional and specific to one schema.
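To illustrate what "unidirectional and specific to one schema" means, here is a minimal sketch of the decision a bridge from MQTT to Kafka has to make. It is not the BenthosUMH configuration of the real bridges. The function names are hypothetical, and the mapping of / to . as the separator is an assumption based on the two topic notations used on this page.

```python
from typing import Optional


def schema_of(kafka_topic: str) -> Optional[str]:
    """Return the first topic part that starts with an underscore, if any."""
    return next((p for p in kafka_topic.split(".") if p.startswith("_")), None)


def bridge_mqtt_to_kafka(mqtt_topic: str, schema: str = "_historian") -> Optional[str]:
    """Return the Kafka topic to forward to, or None if this bridge ignores the message."""
    kafka_topic = mqtt_topic.replace("/", ".")  # assumed separator mapping
    return kafka_topic if schema_of(kafka_topic) == schema else None


forwarded = bridge_mqtt_to_kafka("umh/v1/acme/_historian/head")
ignored = bridge_mqtt_to_kafka("umh/v1/acme/_custom/head")
```

A bridge for the opposite direction would be a second, separate component with the mapping reversed.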

Custom Data Flow Components

To meet everyone’s needs and enable stream processing, you can add Custom Data Flow Components (creative naming is our passion). Unlike Protocol Converters or Data Bridges, you have full control over their configuration, which makes them incredibly versatile, but also complicated to set up. Therefore, they must be manually enabled by switching to Advanced Mode in the Management Console Settings.

Other Data Contracts

Data Contracts can build on existing contracts. For example, if you use a Custom Data Contract to automatically calculate KPIs, you can send the raw data to _historian, process it with a Custom Data Flow Component, and publish it to a new schema. The new Data Contract uses the Historian to collect data from the machines and store it in the database.

1 - Historian Data Contract

This page is a deep dive into the Historian Data Contract of the UMH, including the configuration and rules associated with it.

This section focuses on the specific details and configurations of the Historian Data Contract. If you are not familiar with Data Contracts, you should first read the Data Contracts / API page.

Historian

The purpose of the Historian Data Contract is to govern the flow of data from the Protocol Converter to the database. It enforces rules for the structure of payloads and topics, and provides the necessary infrastructure to bridge data in the Unified Namespace and write it to the database.

This ensures that data is only stored in a format accepted by the database, and makes it easier to integrate services like Grafana because the data structure is already known.

It also ensures that each message is idempotent (can be safely processed multiple times without changing the result), by making each message within a tag completely unique by its timestamp. This is critical because messages are sent using “at least once” semantics, which can lead to duplicates. With idempotency, duplicate messages are ignored, ensuring that each message is only stored once in the database.

Topic Structure in the Historian Data Contract

flowchart LR
    umh --> v1
    v1 --> enterprise
    enterprise -->|Optional| site
    site -->|Optional| area
    area -->|Optional| productionLine
    productionLine -->|Optional| workCell
    workCell -->|Optional| originID
    originID -.-> _historian
    _historian -->_tagGroups
    classDef mqtt fill:#00dd00,stroke:#333,stroke-width:4px;
    class umh,v1,enterprise,_historian mqtt;
    classDef optional fill:#77aa77,stroke:#333,stroke-width:4px;
    class site,area,productionLine,workCell,originID optional;
    enterprise -.-> _historian
    site -.-> _historian
    area -.-> _historian
    productionLine -.-> _historian
    workCell -.-> _historian

The prefix and Location of the topic in the Historian Data Contract follow the same rules as already described on the general Data Contracts page.

Prefix

The first section is the mandatory prefix: umh.v1. It ensures that the structure can evolve over time without causing confusion or compatibility problems.

Location

The next section is the Location, which consists of six parts: enterprise.site.area.productionLine.workCell.originID. You may be familiar with this structure as it is used by your instances and connections. Here, the enterprise field is mandatory.

When you create a Protocol Converter, it uses the Location of the instance and the connection to prefill the topic, but you can add the unused ones or change the prefilled parts.

Schema: _historian

The only schema in the Historian Data Contract is _historian. Without it, your messages will not be processed.

Tag groups

In addition to the Location, you can also use tag groups. A tag group is just an additional part after the schema:

umh.v1.location._historian.tag-group.tagname

You can add as many tag groups as you like:

umh.v1.location._historian.tag-group1.tag-group2.tagname

In the tag browser, a tag group will look like any field in the Location, except that it is located after the schema.

Example

Tag groups can be useful for adding context to your tags or for keeping track of them in the tag browser. For example, you might use them to categorize the sensors on a CNC mill.

A group for the x, y, z axis positions:
umh.v1.umh.cologne.ehrenfeld.development.cnc-mill-1234._historian.axis.tagname
A second group for the machine state:
umh.v1.umh.cologne.ehrenfeld.development.cnc-mill-1234._historian.machine-state.tagname
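The two topics above can be assembled from their parts as follows. This is a small sketch for illustration; the helper name historian_topic is hypothetical and not part of the UMH.

```python
def historian_topic(location: list[str], tag_groups: list[str], tagname: str) -> str:
    """Join prefix, Location, schema, tag groups, and tag name with dots."""
    if not location:
        raise ValueError("the enterprise part of the Location is mandatory")
    return ".".join(["umh", "v1", *location, "_historian", *tag_groups, tagname])


location = ["umh", "cologne", "ehrenfeld", "development", "cnc-mill-1234"]
axis_topic = historian_topic(location, ["axis"], "tagname")
state_topic = historian_topic(location, ["machine-state"], "tagname")
```

Passing an empty list of tag groups yields a topic where the tag name directly follows the schema.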

Payload structure

The Historian Data Contract requires that your messages be JSON objects with a specific structure: a timestamp and at least one tag with a value, both as key-value pairs. The most basic message looks like this:

{
  "timestamp_ms": 1732280023697,
  "tagname": 42
}

The timestamp must be called "timestamp_ms" and contain the timestamp in milliseconds. The value of the tag can be either a number "tagname": 123 or a string "tagname": "string". The "tagname" is used in the tag browser or for Grafana.

It is also possible to include multiple tags in a single payload.

{
  "timestamp_ms": 1732280023697,
  "tagname1": 123,
  "tagname2": "string",
  "tagname3": "string"
}

If you want to use tag groups, you can also do this in the payload.

{
  "timestamp_ms": 1732280023697,
  "taggroup": 
  {
    "tagname1": 123,
    "tagname2": "string"
  }
}

Both tagname1 and tagname2 will appear in the [...]._historian.taggroup topic.
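The payload rules described above can be sketched in Python. This is a rough approximation for illustration, not the JSON schema enforced by the UMH: the function names build_payload and payload_problems are hypothetical, and the exact type checks are assumptions derived from the examples on this page.

```python
import json
import time
from typing import Optional


def build_payload(tags: dict, timestamp_ms: Optional[int] = None) -> dict:
    """Combine tags with a millisecond timestamp (current time if none is given)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {"timestamp_ms": timestamp_ms, **tags}


def payload_problems(payload: dict) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    ts = payload.get("timestamp_ms")
    if isinstance(ts, bool) or not isinstance(ts, (int, float)):
        problems.append("timestamp_ms must be a number of milliseconds")
    tags = {k: v for k, v in payload.items() if k != "timestamp_ms"}
    if not tags:
        problems.append("at least one tag is required")
    return problems + _tag_problems(tags)


def _tag_problems(tags: dict, path: str = "") -> list[str]:
    problems = []
    for name, value in tags.items():
        if isinstance(value, dict):  # a nested object is treated as a tag group
            problems += _tag_problems(value, f"{path}{name}.")
        elif not isinstance(value, (int, float, str)):
            problems.append(f"{path}{name}: value must be a number or a string")
    return problems


payload = build_payload({"taggroup": {"tagname1": 123, "tagname2": "string"}}, 1732280023697)
message = json.dumps(payload)  # the JSON text that would be published
```

Note that Python treats booleans as integers, so this sketch lets true and false pass as numeric values.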

Data Flow Components

The Historian Data Contract enables data acquisition and processing through the use of Protocol Converters and the automatic deployment of three Data Bridges.

Data Bridges

There are three Data Bridges in the Historian Data Contract, which are automatically created and configured when the instance is created. The first bridge routes messages from Kafka to MQTT, the second from MQTT to Kafka. The third Data Bridge bridges messages from Kafka to the TimescaleDB database. The Data Bridges are responsible for validating the topic and payload, and adding error logs in case a message is not valid. Their configurations are not editable in the Management Console.

Protocol Converters

The easiest way to get data into your UNS is to use a Protocol Converter. If you want to learn how to do this, you can follow our Get Started guide. The configuration of a Protocol Converter consists of three sections:

Input: Here you specify the address, protocol used, authentication, and the “location” of the data on the connected device. This could be the NodeID on an OPC UA PLC.
Processing: In this section, you manipulate the data, build the timestamped payload, and specify the topic.
Output: The output is completely auto-generated and cannot be modified. The data is always sent to the instance’s Kafka broker.

Information specific to the selected protocol and section can be found by clicking on the vertical PROTOCOL CONVERTER button on the right edge of the window.

Verified Protocols

Our Protocol Converters are compatible with a long list of protocols. The most important ones are considered verified by us; look for the check mark next to the protocol name when selecting the protocol on the Edit Protocol Converter page in the Management Console.

If you are using one of the verified protocols, many of the fields will be populated automatically based on the underlying connection and instance. The input section uses the address of the connection and adds prefixes and suffixes as necessary. If you are using OPC UA, the username and password are autofilled. The preconfigured processing section will use the location of the instance and the connection to build the topic and use the name of the original tag as the tag name. It will also automatically generate a payload with a timestamp and the value of the incoming message. If the preconfiguration does not meet your needs, you can change it.

Database

We use TimescaleDB as the database in the UMH. By default, only tags from the Historian Data Contract are written to the database.

Our database for the Historian Data Contract consists of three tables. We chose this layout to allow easy lookups based on the asset, while maintaining separation between data and names. The separation into tag and tag_string prevents accidental lookups of the wrong data type, which could break queries such as aggregations or averages.

erDiagram
    asset {
        int id PK "SERIAL PRIMARY KEY"
        text enterprise "NOT NULL"
        text site "DEFAULT '' NOT NULL"
        text area "DEFAULT '' NOT NULL"
        text line "DEFAULT '' NOT NULL"
        text workcell "DEFAULT '' NOT NULL"
        text origin_id "DEFAULT '' NOT NULL"
    }
    tag {
        timestamptz timestamp "NOT NULL"
        text name "NOT NULL"
        text origin "NOT NULL"
        int asset_id FK "REFERENCES asset(id) NOT NULL"
        real value
    }
    tag_string {
        timestamptz timestamp "NOT NULL"
        text name "NOT NULL"
        text origin "NOT NULL"
        int asset_id FK "REFERENCES asset(id) NOT NULL"
        text value
    }
    asset ||--o{ tag : "id"
    asset ||--o{ tag_string : "id"

asset

An asset to us is the unique combination of the parts of the Location: enterprise, site, area, line, workcell, and origin_id. Each asset has an id that is automatically assigned.

All keys except id and enterprise are optional. The example below shows how the table might look. A new asset is added to the bottom of the table.

id	enterprise	site	area	line	workcell	origin_id
1	acme-corporation
2	acme-corporation	new-york
3	acme-corporation	london	north	assembly
4	stark-industries	berlin	south	fabrication	cell-a1	3002
5	stark-industries	tokyo	east	testing	cell-b3	3005
6	stark-industries	paris	west	packaging	cell-c2	3009
7	umh	cologne	office	dev	server1	sensor0
8	cuttingincorporated	cologne	cnc-cutter
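How a Location maps to an asset id can be sketched with an in-memory lookup. This is an illustration only: the dictionary stands in for the asset table, the function name asset_id is hypothetical, and the counter merely imitates the automatically assigned id. Unused Location parts are filled with empty strings, matching the column defaults in the diagram above.

```python
ASSET_COLUMNS = ("enterprise", "site", "area", "line", "workcell", "origin_id")


def asset_id(assets: dict, location: list[str]) -> int:
    """Return the id for a Location, adding a new asset if it is not known yet."""
    if not location or not location[0]:
        raise ValueError("enterprise is mandatory")
    if len(location) > len(ASSET_COLUMNS):
        raise ValueError("a Location has at most six parts")
    # Pad missing parts with '' so every asset is a complete six-part key.
    key = tuple(location) + ("",) * (len(ASSET_COLUMNS) - len(location))
    if key not in assets:
        assets[key] = len(assets) + 1  # imitates an automatically assigned id
    return assets[key]


assets = {}
first = asset_id(assets, ["acme-corporation"])
second = asset_id(assets, ["acme-corporation", "new-york"])
again = asset_id(assets, ["acme-corporation"])  # same Location, same id
```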

tag

This table is a Timescale hypertable. These tables are optimized to hold a large amount of data roughly sorted by time.

For example, we send data to umh/v1/cuttingincorporated/cologne/cnc-cutter/_historian/head using the following JSON:

{
  "timestamp_ms": 1670001234567,
  "pos": {
    "x": 12.5,
    "y": 7.3,
    "z": 3.2
  },
  "temperature": 50.0,
  "collision": false
}

This results in the following table entries:

timestamp	name	origin	asset_id	value
1670001234567	head$pos$x	unknown	8	12.5
1670001234567	head$pos$y	unknown	8	7.3
1670001234567	head$pos$z	unknown	8	3.2
1670001234567	head$temperature	unknown	8	50.0
1670001234567	head$collision	unknown	8	0

All tags have the same asset_id because each topic contains the same Location. The tag groups are not part of the asset and are prefixed to the tag name.
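The way tag groups end up in the tag name can be reproduced with a short sketch. It is not the code of the Data Bridge: the function names are hypothetical, and the handling of booleans as 0 and 1 is inferred from the collision row in the table above.

```python
def historian_rows(topic: str, payload: dict) -> list[tuple]:
    """Turn one message into (timestamp, name, value) rows."""
    parts = topic.split("/")
    groups = parts[parts.index("_historian") + 1:]  # tag groups given in the topic
    timestamp = payload["timestamp_ms"]
    tags = {k: v for k, v in payload.items() if k != "timestamp_ms"}
    return [(timestamp, "$".join(path), value) for path, value in _leaves(tags, groups)]


def _leaves(node: dict, path: list):
    """Walk nested tag groups and yield (path, value) for every tag."""
    for key, value in node.items():
        if isinstance(value, dict):
            yield from _leaves(value, path + [key])
        else:
            if isinstance(value, bool):
                value = int(value)  # assumption: false is stored as 0, true as 1
            yield path + [key], value


rows = historian_rows(
    "umh/v1/cuttingincorporated/cologne/cnc-cutter/_historian/head",
    {
        "timestamp_ms": 1670001234567,
        "pos": {"x": 12.5, "y": 7.3, "z": 3.2},
        "temperature": 50.0,
        "collision": False,
    },
)
```

The resulting names are head$pos$x, head$pos$y, head$pos$z, head$temperature, and head$collision, as in the table.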

The origin is a placeholder for a later feature, and currently defaults to unknown.

tag_string

This table is similar to the tag table, but is used for string data. For example, a CNC cutter could also output the G-code being processed.

umh/v1/cuttingincorporated/cologne/cnc-cutter/_historian

{
  "timestamp_ms": 1670001247568,
  "g-code": "G01 X10 Y10 Z0"
}

Posting this message to the topic from above would result in this entry:

timestamp	name	origin	asset_id	value
1670001247568	g-code	unknown	8	G01 X10 Y10 Z0
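The split between the two tables can be pictured as a simple decision on the value type. The sketch below is an assumption about that decision made for illustration; the function name split_by_table is hypothetical.

```python
def split_by_table(tags: dict) -> dict:
    """Group tags by target table: strings go to tag_string, everything else to tag."""
    tables = {"tag": {}, "tag_string": {}}
    for name, value in tags.items():
        target = "tag_string" if isinstance(value, str) else "tag"
        tables[target][name] = value
    return tables


tables = split_by_table({"g-code": "G01 X10 Y10 Z0", "temperature": 50.0})
```

Keeping strings out of the tag table is what allows aggregations such as averages to run on it without type errors.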

2 - Custom Data Contracts

In addition to the standard data contracts provided, you can add your own.

This section focuses on Custom Data Contracts. If you are not familiar with Data Contracts, you should first read the Data Contracts / API page. We are currently working on a blog post that will explain the concept of Custom Data Contracts in more detail.

Why Custom Data Contracts

The only Data Contract that exists by default in the UMH is the Historian Data Contract. Custom Data Contracts let you add functionality to your UMH, such as automatically calculating KPIs or further processing data.

Example of a custom Data Contract

One example of a Custom Data Contract is the automated interaction of MES and PLCs. Every time a machine stops, the latest order ID from the MES needs to be automatically written into the PLC.

We begin by utilizing the existing _historian data contract to continuously send and store the latest order ID from the MES in the UNS.

Additionally, a custom schema (for example, _action) is required to handle action requests and responses, enabling commands like writing data to the PLC. The next step is to implement Protocol Converters to facilitate communication between systems. For incoming messages, a Protocol Converter fetches the latest order ID from the MES and publishes it to the UNS using the _historian data contract.

For outgoing messages, another Protocol Converter listens for action requests in the manually added _action data contract and executes them by getting the last order ID from the UNS and writing the order ID to the PLC.

Protocol Converters can be seen as an interface between the UMH and external systems.

Finally, we have to set up a Custom Data Flow Component as a stream processor that monitors the UNS for specific conditions, such as a machine stoppage. When such a condition is detected, it generates an action request in the _action data contract for the output protocol converter to process.

Additionally, we have to add Data Bridges for the _action schema. In these you enforce a specific topic and payload structure.

The combination of the Historian Data Contract, the additional _action schema, the custom Data Bridges, the two Protocol Converters, the stream processor, and the enforcement of payload and topic structure forms this new Data Contract.
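The logic of the stream processor in this example can be sketched as follows. Everything specific here is hypothetical: the tag names order-id and machine-state, the state value "stopped", and the shape of the action request are invented for illustration, since the _action payload structure is whatever you define in your own Data Bridges. A real implementation would be a Custom Data Flow Component, not a Python class.

```python
from typing import Optional

STOPPED = "stopped"  # hypothetical value of the machine-state tag


class StopWatcher:
    """Remembers the latest order ID and emits an action request when the machine stops."""

    def __init__(self) -> None:
        self.last_order_id = None
        self.last_state = None

    def on_message(self, payload: dict) -> Optional[dict]:
        if "order-id" in payload:
            self.last_order_id = payload["order-id"]
        state = payload.get("machine-state")
        if state is None:
            return None
        just_stopped = state == STOPPED and self.last_state != STOPPED
        self.last_state = state
        if just_stopped and self.last_order_id is not None:
            # Hypothetical action request for the _action schema.
            return {
                "timestamp_ms": payload["timestamp_ms"],
                "action": "write-order-id",
                "order-id": self.last_order_id,
            }
        return None


watcher = StopWatcher()
watcher.on_message({"timestamp_ms": 1, "order-id": "A-17"})
watcher.on_message({"timestamp_ms": 2, "machine-state": "running"})
request = watcher.on_message({"timestamp_ms": 3, "machine-state": STOPPED})
repeat = watcher.on_message({"timestamp_ms": 4, "machine-state": STOPPED})
```

Only the transition into the stopped state produces a request, so repeated "stopped" messages do not trigger repeated writes to the PLC.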

Topic Structure in Custom Data Contracts

The topic structure follows the same rules as specified in the Data Contracts / API page, until the schema-dependent content.

The schema-dependent content depends on your configuration of the deployed custom Data Bridges.

Add custom schema

More information about custom schemas will be added here when the feature is ready to use.