Production Data Queries

Overview

The production data queries of the symmedia Hub Public API allow machines to transmit telemetry data points, such as measurements, counters, running job information, or machine states, into the symmedia Hub cloud in a secure and generic way.

The data needs to be:

  • provided via OPC UA, on a server reachable from the edge device (for details see Publish OPC UA Adapter for Customers)

    • or provided via a custom OPC UA server adapter

    • or provided directly via a custom IoT edge module as module messages (simple format, available on request)

  • mapped to (OPC UA) nodes with fixed IDs (dynamic models such as UMATI are not yet supported)

Ingestion into the system is performed by the OPC UA Router mechanism. The cold data path is ingested and handled within Azure Data Explorer.

Support for other node mapping types such as UMATI (OPC UA Machinery Specification) and completely different protocols is planned and prepared.




Data Model

Currently, only OPC UA nodes with fixed IDs are supported. As soon as the data reaches the cloud, the combination of edge device, endpoint (machine) and (OPC UA) node ID is mapped to a symbolic “MeasurementName” that is independent of the data model, i.e. the name of the data point, which the 3rd party data model can choose freely.

The symbolic name, timestamp and asset ID are used to store the data together with its value.

Support for other node mapping lookup types such as UMATI (OPC UA Machinery Specification) and completely different protocols is planned and prepared.
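
The lookup described in this section can be pictured as a keyed table. The following is a minimal sketch, not the actual cloud implementation: the class names, field names, node ID and the sample mapping entry are hypothetical and only illustrate how the combination of edge device, endpoint and node ID could resolve to a symbolic name that is then stored with timestamp, asset ID and value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeKey:
    """Hypothetical lookup key: the three properties the mapping considers."""
    edge_device: str
    endpoint: str
    node_id: str


@dataclass(frozen=True)
class StoredDataPoint:
    """Hypothetical stored record: symbolic name, timestamp, asset ID, value."""
    measurement_name: str
    timestamp_utc: str
    asset_id: str
    value: str  # kept as a string, matching the string-based query API


# Hypothetical mapping table; the real names come from the 3rd party data model.
NODE_MAPPING = {
    NodeKey("edge-01", "machine-A", "ns=2;s=Unit.Status.State"): "UnitStatusState",
}


def map_sample(key: NodeKey, timestamp_utc: str, asset_id: str, value) -> Optional[StoredDataPoint]:
    """Resolve a raw sample to its symbolic name; return None if the node is unmapped."""
    name = NODE_MAPPING.get(key)
    if name is None:
        return None
    return StoredDataPoint(name, timestamp_utc, asset_id, str(value))
```

In this sketch an unmapped node simply yields None; how the real pipeline treats unmapped nodes is not specified here.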




Data Point Storage and Retrieval via DataPoints Query

The database automatically tries to optimize the storage representation of the data based on the data type. Certain optimizations for time series data and for compressible or deduplicatable data are performed as well.

For the sake of simplicity, data can only be queried in its string representation. The data is already passed as stringified JSON on the edge device, into the cloud, into and out of the database, and on to the client via GraphQL, so there is no advantage in mimicking a complex type system beyond the data storage.
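
Because values arrive as strings, a client typically converts them back itself. The helper below is a minimal sketch of such a conversion; the function name is hypothetical, and the assumption that booleans are spelled "true"/"false" is not confirmed by the API description.

```python
def parse_value(raw):
    """Best-effort conversion of a string value to bool, int, float, or str."""
    if raw is None:
        return None
    text = raw.strip()
    lowered = text.lower()
    # Assumed boolean spelling; adjust if the data model uses another one.
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        # Not numeric: keep the original string (e.g. a state or program name).
        return raw
```

Values that are not recognizably boolean or numeric, such as program names or state labels, are returned unchanged.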

Data points are marked as both “Dynamic” and “Numeric”. They can be queried with the generic data point query API (see the example query below). Data points that are treated as numeric are processed and condensed into time-series windows and optimized for aggregated queries.

Example Query

Example of a data points query
query {
    dataPoints(
        options: {
            assetIds: ["30408395-3856-4c15-b165-986ac1e67614"]
            timeRange: { from: "2024-05-29T00:00:00Z", to: "2024-05-30T19:00:00Z" }
            dataPointNames: [
                "ActiveJobExecutionStartTimestamp"
                "ActiveJobExecutionPlannedRuntime"
                "ActiveJobExecutionMainProgramName"
                "ActiveJobExecutionMainProgramFile"
                "UnitStatusState"
            ]
            # Limit the query to 1000 results.
            take: 1000
            # Specifies the behavior at the edges of the time range. For example,
            # when a calculation subtracts values from each other, query with the
            # INCLUDE_LAST_BEFORE option to also get the last value update before
            # the start of the interval.
            dataPointQueryRange: IN_RANGE
            sortOrder: ASCENDING
        }
    ) {
        machineDataPoints {
            assetId
            dataPoints {
                timestampUtc
                dataPointName
                value
            }
        }
    }
}
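
On the client side, the nested result of this query is often easier to handle once it is grouped per asset and data point. The following is a minimal sketch that assumes the standard GraphQL response envelope with a top-level "data" key and the fields selected in the query above; the function name and the sample values are illustrative only.

```python
from collections import defaultdict


def group_data_points(response):
    """Return {(assetId, dataPointName): [(timestampUtc, value), ...]}."""
    grouped = defaultdict(list)
    for machine in response["data"]["dataPoints"]["machineDataPoints"]:
        for point in machine["dataPoints"]:
            key = (machine["assetId"], point["dataPointName"])
            grouped[key].append((point["timestampUtc"], point["value"]))
    return dict(grouped)


# Illustrative response; the values are made up for this sketch.
sample_response = {
    "data": {
        "dataPoints": {
            "machineDataPoints": [
                {
                    "assetId": "30408395-3856-4c15-b165-986ac1e67614",
                    "dataPoints": [
                        {"timestampUtc": "2024-05-29T08:00:00Z",
                         "dataPointName": "UnitStatusState", "value": "1"},
                        {"timestampUtc": "2024-05-29T09:30:00Z",
                         "dataPointName": "UnitStatusState", "value": "2"},
                    ],
                }
            ]
        }
    }
}

grouped = group_data_points(sample_response)
```

The order of the entries within each group follows the order of the response, so with sortOrder: ASCENDING each list is in ascending time order.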




Pre-aggregated, pre-condensed Numeric Data

Overview

The data model can opt in to pre-aggregation for certain data points. This condenses the incoming data into predefined time windows and persists the maximum, minimum, and average value found in each window.
This is ideal for data that is, for instance, sampled at a 1 second resolution but only evaluated at a 1 hour resolution, such as counters for “amount produced” or “total amount of time spent in error states”, and anything else relevant to key performance indicators that have no meaning below a certain time resolution.

With this method, query times are greatly reduced, and the required storage space usually drops by a factor of 1000.

The database also records when the persisted minimum and maximum values actually occurred within the time window, but this data is not yet available via the API.
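
The condensing step can be illustrated with a small sketch. It is not the database's actual implementation: the function names are hypothetical, and aligning the windows to multiples of the window size since the Unix epoch is an assumption made for this example.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, window=WINDOW):
    """Round a timezone-aware timestamp down to the start of its window."""
    return ts - (ts - EPOCH) % window


def condense(samples, window=WINDOW):
    """Condense (timestamp, value) samples into per-window min, max and avg."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(window_start(ts, window), []).append(value)
    return {
        start: {"min": min(vals), "max": max(vals), "avg": sum(vals) / len(vals)}
        for start, vals in sorted(buckets.items())
    }
```

Each window keeps three numbers regardless of how many samples fell into it, which is where the storage reduction comes from.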

Minimum Resolution

Currently, the minimum supported time window is 5 minutes. Larger windows can be queried as long as they are a multiple of 5 minutes. Internal optimizations are performed for data queried with a 1 day time window.
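
A client can check a window size against this rule before sending a query. The sketch below assumes that bin sizes are written as "pt" followed by a number and one of the units m, h or d, as in the example query in this section; the exact grammar accepted by the API is not documented here, and the function names are hypothetical.

```python
import re
from datetime import timedelta

MIN_WINDOW = timedelta(minutes=5)
_UNITS = {"m": timedelta(minutes=1), "h": timedelta(hours=1), "d": timedelta(days=1)}
# Assumed format: "pt" + number + unit (m, h or d), case-insensitive.
_PATTERN = re.compile(r"pt(\d+)([mhd])", re.IGNORECASE)


def parse_bin_size(text):
    """Parse a bin size such as 'pt5m' or 'PT1H' into a timedelta."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized bin size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def is_supported_bin_size(text):
    """True if the window is at least 5 minutes and a multiple of 5 minutes."""
    size = parse_bin_size(text)
    return size >= MIN_WINDOW and size % MIN_WINDOW == timedelta(0)
```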

Supported Aggregations

  • Minimum (MIN)

  • Maximum (MAX)

  • Average (AVG)
    The average of all data that appeared in that time window

  • Delta (DELTA)
    The difference between MAX and MIN
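
Read against the persisted window values, the four aggregations can be sketched as follows. The record layout with the keys "min", "max" and "avg" and the function name are hypothetical; the sketch only illustrates that DELTA follows from the stored extremes.

```python
def aggregate(window, aggregation):
    """Return the requested aggregation for one persisted window record."""
    if aggregation == "MIN":
        return window["min"]
    if aggregation == "MAX":
        return window["max"]
    if aggregation == "AVG":
        return window["avg"]
    if aggregation == "DELTA":
        # DELTA is defined as the difference between MAX and MIN.
        return window["max"] - window["min"]
    raise ValueError(f"unsupported aggregation: {aggregation!r}")
```

For a monotonically increasing counter, DELTA is the amount by which the counter grew within the window.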

Example Query

Example of a pre-aggregated time-series query
query {
  timeSeries(
    assetIds: ["a5b9bb93-0f82-4f99-a636-5a2523dc9b6d"]
    timeRange: {
      from: "2024-05-29T00:00:00Z"
      to: "2024-05-30T19:00:00Z"
    }
    dataPointNames: ["ActiveJobExecutionEstimatedRemainingRuntime"]
    aggregation: AVG
    # binSize configures the time window. 'PT' stands for 'Period of Time' and
    # is followed by the length of the time window, e.g. 30m, 1h, 1d.
    binSize: "pt5m"
  ) {
    timestamps
    assetSeries {
      assetId
      dataPointSeries {
        dataPointName
        values {
          timestamp
          value
        }
      }
    }
  }
}


How Missing Data Is Represented

The series API omits from its result set any assets that are not found, cause an access denied error, or have no data available for the requested measurements during the requested period.

The same is true for individual data points within an asset result set.

If data is available in some time windows but not in others, the windows without data are indicated with a null value. The returned array always contains exactly as many entries as there are bin timestamps returned.
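
A client therefore has to handle two cases: an asset or data point that is absent from the result, and null entries inside a series. The following is a minimal sketch based on the fields selected in the time-series query above. The function name is hypothetical, and because it is not specified whether a missing window is a null entry or an entry whose value is null, the sketch accepts both.

```python
def align_series(result, asset_id, data_point_name):
    """Pair each bin timestamp with its value (None where data is missing).

    Returns None if the asset or the data point is absent from the result.
    """
    timestamps = result["timestamps"]
    for asset in result["assetSeries"]:
        if asset["assetId"] != asset_id:
            continue
        for series in asset["dataPointSeries"]:
            if series["dataPointName"] != data_point_name:
                continue
            values = [
                entry["value"] if entry is not None else None
                for entry in series["values"]
            ]
            # The values array has one entry per bin timestamp.
            return list(zip(timestamps, values))
    return None
```

A return value of None means that the whole series is missing, while a None inside the list means that only that window has no data.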