# 20.10.3 (Chicago) Release Notes for Cloud
# New Features
# [80031] - Storing Function Definition in TDR
# New Functionality
Separate storage for function definitions has been created to free up space under the object model limit. This is a workaround for the Microsoft CosmosDB Gremlin API limitation of a maximum of 368 variables/properties in the Information Model. Users can define functions using the new endpoints, leaving additional space for other variables/properties in the Information Model.
# Limitations
Due to the CosmosDB Gremlin API limitation, the maximum number of variables/properties in the Information Model is 368.
# [73263] Preview of Alarms and Events Modeling
This is a Preview Only feature, which is disabled in the Platform by default. To enable the feature, please contact the Ability Platform Operations Team with the relevant request. For documentation of this feature, please refer to Alarms and Events under the API section in ADP.
# Resolved Issues
# [83205] - Cold storage process falls behind and fails to store telemetry data
The performance of the Cold Storage process for data ingress has been improved.
Additional corrections:
- Telemetry message formats v1 and v2 are now supported concurrently
- Cold Storage file paths were updated for readability: zero padding was added to directory and file names (example: /2021/02/01/09/01.json); see the path sketch after this list
- Common names for telemetry types (/alarms /events /variables) are now used under the objectId blob
- Auto-scaling rules updated
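As an illustration of the new path layout, a minimal sketch of zero-padded path construction (the function name and the exact meaning of each segment are assumptions for illustration, not the service's actual code):

from datetime import datetime, timezone

def build_blob_path(ts: datetime) -> str:
    # Zero-pad every date/time segment so directories and file names
    # sort lexicographically, e.g. 2021-02-01 09:01 -> /2021/02/01/09/01.json
    return (f"/{ts.year:04d}/{ts.month:02d}/{ts.day:02d}"
            f"/{ts.hour:02d}/{ts.minute:02d}.json")

print(build_blob_path(datetime(2021, 2, 1, 9, 1, tzinfo=timezone.utc)))
# -> /2021/02/01/09/01.json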
# [79599] - Edge deviceCreate messages get lost, no errors returned to edge
Platform APIs may occasionally return 5xx errors to the client, or the Platform may slow down its internal processing for a short period of time (up to several minutes).
This is caused by temporary networking issues or Azure services not being available (for example because of internal maintenance or upgrades).
This has been resolved by adding retry policies as part of handling these exceptions when Azure services are not available.
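A minimal sketch of this kind of retry policy, assuming exponential backoff with jitter (the names with_retries and TransientServiceError are illustrative, not the Platform's actual code):

import random
import time

class TransientServiceError(Exception):
    """Stand-in for a transient Azure service failure (e.g. a 5xx)."""

def with_retries(operation, max_attempts=5, base_delay=0.5):
    # Retry transient failures with exponential backoff plus jitter;
    # re-raise once the attempts are exhausted.
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientServiceError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))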
# [79784] - Information Model query performance problem
The Shared Kernel library, which previously created a new connection every time a message was sent, has been upgraded. The issue has been resolved: the library now retains the connection across multiple messages.
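Conceptually, the fix replaces a connect-per-message pattern with a retained connection, along these lines (a sketch with assumed names, not the Shared Kernel's actual API):

class MessageSender:
    # Sketch: open one connection lazily and reuse it for every send.
    def __init__(self, connection_factory):
        self._connection_factory = connection_factory
        self._connection = None

    def send(self, message):
        if self._connection is None:       # connect once, on first use
            self._connection = self._connection_factory()
        self._connection.send(message)     # reused for subsequent messages

    def close(self):
        if self._connection is not None:
            self._connection.close()
            self._connection = None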
# [80179] - AuthZ creates excessive trace statements
Info-level log statements that belong in the debug-mode trace were being emitted by default. This has been resolved.
# [81246] - Cloud to Device message propagation enhancement by Device Configuration Service
C2D message propagation was enhanced by:
- implementing a caching mechanism to reduce the number of interactions with IoT Hub,
- implementing a dynamic reschedule time frame, i.e. a dynamically computed number of seconds/minutes by which a rescheduled message is pushed forward (see the sketch below).
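A sketch of the two mechanisms (the class, the cache TTL, and the backoff formula are illustrative assumptions, not the service's actual code):

import time

class C2DPropagator:
    def __init__(self, iot_hub_client, cache_ttl_seconds=300):
        self._hub = iot_hub_client
        self._ttl = cache_ttl_seconds
        self._cache = {}                   # deviceId -> (state, fetched_at)

    def device_state(self, device_id):
        # Serve from the cache while fresh to cut IoT Hub round trips.
        entry = self._cache.get(device_id)
        if entry and time.time() - entry[1] < self._ttl:
            return entry[0]
        state = self._hub.get_device(device_id)
        self._cache[device_id] = (state, time.time())
        return state

    def reschedule_delay(self, attempt, base_seconds=5, cap_seconds=600):
        # Dynamic reschedule time frame: each failed attempt pushes the
        # message further forward instead of using a fixed interval.
        return min(base_seconds * 2 ** attempt, cap_seconds)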
# [81357] - If telemetry items in a batch are rejected, the order of the remaining items was not guaranteed
Telemetry data of type "variable" may be sent to the system as an array of JSON objects. Each object undergoes the validation procedure separately; if an individual object does not comply with the validation rules, it is rejected.
Objects that adhere to all the rules are then batched together again and published onward to the hot/warm path.
The problem could appear during that re-batching phase, in situations where part of the original batch violated the validation rules: the order of the re-batched variables sometimes did not correspond to the original order of the incoming data. This problem is now solved.
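In essence, the re-batching now keeps the surviving objects in their incoming order, as in this sketch (validate_item is an assumed placeholder for the validation procedure):

def rebatch(items, validate_item):
    # Validate each telemetry object individually, reject non-compliant
    # ones, and keep the accepted ones in their original order.
    accepted, rejected = [], []
    for item in items:                 # iteration order == incoming order
        (accepted if validate_item(item) else rejected).append(item)
    return accepted, rejected          # 'accepted' goes to the hot/warm path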
# [82079] - Principal Manager service produces excessive logging
Principal Manager logging was set to the default Information level, which resulted in excessive entries in Application Insights. The issue was resolved by moving this logging to the Debug level.
# [82119] - A DSL query that uses references and union may return 400 Bad Request and a parsing error
The issue is not reproducible on the Chicago release.
# [82713] - AuthZ under high load reports out of connections
AuthZ had a limit of 512 concurrent connections, which caused it to run out of connection points under high load. This issue was resolved.
# [82969] - Audit Logging: exception when adding two events from the same transaction, which have the same timestamp
In some circumstances, platform events belonging to the same business transaction, for example:
- Access.Granted event
- ObjectModel.Created event
may be generated with the same timestamp (for example, if the originating services have internal clocks that are not properly synchronized).
In such a case, Audit Logging would abandon the second and any subsequent event from the same business transaction, treating it as already stored.
This problem has been resolved and all events from the same business transaction are now properly stored by Audit Logging.
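One way to picture the fix: the storage key for an event can no longer collide on the (transaction, timestamp) pair alone. The key shape below is purely an assumption for illustration; the actual Audit Logging schema is not documented here:

def store_events(events, storage):
    # Append a per-(transaction, timestamp) sequence number so events
    # sharing both values still receive unique storage keys.
    counters = {}
    for event in events:
        base = (event["transactionId"], event["timestamp"])
        seq = counters.get(base, 0)
        counters[base] = seq + 1
        storage.put(key=base + (seq,), value=event)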
# [83206] - Data Processing Pipeline containers would restart randomly in a stable configuration
The components that form the Ability Platform ingestion pipeline rely on various Azure services, such as Azure Storage or Azure Event Hubs.
These services undergo various background activities, such as re-balancing and server migrations, during which they may be temporarily unavailable. This can manifest itself as transient errors to client applications.
The correction provides proper handling of such problems with Azure Storage, applies retry policies, and extends timeouts to allow for graceful shutdowns and restarts of the Ability Platform Data Processing Pipeline components.
# [83346] - Azure records 429 errors with burst request against Principal Manager
Platform APIs may occasionally return errors 5xx to the client or Platform may slow down the internal processing for a short period of time (up to several minutes).
This is caused by temporary networking issues or Azure services not being available (for example because of internal maintenance or upgrades).
This has been resolved by adding retry policies as part of handling these exceptions of Azure services not being available.
# [83367] - Retrieval of inherited types of custom (user defined) model definition returns additional types in response
When a user-defined model definition (not built-in) was used to build an inherited-types hierarchy and the user tried to GET only a limited number of type definitions (using the 'limit' filter), the response contained additional type definitions from this inheritance hierarchy (over the limit).
This was possible only for specific data in the database and is not reproducible for built-in model definitions (e.g. abb.ability.device).
# [83581] - Data Processing Pipeline parameter Diagnostic-Id has been removed by default from the telemetry message
With this change, the Data Processing Pipeline loses Application Insights support for monitoring and debugging.
When Application Insights is enabled, it automatically tracks some of the communication between different services, to allow for easy navigation and verification of data flows in the app.
In DPP, one such tracked dependency is Event Hub. When calls to this component are executed, the Application Insights SDK adds an additional piece of information - "Diagnostic-Id" - to help with dependency tracking.
This data, however, also lands in TSI, consuming unnecessary space and increasing overall storage cost.
"Diagnostic-Id" has been removed. This means, however, that by default there will be no Application Insights dependency tracking from DPP. In case of urgent debugging or monitoring needs, set the configuration switch EnableDependencyTrackingTelemetryModule to true; see the sketch below.
# [85165] - Platform APIs occasionally returned errors 5xx due to networking issues or Azure services not being available
Platform code was enhanced with internal retry policies that should cover short connectivity or availability issues.
This should limit the number of externally reported errors, but such errors can still happen during longer-lasting issues. Client code should still implement retry policies and retry failed calls based on the parameters of the component's SLA.
# [85180] - Principal Manager - 500 error returned while requesting a bearer token for applications with several rules in a grant
The Principal Manager would return a 500 Internal Server Error when a bearer token was requested for an application with multiple rules in a grant. This issue has been resolved.
# [86109] - BadHttpRequestException in TLS when AuthZ calls Principal Manager
When AuthZ calls PM, a sporadic 500 BadHttpRequestException can occur. To resolve this issue, the MinRequestBodyDataRate setting in AuthZ was changed to null.
# Known Issues and Limitations
# [84304] - Max limit of properties in TSI
The maximum number of properties (columns) in TSI v1 is 600 on the S1 SKU, 800 on S2, and 1000 on P1.
# [73963] - Latency issue causes newly created app in principal manager to be created without secrets
Known Issue: Occasionally, a newly created application in the Principal Manager service is created without secrets, making the app unusable because a bearer token cannot be obtained.
Workaround: Try creating the application again after about 60 seconds.
# [77522] - AuditLog events Count Mismatch for Device Created, Updated and Deleted operations
The body of platform events is stored in Audit Logging storage. In some cases, however, this body contains a JSON object which exceeds Azure Table Storage column limitations.
In this case, when a platform event body is longer than 16k characters, Audit Logging saves the following warning into the "data" column instead: {"auditLogInformation": "Event body too long"}.
The original body of the event is not saved; however, the user can still navigate to the actual changes by using the event correlationId. A sketch of this behavior follows the limitations list below.
Limitations source:
- size of an entity in Table Storage: up to 1 MB (https://docs.microsoft.com/en-us/azure/cosmos-db/table-storage-design-guide#capacity-considerations)
- size of a single string column: up to 64 KB (which equals 32k characters, https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-the-table-service-data-model#property-types)
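The behavior described above amounts to a length guard before writing the column, roughly as follows (a sketch; the 16k threshold and the warning payload are taken from the text above):

MAX_BODY_CHARS = 16_000  # threshold from the description above

def audit_data_column(event_body: str) -> str:
    # Bodies that would exceed the Table Storage column limits are
    # replaced by a warning marker; the actual change stays reachable
    # via the event correlationId.
    if len(event_body) > MAX_BODY_CHARS:
        return '{"auditLogInformation": "Event body too long"}'
    return event_body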
# [73453] - A Solution cannot register more than 40K devices
The workaround for this issue is to create a duplicate solution and add the additional devices to it if you exceed the limit of 40K devices.
The limitation is caused by the maximum 2 MB document size in CosmosDB.
# [73073] - Bad Request on variable subscription for long request not including objectIDs
Known Issue: When creating a subscription with a user token, filters that do not contain an objectId may result in an error that prevents the subscription from being created; the HTTP response code may be 4xx. This is due to a Service Bus limitation that allows filters of at most 1024 characters. It is usually hit when a large ability-condition header cannot be broken down into small enough filters for Service Bus.
Workaround: Include an objectId in the filter property of the Data Access request. Up to 40 objectIds can be included in the filter.
Background tokens do not have tenancy; therefore, the authorization service does not optimize the ability-condition based on the objectId given in the filter.
# [81401] - Data ingress limitation for Time Series Insights
Three levels of Time Series Insights are supported: S1, S2, and P1.
S1 supports up to 120 1K TSI events per second, and S2 supports 1200 1K TSI events per second. If a TSI event is greater than 1K in size, the maximum ingress is reduced accordingly. For example, if all events are between 3K and 4K in size, the maximum ingress is one fourth of the maximum supported: 30 TSI events per second for S1 and 300 TSI events per second for S2.
P1 is based on the bandwidth of data stored. TSI P1 supports 6 MB/s of ingress; however, other components restrict ingress to 1 MB/s at this time. 1 MB/s is approximately 6000 basic telemetry messages per second.
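The scaling follows from counting each started 1K block of an event as one TSI event; a quick check of the numbers above:

import math

def max_ingress(events_per_second_at_1k, event_size_kb):
    # An event of N KB counts as ceil(N) 1K TSI events, so the
    # effective ingress shrinks proportionally.
    return events_per_second_at_1k // math.ceil(event_size_kb)

print(max_ingress(120, 4))   # S1, 4K events -> 30 events/s
print(max_ingress(1200, 4))  # S2, 4K events -> 300 events/s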
# [35242] - DA-groupBy field in the Data Access query doesn't support "int"
In TSI Gen1 queries, grouping can be done only on columns with a known data type. The only known columns and their data types are objectId, model, timestamp, and variable/event/alarm. Since the "value" column can be any simple data type (string, double, Boolean, timestamp) or a complex object, the system cannot determine at query time which data type to group by. By default, the grouping operation's data type is assumed to be "string".
# [76007] - DSL query escape sequence handling for the backslash (\) in property value filters is not consistent
Known Issue: When backslashes ("\") are used in object model properties and the user then tries to query them using DSL, the value cannot be matched with a single escape ("\\"), which would be the expected behavior.
Workaround: Use double escaping in the DSL query ("\\\\").
For example, having property:
{
"browseName": {
"value": "some\\path"
}
}
one needs to use the DSL:
models(...).hasProperty("browseName", "some\\\\path")
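The two escaping layers can be checked with a quick sketch: the JSON text "some\\path" decodes to one literal backslash, and the DSL string escapes it once more:

import json

# JSON level: "some\\path" decodes to a single literal backslash.
stored = json.loads(r'{"value": "some\\path"}')["value"]
print(stored)   # some\path

# DSL level: that backslash is escaped again, so the query text
# carries four backslashes for the single stored one.
dsl = r'models(...).hasProperty("browseName", "some\\\\path")'
print(dsl)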
# [80407] - Object cannot be created due to limit of 64kB for create object gremlin query
Due to a CosmosDB Gremlin API limitation, it is not possible to create an object model with a size bigger than 64 KB (exactly 63035 B).
A request with too big an object model returns 400 (BadRequest) with a detailed error message:
{
"errors":{
"code": 400,
"title": "BadRequest",
"detail": "Object model query exceeded maximum length. Allowed query size: 65400. Consider decreasing length of string values assigned in object model.",
"id": "correlationId",
"status": "BadRequest"
}
}
# [80721] - IM service failed with 502 bad gateway error
Known Issue: A 502 Bad Gateway error is returned when making a request to the Authorize endpoint if the request payload contains a query filter with too many OR conditions. This is not an optimal way of using the query filter; it causes the execution to take an excessively long time and impacts the performance of the application.
Workaround: When using filters with many objectIds, use the IN clause instead of OR clauses for optimal performance.
Please see the below example:
Example payload with OR condition: (objectId='462e58e9-b6de-4dfb-a00d-ecfdd0c45a37' OR objectId='5a4472ac-6ddd-4624-b909-8f6f08fa4bcd' OR objectId='4caade18-f599-47b7-8689-55afbd10278b' OR objectId='f239eb85-11fa-4fcb-b452-1adc01170312' OR objectId='d841bb1e-7854-44d4-a030-c736f0b63259') AND (variable STARTS_WITH 'docker.' OR variable STARTS_WITH 'heartbeat' OR variable STARTS_WITH 'utilization.')
Optimized payload with IN condition: objectId in ['462e58e9-b6de-4dfb-a00d-ecfdd0c45a37','5a4472ac-6ddd-4624-b909-8f6f08fa4bcd','4caade18-f599-47b7-8689-55afbd10278b','f239eb85-11fa-4fcb-b452-1adc01170312','d841bb1e-7854-44d4-a030-c736f0b63259'] AND (variable STARTS_WITH 'docker.' OR variable STARTS_WITH 'heartbeat' OR variable STARTS_WITH 'utilization.')
# [75339] - Sorting functionality implemented as part of the Pagination & Searching feature in the Principal Manager APIs is case sensitive
The sorting functionality implemented as part of the Pagination & Searching feature in the Principal Manager APIs is case sensitive: uppercase letters sort before lowercase ones.
For example, when sorting the set of tenants {ABB01, Robotics01, abb02, Volvo01, robotics02, volvo02} in ascending order, the result is {ABB01, Robotics01, Volvo01, abb02, robotics02, volvo02}.
# [81239] - Issue with AD token for users with more than 200 group memberships
Known Issue: When adding a user with AD group membership to the Admin Portal, users who have more than 200 group memberships (direct & indirect) are not able to log in to the portal, as the memberships are not returned in the AD token. This is a limitation on the AD side and would require major implementation changes to overcome.
Workaround: The user can be added under a tenant as a tenant admin to overcome this problem. There is a 3-day time limit on the initial login, after which the user is deleted automatically if they have not yet logged in. This limit is customizable for each client installation.
# [62908] - Principal Manager API fails to remove tenants - BadGateway
The problem can occur with concurrent requests to the Principal Manager API. The Principal Manager APIs use Azure B2C services to create Applications for business entities, e.g. Application, Solution, etc. The workflow in the PM is sequential and dependent on the result of the B2C operation. After a successful result from the B2C operation, the request is processed further to provide the respective response to the caller.
For any B2C-related request, some buffer time needs to be provided so that the action can be completed.
It is advised to maintain a gap of 60 seconds between two requests.
# [59246] - Requests for bearer token for new apps lead to Bad Request
Known Issue: When concurrent requests to get a bearer token are sent, a client can receive a Bad Request response.
Workaround: The recommendation from Microsoft is to wait a few seconds before trying to get the token for an application that has just been created. According to Microsoft, it takes a maximum of 60 seconds to replicate the Application Settings across Azure regions.
# [55864] - Solution create audit log shows the wrong event type
The audit log entry for solution creation is incorrectly shown as an update instead of a create.
# [45735] - QEL expressions with the clause "name in ['x']" do not work
MongoDB's MongoCollection cannot handle certain array arguments. An application may see the following: creating a condition with IN ['x'] returns 400 Bad Request.
As a workaround, a combination such as "name = 'x' OR name = 'y'" should be used instead. For additional information, please refer to the issue on the MongoDB site: https://jira.mongodb.org/browse/CSHARP-2727.
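A small helper illustrating the workaround, expanding an IN list into OR equality clauses (the helper is illustrative, not part of the Platform; values are assumed not to contain quotes):

def expand_in_clause(field, values):
    # Rewrite: name in ['x', 'y']  ->  name = 'x' OR name = 'y'
    return " OR ".join(f"{field} = '{v}'" for v in values)

print(expand_in_clause("name", ["x"]))       # name = 'x'
print(expand_in_clause("name", ["x", "y"]))  # name = 'x' OR name = 'y'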
# [77345] - When creating a solution or resource, Principal Manager service sporadically returns a 400 Bad Gateway response code
Known Issue: When creating a solution (or possibly another resource), the Microsoft Graph API sporadically returns a 400 Bad Gateway response code with the message "One or more of your reply urls is not valid". As a result, the Solution is not created.
Workaround: The end user will need to resubmit the request.
# [79098] - TSI storage (costs) doubled and max throughput decreased when DPP status code processing is enabled
Data quality decoration increases the total size of a telemetry message significantly.
An original message may look like this:
{
"objectId": "2B129E4C-0944-4534-8E8B-DEB49D8AF0AC",
"model": "abb.somedomain.somemodel",
"variable": "SomeVariableName",
"timestamp": "2018-05-217T23:00:00Z",
"value": 42,
"quality" : 1073741954
}
After quality decoration, it may look like this:
{
"objectId": "2B129E4C-0944-4534-8E8B-DEB49D8AF0AC",
"model": "abb.somedomain.somemodel",
"variable": "SomeVariableName",
"timestamp": "2018-05-217T23:00:00Z",
"value": 42,
"quality" : 1073741954,
"qualityFlags" : {
"validity" : "uncertain",
"limit" : "low",
"historian" : "interpolated"
}
}
This in turn has a direct impact on:
- the total capacity of data that Ability Platform ingress pipeline may accept (Azure Event Hubs limits that to 20MB/sec.)
- the total amount of data being stored to TSI, which has a direct impact on the cost of the system
Data quality decoration is turned off by default and not recommended for use at this time.
A future update is planned to provide better control over the extra space used when this feature is enabled.
# [68309] - Unable to search for a file using user token after upload
Known Issue: When searching for files uploaded via Edge, requests using a user token fail when the number of objects exceeds 500.
Workaround: When querying, the objectId, along with the path, can be passed in QEL format to overcome this limitation.
# [74595] - User cannot access applications when their "read" permission is limited to "user" delegation
Known Issue:
- Query apps endpoint - passing 'user' instead of 'User' for the delegation parameter returns empty results.
- Get apps endpoint - passing 'User' instead of 'user' for the delegation parameter returns empty results.
Workaround: When querying for applications using the "Query apps" or "Get apps" endpoints with results limited to user delegation, pass (delegation='user' OR delegation='User') for the delegation parameter to get the expected results.