Status, Incident and Maintenance (SIM) Standard

Creating a standard for sharing the health of the internet by exposing the status of Cloud-based services in a discoverable and machine-readable manner.

What is SIM?

SIM is a simple JSON standard that defines how cloud service providers should expose information about the health of their services.

  • Status - The current operational state of the services. For example, are they operating normally or are services impaired?
  • Incidents - Details and updates of any incidents affecting the services, keeping your customers updated in near real time.
  • Maintenance - Let your customers know ahead of time about known outages and at-risk events.

Making this information machine-readable and discoverable means we can build better and more robust systems.


Why SIM?

The world has embraced the move to Cloud services, including SaaS, PaaS and IaaS. A modern web application will consume multiple services and is dependent on them.

The challenge for any IT department is monitoring the health of these critical services. Some Cloud providers have status pages and some don't, and those that do vary in the information they give and how they share it. Many have resorted to Atom and RSS feeds.

When your website experiences an issue you need to know why, and whether one of your Cloud service providers is the cause. This means going to each supplier's status page and manually checking the status. This delays resolving issues and costs money.

SIM aims to change this by creating a standard by which the status of Cloud services can be shared. By having a single standard it becomes easy to provision dashboards that monitor the services you are dependent on. When an issue occurs you can see from a single page where the issue is and with which provider.

Project Aims

  • Make service health information discoverable.
  • Make it machine readable.
  • Standardise the industry.
  • Allow for the easy creation of dashboards and aggregators.
  • Reduce the time it takes for IT teams to find faults and mitigate issues.
  • Improve the health of the internet.

v0.1

The Spec

The SIM Standard defines a JSON structure which describes the Status of services, a list of Incidents and any planned Maintenance.

The JSON specification is deliberately simple to make it easy for development teams to implement and consume. The aim is to create a specification that describes necessary information and does not get swamped with details.

We are still working on the current specification and want your feedback!

If you have any suggestions, contact @mike_treestacks on Twitter.

Version 0.1

Updated: 15 October 2022

The Root Object

The root object returned by the status endpoint.

Attributes


  • provider: Provider Object (required)

    Details on the service provider.

  • services: Service Object[] (required)

    A list of services that can be monitored.

  • incidents: Incident Object[]

    A list of ongoing or historic incidents.

  • maintenance: Maintenance Object[]

    A list of planned or historic maintenance.
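As an illustration, a minimal SIM document could be assembled and checked for its required root attributes as sketched below. Every provider and service value is an invented placeholder, the helper `missing_root_keys` is hypothetical rather than part of the spec, and the sketch assumes that `logo` holds an image URL and that dates end in a `Z` UTC designator.

```python
import json

# Required root attributes, taken from the attribute list above.
REQUIRED_ROOT_KEYS = ("provider", "services")


def missing_root_keys(document: dict) -> list:
    """Return the required root attributes that are absent from the document."""
    return [key for key in REQUIRED_ROOT_KEYS if key not in document]


# Placeholder values for illustration only.
example_document = {
    "provider": {
        "name": "Example Cloud",
        "website": "https://example.com",
        "email": "status@example.com",
        "logo": "https://example.com/logo.png",  # assumption: logo is a URL
    },
    "services": [
        {
            "id": "api",
            "name": "Public API",
            "description": "REST API used by customer applications.",
            "date": "2022-10-15T09:30:00Z",  # assumption: UTC with a "Z" suffix
            "refresh": 60,
            "status": 200,
        }
    ],
    "incidents": [],    # optional
    "maintenance": [],  # optional
}

# Serialise the document as it might be returned by a status endpoint.
payload = json.dumps(example_document, indent=2)
```

A consumer could run `missing_root_keys` on a parsed response before reading any nested objects.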


The Provider Object

The Provider object contains details about the company providing the cloud services.

Attributes


  • name: string (required)

    The name of the service provider.

    Max length 100 characters

  • website: string (required)

    The website of the service provider.

    Max length 100 characters

  • email: string (required)

    Email address of the service provider.

    Max length 100 characters

  • logo: string (required)

    The logo of the service provider.

    Image size: Width: 250px Height: 250px
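The length limits above lend themselves to a simple check before publishing a feed. The following is a minimal sketch, assuming the limits count characters as Python's `len` does; `provider_problems` is a hypothetical helper name, not something the spec defines.

```python
# Maximum lengths taken from the Provider attribute list above.
PROVIDER_MAX_LENGTHS = {"name": 100, "website": 100, "email": 100}
PROVIDER_REQUIRED = ("name", "website", "email", "logo")


def provider_problems(provider: dict) -> list:
    """Return human-readable problems found in a Provider object."""
    problems = []
    # Every required attribute must be present and be a string.
    for key in PROVIDER_REQUIRED:
        if not isinstance(provider.get(key), str):
            problems.append(f"{key}: missing or not a string")
    # Attributes with a documented limit must not exceed it.
    for key, limit in PROVIDER_MAX_LENGTHS.items():
        value = provider.get(key)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{key}: longer than {limit} characters")
    return problems
```

An empty list means the object passed these checks. The sketch does not inspect the logo's image dimensions.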


The Service Object

The Service object contains the health status of a service or system. For geo-located services an optional location property can be added.

Attributes


  • id: string (required)

    Unique identifier of the service.

    Max length 10 characters

  • name: string (required)

    The name of the service.

    Max length 100 characters

  • description: string (required)

    A short description of the service.

    Max length 250 characters

  • date: date (required)

    The date and time the status was updated.

    yyyy-MM-ddTHH:mm:ssz

  • refresh: number (required)

    How often this status message is updated.

    In seconds

  • status: number (required)

    Indicates the health status of the service or system.

    Accepted values: 200 - Ok, 300 - Warning, 400 - Critical

  • location: Location Object

    The latitude and longitude of where the service is based.

  • services: Service Object[]

    A collection of sub-systems of the current service.
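Because a service can nest sub-systems through its own `services` attribute, a dashboard may want a single roll-up value per top-level service. The sketch below shows one way to do that, assuming the consumer treats the highest status code as the most severe; the spec does not say how a parent's status relates to its children, so this roll-up is a design choice and `worst_status` is a hypothetical helper.

```python
from enum import IntEnum


class Status(IntEnum):
    """Accepted status values from the Service attribute list."""
    OK = 200
    WARNING = 300
    CRITICAL = 400


def worst_status(service: dict) -> Status:
    """Return the most severe status of a service and all of its sub-systems."""
    worst = Status(service["status"])  # raises ValueError for unknown codes
    for child in service.get("services", []):
        worst = max(worst, worst_status(child))
    return worst
```

Using `IntEnum` keeps the numeric codes comparable while rejecting values outside the three accepted ones.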


The Incident Object

Provides details of an ongoing or historic incident. It is recommended that historic information is only kept in the feed for 30 days.

Attributes


  • id: string (required)

    Unique identifier of the incident.

    Max length 10 characters

  • opened: date (required)

    The date and time that the incident started.

    yyyy-MM-ddTHH:mm:ssz

  • closed: date

    The date and time that the incident was resolved. If the incident is currently ongoing do not supply this value.

    yyyy-MM-ddTHH:mm:ssz

  • title: string (required)

    A short description of the incident.

    Max length 100 characters

  • refresh: number (required)

    How often this incident message is updated.

    In seconds

  • services: string[] (required)

    The IDs of the services affected. Where all sub-systems of a service are affected only list the top level service.

  • updates: Update Object[] (required)

    A list of updates about the status and resolution of the incident.

  • support: Support Object

    Provides ways for a customer to access more information.
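The `closed` attribute and the 30-day recommendation can be combined into a small filter for deciding which incidents belong in a feed. This is a sketch under stated assumptions: timestamps are taken to be ISO 8601 with either a `Z` suffix or a numeric UTC offset, and `parse_sim_date`, `is_ongoing` and `within_retention` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def parse_sim_date(value: str) -> datetime:
    """Parse a SIM timestamp, assuming ISO 8601 with 'Z' or a numeric offset."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_ongoing(incident: dict) -> bool:
    """An ongoing incident does not supply a 'closed' value."""
    return incident.get("closed") is None


def within_retention(incident: dict, now: datetime, days: int = 30) -> bool:
    """Keep ongoing incidents and those closed within the last `days` days."""
    if is_ongoing(incident):
        return True
    return now - parse_sim_date(incident["closed"]) <= timedelta(days=days)
```

For example, `[i for i in incidents if within_retention(i, datetime.now(timezone.utc))]` would drop incidents that closed more than 30 days ago.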


The Maintenance Object

Provides details of planned or historic maintenance windows. It is recommended that historic information is only kept in the feed for 30 days.

Attributes


  • id: string (required)

    Unique identifier of the maintenance.

    Max length 10 characters

  • start: date (required)

    The date and time that the maintenance window will start.

    yyyy-MM-ddTHH:mm:ssz

  • finish: date (required)

    The date and time that the maintenance window will end.

    yyyy-MM-ddTHH:mm:ssz

  • title: string (required)

    A short description of the maintenance.

    Max length 100 characters

  • refresh: number (required)

    How often this maintenance message is updated.

    In seconds

  • services: string[] (required)

    The IDs of the services affected by the maintenance. Where all sub-systems of a service are affected only list the top level service.

  • updates: Update Object[] (required)

    A list of updates about the maintenance.

  • support: Support Object

    Provides ways for a customer to access more information.
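A consumer might use `start`, `finish` and `services` to warn about maintenance that affects a service it depends on. The sketch below is one possible approach, again assuming ISO 8601 timestamps with a `Z` suffix or numeric offset; `maintenance_state` and `windows_for_service` are hypothetical names, and the three state labels are not defined by the spec.

```python
from datetime import datetime, timezone


def parse_sim_date(value: str) -> datetime:
    """Parse a SIM timestamp, assuming ISO 8601 with 'Z' or a numeric offset."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def maintenance_state(window: dict, now: datetime) -> str:
    """Classify a maintenance window as 'upcoming', 'active' or 'finished'."""
    start = parse_sim_date(window["start"])
    finish = parse_sim_date(window["finish"])
    if now < start:
        return "upcoming"
    if now <= finish:
        return "active"
    return "finished"


def windows_for_service(maintenance: list, service_id: str) -> list:
    """Return the maintenance windows that list the given service ID."""
    return [window for window in maintenance if service_id in window["services"]]
```

Note that `windows_for_service` matches only the IDs listed in the window. Since a window may list just the top level service when all of its sub-systems are affected, a fuller implementation would also check the parent's ID.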


The Location Object

Provides location information.

Attributes


  • lat: number (required)

    The latitude of the location.

  • long: number (required)

    The longitude of the location.


The Update Object

Contains an update message.

Attributes


  • details: string (required)

    The details of the update.

    Markdown. Maximum 1500 characters.

  • date: date (required)

    The date and time of the update.

    yyyy-MM-ddTHH:mm:ssz
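Since the spec does not state the order in which updates appear, a dashboard that shows only the newest message could select it by date. The following is a minimal sketch, assuming ISO 8601 timestamps with a `Z` suffix or numeric offset and a non-empty `updates` list; the helper names and sample messages are invented for illustration.

```python
from datetime import datetime

# "Maximum 1500 characters" from the attribute list above.
MAX_DETAILS_LENGTH = 1500


def parse_sim_date(value: str) -> datetime:
    """Parse a SIM timestamp, assuming ISO 8601 with 'Z' or a numeric offset."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def latest_update(updates: list) -> dict:
    """Return the most recent update, whatever order the feed lists them in."""
    return max(updates, key=lambda update: parse_sim_date(update["date"]))


def details_too_long(update: dict) -> bool:
    """Check the details text against the documented character limit."""
    return len(update["details"]) > MAX_DETAILS_LENGTH


# Invented sample updates, deliberately out of chronological order.
updates = [
    {"details": "Investigating elevated error rates.", "date": "2022-10-15T09:30:00Z"},
    {"details": "**Resolved**: a fix has been deployed.", "date": "2022-10-15T10:45:00Z"},
    {"details": "Root cause identified.", "date": "2022-10-15T10:05:00Z"},
]
```

Here `latest_update(updates)` selects the 10:45 message even though it is not last in the list.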


The Support Object

Contains information on where a customer can find out more information.

Attributes


  • email: string

    A support email address.

    Email

  • phone: string

    A support telephone number.

  • url: string

    A URL to a page with further information.


Give It A Try

Try out the spec yourself by using one of our examples or pointing it at your own implementation.

Examples

The Alternatives

Where does the SIM Standard sit within the ecosystem and what alternatives are there already?

If there are other alternatives that you think we should compare to please contact us.

The SIM Standard is not a replacement for or competitor to OpenTelemetry (OT). Each has different aims and operates at a different level within your ecosystem.

OT is a collection of tools for gathering very detailed information about the systems you are running, in the form of logs, traces and metrics. This system-level information contains everything that a team operating a service needs to support and maintain it, and allows teams to chase down errors and issues in the services they provide.

SIM is not designed to operate at this detailed level. Instead it operates at the level above, providing a general overview of the health of the whole service that can be shared publicly. Data captured by OT often should not be exposed publicly since it may contain sensitive data. Instead this data can be aggregated to provide a general overview of the service status that can be shared publicly via SIM.

In this way SIM complements OT by creating a standard that can be used to generalise the information from OT to a format that can be shared with a service provider's customer base.

How Can You Help?

We need your help to make this initiative work.

Cloud Service Providers

We need you to implement the SIM spec and start sharing the health of your services.

Tell us and we can add you to the list of companies backing SIM.

IT Professionals

Provide your feedback on the SIM Spec. What would you like to see?

Talk with other professionals in the industry and spread awareness of SIM. Together we can make our jobs easier.

Cloud Customers

Contact your service providers and ask them to implement a SIM endpoint.

Build your own dashboards and push for change.

Reach out to @mike_treestacks for more information, questions and to add your name to the supporters list.