Creating a standard for sharing the health of the internet by exposing the status of Cloud-based services in a discoverable and machine-readable manner.
SIM is a simple JSON standard that defines how cloud service providers should expose information about the health of their services.
Making this information machine-readable and discoverable means we can build better and more robust systems.
The world has embraced the move to Cloud services, including SaaS, PaaS and IaaS. A modern web application consumes multiple services and depends on them.
The challenge for any IT department is monitoring the health of these critical services. Some Cloud providers have status pages and some don't, and those that do vary in the information given and how it is shared. Many have reverted to using Atom and RSS.
When your website experiences an issue you need to know why, and whether one of your Cloud service providers is the cause. This means going to each supplier's status page and manually checking the status. This delays resolving issues and costs money.
SIM aims to change this by creating a standard by which the status of Cloud services can be shared. By having a single standard it becomes easy to provision dashboards that monitor the services you are dependent on. When an issue occurs you can see from a single page where the issue is and with which provider.
v0.1
The SIM Standard defines a JSON structure which describes the Status of services, a list of Incidents and any planned Maintenance.
The JSON specification is deliberately simple to make it easy for development teams to implement and consume. The aim is to create a specification that describes necessary information and does not get swamped with details.
We are still working on the current specification and want your feedback!
If you have any suggestions, contact @mike_treestacks on Twitter.
The root object returned by the status endpoint.
Details on the service provider.
A list of services that can be monitored.
A list of ongoing or historic incidents.
A list of planned or historic maintenance.
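As a rough illustration of the root object described above, the following Python sketch parses a status document into its four parts. The property names (`provider`, `services`, `incidents`, `maintenance`) and the sample values are assumptions made for this example, since the property names are not listed here; check them against the published specification before relying on them.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StatusDocument:
    """Root object: provider details plus three lists."""
    provider: dict[str, Any]
    services: list[dict[str, Any]] = field(default_factory=list)
    incidents: list[dict[str, Any]] = field(default_factory=list)
    maintenance: list[dict[str, Any]] = field(default_factory=list)


def parse_status_document(raw: str) -> StatusDocument:
    """Parse a JSON string into a StatusDocument.

    All key names are hypothetical. Missing lists default to empty,
    which is a choice made for this sketch, not a rule from the spec.
    """
    data = json.loads(raw)
    return StatusDocument(
        provider=data.get("provider", {}),
        services=data.get("services", []),
        incidents=data.get("incidents", []),
        maintenance=data.get("maintenance", []),
    )


# Made-up sample document for illustration only.
EXAMPLE = """
{
  "provider": {"name": "Example Cloud"},
  "services": [{"id": "api", "name": "Public API", "status": 200}],
  "incidents": [],
  "maintenance": []
}
"""

doc = parse_status_document(EXAMPLE)
```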
The Provider object contains details about the company providing the cloud services.
The name of the service provider.
Max length 100 characters
The website of the service provider
Max length 100 characters
Email address of the service provider.
Max length 100 characters
The logo of the service provider.
Image size: Width: 250px Height: 250px
The Service object contains the health status of a service or system. For geo-located services an optional location property can be added.
Unique identifier of the service.
Max length 10 characters
The name of the service.
Max length 100 characters
A short description of the service.
Max length 250 characters
The date the status was updated.
yyyy-MM-ddTHH:mm:ssz
The rate that this status message is updated.
In seconds
Indicates the health status of the service or systems.
Accepted values: 200 - Ok, 300 - Warning, 400 - Critical
The latitude and longitude of where the service is based.
A collection of sub-systems of the current service.
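A dashboard consuming the Service object might want a single health value for a service and all of its sub-systems. The sketch below shows one possible roll-up using the three accepted status values. The key names `status` and `services` are assumptions for this example, and the "worst status wins" rule is a design choice of the sketch, not something the standard prescribes.

```python
from enum import IntEnum
from typing import Any


class Health(IntEnum):
    """The three accepted status values."""
    OK = 200
    WARNING = 300
    CRITICAL = 400


def worst_health(service: dict[str, Any]) -> Health:
    """Return the most severe status of a service and its sub-systems.

    "status" and "services" are hypothetical key names for the health
    status and the collection of sub-systems.
    """
    # Health(...) raises ValueError for any value other than 200, 300 or 400.
    worst = Health(service["status"])
    for sub_system in service.get("services", []):
        worst = max(worst, worst_health(sub_system))
    return worst


# Made-up service with two sub-systems, one of them degraded.
example_service = {
    "id": "api",
    "status": 200,
    "services": [
        {"id": "api-eu", "status": 300},
        {"id": "api-us", "status": 200},
    ],
}

overall = worst_health(example_service)
```

Here `overall` is `Health.WARNING`, because one sub-system reports 300 even though the parent reports 200.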
Provides details of an ongoing or historic incident. It is recommended that historic information is only maintained for 30 days in the feed.
Unique identifier of the incident.
Max length 10 characters
The date and time that the incident started.
yyyy-MM-ddTHH:mm:ssz
The date and time that the incident was resolved. If the incident is currently ongoing do not supply this value.
yyyy-MM-ddTHH:mm:ssz
A short description of the incident
Max Length 100 characters
The rate that this incident message is updated.
In seconds
The IDs of the services affected. Where all sub-systems of a service are affected only list the top level service.
A list of updates about the status and resolution of the incident.
Provides ways for a customer to access more information.
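The rule that an ongoing incident has no resolved date can be checked in a few lines. This is a minimal sketch: the key names `started` and `resolved` are assumptions, and it assumes timestamps such as `2024-05-01T12:00:00Z` or `2024-05-01T12:00:00+01:00`, which is one reading of the `yyyy-MM-ddTHH:mm:ssz` pattern.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

# Python equivalent of yyyy-MM-ddTHH:mm:ssz; %z accepts "Z" and numeric offsets.
SIM_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_sim_date(value: str) -> datetime:
    """Parse a timestamp into a timezone-aware datetime."""
    return datetime.strptime(value, SIM_DATE_FORMAT)


def is_ongoing(incident: dict[str, Any]) -> bool:
    """An incident is ongoing when no resolved date is supplied."""
    return incident.get("resolved") is None


def incident_duration(incident: dict[str, Any], now: datetime) -> timedelta:
    """Time from the start of the incident to its resolution, or to `now`."""
    started = parse_sim_date(incident["started"])
    ended = now if is_ongoing(incident) else parse_sim_date(incident["resolved"])
    return ended - started


# Made-up incidents for illustration.
ongoing = {"id": "inc-1", "started": "2024-05-01T12:00:00Z"}
resolved = {
    "id": "inc-2",
    "started": "2024-05-01T12:00:00Z",
    "resolved": "2024-05-01T13:30:00Z",
}
now = datetime(2024, 5, 1, 12, 45, tzinfo=timezone.utc)

ongoing_for = incident_duration(ongoing, now)
resolved_after = incident_duration(resolved, now)
```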
Provides details of planned or historic maintenance windows. It is recommended that historic information is only maintained for 30 days in the feed.
Unique identifier of the maintenance.
Max length 10 characters
The date and time that the maintenance window will start.
yyyy-MM-ddTHH:mm:ssz
The date and time that the maintenance window will end
yyyy-MM-ddTHH:mm:ssz
A short description of the maintenance.
Max Length 100 characters
The rate that this maintenance message is updated.
In seconds
The IDs of the services affected by the maintenance. Where all sub-systems of a service are affected only list the top level service.
A list of updates about the maintenance.
Provides ways for a customer to access more information.
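A provider could apply the 30-day recommendation when building the feed. The sketch below keeps planned windows and drops historic ones that ended more than 30 days ago. The key name `end` is an assumption, as is the reading of the date pattern as timestamps like `2024-06-10T02:00:00Z`.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

SIM_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"
RETENTION = timedelta(days=30)  # recommended history length


def within_retention(
    windows: list[dict[str, Any]], now: datetime
) -> list[dict[str, Any]]:
    """Keep windows that end in the future or ended within the last 30 days.

    "end" is a hypothetical key name for the end of the window.
    """
    cutoff = now - RETENTION
    return [
        window
        for window in windows
        if datetime.strptime(window["end"], SIM_DATE_FORMAT) >= cutoff
    ]


# Made-up maintenance windows for illustration.
windows = [
    {"id": "m-1", "end": "2024-05-01T02:00:00Z"},  # older than 30 days
    {"id": "m-2", "end": "2024-06-10T02:00:00Z"},  # recent history
    {"id": "m-3", "end": "2024-06-20T02:00:00Z"},  # still planned
]
now = datetime(2024, 6, 15, tzinfo=timezone.utc)

kept_ids = [window["id"] for window in within_retention(windows, now)]
```

The same filter could be applied to incidents, using the resolved date in place of the end of the window.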
Provides location information.
The latitude of the location
The longitude of the location
Contains an update message.
The details of the update.
Markdown. Maximum 1500 characters.
The date and time of the update.
yyyy-MM-ddTHH:mm:ssz
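The limits on an update message are easy to check before publishing. This is a sketch under stated assumptions: the key names `message` and `date` are hypothetical, both are treated as required, and the date is assumed to look like `2024-05-01T12:05:00Z`.

```python
from datetime import datetime
from typing import Any

SIM_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"
MAX_UPDATE_LENGTH = 1500  # maximum characters of Markdown


def validate_update(update: dict[str, Any]) -> list[str]:
    """Return a list of problems found in an update; empty means valid.

    "message" and "date" are hypothetical key names.
    """
    problems: list[str] = []

    message = update.get("message") or ""
    if not message:
        problems.append("message is missing or empty")
    elif len(message) > MAX_UPDATE_LENGTH:
        problems.append(f"message exceeds {MAX_UPDATE_LENGTH} characters")

    try:
        datetime.strptime(update.get("date", ""), SIM_DATE_FORMAT)
    except (TypeError, ValueError):
        problems.append("date does not match yyyy-MM-ddTHH:mm:ssz")

    return problems


# Made-up update for illustration.
good_update = {
    "message": "Investigating **elevated** error rates.",
    "date": "2024-05-01T12:05:00Z",
}
good_result = validate_update(good_update)
```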
Contains information on where a customer can find out more information.
A support email address
A support telephone number
A URL to a page with further information
Try out the spec yourself by using one of our examples or pointing it at your own implementation.
Where does the SIM Standard sit within the ecosystem and what alternatives are there already?
If there are other alternatives that you think we should compare to please contact us.
The SIM Standard is not a replacement or competitor for OpenTelemetry (OT). Each has different aims and operates at a different level within your ecosystem.
OT is a collection of tools for collecting very detailed information about the status of the systems you are running, providing you with information on Logs, Tracing, Metrics. This is very detailed system level information, containing all the information that a team operating a service needs to understand to support and maintain the service they are running. It allows teams to chase down errors and issues in the services they provide.
SIM is not designed to operate at this detailed level. Instead it operates at the level above, providing a general overview of the health of the whole service that can be shared publicly. Data captured by OT often should not be exposed publicly since it may contain sensitive data. Instead this data can be aggregated to provide a general overview of the service status that can be shared publicly via SIM.
In this way SIM complements OT by creating a standard that can be used to generalise the information from OT to a format that can be shared with a service provider's customer base.
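To make the aggregation idea concrete, the sketch below reduces a single internal metric to one of the three SIM status values. The choice of error rate as the metric, the function name and both thresholds are assumptions for illustration. Neither SIM nor OT defines them, and a real mapping would depend on the service.

```python
def sim_status_from_error_rate(
    error_rate: float,
    warning_at: float = 0.01,   # hypothetical threshold: 1% of requests failing
    critical_at: float = 0.05,  # hypothetical threshold: 5% of requests failing
) -> int:
    """Map an error rate between 0 and 1 to 200 (Ok), 300 (Warning) or 400 (Critical)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if error_rate >= critical_at:
        return 400
    if error_rate >= warning_at:
        return 300
    return 200


healthy = sim_status_from_error_rate(0.0)
degraded = sim_status_from_error_rate(0.02)
failing = sim_status_from_error_rate(0.2)
```

Only the resulting status value would be published. The underlying metric stays internal.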
The SIM Standard is similar to the Atlassian Statuspage product. Both aim to expose the health of the services being monitored in an easy-to-read form. Statuspage covers incidents, component health and metrics.
SIM differs in several ways. Firstly SIM tries to create a simple standard that all providers can use rather than it being a proprietary format. The SIM standard is also simpler, combining many elements into a single page request and reducing the amount of information that needs to be shared.
We need your help to make this initiative work
We need you to implement the SIM spec and start sharing the health of your services.
Tell us and we can add you to the list of companies backing SIM.
Provide your feedback on the SIM Spec. What would you like to see?
Talk with other professionals in the industry and spread awareness of SIM. Together we can make our jobs easier.
Contact your service providers and ask them to implement a SIM endpoint.
Build your own dashboards and push for change.
Reach out to @mike_treestacks for more information, questions and to add your name to the supporters list.