Creating a New Infra Platform

This documents how to deploy the platform services for an infra. These include:

  • MQTT Broker

  • Opsgenie Forwarder

  • SMTP Mailer

  • Auto Alerter

  • NoData Executor

  • Incident Tracker

  • Meld Continuous Query Generator

  • Influx Writer

Grafana, the Dashboard Generator, and the Infra Api have been moved to the UI Deployment and are no longer part of an infra deployment.
Infra deployments follow the 'infra' convention.

Infra Naming

You can in principle name your infras whatever you like. If they are dev or test infras though, you should consider appending dev or test to the name. See Ansible Deployments for an explanation.

Influx Database

The influx database is deployed separately. Because you can add a database to an existing influx instance, you will probably not need to create a new instance. Start by creating the database in the influx instance.
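As a minimal sketch of this step, the following builds the statement that creates the database. It assumes an InfluxDB 1.x instance that accepts InfluxQL, and it uses the <deploy_level>_<infra_id> naming convention from the group vars described later in this page. The helper names are hypothetical and are not part of any playbook.

```python
import re


def influx_database_name(deploy_level, infra_id):
    """Conventional database name: <deploy_level>_<infra_id>."""
    return f"{deploy_level}_{infra_id}"


def create_database_statement(database):
    """Build an InfluxQL CREATE DATABASE statement for a simple name."""
    # Assumption: names are restricted to letters, digits and underscores,
    # so no escaping is needed inside the double-quoted identifier.
    if not re.fullmatch(r"[A-Za-z0-9_]+", database):
        raise ValueError(f"unexpected characters in database name: {database!r}")
    return f'CREATE DATABASE "{database}"'


if __name__ == "__main__":
    # "my_infra" is a placeholder infra id.
    print(create_database_statement(influx_database_name("prod", "my_infra")))
```

The printed statement can then be run against the influx instance with whichever client you normally use.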

Playbooks

The services.yml playbook will deploy the rest of the services required for an infra.

Hosts File

This is the basic template for an infra hosts file:

; Physical Devices
[<Hostname>]
<hostname>.dgcsdev.com

[MQTT_BROKER]
mqtt.<infra_id>.<...>.smartermicrogrid.com


; Characterisations

[ec2:children]
MQTT_BROKER
<Hostname>


; Installation

[infra:children]
ec2


; Services

[mqtt_server:children]
MQTT_BROKER

[influxdb_server:children]
<Hostname>

[auto_alerter_server:children]

[incident_tracker_server:children]

[nodata_executor_server:children]
<Hostname>

[opsgenie_forwarder_server:children]
<Hostname>

[smtp_mailer_server:children]
<Hostname>

[meld_continuous_query_generator_server:children]
<Hostname>

Section             Intent
Physical Devices    Declares the servers
Characterisation    Creates the conventional ec2 group and any other characteristics of the target machine
Installation        Creates the required infra group and any other appropriate behavioural variations
Services            Lists the services to deploy
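
Before running a deploy, it can help to check that a hosts file declares the groups used in the template. The following is a rough sketch only: the parser is hand-rolled rather than Ansible's own, the function names are hypothetical, and the list of expected groups is an assumption taken from the template above.

```python
# Assumption: the group names used in the hosts file template.
EXPECTED_GROUPS = [
    "ec2",
    "infra",
    "mqtt_server",
    "influxdb_server",
    "auto_alerter_server",
    "incident_tracker_server",
    "nodata_executor_server",
    "opsgenie_forwarder_server",
    "smtp_mailer_server",
    "meld_continuous_query_generator_server",
]


def parse_inventory_groups(text):
    """Return {group_name: [entries]} for an INI-style hosts file."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # "[name:children]" is recorded under "name"
            current = line[1:-1].split(":", 1)[0]
            groups.setdefault(current, [])
        elif current is not None:
            # keep only the host or group name, ignoring any inline vars
            groups[current].append(line.split()[0])
    return groups


def missing_groups(groups):
    """Expected groups that are not declared at all."""
    return [name for name in EXPECTED_GROUPS if name not in groups]


def empty_groups(groups):
    """Declared groups with no entries, i.e. services with no target server."""
    return [name for name, entries in groups.items() if not entries]
```

An empty service group is not necessarily a mistake (the template itself leaves some empty), so treat the output of empty_groups as a prompt to double-check rather than as an error.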

MQTT Broker

The MQTT Broker is not deployed by the standard infra services.yml playbook. It is typically built separately. See Deploy an MQTT Broker for details.

When deciding on the broker’s DNS, you should use a grouping level below the TLD, as described in the linked docs above. For infras, although I wouldn’t go so far as to call this a convention, you could do worse than use mqtt.<infra_id>.infras.smartermicrogrid.com.

InfluxDB Server

This reference exists to declare where the influx_writer service should deploy. Conventionally, the influx_writer service deploys to the same server that the influx instance lives on. The reasons for this are mostly historical, stemming from efforts to stabilise this service in the face of previous bugs and fragility. For example, Influx does not run in a Docker container, so the influx_writer container uses 'host' networking. We also wanted to take network issues off the table as a possible culprit.

Group Vars

Here is an example of a group_vars/infra file:

deploy_level: prod
config_src: <infra_id>
infra: <infra_id>
infra_public: <infra_id>
hosts_dir: <infra_id>


influx_database: prod_<infra_id>
influx_backup_minute: 18
influx_writer_report_interval: 5000
auto_alerter_nodeduper: true
incident_tracker_bus_schedule: '0 6,18 * * *'
opsgenie_summary_schedule: '0 8,12,16,20 * * *'
opsgenie_summary_recipients: '...'


meld_continuous_query_generator:
    default_bucket_size: 15m
    request_timeout: 2000

incident_tracker_name: incident_tracker
incident_tracker_agent_id: incident_tracker


default_inspect_port:                   12340
influx_writer_inspect_port:             12340
auto_alerter_inspect_port:              12343
nodata_executor_inspect_port:           12344
incident_tracker_inspect_port:          12345
opsgenie_forwarder_inspect_port:        12346
smtp_mailer_inspect_port:               12347
meld_continuous_query_generator_inspect_port:   12349


influx_writer_healthcheck_port:         11230
auto_alerter_healthcheck_port:          11233
nodata_executor_healthcheck_port:       11234
incident_tracker_healthcheck_port:      11235
opsgenie_forwarder_healthcheck_port:    11236
smtp_mailer_healthcheck_port:           11237
meld_continuous_query_generator_healthcheck_port:   11239


opsgenie_forwarder_version:   7.1.4
smtp_mailer_version:          4.0.10
influx_writer_version:        15.1.7
auto_alerter_version:         6.0.9
incident_tracker_version:     11.1.5
nodata_executor_version:      6.0.9
meld_continuous_query_generator_version:    0.7.2

docker_memory: 150M
incident_tracker_docker_memory: 200M
opsgenie_forwarder_docker_memory: 200M


mqtt_broker_host: mqtt.<infra_id>.<...>.smartermicrogrid.com
mqtt_broker_port: 202##
mqtt_username: xyz
mqtt_password: ***********


mqtt_config: |
  connection my_other_infra
  ...

Most of these variables are driven by the needs of the specific service roles or playbooks, and are better documented there. However, the following vars are generically required:

Var: deploy_level
Purpose: Mostly used as a grouping level in names, such as directories or DNS.
Value/Example: prod or dev

Var: config_src
Purpose: Very historical. This was once used to publish core config messages directly to mqtt. It is rarely used now, if at all.
Value/Example: Conventionally <infra_id>, with a matching directory in the fel-provisioning repo at /config/mqtt/<infra_id>.

Var: infra
Purpose: The infra id itself.
Value/Example: <infra_id>

Var: infra_public
Purpose: Replaces the infra id for public DNS entries (i.e. on smartermicrogrid.com).
Value/Example: <infra_id>

Var: hosts_dir
Purpose: Manual reference to the hosts dir, used by some configuration scripts.
Value/Example: <infra_id>

Var: influx_database
Purpose: Explicitly names the influx database.
Value/Example: Conventionally <deploy_level>_<infra_id>

Var: mqtt_broker_host, mqtt_broker_port, mqtt_username, mqtt_password
Purpose: Required details for the mqtt broker. The host should not specify the protocol.
Value/Example: Conventionally, the host should look like mqtt.<infra_id>.<…>.smartermicrogrid.com

Var: docker_memory
Purpose: The default memory limit for Docker containers. This can be overridden on a per-service basis.
Value/Example: Most (nodejs) platform services which cache signal properties will want at least 150M for a modestly sized infra.

Var: mqtt_config
Purpose: Optional text block to add to the mqtt broker configuration. This can be used, for example, to create bridges to other brokers.
Value/Example: See the official Mosquitto documentation.
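
The generically required vars lend themselves to a quick sanity check before deploying. The sketch below assumes the group vars have already been loaded into a plain dict (for example with a YAML parser of your choice). The function name and the exact list of required names are assumptions based on the vars described above, not something the playbooks enforce.

```python
# Assumption: the vars this page describes as generically required.
# mqtt_config is optional, so it is not listed.
REQUIRED_VARS = [
    "deploy_level",
    "config_src",
    "infra",
    "infra_public",
    "hosts_dir",
    "influx_database",
    "mqtt_broker_host",
    "mqtt_broker_port",
    "mqtt_username",
    "mqtt_password",
    "docker_memory",
]


def check_group_vars(group_vars):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for name in REQUIRED_VARS:
        if group_vars.get(name) in (None, ""):
            problems.append(f"missing required var: {name}")
    # The broker host should be a bare hostname, without "mqtt://" or similar.
    host = str(group_vars.get("mqtt_broker_host") or "")
    if "://" in host:
        problems.append("mqtt_broker_host should not specify the protocol")
    return problems
```

Returning a list of problems rather than raising on the first one means a single run reports everything that needs fixing.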

Running the Deploy

To run the deploy, use the scripts as outlined in Ansible Deployments.

Validating the Deploy

Unfortunately, at the moment there isn’t really a handy way of validating a deploy in a single step. One just needs to check all the containers are up and running based on their logs:

$ docker logs <infra_id>.<deploy_level>.<service_id>

A standard nodejs service has started successfully if its log contains lines like the following:

2020-11-24T17:28:37.155299980Z 2020-11-24T17:28:37.155Z INFO MQTT Connecting to client at mqtt://mqtt.customers.infras.smartermicrogrid.com:20288
2020-11-24T17:28:37.160251196Z 2020-11-24T17:28:37.160Z INFO rss: 69.04 MB heapTotal: 59.4 MB heapUsed: 30.09 MB external: 0.56 MB. Up 396ms
2020-11-24T17:28:37.160351114Z 2020-11-24T17:28:37.160Z INFO Runner started
2020-11-24T17:28:37.160482898Z 2020-11-24T17:28:37.160Z INFO HEALTHCHECK: SERVICE_LIVE

Remember to ssh into all the servers. The influx_writer for example is typically not on the same server as any other services.
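
The manual log check can be partly scripted. The following is a sketch under stated assumptions: that "Runner started" and "HEALTHCHECK: SERVICE_LIVE" (taken from the sample log above) are good enough markers of a healthy start, and that containers are named <infra_id>.<deploy_level>.<service_id> as in the docker logs command above. The function names are hypothetical.

```python
import subprocess

# Assumption: these two log fragments indicate that a service has started.
STARTUP_MARKERS = ("Runner started", "HEALTHCHECK: SERVICE_LIVE")


def container_name(infra_id, deploy_level, service_id):
    """Container name in the form <infra_id>.<deploy_level>.<service_id>."""
    return f"{infra_id}.{deploy_level}.{service_id}"


def missing_markers(log_text):
    """Return the startup markers that do not appear in the log text."""
    return [marker for marker in STARTUP_MARKERS if marker not in log_text]


def fetch_logs(name):
    """Collect a container's output via `docker logs` (stdout and stderr)."""
    result = subprocess.run(
        ["docker", "logs", name],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


def report(infra_id, deploy_level, service_ids):
    """Print one line per service; run this on the server hosting them."""
    for service_id in service_ids:
        name = container_name(infra_id, deploy_level, service_id)
        missing = missing_markers(fetch_logs(name))
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{name}: {status}")
```

This only inspects containers on the machine it runs on, so it still needs to be run once per server.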