The platform is well monitored via a combination of New Relic and CloudWatch metrics.
Both monitoring tools are configured to raise alerts:
Via SMS for senior team members
Via Slack for all relevant team members
Via email for all relevant team members
Alerts are configured across a variety of health points, such as:
HTTP ping endpoints
Database replication lag
Application error rates
Scheduled task throughput
Server resource usage and capacity
Typically we would expect the relevant staff members to be aware of a critical system issue before it is reported by users.
There is also a separate emergency support escalation process for customers, which allows direct SMS messaging to support team staff, who can then escalate to the technical team.
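As an illustration, one of the health-point alerts described above might be defined in CloudWatch along the following lines. This is a minimal sketch only; the region, metric namespace, threshold and SNS topic ARN are placeholders rather than our actual configuration.

```python
# Minimal sketch of a CloudWatch alarm feeding an SNS topic (which in turn could
# fan out to SMS, Slack and email). All names, thresholds and ARNs are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")  # assumed region

cloudwatch.put_metric_alarm(
    AlarmName="platform-error-rate",            # placeholder alarm name
    Namespace="Platform/Application",           # placeholder custom metric namespace
    MetricName="ErrorRate",                     # placeholder application error metric
    Statistic="Average",
    Period=300,                                 # evaluate over 5-minute windows
    EvaluationPeriods=1,
    Threshold=5.0,                              # placeholder error-rate threshold (%)
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="breaching",               # missing data itself indicates a problem
    AlarmActions=[
        "arn:aws:sns:eu-west-1:123456789012:ops-alerts",  # placeholder SNS topic
    ],
)
```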
Distributed smartlinks (AWS)
Platform (UKFast)
Distributed smartlinks are the primary tool through which end user transactions are created and therefore have the highest priority. They have been designed to be largely independent of the Platform, so in most cases end users would still be able to generate transactions even if the Platform were unavailable.
The Platform primarily consists of:
Order management and routing
Product management
Supplier workflow and fulfilment
Downtime of the Platform is therefore less business critical and, given a relatively short outage (i.e. < 3 hours), would in most cases not negatively affect end user transactions to any great degree.
Distributed smartlinks are powered almost completely by static files hosted on AWS S3. S3 is a highly fault tolerant and redundant system.
In the event of any data loss within the distributed smartlink S3 bucket, data can be redeployed to the distributed smartlink system via the Platform.
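Should such a redeployment be required, it could in principle be scripted along these lines. This is a sketch only; the bucket name and local export path are assumptions, and the real deployment is driven from the Platform.

```python
# Sketch of redeploying smartlink static files to S3 after data loss.
# Bucket name and local build path are placeholders.
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "distributed-smartlinks"        # placeholder bucket name
BUILD_DIR = "/srv/smartlinks/build"      # placeholder local export of smartlink files

for root, _dirs, files in os.walk(BUILD_DIR):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, BUILD_DIR)
        # Re-upload each generated static file to the smartlink bucket
        s3.upload_file(path, BUCKET, key)
```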
AWS Elastic Beanstalk (EB) is used to provide some key smartlink functionality, such as saving print jobs and handling user image uploads.
Application code is executed on EB inside Docker containers.
In the event of any outage (assuming there is not a more widespread AWS outage):
A redeployment to a new EB environment should be performed.
This would typically take 10 - 30 minutes.
A CNAME switch should then be performed, swapping the old and new environments (see the sketch after these steps).
This would typically take a few minutes for DNS settings to propagate.
Once traffic to the new environment has been confirmed and validated, a post-mortem should be carried out on the old environment.
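A sketch of the CNAME switch step, assuming boto3 and placeholder environment names; the health check before swapping is equally illustrative.

```python
# Sketch of swapping CNAMEs between the old and new EB environments.
# Environment names and region are placeholders.
import boto3

eb = boto3.client("elasticbeanstalk", region_name="eu-west-1")

# Confirm the replacement environment is healthy before switching traffic
env = eb.describe_environments(EnvironmentNames=["smartlinks-prod-new"])["Environments"][0]
assert env["Health"] == "Green", "new environment is not healthy yet"

# Swap the CNAMEs so the production URL now points at the new environment
eb.swap_environment_cnames(
    SourceEnvironmentName="smartlinks-prod-old",
    DestinationEnvironmentName="smartlinks-prod-new",
)
```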
In the event of any disaster, all scheduled tasks should be stopped immediately. This can be achieved by halting the Docker daemon on the scheduled task servers.
Scheduled tasks should only be resumed once the Platform has regained stability.
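On a systemd-managed host this could be as simple as the following sketch, which assumes sudo access and that all scheduled tasks run as Docker containers on that server.

```python
# Sketch: halt the Docker daemon on a scheduled task server so no further task
# containers can run. Assumes a systemd-managed Docker service and sudo access.
import subprocess

# Stopping docker.socket as well prevents socket activation from restarting the daemon
subprocess.run(["sudo", "systemctl", "stop", "docker.socket", "docker"], check=True)
```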
A full database backup is performed every 3 hours, with 24 hours of backups stored on site and then archived to S3 for 3 months.
Additionally, as an extra layer of redundancy, UKFast performs daily off-site incremental backups, with a full backup every Sunday night.
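The 3-month S3 retention for archived backups could, for example, be enforced with a lifecycle rule such as the following. Bucket and prefix names are placeholders.

```python
# Sketch of the 3-month retention rule for database backups archived to S3.
# Bucket and key prefix are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="platform-db-backups",                     # placeholder backup bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-db-dumps-after-3-months",
                "Filter": {"Prefix": "mysql/"},       # placeholder key prefix
                "Status": "Enabled",
                "Expiration": {"Days": 90},           # ~3 months, per the backup policy
            }
        ]
    },
)
```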
In the event of database corruption at the block or file level, a failover to a redundant database server should be performed rather than a full or partial restore.
Currently, failovers are a manual process and require an application-level configuration update.
In the event of data being corrupted via SQL (e.g. an application bug that causes loss of data), a full restore should only be considered if the loss of data is significant (i.e. greater than 75%).
Where data loss is not significant, a partial restore should be performed by first restoring a previous database backup to a temporary VM followed by a manual restore of the affected data to the primary database server.
A full database restore would be expected to take around 6 hours, during which all Platform functionality would be unavailable; it should therefore be treated as a last resort.
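As an illustration of the partial restore path, and assuming a MySQL-compatible database, the affected table could be dumped from the temporary restore VM and replayed against the primary roughly as follows. Hostnames, credentials, database and table names are all placeholders, and real incidents may need finer-grained row selection.

```python
# Sketch of a partial restore, assuming a MySQL-compatible database.
# All hostnames, credentials and table names are placeholders.
import subprocess

# Dump only the affected table's rows from the temporary restore VM,
# using REPLACE statements so existing rows on the primary are overwritten.
dump = subprocess.run(
    ["mysqldump", "-h", "temp-restore-vm", "-u", "restore", "-pPLACEHOLDER",
     "--no-create-info", "--replace", "platform", "orders"],
    check=True, capture_output=True,
)

# Replay the dump against the primary database server
subprocess.run(
    ["mysql", "-h", "db-primary", "-u", "restore", "-pPLACEHOLDER", "platform"],
    input=dump.stdout, check=True,
)
```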
Generally, file storage is used for items such as product assets, fonts, etc. No sensitive PII is contained within file storage, with the exception of user uploaded images.
The majority of assets are stored within S3. S3 versioning is enabled for most asset types to allow for “undeletion”.
In the event of any data loss, files should be restored from either the S3 backup (for legacy assets) or from the S3 object’s version history.
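Where version history is the recovery route, "undeleting" an object is a matter of removing the delete marker so the previous version becomes current again. A sketch, with placeholder bucket and key names:

```python
# Sketch of "undeleting" an S3 object by removing the delete marker left by an
# accidental delete. Bucket and key are placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "platform-assets", "fonts/brand-regular.woff2"   # placeholders

versions = s3.list_object_versions(Bucket=BUCKET, Prefix=KEY)
for marker in versions.get("DeleteMarkers", []):
    if marker["Key"] == KEY and marker["IsLatest"]:
        # Removing the latest delete marker makes the previous version current again
        s3.delete_object(Bucket=BUCKET, Key=KEY, VersionId=marker["VersionId"])
```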
In the event of a ransomware attack against our web tier, all web tier VMs should be spun down and replaced with fresh VMs running rebuilt Docker container images.
Once system stability has been restored, the compromised VMs should be isolated and investigated to learn more about the possible attack vectors.
In the event of a ransomware attack against our primary database server, a failover to a read replica should be attempted. The old primary server should then be investigated, formatted and brought back into service.
If all read replicas have also been compromised then it would be necessary to revert the primary to a previous known-good snapshot and rebuild and reinitialise all read replicas.