Introducing the Data Gateway
The Data Gateway is a tool that enables organizations to publish data to their Koordinates site from data sources within their own internal systems, including SQL databases, network shares, and ArcGIS servers. The Data Gateway is a secure, efficient way for organizations to manage the publication and updates of their data without relying on additional technical support.
The Data Gateway scans and pre-validates compatible data sources before publishing. It also allows the use of common organizational network protocols (CIFS, SQL) that are not designed for high-latency network links, or that are unsuitable for exposing to the Internet and transmitting data over it securely.
Because data source administration is self-service, data administrators and organizational IT teams can work together to bring new data sources online without needing external support to verify or troubleshoot data source configurations, and without needing changes to external firewalls or Internet connectivity.
How does the Data Gateway work?
The Data Gateway runs as a virtual machine on VMware, AWS, Azure or comparable systems, typically firewalled in a DMZ from the rest of the customer's corporate network. This virtual machine connects via a secure network link to Koordinates, enabling the publisher to upload data to their Koordinates site, and update it, directly from their own internal data sources.
The full benefits from scanning, described under ‘Advantages’ below, are available to all these data sources with no dedicated data staging infrastructure required. All communication between the Data Gateway and Koordinates is encrypted, and the locality of the appliance to the data provides for optimised and reliable transfer throughput.
Data sources are configured through the Koordinates UI by data administrators, allowing administrators and IT staff to test data publishing workflows without needing additional technical support. Security and software updates for the Data Gateway are fully managed by Koordinates.
The advantages of the Data Gateway
Greatly simplified publishing workflows
The Data Gateway removes the need for publishers to create and maintain their own “data staging” infrastructure and additional data publishing workflows. Instead, the Data Gateway supports a web-based user interface for data administrators to load, configure, and update datasets from all internal, Internet, or uploaded data sources.
Potential to integrate multiple data sources
By deploying additional Data Gateways, multiple data sources across different networks and offices of an organization can be integrated into a single customer account.
Additional security protections
Koordinates has developed a robust security model for the Data Gateway appliance, which ensures that all data communications across the Internet are encrypted. This additional layer of protection means that any misconfiguration of internal data sources does not pose immediate security risks to the organisation.
Optimised data performance
Many common databases and network protocols perform poorly over high-latency (Internet) connections because of their original designs, or are complex to configure correctly for such links. The Data Gateway has been designed to optimise data performance while keeping data source configuration simple.
Troubleshooting performance or configuration problems with data sources is greatly simplified, with only one network/firewall “hop” between the data source and appliance. This allows self-service configuration of data source changes by organizations without needing Koordinates support. Koordinates monitors Data Gateway throughput to provide useful diagnostics of both the source to Data Gateway links and the Data Gateway to Koordinates link.
Maintain benefits of data scanning
Before data is imported to a Koordinates site, designated data sources are scanned in order to:
Identify and pre-validate supported data present on the data source.
Estimate data size, so organizations understand cost impact prior to import.
Automatically identify data that may provide updates to existing published datasets, including incremental “changeset” updates.
Enable an intelligent admin UI that supports the data import and update processes and makes data publishing self-service for users.
Allow fully automated data updates and publishing.
Scanning is a crucial part of the data import workflows used by staff, and needs to be reliable and performant for data publishing to be efficient.
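To make the scanning goals above more concrete, the following is a minimal sketch in Python of what a scan of a file-based data source could look like. It is illustrative only and is not the Data Gateway's implementation: the list of supported extensions, the `ScannedItem` record, and the function names `scan_source`, `estimate_size`, and `find_updates` are all assumptions made for this example. It walks a directory, records supported files, totals their size, and compares two scans to spot files that are new or have changed.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical set of supported file types, chosen for this example only.
SUPPORTED_EXTENSIONS = {".shp", ".gpkg", ".csv", ".tif"}


@dataclass(frozen=True)
class ScannedItem:
    """One supported file found during a scan (illustrative record)."""
    path: str          # path relative to the scanned root
    size_bytes: int    # file size at scan time
    modified_ns: int   # last-modified timestamp in nanoseconds


def scan_source(root):
    """Walk a directory and record every file with a supported extension."""
    items = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS:
            st = p.stat()
            items.append(
                ScannedItem(str(p.relative_to(root)), st.st_size, st.st_mtime_ns)
            )
    return items


def estimate_size(items):
    """Total size in bytes of everything the scan found."""
    return sum(item.size_bytes for item in items)


def find_updates(previous, current):
    """Paths that are new, or whose size or timestamp changed, since `previous`."""
    before = {item.path: item for item in previous}
    return [item.path for item in current if before.get(item.path) != item]
```

Running `scan_source` on a schedule and passing consecutive results to `find_updates` gives a simple list of candidate updates. A real scanner would also need to inspect file contents to pre-validate data and to work out incremental changesets, which this sketch does not attempt.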
Further technical and security details
The Data Gateway consists of two major components. The first is an orchestration service that resides in the Koordinates cloud as an isolated part of our platform, and the second is a Linux virtual machine appliance that is downloaded and run by the customer. The appliance is authenticated against a customer account, and all communications between it and the orchestration service occur over an encrypted VPN connection.
For security, the primary focus has been on minimising the attack surface the appliance might present to customer networks, while keeping the appliance constantly updated with the latest configuration and security patches from Koordinates. Logging and audit reporting are available for organisation IT monitoring.
The Data Gateway has been designed and built from secure and supported components based on a customized and locked-down Debian Linux operating system. It is a focused appliance containing and running only software relating to its primary purposes.
The orchestration service is isolated from the main Koordinates platform in separate infrastructure, both for customer and platform security. The data processing application interacts through task-oriented APIs with the orchestration service, and has no ability to run arbitrary code on the Data Gateway. Several different threat models for the Data Gateway have been developed, reviewed, and mitigated as part of the design process. Koordinates runs centralised logging, monitoring, and alerting systems across all infrastructure.
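The idea of a task-oriented API can be illustrated with a short Python sketch. This is a general pattern and not the Data Gateway's actual API: the task names `scan` and `fetch`, their parameters, and the `dispatch` function are hypothetical. The point it shows is that a caller can only name a task from a fixed list and supply the expected parameters, so there is no path by which it can ask for arbitrary code to be run.

```python
class TaskRejected(Exception):
    """Raised when a request does not match a known task."""


def scan_task(params):
    # Placeholder handler: a real one would scan the named data source.
    return {"task": "scan", "source": params["source"]}


def fetch_task(params):
    # Placeholder handler: a real one would transfer the named dataset.
    return {"task": "fetch", "source": params["source"], "dataset": params["dataset"]}


# Fixed allowlist: task name -> (handler, exact set of parameter names).
# Both task names and parameters are assumptions made for this example.
TASKS = {
    "scan": (scan_task, {"source"}),
    "fetch": (fetch_task, {"source", "dataset"}),
}


def dispatch(request):
    """Run a request only if it names a known task with the expected parameters."""
    name = request.get("task")
    if name not in TASKS:
        raise TaskRejected(f"unknown task: {name!r}")
    handler, expected = TASKS[name]
    params = request.get("params", {})
    if set(params) != expected:
        raise TaskRejected(f"unexpected parameters for {name!r}")
    return handler(params)
```

For example, `dispatch({"task": "scan", "params": {"source": "gis-share"}})` is accepted, while a request naming any task outside the allowlist raises `TaskRejected`.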