
External Service Gateways

An external service gateway (ESG) is one of the three pillars of a subsystem (1. Control Service, 2. Backend for Front-end or BFF, and 3. ESG). The ESG pattern works at the boundary of the system to provide an anti-corruption layer that encapsulates the details of interacting with other systems, such as third-party, legacy, and sister subsystems. These ESG services act as a bridge to exchange events with the external systems. External systems are owned by a different "organization". This may be another organization in your company or another company altogether. The implication is that we have little to no control over the design and architecture of these systems, nor do we have much say over when and how they might change. This means it is in our best interest to isolate the interactions with external systems, to protect the rest of our system and allow each to change independently.

Below, we see a representation of a typical autonomous subsystem. The relative position of services from left to right implies the flow of events on a timeline, with upstream services to the left and downstream services to the right. Events flow in from upstream systems via ingress ESGs; the core BFF and Control services of the subsystem make their contributions and produce more events. Events flow out to downstream systems via egress ESGs. The core of the subsystem is completely decoupled from the external systems:

ESG Layers
  • Each ESG service is focused on adapting a single external system to the internal model of the system.
  • Each ESG is a separately deployable autonomous service, allowing us to employ the Liskov Substitution Principle (LSP)

In the case of a legacy migration that uses the Strangler pattern, we can leverage the ESG pattern to run the old and new systems simultaneously and simply turn off the old one when it is no longer needed, without upstream or downstream impacts.

When we divide the system into autonomous subsystems (Core BFF, Control Service), we treat each sister subsystem as an external system. For each external system, we create at least one ESG service in the appropriate subsystem.

We first need to understand whether events will flow in from the external system, flow out to the external system, or both. We also need to determine which domain events the ESG service will consume and produce. Then we need to understand the options for connecting to the external system, define semantic maps between the external and internal domain entities, establish whether we are aiming to invoke an action or elicit a reaction, implement the ingress and/or egress pipelines, and decide how to package the service.

An ESG for a third-party service will typically have an egress flow that invokes a specific action in the external system. Then an ingress flow receives an event from the third-party webhook that contains the results of the action. It transforms the external event into an internal domain event, so that downstream services can decide how to react to the outcome of the action. This keeps the ESG as a simple bridge.

When ESGs connect autonomous subsystems, they focus on transforming between internal and external domain events and relaying them between subsystems so that the downstream subsystem can react to upstream events. This scenario includes an egress flow upstream and an ingress flow downstream. Neither flow actually invokes a business action. They simply react in order to make domain events available to the BFF or Control services that provide the business action.

The ESG pattern is also used to integrate different systems by synchronizing (that is, replicating) data between them. This is most evident when implementing a bi-directional integration with a legacy system. Either system can carry out a business action, and the ESG layer simply reacts to keep the two systems in sync.

Egress

An egress flow is responsible for reacting to an internal domain event and integrating it with an external system. In this basic example, the external system has the ability to receive events and we are simply forwarding the logical domain event from one system to the other. This means we are only intending to elicit a reaction in the external system, but not necessarily a specific reaction. In this example, we will assume that the external system exposes an AWS SNS topic that it consumes.

egress esg

Sample code for the listener function

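import type { KinesisStreamEvent } from 'aws-lambda'
// fromKinesis, send and toPromise are stream-processing helpers assumed to be in scope

export const handler = async (event: KinesisStreamEvent) =>
  fromKinesis(event).filter(onEventType).map(toExternalFormat).through(send()).through(toPromise)

// Keep only events whose type starts with 'thing-'
const onEventType = (uow: any) => /^thing-/.test(uow.event.type)

// Map the internal domain event to the external message format
const toExternalFormat = (uow: any) => ({
  ...uow,
  sendRequest: {
    // variable expected by 'send' util
    TopicArn: process.env.TOPIC_ARN,
    Message: JSON.stringify({
      f1: uow.event.thing.field1,
      f2: uow.event.thing.field2,
      f3: uow.event.thing.field3
    })
  }
})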

The domain events of interest are available on a stream in the event hub. The listener function consumes from the stream and filters for the event types of interest, such as all events for the thing domain entity. The toExternalFormat step transforms the internal format to the external message format. The send step connects to the external system and sends the message.

Ingress

An ingress flow is responsible for integrating with an external system to receive external domain events and transform them into internal domain events so that our system can react as needed.

In a basic example, the external system has the ability to emit events and we are simply forwarding the logical domain event from one system to the other. This means that we are only intending to elicit a reaction in our system to the external event, but not necessarily a specific reaction. In this example, we will assume that the external system exposes an AWS SNS topic where it emits events.

ingress esg

Sample code for the trigger function

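import type { SQSEvent } from 'aws-lambda'
// fromSqs, publish and toPromise are stream-processing helpers assumed to be in scope

export const handler = async (event: SQSEvent) =>
  fromSqs(event).map(toInternalFormat).through(publish()).through(toPromise)

// Assumes uow.record.body has already been parsed from JSON into an object
const toInternalFormat = (uow: any) => ({
  ...uow,
  event: {
    // variable expected by 'publish' util
    id: uow.record.messageId,
    type: `thing-${uow.record.body.action}`,
    // SQS exposes SentTimestamp (epoch milliseconds, as a string) under record.attributes
    timestamp: Number(uow.record.attributes.SentTimestamp),
    thing: {
      field1: uow.record.body.f1,
      field2: uow.record.body.f2,
      field3: uow.record.body.f3
    }
  }
})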

The external system emits events on an AWS SNS topic. The ESG creates an AWS SQS queue to connect to the external system and receive events from the topic. The trigger function consumes messages from the queue. The toInternalFormat step transforms the external format to the internal event format. The publish step puts the internal domain event on the event hub, so that core services can react to the event.
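One detail worth illustrating: when an SNS topic delivers to an SQS queue without raw message delivery enabled, the SQS message body is a JSON string that wraps the original payload in an SNS envelope. The following is a minimal sketch of unwrapping that envelope before applying the semantic map. The payload shape (action, f1..f3) mirrors the sample above, and the function name `toInternalThingEvent` is hypothetical.

```typescript
// Minimal shape of the SNS envelope found in an SQS message body
// when raw message delivery is disabled on the subscription.
interface SnsEnvelope {
  Type: string;
  MessageId: string;
  Message: string; // the external system's payload, as a JSON string
  Timestamp: string; // ISO 8601
}

// Assumed external payload, mirroring the f1..f3 fields used in the sample.
interface ExternalThing {
  action: string;
  f1: string;
  f2: string;
  f3: string;
}

interface ThingEvent {
  id: string;
  type: string;
  timestamp: number;
  thing: { field1: string; field2: string; field3: string };
}

// Unwrap the envelope, then apply the external-to-internal semantic map.
function toInternalThingEvent(sqsBody: string): ThingEvent {
  const envelope = JSON.parse(sqsBody) as SnsEnvelope;
  const external = JSON.parse(envelope.Message) as ExternalThing;
  return {
    id: envelope.MessageId,
    type: `thing-${external.action}`,
    timestamp: Date.parse(envelope.Timestamp),
    thing: { field1: external.f1, field2: external.f2, field3: external.f3 },
  };
}
```

If raw message delivery is enabled on the subscription instead, the body is the external payload itself and only the inner parse is needed.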

Packaging

  • Each ESG lives in its own source code repository and cloud stack, like all autonomous services.
  • As a default starting point, there should be one ESG per external system that contains all the ingress and egress pipelines.
  • Alternatively, we can create a separate ESG for ingress and egress pipelines per external system. This is useful when the way the ESG connects to the external system is significantly different for ingress and egress.
  • For large and complex external systems, it can be advantageous to create multiple ESG services that group related pipelines that change together, such as one per external system capability. This approach can also be combined with the packaging by flow direction approach.
  • Use a service naming convention that makes it easy to recognize the purpose of the specific ESG. A good convention will alphabetically group related services in the source code tool and cloud console, such as <external-service-name>-<ingress|egress>-<capability-name>-gateway.

Integrating with third-party systems

We wrap a third-party system with an ESG service that adapts the external system to the core model. This gives us flexibility in the future if we switch providers or decide to implement the functionality in-house. Then we can simply substitute a different ESG service when we change directions. We can even run multiple implementations simultaneously for different scenarios.

Let's look at the egress flow for performing an action and the ingress flow for receiving updates about the progress of the action.

Egress - API Calls

egress api esg

The domain events of interest are available on a stream in the event hub. Once again, the listener function is a stream processor that consumes from the stream and filters for the right event type, such as OrderSubmitted. It transforms the data to create the external request and invokes the action via a REST POST:

.map(toPostRequest)
.ratelimit(opts.number,opts.ms)
.through(post())

One thing of note is that third-party APIs are typically throttled, and pricing may be based on the rate at which they are invoked. Therefore, it is recommended to add explicit backpressure with a rate-limiting feature. See Chapter 4 of the book "Software Architecture Patterns for Serverless Systems".

It is important that we do not attempt to invoke an external action and then produce an internal event about the success of the action in the same unit of work. Instead, we will rely on the third-party system's webhook feature to notify us about the success of the action.
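To make the rate-limiting idea concrete without depending on a stream library, here is a minimal sketch that releases at most `number` requests per `ms` window. It is not the implementation behind the `ratelimit` step shown above; the function name `rateLimited` and the injectable `sleep` parameter are assumptions made for illustration and testability.

```typescript
type Sleep = (ms: number) => Promise<void>;

// Default pause based on a timer; tests can inject a fake instead.
const timerSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Invoke `call` for every request, releasing at most `opts.number`
// requests per `opts.ms` window and pausing between windows.
async function rateLimited<T, R>(
  requests: T[],
  opts: { number: number; ms: number },
  call: (request: T) => Promise<R>,
  sleep: Sleep = timerSleep,
): Promise<R[]> {
  const size = Math.max(1, opts.number); // guard against a zero-sized window
  const results: R[] = [];
  for (let i = 0; i < requests.length; i += size) {
    if (i > 0) await sleep(opts.ms); // wait before starting the next window
    const batch = requests.slice(i, i + size);
    const settled = await Promise.all(batch.map((request) => call(request)));
    results.push(...settled);
  }
  return results;
}
```

With `{ number: 2, ms: 1000 }` and five requests, the calls go out in windows of two, two, and one, with a one-second pause before the second and third windows.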

Ingress - webhook

Third-party SaaS products provide webhooks as a generic way to integrate with other systems and allow them to register a callback, so that they can be notified about internal changes and progress updates. Some examples include a payment service emitting an event when a payment is completed or an email service emitting an event when a message is flagged as junk.

These ingress flows are responsible for integrating with a webhook to receive external domain events and transform them into internal domain events so that the system can react as needed.

ingress webhook esg

The third-party system emits events via its webhook feature. The ESG uses an API gateway to implement a callback endpoint that will receive external events from the webhook. The callback function transforms the external format to the internal event format and forwards the domain event to the event hub, so that core services can react to the event.
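The callback function can be sketched as follows, assuming the API gateway passes the webhook payload to the function as a JSON string in `body`. The payment payload shape, the `payment-<status>` event type, and the injected `publish` function are all hypothetical stand-ins for whatever the real provider and event hub client look like.

```typescript
// Hypothetical payload a payment provider might POST to the callback endpoint.
interface PaymentWebhook {
  id: string;
  status: string; // for example 'completed'
  orderId: string;
  created: number; // epoch milliseconds (assumed)
}

// Internal domain event produced by the ESG.
interface PaymentEvent {
  id: string;
  type: string;
  timestamp: number;
  payment: { orderId: string; status: string };
}

type PublishFn = (domainEvent: PaymentEvent) => Promise<void>;

// Semantic map from the external format to the internal event format.
const toPaymentEvent = (hook: PaymentWebhook): PaymentEvent => ({
  id: hook.id,
  type: `payment-${hook.status}`,
  timestamp: hook.created,
  payment: { orderId: hook.orderId, status: hook.status },
});

// Build the callback handler around an injected publish function,
// so the event hub client can be swapped or faked in tests.
const makeCallback = (publish: PublishFn) =>
  async (request: { body: string | null }): Promise<{ statusCode: number }> => {
    if (!request.body) return { statusCode: 400 };
    let hook: PaymentWebhook;
    try {
      hook = JSON.parse(request.body) as PaymentWebhook;
    } catch {
      return { statusCode: 400 }; // malformed JSON
    }
    await publish(toPaymentEvent(hook));
    return { statusCode: 200 };
  };
```

A production callback would normally also authenticate the caller, for example by verifying a provider signature, before trusting the payload; that step is omitted here.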

Asynchronous request response

The interactions with external systems are typically the least resilient part of a system, because they require a synchronous call to another system that you do not control. This is why it is preferable to have ESG services that are completely asynchronous and react to events from other services. This supports the majority of scenarios where a fire-and-forget flow is appropriate, such as when one user performs a step and another user performs the next step after many automated steps have successfully completed in between.

However, there are scenarios where a single user performs a step and needs a response from an external system before continuing to the next step. It is important that the user proceeds to the next step as quickly as possible, but the user shouldn't have to wait for a flaky and/or slow response. In these scenarios, we can employ an asynchronous request-response flow to make the user experience more resilient and responsive.

Async request response

The BFF's GraphQL function on the command/query side provides a mutation to initiate the request. It stores a request object in the entities datastore. The trigger function publishes a request event, such as <entity>-<action>-requested.

The ESG's listener function consumes the request event and makes the external API call. The external system's webhook feature invokes a callback with the results of the action. The callback function publishes a response event, such as <entity>-<action>-completed.

The BFF's listener function consumes the response event and updates the request object in the entities store. Meanwhile, the frontend polls the BFF's graphql function to query for the status of the request. The polling implements an exponential backoff and results in an error after several attempts. If the response arrives later, then a fresh query will return the results, which provides for more resilience. Alternatively, a live update approach can be used.
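The frontend polling described above might look like the following minimal sketch. The `RequestStatus` shape, the `pollForResult` name, and the option names are assumptions; `query` stands in for whatever call the frontend makes to the BFF to read the request object.

```typescript
// Assumed shape of the request object returned by the BFF query.
type RequestStatus<T> =
  | { status: 'pending' }
  | { status: 'completed'; result: T };

type Pause = (ms: number) => Promise<void>;

const timerPause: Pause = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the request completes, doubling the delay after each attempt,
// and give up with an error once the attempts are exhausted.
async function pollForResult<T>(
  query: () => Promise<RequestStatus<T>>,
  opts: { attempts: number; baseMs: number },
  pause: Pause = timerPause,
): Promise<T> {
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    const current = await query();
    if (current.status === 'completed') return current.result;
    // No need to wait after the final attempt.
    if (attempt < opts.attempts - 1) await pause(opts.baseMs * 2 ** attempt);
  }
  throw new Error(`No response after ${opts.attempts} attempts`);
}
```

Because the request object lives in the BFF's datastore, an error here is not final: a later call to `pollForResult`, or a plain query, still returns the result once the response event has been processed.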

Third-party systems are the most obvious kind of external system. However, as a system grows larger, it will need to be divided into multiple related systems.

Integrating with other subsystems

Each autonomous subsystem has its own cloud account, has its own event hub, and only exposes a set of external domain events for inter-subsystem communication.

We can then create arbitrarily complex systems by connecting autonomous subsystems in a simple fractal pattern.

Egress - upstream subsystem

Upstream subsystems make a set of external domain events available and route them to downstream subsystems.

egress upstream subsystem

The autonomous services within a subsystem exchange internal domain events. The backward compatibility of these event types is only guaranteed within the subsystem. The Producer services publish internal domain events to the bus as usual and remain completely decoupled from downstream subsystems.

Each subsystem produces a set of external domain events that contain the information they are willing to share and support. This hides the internal dirty laundry of a subsystem and provides stronger backward compatibility guarantees to downstream subsystems. One or more ESG services consume the internal domain events and transform them into external domain events. Then these events are published to the bus as well.
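As an illustration of that transformation, the sketch below maps an internal domain event to an external one that carries only the fields the subsystem is willing to share. The order entity, its field names, and the `toExternalOrderEvent` function are hypothetical; the `source` value of `external` is an assumption made here so that the event is easy to route.

```typescript
// Hypothetical internal event, including details other subsystems should not see.
interface InternalOrderEvent {
  id: string;
  type: string;
  timestamp: number;
  order: {
    id: string;
    status: string;
    total: number;
    internalNotes: string; // internal only
    warehouseBin: string; // internal only
  };
}

// External event: a smaller, more stable contract for downstream subsystems.
interface ExternalOrderEvent {
  id: string;
  type: string;
  timestamp: number;
  source: 'external';
  order: { id: string; status: string; total: number };
}

// Copy only the shared fields and tag the event so routing rules can select it.
const toExternalOrderEvent = (internal: InternalOrderEvent): ExternalOrderEvent => ({
  id: internal.id,
  type: internal.type,
  timestamp: internal.timestamp,
  source: 'external',
  order: {
    id: internal.order.id,
    status: internal.order.status,
    total: internal.order.total,
  },
});
```

Listing the shared fields explicitly, rather than spreading the internal object, means a new internal field never leaks into the external contract by accident.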

From here, the fully managed event bus service, such as AWS EventBridge, is responsible for forwarding events from the bus in one account to the bus in another account. One or more egress rules are defined to relay the external domain events to downstream event hubs.

Here is an example of an egress rule for AWS EventBridge that routes all external domain events to one or more downstream subsystems.

The ESG services set the source field to external to simplify the routing rules. We add a target for each downstream bus that will receive the upstream subsystem's events:

# SubsystemX Rule
EgressEventRule:
  Type: AWS::Events::Rule
  Properties:
    EventPattern:
      source: ['external']
    Targets:
      - Id: SubsystemY
        Arn: arn:aws:events:*:123456789012:event-bus/event-hub-bus
Both subsystems must agree to the exchange of events. A trust agreement is established by configuring permission policies in both accounts stating which accounts a subsystem is willing to send events to and receive events from.

Now let's complete the picture by looking at a downstream subsystem's ingress flow.

Ingress - downstream subsystem

At this point, upstream subsystems have made their external domain events available and routed them to downstream subsystems. Next, each downstream subsystem makes the events of interest available to its own services. The following shows the inter-subsystem ingress flow and the resources involved:

ingress downstream subsystem

The bus of a downstream event hub receives external domain events from an upstream subsystem and routes the events to an ingress stream to isolate them from internal domain event flows. One or more ESG services consume the external domain events and transform them into internal domain events. Then the internal domain events are published to the bus and routed to the internal streams for further processing, as usual, by Consumer services.

Note that the ingress stream is optional. If the volume is low, then it may not be necessary. You can start without it and add it as needed. It can also be excluded in lower environments to control cost.

This variation of the ESG pattern allows subsystems to evolve independently. Only the ESG logic needs to be modified when the definition of an internal or external domain event changes. The Robustness principle is followed when a change is necessary, so that both sides do not have to change at the exact same time. This anti-corruption layer is the primary benefit of this ESG pattern scenario.
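One way to apply the Robustness principle in an ingress ESG is to accept both the old and the new shape of an external domain event while always emitting a single internal shape. The sketch below assumes a hypothetical change where the upstream subsystem replaces a numeric `total` with an `amount` object; the type names, field names, and the default currency are illustrative assumptions only.

```typescript
// Older external format: a bare numeric total.
interface ExternalOrderV1 {
  order: { id: string; total: number };
}

// Newer external format: an amount object with an explicit currency.
interface ExternalOrderV2 {
  order: { id: string; amount: { value: number; currency: string } };
}

type ExternalOrder = ExternalOrderV1 | ExternalOrderV2;

// The single internal shape that core services depend on.
interface InternalOrder {
  id: string;
  total: number;
  currency: string;
}

// Be liberal in what we accept and conservative in what we emit.
const toInternalOrder = (external: ExternalOrder): InternalOrder => {
  const { order } = external;
  if ('amount' in order) {
    return { id: order.id, total: order.amount.value, currency: order.amount.currency };
  }
  // Assumed default for the older format, which carried no currency.
  return { id: order.id, total: order.total, currency: 'USD' };
};
```

With a mapping like this deployed first, the upstream subsystem can switch formats on its own schedule, and the older branch can be removed once it is no longer exercised.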

Implementing subsystems on the same cloud provider allows leveraging the features of common services.

A few notes on AWS EventBridge

EventBridge has 3 parts:

  • Event sources
  • Event buses
  • Rules: allow you to match against values in the metadata or payload of an ingested event and send it to "Targets", which could be AWS Lambda, Kinesis, and over 78 other targets in AWS

Example event

{
  "detail-type": "Ticket Created",
  "source": "aws.partner/example.com/123",
  "detail": {
    "ticketId": "987654321",
    "department": "billing",
    "creator": "user12345"
    ...
  }
}

Example rule:

{
  "detail-type": ["Ticket Created"]
}
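To build intuition for how a rule selects events, here is a deliberately simplified matcher. It covers only exact-value matching, where each pattern leaf is an array of allowed values and nested objects are matched recursively. It is a sketch for illustration, not a reimplementation of EventBridge, which supports many more operators; the names `EventPatternSketch` and `matchesPattern` are hypothetical.

```typescript
type PatternLeaf = Array<string | number | boolean>;
type EventPatternSketch = { [key: string]: PatternLeaf | EventPatternSketch };
type JsonObject = { [key: string]: unknown };

// An event matches when every key in the pattern is present in the event and
// its value is one of the allowed values (or, for nested patterns, matches recursively).
// Keys that appear only in the event are ignored.
function matchesPattern(pattern: EventPatternSketch, candidate: JsonObject): boolean {
  return Object.keys(pattern).every((key) => {
    const expected = pattern[key];
    const actual = candidate[key];
    if (Array.isArray(expected)) {
      return expected.some((allowed) => allowed === actual);
    }
    return (
      typeof actual === 'object' &&
      actual !== null &&
      matchesPattern(expected, actual as JsonObject)
    );
  });
}
```

For example, a pattern of `{ 'detail-type': ['Ticket Created'] }` matches the ticket event shown earlier regardless of its `detail` contents, while adding `detail: { department: ['billing'] }` narrows the match to billing tickets.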