There are many articles and books about good API design. But far more often we connect to external systems through their APIs than design APIs of our own.
I divide programming interfaces into two big sets: public and not so public. The first category is the APIs of big companies like Facebook, Twitter, PayPal, etc. Generally, this kind of API is stable, developer-friendly, and well designed. The second category is the APIs of small and mid-sized companies, state services, and internal applications in big companies. This article focuses on the second set.
I once worked for a company whose software was connected to thousands of APIs, and I’ve compiled some rules for building seamless connections to external service providers. The examples given here are taken from real projects that I worked on.
- Log raw inbound and outbound packets. Inbound packets must be logged before deserialization, outbound ones after serialization. Otherwise, you can lose or corrupt significant data. If possible, provide indexed storage for raw requests (e.g., Elasticsearch-based solutions like ELK). When a problem occurs, you can then extract the data packets quickly and easily. In real projects, problems happen more often than I would like.
- It sounds weird, but do not trust the contracts. I’ve encountered situations where contracts were changed without any notification, and you must be prepared for such cases. Take a look at this XSD schema. It is used to generate the response DTO (via JAXB).
    <xs:element name="order">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="id" type="xs:string"/>
          <xs:element name="customerName" type="xs:string"/>
          ...
          <xs:element name="mcc" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

The schema doesn’t allow new elements, and one day the provider added one. After that, our application failed with a validation error on every request, because the response was no longer valid against the schema. The solution is to add an “any” element to the XSD.
    <xs:element name="order">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="id" type="xs:string"/>
          <xs:element name="customerName" type="xs:string"/>
          ...
          <xs:element name="mcc" type="xs:string"/>
          <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

Note that maxOccurs="unbounded" and processContents="lax" are set so that any number of undeclared elements passes validation; the defaults (at most one element, strict processing) would still reject them. It is not always possible to fix a schema, but if it’s small, at least review it. There were many cases when fields that were mandatory according to the documentation came back empty or null. If these fields are necessary for processing, there is nothing you can do to recover: log the packet (see the first rule) and throw an exception or return an error code. Also, check that your client, parser, and protocol are ready for unexpected changes. For example, if you use Jackson’s ObjectMapper, don’t forget to allow unknown fields in JSON (via @JsonIgnoreProperties(ignoreUnknown = true) or by disabling DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES on the ObjectMapper). You can read more about schema migration and evolution in the book Designing Data-Intensive Applications (DDIA).
- Validate your input. In some cases, schema validation is impossible because the schema doesn’t exist or because requests with different meanings share the same data structures. Either way, it’s better to validate input as early as possible. Throwing a validation exception in the first step of processing is cheaper than hunting for the cause of a NullPointerException in the rest of the source code.
- Use rate limiters and circuit breakers. Many times I have seen bank services become unavailable because of the huge number of requests from our side. Tools like resilience4j are very powerful and help you avoid the problem.
- Of course, it’s good to use a monitoring system with common metrics: requests per minute (rpm), latency, throughput.
- All timeouts (connect, read, etc.) and buffer sizes must be set explicitly. It’s very important, but developers often forget about it. As I mentioned before, services of this kind are not as stable as we would like, and connection issues are common for them. Set timeouts to prevent a situation where one hanging request blocks the whole queue.
- Provide a switch for disabling the connection to an external system without stopping or restarting the application. I remember a few cases when we were sending incorrect data to an external system, but it was impossible to shut down the whole application. It’s pretty simple, but it can save your colleagues from a heart attack.
- Retry patterns and deferred tasks are very useful when you communicate with an external system. You may use Spring Retry, which retries in memory only, or persistent queues and message brokers. You must be able to configure timeout, retry, and failure policies.
- The codebase of integration modules is often a nightmare: thousands of parameters with secret knowledge, magic numbers, and implicit relationships between them. A month after writing such code, I can no longer say what is happening in it. To make code maintenance easier, I stick to a few rules.
- All communication with external APIs must be located in a single place. It sounds like a revelation from Captain Obvious, but it’s common for this code to be scattered across different places. There are many excuses: no time, a terrible mood, and so on. Either way, it makes the code unmaintainable. In a microservice architecture, there is another variant: the source code is located in one place (e.g., a library), but each microservice has its own connection to the external system. This also leads to a loss of control over the connections. To avoid it, you may use a single proxy for all services, which also makes it possible to apply caching and other helpful things. A proxy sounds like a single point of failure, but you can run several instances of it.
- Protocols, parameters, and tricks must be well documented, and the documentation must be kept up to date. I simply decline merge requests that lack up-to-date docs.
- I often develop or refine gateways. Usually, a gateway is a simple app that moves data from one system to another. But it’s not as simple as I would like: each app becomes more complex over time. Moreover, the number of input and output types may grow, and it turns into a complete disaster. To prevent that, I use a middle-tier domain model. It takes more time, but in long-term projects it pays for itself.
- Check negative cases such as loss of connection, long reads, and invalid requests. You will be surprised by the results, and you will be able to come up with appropriate reactions (fix it automatically, log it, notify an operator, and/or provide tools for manual fixing).
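Several of the rules above (raw packet logging, explicit timeouts, a runtime switch) fit into one small client. What follows is only a minimal sketch, assuming Java 11+ and its built-in java.net.http client. The class ExternalGateway, its methods, and the logger name are hypothetical and are not taken from the projects mentioned in this article.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Logger;

/** Hypothetical client for one external system; all names here are illustrative. */
final class ExternalGateway {
    private static final Logger RAW_LOG = Logger.getLogger("external.raw");

    // Runtime switch: can be flipped (e.g., from an admin endpoint) without a restart.
    private final AtomicBoolean enabled = new AtomicBoolean(true);
    private final HttpClient client;
    private final Duration requestTimeout;

    ExternalGateway(Duration connectTimeout, Duration requestTimeout) {
        // Both timeouts are passed in explicitly; nothing is left to library defaults.
        this.client = HttpClient.newBuilder().connectTimeout(connectTimeout).build();
        this.requestTimeout = requestTimeout;
    }

    void setEnabled(boolean value) {
        enabled.set(value);
    }

    boolean isEnabled() {
        return enabled.get();
    }

    /** Sends an already serialized body and returns the raw, not yet deserialized response. */
    String post(URI uri, String serializedBody) throws IOException, InterruptedException {
        if (!enabled.get()) {
            throw new IllegalStateException("Connection to the external system is switched off");
        }
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(requestTimeout)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(serializedBody))
                .build();
        // Outbound packet is logged after serialization.
        RAW_LOG.info(() -> "OUT " + uri + " " + serializedBody);
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Inbound packet is logged before any deserialization takes place.
        RAW_LOG.info(() -> "IN " + response.statusCode() + " " + response.body());
        return response.body();
    }
}
```

A caller might create it as new ExternalGateway(Duration.ofSeconds(2), Duration.ofSeconds(10)) and call setEnabled(false) when bad data starts flowing out. Returning the raw body keeps parsing and validation in a separate step, after the packet has already been logged.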
Now a little bit about testing. Let’s consider unit and end-to-end testing.
- For unit testing, everything looks clear. Just use mocks such as MockRestServiceServer or something similar.
- For end-to-end testing, there is a question: how to emulate the external system. There are a few main approaches; let’s consider some of them.
- Connect to the test environment of the external system. It’s the most realistic way, but it has limitations: the environment may be temporarily unavailable, or contract terms may make such a connection impossible. A significant disadvantage is test instability. If you have a dozen connections, be ready to see 50% red statuses on the build server.
- Another way is a mock inside the application. It’s very stable, but it puts test code inside the production app, so incorrect settings can break something in production.
- The last one is a mock outside the app. I have found that it works well. It means there is a service that emulates the target system. It is usually very simple and may contain some logic (e.g., for handling corner cases). It takes time at first, but it provides stable tests without any interference with production code. It’s also easy to emulate network problems, faults, failures, and so on.
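A mock outside the app can start very small. The sketch below is one possible shape, assuming the JDK’s built-in com.sun.net.httpserver.HttpServer is available. The class MockOrderService, the /orders/... paths, and the payloads are all made up for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for an external order service; paths and payloads are made up. */
final class MockOrderService implements AutoCloseable {
    private final HttpServer server;
    private final ExecutorService executor = Executors.newCachedThreadPool();

    MockOrderService(Duration slowDelay) throws IOException {
        // Port 0 asks the OS for any free port, so parallel test runs do not collide.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        // A thread pool keeps the slow endpoint from blocking the other ones.
        server.setExecutor(executor);
        // Happy path.
        server.createContext("/orders/ok", ex -> respond(ex, 200, "{\"id\":\"42\"}"));
        // Corner case: the provider fails.
        server.createContext("/orders/broken", ex -> respond(ex, 500, "internal error"));
        // Corner case: the provider answers slowly, useful for checking client timeouts.
        server.createContext("/orders/slow", ex -> {
            try {
                Thread.sleep(slowDelay.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            respond(ex, 200, "{\"id\":\"42\"}");
        });
        server.start();
    }

    /** Base address to put into the tested application's configuration. */
    URI baseUri() {
        return URI.create("http://127.0.0.1:" + server.getAddress().getPort());
    }

    private static void respond(HttpExchange ex, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }

    @Override
    public void close() {
        server.stop(0);
        executor.shutdownNow();
    }
}
```

In an end-to-end run, the application under test would be pointed at baseUri() through its normal configuration, so no test code ends up inside the production app. Pointing it at /orders/slow with a delay longer than the client’s timeout is one way to rehearse a hanging provider.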
Integration with external systems isn’t as simple as it looks. It takes a lot of time and energy to make it robust and predictable.