Duda Collections: The Microservice Way

March 17, 2021

Collections are a special data structure included in the content libraries of Duda sites. Collections are incredibly useful for managing websites at scale and especially helpful when working with dynamic pages and Duda’s native widget builder. 


Since collections are so important to so many of the web pros who use our platform, we thought it would be interesting to pull back the curtain and explain how these crucial tools work.


Let’s dive in...

A Quick Glance at Our Current Implementation

In Duda Website Builder, we have two major groups of collections:


  • Internal collections (image gallery and various other data field types) — These are stored and managed by customers inside the Duda platform.
  • External collections (Instagram, Airtable, Google Sheets, etc.) — Customers manage this data outside our platform and provide Duda with some mechanism to fetch and cache it for faster access.

There is a principal difference between these two groups. Internal collections data is stored in the database (Oracle). Externally provided data, since it's transient by nature and can always be re-fetched, is stored in a fast in-memory cache (Redis).

Current Limitations With Internally Stored Collections

The first and probably most important limitation is the database itself. Our Oracle database is heavily loaded with many kinds of requests — accounts, payments, pages, widgets, blogs, template metadata and many, many others.


One option is to scale Oracle vertically; i.e., use more powerful servers. However, vertical scaling has limits, and relational databases do not scale horizontally well. Another option is to extract some of the load to a dedicated database that scales horizontally; i.e., add servers that operate in parallel. That way, as more and more users create more and more collections with more and more data, the extra load won't affect data operations that are unrelated to collections.


Also, Oracle works best for storing data with a predefined structure. For example, the fields of a "Users" table can be defined in advance (e.g., email, login, first and last name). However, each collection has different fields and field types. A "Plants" collection would have fields like "plant name," "origin" and "care tips," but an "Employees" collection would be totally different. So, internally we store collection data as stringified JSON documents. Even though Oracle has extensions that allow convenient operations on unstructured JSON documents, other databases are specifically designed for storing and processing documents without a predefined structure.
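To make this concrete, here is a minimal sketch (in Python, with made-up field values) of why document storage suits collections: two rows from different collections share no fields, yet both serialize to the same kind of stringified JSON document.

```python
import json

# Hypothetical rows from two collections with completely different
# schemas. Neither fits a single fixed relational table definition.
plant_row = {
    "plant_name": "Monstera deliciosa",
    "origin": "Central America",
    "care_tips": "Indirect light; water when the topsoil is dry",
}
employee_row = {
    "first_name": "Dana",
    "last_name": "Levi",
    "role": "Designer",
    "start_date": "2020-06-01",
}

# Both rows fit the same storage column, because the schema lives in
# the document itself rather than in the table definition.
stored_plant = json.dumps(plant_row)
stored_employee = json.dumps(employee_row)
```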

Current Limitations With Externally Provided Collections


External collections may change at any moment, and Duda needs some way to apply these changes. We have two mechanisms for this:


  • Data expiration. This capability comes out-of-the-box with the in-memory data store Redis. All external collection data expires after two hours and needs to be re-fetched. However, if we fail to fetch fresh data, we use "fallback" data that is cached for one week (so we in fact store the same data twice — once in a "fast cache" and again in a "long cache"; see the sketch after this list).
  • A background job that runs every hour and checks whether the external collection data of published sites has changed. If it has, we clean the cache of rendered runtime pages, and the pages are re-rendered with freshly fetched data upon the next request.
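Here is a minimal sketch of how such a two-tier cache could look, assuming the redis-py client; the key names, the `fetch_external` callable, and the error handling are illustrative, not Duda's actual implementation:

```python
import redis

FAST_TTL = 2 * 60 * 60            # 2 hours, as described above
FALLBACK_TTL = 7 * 24 * 60 * 60   # 1 week

r = redis.Redis()

def get_collection(collection_id: str, fetch_external) -> bytes:
    fast_key = f"collection:fast:{collection_id}"
    fallback_key = f"collection:fallback:{collection_id}"

    data = r.get(fast_key)
    if data is not None:
        return data  # a fresh copy is still cached

    try:
        # Re-fetch from the external provider (Instagram, Airtable, ...).
        data = fetch_external(collection_id)
    except Exception:
        # Provider unreachable: serve the week-long "fallback" copy.
        return r.get(fallback_key)

    # Store the same payload twice: fast cache and long-lived fallback.
    r.set(fast_key, data, ex=FAST_TTL)
    r.set(fallback_key, data, ex=FALLBACK_TTL)
    return data
```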


So, the functional problem with this solution is that Redis is a key-value store, and we store all of a collection's data as a single chunk. Thus, in order to search or filter data, we have to do it in memory. In 99 percent of use cases this is not a big issue, but all of our big collections (>1,000 rows and >1 MB of raw data) are external collections. If you've got a long list widget connected to such a collection, your page is cached with all the data. This may make your site visitors a little unhappy if they end up loading 2 MB from the web.


At Duda, we have lots of new collections-related feature requests, as well as optimizations of existing functionality that need to be done (big collections can cause big issues). Our current solution can't fit all of these requirements, so the decision was made to extract site collections into a dedicated infrastructure — a microservice.

Why Microservices: A Global Trend of Distributed Systems

If you're familiar with software development, or have worked in an IT-related field for at least a few years, you've probably heard the term "microservice." The concept behind microservices is that instead of having one huge "monolithic" application, you split it into multiple smaller applications that are nearly independent of each other. Microservices usually have their own databases (though it's not a requirement) to better suit their needs.


There is no single "Google" application — such huge services consist of multiple applications (sometimes even thousands) that communicate behind the scenes. There might be a "Gmail" microservice, an "ads" microservice, an "authentication" microservice, etc., but you as a Google user don't really know or care. You simply type "google.com", perform your actions, and it works. So, why microservices?


  • Autonomy of scale — If millions of people suddenly start sending emails via Gmail instead of WhatsApp messages, only the "Gmail" microservice will need to handle the additional load. More servers will be started, and users will still have a good experience. With Duda collections, we definitely want our customers to have the option to create more and bigger collections with rich content. We don't want this additional load to affect other services and capabilities.
  • Autonomy of deploy — Whenever a new feature or a patch is released, a microservice can be deployed in less than an hour without the need to change other parts of the system. And since lots of relatively long automated tests may need to run before each deploy, you don't need to test the whole huge system, which can take hours! Instead, you only need to test the flows that affect a small part of it, which takes ~10 minutes in the case of Duda's collections microservice. At present, the Duda monolith is redeployed every business day, but the collections microservice doesn't need to wait for that. When needed, we release improvements up to four times a day. When not needed, we keep the same stable version for days.
  • Too many more to list... — There are many, many other reasons why we decided on microservices, but we’re not going to mention them because this article is about Duda Collections, not microservices in general (if it were, it’d be much longer).


However, it's also important to mention that microservices significantly increase the complexity of the system as a whole.


When there are multiple “moving parts,” a system is more likely to experience failures. Independent microservices still need to communicate over the network — which can fail — and the workflow needs some way to recover from those failures. 



For instance, whenever a published site tries to retrieve collection data but the collections microservice is unavailable for some reason, it's fine to render an empty list widget instead of showing "Sorry, error while rendering your page." But when the issue is resolved, all affected pages should be re-rendered.
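A hedged sketch of that graceful-degradation behavior (the helper functions below are hypothetical stand-ins, not actual Duda APIs):

```python
def fetch_rows(collection_id: str) -> list[dict]:
    # Stand-in for an HTTP call to the collections microservice.
    raise ConnectionError("collections microservice unavailable")

def mark_for_rerender(collection_id: str) -> None:
    # Stand-in: queue affected pages for re-rendering on recovery.
    print(f"queued re-render for pages using {collection_id}")

def render_list_widget(collection_id: str) -> str:
    try:
        rows = fetch_rows(collection_id)
    except ConnectionError:
        # Service is down: degrade to an empty widget, not an error page.
        mark_for_rerender(collection_id)
        rows = []
    return "<ul>" + "".join(f"<li>{row['title']}</li>" for row in rows) + "</ul>"
```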

How Do Duda Collections Benefit From Microservices?

Collections benefit in so many ways! Namely: improved performance, awesome new features, and a lower time-to-market for those features.

A Powerful Database That Opens New Horizons

As described above, a major reason for extracting all collections functionality is to increase scalability. As opposed to the Duda monolith, our collections microservice uses Amazon DocumentDB (if you've heard of MongoDB, it's almost the same). It's a non-relational database that is designed to work effectively with user-structured documents. This allows it to perform granular search and filter operations at the database level, without the need to load all of the data if you only need to show the first 10 items. The database is also designed to work in cluster mode; i.e., if the primary server fails, edits to collection data won't be available for some time, but reads won't be affected, thanks to replica servers that still respond within 30-50 milliseconds.
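Since DocumentDB speaks the MongoDB wire protocol, a standard MongoDB client illustrates the point. The connection string, database, and field names below are assumptions for the sketch, not Duda's actual schema:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://docdb-cluster.example.com:27017")
rows = client["collections_db"]["plants"]

# Filter, sort, and paginate inside the database: only the 10 matching
# documents cross the network, never the whole collection.
first_page = list(
    rows.find({"origin": "Central America"})
        .sort("plant_name", 1)
        .limit(10)
)
```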

More Frequent and Safer Releases

With microservices, the team has more freedom to choose when to implement new features. Adding a new table to the database doesn't need to be approved by our system engineers, because it won't affect the main database. Changes in code won't affect other teams working in the monolith. Thus, the "technical overhead" of feature development is reduced drastically.

The performance and behavior of a microservice are also easier to monitor. When the development team notices bad logs, the issue can be fixed in less than an hour, or the service can even be rolled back to the previous stable version in less than 10 minutes.

FAQs

Here are some answers to the most frequent questions we've received about using microservices to power Duda collections.

Is Migration Associated With Downtime or Risk of Wrong Data Displayed in Widgets?

Even though migration to a microservice is complex, we can perform it without any downtime or serious risk of unexpected behavior. Because our development processes are based on feature flags, we have granular control over the process, and at each step of the migration we can ensure backward compatibility.


We started the migration process a few months ago, and our general idea was: "even when a request comes to the microservice, users should see 'reference' values that are returned from the stable system (the monolith)." So for any request, the microservice first redirects it to the monolith to perform the relevant operation; then the same operation is executed in the microservice. The results are compared, and the response from the monolith is returned. This doesn't affect performance, because time-consuming operations (like "get data") are executed in parallel.
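Here is a minimal sketch of that shadow-comparison step, assuming Python's standard thread pool; both backend calls and the mismatch logger are hypothetical stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def get_data_from_monolith(request: dict) -> dict:
    return {"rows": ["reference data from the stable monolith"]}

def get_data_from_microservice(request: dict) -> dict:
    return {"rows": ["candidate data from the new microservice"]}

def log_mismatch(request: dict, reference: dict, candidate: dict) -> None:
    print("mismatch:", reference, "!=", candidate)

def handle_get_data(request: dict) -> dict:
    # Execute the same operation on both systems in parallel, so the
    # comparison adds no extra latency to the request.
    with ThreadPoolExecutor(max_workers=2) as pool:
        ref_future = pool.submit(get_data_from_monolith, request)
        cand_future = pool.submit(get_data_from_microservice, request)
        reference, candidate = ref_future.result(), cand_future.result()

    if candidate != reference:
        log_mismatch(request, reference, candidate)

    # The monolith stays the source of truth until migration completes.
    return reference
```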

Is There Any Action Required From the Duda Customer Side?

No! The process of migrating all flows to the microservice happens behind the scenes. You may not know it, but your published sites may already be served by our microservice. And some microservice-served customers already have access to new beta-stage capabilities!

