Schema versioning

Schema versioning, document polymorphism, and possibly schema migration together form a vast and sometimes controversial subject.

When it comes to displaying the schema for a MongoDB collection, there are several features that come into play:

Required property
When a field is required to appear in every document, the ERD shows a star (*) next to it and the properties pane has a corresponding property. When we reverse-engineer a collection to infer the schema, we mark as "required" any field that appears in 100% of the documents sampled. A field that appears in fewer than 100% of the sampled documents is not marked as required.
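The 100% rule can be illustrated with a minimal sketch. This is not Hackolade's actual implementation: the function name `infer_required` and the sample documents are hypothetical, only top-level fields are considered, and a field whose value is null is assumed to count as present.

```python
from collections import Counter


def infer_required(sampled_docs):
    """Return the top-level field names present in every sampled document."""
    docs = list(sampled_docs)
    if not docs:
        return set()  # nothing sampled, so nothing can be inferred
    # Count in how many documents each field name appears.
    counts = Counter(name for doc in docs for name in doc)
    # A field is required only when it appears in 100% of the sample.
    return {name for name, seen in counts.items() if seen == len(docs)}


# Hypothetical sample: "email" is absent from the second document.
sample = [
    {"_id": 1, "name": "Ann", "email": "ann@example.com"},
    {"_id": 2, "name": "Bob"},
    {"_id": 3, "name": "Cy", "email": None},
]
required = infer_required(sample)
print(sorted(required))
```

Because the result depends on the sample, a field that is optional in the full collection can still come out as required if every sampled document happens to contain it.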

Multiple data types
A given field name can have multiple data types. We support this too, not only for scalar data types but also for complex data types (objects and arrays). The representation depends on the data types involved. If all of them are scalar, we simply display the word "multiple" in the ERD and show the details in the properties pane:

Image: Multiple data type - select

If at least one of them is a complex data type, we have to represent them using a oneOf choice, for example:

Image

In the ERD, the representation is as follows:

Image: Polymorphism ERD

We detect this polymorphism during reverse-engineering.
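To make the distinction between the two representations concrete, here is a simplified sketch of how such detection could work. It is an illustration under stated assumptions, not the tool's real logic: the helpers `type_name`, `field_types` and `representation` are hypothetical, the type names are a coarse simplification of BSON types, and only top-level fields are inspected.

```python
from collections import defaultdict

COMPLEX_TYPES = {"object", "array"}


def type_name(value):
    """Map a Python value to a coarse type name (simplified, not full BSON)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return type(value).__name__


def field_types(docs):
    """Collect the set of type names observed for each top-level field."""
    types = defaultdict(set)
    for doc in docs:
        for name, value in doc.items():
            types[name].add(type_name(value))
    return dict(types)


def representation(types):
    """Choose how a field would be shown, given its observed type names."""
    if len(types) == 1:
        return next(iter(types))  # a single type needs no special treatment
    if types & COMPLEX_TYPES:
        return "oneOf"  # at least one complex type is involved
    return "multiple"  # every observed type is scalar


# Hypothetical documents: "phone" mixes two scalar types,
# while "address" is a string in one document and an object in the other.
docs = [
    {"phone": "555-0100", "address": "12 Main St"},
    {"phone": 5550100, "address": {"street": "12 Main St", "city": "Springfield"}},
]
observed = field_types(docs)
for name in sorted(observed):
    print(name, representation(observed[name]))
```

In this sample, "phone" falls into the all-scalar case and "address" into the oneOf case.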

Schema versioning
Schema versioning can be tricky to auto-detect during reverse-engineering, because we can't tell whether missing fields are intentional, the result of a schema evolution, or (worse) the result of lax development. You can, however, specify a query that selects a particular schema version during reverse-engineering, repeat the operation for another schema version, then merge the versions using a oneOf choice as mentioned above.
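The query-then-merge workflow can be sketched in a few lines. This is only an illustration: it assumes every document carries an explicit version field (here the hypothetical name `schema_version`), the list filter stands in for the per-version query, and `infer_version_schema` and `merge_versions` are made-up helpers that produce a very reduced JSON Schema.

```python
def infer_version_schema(docs):
    """Infer a tiny JSON-Schema-like description from one version's documents."""
    docs = list(docs)
    names = sorted({name for doc in docs for name in doc})
    # Required means present in every document selected for this version.
    required = [n for n in names if all(n in doc for doc in docs)]
    return {
        "type": "object",
        "properties": {n: {} for n in names},
        "required": required,
    }


def merge_versions(collection, version_field="schema_version"):
    """Infer one subschema per version and combine them in a oneOf choice."""
    versions = sorted({doc[version_field] for doc in collection})
    subschemas = []
    for version in versions:
        # Stand-in for a query that selects a single schema version.
        selected = [d for d in collection if d[version_field] == version]
        schema = infer_version_schema(selected)
        # Pin the version value so the oneOf branches are mutually exclusive.
        schema["properties"][version_field] = {"const": version}
        subschemas.append(schema)
    return {"oneOf": subschemas}


# Hypothetical collection mixing two schema versions.
collection = [
    {"schema_version": 1, "name": "Ann Lee"},
    {"schema_version": 1, "name": "Bob Ray", "phone": "555-0100"},
    {"schema_version": 2, "first_name": "Cy", "last_name": "Dow"},
]
merged = merge_versions(collection)
for sub in merged["oneOf"]:
    print(sub["properties"]["schema_version"], sub["required"])
```

Inferring each version separately keeps "phone" optional within version 1 only, instead of marking every version-specific field as optional across the whole collection.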

Schema migration

Document databases are quite permissive. Used properly, this is a huge advantage. But it also has some drawbacks:

Of course, one of the great benefits of NoSQL databases is the ability to evolve schemas without downtime. Contrary to popular belief, however, we observe that best practices at major organizations that have used NoSQL databases successfully for a long time include systematic schema migration to counterbalance the drawbacks mentioned above.

There are several schema migration strategies:

Each strategy has its pros and cons, and an associated cost. You should evaluate which strategy is best suited to each use case. The context of a specific collection might warrant its own strategy. Hackolade does not currently provide migration features or services.