Update heavy JSONB with low memory consumption in PostgreSQL
Introduction
We work at QIMA, a company specializing in quality control and supply chain audits. We are currently building an awesome application called QIMAone.
One of the complex things we have to handle in our application is allowing our clients to create their own workflows, which contain, for example, their test checklists and defect checklists. A workflow is a set of checklists (test, defect, measurement), and a checklist is a set of checkpoints an inspector must go through to confirm that a product was produced as it was designed. A workflow is fully customizable and lets our clients define how inspectors have to carry out their inspections: they define the kinds of controls they want (measurements, quality checks, most common defects …).
The result of this workflow is stored in our database as JSON, which can be huge depending on how many tests the client configured. We chose the JSON format because we always need to load the full workflow, and we wanted to avoid a very complex and generic database model that would generate several joins between all those tables every time we access the workflow.
During the inspection, the inspector answers all of these questions.
At first, we decided to use this JSON to save each answer directly on its related question.
For example, with a simplified workflow:
```
{
"testChecklist": [
{
"question": "is the packaging damaged ?",
"type": "YES/NO",
"id": "1",
"constraint": {
...
}
},
{
"question": "how many items in the carton ?",
"type": "NUMBER",
"id": "2",
"constraint": {
...
}
},
...
],
...
}
```
If the inspector answers “YES” to question 1 and then 10 to question 2, then, since every action is saved atomically, we will do the following (sketched in code after the two calls):
API call 1
- load this JSON
- set the answer on question 1
- save the JSON
API call 2
- load the freshly updated JSON
- set the answer on question 2
- save the JSON
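In Java terms, each of these calls does roughly the following (a simplified sketch using Jackson; WorkflowJsonRepository and its methods are hypothetical names, not our actual code):
```
import java.util.UUID;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

// Simplified sketch of the initial approach: load the full workflow JSON,
// set a single answer, then write the whole document back.
// WorkflowJsonRepository and its two methods are hypothetical names.
public class WorkflowAnswerService {

    private final ObjectMapper objectMapper = new ObjectMapper();
    private final WorkflowJsonRepository workflowJsonRepository;

    public WorkflowAnswerService(WorkflowJsonRepository workflowJsonRepository) {
        this.workflowJsonRepository = workflowJsonRepository;
    }

    public void answerQuestion(UUID inspectionId, String questionId, Object answer)
            throws JsonProcessingException {
        // 1. load the full JSONB column as a string
        String json = workflowJsonRepository.findJsonByInspectionId(inspectionId);
        ObjectNode workflow = (ObjectNode) objectMapper.readTree(json);

        // 2. set the answer on the matching question
        for (JsonNode question : workflow.withArray("testChecklist")) {
            if (questionId.equals(question.get("id").asText())) {
                ((ObjectNode) question).putPOJO("answer", answer);
            }
        }

        // 3. serialize and persist the whole document again
        workflowJsonRepository.saveJson(inspectionId, objectMapper.writeValueAsString(workflow));
    }
}
```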
The resulting JSON will be:
```
{
"testChecklist" : [
{
"question": "is the packaging damaged ?",
"type": "YES/NO",
"id": "1",
"constraint" : {
...
},
"answer": "YES"
}, {
"question": "how many items in the carton ?",
"type": "NUMBER",
"id": "2",
"constraint" : {
...
},
"answer": 10
},
...
],
...
}
```
This is precisely what we want, but with a growing number of users and increasingly complex business logic, we noticed that loading a full JSON object that can weigh more than 5 MB, just to add a single line of comment, would quickly cause trouble.
So we decided to use another approach, more complex but lighter, which consists in storing each inspector action as an event. With this approach we just need to save every answer as a single row in our database, and we no longer need to load the full workflow JSON to update it.
Database schema
The workflow itself is now immutable and can be stored in our cache, because we still need to check that the inspector's answer is valid for the kind of question (for example, the inspector cannot answer 10 to a YES/NO question).
Database event lines
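To make these event rows more concrete: each inspector action becomes one row holding the event type, a small JSON payload and some metadata. A minimal Hibernate entity for such a row could look like the sketch below (field and column names are assumptions, not our exact schema):
```
import java.time.Instant;
import java.util.UUID;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import lombok.Getter;
import lombok.Setter;

// Sketch of an event row; field and column names are assumptions.
@Getter
@Setter
@Entity
@Table(name = "inspection_answer_event")
public class InspectionAnswerEvent {

    @Id
    private UUID id;

    @Column(name = "inspection_id")
    private UUID inspectionId;

    // discriminator used by Jackson to pick the right event class (e.g. "TEST_COMMENT")
    @Column(name = "event_type")
    private String eventType;

    // identifies which question/answer of the workflow the event targets
    @Column(name = "path_id")
    private String pathId;

    // the event-specific data, stored as a small JSON payload
    @Column(name = "payload")
    private String payload;

    @Column(name = "created_date")
    private Instant createdDate;

    @Column(name = "created_timezone")
    private String createdTimezone;
}
```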
With this technical change, we now have a lighter API which requires less memory on each inspector call and can absorb more calls at the same time. This improvement is especially valuable when you consider that an inspector can perform an inspection in offline mode and may synchronize around 150 actions (set answer, add photo, set comment …) when going back online.
But how do we load the workflow with all the answers?
Each possible action on the workflow is now stored as an event, which is represented as a class in our Java code base. We use Jackson's @JsonSubTypes to deserialize the JSON object into the correct Java class. All events extend the following generic class, which implements a BiConsumer responsible for updating the workflow according to its own business logic.
NB: InspectionAnswerEvent is a Hibernate entity which represents each event stored in the database.
```
import java.util.function.BiConsumer;

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "event_type")
@JsonSubTypes({
    @JsonSubTypes.Type(value = DefectChecklistCommentEvent.class, name = "DEFECT_COMMENT"),
    @JsonSubTypes.Type(value = DefectChecklistImageEvent.class, name = "DEFECT_IMAGE"),
    @JsonSubTypes.Type(value = TestChecklistAnswerEvent.class, name = "TEST_ANSWER"),
    @JsonSubTypes.Type(value = TestChecklistCommentEvent.class, name = "TEST_COMMENT"),
    ...
})
public abstract class AbstractWorkflowAnswerEvent
        implements BiConsumer<InspectionAnswerEvent, Workflow> {
}
```
For example, the TestChecklistCommentEvent is simply described like this:
```
@Getter
@Setter
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class TestChecklistCommentEvent extends AbstractWorkflowAnswerEvent {

    private String comment;

    @Override
    public void accept(InspectionAnswerEvent inspectionAnswerEvent, Workflow workflow) {
        workflow
            .searchAnswer(inspectionAnswerEvent.getPathId())
            .ifPresent(answer -> {
                var commentObject = new InspectionComment();
                commentObject.setTimezone(inspectionAnswerEvent.getCreatedTimezone());
                commentObject.setDate(inspectionAnswerEvent.getCreatedDate());
                commentObject.setMessage(this.comment);
                answer.setComment(commentObject);
            });
    }
}
```
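With these pieces in place, rebuilding the workflow with all its answers comes down to replaying the stored events over a copy of the immutable workflow. The sketch below shows the general idea; the repository, cache and getPayload() accessor are assumed names, not our actual code:
```
import java.util.List;
import java.util.UUID;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

// Sketch of replaying events to rebuild the workflow with all answers.
// InspectionAnswerEventRepository, WorkflowCache and getPayload() are assumed names.
public class WorkflowReplayService {

    private final ObjectMapper objectMapper = new ObjectMapper();
    private final InspectionAnswerEventRepository eventRepository;
    private final WorkflowCache workflowCache;

    public WorkflowReplayService(InspectionAnswerEventRepository eventRepository,
                                 WorkflowCache workflowCache) {
        this.eventRepository = eventRepository;
        this.workflowCache = workflowCache;
    }

    public Workflow loadWorkflowWithAnswers(UUID inspectionId) throws JsonProcessingException {
        // start from a mutable copy of the cached, immutable workflow definition
        Workflow workflow = workflowCache.getWorkflowCopy(inspectionId);

        // load the event rows in the order they were created
        List<InspectionAnswerEvent> events =
            eventRepository.findByInspectionIdOrderByCreatedDate(inspectionId);

        for (InspectionAnswerEvent event : events) {
            // Jackson resolves the concrete subtype through @JsonTypeInfo / @JsonSubTypes ("event_type")
            AbstractWorkflowAnswerEvent answerEvent =
                objectMapper.readValue(event.getPayload(), AbstractWorkflowAnswerEvent.class);

            // each event applies its own business logic to the workflow
            answerEvent.accept(event, workflow);
        }
        return workflow;
    }
}
```
Deserializing through AbstractWorkflowAnswerEvent is what lets the @JsonSubTypes mapping dispatch each row to the right event class before its consumer is applied.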
Benefits:
- low memory consumption
- reduced processing time
- keeps the history when the answer to a question is changed several times
- business enabler thanks to that history (fraud detection, for example)
- no big algorithm to manage every kind of action: each event uses a consumer to apply itself to the full JSON
- straightforward to add a new kind of action by creating a new event
- easier to manage migrations if the event model needs to change
Disadvantages:
- more complex to implement
- no global view of the result directly in the database
Next step:
Use the PostgreSQL JSON path feature to load only the question configuration needed for validation, instead of caching the full workflow JSON.
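As a rough idea of what that could look like, assuming PostgreSQL 12+ and a table inspection_workflow with a workflow jsonb column (both names hypothetical), a jsonb_path_query_first call could extract a single question's configuration without pulling the rest of the document into memory:
```
import java.util.UUID;

import org.springframework.jdbc.core.JdbcTemplate;

// Sketch of loading only one question's configuration with a PostgreSQL JSON path query.
// The table and column names (inspection_workflow, workflow) are hypothetical.
public class QuestionConfigurationLoader {

    private final JdbcTemplate jdbcTemplate;

    public QuestionConfigurationLoader(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public String loadQuestionConfiguration(UUID inspectionId, String questionId) {
        // jsonb_path_query_first filters the checklist inside the database,
        // so only the matching question is sent back to the application
        String sql = ""
            + "SELECT jsonb_path_query_first(workflow, "
            + "         '$.testChecklist[*] ? (@.id == $qid)', "
            + "         jsonb_build_object('qid', ?::text))::text "
            + "FROM inspection_workflow "
            + "WHERE inspection_id = ?";
        return jdbcTemplate.queryForObject(sql, String.class, questionId, inspectionId);
    }
}
```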
We are currently hiring. Join us!
Written by Cyril WIRTH, backend software engineer at QIMA ✍
#Json #Optimization #Design️