Race condition and MongoDB - part 1

How this all started…

Recently, I encountered a race condition issue in MongoDB while updating a document. The document I wanted to update contained a huge array with more than 700 items (this is an anti-pattern in MongoDB and we should not be doing it!) and I had to update the items in this array one by one due to an API constraint.

When I tried to update hundreds of items inside the document, I was blissfully unaware of what was going on. Only after spot checking the data did I realise that something was going seriously wrong: some of the items that should have been updated weren’t updated at all! There couldn’t have been a concurrency issue, right? I’d seen conflict exceptions thrown on creation; surely there would be something similar for the update operation.

I consulted the lead and he had a very clear answer for me: race condition. He had already run into the same issue, suffered through it, and fixed the data by updating slowly, putting a time buffer between each transaction so that each update only ran after the previous one had finished. I did just that by adding Thread.Sleep(20000) between updates and was able to update without any issues, but the whole process took a day to complete because of this additional buffer time.

ACID

ACID (Atomicity, Consistency, Isolation and Durability) is a core concept in databases that defines the guarantees a database should provide:

  • Atomicity: Everything that happens in a transaction should be treated as a single unit. If any part fails, the rest of the transaction, even the parts that succeeded, should be reverted. For example, if creating the address for a new customer fails even though the customer creation succeeded in the same transaction, we should mark the transaction as a failure and revert the customer creation.

:warning: In MongoDB, atomicity is guaranteed by default only at the single-document level; updateMany(), for example, will not roll back the other updates if one of them fails.

  • Consistency: The database should enforce its validation rules on every transaction to keep the data consistent. For example, if we try to create a blog post with 1000 characters even though the database restricts the field to 255 characters, this data should not be allowed to be created.

  • Isolation: A database transaction needs to be isolated: it should not affect other transactions, and other transactions should not affect it. For example, if user A and user B try to update the same document, one needs to wait until the other finishes their transaction.

  • Durability: Once a transaction is committed, it must survive even a critical failure such as a power loss.

In my case, the “Isolation” part of the ACID principle was not happening at all. The second transaction read the document before the first one had finished, operated on that outdated state, and then overwrote the first update’s changes, a classic lost update. OK then, how do we handle concurrency issues? Most database drivers handle concurrency in their own way.
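The lost update above can be reproduced without a database at all. The following is a hypothetical Python sketch (not the original C# code; all names are illustrative), where a plain dict stands in for the MongoDB document and two writers each do a read-modify-write of the whole document:

```python
# Minimal "lost update" sketch: two writers each read the document, modify
# their own item, and write the whole document back. The in-memory dict
# stands in for a MongoDB document.

store = {"doc": {"items": {"a": 0, "b": 0}}}

def read_doc():
    # Simulates fetching the full document (returns a copy, like a driver would)
    return {"items": dict(store["doc"]["items"])}

def write_doc(doc):
    # Simulates replacing the full document
    store["doc"] = doc

# Both "transactions" read the same initial state...
snapshot_1 = read_doc()
snapshot_2 = read_doc()

# ...each updates a different item...
snapshot_1["items"]["a"] = 1
snapshot_2["items"]["b"] = 2

# ...and both write back. The second write clobbers the first.
write_doc(snapshot_1)
write_doc(snapshot_2)

print(store["doc"]["items"])  # {'a': 0, 'b': 2} — the update to 'a' is lost
```

The order of the final two writes doesn’t matter: whichever write lands last wins, and the other update silently disappears.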

What’s optimistic concurrency and pessimistic concurrency?

Optimistic concurrency assumes that conflicts will happen infrequently and takes no locks on the resources. However, before a change is committed, it checks whether the data has been modified in the meantime; if it has, the transaction is aborted and the changes are rolled back.
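The check-before-commit step is usually implemented with a version field on the document. This is a hypothetical in-memory Python sketch of that compare-and-swap idea (all names are illustrative, no real driver API is used):

```python
# Optimistic concurrency sketch: each document carries a version number.
# A commit only succeeds if the version it read is still the current one;
# otherwise the caller re-reads and retries.

store = {"doc": {"value": 10, "version": 1}}

class ConflictError(Exception):
    pass

def read_doc():
    return dict(store["doc"])

def commit(expected_version, new_value):
    # Compare-and-swap: abort if someone else committed in the meantime
    if store["doc"]["version"] != expected_version:
        raise ConflictError("document changed since it was read")
    store["doc"] = {"value": new_value, "version": expected_version + 1}

def update_with_retry(change, max_retries=3):
    for _ in range(max_retries):
        snapshot = read_doc()
        try:
            commit(snapshot["version"], change(snapshot["value"]))
            return
        except ConflictError:
            continue  # re-read the fresh state and try again
    raise ConflictError("gave up after retries")

stale = read_doc()                    # reads version 1
commit(stale["version"], 20)          # another writer commits first, version 2

try:
    commit(stale["version"], 99)      # stale version 1 vs current 2: rejected
except ConflictError:
    pass                              # aborted, nothing was overwritten

update_with_retry(lambda v: v + 5)    # re-reads value 20, commits 25
print(store["doc"])                   # {'value': 25, 'version': 3}
```

The key point is that the stale write is rejected instead of silently winning, which is exactly what was missing in the lost-update scenario above.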

Pessimistic concurrency actually locks the resource in the database so that nobody can read or write it (depending on the lock configuration) until the transaction is complete. This is much more difficult to implement, and the fact that the resource is locked even for reads is a huge downside of pessimistic concurrency, as it impacts performance.
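For contrast, a pessimistic approach serializes access up front: the lock is taken before the read and only released after the write. A hypothetical Python sketch, with a mutex standing in for a database-level lock (names are illustrative):

```python
import threading

# Pessimistic concurrency sketch: the lock is acquired before the read and
# held through the write, so no other transaction can slip in between.
# threading.Lock stands in for a database-level lock.

store = {"doc": {"counter": 0}}
doc_lock = threading.Lock()

def locked_increment():
    with doc_lock:                           # acquire before reading
        value = store["doc"]["counter"]      # read
        store["doc"]["counter"] = value + 1  # modify + write
        # lock is released on exit; the next transaction may proceed

threads = [threading.Thread(target=locked_increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store["doc"]["counter"])  # 100 — no updates lost
```

No update is ever lost here, but every transaction waits its turn, including ones that only wanted to read, which is the performance cost described above.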

Next, we will see if we can implement optimistic concurrency in MongoDB.

Published Mar 11, 2024

Interested in C#, .NET, DevOps and anything backend