Switching from SQL to MongoDB — should you?
There's a lot of hype surrounding ‘NoSQL’ these days, and it's often hailed as the next big thing in data storage. Enough that you may be considering switching your project from something like MySQL to MongoDB to get rid of that pesky SQL.
What's SQL?
This blogs going to open a can of worms too big for one post, but here it goes.
Structured Query Language (SQL) is a language designed so humans can interact with relational databases in relatively simple English (legend has it it was made so secretaries could talk to computers).
Databases are like spreadsheets. You've records (spreadsheet rows) and data in these records (spreadsheet columns). So, for instance, you could have a list of your customers, with their names, phone numbers, and addresses.
Relational databases are like modern multi-sheet spreadsheets. Now, if your company sells AI-powered robotic cucumbers as toys for cats, you can have a table (sheet) for your CUSTOMERS, another for your PRODUCTS (SQL, like many things that originated at IBM, likes capital letters), and a table for PURCHASES. PURCHASES can link CUSTOMERS to PRODUCTS purchased without duplicating the information of either table.
You can now list your customers with the nearly plain-English query ‘SELECT NAME, PHONE FROM CUSTOMERS’ (just imagine a secretary, angrily clicking away at her 1970s Teletype in ALL CAPS). You could also list the addresses of all customers who bought the CyberMouse 2000 with realistic tail movement and five megaWatt laser eyes for a customer satisfaction survey (or to notify the authorities).
SQL has been married to the web since the 90s, and with good cause: it fits the needs of the nascent dynamic site and the information it stored.
What about NoSQL and MongoDB?
NoSQL is the 21st-century name for any database that doesn't (or can't) support SQL. These used to be called DBMS (Database Management Systems) before, to contrast them with RDBMS (Relational DBMS). It should come as no surprise that ‘everything that isn't SQL’ isn't a single technology but an huge great range of them. It covers everything from early 20th Century punched cards to the way your computer stores and organises all your files.
MongoDB is a very particular DBMS product that doesn't use SQL as its query language. It completely forgoes the paradigm of tabular data in favour of something very different: collections of entirely free-form data called documents. Where e.g. the SQL-based paradigm looks like a spreadsheet, the MongoDB one looks like a collection of text files that could have anything in them.
But of course, a database isn't much use if you can't query it, and for this, we need some structure. In SQL, this structure (the schema) comes from the names of the ‘spreadsheet’ columns and what data is allowed to fit in them (e.g. dates, amounts of money). In MongoDB, everything is free-form, with one restriction: every value we store needs to be tagged with a ‘key’, a word describing what this value is. For instance, where the CUSTOMERS table might have columns for the name, phone number and address, the MongoDB document would be something like ‘the name is John Smith; the phone number is…’. MongoDB stores these documents in a format derived from (and compatible with) JSON, the open standard for the interchange of structured data.
The magic comes from the fact that MongoDB can tell what information is stored by the names of those keys. More magic comes from the fact you can have things that don't match the paradigm of a table: lists of things, including other, nested lists of things, more sub-documents within your document, and so on—the entire power and simplicity of JSON, in fact. MongoDB is highly attuned to this and knows how to make sure data (including the names of keys) isn't duplicated when stored. You may be storing millions of documents with the name key, but MongoDB is expecting just this, and has already taken care of it. This takes away the need for you to do this deduplication step yourself, by using tables and relations in an RDBMS.
Is it worth switching from SQL to MongoDB?
There are compelling arguments either way.
Do you have big data?
MongoDB was designed for big data from the get-go. It makes it easy to store hundreds of millions of documents. It does this by making sharding trivial to implement: a large collection of documents can be split automatically between multiple servers, and the MongoDB driver in your application will know where to go to get anything, completely transparently to you.
If you're having trouble with the sheer volume of your data and retrieval speed, MongoDB could help.
You could also investigate optimising your SQL database through the clever use of RDBMS techniques like indexing, splitting data into multiple tables, using caches, satellite tables, and the like. Enterprise RDBMS solutions offer a similar feature in partitioning.
Is your data very sparse?
Visualise the spreadsheet view of your data. Do you have large areas of empty cells? Is it just empty cells with data sprinkled around? If so, the tabular model of the RDBMS may not be a perfect match for you.
MongoDB was designed to handle this type of data storage, and it excels at it (if you'll excuse the bad pun).
Alternatively, there are many ways to store sparse data in SQL, eg, a technique called normalisation. With large amounts of data, this can add a lot of complexity, but it may be worth the development effort in your case, and your front-end framework may be able to do this for you.
Is your data complex?
In relational databases, if a customer has multiple phone numbers, you traditionally add columns for ‘PHONE’, ‘PHONE2’, ‘PHONE3’, etc. And then you need to balance how many phones you need to store against the inconvenience of dealing with users with more phones, or the vast majority of them who'll only have one—leading to space being wasted. This way of doing things also makes querying harder. And the fix is more tables, more relations, and more complex code in the front-end.
On MongoDB, your ‘phone’ field can contain a list of phone numbers. Better yet, the MongoDB query language knows about such lists of things and can query them just fine. If you use indexes for faster searching, every item in the list will be indexed separately so searching works the way you expect it too.
And you can also store nested and structured data in MongoDB: lists, documents, lists of lists, lists of other documents, etc. Deeper-level data like that can still be accessed by the query language and indexed. Anything you can express in JSON can be stored and queried.
If your data tends to be very dynamic, MongoDB might be a good match for you.
On the other hand, if your data tends to be mostly static with easy to understand relations, an SQL-based database would be a better fit!
Is your overall schema getting out of hand?
One of our current projects allows its users to upload large data sets in CSV format, and generate reports from them. Users must upload some standard columns with every CSV, but they can also have their own, self-defined columns. Most users do, and everyone's columns are naturally different. This is a terrible match for the traditional SQL model. It’d require pretty much all of the optimisation techniques in the book to just be performant enough for production, and you still wouldn’t call it lightning fast. It just can’t scale. Plus, it takes a lot of expertise to understand how data is interconnected in the SQL database.
MongoDB is a really good solution to this problem: not only is its free-form document style suited to just this, but you can even provide segregation (for privacy and security) by storing each user's uploaded data in different collections (tables, in SQL-talk). Since there's no schema, you don't need to expressly create your collections as you would in SQL. Just storing data in a collection that doesn't exist will create it.
Do you need easy clustering?
Clustering gives you safety through redundancy: multiple database servers host the same data. If one should go down, the others will pick up the work so the data is always available. Most database systems provide clustering as an expensive, enterprise-level option. MongoDB comes with it as standard, and its set-up is one of the easiest we've ever encountered.
If you have servers coming out of your ears, or huge amounts of data that need to be fully redundant, you can combine sharding and clustering. And given how easy it is to cluster with MongoDB, this works well with cloud deployment (eg, spinning up new AWS instances on demand).
So, to switch or not to switch?
Perhaps the best tip we can provide to anyone considering a switch, is you don't have to. The benefits must outweigh the risks. But this isn't a question with just a yes-no answer.
MongoDB is just another tool for storing and organising data, like SQL databases, or the file system on your PC. You can use multiple tools to get a project done!
Rather than migrating a project fully from a relational model to MongoDB's document model, you could consider migrating just the data that's giving you trouble, or the data that just doesn't fit the relational model. No one ever said you have to have a single database server in your application. Many real-world apps need multiple, different ones. Modern frameworks can handle this, and also abstract the tasks of using and querying different database types, so code redesign and rewrite would be minimised.
At CIVIC, we've had decades of expertise with databases, from the humble beginnings of the web. We know what it takes to get your web application storing the right data, in the right way. Talk to us to find out how we can help you with yours!