As those of you who have used MongoDB
extensively have probably experienced, it is an incredibly powerful and performant tool for performing CRUD operations on a single document or collection. One of the main reasons your team might be using it is because of the speed and scale at which it can operate!
"MongoDB is an open source NoSQL database management program. NoSQL (Not only SQL) is used as an alternative to traditional relational databases. NoSQL databases are quite useful for working with large sets of distributed data. MongoDB is a tool that can manage document-oriented information, store or retrieve information". – Tech Target
Put simply, MongoDB is a database that stores documents in a JSON-like format known as BSON. Unlike in SQL, where each table has a strongly defined schema, a collection can have no schema at all, a schema enforced explicitly by application logic, or a strictly or loosely defined JSON schema, allowing for a wide range of applications.
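For instance, here is a minimal sketch of opting in to a loosely defined schema with MongoDB's built-in $jsonSchema validator. The collection and field names are illustrative, and db is assumed to be an already-connected driver database handle:

// Create a collection whose documents must at least contain a string
// emailAddress; any other fields remain unconstrained.
await db.createCollection("customers", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["emailAddress"],
      properties: {
        emailAddress: { bsonType: "string" },
      },
    },
  },
});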
In a traditional SQL
database, we generally design our data to be stored in the third normal form, with strongly defined relationships between the data. Many application developers who are new to MongoDB attempt to use the same paradigm for their data design, which leads to inefficiencies and additional $lookups (joins in the SQL world). Using MongoDB, you can avoid a lot of future issues by adopting the first normal form of data design instead, as the sketch below illustrates.
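Here is a hypothetical, simplified contrast between the two approaches (the collection and field names are made up purely for illustration):

// Third normal form (SQL-style): data split across tables, joined at read time.
// customers:     { id: 1, firstName: "Jane" }
// addresses:     { id: 10, customerId: 1, city: "Cape Town" }
// cart_products: { customerId: 1, productId: 42 }

// First normal form in MongoDB: one self-contained document, no joins needed.
const customer = {
  _id: 1,
  firstName: "Jane",
  shippingAddress: { city: "Cape Town" },
  cart: { products: [{ productId: 42 }] },
};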
MongoDB has several advantages over a more traditional SQL database:
- The document format (BSON) allows for a more seamless transition between database and code logic, especially if you are using Node.js, as the formats of your objects will be similar.
- MongoDB's horizontal scaling makes it easy (and often cheap) to scale your application as it grows. It excels at both reads and writes, allows you to select whether you read from a primary or secondary node, and allows you to specify write concerns for optimal performance.
- MongoDB has a shallower learning curve than a traditional SQL database, especially for developers who are already familiar with JSON and JavaScript.

Even if you follow a first normal form design for your data, sooner or later you will need to query across multiple documents and collections. While MongoDB offers powerful querying capabilities via the aggregation pipeline, the syntax can be quite challenging for someone who isn't familiar with it. In addition, the query will generally need to be built into the codebase (although it is possible to construct a query engine on your frontend application).
This is where more traditional databases appear to have an advantage: they allow a user to input SQL
statements to query data across one or more tables. This is especially important when compiling reports for key business decisions, frontend graphs and dashboards. SQL
is the ubiquitous query language that most people who work with data understand and use. Wouldn’t it be great if we could have the best of both worlds, i.e., MongoDB
’s fast and scalable basic CRUD
along with the reporting power of SQL
?
Here at Synatic, this is exactly the question we asked and the problem we set out to solve, culminating in the creation of NoQL. NoQL is an open source library that we built, which will construct either a filter query
or an aggregation pipeline
from a SQL
statement. You can then use the result in your application to retrieve the data. It's a great starting point for queries that can be customized further.
You can even get started right now using NoQL
without installing anything! Simply head on over to our playground, write a SQL
statement and hit convert
. This will generate a filter query
or an aggregation pipeline
that you can copy into your codebase and test out!
Let's say we are building a basic e-commerce system and we would like to build some reports to improve our business and keep things running.
💡 NOTE: There are plenty of aspects missing from the following data model for simplicity's sake.
Let's suppose our customer object looks like this:
import { ObjectId } from "mongodb";
export interface Customer {
_id: ObjectId;
emailAddress: string;
mobileNumber: string;
firstName: string;
lastName: string;
cart: Cart;
billingAddress?: Address;
shippingAddress?: Address;
orders?: ObjectId[];
}
export interface Address {
id: string;
line1: string;
line2: string;
suburb: string;
city: string;
postalCode: string;
}
export interface Cart {
products: {
productId: ObjectId;
dateAdded: Date;
}[];
}
Note here that all of this information for a customer would be in a single document in a single collection. Contrast this with a traditional SQL implementation, where you would instead need a number of separate tables (for example: customers, addresses, carts and cart_products).
From a design and conceptual perspective, this approach is a lot simpler to code, test, evolve and understand!
Now let's say our products look as follows:
export interface Product {
_id: ObjectId;
title: string;
description: string;
stock: {
sku: string;
size: "small" | "medium" | "large";
currentCount: number;
price: Price;
}[];
gallery?: {
imageUrl: string;
altText: string;
}[];
category: (
| "Shoes"
| "Pants"
| "Dresses"
| "Shirts"
| "Socks"
| "Tops"
| "Underwear"
)[];
}
export interface Price {
currency: "ZAR" | "ETH" | "BTC";
amount: number;
}
Once again, our data can be stored in a single document in a single collection. Lastly, we have our order information and discount information, each in a single document:
export interface Order {
_id: ObjectId;
deliveryType: "Pickup" | "Delivery";
products: {
priceAtPurchase: Price;
productId: ObjectId;
quantity: number;
sku: string;
}[];
discount?: DiscountCode;
totalExcludingVat: Price;
vat: Price;
totalWithVat: Price;
address?: Address;
status:
| "Created"
| "Paid"
| "Out for delivery"
| "Ready for collection"
| "Completed";
createdDate: Date;
statusDate: Date;
paymentInfo: unknown;
}
export interface DiscountCode {
_id: ObjectId;
code: string;
used: boolean;
dateUsed?: Date;
usedByOrder?: ObjectId;
discountAmount?: number;
}
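If you want to follow along, here is a minimal, hypothetical seed snippet with made-up values, shaped to match the interfaces above (db is again assumed to be a connected database handle):

const { ObjectId } = require("mongodb");

// Insert one product and one customer so the reports below return data.
const productId = new ObjectId();

await db.collection("products").insertOne({
  _id: productId,
  title: "Trail Runner",
  description: "A lightweight running shoe.",
  stock: [
    {
      sku: "TR-S",
      size: "small",
      currentCount: 1,
      price: { currency: "ZAR", amount: 1499 },
    },
  ],
  category: ["Shoes"],
});

await db.collection("customers").insertOne({
  _id: new ObjectId(),
  emailAddress: "jane@example.com",
  mobileNumber: "+27820000000",
  firstName: "Jane",
  lastName: "Doe",
  cart: { products: [{ productId, dateAdded: new Date() }] },
  orders: [],
});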
Now for the exciting part, let's query our data!
Find the first 100 customers who haven't made an order yet and get us their contact details so we can send them a discount code. Let's also get that in order of most recently created first since those customers might be more likely to buy.
SELECT emailAddress,
mobileNumber
FROM customers
WHERE size_of_array(orders) = 0
ORDER BY _id DESC
LIMIT 100
This query doesn't need to use the aggregation pipeline
, so it will result in the following Node.js
code snippet that we can use:
await db
.collection("customers")
.find(
    { $expr: { $eq: [{ $size: "$orders" }, 0] } },
{ emailAddress: "$emailAddress", mobileNumber: "$mobileNumber" }
)
.sort({ _id: -1 })
.limit(100)
.toArray();
Some things to note:
- size_of_array is one of the custom functions that NoQL provides to make working with MongoDB data a lot easier, since standard SQL doesn't really have ways to query arrays and JSON objects (BSON in the case of MongoDB). For a full list of functions, you can check out our API documentation on SQL syntax.
- _id can be used as a "Created Date", since ObjectIds inherently have a timestamp component (an ObjectId's getTimestamp() method returns it). Just be aware that if you are passing in arguments when manually constructing an ObjectId, this won't always hold true.

Find all the products that are running low on stock so that we know what to order:
SELECT title,
(SELECT * FROM stock WHERE currentCount <= 2) AS lowStockItems
FROM products
WHERE stock.currentCount <= 2
This again doesn't need the aggregation pipeline and results in the following JS code:
db.collection("products")
.find(
    { "stock.currentCount": { $lte: 2 } },
{
title: "$title",
lowStockItems: {
$map: {
input: {
$filter: {
input: "$stock",
cond: { $and: [{ $lte: ["$$this.currentCount", 2] }] },
},
},
in: "$$this",
},
},
}
)
.toArray();
However, should we want to, we can force it to use an aggregation pipeline:
await db
  .collection("products")
.aggregate([
{
$match: {
"stock.currentCount": {
$lte: 2,
},
},
},
{
$project: {
title: "$title",
lowStockItems: {
$map: {
input: {
$filter: {
input: "$stock",
cond: {
$and: [
{
$lte: ["$$this.currentCount", 2],
},
],
},
},
},
in: "$$this",
},
},
},
},
])
.toArray();
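How you force this depends on how you invoke the library. As a rough sketch, assuming NoQL's makeMongoAggregate helper (check the API documentation for the exact name and options in your version):

const SQLParser = require("@synatic/noql");

// Assumption: makeMongoAggregate always produces a pipeline, even for
// statements that parseSQL would classify as a plain find() query.
const { collections, pipeline } = SQLParser.makeMongoAggregate(
  "SELECT title, (SELECT * FROM stock WHERE currentCount <= 2) AS lowStockItems FROM products WHERE stock.currentCount <= 2"
);

const docs = await db.collection(collections[0]).aggregate(pipeline).toArray();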
Let's refine Report #2 a bit more. Say we only want to get the products running low on stock where we have had an order in the last 7 days, since, perhaps, we only want to restock items that are in demand:
SELECT p.title,
(SELECT * FROM p.stock WHERE currentCount <= 2) AS lowStockItems,
unset(_id)
FROM orders o
INNER JOIN "products|unwind" p on o.products.productId = p._id
WHERE o.createdDate > SUBTRACT(CURRENT_DATE(), 7 * 24 * 60 * 60 * 1000)
A few things to note about this query:
- unset(_id): depending on the options you pass in to NoQL, it will by default return the ObjectId of the document you are working with; this command removes it from the output.
- |unwind: when you perform a join, the $lookup command that is used under the hood will return an array; since, in this case, we just want to flatten it into an object, we provide the unwind keyword.
- CURRENT_DATE(): this function returns the current date as a date object.
- SUBTRACT: in this context, this function subtracts the number value (7 days, specified in milliseconds) from the current date.

It's a slight change in SQL, but quite a dramatic difference for the aggregation pipeline output, especially since this query has to be an aggregation pipeline. Writing this pipeline (or similar) by hand has quite a steep learning curve, which really showcases how powerful NoQL is:
await db
.collection("orders")
.aggregate([
{
$project: {
o: "$$ROOT",
},
},
{
$lookup: {
from: "products",
as: "p",
localField: "o.products.productId",
foreignField: "_id",
},
},
{
$unwind: {
path: "$p",
preserveNullAndEmptyArrays: true,
},
},
{
$match: {
p: {
$ne: null,
},
},
},
{
$match: {
$expr: {
$gt: [
"$o.createdDate",
{
$subtract: [
"$$NOW",
{
$multiply: [
{
$multiply: [
{
$multiply: [
{
$multiply: [7, 24],
},
60,
],
},
60,
],
},
1000,
],
},
],
},
],
},
},
},
{
$unset: ["_id"],
},
{
$project: {
title: "$p.title",
lowStockItems: {
$map: {
input: {
$filter: {
input: "$p.stock",
cond: {
$and: [
{
$lte: ["$$this.currentCount", 2],
},
],
},
},
},
in: "$$this",
},
},
},
},
])
.toArray();
If you would like to use the library directly in your project, you can follow along here!
Step 1: install the npm package
npm i @synatic/noql
Step 2: integrate it into your code
const SQLParser = require("@synatic/noql");
const { MongoClient } = require("mongodb");
(async () => {
try {
const client = new MongoClient("mongodb://127.0.0.1:27017");
await client.connect();
const db = client.db("noql-test");
const parsedSQL = SQLParser.parseSQL("select id from `films` limit 10");
if (parsedSQL.type === "query") {
console.log(
await db
.collection(parsedSQL.collection)
.find(parsedSQL.query || {}, parsedSQL.projection || {})
.limit(parsedSQL.limit || 50)
.toArray()
);
} else if (parsedSQL.type === "aggregate") {
console.log(
await db
.collection(parsedSQL.collections[0])
.aggregate(parsedSQL.pipeline)
.toArray()
);
}
} catch (exp) {
console.error(exp);
}
})();
Finally, be mindful of how you are utilizing the library. Since it sits between SQL and NoSQL, you need to guard against both SQL injection and NoSQL injection attacks.
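As a minimal illustration (the allow-list and helper function are hypothetical, not part of NoQL), never splice raw user input into the SQL string; validate or allow-list it first:

const SQLParser = require("@synatic/noql");

// Hypothetical guard: only permit known collections and a bounded numeric
// limit, instead of interpolating untrusted input directly.
const ALLOWED_COLLECTIONS = new Set(["customers", "products", "orders"]);

function buildContactQuery(collection, limit) {
  if (!ALLOWED_COLLECTIONS.has(collection)) {
    throw new Error(`Unknown collection: ${collection}`);
  }
  const safeLimit = Math.min(Number(limit) || 10, 100);
  return SQLParser.parseSQL(
    `select emailAddress from \`${collection}\` limit ${safeLimit}`
  );
}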