October 20, 2023

No Reporting? No Problem!

As those of you who have used MongoDB extensively have probably experienced, it is an incredibly powerful and performant tool for performing CRUD operations on a single document or collection. One of the main reasons your team might be using it is because of the speed and scale at which it can operate!

What is MongoDB?

"MongoDB is an open source NoSQL database management program. NoSQL (Not only SQL) is used as an alternative to traditional relational databases. NoSQL databases are quite useful for working with large sets of distributed data. MongoDB is a tool that can manage document-oriented information, store or retrieve information". – Tech Target

Put simply, MongoDB is a database that uses a format similar to JSON, known as BSON, to store documents. Unlike in SQL, where each table has a strongly defined schema, a collection can either have no schema, a schema controlled explicitly by application logic or a strong/loosely defined JSON schema, allowing for a wide range of applications.

In a traditional SQL database, we generally design our data to be stored in the third normal form and to have strongly defined relationships between the data. A lot of application developers who are new to MongoDB attempt to use the same paradigm for their data design, which leads to a lot of inefficiencies and additional $lookups (Joins in the SQL world). Using MongoDB, you can avoid a lot of future issues by adopting the first normal form of data design instead.

Why would you choose to use MongoDB?

MongoDB has several advantages over a more traditional SQL database:

The JSON-like data structure (BSON) allows for a more seamless transition between database and code logic, especially if you are using Node.js as the formats of your objects will be similar.
As mentioned above, the schema system has more flexibility and this becomes especially powerful as your data structure evolves, allowing you to deploy changes without having to do tricky data migrations (although this is still possible if/when required).
MongoDB’s horizontal scaling makes it easy (and often cheap) to scale your application as it grows. It excels at both reads and writes, allows you to select whether you read from a primary or secondary node, and allows you to specify write concerns for optimal performance.
New developers who haven’t used any databases before find that MongoDB has a shallower learning curve than a traditional SQL database, especially if they are already familiar with JSON and JavaScript.

The reporting problem

Even if you follow a first normal form design for your data, sooner or later you will need to do multiple document and collection queries. While MongoDB offers powerful querying capabilities via the aggregation pipeline, the syntax can be quite challenging for someone that isn’t familiar with it. In addition, it will generally require the query to be built into the codebase (although it is possible to construct a query engine on your frontend application).

This is where more traditional databases appear to have an advantage: they allow a user to input SQL statements to query data across one or more tables. This is especially important when compiling reports for key business decisions, frontend graphs and dashboards. SQL is the ubiquitous query language that most people who work with data understand and use. Wouldn’t it be great if we could have the best of both worlds, i.e., MongoDB’s fast and scalable basic CRUD along with the reporting power of SQL?

Enter NoQL!

Here at Synatic, this is exactly the question we asked and the problem we set out to solve, culminating in the creation of NoQL. NoQL is an open source library that we built, which will construct either a filter query or an aggregation pipeline from a SQL statement. You can then use them in your application to retrieve the data. It's a great starting point for your queries that can then be customized further.

You can even get started right now using NoQL without installing anything! Simply head on over to our playground, write a SQL statement and hit convert. This will generate a filter query or an aggregation pipeline that you can copy into your codebase and test out!

Writing a basic report query

Let's say we are building a basic e-commerce system and we would like to build some reports to improve our business and keep things running.

Let's suppose our customer object looks like this:

import { ObjectId } from "mongodb";

export interface Customer {
  _id: ObjectId;
  emailAddress: string;
  mobileNumber: string;
  firstName: string;
  lastName: string;
  cart: Cart;
  billingAddress?: Address?;
  shippingAddress?: Address?;
  orders?: ObjectId[]?;
}

export interface Address {
  id: string;
  line1: string;
  line2: string;
  suburb: string;
  city: string;
  postalCode: string;
}

export interface Cart {
  products: {
    productId: ObjectId;
    dateAdded: Date;
  }[];
}

Note here that all of this information for a customer would be in a single document in a single collection. Contrast this with a traditional SQL implementation where you would instead have the following tables:

Customers
CustomerAddresses
CustomerCarts
CustomerOrderMapping (Many-to-many)

From a design and conceptual perspective, this approach is a lot simpler to code, test, evolve and understand!

Now let's say our products look as follows:

export interface Product {
  _id: ObjectId;
  title: string;
  description: string;
  stock: {
    sku: string;
    size: "small" | "medium" | "large";
    currentCount: number;
    price: Price;
  }[];
  gallery?: {
    imageUrl: string[];
    altText: string;
  }[];
  category:
    | "Shoes"
    | "Pants"
    | "Dresses"
    | "Shirts"
    | "Socks"
    | "Tops"
    | "Underwear"[];
}

export interface Price {
  currency: "ZAR" | "ETH" | "BTC";
  amount: number;
}

Once again, our data can be stored in a single document in a single collection. Lastly, we have our order information and discount information, each in a single document:

export interface Order {
  _id: ObjectId;
  deliveryType: "Pickup" | "Delivery";
  products: {
    priceAtPurchase: Price;
    productId: ObjectId;
    quantity: number;
    sku: string;
  }[];
  discount?: DiscountCode;
  totalExcludingVat: Price;
  vat: Price;
  totalWithVat: Price;
  address?: Address;
  status:
    | "Created"
    | "Paid"
    | "Out for delivery"
    | "Ready for collection"
    | "Completed";
  createdDate: Date;
  statusDate: Date;
  paymentInfo: unknown;
}

export interface DiscountCode {
  _id: ObjectId;
  code: string;
  used: boolean;
  dateUsed?: Date;
  usedByOrder?: ObjectId;
  discountAmount?: number;
}

Now for the exciting part, let's query our data!

Report #1

Find the first 100 customers who haven't made an order yet and get us their contact details so we can send them a discount code. Let's also get that in order of most recently created first since those customers might be more likely to buy.

SELECT  emailAddress,
        mobileNumber
FROM customers
WHERE size_of_array(orders) > 0
ORDER BY _id DESC
LIMIT 100

This query doesn't need to use the aggregation pipeline, so it will result in the following Node.js code snippet that we can use:

await db
  .collection("customers")
  .find(
    { $expr: { $gt: [{ $size: "$orders" }, 0] } },
    { emailAddress: "$emailAddress", mobileNumber: "$mobileNumber" }
  )
  .sort({ _id: -1 })
  .limit(100)
  .toArray();

Some things to note:

size_of_array is one of the custom functions that NoQL provides to make working with MongoDB data a lot easier, since standard SQL doesn't really have ways to query arrays and JSON objects (BSON in the case on MongoDB). For a full list of functions, you can check out our API documentation on SQL syntax
_id can be used as a "Created Date" as they inherently have a timestamp component. Just be aware that if you are passing in arguments when manually constructing an ObjectId this won't always hold true.

Report #2

Find all the products that are running low on stock so that we know what to order:

SELECT  title,
        (SELECT * FROM stock WHERE currentCount <= 2) AS lowStockItems
FROM products
WHERE stock.currentCount <= 2

Which again doesn't need the aggregation pipeline and results in the following JS code:

db.collection("products")
  .find(
    { currentCount: { $lte: 2 } },
    {
      title: "$title",
      lowStockItems: {
        $map: {
          input: {
            $filter: {
              input: "$stock",
              cond: { $and: [{ $lte: ["$$this.currentCount", 2] }] },
            },
          },
          in: "$$this",
        },
      },
    }
  )
  .toArray();

However, should we want to, we can force it to use an aggregation pipeline:

await db
  .collection(products)
  .aggregate([
    {
      $match: {
        "stock.currentCount": {
          $lte: 2,
        },
      },
    },
    {
      $project: {
        title: "$title",
        lowStockItems: {
          $map: {
            input: {
              $filter: {
                input: "$stock",
                cond: {
                  $and: [
                    {
                      $lte: ["$$this.currentCount", 2],
                    },
                  ],
                },
              },
            },
            in: "$$this",
          },
        },
      },
    },
  ])
  .toArray();

Report #3

Let's refine Report #2 a bit more. Say we only want to get the products running low on stock where we have had a paid-for order in the last 7 days, since, perhaps, we only want to restock items that are in demand:

SELECT  p.title,
        (SELECT * FROM p.stock WHERE currentCount <= 2) AS lowStockItems,
        unset(_id)
FROM orders o
INNER JOIN "products|unwind" p on o.products.productId = p._id
WHERE o.createdDate > SUBTRACT(CURRENT_DATE(), 7 * 24 * 60 * 60 * 1000)

What's happening here?

unset(_id): depending on the options you pass in to NoQL, by default, it will return the ObjectId of the document you are working with; this command essentially removes it.
|unwind: when you perform a join, the $lookup command that is used under the hood will return an array; since, in this case, we just want to flatten it into an object, we provide the unwind keyword.
- if you want to work with the result as an array, this isn't required.
- you can also pick either the first or the last item in the array instead.
CURRENT_DATE(): this function returns the current date as a date object.
SUBTRACT: in this context, this function will subtract the number value (7 days, specified in milliseconds) from the current date.

Slight change in SQL, but quite a dramatic difference for the aggregation pipeline output, especially since this query has to be an aggregation pipeline. Writing this pipeline (or similar) by hand has quite a steep learning curve and really showcases how powerful NoQL is:

await db
  .collection("orders")
  .aggregate([
    {
      $project: {
        o: "$$ROOT",
      },
    },
    {
      $lookup: {
        from: "products",
        as: "p",
        localField: "o.products.productId",
        foreignField: "_id",
      },
    },
    {
      $unwind: {
        path: "$p",
        preserveNullAndEmptyArrays: true,
      },
    },
    {
      $match: {
        p: {
          $ne: null,
        },
      },
    },
    {
      $match: {
        $expr: {
          $gt: [
            "$o.createdDate",
            {
              $subtract: [
                "$$NOW",
                {
                  $multiply: [
                    {
                      $multiply: [
                        {
                          $multiply: [
                            {
                              $multiply: [7, 24],
                            },
                            60,
                          ],
                        },
                        60,
                      ],
                    },
                    1000,
                  ],
                },
              ],
            },
          ],
        },
      },
    },
    {
      $unset: ["_id"],
    },
    {
      $project: {
        title: "$p.title",
        lowStockItems: {
          $map: {
            input: {
              $filter: {
                input: "$p.stock",
                cond: {
                  $and: [
                    {
                      $lte: ["$$this.currentCount", 2],
                    },
                  ],
                },
              },
            },
            in: "$$this",
          },
        },
      },
    },
  ])
  .toArray();

How you can integrate it into your project

If you would like to use the library directly in your project you can follow along here!

Step 1: install the npm package

npm i @synatic/noql

Step 2: integrate it into your code

const SQLParser = require("@synatic/noql");
const { MongoClient } = require("mongodb");

(async () => {
  try {
    client = new MongoClient("mongodb://127.0.0.1:27017");
    await client.connect();
    const db = client.db("noql-test");

    const parsedSQL = SQLParser.parseSQL("select id from `films` limit 10");
    if (parsedSQL.type === "query") {
      console.log(
        await db
          .collection(parsedSQL.collection)
          .find(parsedSQL.query || {}, parsedSQL.projection || {})
          .limit(parsedSQL.limit || 50)
          .toArray()
      );
    } else if (parsedSQL.type === "aggregate") {
      console.log(
        await db
          .collection(parsedSQL.collections[0])
          .aggregate(parsedSQL.pipeline)
          .toArray()
      );
    }
  } catch (exp) {
    console.error(exp);
  }
})();

Caveat

Be aware of how you are utilizing the library. Since it uses both SQL and NoSQL, you need to be aware of and guard against SQL injection attacks and NoSQL injection attacks.