Django ORM handles the simple things, but at a certain point either the ORM becomes too verbose or doesn’t support necessary features for your query such as LATERAL joins or Postgres specific features like json_agg and related operators.

So you have to write the SQL in a string literal and use the psycopg cursor exposed by Django’s connection object.

This works fine until you need to modify the query slightly, not a big change, just a small tweak like an extra condition in the where clause that you only add sometimes.

With an ORM, this is easy to handle since we can manipulate the query in code, e.g.:

complicated_query: QuerySet = ...
more_specific_query = complicated_query.filter(name__startswith="foo")

But with a string literal we’re forced to concat strings together manually, which is error prone and annoying.

So we can’t use Django ORM, because it isn’t expressive enough, but we still want an easy way to manipulate the queries in Python land.

How about a Python SQL DSL?

Pypika and its Problems

The first result I landed on was Pypika, which seemed decent, but I quickly ran into problems when trying to convert more substantial SQL queries into the DSL.

LATERAL join

Pypika doesn’t have built-in support for LATERAL joins; however, with some tweaking it can work:

SELECT m.name
FROM manufacturers m
LEFT JOIN LATERAL get_product_names(m.id) pname ON true
WHERE pname IS NULL;

I end up needing to define a new enum (can’t extend the existing JoinType enum) with the LATERAL variants.

from pypika import Table, Query, CustomFunction
from pypika.enums import JoinType
from enum import Enum

get_product_names = CustomFunction("get_product_names", ["column"])
tbl = Table("alpha")
beta = Table("beta")

class PGJoinType(Enum):
    left_join_lateral = "LEFT JOIN LATERAL"
    right_join_lateral = "RIGHT JOIN LATERL"

query = (
    Query.from_(tbl)
    .join(beta, how=PGJoinType.left_join_lateral)
    .on(beta.id == tbl.foo)
    .select("*")
)

ordering inside of aggregations

Pypika can’t express an ORDER BY inside of an aggregation.

SELECT
    answer_group.id,
    array_agg(answer.body ORDER BY answer.created DESC) answers
FROM answer_group
JOIN answer ON answer.answer_group_id = answer_group.id
GROUP BY
    answer_group.id;

lack of built-in tsvector support

SELECT
    t.id,
    t.title,
    ts_rank(tb.document, to_tsquery('tomato')) "rank"
FROM
    table_beta tb
    JOIN table_alpha t ON t.id = tb.table_alpha_id
WHERE
    tb.document @@ to_tsquery('tomato')
    AND tb.institution_id = 190

becomes rather verbose:

import enum

from pypika import CustomFunction, Field, Order, Parameter, Query, Table
from pypika import Query
from pypika.terms import BasicCriterion

to_tsvector = CustomFunction("to_tsvector", ["text"])
to_tsquery = CustomFunction("to_tsquery", ["text"])
ts_rank = CustomFunction("ts_rank", ["column", "query"])

tb = Table("table_beta")
t = Table("table_alpha")

class Comp(enum.Enum):
    match = "@@"

query = (
    Query.from_(tb)
    .join(t)
    .on(t.id == tb.table_alpha_id)
    .select(
        t.id,
        t.title,
        ts_rank(tb.document, to_tsquery("tomato")).as_("rank"),
    )
    .where(BasicCriterion(Comp.match, tb.document, to_tsquery("tomato")))
    .where(tb.institution_id == 190)
    .orderby(Field("rank"), order=Order.desc)
)

json indexing

SELECT id, json_agg(column_name) ->> 0 new_col_name
FROM table_name
GROUP BY id;

With no built-in support json_agg I defined a custom function but that didn’t work when I wanted to use the indexing operator ->>.

doesn’t handle concatenation

from pypika import Query, Table

table = Table("table")
concat_query = (
    Query.from_(table)
    .where(table.first_name + table.last_name == "j person")
    .select(table.id, table.email)
)

Generates an invalid query because Pypika doesn’t have column types:

SELECT "id", "email"
FROM "table"
WHERE "first_name" + "last_name" = 'j person'

escaping

Pypika doesn’t handle escaping queries properly:

from pypika import Query, Table

table = Table("table_name")

class SomeType:
    def __str__(self):
        return "true; drop table foo --"

query = (
    Query.from_(table)
    .where(table.name == SomeType())
    .select(table.id, table.name)
)
# SELECT "id","name" FROM "table_name" WHERE "name"=true; drop tables foo --

So you need to replace any user provided data with a Parameter.

from pypika import Query, Table

table = Table("table_name")

query = (
    Query.from_(table)
    .where(table.name == Parameter("%s"))
    .select(table.id, table.name)
)
# SELECT "id","name" FROM "table_name" WHERE "name"=%s

And then substitute using your client library.

SQLAlchemy - Core

Although SQLAlchemy has an ORM, we only need the DSL provided by the Core.

Unlike Pypika, SQLAlchemy’s Python DSL supports all the necessary idioms to fully convert an arbitrary, Postgres flavored SQL query.

So Many Features

Supports all the non-trivial features used in the queries I tested:

Conclusion

If you need an SQL DSL in Python, use SQLAlchemy Core.