Model queries with raw (open-ended) SQL

As extensive as Django model queries are, there can be circumstances when neither of the options presented in previous sections is sufficient to execute certain CRUD operations. Under these circumstances, you must rely on raw (open-ended) SQL queries, which represent the most flexible approach to execute operations against databases connected to Django projects.

Django offers two ways to execute raw SQL queries. The first consists of using a model manager's raw() method that fits the resulting SQL query data into a Django model, which has the added advantage of raw SQL queries behaving as close as possible to native Django model queries. And a second approach which consists of using a Python database connection -- managed by Django -- to fetch the SQL query data and process it with lower level Python DB API functions (e.g. cursor)[6].

SQL queries with a model manager's raw() method

A model manager's raw() method should be your first option to execute raw SQL queries, since the results are structured as a RawQuerySet class instance, which is very similar to the QuerySet class instances produced by Django model queries you've used up to this point.

Thus a RawQuerySet class instance -- just like a QuerySet class instance -- offers an easy way to access records using a Django model's fields, the ability to defer the loading of model fields, as well as indexing and slicing. Listing 8-58 illustrates a series a raw SQL query performed with a model manager's raw() method.

Listing 8-58. Django model manager raw() method

from coffeehouse.items.models import Drink, Item

# Get all drink
all_drinks = Drink.objects.raw("SELECT * FROM items_drink")

# Confirm type
type(all_drinks)
# Outputs: <class 'django.db.models.query.RawQuerySet'>

# Get first drink with index 0
first_drink = all_drinks[0]

# Get Drink name (via item OneToOne relationship) 
first_drink.item.name

# Use parameters to limit a raw SQL query
caffeine_limit = 100

# Create raw() query with params argument to pass dynamic arguments
drinks_low_caffeine = Drink.objects.raw("SELECT * FROM items_drink where 
caffeine < %s",params=[caffeine_limit]);

The first snippet in listing 8-58 uses the Drink model manager's raw() method to issue the SELECT * FROM items_drink query. It's worth mentioning this raw query produces the same results as the native Django query Drink.objects.all(), but unlike native queries that produce QuerySet data structures, notice how raw() method queries produce a RawQuerySet data structure.

Because a RawQuerySet data structure is a subclass of QuerySet, listing 8-58 shows how it's possible to use many of the same mechanisms as QuerySet data structures. For example, access to records is also done by index (e.g. [0] to get the first elements) and it's also possible to access related models using dot notation.

Finally, the last example in listing 8-58 illustrates how to create raw SQL queries with dynamic arguments using the params argument. In all cases where you need to create raw() SQL queries that depend on dynamic values (e.g. provided by a user or another sub-routine) you should always create the backing raw SQL query string with placeholders -- %s -- which then get substituted through the params argument. In the case of listing 8-58, notice how the caffeine_limit variable is declared in params to later be substituted into the raw SQL query. The params argument ensures dynamic values are escaped from queries before being applied to the database, avoiding a potential SQL injection security attack[7].

The raw() SQL example in listing 8-58 is straightforward because the results of the query map directly to the intended model. In other words, the SELECT * FROM items_drink query produces the necessary results for Django to create Item records without additional help. Sometimes though, raw() SQL queries require additional configuration to be able to create the underlying model records.

For example, if you perform a raw SQL query on a legacy table or multiple tables, with the intention to use the raw() method of a certain model, you must ensure Django is able to interpret the results of raw SQL query to the model, by either using SQL AS statements in the raw SQL or relying on the raw() translations parameter. Listing 8-59 illustrates both techniques.

Listing 8-59. Django model manager raw() method with mapping, deferred fields and aggregate queries.

# Map results from legacy table into Item model 
all_legacy_items = Item.objects.raw("SELECT product_name AS name, product_description AS 
description from coffeehouse_products")

# Access legacy results as if they are standard Item model records
all_legacy_items[0].name

# Use explicit mapping argument instead of 'as' statements in SQL query
legacy_mapping = {'product_name':'name','product_description':'description'}

# Create raw() query with translations argument to map table results
all_legacy_items_with_mapping = Item.objects.raw("SELECT * from coffeehouse_products",
translations=legacy_mapping)

# Deferred model field loading, get item one with limited fields 
item_one = Item.objects.raw("SELECT id,name from items_item where id=1")

# Acess model fields not referenced in the raw query, just like QuerySet defer()
item_one[0].calories
item_one[0].price

# Raw SQL query with aggregate function added as extra model field
items_with_inventory = Item.objects.raw("SELECT *, sum(price*stock) as assets from items_item");

# Access extra field directly as part of the model
items_with_inventory[0].assets

The first example in listing 8-59 declares a raw() method with multiple SQL AS statements, in this case, each of the AS clauses corresponds to an Item model field. In this manner, when Django inspects the results of the raw query, it knows how to map the results to Item instances, irrespective of the underlying database table column names.

The second example in listing 8-59 declares a raw() method with the translations argument whose value is a Python dictionary that maps database table column names to Django model fields. In this case, when Django encounters an unknown database table column name in the raw query results, it uses the translations dictionary to determine how to map the results to Item instances.

The third example in listing 8-59 illustrates how even when issuing a partial raw SQL query with the raw() method, Django is capable of fetching missing fields as if it were a native QuerySet data structure. Finally, the fourth example in listing 8-59 illustrates how the raw() method is capable of handling extra fields declared as aggregation queries and how they become accessible as if they were added with the native model aggregate() method.

SQL queries with Python's DB API

Although the Django model raw() method offers a great alternative to create raw SQL queries and have the ability to leverage native Django model features, there are circumstances where raw SQL queries with a model's raw() method won't work. Either because the results of a raw SQL query can't be mapped to a Django model, or because, you simply want access to the raw data without any Django model influence.

Under such circumstances, you'll need to use the second Django alternative to perform raw SQL queries, which consists of directly connecting to a database and explicitly extracting the results of a query. Although this second Django alternative is technically the most flexible to interact with a database, it also requires using lower-level calls from Python's DB API.

The only thing you can leverage from Django when performing raw SQL queries using this technique is the database connection defined in a Django project (i.e. the DATABASES variable in settings.py). Once a database connection is established, you'll need to rely on Python DB API methods like cursor(), fetchone() and fetchall() -- as well as perform manual extraction of the results -- to be able to successfully run raw SQL queries.

Listing 8-60 illustrates SQL queries using the Python DB API in the context of Django.

Listing 8-60. Django raw SQL queries with connection() and low-level DB API methods

from django.db import connection

# Delete record
target_id = 1
with connection.cursor() as cursor:
    cursor.execute("DELETE from items_item where id = %s", [target_id])

# Select one record
salad_item = None
with connection.cursor() as cursor:
    cursor.execute("SELECT * from items_item where name='Red Fruit Salad'")
    salad_item = cursor.fetchone()
# DB API fetchone produces a tuple, where elements are accessible by index
salad_item[0] # id
salad_item[1] # name
salad_item[2] # description

# Select multiple records
all_drinks = None
with connection.cursor() as cursor:
    cursor.execute("SELECT * from items_drink")
    all_drinks = cursor.fetchall()
# DB API fetchall produces a list of tuples
all_drinks[0][0] # first drink id

The first statement in listing 8-60 imports django.db.connection which represents the default database connection defined in the DATABASES variable in settings.py. Once you have a connection reference to the Django database, you can start to make use of the Python DB API, which generally starts with the use of the cursor() method[8].

The first raw SQL query in listing 8-60 opens a cursor on the connection and executes the cursor.execute() method to perform a delete operation. Because delete queries don't return results, the operation is considered concluded after the calling the cursor.execute() method

Tip If you declare multiple database reference in DATABASES, you can can use the django.db.connections reference to create a cursor on a specific database, instad of the default:
from django.db import connections
cursor = connections['analytics'].cursor() # Cursor connects to 'analytics' DB

The second raw SQL query in listing 8-60 first declares the salad_item placeholder variable to store the results of the raw SQL query. Once this is done, another cursor is opened on the connection to execute a select operation using the same cursor.execute() method. Because select queries return a result, an additional call is made on the cursor.fetchone() method to extract the results of the query and assign them to the placeholder variable. Note the fetchone() method is used because it's expected the raw SQL query will return a single record result.

Next, observe how the results of raw SQL query in salad_item are accessed by index. Since the Python DB API cursor.fetchone() method makes no use of field names or other references, you have to know the order in which record fields are returned, a process that can be particularly cumbersome for raw SQL with many fields.

The third raw SQL query in listing 8-60 first declares the all_drinks placeholder variable to store the results of the raw SQL query. Once this is done, another cursor is opened on the connection to execute anther select operation using the same cursor.execute() method. Because select queries return a result, an additional call is made on the cursor.fetchall() method to extract the results of the query and assign them to the placeholder variable. Note the fetchall() method is used because it's expected the raw SQL query will return multiple record result.

Next, observe how the results of raw SQL query in all_drinks are accessed by multiple index. Since the Python DB API cursor.fetchall() method makes no use of field names or other references, the first index represents a record in the result and the second index represents a field value from a given record.

  1. https://www.python.org/dev/peps/pep-0249/    

  2. https://en.wikipedia.org/wiki/SQL_injection    

  3. https://en.wikipedia.org/wiki/Cursor_(databases)