CRUD operations with multiple records and Django models

Problem

You want to make CRUD operations for multiple database records that rely on Django models.

Solution

To create multiple records associated with a Django model use the bulk_create() method with an argument list of model instances. Be aware the bulk_create method does not support pre-save and post-save model signals or models that span multiple tables (i.e. have relationships among one another). To overcome the limitations of the bulk_create() method, you can use the standard save() method and manage transactions manually to speed up a multiple record creation operation.

To read multiple records using a Django model you can use the all(),filter() and exclude() methods, all of which can be chained to create more complex queries. The all(), filter() and exclude() methods produce a QuerySet data type which has a built-in cache mechanism and is lazy (i.e. it doesn't hit the database until certain triggers are reached).

To update multiple database records you can use the update() method with a set of arguments to update, as well the select_for_update() method which acquires a lock on the query rows that gives you time to perform complex updates that can't be expressed in a single statement. To delete multiple database records you can call the delete() method to a query.

How it works

Working with multiple records in Django models is just as easy as working with single records that were described in the previous recipe. However, because working with multiple records can entail multiple database calls, other factors such as caching and bulk operations need to be taken into account to minimize execution times.

Create multiple records with bulk_create()

To create multiple records based on a Django model you can use the built-in bulk_create() method. The advantage of the bulk_create() method is that it creates all entries in a single query, so it's very efficient if you have a list of a dozen or a hundred entries you wish to create. Listing 1 illustrates the process to create multiple records for the Store model.

Listing 1 - Create multiple records with a Django model using the bulk_create() method

# Import Django model class
from coffeehouse.stores.models import Store

# Create model Store instances
store_corporate = Store(name='Corporate',address='624 Broadway',city ='San Diego',state='CA',email='corporate@coffeehouse.com')
store_downtown = Store(name='Downtown',address='Horton Plaza',city ='San Diego',state='CA',email='downtown@coffeehouse.com')
store_uptown = Store(name='Uptown',address='240 University Ave',city ='San Diego',state='CA',email='uptown@coffeehouse.com')
store_midtown = Store(name='Midtown',address='784 W Washington St',city ='San Diego',state='CA',email='midtown@coffeehouse.com')

# Call bulk_create to create records in a single call
Store.objects.bulk_create([store_corporate,store_downtown,store_uptown,store_midtown])

As you can see in listing 1, the bulk_create() method accepts a list of model instances to create all records in a single step.

As efficient as the bulk_create() method is, you should be aware it has certain limitations:

In case you face some of the limitations outlined for the bulk_create() method, then the only alternative you have is to loop over each record and use the save() method to create each entry individually. Listing 2 shows this process.

Listing 2 - Create multiple records with a Django model using the save() method

# Import Django model class
from coffeehouse.stores.models import Store

# Create a model Store instances
store_corporate = Store(name='Corporate',address='624 Broadway',city ='San Diego',state='CA',email='corporate@coffeehouse.com')
store_downtown = Store(name='Downtown',address='Horton Plaza',city ='San Diego',state='CA',email='downtown@coffeehouse.com')
store_uptown = Store(name='Uptown',address='240 University Ave',city ='San Diego',state='CA',email='uptown@coffeehouse.com')
store_midtown = Store(name='Midtown',address='784 W Washington St',city ='San Diego',state='CA',email='midtown@coffeehouse.com')

# Create store list
store_list = [store_corporate,store_downtown,store_uptown,store_midtown]

# Loop over each store and invoke save() on each entry
for store in store_list:
# save() method called on each member to create record
    store.save()

As I mentioned at the start, the process in listing 2 can be highly inefficient particularly if its done for dozens or hundreds of records, but sometimes it's the only option for creating multiple records in bulk. However, the speed issues related to listing 2 can be dramatically improved if you manually deal with database transactions.

Listing 3 illustrates how to use the save() method and group the entire record creation process in a single transaction to speed up the bulk creation process.

Listing 3 - Create multiple records with a Django model using the save() method in a single transaction

# Import Django model class
from coffeehouse.stores.models import Store
from django.db import transaction

# Create a model Store instances
store_corporate = Store(name='Corporate',address='624 Broadway',city ='San Diego',state='CA',email='corporate@coffeehouse.com')
store_downtown = Store(name='Downtown',address='Horton Plaza',city ='San Diego',state='CA',email='downtown@coffeehouse.com')
store_uptown = Store(name='Uptown',address='240 University Ave',city ='San Diego',state='CA',email='uptown@coffeehouse.com')
store_midtown = Store(name='Midtown',address='784 W Washington St',city ='San Diego',state='CA',email='midtown@coffeehouse.com')

# Create store list
first_store_list = [store_corporate,store_downtown]
second_store_list = [store_uptown,store_midtown]

# Trigger atomic transaction so loop is executed in a single transaction
with transaction.atomic():
    # Loop over each store and invoke save() on each entry
    for store in first_store_list:
    	# save() method called on each member to create record
	store.save()

# Call bulk_store_creator with Store list
bulk_store_creator(second_store_list)

# Method is decorated with @transaction.atomic to ensure all logic is executed in a single transaction
@transaction.atomic
def bulk_store_creator(store_list):
    # Loop over each store and invoke save() on each entry
    for store in store_list:
    	# save() method called on each member to create record
	store.save()        

As you can see in listing 3, there are two ways to create bulk operations in a single database transaction, both using the django.db.transaction package.The first instance uses the with transaction.atomic(): statement, so any nested code within this statement is run in a single transaction. The second instance uses the @transaction.atomic method decorator, by decorating a method this way all of its operations are run in a single transaction.

Note Be careful with explicit transactions

There is a reason Django's default database transaction mechanism creates transactions for every single query, it's to err on the safe side and minimize the potential for data loss.

If you do decide to use explicit transactions as illustrated in listing 3 to improve performance, be aware that either all records will be created or none will. Although this might even be a desired behavior, for certain circumstances it might lead to unexpected results. Make sure you understand the implications of transactions on the data you're working with.

Read multiple records with all(), filter() or exclude()

To read multiple records associated with a Django model you can use several methods, which include all(), filter() and exclude(). The purpose of the all() method should be self explanatory, it retrieves all the records of a given model. The filter() method is used to restrict query results on a given model property, for example filter(state='CA') is a query to get all model records with state='CA'. And the exclude() method is used to execute a query that excludes records on a given model property, for example exclude(state='AZ') is a query to get all model records except those with state='AZ'.

It's also possible to chain filter() and exclude() methods to create more complex multiple record queries. For example, filter(state='CA').exclude(city='San Diego') is a query to get all model records with state='CA' and exclude those with city='San Diego'. Listing 4 illustrates more multiple record query examples.

Listing 4 - Read multiple records with a Django model with all(), filter() and exclude() methods

# Import Django model class
from coffeehouse.stores.models import Store

# Query with all() method or equivalent SQL: 'SELECT * FROM ...'
all_stores = Store.objects.all()

# Query with include() method
san_diego_stores = Store.objects.filter(city='San Diego') or equivalent SQL: 'SELECT....WHERE city = "San Diego"'

# Query with exclude() method or equivalent SQL: 'SELECT....WHERE NOT (city = "San Diego")'
non_san_diego_stores = Store.objects.exclude(city='San Diego')

# Query with include() and exclude() methods or equivalent SQL: 'SELECT....WHERE STATE='CA' AND NOT (city = "San Diego")'
ca_stores_without_san_diego = Store.objects.filter(state='CA').exclude(city='San Diego')
Note Append .query to view the actual SQL

Sometimes it can be helpful or even necessary to view the actual SQL executed by a Django model query. You can do so by appending .query to a query, as illustrated in the following listing:.


from coffeehouse.stores.models import Store

import logging
stdlogger = logging.getLogger(__name__)

# Get the Store records with city San Diego
san_diego_stores = Store.objects.filter(city='San Diego')
stdlogger.debug("Query %s" % str(san_diego_stores.query))

# Get the Store records without city San Diego
non_san_diego_stores = Store.objects.exclude(city='San Diego')
print str(non_san_diego_stores.query)

As you can see in the previous snippet, you can output the SQL query to a Python logger or use the 'quick & dirty' print statement. Note that .query only works with queries that output QuerySet's (i.e. it doesn't work with queries with the get() method). Take a look at the recipe Set up logging for a Django project on how to set up a Python logger.

Note More complex mutliple record queries with OR, AND, IN, LIKE and other variations

In addition to the previous multiple record queries that match exact criteria (e.g. WHERE city='San Diego' or WHERE state='CA') -- Django also offers a wide range of options for finer grained multiple record queries. For example, matches for case insensitive strings, number ranges, lists and dates.

See the recipe Django model queries classified by SQL keywords for a classification of the different Django query options classified by SQL keyword.

All the queries in listing 4 have a common behavior, their end result is always a Django QuerySet data type. The QuerySet data type is produced for all Django model queries that use either the all(), filter() or exclude() methods. So before we move on to the final two parts of this recipe -- how to update and delete multiple records -- we'll take a brief detour to explore the QuerySet data type.

Understanding a QuerySet: Lazy evaluation & caching

The first important characteristic of a QuerySet data type is technically known as lazy evaluation. This means a QuerySet isn't executed against the database right away, it's just lazy until its evaluated. In other words, the act of running a snippet like all_stores = Store.objects.all() doesn't involve any database activity. Listing 5 illustrates how you can even chain query after query and still not trigger any database activity.

Listing 5 - Django chained queries to illustrate concept of lazy evaluation.

# Import Django model class
from coffeehouse.stores.models import Store

# Query with all() method
stores = Store.objects.all()
# Chain filter() method on query
stores = stores.filter(state='CA')
# Chain exclude() method on query
stores = stores.exclude(city='San Diego')

As you can see in listing 5, there are three different statements chained using the all(),filter() and exclude() methods. In the end, listing 5 makes a query for Store records with state='CA' and excludes those with city='San Diego'. Though it can appear listing 5 makes three database calls, it actually doesn't make any.

So when does a query made up of a QuerySet data type hit the database ? There are many triggers that make a QuerySet evaluate and invoke an actual database call. Table 1 illustrates the various triggers.

Table 1 - Django QuerySet evaluation triggers that invoke an actual database call
Evaluation TriggerDescriptionExample
IterationCreating a loop on a QuerySet triggers a DB call.
for store in Store.objects.all():
Slicing with 'step' argument Slicing a QuerySet with a third argument (a.k.a. 'step' or 'stride' argument) triggers a DB call. NOTE: Slicing a Queryset with 1 or 2 arguments just creates another QuerySet
# A list of every 5th record, for the first 100 records
Store.objects.all()[:100:5]
# This does NOT trigger a DB hit, (2 arguments)
Store.objects.all()[49:99] # Records 50 to 100
Store.objects.all()[5:] # All records starting from the 6th
# This does NOT trigger a DB hit (1 argument)
Store.objects.all()[0] # First record
Pickling*Pickling a QuerySet forces all the results to be loaded into memory prior to pickling.
import pickle
stores = Store.objects.all()
pickled_stores = pickle.dumps(stores)
repr() methodCalling repr() on a QuerySet triggers a DB call. NOTE: This is for convenience in the Python interactive interpreter, so you can immediately see results when using the API interactively.
repr(Store.objects.all())
len() methodCalling len() on a QuerySet triggers a DB call. NOTE: If you only want the number of records, it's more efficient to use the count() method.
total_stores = len(Store.objects.all())
#NOTE: The count() method is more efficient to get a total count
efficient_total_stores = Store.objects.count()
list() methodCalling list() on a QuerySet triggers a DB call.
store_list = list(Store.objects.all())
Boolean tests (bool(), or, and or if statements)Making a boolean test on a QuerySet triggers a DB call. NOTE: If you only want to check if a record exists, it's more efficient to use the exists() method.
# Check if there's a store with city='San Diego'
if Store.objects.filter(city='San Diego'):
# There is a store in 'San Diego'
pass
#NOTE: The exists() method is more efficient for a boolean check
san_diego_stores = Store.objects.exists(city='San Diego')
* Pickling is Python's standard mechanism for object serialization, a process that converts a Python object into a character stream. The character stream contains all the information necessary to reconstruct the object at a later time. Pickling in the context of Django queries is typically used for heavyweight queries in an attempt to save resources (e.g. make a heavyweight query, pickle it and on subsequent occasions consult the pickled query). You can consider pickling Django queries a rudimentary form of caching.

Now that you know about the evaluation triggers that cause a QuerySet to make a call to a DB, let's take a final look at another important QuerySet subject: caching.

Every QuerySet contains a cache to minimize database access which is empty when the QuerySet is created. But the first time a QuerySet is evaluated and the actual database query takes place -- see evaluation triggers in table 1 -- Django saves the results in the QuerySet's cache for later use.

A QuerySet's cache is most useful when an application has a recurring need to use the same data, as it leads to less hits on a database. However, leveraging a QuerySet's cache comes with a few subtleties tied to the evaluation of a QuerySet. The rule of thumb is to first evaluate a QuerySet you plan to use more than once and then proceed to use its data to leverage the QuerySet cache. This is best explained with the examples presented in listing 6.

Listing 6 - Django QuerySet caching behavior.

# Import Django model class
from coffeehouse.stores.models import Store

# CACHE USED SEQUENCE
# Query awaiting evaluation
lazy_stores = Store.objects.all()
# Iteration triggers evaluation and hits database
store_emails = [store.email for store in lazy_stores]
# Uses QuerySet cache from lazy_stores, since lazy_stores is evaluated in previous line  
store_names = [store.name for store in lazy_stores] 

# NO CACHE USED SEQUENCE 
# Iteration triggers evaluation and hits database
heavy_store_emails = [store.email for store in Store.objects.all()]
# Iteration triggers evaluation and hits database again, because it's a different inline QuerySet  
heavy_store_names = [store.name for store in Store.objects.all()] 

# CACHE USED SEQUENCE
# Query wrapped as list() for immediate evaluation
stores = list(Store.objects.all()) 
# Uses QuerySet cache from stores
first_store  = stores[0] 
# Uses QuerySet cache from stores
second_store = stores[1] 
# Uses QuerySet cache from stores, set() is just used to eliminate duplicates
store_states = set([store.state for store in stores])
 # Uses QuerySet cache from stores, set() is just used to eliminate duplicates 
store_cities = set([store.city for store in stores])

# NO CACHE USED SEQUENCE 
# Query awaiting evaluation
all_stores = Store.objects.all()
# list() triggers evaluation and hits database
store_one = list(all_stores[0]) 
# list() triggers evaluation and hits database again, because partially evaluating a QuerySet does not populate the cache
store_one_again = list(all_stores[0]) 

# CACHE USE SEQUENCE 
# Query awaiting evaluation
coffee_stores = Store.objects.all()
# Iteration triggers evaluation and hits database
[store for store in coffee_stores] 
# Uses QuerySet cache from coffee_stores, because it's evaluated fully in previous line
store_1 = coffee_stores[0] 
# Uses QuerySet cache from coffee_stores, because it's already evaluated in full
store_1_again = coffee_stores[0] 

As you can see in the examples in listing 6, those sequences that leverage a QuerySet's cache trigger the evaluation of the QuerySet right away and then use a reference to the evaluated QuerySet to access the cached data. Those sequences that don't use a QuerySet cache, either constantly create identical QuerySet statements or make the evaluation process late and for each data assignment.

The only edge case for caching QuerySet's that doesn't exactly fit the previous behavior is the second to last example in listing 8. If you trigger a partial evaluation of QuerySet by slicing it (e.g. [0] or [1:5]) the cache is not populated. To ensure the use of a QuerySet cache you must fully evaluate a QuerySet and then you can slice the results, as illustrated in the last example in listing 8.

Update multiple records with update() or select_for_update().

In the previous recipe on single record CRUD operations, we explored how to update single records with the update() method, this same method can handle updating multiple records. This process is illustrated in listing 7.

Listing 7 - Django model update multiple records with the update() method

from coffeehouse.stores.models import Store

Store.objects.all().update(email="contact@coffeehouse.com")

from coffeehouse.stores.models import Drink
from django.db.models import F

Drink.objects.all().update(stock=F('stock') +100)

The first example in listing 7 uses the update() method to update all Store records and set their email value to contact@coffeehouse.com. The second example uses a Django F expression and the update() method to update all Drink records and set their stock value to the current stock value plus 100. For the moment, don't worry to much about Django F expressions, just realize they allow you to reference model fields within a query, which is necessary in this case to perform the update in a single operation.

Even though the update() method guarantees everything is done in a single operation to avoid race conditions, on certain occasions the update() method may not be enough to do complex updates. Offering another alternative is Django's select_for_update() method which locks rows on the given query until the update is marked as done. Listing 8 illustrates an example of the select_for_update() method.

Listing 8 - Update multiple records with a Django model using the select_for_update() method

# Import Django model class
from coffeehouse.stores.models import Store
from django.db import transaction

# Trigger atomic transaction so loop is executed in a single transaction
with transaction.atomic():
    store_list = Store.objects.select_for_update().filter(state='CA')
    # Loop over each store to update and invoke save() on each entry
    for store in store_list:
        # Add complex update logic here for each store
    	# save() method called on each member to update
	store.save()

# Call bulk_store_update to update store records
bulk_store_update()

# Method is decorated with @transaction.atomic to ensure all logic is executed in a single transaction
@transaction.atomic
def bulk_store_creator(store_list):
    store_list = Store.objects.select_for_update().exclude(state='CA')
    # Loop over each store and invoke save() on each entry
    for store in store_list:
        # Add complex update logic here for each store
    	# save() method called on each member to update
	store.save()        

Listing 8 shows two variations for select_for_update(), one using an explicit transaction and the other decorating a method to scope it inside a transaction. Both variations use the same logic, they first create a query with select_for_update(), then loop over the results to update each record and use save() to update individual records. In this manner the rows touched by the specified query remain locked from inadvertent changes until the transaction finishes.

Be aware that when using the select_for_update() it's absolutely necessary to use transactions using any of the techniques described in listing 8. If you run the select_for_update() method in a database that supports it and you don't use transactions as illustrated in listing 8 -- maintaining Django's default auto-commit mode -- Django throws a TransactionManagementError error because the rows cannot be locked as a group. Using the select_for_update method in a database that offers no support for it has no effect (i.e. you won't see an error).

Note select_for_update() support is database dependant

Under the hood the Django select_for_update() method is based on SQL's SELECT...FOR UPDATE syntax which is not supported by all databases. Postgres, Oracle and MySQL database support this functionality and SQLite does not support this functionality.

In addition, there's the special argument nowait (e.g. select_for_update(nowait=True) to make a query non-blocking. By default, if another transaction acquires a lock on one of the selected rows, the select_for_update() query blocks until the lock is released. If you use nowait, this allows a query to run right away and in case a conflicting lock is already acquired by another transaction the DatabaseError is raised when the QuerySet is evaluated. Be aware though MySQL does not support the nowait argument and if used with MySQL, Django throws a DatabaseError.

.

Delete multiple records with delete()

To delete multiple records you use the delete() method and append it to a query. Listing 9 illustrates this process.

Listing 9 - Django model delete with the delete() method on query

from coffeehouse.stores.models import Store

Store.objects.filter(city='San Diego').delete()

The example in listing 9 uses the delete() method to delete the Store records with city='San Diego'.