Introduction to search options for Django
Search has become a basic feature of almost every web application. Users now expect to enter a phrase and be taken to accurate results in a timely manner. Since Django projects live on the web, you'll eventually need to give them at least a minimum level of search functionality.
Django basic search: Model queries & indexes
Because a Django project's data is stored primarily in a relational database, a basic approach to search consists of using Django model queries with the support of indexes. For example, let's say you have a coffeehouse application with a Store Django model and want to let users search for stores by city.
The first step is to create an HTML form so users can enter a city, similar to the following snippet: <form>Search for stores in: <input type="text" name="city"/></form>. This form is then submitted to a view method that extracts the city field and executes a query like Store.objects.filter(city=user_provided_city). The results of the query are then passed to a template to generate a display for the user who initiated the search.
An important aspect that's often forgotten when using this Django search approach is to create indexes on the model fields on which queries are performed. Indexes can drastically improve search times because they provide dedicated structures on which to perform queries, something that can be critical for models with either a lot of fields or a large number of objects (i.e. database table rows). For more details on database indexes in general see https://en.wikipedia.org/wiki/Database_index and for details on creating indexes for Django models see the earlier Django models chapter.
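To make the effect of an index on the city search concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a project's database. The table name store, the index name idx_store_city and the sample rows are assumptions made for illustration; in a Django project the model layer would generate the equivalent schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO store (name, city) VALUES (?, ?)",
    [("Corporate", "San Diego"), ("Downtown", "San Diego"), ("Uptown", "Los Angeles")],
)
# Hypothetical index on the column users search by.
conn.execute("CREATE INDEX idx_store_city ON store (city)")


def query_plan(sql, params=()):
    """Return SQLite's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


city_sql = "SELECT name FROM store WHERE city = ?"
city_plan = query_plan(city_sql, ("San Diego",))
matches = [name for (name,) in conn.execute(city_sql, ("San Diego",))]

print(city_plan)  # the plan should mention idx_store_city
print(matches)
```

The exact wording of the plan differs between SQLite versions and other databases report plans in their own formats, but the point is the same: an exact-match lookup on an indexed column can be answered from the index instead of reading every row.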
The biggest issue you'll face with searches based on standard Django queries & indexes is that they can quickly break down in terms of relevance & performance for more demanding searches. Searches for finite values like city names or true/false values can be done efficiently in this manner because relational database indexes are specially designed to deal with such scenarios. But suppose you now want to allow users to search for Item objects that contain certain keywords (e.g. vegan) in a more open-ended description field.
Although the Django model API supports query operators such as contains and icontains that can solve this problem, these types of Django queries get converted into very inefficient SQL queries (e.g. Item.objects.filter(description__contains='vegan') gets translated into SELECT ... WHERE description LIKE '%vegan%';).
The underlying problem with queries that use SQL keywords such as LIKE is that most relational databases can't use regular indexes when the pattern starts with a wildcard (e.g. %vegan%). Since these types of search queries require inspecting the entire text contents of a column, the search is done directly on the column and regular indexes become a moot point. So if you have thousands of Item objects, each with a description field that contains hundreds of words, and you search for a single word across all descriptions, the query can be very time consuming.
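The following sketch illustrates this behavior, again using Python's built-in sqlite3 module as a stand-in database. The item table, the idx_item_description index and the sample rows are hypothetical. Even though the description column has a regular index, the query plan for a LIKE pattern with a leading wildcard should report a scan of the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO item (name, description) VALUES (?, ?)",
    [
        ("Garden wrap", "A vegan wrap with grilled vegetables"),
        ("Ham bagel", "Toasted bagel with ham and cheese"),
    ],
)
# A regular index on the text column (hypothetical name).
conn.execute("CREATE INDEX idx_item_description ON item (description)")

like_sql = "SELECT name FROM item WHERE description LIKE '%vegan%'"

# Ask SQLite how it would execute the open-ended search.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + like_sql).fetchall()
like_plan = " | ".join(row[-1] for row in plan_rows)
found = [name for (name,) in conn.execute(like_sql)]

print(like_plan)  # a full scan; idx_item_description is not used
print(found)
```

The query still returns the right rows; the cost is that every description has to be read to find them, which is what makes this approach slow as the table grows.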
So just because you can get away with using Django model queries to create more open-ended searches doesn't mean it's a good choice. For searches over open-ended text that span more than a couple hundred records, a better technique is to use full text search.
Django full text search: Postgres contrib & Haystack (Solr, Elasticsearch, Whoosh and Xapian)
Full text search on relational databases requires an entirely different approach than search on fields with a limited set of values (e.g. cities, sizes). The first thing you need to realize is that full text search also uses indexes, but not the regular indexes that relational databases generally use and that Django models can create for you out of the box.
Full text search requires full text indexes. In very simple terms, building a full text index consists of splitting open-ended text values (e.g. The quick brown fox jumps over the lazy dog) into keywords (e.g. quick, brown) and creating an index from those keywords to use when a search query is made on the text. Because a full text index strips stop words (e.g. the) and is a dedicated structure containing the most relevant keywords, full text searches are more efficient than searching the full text directly.
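As a rough illustration of the idea, the sketch below builds a tiny full text index in plain Python. The stop word list, the function names and the second sample document are assumptions made for this example; real databases and search engines use far more elaborate tokenizers and storage structures.

```python
import re

# Assumed stop word list; real engines ship longer, language-specific lists.
STOP_WORDS = {"the", "a", "an", "over", "with", "and"}


def keywords(text):
    """Lowercase the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_index(documents):
    """Map each keyword to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in keywords(text):
            index.setdefault(word, set()).add(doc_id)
    return index


documents = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "A quick vegan snack",
}
index = build_index(documents)

print(keywords(documents[1]))  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
print(sorted(index["quick"]))  # [1, 2]
print("the" in index)          # False, stop words are never indexed
```

A search for a keyword then becomes a dictionary lookup on the index rather than a pass over every document's text.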
In addition, full text search often uses stemming, a process that reduces the different forms of a word (e.g. jumps, jumped, jumping) to a single root word (e.g. jump) so that a search for one form also matches the others, which increases the scope of results. On top of this, full text search also often uses metrics like scores and ranks to classify the most relevant results for a given search term (e.g. assign more relevance to results with words appearing toward the start or more than once in the full text).
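Stemming and ranking can be sketched in a few lines of plain Python. The suffix-stripping stemmer below is a deliberately naive toy, not the algorithm any particular database or search engine uses, and the scoring rule (count of matching words) is just one simple assumption about how relevance could be measured.

```python
import re


def stem(word):
    """Toy stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def score(text, term):
    """Count the words in the text that share the search term's stem."""
    target = stem(term.lower())
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for w in words if stem(w) == target)


def search(documents, term):
    """Return (doc_id, score) pairs for matching documents, best first."""
    scored = [(doc_id, score(text, term)) for doc_id, text in documents.items()]
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample documents.
documents = {
    1: "The fox jumps over the lazy dog",
    2: "She jumped, then kept jumping all day",
    3: "A quiet afternoon",
}
results = search(documents, "jump")

print([stem(w) for w in ("jumps", "jumped", "jumping")])  # ['jump', 'jump', 'jump']
print(results)  # [(2, 2), (1, 1)]
```

Document 2 ranks first because two of its words reduce to the stem jump, while document 3 is left out because none of its words match.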
In essence, full text search can become very complex compared to standard database queries with regular indexes. Full text search has grown in complexity to the point it varies depending on the relational database brand (e.g. MySQL, Postgres) and there are also completely separate products -- that work alongside relational databases -- known as 'search engine' platforms specifically designed to deal with full text search.
Out of the box, Django only supports full text search for Postgres databases, through the django.contrib.postgres.search module, which means you can use full text search features in Django without having to tweak or configure Postgres directly. Although Django supports other relational databases (e.g. MySQL, Oracle) that can do full text search, Django itself doesn't support full text search for these brands, which means you need to take additional steps to use full text search with Django and these other relational databases (e.g. create full text indexes manually, write raw SQL statements to run full text searches).
When it comes to Django support for search engine platforms designed to do full text search, the leading choice is a Django package called Haystack. Haystack in itself isn't a search engine platform, but rather standardizes access to search engine platforms in Django. Haystack supports four leading search engine platforms: Solr, Elasticsearch, Whoosh and Xapian.
In essence, Haystack is to full text search in Django what Django's built-in model API is to relational databases: it lets the same full text search logic work across any search engine platform supported by Haystack. This allows you to write full text search logic that isn't specific to one search engine platform. If you later want your Django projects to use a different search engine platform, Haystack allows you to easily make this change, just like the Django built-in model API allows you to easily change a Django project's relational database.
Django public search discovery: Sitemaps & robots file
In addition to supporting search functionality for users of your Django applications, another important aspect related to search in Django is search discovery. Even if your Django applications offer the best search and full text search for current users, new users depend on your applications being discoverable, which is where public search engines (e.g. Google, Bing, DuckDuckGo) come into the picture.
Although it's sometimes only a question of time for public search engines to discover a Django application's content, the process can be made easier -- or restricted -- if you follow certain search engine guidelines. To make search discovery easier, search engines rely on a sitemap, which is a file that contains a web site's various URLs including their characteristics (e.g. how often content changes, relative weight to other URLs). To restrict search discovery, search engines rely on a robots file, which is a file that contains instructions for search engine bots to not crawl certain or all sections of a web site.
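To show what a sitemap contains, the following sketch generates one with Python's standard xml.etree.ElementTree module. The example.com URLs, change frequencies and priorities are made-up values, and build_sitemap is a hypothetical helper written for this illustration rather than part of Django.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq  # how often content changes
        ET.SubElement(url, "priority").text = priority      # weight relative to other URLs
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    ("https://example.com/stores/1/", "weekly", "0.8"),
    ("https://example.com/stores/2/", "weekly", "0.5"),
])
print(sitemap_xml)
```

Each url element carries the two characteristics mentioned above: changefreq for how often the content changes and priority for its weight relative to other URLs on the same site.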
Django offers the ability to create sitemaps through the django.contrib.sitemaps module, which in turn allows the exposure of site URLs based on Django urls & models (e.g. a Store model's URLs, such as /stores/2/). Since a robots file isn't as data driven as a sitemap, you can just create a flat file and serve it as a static file from the site's main directory, so it's available at /robots.txt.
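As a small illustration of how a robots file is interpreted, the sketch below feeds a sample file to Python's standard urllib.robotparser module, which applies the same kind of rules a well-behaved crawler would. The disallowed paths and the example.com URLs are assumptions chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots file contents: keep all bots out of two sections.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /accounts/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether a generic bot may crawl a public page and a restricted one.
can_crawl_store = parser.can_fetch("*", "https://example.com/stores/2/")
can_crawl_admin = parser.can_fetch("*", "https://example.com/admin/")

print(can_crawl_store)  # True
print(can_crawl_admin)  # False
```

Keep in mind that a robots file is only a set of instructions that cooperative bots choose to follow; it doesn't enforce access control on its own.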