Optimizing Django Database Queries – Part 2

In the previous post about optimizing Django database queries I showed how to detect improper use of indexes and what indexes Django creates by default. Today I’d like to show a better use of indexing for a potentially expensive operation: text search.

Task: Search an Author By Name

As you might remember from the previous post, we had an Author model in our system consisting of two character fields. We might want to implement a search that matches authors by last name, so when the user starts typing “kin” they’ll be presented with authors like “King”, “Kinsley”, “Viking” etc. One way of doing this is a regular substring search:

SELECT first_name, last_name FROM authors WHERE UPPER(last_name) LIKE UPPER('%kin%');

For large datasets this will probably be very inefficient. What we can do is make use of indexing to speed up searches. The proper type of index for this kind of query is a GIN index with support for trigram operations.

What is a GIN index? It stands for Generalized Inverted Index and is used to search within composite values, such as text documents or arrays, by mapping each component value back to the rows that contain it.
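To build some intuition, here is a toy sketch of the “inverted” part (an illustration only, not how Postgres lays the index out on disk): each trigram maps back to the set of rows containing it, so a lookup jumps straight to candidate rows instead of scanning the whole table.

```python
from collections import defaultdict

def trigrams(text):
    """All 3-letter substrings of a lowercased string (simplified pg_trgm)."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

authors = {1: "King", 2: "Kinsley", 3: "Viking", 4: "Smith"}

# Inverted index: trigram -> primary keys of rows that contain it.
index = defaultdict(set)
for pk, name in authors.items():
    for tri in trigrams(name):
        index[tri].add(pk)

print(sorted(index["kin"]))  # [1, 2, 3] -> King, Kinsley, Viking
```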

And what is a trigram? It’s a fragment of text consisting of 3 letters, so “king” will be split into “kin” and “ing”. The advantage of this type of search is that it is more “resistant” to typos: when the user searches for “kinh”, there will still be a match with “king”. Moreover, it allows us to sort results by relevance. If we actually have an author named “Kinh” in the database, we can sort the results and display “Kinh” before “King”, since the first one is more relevant: it matches 2 out of 2 trigrams, while the second matches only 1.
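The relevance scoring described above can be sketched in a few lines of plain Python (a simplified model of pg_trgm’s similarity; the real extension also pads words with leading and trailing spaces, so its exact scores differ):

```python
def trigrams(text):
    """All 3-letter substrings of a lowercased string (simplified pg_trgm)."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def similarity(a, b):
    """Share of trigrams the two strings have in common."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

print(similarity("kinh", "kinh"))  # 1.0   -> exact match, most relevant
print(similarity("kinh", "king"))  # ~0.33 -> the typo still matches "king"
```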

How to Use Trigrams and GIN Indexes in Django and PostgreSQL?

First, we need to enable the trigram and btree_gin extensions in PostgreSQL. For this we create a migration:

# Generated by Django 3.1.3 on 2020-11-30 21:05

from django.db import migrations
from django.contrib.postgres.operations import TrigramExtension, BtreeGinExtension

class Migration(migrations.Migration):

    dependencies = [
        ('reviews', '0002_auto_20201129_1839'),
    ]

    operations = [
        TrigramExtension(),
        BtreeGinExtension(),
    ]

Afterwards we need to add the PostgreSQL extensions package to the INSTALLED_APPS setting:

INSTALLED_APPS = [
    ...
    'django.contrib.postgres',
]
After applying it we can see the extensions in our database:

(Screenshot: btree_gin and pg_trgm extensions enabled in pgAdmin)

Before we apply the index, let’s run a few test queries so that we can later verify it has the desired effect. In order to do so, we populate the database with 250,000 records. We’ll be searching for the string “ari” with 3 different methods:

  1. Simple substring search, case insensitive,
  2. Trigram similarity search,
  3. Trigram similarity with ordering the results.
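In case you’d like to reproduce this: the test records can be generated along these lines (just a sketch of the idea; the random_name helper is made up for illustration, not the exact script I used):

```python
import random
import string

from reviews.models import Author  # app name taken from the migration above

def random_name(length=8):
    return ''.join(random.choices(string.ascii_lowercase, k=length)).title()

# bulk_create inserts records in large batches instead of 250,000 single INSERTs.
Author.objects.bulk_create(
    [Author(first_name=random_name(), last_name=random_name())
     for _ in range(250_000)],
    batch_size=5_000,
)
```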

In each case I’ll print the resulting SQL query and the query plan generated with EXPLAIN ANALYZE.

First, substring search, no indexing:

In [50]: print(Author.objects.filter(last_name__icontains='ari').query)
SELECT "reviews_author"."id", "reviews_author"."first_name", "reviews_author"."last_name" FROM "reviews_author" WHERE UPPER("reviews_author"."last_name"::text) LIKE UPPER(%ari%)

In [59]: print(Author.objects.filter(last_name__icontains='bar').explain(analyze=True))
Seq Scan on reviews_author  (cost=0.00..5433.28 rows=10358 width=16) (actual time=0.018..58.630 rows=846 loops=1)
  Filter: (upper((last_name)::text) ~~ '%BAR%'::text)
  Rows Removed by Filter: 258106
Planning Time: 0.046 ms
Execution Time: 58.662 ms

Second, trigram similarity search:

In [55]: print(Author.objects.filter(last_name__trigram_similar='ari').query)
SELECT "reviews_author"."id", "reviews_author"."first_name", "reviews_author"."last_name" FROM "reviews_author" WHERE "reviews_author"."last_name" % ari

In [60]: print(Author.objects.filter(last_name__trigram_similar='bar').explain(analyze=True))
Gather  (cost=1000.00..4478.96 rows=259 width=16) (actual time=0.555..80.710 rows=698 loops=1)
  Workers Planned: 1
  Workers Launched: 1
  ->  Parallel Seq Scan on reviews_author  (cost=0.00..3453.06 rows=152 width=16) (actual time=0.503..78.743 rows=349 loops=2)
        Filter: ((last_name)::text % 'bar'::text)
        Rows Removed by Filter: 129127
Planning Time: 0.039 ms
Execution Time: 80.740 ms

And finally trigram with ordering:

In [58]: print(Author.objects.annotate(similar=TrigramSimilarity('last_name', 'ari')).filter(similar__gt=0).order_by('-similar').query)
SELECT "reviews_author"."id", "reviews_author"."first_name", "reviews_author"."last_name", SIMILARITY("reviews_author"."last_name", ari) AS "similar" FROM "reviews_author" WHERE SIMILARITY("reviews_author"."last_name", ari) > 0.0 ORDER BY "similar" DESC

In [61]: print(Author.objects.annotate(similar=TrigramSimilarity('last_name', 'bar')).filter(similar__gt=0).order_by('-similar').explain(analyze=True))
Sort  (cost=12725.93..12941.72 rows=86317 width=20) (actual time=168.214..168.876 rows=14235 loops=1)
  Sort Key: (similarity((last_name)::text, 'bar'::text)) DESC
  Sort Method: quicksort  Memory: 1485kB
  ->  Seq Scan on reviews_author  (cost=0.00..5649.07 rows=86317 width=20) (actual time=0.022..165.806 rows=14235 loops=1)
        Filter: (similarity((last_name)::text, 'bar'::text) > '0'::double precision)
        Rows Removed by Filter: 244717
Planning Time: 0.052 ms
Execution Time: 169.319 ms

Applying GIN Index for Optimizing the Database Query

In order to create an index, we need to run the following command:

CREATE INDEX "reviews_aut_last_name_gin" ON "reviews_author" USING gin ("last_name" gin_trgm_ops);

What is important here is that we need to pass the “opclasses”, which in this case is gin_trgm_ops. Otherwise Postgres won’t use the index for searching. I spent hours banging my head against the wall until I discovered what was wrong. You can read more about opclasses in the additional resources linked below.

Luckily, recent versions of the ORM allow us to declare this index directly on the model:

from django.contrib.postgres.indexes import GinIndex
from django.db import models

class Author(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        indexes = [
            GinIndex(name='review_author_ln_gin_idx', fields=['last_name'],
                     opclasses=['gin_trgm_ops']),
        ]

Now we can create the migration and apply it using python manage.py makemigrations && python manage.py migrate. Having done this, we can finally measure the performance again:

In [4]: print(Author.objects.filter(last_name__icontains='ari').explain(analyze=True))
Seq Scan on reviews_author  (cost=0.00..5433.28 rows=10358 width=16) (actual time=0.213..69.108 rows=4456 loops=1)
  Filter: (upper((last_name)::text) ~~ '%ARI%'::text)
  Rows Removed by Filter: 254496
Planning Time: 0.972 ms
Execution Time: 69.247 ms

In [5]: print(Author.objects.filter(last_name__trigram_similar='ari').explain(analyze=True))
Bitmap Heap Scan on reviews_author  (cost=38.01..717.84 rows=259 width=16) (actual time=3.792..6.918 rows=431 loops=1)
  Recheck Cond: ((last_name)::text % 'ari'::text)
  Rows Removed by Index Recheck: 2985
  Heap Blocks: exact=1375
  ->  Bitmap Index Scan on review_author_ln_gin_idx  (cost=0.00..37.94 rows=259 width=0) (actual time=3.681..3.682 rows=3416 loops=1)
        Index Cond: ((last_name)::text % 'ari'::text)
Planning Time: 0.989 ms
Execution Time: 6.979 ms

In [6]: print(Author.objects.annotate(similar=TrigramSimilarity('last_name', 'ari')).filter(similar__gt=0).order_by('-similar').explain(analyze=True))
Sort  (cost=12725.93..12941.72 rows=86317 width=20) (actual time=183.890..185.161 rows=27356 loops=1)
  Sort Key: (similarity((last_name)::text, 'ari'::text)) DESC
  Sort Method: quicksort  Memory: 2868kB
  ->  Seq Scan on reviews_author  (cost=0.00..5649.07 rows=86317 width=20) (actual time=0.023..179.401 rows=27356 loops=1)
        Filter: (similarity((last_name)::text, 'ari'::text) > '0'::double precision)
        Rows Removed by Filter: 231596
Planning Time: 0.069 ms
Execution Time: 185.979 ms

As you can see, the performance improved in just one case, but the improvement was considerable: from about 80 ms down to about 7 ms. This is expected: the icontains lookup compiles to UPPER(...) LIKE, an expression the index on last_name cannot serve, and the SIMILARITY() call used for ordering is a plain function rather than an indexable operator. Only the % operator produced by trigram_similar matches the trigram opclass.
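If we want both relevance ordering and index usage, one option (a sketch I haven’t benchmarked here, with a hypothetical helper name) is to let the indexed % operator narrow down the rows first, and compute TrigramSimilarity only to order the survivors:

```python
from django.contrib.postgres.search import TrigramSimilarity

from reviews.models import Author

def search_authors(term):
    # last_name__trigram_similar compiles to the % operator, which the GIN
    # index can serve; SIMILARITY() is then evaluated only on matching rows.
    return (
        Author.objects
        .filter(last_name__trigram_similar=term)
        .annotate(similar=TrigramSimilarity('last_name', term))
        .order_by('-similar')
    )
```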

Final thoughts

To sum up, we have seen how we can improve a common task that most of us have probably come across in our developer lives. You need to remember, however, that with great speed come great costs. Indexes take up space, slow down inserts and updates, and, as we’ve seen both in this and in the previous post, can have no impact at all on your query performance. Therefore, you should always measure whether your index actually helped and ensure that it was implemented correctly. Otherwise, you’ll end up like me, looking for answers for hours 😉

I am planning to do 1 or 2 more posts on the topic, so if I got your attention, you might want to subscribe to the newsletter to get notified about new articles.

Additional Resources:
