|
|
Django ORM performance patchDjango ORM is simple and obvious, but when from tutorial models project moved to real-world models, same ORM often becomes headache. I work on big Django-project that currently have 278 models in 40 applications with a lot of relations between them. For such number of models For example following query it's standard way to receive with 1 SQL query and then show 10 last blog entries titles with author names and avatars.
The problem is in SQL query that Django ORM will generate for this code – it will load from database all fields of A lot of resources wasted here: it's not cheap to select from database big textfield, then transfer utf-8 bytes to web-server, convert from utf-8 to unicode, allocate kilobytes of memory in python process, and finally run garbage collector that will find that those already megabytes of memory (for each object from object list for each client request) were not need at all and it safe now to spend some more system resources for deleting these never used objects. Current Special lightweight models (when you bind to same database table several Django models with different field subsets of a heavy model) results in code duplication and require to implement lightweight models for all related heavy models also, and actually it's not web development done right. Returning back to old good SQL is possible but it's "not right" and in this case you also lose business logic already defined in models. I checked out queryset-ref Django branch and didn't find solution for above problem. This branch makes step ahead and allows to control what related models to join and what don't to join, but it don't give me what I need – full control on the loaded fields without loosing advantages of models. Then I found 2 Django tickets #5420: Allow database API users to specify the fields to exclude in a SELECT statement and #5768: Allow QuerySet.values() to return values spanning joins. This tickets address same problem but unfortunately it were assigned to nobody, so I created the patch and going to attach it to above tickets. You can download the patch here, it is quite simple but allows to write queries that loads only those fields that you requested (and plan to use in view or template). Above example with new
The patch include
Both And a final note about lazy loading of field, that was mentioned in the Django ticket #5420. Proposed patch don't support and from my experience of using Django in big project lazy field loading is a dangerous feature. When different developers write views and templates and same code can be modified by many developers, it's very easy to find eventually in a template single not-optimal line of code that for objects list from 10 items will generate 20 heavy SQL queries. It may look like:
With lazy field loading we can have same problem. For example frontend developer can add to a template I tested the patch with Django trunk in my environment and all 234 available tests were passed. If the patch will broke something in other environments, please let me know in comments. |
Web log, research lab and soft parade of Dima Dogadaylo.
Email: entropyhacker at gmail dot com Table of Contents |
| [about | blog | projects]
© Dima Dogadaylo |
|
Lazy field loading is potentially an annoyance, sure, but everything still works (albeit slower) and the fix is simple (add the extra columns to the query).
Your alternative, though, generates half-loaded model instances. The very idea makes my skin crawl. If you call save() on that instance, you could very well wipe out the other columns on that record.
Django's ORM may very well be smart enough to only update columns that have changed; I don't know off the top of my head. Even if it is, I still don't like it.
I'm using geodjango, which has its own patched version of _get_sql_clause, so the additional patch is needed like this:
--- query.py 2008-05-15 20:59:07.421875000 +0200
+++ query.py.orig 2008-05-15 20:50:34.875000000 +0200
@@ -7,7 +7,6 @@
from django.contrib.gis.db.models.fields import GeometryField, PointField
# parse_lookup depends on the spatial database backend.
from django.contrib.gis.db.backend import gqn, parse_lookup, SpatialBackend
-from django.db.models.query import _get_model_fields
from django.contrib.gis.geos import GEOSGeometry, Point
# Shortcut booleans for determining the backend.
@@ -74,7 +73,7 @@
# QuerySet.extra(select=foo) because extra() adds an an _additional_
# field to be selected. Used in returning transformed geometries, and
# handling the selection of native database geometry formats.
- for f in _get_model_fields(opts, "", self._fields, self._select):
+ for f in opts.fields:
# Getting the selection format string.
if hasattr(f, '_geom'):
sel_fmt = self._geo_fmt
@@ -112,8 +111,7 @@
fill_table_cache(opts, select, tables, where,
old_prefix=opts.db_table,
cache_tables_seen=[opts.db_table],
- max_depth=self._max_related_depth,
- fields=self._fields)
+ max_depth=self._max_related_depth)
# Add any additional SELECTs.
if self._select: