Django Model Descriptors

How can we make our Django model fields do more than map to simple database types, so that they also encapsulate their associated business logic? By leveraging Python's descriptor protocol, we can add processing on retrieval and update, which makes our fields more re-usable.

Python Descriptors

What are Python Descriptors? They're one of the least understood aspects of the language, but I think of them like class-based properties. Just as you can provide a __call__ method on a class so that it acts like a function, you can supply a __get__ (or __set__ or __delete__) and the class will similarly act more like a property.
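
As a minimal sketch of that idea, here is a class that defines __get__ and is then assigned as a class attribute. The names Shout and Greeting are hypothetical and are not part of the sample repository; they only illustrate that attribute access on an instance gets routed through the descriptor's __get__.

```python
class Shout(object):
    """Hypothetical descriptor: returns another attribute upper-cased."""

    def __init__(self, source_name):
        self.source_name = source_name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself, so hand back the descriptor
            return self
        return getattr(instance, self.source_name).upper()


class Greeting(object):
    # The descriptor lives on the class, not on the instance
    loud = Shout('text')

    def __init__(self, text):
        self.text = text


greeting = Greeting('hello')
print(greeting.loud)  # HELLO
```

Unlike a @property, the Shout class is not tied to Greeting: any class with a text-like attribute could reuse it by pointing source_name at a different attribute.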

This allows us to capture common property definitions in a re-usable construct that we can apply as generic functionality to our models. If you've read the post on Django Model Behaviors, you understand how to package common field definitions into a shared model behavior that you can combine to assemble your models from more atomic functionality. Leveraging descriptors, you can perform a similar deconstruction, but this time by creating re-usable fields with enhanced logic that aren't tied to a given model definition.

Sample Code

All the sample code for this project is available on a GitHub repository. Feel free to use it as the basis for your own experimentation.

Enhance your Models with Properties

A common use of properties is to perform calculations or manipulations on a given field and provide that as additional information.

For example, let's start with this Bookmark model that features a url field:

from django.db import models


class Bookmark(models.Model):
    url = models.URLField()

It's very simple and straightforward. Now let's say we want to extract some additional information about the stored url, say its domain/hostname so we can show a favicon on our listing. Since the url already contains the hostname, we can add a method or a property to our model which retrieves the url and then parses out the hostname.

import urlparse
from django.db import models


class Bookmark(models.Model):
    url = models.URLField()

    @property
    def hostname(self):
        return urlparse.urlparse(self.url).hostname

Now, when we're accessing our bookmark model instances, we can simply refer to obj.hostname. This pattern works great for many common derived or computed values. There's no reason to store redundant information when we can calculate it on the fly from other model attributes.

>>> bookmark = Bookmark(url='http://example.com/abcd')

>>> # Let's access our hostname property
>>> bookmark.hostname
'example.com'

But as we add more and more computed values, these properties tend to overload our models, cluttering our codebase. Plus, we are frequently adding the same properties again and again, repeating the same functionality for each field. What we'd like is a way to associate a given property with the model field that derives it.

Descriptors as Reusable Properties

Rather than define a bunch of properties, what we'd rather do is something like: obj.url.hostname, and encapsulate all the derived logic on the url field. Yet when you access a model instance's field, it's simply returned just like the original python type (str/unicode in the case of a URLField). These additional properties would be transparent to the rest of the code.

So, we need to intercept the value returned by accessing the field on the instance and enhance it with our custom properties. This is where the descriptor protocol comes into play. It gives us a chance to intercept and substitute the value returned when an attribute is accessed on a given class instance.

To make this work, we actually need two components. First, we need our descriptor, which performs the intercept and substitution. Second, we need a proxy class that acts like the original datatype, but is augmented with the additional custom properties.

class URLFieldProxy(unicode):
    @property
    def hostname(self):
        return urlparse.urlparse(self).hostname


class ProxyFieldDescriptor(object):
    def __init__(self, field_name, proxy_class):
        self.field_name = field_name
        self.proxy_class = proxy_class

    def __get__(self, instance=None, owner=None):
        if instance is None:
            # Accessed on the class rather than an instance
            return self
        # grab the original value before we proxy
        value = instance.__dict__[self.field_name]
        if value is None:
            # We can't proxy a None through a unicode sub-class
            return value
        return self.proxy_class(value)

    def __set__(self, instance, value):
        instance.__dict__[self.field_name] = value

Our Proxy class, URLFieldProxy, sub-classes the base type that the field would return (unicode in this case). That ensures we have the same base datatype. And then it adds a property for calculating the derived value(s).
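
The listing above targets Python 2. On Python 3, where unicode is simply str and urlparse lives in urllib.parse, an equivalent proxy might look like the sketch below. The Python 3 port is my own assumption and is not part of the original sample code.

```python
from urllib.parse import urlparse


class URLFieldProxy(str):
    @property
    def hostname(self):
        # `self` is the URL string itself
        return urlparse(self).hostname


url = URLFieldProxy('http://example.com/asdf')
print(url.hostname)                      # example.com
print(url == 'http://example.com/asdf')  # True
print(url.upper())                       # plain str methods still work
```

Because the proxy is a real str sub-class, code that only expects a string (comparisons, formatting, string methods) keeps working unchanged.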

Our Descriptor class, ProxyFieldDescriptor, is a regular Python object with the magic methods __get__ and __set__ that define a descriptor. Its constructor takes two arguments: the name of the field we want to intercept, and the proxy class we want to substitute. The __get__ implementation looks up the original value of the intercepted field using instance.__dict__ and then instantiates our proxy class, passing in that original value. We special-case None since our proxy can't handle that type directly (None values can't have properties). __set__ simply stores the value directly without modification.
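
One detail worth calling out is why __set__ is defined at all, given that it only stores the value. Python gives a descriptor priority over the instance's __dict__ only when it defines __set__ (or __delete__). A descriptor with just __get__ is shadowed as soon as the instance has an entry under the same name, which matters once the descriptor shares its name with the field it wraps. Here is a small sketch of the difference, using hypothetical class names:

```python
class GetOnly(object):
    """Non-data descriptor: defines only __get__."""

    def __get__(self, instance, owner=None):
        return 'from descriptor'


class GetAndSet(object):
    """Data descriptor: defines both __get__ and __set__."""

    def __get__(self, instance, owner=None):
        return 'from descriptor'

    def __set__(self, instance, value):
        instance.__dict__['value'] = value


class Shadowed(object):
    value = GetOnly()


class Intercepted(object):
    value = GetAndSet()


shadowed = Shadowed()
shadowed.value = 'raw'      # lands in the instance __dict__
print(shadowed.value)       # raw: the instance entry wins

intercepted = Intercepted()
intercepted.value = 'raw'   # routed through __set__
print(intercepted.value)    # from descriptor: __get__ still runs
```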

Before we talk through how to attach this descriptor to your django model field, let's put our implementation through some paces to show how a descriptor operates.

class SomeObject(object):
    # Let's add our descriptor on the `url` field substituting `URLFieldProxy`
    wormhole = ProxyFieldDescriptor('url', URLFieldProxy)

    def __init__(self, url):
        self.url = url

Remember that we defined our descriptor to look up its real value from the passed-in argument name. It then returns the proxy class instance that adds the computed properties. Let's see it in action:

>>> obj = SomeObject('http://example.com/asdf')

>>> # Normal attribute access still works
>>> obj.url
'http://example.com/asdf'

>>> # Does obj.url have a hostname property?
>>> obj.url.hostname
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)

>>> # What about accessing our descriptor field?
>>> obj.wormhole
u'http://example.com/asdf'

>>> # Let's access the descriptor's property
>>> obj.wormhole.hostname
u'example.com'

>>> # As you can see, the descriptor returns our proxy
>>> type(obj.wormhole)
__main__.URLFieldProxy

>>> # But the proxy still *acts* like our original url attribute
>>> obj.wormhole == obj.url
True

Customize Django Model Fields with Descriptors

Now, we need to customize the Django URLField to hook in to the descriptor class. Thankfully, Django provides an accessible mechanism in a method named contribute_to_class that allows you to customize how a field is attached to its model class. The specifics of how this method works are convoluted and related to the metaclass magic that Django model definitions perform.

Since contribute_to_class is called by Django when building the model class, we simply attach our descriptor in place of the field's name on the constructed model class using setattr(), with self.name representing the name of the field ("url" in our case).

class HostnamedURLField(models.URLField):
    def contribute_to_class(self, cls, name):
        super(HostnamedURLField, self).contribute_to_class(cls, name)
        # Add our descriptor to this field in place of the normal attribute
        setattr(cls, self.name, ProxyFieldDescriptor(self.name, URLFieldProxy))
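
If the metaclass mechanics feel opaque, the following plain-Python sketch imitates the idea without Django. It is a deliberately simplified stand-in and not Django's actual implementation: MiniModelBase, MiniField, UpperField, UpperDescriptor, and Note are all hypothetical names. The only behaviour it borrows is that attributes exposing contribute_to_class get to decide how they are attached to the class being built.

```python
class MiniModelBase(type):
    """Toy metaclass: lets attributes attach themselves to the new class."""

    def __new__(mcs, name, bases, attrs):
        contributors = {}
        plain = {}
        for key, value in attrs.items():
            if hasattr(value, 'contribute_to_class'):
                contributors[key] = value
            else:
                plain[key] = value
        cls = super().__new__(mcs, name, bases, plain)
        # Each contributor decides how it appears on the finished class
        for key, value in contributors.items():
            value.contribute_to_class(cls, key)
        return cls


class UpperDescriptor(object):
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name].upper()

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class MiniField(object):
    def contribute_to_class(self, cls, name):
        # Remember the attribute name we were declared under
        self.name = name


class UpperField(MiniField):
    def contribute_to_class(self, cls, name):
        super().contribute_to_class(cls, name)
        # Install the descriptor under the field's own name
        setattr(cls, self.name, UpperDescriptor(self.name))


class Note(metaclass=MiniModelBase):
    title = UpperField()

    def __init__(self, title):
        self.title = title


note = Note('hello')
print(note.title)  # HELLO
```

The raw value still lives in the instance's __dict__ under the field's name; the descriptor only changes what callers see when they read the attribute.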

Finally, let's update our model to use our customized url field. Notice that our model definition knows nothing about hostnames; it's all encapsulated in our customized URLField.

class Bookmark(models.Model):
    url = HostnamedURLField()

With the model all wired up with our descriptor-wielding field, let's see if it works.

>>> bookmark = Bookmark(url='http://example.com/asdf')

>>> bookmark.url
u'http://example.com/asdf'

>>> bookmark.url.hostname
u'example.com'

>>> # You can assign to the descriptor field too
>>> bookmark.url = u'http://somewhere.com/else'

>>> # And it still keeps all the same semantics
>>> bookmark.url.hostname
u'somewhere.com'

There you go, we have successfully migrated a computed property from the model definition to the field definition. This helps keep common functionality encapsulated, allowing greater re-use and cleaner separation of concerns.

Descriptors for Alternative Representations

Besides just augmenting fields with computed properties, descriptors can also be used to provide additional representations for your model data.

One of my favorite database modeling patterns is to use timestamps for boolean flags. If a timestamp is NULL, it's off (or disabled). When a timestamp is set, the flag is enabled. This gives you both a status (on/off) as well as a history of when the flag was enabled.

In most uses, we're only concerned with the status, so a boolean data type is more representative. But certain views would like to know this date information (for, say, a management overview).

Let's create another simple model to walk through such a use case:

from django.db import models


class BlogPost(models.Model):
    content = models.TextField()
    published_at = models.DateTimeField(null=True, default=None)

With a published_at field, we can show visitors posts that have been marked ready for publication, as well as sort our blog posts based on their publication date. This same pattern works great for any moderation task, including comments, account setup, or content release.

With our model, we can create a few posts and then publish them by setting our timestamp field to a non-NULL value.

>>> from django.utils import timezone
>>> post = BlogPost.objects.create()
>>> post.published_at is not None
False
>>> BlogPost.objects.filter(published_at__isnull=False)
[]
>>> # Now let's set our published flag
>>> post.published_at = timezone.now()
>>> post.save()
>>> post.published_at is not None
True
>>> BlogPost.objects.filter(published_at__isnull=False)
[<BlogPost: BlogPost object>]

Creating a Hybrid Boolean/Timestamp Descriptor

Using a datetime field gives us that extra information, but it makes working with such fields less intuitive. You have to know it's a datetime value rather than a boolean. What if we could use descriptors to modify the returned field to act like a boolean?

Let's create a descriptor class that proxies our datetime value, but acts like a boolean.

from django.utils import timezone


class TimestampedBooleanDescriptor(object):
    def __init__(self, name):
        self.name = name

    def __get__(self, instance=None, owner=None):
        if instance is None:
            # Accessed on the class rather than an instance
            return self
        return instance.__dict__[self.name] is not None

    def __set__(self, instance, value):
        value = bool(value)
        if value != self.__get__(instance):
            if value:
                instance.__dict__[self.name] = timezone.now()
            else:
                instance.__dict__[self.name] = None

It handles two cases. On __get__, it checks its stored timestamp against None. On __set__, it coerces the input value to a boolean and, if the value has changed, either sets or clears the internal datetime representation.
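
The change check in __set__ is what preserves the history: assigning True to a flag that is already enabled leaves the original timestamp alone. The sketch below demonstrates that, with the standard library's datetime standing in for django.utils.timezone (an assumption made so it runs without Django). Flagged is a hypothetical class used only for illustration.

```python
from datetime import datetime, timezone


class TimestampedBooleanDescriptor(object):
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name] is not None

    def __set__(self, instance, value):
        value = bool(value)
        # Only touch the timestamp when the flag actually flips
        if value != self.__get__(instance):
            instance.__dict__[self.name] = (
                datetime.now(timezone.utc) if value else None
            )


class Flagged(object):
    enabled = TimestampedBooleanDescriptor('enabled_at')

    def __init__(self):
        self.enabled_at = None


flag = Flagged()
flag.enabled = True
first = flag.enabled_at

flag.enabled = True              # already on, so nothing changes
print(flag.enabled_at is first)  # True

flag.enabled = False
print(flag.enabled_at)           # None
```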

Let's build an example to use this descriptor.

class SomeObject(object):
    # Let's add our descriptor on the `timestamp` field
    boolean = TimestampedBooleanDescriptor('timestamp')

    def __init__(self, timestamp=None):
        self.timestamp = timestamp

And let's put our sample object through some paces:

>>> obj = SomeObject()
>>> obj.timestamp is None
True
>>> obj.boolean
False
>>> obj.timestamp = timezone.now()
>>> obj.boolean
True
>>> obj.boolean = False
>>> obj.timestamp is None
True
>>> obj.boolean = True
>>> obj.timestamp
datetime.datetime(2014, 12, 6, 21, 34, 12, 872457, tzinfo=<UTC>)

Through the descriptor protocol, we were able to extend the functionality of our fields to support both a datetime and a boolean interface.

Now, let's extend the built-in DateTimeField to use our descriptor class to provide both interfaces. Our field adds two capabilities. First, it accepts an optional property argument naming the boolean attribute to add to the model (it defaults to is_<field_name>). Second, it again uses contribute_to_class to attach that boolean attribute to the model class.

class TimestampedBooleanField(models.DateTimeField):
    """
    A Boolean field that also captures the timestamp when the value was set.

    This field stores a timestamp in the database when set.  It can be accessed
    as a boolean using the property argument (when not provided, it defaults to
    is_{field_name}).
    """
    def __init__(self, *args, **kwargs):
        self.property_name = kwargs.pop('property', None)
        kwargs['null'] = True
        super(TimestampedBooleanField, self).__init__(*args, **kwargs)

    def contribute_to_class(self, cls, name):
        super(TimestampedBooleanField, self).contribute_to_class(cls, name)
        # Use the defined boolean property name or pick a default
        property_name = self.property_name or 'is_{0}'.format(name)
        setattr(cls, property_name, TimestampedBooleanDescriptor(self.name))

Let's now update our BlogPost model to use this timestamp field.

class BlogPost(models.Model):
    content = models.TextField()
    published_at = TimestampedBooleanField(property='is_published')

With the model all wired up with our descriptor-wielding field, let's see if it works.

>>> post = BlogPost()

>>> post.published_at is None
True

>>> post.is_published
False

>>> post.is_published = True
>>> post.published_at
datetime.datetime(2014, 12, 6, 21, 47, 17, 191879, tzinfo=<UTC>)

>>> # Still works after reloading from the DB
>>> post.save()
>>> post2 = BlogPost.objects.get(pk=post.pk)
>>> post2.is_published
True
>>> post2.published_at
datetime.datetime(2014, 12, 6, 21, 47, 17, 191879, tzinfo=<UTC>)

There you go. Depending on how you want to access your model, you can choose to use either the timestamp attribute or the boolean. All of this is encapsulated in a descriptor class, keeping your model definition clean and intuitive.

Summary

There are numerous opportunities to leverage descriptors to enhance your model fields. Django itself uses them to handle File Upload fields, since they're stored in the database as pathnames to the file but provided as file-like proxies to your Django code.

There are plenty of additional uses for descriptors. Please leave a comment (or tweet, etc.) if you have additional ideas or examples where descriptors can improve your model code.