How can we enhance our Django model fields to act beyond simple database types to encapsulate their associated business logic? Leveraging Python's descriptor protocol, we provide additional processing on retrieval and update to allow more re-usable fields.
Python Descriptors
What are Python Descriptors? They're one of the least understood aspects of the language, but I think of them like class-based properties. Just as you can provide a __call__ method on a class so that its instances act like functions, you can supply a __get__ (or __set__ or __delete__) and an instance of that class, assigned as an attribute on another class, will similarly act like a property.
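To make that concrete, here is a minimal sketch of the protocol in plain Python 3 (no Django). The names Upper and Tag are made up for illustration, and __set_name__ assumes Python 3.6 or later.

```python
class Upper:
    """A descriptor that upper-cases a stored string whenever it is read."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+).
        self.storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself, so hand back the descriptor.
            return self
        return getattr(instance, self.storage).upper()

    def __set__(self, instance, value):
        # Store the raw value under a private name on the instance.
        setattr(instance, self.storage, value)


class Tag:
    label = Upper()

    def __init__(self, label):
        self.label = label  # routed through Upper.__set__


tag = Tag("django")
print(tag.label)
```

Reading tag.label goes through Upper.__get__, exactly the way a property getter would, but the logic lives in a class you can attach to any number of attributes.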
This allows us to capture common property definitions in a re-usable construct that we can apply as generic functionality to our models. If you've read the post on Django Model Behaviors, you understand how to package common field definitions into a shared model behavior that you can combine to assemble your models from more atomic functionality. Leveraging descriptors, you can perform a similar deconstruction, but this time by creating re-usable fields with enhanced logic that aren't tied to a given model definition.
Sample Code
All the sample code for this project is available on a GitHub repository. Feel free to use it as the basis for your own experimentation.
Enhance your Models with Properties
A common use of properties is to perform calculations or manipulations on a given field and provide that as additional information.
For example, let's start with this Bookmark model that features a url field:
from django.db import models

class Bookmark(models.Model):
    url = models.URLField()

It's very simple and straightforward. Now let's say we want to extract some additional information about the stored url, say its domain/hostname so we can show a favicon on our listing. Since the url already contains the hostname, we can add a method or a property to our model which retrieves the url and then parses out the hostname.
import urlparse  # Python 2 module; on Python 3 this lives in urllib.parse
from django.db import models

class Bookmark(models.Model):
    url = models.URLField()

    @property
    def hostname(self):
        return urlparse.urlparse(self.url).hostname

Now, when we're accessing our bookmark model instances, we can simply refer to obj.hostname. This pattern works great for many common derived or computed values. There's no reason to store redundant information when we can calculate it on the fly from other model attributes.
>>> bookmark = Bookmark(url='http://example.com/abcd')
>>> # Let's access our hostname property
>>> bookmark.hostname
'example.com'
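If you want to try the property pattern without a Django project, here's a rough Python 3 equivalent. PlainBookmark is a hypothetical stand-in for the model, and urllib.parse replaces the Python 2 urlparse module.

```python
from urllib.parse import urlparse


class PlainBookmark:
    """Hypothetical non-Django stand-in for the Bookmark model."""

    def __init__(self, url):
        self.url = url

    @property
    def hostname(self):
        # Recomputed from self.url on every access; nothing extra is stored.
        return urlparse(self.url).hostname


bookmark = PlainBookmark("http://example.com/abcd")
print(bookmark.hostname)
```

Because the value is derived on each access, changing bookmark.url later changes bookmark.hostname with it.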
But as we add more and more computed values, these properties tend to overload our models, cluttering our codebase. Plus, we are frequently adding the same properties again and again, repeating the same functionality for each field. What we'd like is a way to associate a given property with the model field that derives it.
Descriptors as Reusable Properties
Rather than define a bunch of properties, what we'd rather do is something like obj.url.hostname, and encapsulate all the derived logic on the url field. Yet when you access a model instance's field, it's simply returned just like the original Python type (str/unicode in the case of a URLField). These additional properties would be transparent to the rest of the code.
So, we need to intercept the value returned by accessing the field on the instance and enhance it with our custom properties. This is where the descriptor protocol comes into play. It gives us a chance to intercept and substitute the value returned when an attribute is accessed on a given class instance.
To make this work, we actually need two components. First, we need our descriptor which performs the intercept and substitution. Second, we need a proxy model that acts like the original datatype, but is augmented with the additional custom properties.
class URLFieldProxy(unicode):
    @property
    def hostname(self):
        return urlparse.urlparse(self).hostname


class ProxyFieldDescriptor(object):
    def __init__(self, field_name, proxy_class):
        self.field_name = field_name
        self.proxy_class = proxy_class

    def __get__(self, instance=None, owner=None):
        if instance is None:
            # Accessed on the class rather than an instance
            return self
        # grab the original value before we proxy
        value = instance.__dict__[self.field_name]
        if value is None:
            # We can't proxy a None through a unicode sub-class
            return value
        return self.proxy_class(value)

    def __set__(self, instance, value):
        instance.__dict__[self.field_name] = value

Our proxy class, URLFieldProxy, sub-classes the base type that the field would return (unicode in this case). That ensures we have the same base datatype. It then adds a property for calculating the derived value(s).

Our descriptor class, ProxyFieldDescriptor, is a regular Python object with the magic methods __get__ and __set__ that define a descriptor. It features a constructor that takes two arguments: the name of the field we want to intercept, and the proxy class we want to substitute. The __get__ implementation looks up the original value of the intercepted field using instance.__dict__ and then instantiates our proxy class, passing in that original value. We special-case None since our proxy can't handle that type directly (None values can't have properties). __set__ simply stores the value directly without modification.
Before we talk through how to attach this descriptor to your Django model field, let's put our implementation through its paces to show how a descriptor operates.
class SomeObject(object):
    # Let's add our descriptor on the `url` field substituting `URLFieldProxy`
    wormhole = ProxyFieldDescriptor('url', URLFieldProxy)

    def __init__(self, url):
        self.url = url

Remember that we defined our descriptor to look up its real value from the field name passed to its constructor. It then returns the proxy class, which adds the computed properties. Let's see it in action:
>>> obj = SomeObject('http://example.com/asdf')
>>> # Normal attribute access still works
>>> obj.url
'http://example.com/asdf'
>>> # Does obj.url have a hostname property?
>>> obj.url.hostname
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
>>> # What about accessing our descriptor field?
>>> obj.wormhole
u'http://example.com/asdf'
>>> # Let's access the descriptor's property
>>> obj.wormhole.hostname
u'example.com'
>>> # As you can see, the descriptor returns our proxy
>>> type(obj.wormhole)
__main__.URLFieldProxy
>>> # But the proxy still *acts* like our original url attribute
>>> obj.wormhole == obj.url
True
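That final comparison holds because the proxy is a genuine string sub-class. Here's a small Python 3 sketch of what that buys you and where it stops. URLProxy is my own name for the Python 3 counterpart of URLFieldProxy, with str standing in for unicode.

```python
from urllib.parse import urlparse


class URLProxy(str):
    """Python 3 counterpart of URLFieldProxy (str replaces unicode)."""

    @property
    def hostname(self):
        return urlparse(self).hostname


plain = "http://example.com/asdf"
proxy = URLProxy(plain)

same_value = (proxy == plain)             # compares like an ordinary string
same_hash = (hash(proxy) == hash(plain))  # interchangeable as a dict key
shouted = proxy.upper()                   # built-in str methods return a plain str

print(type(proxy).__name__, type(shouted).__name__)
```

One caveat follows from the last line: anything derived from the proxy through the built-in string methods is a plain str again, so the extra properties only exist on the value you get straight from the descriptor.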
Customize Django Model Fields with Descriptors
Now, we need to customize the Django URLField to hook in to the descriptor class. Thankfully, Django provides an accessible mechanism under a method named contribute_to_class that allows you to customize how a field is attached to its model class. The specifics of how this method works are convoluted and related to the metaclass magic that Django model definitions perform.

Since contribute_to_class is called by Django while it builds the model class, we simply attach our descriptor to that class under the field's name using setattr(), with self.name representing the name of the field ("url" in our case).
class HostnamedURLField(models.URLField):
    def contribute_to_class(self, cls, name):
        super(HostnamedURLField, self).contribute_to_class(cls, name)
        # Add our descriptor to this field in place of the normal attribute
        setattr(cls, self.name, ProxyFieldDescriptor(self.name, URLFieldProxy))

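If the metaclass part feels opaque, here is a toy model of the idea in plain Python 3. To be clear, this is not Django's implementation: ToyModelBase, ToyURLField, and ToyBookmark are invented for illustration and only mimic the one hook we care about, where the class-building step hands each field the class it belongs to.

```python
from urllib.parse import urlparse


class HostnameStr(str):
    """A str that also knows its hostname."""

    @property
    def hostname(self):
        return urlparse(self).hostname


class ProxyDescriptor:
    def __init__(self, field_name, proxy_class):
        self.field_name = field_name
        self.proxy_class = proxy_class

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = instance.__dict__.get(self.field_name)
        return value if value is None else self.proxy_class(value)

    def __set__(self, instance, value):
        instance.__dict__[self.field_name] = value


class ToyURLField:
    """Hypothetical field; only mimics the contribute_to_class hook."""

    def contribute_to_class(self, cls, name):
        self.name = name
        # Install the descriptor on the class under the field's name.
        setattr(cls, name, ProxyDescriptor(name, HostnameStr))


class ToyModelBase(type):
    """Toy metaclass: passes the new class to each field-like attribute."""

    def __new__(mcs, name, bases, attrs):
        fields = {k: v for k, v in attrs.items()
                  if hasattr(v, "contribute_to_class")}
        plain = {k: v for k, v in attrs.items() if k not in fields}
        cls = super().__new__(mcs, name, bases, plain)
        for field_name, field in fields.items():
            field.contribute_to_class(cls, field_name)
        return cls


class ToyBookmark(metaclass=ToyModelBase):
    url = ToyURLField()

    def __init__(self, url=None):
        self.url = url


bookmark = ToyBookmark("http://example.com/asdf")
print(bookmark.url.hostname)
```

The point of the sketch is the ordering: the field gets to decide what ends up on the class, which is why the descriptor can replace the plain attribute without the model definition knowing about it.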
Finally, let's update our model to use our customized url field. Notice that our model definition knows nothing about hostnames; it's all encapsulated in our customized URLField.
class Bookmark(models.Model):
    url = HostnamedURLField()

With the model all wired up with our descriptor-wielding field, let's see if it works.
>>> bookmark = Bookmark(url='http://example.com/asdf')
>>> bookmark.url
u'http://example.com/asdf'
>>> bookmark.url.hostname
u'example.com'
>>> # You can assign to the descriptor field too
>>> bookmark.url = u'http://somewhere.com/else'
>>> # And it still keeps all the same semantics
>>> bookmark.url.hostname
u'somewhere.com'
There you go, we have successfully migrated a computed property from the model definition to the field definition. This helps keep common functionality encapsulated, allowing greater re-use and cleaner separation of concerns.
Descriptors for Alternative Representations
Besides just augmenting fields with computed properties, descriptors can also be used to provide additional representations for your model data.
One of my favorite database modeling patterns is to use timestamps for boolean flags. If a timestamp is NULL, it's off (or disabled). When a timestamp is set, the flag is enabled. This gives you both a status (on/off) as well as a history of when the flag was enabled.
In most uses, we're only concerned with the status, so a boolean data type is more representative. But certain views would like to know this date information (for say a management overview).
Let's create another simple model to walk through such a use case:
from django.db import models

class BlogPost(models.Model):
    content = models.TextField()
    published_at = models.DateTimeField(null=True, default=None)

With a published_at field, we can show visitors posts that have been marked ready for publication, as well as sort our blog posts based on their publication date. This same pattern works great for any moderation task, including comments, account setup, or content release.
With our model, we can create a few posts and then publish them by setting our timestamp field to a non-NULL value.
>>> from django.utils import timezone
>>> post = BlogPost.objects.create()
>>> post.published_at is not None
False
>>> BlogPost.objects.filter(published_at__isnull=False)
[]
>>> # Now let's set our published flag
>>> post.published_at = timezone.now()
>>> post.save()
>>> post.published_at is not None
True
>>> BlogPost.objects.filter(published_at__isnull=False)
[<BlogPost: BlogPost object>]
Creating a Hybrid Boolean/Timestamp Descriptor
Using a datetime field gives us that extra information, but it makes working with such fields less intuitive. You have to know it's a datetime value rather than a boolean. What if we could use descriptors to modify the returned field to act like a boolean?
Let's create a descriptor class that proxies our datetime value, but acts like a boolean.
from django.utils import timezone

class TimestampedBooleanDescriptor(object):
    def __init__(self, name):
        self.name = name

    def __get__(self, instance=None, owner=None):
        if instance is None:
            # Accessed on the class rather than an instance
            return self
        return instance.__dict__[self.name] is not None

    def __set__(self, instance, value):
        value = bool(value)
        # Only touch the timestamp when the flag actually changes
        if value != self.__get__(instance):
            if value:
                instance.__dict__[self.name] = timezone.now()
            else:
                instance.__dict__[self.name] = None

It handles two cases. The first is __get__, where it checks its stored timestamp against None. The other is __set__, where it checks the input value (a boolean) and, if it has changed, either sets or clears the internal datetime representation.
Let's build an example to use this descriptor.
class SomeObject(object):
    # Let's add our descriptor on the `timestamp` field
    boolean = TimestampedBooleanDescriptor('timestamp')

    def __init__(self, timestamp=None):
        self.timestamp = timestamp

And let's put our sample object through some paces:
>>> obj = SomeObject()
>>> obj.timestamp is None
True
>>> obj.boolean
False
>>> obj.timestamp = timezone.now()
>>> obj.boolean
True
>>> obj.boolean = False
>>> obj.timestamp is None
True
>>> obj.boolean = True
>>> obj.timestamp
datetime.datetime(2014, 12, 6, 21, 34, 12, 872457, tzinfo=<UTC>)
Through the descriptor protocol, we were able to extend the functionality of our fields to support both a datetime and a boolean interface.
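One detail worth calling out: because __set__ only acts when the flag changes, assigning True a second time keeps the original timestamp. Here's a stdlib-only Python 3 sketch of that behavior. TimestampFlag and Flagged are hypothetical names, and datetime.now(timezone.utc) stands in for django.utils.timezone.now() so the snippet runs without Django.

```python
from datetime import datetime, timezone


class TimestampFlag:
    """Boolean view over a nullable timestamp stored on the instance."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name) is not None

    def __set__(self, instance, value):
        value = bool(value)
        # Only touch the timestamp when the flag actually changes.
        if value != self.__get__(instance):
            instance.__dict__[self.name] = (
                datetime.now(timezone.utc) if value else None
            )


class Flagged:
    enabled = TimestampFlag("enabled_at")

    def __init__(self):
        self.enabled_at = None


obj = Flagged()
obj.enabled = True
first = obj.enabled_at   # timestamp recorded when the flag was first enabled
obj.enabled = True       # no change, so the timestamp is left alone
print(obj.enabled_at is first)
```

That is what preserves the "when was this enabled" history: only a real off-to-on transition records a new time.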
Now, let's extend the built-in DateTimeField to use our descriptor class to provide both interfaces. Our field adds two capabilities. First, it accepts an optional property argument naming a second attribute on the model for boolean access (it defaults to is_<field_name>). Second, it again uses contribute_to_class to attach that boolean attribute to the model class.
class TimestampedBooleanField(models.DateTimeField):
    """
    A Boolean field that also captures the timestamp when the value was set.

    This field stores a timestamp in the database when set. It can be accessed
    as a boolean using the property argument (when not provided, it defaults to
    is_{field_name}).
    """
    def __init__(self, *args, **kwargs):
        self.property_name = kwargs.pop('property', None)
        kwargs['null'] = True
        super(TimestampedBooleanField, self).__init__(*args, **kwargs)

    def contribute_to_class(self, cls, name):
        super(TimestampedBooleanField, self).contribute_to_class(cls, name)
        # Use the defined boolean property name or pick a default
        property_name = self.property_name or 'is_{0}'.format(name)
        setattr(cls, property_name, TimestampedBooleanDescriptor(self.name))

Let's now update our BlogPost model to use this timestamp field.
class BlogPost(models.Model):
    content = models.TextField()
    published_at = TimestampedBooleanField(property='is_published')

With the model all wired up with our descriptor-wielding field, let's see if it works.
>>> post = BlogPost()
>>> post.published_at is None
True
>>> post.is_published
False
>>> post.is_published = True
>>> post.published_at
datetime.datetime(2014, 12, 6, 21, 47, 17, 191879, tzinfo=<UTC>)
>>> # Still works after reloading from the DB
>>> post.save()
>>> post2 = BlogPost.objects.get(pk=post.pk)
>>> post2.is_published
True
>>> post2.published_at
datetime.datetime(2014, 12, 6, 21, 47, 17, 191879, tzinfo=<UTC>)
There you go, depending on how you want to access your model, you can choose to use either the timestamp attribute or the boolean. All of this is encapsulated in a descriptor class, keeping your model definition clean and intuitive.
Summary
There are numerous opportunities to leverage descriptors to enhance your model fields. Django itself uses them to handle file upload fields, since they're stored in the database as pathnames to the file but provided as file-like proxies to your Django code.
There are plenty of additional uses for descriptors. Please leave a comment (or tweet, etc.) if you have additional ideas or examples where descriptors can improve your model code.