We've launched user accounts at EveryBlock, and we faced the interesting problem of needing to cache entire pages except for the "You're logged in as [username]" bit at the top of the page. For example, the Chicago homepage takes a nontrivial amount of time to generate and doesn't change often -- which means we want to cache it -- but at the same time, we need to display the dynamic bit in the upper right.
One solution would be to pull in the username info dynamically via Ajax. This way, you could cache the entire page and rely on the client to pull in the username bits. The downsides are that it relies on JavaScript and it requires two hits to the application for each page view.
Another solution would be to use Django's low-level cache API to cache the results of the queries directly in our view function. The downsides are that it's kind of messy to manage all of that caching, plus each page view still incurs the overhead of template rendering (which isn't horrible, but it's unnecessary overhead).
The solution we ended up using is two-phased template rendering. Credit for this concept goes to my friend Honza Kral, who suggested the idea to me during PyCon earlier this year.
The way it works is to split the page rendering into two steps:
- At cache reset time, render everything except the "You're logged in as" bit, which should remain unrendered Django template code. Cache the result as a Django template. (This is the clever part!)
- At page view time, render that cached template by passing it the current user. This is super fast because, at this point, the template only has two or three template tags. (The rest of the page is already rendered.)
It's a clever solution because you end up defining what doesn't get cached instead of what does get cached. It's a sideways way of looking at the problem -- sort of like how Django's template inheritance system defines which parts of the page change instead of defining server-side includes of the common bits.
To make this work, we had to write two pieces of infrastructure: a template tag, and a middleware class that does the cache-checking and rendering. The template tag looks like this:
# Copyright 2009, EveryBlock
# This code is released under the GPL.

from django import template

register = template.Library()

def raw(parser, token):
    # Whatever is between {% raw %} and {% endraw %} will be preserved as
    # raw, unrendered template code.
    text = []
    parse_until = 'endraw'
    tag_mapping = {
        template.TOKEN_TEXT: ('', ''),
        template.TOKEN_VAR: ('{{', '}}'),
        template.TOKEN_BLOCK: ('{%', '%}'),
        template.TOKEN_COMMENT: ('{#', '#}'),
    }
    # By the time this template tag is called, the template system has already
    # lexed the template into tokens. Here, we loop over the tokens until
    # {% endraw %} and parse them to TextNodes. We have to add the start and
    # end bits (e.g. "{{" for variables) because those have already been
    # stripped off in a previous part of the template-parsing process.
    while parser.tokens:
        token = parser.next_token()
        if token.token_type == template.TOKEN_BLOCK and token.contents == parse_until:
            return template.TextNode(u''.join(text))
        start, end = tag_mapping[token.token_type]
        text.append(u'%s%s%s' % (start, token.contents, end))
    # We ran out of tokens without seeing {% endraw %}. unclosed_block_tag()
    # expects a list of tag names, not a bare string.
    parser.unclosed_block_tag([parse_until])
raw = register.tag(raw)
This template tag merely treats everything between {% raw %} and {% endraw %} as unrendered template code.
Then, in our base EveryBlock template, we wrap the appropriate bit of code in the {% raw %} tag, like this:
{% raw %}
    {% if USER %}
        <p>Logged in as {{ USER.email }}</p>
    {% else %}
        <p>Sign in / register.</p>
    {% endif %}
{% endraw %}
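To see what survives the first render, here is a rough stand-alone sketch of the same token re-wrapping idea. It assumes Python 3 and uses a simplified regex lexer of my own in place of Django's (TAG_RE, lex and this version of raw are hypothetical helpers, not Django APIs), so treat it as an approximation of the tag's behavior rather than the real thing.

```python
import re

# Simplified stand-in for the template lexer: split on {{ }}, {% %} and {# #}.
TAG_RE = re.compile(r"(\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\})")
WRAPPERS = {"{{": ("{{", "}}"), "{%": ("{%", "%}"), "{#": ("{#", "#}")}


def lex(source):
    """Yield (start, contents, end) triples; plain text gets empty delimiters."""
    for bit in TAG_RE.split(source):
        if not bit:
            continue
        if TAG_RE.fullmatch(bit):
            start, end = WRAPPERS[bit[:2]]
            # Drop the delimiters and the whitespace just inside them.
            yield start, bit[2:-2].strip(), end
        else:
            yield "", bit, ""


def raw(source):
    """Return what lies between {% raw %} and {% endraw %}, re-wrapped as template code."""
    text, inside = [], False
    for start, contents, end in lex(source):
        if start == "{%" and contents == "raw":
            inside = True
        elif start == "{%" and contents == "endraw":
            return "".join(text)
        elif inside:
            # Put the delimiters back, as the tag_mapping lookup does.
            text.append(start + contents + end)
    raise ValueError("Unclosed tag: raw")


snippet = "{% raw %}{% if USER %}<p>Logged in as {{ USER.email }}</p>{% endif %}{% endraw %}"
print(raw(snippet))
# {%if USER%}<p>Logged in as {{USER.email}}</p>{%endif%}
```

The re-wrapped tags lose the spaces inside their delimiters because the lexer has already stripped them. That is harmless here, since the second render lexes the text again.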
The final part is to write some middleware that renders every text/html response through the template system:
# Copyright 2009, EveryBlock
# This code is released under the GPL.

from django.core.cache import cache
from django.template import Template
from django.template.context import RequestContext
import urllib

class CachedTemplateMiddleware(object):
    def process_view(self, request, view_func, view_args, view_kwargs):
        response = None
        # Look for a cached, halfway-rendered response, unless this is the
        # cache-resetting request itself.
        if request.method == 'GET' and 'magicflag' not in request.GET:
            cache_key = urllib.quote(request.path)
            response = cache.get(cache_key, None)
        # Cache miss (or not cacheable): run the view as usual.
        if response is None:
            response = view_func(request, *view_args, **view_kwargs)
        # Second phase: render the leftover template code for this user.
        if 'magicflag' not in request.GET and response['content-type'].startswith('text/html'):
            t = Template(response.content)
            response.content = t.render(RequestContext(request))
        return response
One thing to note here is that there's a backdoor for an external process (say, our script that resets the cache) to retrieve the halfway-rendered template code for any page -- magicflag in the query string. (We actually use something different on EveryBlock; I've changed this example.) So that means the only thing the cache-resetting script has to do is make a request to the appropriate page, with that query string, and save the result in the cache. Pretty slick.
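The cache-resetting script isn't shown here, but its shape might look roughly like the sketch below. It assumes Python 3 (so urllib.parse.quote rather than urllib.quote), and everything in it is hypothetical: reset_cache, the fetch callable, DictCache and fake_fetch are stand-ins made up for illustration. Since the middleware reads response['content-type'] from whatever it pulls out of the cache, a real fetch would have to return a response-like object rather than the bare string used in this demo.

```python
from urllib.parse import quote, urlencode


def reset_cache(paths, fetch, cache, flag="magicflag"):
    """Store the half-rendered result for each path under the middleware's cache key.

    fetch(url) and cache.set(key, value) are supplied by the caller; both are
    assumptions of this sketch.
    """
    for path in paths:
        url = quote(path) + "?" + urlencode({flag: "1"})
        # Same key the middleware computes: the quoted request path.
        cache.set(quote(path), fetch(url))


class DictCache(dict):
    """Tiny in-memory stand-in for a cache backend."""

    def set(self, key, value):
        self[key] = value


def fake_fetch(url):
    # Pretend the app returned a page whose user bit is still template code.
    return "<!-- " + url + " -->{% if USER %}Logged in as {{ USER.email }}{% endif %}"


demo_cache = DictCache()
reset_cache(["/chicago/", "/new york/"], fake_fetch, demo_cache)
print(sorted(demo_cache))
```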
There's also a potential gotcha/limitation here: anything within {% raw %} and {% endraw %} will only have access to a template context with the default RequestContext stuff -- which, in our case, will be user-specific stuff.
Thanks again to Honza for telling me about this concept. It's a great idea, and it's serving us well.
Comments
Posted by Andrew Ingram on May 18, 2009, at 7:28 p.m.:
Very, very clever. I look forward to using it myself.
Posted by David on May 18, 2009, at 7:35 p.m.:
Maybe I'm misreading something -- where is the cache getting set?
Posted by Adrian Holovaty on May 18, 2009, at 7:37 p.m.:
David: I didn't include the code that sets the cache, but it's simple -- just request the URLs with the magicflag and save the responses in the cache.
Posted by Honza Král on May 18, 2009, at 9:15 p.m.:
I am glad that you found the idea helpful. Your solution is much more generic than ours: we let the individual template tags handle the double render and have the middleware always present.
sample tag:
http://github.com/ella/ella/blob/e2853bc264f4cbf3e3f61536ba423fae55c2a5d7/ella/core/templatetags/hits.py#L76
middleware:
http://github.com/ella/ella/blob/e2853bc264f4cbf3e3f61536ba423fae55c2a5d7/ella/core/middleware.py#L16
Posted by Mark on May 18, 2009, at 10:27 p.m.:
Any reason for this code being released as GPL?
Posted by Patrick on May 19, 2009, at 12:55 a.m.:
I'm sure Adrian has his reasons, but if you can't use Adrian's GPLed code, you might want to take a look at the code in Honza's project (links in his comment on @4:15pm). The license for the CMS which uses it (Ella) appears to be standard 3 clause BSD.
Posted by Jeff Waugh on May 19, 2009, at 7:51 a.m.:
This would be a good opportunity to use ESI. :-)
Posted by Kr0n on May 19, 2009, at 8:13 a.m.:
Nice!
Wouldn't it be the same to use SSI with something like nginx, and at the same time avoid a Django hit? It'd be going outside the stack somehow, but...
Posted by Simon Willison on May 19, 2009, at 8:46 a.m.:
This kind of optimisation is one of the reasons I'm so keen on signed cookies as an alternative to sessions. If all you need to customise is the "logged in as..." box on a page, having the username stored in a signed cookie means you don't have to hit the database (or an external session store) /at all/ for the duration of the request - just pull out the cached copy, check the signature on the cookie, extract the username and render it out on to the page. And since the computation is done entirely by the app server it scales horizontally.
Posted by Sergey Shepelev on May 19, 2009, at 1:40 p.m.:
If "hello {username}" is the only dynamic part, then you don't use user accounts in first place. Registration and login just to see my name at website is awful.
P.S.: this comment form is bad too. It thinks that anything inside angle brackets is HTML. BUT, when i use proper HTML < it doesn't show angle bracket! What was on your mind - don't accept angle bracket AND escape ampersand?
Posted by Davide Della Casa on May 19, 2009, at 1:57 p.m.:
Why not just use a cookie with the username and let the browser fetch it and render it in the page?
Posted by Adrian Holovaty on May 19, 2009, at 3:42 p.m.:
Mark: This is licensed as GPL because we're required to release EveryBlock's source code as GPL. The project is funded by a grant, and that's the license that we were asked to use.
Sergey: If I'm logged into Google and I view the Google homepage, I see my e-mail address at the top right, but it doesn't customize the page. I would argue that if a user is logged in, the developer has an obligation to let the user know that -- regardless of whether the particular page actually changes based on the user. (And in EveryBlock's case, *of course* we're customizing pages for users -- just not the homepage, at this time.)
Simon and Davide: With a cookie, you'd either have to parse it in JavaScript (which is non-ideal because it requires JavaScript) or do it in the application, in which case this two-phased template rendering would still help you, because you've still got to figure out a way to cache the heavy stuff and let the application do the username bit dynamically. The question of whether to store the username in a cookie vs. a session is tangential to this caching approach, isn't it?
Posted by Tom W. Most on May 19, 2009, at 5:45 p.m.:
Perhaps I'm missing something, but doesn't this leave you vulnerable to injection of Django template code? I don't see any method being used to escape the content outside of the {% raw %} tag from being interpreted as template code, or do you somehow guarantee that your data never contains "{{", "{%" or "{#"?
Posted by Adrian Holovaty on May 19, 2009, at 7:46 p.m.:
Tom: That's a great point, and I hadn't thought of it. Thanks for bringing it up!
A better version of this would take care of escaping everything *outside* the {% raw %} block between the first and second renders, to avoid a template-injection vulnerability.
Posted by Tim o'reilly on May 20, 2009, at 6 a.m.:
Is the code coming out by the end of June?
Posted by Andrew Ingram on May 20, 2009, at 11:18 a.m.:
I'm trying it in conjunction with the @cache_page decorator for a customer shopping cart that appears in the header of every page.
It seems to work, but I'm worried I might be missing a security implication.
Posted by Simon Willison on May 20, 2009, at 10:29 p.m.:
Adrian - yes, that's what I was getting at - storing the username in a signed cookie is a great complement to this kind of two-phased rendering as it allows you to avoid having to even hit the database or lookup their session - you pull from cache, extract the username from the cookie, render the two together and you're done.
Posted by James Abley on May 21, 2009, at 10:57 p.m.:
I'm having a dense moment. Surely that's just basic multi-pass compiler [1] design? Does that not get widely used when implementing template languages? What am I missing?
[1] http://en.wikipedia.org/wiki/Multi-pass_compiler
Posted by coulix on May 22, 2009, at 2:27 p.m.:
How would you make it accept {% trans "foo" %}, and how could I add some context like mail_count that I currently get by calling a custom tag?
Posted by Dan on May 23, 2009, at 8:45 a.m.:
I did something similar, but using the {% templatetag %} for rendering the template code. A {% raw %} tag would be much more useful, and would be a nice addition to the Django trunk.
Posted by Simon Law on May 29, 2009, at 3:11 p.m.:
Tom is right. Adrian, you’ll want to hack the {% autoescape "on" %} tag so that all {% templatetag %} characters are properly escaped in your variables after the first pass.
Posted by Adrian Holovaty on May 29, 2009, at 4:10 p.m.:
Simon: Thanks for the suggestion. I've already solved it another way.
Posted by Andy Baker on May 30, 2009, at 9:33 a.m.:
Adrian - How far does template fragment caching take you before running out of steam? I was going to go down that road before I read this post.
Posted by ian on June 14, 2009, at 11:28 a.m.:
Another way of doing this is to have an alternate template tag indicator for each phase.
{% .. %} for the initial run, and {$ .. $} for the 2nd run. we did this kind of thing in 2000 with SSI @cnet ;-0
Posted by ESI on June 16, 2009, at 8:33 p.m.:
We're planning on doing something similar using ESI (edge side includes): some smart reverse HTTP proxies support ESI, which lets you replace some parts of a cached page with the response of another HTTP GET.
So, we cache the output HTML at varnish level (faster than doing it in RoR or even a Metal), get the session data via a 2nd HTTP request and calling a JS function that applies user customizations (we do a little bit more than displaying the username).
We are doing it with a JS request instead of ESI because Varnish won't gunzip the HTML returned by Apache to check for the esi:include tags, but it's going to be supported soon.
The only downside is that the app is not 100% usable without javascript...
Also Mnot created some Javascript functions to replace some parts of a HTML document with data from other HTTP reqs: http://www.mnot.net/javascript/hinclude/
Posted by Johan Bergström on June 27, 2009, at 6:56 a.m.:
My stab would be to split your page apart with SSI/ESI (although Varnish only) and cache parts of your page by routing them differently into Django.
ESI: Why toss javascript into the equation? Most people put something in front of varnish (until varnish handles it on its own) that takes care of deflate.
Posted by ZK@Web Marketing Blog on June 29, 2009, at 11:17 a.m.:
I did use them as an inspiration for a presentation of the ORM part of Django at the French Perl Workshop in Paris this weekend.
My slides (in French) are here: http://o.mengue.free.fr/blog/2006/11/...