Improved Mathematical Captcha

December 20, 2007 [last comment: March 28, 2008]

A week ago I published MathCaptchaForm, which I use on my blog to prevent spam. Malcolm Tredinnick, and then other people, noticed that the solution doesn't protect against replay attacks: a question, once solved, can be reused by spam bots on different web sites. So I added protection against replay attacks and implemented it without using a database, because keeping MathCaptchaForm as lightweight as possible was an original requirement.

I removed captcha_question from the form and added a new field, captcha_token, that holds a hash of the question, the answer, settings.SECRET_KEY, settings.SITE_URL and the expiry time (1 hour by default).

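def _make_token(self, q, a, expires):
    # the question and expiry time travel with the form; the answer does not
    data = base64.urlsafe_b64encode(
        pickle.dumps({'q': q, 'expires': expires}))
    # 40-character SHA-1 hex digest followed by the encoded data
    return self._sign(q, a, expires) + data

def _sign(self, q, a, expires):
    # hash of the site URL, secret key, question, answer and expiry time
    plain = [getattr(settings, 'SITE_URL', ''), settings.SECRET_KEY,
             q, a, expires]
    plain = "".join([str(p) for p in plain])
    return sha.new(plain).hexdigest()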

As you can see, captcha_token contains the hash plus the question and expiry time in plain (unencrypted) form, but it doesn't contain the answer. When the form is submitted, a new hash is built from those fields and the user's answer, and compared with the hash from the token. If the hashes aren't equal, the form is rejected.

def clean(self):
    """Check captcha answer."""
    cd = self.cleaned_data
    # don't check captcha if no answer
    if 'captcha_answer' not in cd:
        return cd
    t = cd.get('captcha_token')
    if t:
        form_sign = self._sign(t['q'], cd['captcha_answer'],
                               t['expires'])
        if form_sign != t['sign']:
            self._errors['captcha_answer'] = ["Are you human?"]
    else:
        self.reset_captcha()
    return super(MathCaptchaForm, self).clean()
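
To make the round trip concrete, here is a minimal standalone sketch of the same idea in Python 3, using only the standard library. It is an illustration under stated assumptions, not the form's actual code: SECRET_KEY and SITE_URL are placeholder constants standing in for the Django settings, the function names are made up for this example, hashlib.sha1 replaces the old sha module, and JSON replaces pickle for the payload (unpickling data that arrives from a browser is unsafe, so the sketch avoids it).

```python
import base64
import hashlib
import json
import time

SECRET_KEY = "not-a-real-secret"   # placeholder for settings.SECRET_KEY
SITE_URL = "http://example.com"    # placeholder for settings.SITE_URL


def sign(question, answer, expires):
    """SHA-1 hex digest over site URL, secret key, question, answer, expiry."""
    parts = (SITE_URL, SECRET_KEY, question, answer, expires)
    plain = "".join(str(p) for p in parts)
    return hashlib.sha1(plain.encode("utf-8")).hexdigest()


def make_token(question, answer, expires):
    """Signature (40 hex chars) followed by the encoded question and expiry."""
    expires = float(expires)  # keep str(expires) identical before and after parsing
    payload = json.dumps({"q": question, "expires": expires}).encode("utf-8")
    data = base64.urlsafe_b64encode(payload).decode("ascii")
    return sign(question, answer, expires) + data


def parse_token(token):
    """Split a token back into its signature and payload fields."""
    signature, data = token[:40], token[40:]
    payload = json.loads(base64.urlsafe_b64decode(data.encode("ascii")))
    return {"q": payload["q"],
            "expires": float(payload["expires"]),
            "sign": signature}


def check_answer(token, user_answer, now=None):
    """True only if the token is unexpired and the answer reproduces its hash."""
    t = parse_token(token)
    current = time.time() if now is None else now
    if current > t["expires"]:
        return False
    return sign(t["q"], user_answer, t["expires"]) == t["sign"]
```

With these definitions, check_answer(make_token("3+5", 8, time.time() + 3600), 8) should pass, while a wrong answer or an expired token should not.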

If the captcha has expired, we reset it and generate a new one. This is the tricky part: we need to change the field values of an already bound form, but django.newforms doesn't seem to provide a way to do this (initial values only affect unbound forms). Form data is stored in a django.http.QueryDict, which is immutable, so I had to make it temporarily mutable in order to reset only the captcha fields without touching the other fields of the form (those of the form that extends MathCaptchaForm).

def reset_captcha(self):
    """Generate new question and valid token
    for it, reset previous answer if any."""
    q, a = self._generate_captcha()
    expires = (time.time() +
               getattr(settings, 'CAPTCHA_EXPIRES_SECONDS', 60*60))
    token = self._make_token(q, a, expires)
    self.initial['captcha_token'] = token
    self._plain_question = q
    # reset captcha fields for bound form
    if self.data:
        def _reset():
            self.data['captcha_token'] = token
            self.data['captcha_answer'] = ''
        if hasattr(self.data, '_mutable') and not self.data._mutable:
            self.data._mutable = True
            _reset()
            self.data._mutable = False
        else:
            _reset()

The usage of the form hasn't changed: you just need to extend MathCaptchaForm, as in the example below.

class CommentForm(MathCaptchaForm):
    """Form for editing a comment."""
    author = forms.CharField(label='Name', required=True,
        max_length=Comment._meta.get_field('author').maxlength,
        widget=TextInput(attrs={'size':20}))
    url = forms.URLField(label='URL', required=False,
        max_length=Comment._meta.get_field('url').maxlength,
        widget=TextInput(attrs={'size':24}))
    content = forms.CharField(label="Your comment",
        max_length=Comment._meta.get_field('content').maxlength,
        widget=Textarea(attrs={'cols':80, 'rows': 10}))

Then add the captcha_token and captcha_answer fields to the template for your form.

{{ comment_form.captcha_token }}
<label for="id_captcha_answer"
{% if comment_form.captcha_answer.errors %}
title="{{ comment_form.captcha_answer.errors|join:", " }}">
<em class="error">{{ comment_form.knotty_question }}=</em>
{% else %}
title="Human? Enter answer!">{{ comment_form.knotty_question }}=
{% endif %}
</label>
{{ comment_form.captcha_answer }}
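
The template prints knotty_question rather than the plain question. As a rough standalone illustration of what that property produces (Python 3, no Django; visible_text is a hypothetical helper added only to show that the markup does not change what a reader sees):

```python
import re
from random import randint


def knotty_question(plain_question):
    """Wrap each operand of e.g. "3+5" in a span with a random class name."""
    operands = plain_question.split("+")
    return "+".join(
        '<span class="captcha-random-%s">%s</span>' % (randint(1, 9), d)
        for d in operands
    )


def visible_text(html):
    """Approximate what a browser displays by stripping the tags."""
    return re.sub(r"<[^>]+>", "", html)
```

The class names differ from request to request, but the visible text stays the same as the plain question.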

The full source code of MathCaptchaForm is below.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Copyright (c) 2007, Dima Dogadaylo (www.mysoftparade.com)

import re
import sha
import pickle
import base64
import time
from random import randint
from django import newforms as forms
from django.conf import settings

class MathCaptchaForm(forms.Form):
    """Lightweight mathematical captcha where a human is asked to solve
    a simple mathematical calculation like 3+5=?. It doesn't use a database
    and doesn't require external libraries.

    A hash is built from the concatenation of the expiry time, question,
    answer, settings.SITE_URL and settings.SECRET_KEY, and is validated on
    each form submission. This makes it impossible to "record" a valid
    captcha form submission and "replay" it later: the form will not
    validate because the captcha will have expired.
    
    For more info see:
    http://www.mysoftparade.com/blog/improved-mathematical-captcha/
    """
    A_RE = re.compile(r"^(\d+)$")
    
    captcha_answer = forms.CharField(max_length = 2, required=True,
        widget = forms.TextInput(attrs={'size':'2'}))
    captcha_token = forms.CharField(max_length=200, required=True,
        widget=forms.HiddenInput())
    
    def __init__(self, *args, **kwargs):
        """Initialise the captcha question and captcha_token for the form."""
        super(MathCaptchaForm, self).__init__(*args, **kwargs)
        # reset captcha for unbound forms
        if not self.data:
            self.reset_captcha()

    def reset_captcha(self):
        """Generate new question and valid token
        for it, reset previous answer if any."""
        q, a = self._generate_captcha()
        expires = (time.time() +
                   getattr(settings, 'CAPTCHA_EXPIRES_SECONDS', 60*60))
        token = self._make_token(q, a, expires)
        self.initial['captcha_token'] = token
        self._plain_question = q
        # reset captcha fields for bound form
        if self.data:
            def _reset():
                self.data['captcha_token'] = token
                self.data['captcha_answer'] = ''
            if hasattr(self.data, '_mutable') and not self.data._mutable:
                self.data._mutable = True
                _reset()
                self.data._mutable = False
            else:
                _reset()

    def _generate_captcha(self):
        """Generate question and return it along with correct answer."""
        a, b = randint(1,9), randint(1,9)
        return ("%s+%s" % (a,b), a+b)

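    def _make_token(self, q, a, expires):
        # the question and expiry time travel with the form; the answer does not
        data = base64.urlsafe_b64encode(
            pickle.dumps({'q': q, 'expires': expires}))
        # 40-character SHA-1 hex digest followed by the encoded data
        return self._sign(q, a, expires) + data

    def _sign(self, q, a, expires):
        # hash of the site URL, secret key, question, answer and expiry time
        plain = [getattr(settings, 'SITE_URL', ''), settings.SECRET_KEY,
                 q, a, expires]
        plain = "".join([str(p) for p in plain])
        return sha.new(plain).hexdigest()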
    
    @property
    def plain_question(self):
        return self._plain_question
    
    @property
    def knotty_question(self):
        """Wrap plain_question in markup that is invisible to humans, with
        random nonexistent classes. This makes life a bit harder for
        spambots because the form of the question varies from request to
        request."""
        digits = self._plain_question.split('+')
        return "+".join(['<span class="captcha-random-%s">%s</span>' %\
                         (randint(1,9), d) for d in digits])

    def clean_captcha_token(self):
        t = self._parse_token(self.cleaned_data['captcha_token'])
        if time.time() > t['expires']:
            raise forms.ValidationError("Captcha is expired.")
        self._plain_question = t['q']
        return t
        
    def _parse_token(self, t):
        try:
            sign, data = t[:40], t[40:]
            data = pickle.loads(base64.urlsafe_b64decode(str(data)))
            return {'q': data['q'],
                    'expires': float(data['expires']),
                    'sign': sign} 
        except Exception, e:
            import sys
            sys.stderr.write("Captcha error: %r\n" % e)
            raise forms.ValidationError("Invalid captcha!")
        
    def clean_captcha_answer(self):
        a = self.A_RE.match(self.cleaned_data.get('captcha_answer'))
        if not a:
            raise forms.ValidationError("Number is expected!")
        return int(a.group(0))
        
    def clean(self):
        """Check captcha answer."""
        cd = self.cleaned_data
        # don't check captcha if no answer
        if 'captcha_answer' not in cd:
            return cd

        t = cd.get('captcha_token')
        if t:
            form_sign = self._sign(t['q'], cd['captcha_answer'],
                                   t['expires'])
            if form_sign != t['sign']:
                self._errors['captcha_answer'] = ["Are you human?"]
        else:
            self.reset_captcha()
        return super(MathCaptchaForm, self).clean()

And a final note about replay attacks. They are still possible on the same site until the token expires: a bot can't generate a captcha_token, but if it has solved a CAPTCHA question it can reuse that answer for an hour on other forms on the same site. The key phrase here is «if it has solved a CAPTCHA question». If a bot can answer CAPTCHA questions, protection against replay attacks isn't needed at all, because the bot will act like a human. But the implemented protection makes it impossible to record a solved CAPTCHA form and reuse it on other sites at any time.

It's also possible to add the URL of the form's host page to the hash, so that recorded CAPTCHA fields become invalid for other pages of the same site even before they expire. But I think it's simpler for spam bot owners to teach their bots arithmetic than to deal with CAPTCHA hashes that expire every hour.
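
A minimal sketch of that variation, in standalone Python 3: the page_url parameter, the placeholder constants and the example paths are all assumptions made for illustration, since MathCaptchaForm itself does not take a page URL.

```python
import hashlib

SECRET_KEY = "not-a-real-secret"   # placeholder for settings.SECRET_KEY
SITE_URL = "http://example.com"    # placeholder for settings.SITE_URL


def sign(question, answer, expires, page_url=""):
    """Like _sign, but the path of the page hosting the form is hashed too."""
    parts = (SITE_URL, page_url, SECRET_KEY, question, answer, expires)
    plain = "".join(str(p) for p in parts)
    return hashlib.sha1(plain.encode("utf-8")).hexdigest()


# A hash recorded on one page...
recorded = sign("3+5", 8, 1198150000.0, "/blog/post-a/")
# ...still matches on that page, but not on another page of the same site.
same_page_ok = sign("3+5", 8, 1198150000.0, "/blog/post-a/") == recorded
other_page_ok = sign("3+5", 8, 1198150000.0, "/blog/post-b/") == recorded
```

Here same_page_ok is True and other_page_ok is False, so a solved captcha recorded on one page can't be replayed against a form on a different page.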

Tags: captcha  concept  django 
Submitted 2 comments: accepted - 2, in moderation queue - 0.
  • Amir Hossain on March 25, 2008
    Hi!!! Hope you are doing well. We the leading Data processing company in Bangladesh. Presently we are processing 300000+ captcha per day by our 55 operators. We have a well set up and We can give the law rate for the captcha solving.

    Our rate $2 per 1000 captcha.

    We just wanna make the relationship for long terms. can we go forward? Thank you, (For inquiry amir4@yours.com or
    khoknaa@yahoo.com)

    Best Regards
    Amir Hossain Dewan
    Data Home Ltd.
    amir4@yours.com
    khoknaa@yahoo.com
  • Dima Dogadaylo on March 28, 2008
    Thanks Amir, for your offer. Here I use mathematical captcha, can you tell me 3 integers (x,y,z) that solve following captcha: x^3+y^3=z^3. If so, yeah, your company is definitely interesting.


   
Web log, research lab and soft parade of Dima Dogadaylo.
Email: entropyhacker at gmail dot com