Mathematical Captcha

December 11, 2007 [last comment: December 20, 2007]

When spam bots started posting comments on my blog every day, I realized that I needed some kind of simple anti-spam protection. I haven't yet seen a real anti-spam AI, and I don't like image captchas. I looked for a mathematical captcha, where a human is asked to solve a simple calculation like 3+5=?. Unfortunately, I didn't find an open-source implementation, so I wrote my own MathCaptchaForm. For a week now I haven't seen a single spam post.

import re
from random import randint
# Django releases before 1.0 exposed this module as django.newforms.
from django import forms

class MathCaptchaForm(forms.Form):
    # The question is "digit+digit"; the answer is one or more digits.
    Q_RE = re.compile(r"^(\d)\+(\d)$")
    A_RE = re.compile(r"^(\d+)$")
    captcha_question = forms.CharField(max_length=10, required=True,
        widget=forms.HiddenInput())
    captcha_answer = forms.CharField(max_length=2, required=True,
        widget=forms.TextInput(attrs={'size': '2'}))

    def __init__(self, *args, **kwargs):
        super(MathCaptchaForm, self).__init__(*args, **kwargs)
        # Reuse the submitted question on a bound form; otherwise make a new one.
        q = self.data.get('captcha_question') or self._generate_question()
        self.initial['captcha_question'] = q

    def _generate_question(self):
        return "%s+%s" % (randint(1, 9), randint(1, 9))

    def clean_captcha_answer(self):
        # captcha_question is declared first, so it is cleaned first; use .get()
        # because it is missing from cleaned_data if it failed validation.
        q = self.Q_RE.match(self.cleaned_data.get('captcha_question', ''))
        if not q:
            raise forms.ValidationError("Are you a hacker?")
        q = q.groups()
        a = self.A_RE.match(self.cleaned_data['captcha_answer'])
        if not a:
            raise forms.ValidationError("A number is expected!")
        a = a.groups()
        if int(q[0]) + int(q[1]) != int(a[0]):
            raise forms.ValidationError("Are you human?")
        # Django stores the return value in cleaned_data, so return the answer.
        return self.cleaned_data['captcha_answer']
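Because the validation in clean_captcha_answer is plain string parsing, the same three checks can be tried outside Django. The sketch below is a framework-free approximation, not the form itself: `generate_question` and `check_answer` are hypothetical names, and ValueError stands in for forms.ValidationError.

```python
import re
from random import randint

# Same patterns as MathCaptchaForm, written as raw strings.
Q_RE = re.compile(r"^(\d)\+(\d)$")
A_RE = re.compile(r"^(\d+)$")


def generate_question():
    """Return a question such as '3+5' with two operands from 1 to 9."""
    return "%d+%d" % (randint(1, 9), randint(1, 9))


def check_answer(question, answer):
    """Return the answer as an int, or raise ValueError if any check fails."""
    q = Q_RE.match(question)
    if not q:
        raise ValueError("Are you a hacker?")      # tampered question
    a = A_RE.match(answer)
    if not a:
        raise ValueError("A number is expected!")  # non-numeric answer
    if int(q.group(1)) + int(q.group(2)) != int(a.group(1)):
        raise ValueError("Are you human?")         # wrong sum
    return int(a.group(1))


print(check_answer("3+5", "8"))  # 8
```

Returning the parsed value at the end mirrors what a Django clean_<field> method must do: whatever it returns ends up in cleaned_data.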

Django supports form subclassing, so to add a mathematical captcha to any existing form it is enough to extend MathCaptchaForm. For example, the comment form used on this site is defined as below.

class CommentForm(MathCaptchaForm):
    """Form for editing a comment."""
    # Comment is this blog's own model (not shown here); older Django
    # versions named the max_length attribute maxlength.
    author = forms.CharField(label='Name', required=False,
        max_length=Comment._meta.get_field('author').max_length,
        widget=forms.TextInput(attrs={'size': 20}))
    url = forms.URLField(label='URL', required=False,
        max_length=Comment._meta.get_field('url').max_length,
        widget=forms.TextInput(attrs={'size': 24}))
    content = forms.CharField(label="Your comment",
        max_length=Comment._meta.get_field('content').max_length,
        widget=forms.Textarea(attrs={'cols': 80, 'rows': 10}))

Then you only need to add the two captcha fields to your template, and spam bots, along with humans who were bad at school, will not be able to post comments. The template may look like this:

{{ comment_form.captcha_question }}
<label for="id_captcha_answer">
{% if comment_form.captcha_answer.errors %}
<em class="error">{{ comment_form.initial.captcha_question }}=</em>
{% else %}
{{ comment_form.initial.captcha_question }}=
{% endif %}
</label>
{{ comment_form.captcha_answer }}

Well, it's quite simple to write a spam bot that solves such simple mathematical captchas, but if that happens, we can improve the current simple MathCaptchaForm with integral and differential equations.
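To illustrate how little work such a bot needs, here is a minimal sketch that pulls the question out of the label rendered by the template above and computes the answer. The function name `solve_math_captcha` and the exact HTML it expects are assumptions based on that template; a real bot could just as easily read the value of the hidden captcha_question input.

```python
import re

# Assumed markup: the label from the template above, with or without the
# <em class="error"> wrapper that appears after a wrong answer.
LABEL_RE = re.compile(
    r'<label for="id_captcha_answer">\s*(?:<em[^>]*>)?\s*(\d+)\+(\d+)='
)


def solve_math_captcha(html):
    """Return the answer a bot would submit, or None if no question is found."""
    match = LABEL_RE.search(html)
    if match is None:
        return None
    return str(int(match.group(1)) + int(match.group(2)))


page = '<label for="id_captcha_answer">\n3+5=\n</label>'
print(solve_math_captcha(page))  # 8
```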

Tags: captcha  concept  django 
Submitted 12 comments: accepted - 12, in moderation queue - 0.
  • Malcolm Tredinnick on December 11, 2007
    Your form is vulnerable to replay attacks, because you don't check that the question submitted in the form is the same one you passed to the user. So I can just change the question to something I computed the answer for earlier and then submit hundreds of posts without any extra work.
  • Martin on December 11, 2007
    That should be good enough, even though (as you mentioned) it's easy to make a script that finds the label and parses its content. But I don't think any spammer goes to your blog and takes the time to add a solution for just your captcha to his spamming scripts.

    I think there's a project named djapthca (django captcha) as well, but if you go the image way you need an audio version for blind people too.
    I mean, your solution should be good enough. If spam starts to pop in again, you can just make some small changes and you are good to go for a while again...

    Thanks for sharing the code! Maybe you can post it to djangosnippets as well?
  • Dima Dogadaylo on December 11, 2007
    Malcolm, you are right from a technical point of view. Technically, I could resolve the issue you spotted by adding one more hidden field that holds a random ID, unique to each page request, and comparing it with IDs stored in the database. But I don't want to involve the database.

    I wanted a captcha as lightweight as possible, i.e. a captcha without PIL and without database access (I believe the server should not waste resources on such tasks). It protects quite well against bots that fill in and submit every form they find; smart bots with memory or eval() on board will be able to get around the protection.

    Probably, if my small blog becomes so popular that spammers write special scripts for it, I will need to rethink the protection mechanism, but for now I'm quite happy with it. So it's not a bug, it's a feature :)
  • Daniel Andrlik on December 11, 2007
    You could solve the problem by adding a hash of the values concatenated with your settings.SECRET_KEY (maybe with additional data as well), which is similar to how the comments app works to begin with. Just put the hash in a hidden field and compare it against the submitted form to confirm that the initial question remained the same between display and post.

    Should be relatively easy to implement.
  • Dima Dogadaylo on December 11, 2007
    Daniel, I didn't get you. Hashing is used to prevent a server-generated value from being replaced with another value. For example, the server generates the question '8+3', concatenates it with settings.SECRET_KEY and builds a hash. The user then can't send a DIFFERENT question (for example, 7+5) with the hash for the old question (8+3); however, the user can send the OLD question with the OLD hash as many times as they want, because the hash and the question match.

    So using hashes without storing state in the database doesn't protect against the replay attacks that Malcolm pointed out.
  • Daniel Andrlik on December 11, 2007
    I should have been clearer. I was thinking about it in the sense of reusable code for multiple sites. As you said, there really isn't a way I can think of to protect against a replay attack without storing state in some way. However, I was thinking about preventing a script from reusing the same attack against multiple sites, since it would need a unique hash + question + answer combination for each site it attempted to post at, which is time-consuming unless it is a highly focused attack.

    For what you need it probably works, although I've found that integrating Akismet support into the comment approval process goes a long way and doesn't seem to take up much in the way of resources. I'm mostly thinking out loud here, because I like the simplicity of a numerical captcha like this, and I'm considering incorporating it into my existing comments system. :-)
  • Strange Pants on December 12, 2007
    Have you thought about implementing the Honeypot Captcha? (http://haacked.com/archive/2007/09/11...)

    It works, it's invisible to real humans, and it's quite accurate. I think you could also implement it in fewer lines of code ;D
  • Dima Dogadaylo on December 12, 2007
    Well, MathCaptcha already provides all the features that Honeypot Captcha provides. If a bot just fills in all fields, the form will be rejected because the answer will be incorrect.
  • Strange Pants on December 13, 2007
    No disrespect to MathCaptcha, but it misses the biggest feature provided by Honeypot Captcha: it's invisible to real users and doesn't require them to do anything at all.
  • Empty on December 13, 2007
    I just wanted you to know that I really enjoy your blog posts. The stuff you're thinking about is different and refreshing. Keep it up.
  • Dima Dogadaylo on December 13, 2007
    I read that under some circumstances a browser may fill input fields that are invisible to the user with default values (so-called form auto-fill). So the invisible field may receive a value the real user never sees, and the form will be rejected.
  • Dima Dogadaylo on December 20, 2007
    I improved MathCaptchaForm with protection against replay attacks. Please see:
    http://www.mysoftparade.com/blog/improved-mathematical-captcha/
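To make the replay discussion in the comments concrete, here is a framework-free sketch under stated assumptions; it is not the code from the improved version linked above. It uses HMAC-SHA256 from the standard library instead of a plain hash of the question concatenated with the secret (HMAC is the standard keyed construction for this), a hard-coded byte string stands in for settings.SECRET_KEY, and an in-memory dict stands in for the database table of one-time challenge IDs. The names `sign`, `signed_check` and `OneTimeChallenges` are hypothetical.

```python
import hashlib
import hmac
import secrets
from random import randint

SECRET_KEY = b"site-a-secret"  # stand-in for settings.SECRET_KEY (assumption)


def sign(question, key=SECRET_KEY):
    """HMAC-SHA256 of the question, hex-encoded."""
    return hmac.new(key, question.encode("utf-8"), hashlib.sha256).hexdigest()


def signed_check(question, signature, answer, key=SECRET_KEY):
    """Stateless check: valid signature on the question plus a correct sum."""
    if not hmac.compare_digest(sign(question, key).encode(),
                               signature.encode("utf-8")):
        return False  # question was altered or signed with another key
    a, b = question.split("+")
    try:
        return int(a) + int(b) == int(answer)
    except ValueError:
        return False


class OneTimeChallenges:
    """In-memory stand-in for a database table of pending challenges."""

    def __init__(self):
        self._pending = {}  # challenge id -> expected answer

    def issue(self):
        a, b = randint(1, 9), randint(1, 9)
        cid = secrets.token_hex(16)  # unpredictable per-request ID
        self._pending[cid] = a + b
        return cid, "%d+%d" % (a, b)

    def verify(self, cid, answer):
        # pop() forgets the ID on first use, whether the answer is right or not
        expected = self._pending.pop(cid, None)
        if expected is None:
            return False
        try:
            return int(answer) == expected
        except ValueError:
            return False


# Signing blocks substitution: the signature for 8+3 does not vouch for 7+5.
sig = sign("8+3")
print(signed_check("7+5", sig, "12"))  # False
# ...but the same signed pair can be replayed indefinitely.
print(all(signed_check("8+3", sig, "11") for _ in range(100)))  # True
# A different site secret rejects the pair (the cross-site point).
print(signed_check("8+3", sig, "11", key=b"site-b-secret"))  # False

store = OneTimeChallenges()
cid, question = store.issue()
answer = str(sum(int(x) for x in question.split("+")))
print(store.verify(cid, answer))  # True
print(store.verify(cid, answer))  # False: the replay is rejected
```

The signed check shows both sides of the exchange above: a substituted question fails, the original pair replays freely, and another site's secret rejects it. Only the stateful store rejects the second submission, at the cost of the server-side storage Dima wanted to avoid. Because verify() discards the ID even after a wrong answer, a user who mistypes needs a fresh question.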
Web log, research lab and soft parade of Dima Dogadaylo.
Email: entropyhacker at gmail dot com