[KLUG Members] Looking for PHP code snippet.

Jamie McCarthy jamie at mccarthy.vg
Sat Apr 9 11:42:39 EDT 2005


> I want to verify that the data entered in a form was done by a
> real person, and not a bot.

That's been dubbed a CAPTCHA.  I don't know of any PHP code for it
offhand, but googling that term along with "PHP" may help.

A few thoughts on this...

CAPTCHAs can never stop an attacker, only slow him/her down.  They
can always screen-scrape your application and relay each challenge
to a webpage of their own, where a human solves it and hits return
over and over again very quickly.  A CAPTCHA can turn a fully
automated attack into one that is fully automated except for one
parallelizable component that requires a human for a few brief
seconds.  That may or may not be worth it for your application.

CAPTCHAs are very hard to do if your attacker is determined.  We
rolled our own for Slashdot, and trolls have had it broken for
months, possibly years; that is, there is software out there that
automates solving it.  (They wrote a wrapper around an image
processing application written by some people at MIT, or something
like that; the details are irrelevant to me, since broken is
broken.)  For a sample of what we're using for images, reload this a
few times:
<http://slashdot.org/login.pl?op=newuserform>

CAPTCHAs are very easy to do if your attacker is not especially
determined.  Like the sticker in a car window that says "protected
by Leet Security Systems," sometimes all that's necessary is to
encourage your attacker to hit someone else's site instead of yours.

CAPTCHAs are only effective if you also thwart brute-force attacks.
You may want to let the user reload the page to see a different
question, in case the first one you generated was hosed in some way
(an image that your algorithm munged into unrecognizability, for
example).  But multiple reloads, or multiple failed attempts to
solve, should result in a blocked IP address, a blocked subnet, or,
in the case of a massive distributed attack, participation blocked
across your entire site until an admin can be notified.
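A minimal sketch of that reload/failure throttling, in Python rather
than PHP (the logic carries over directly).  The class name and the
thresholds (more than five events in ten minutes earns a one-hour
block) are hypothetical choices, not recommendations.  It only tracks
single IP addresses, so subnet-wide or site-wide blocking would have
to be layered on top, and it keeps its counters in memory, which
assumes one long-running process; a PHP site would keep the same
counters in a database or cache instead.

```python
import time
from collections import defaultdict, deque


class ChallengeThrottle:
    """Hypothetical per-IP throttle: too many reloads or failed answers
    inside a time window gets the address blocked for a while."""

    MAX_EVENTS = 5       # reloads/failures tolerated per window (assumed value)
    WINDOW = 600.0       # seconds of history considered (assumed value)
    BLOCK_FOR = 3600.0   # how long a block lasts, in seconds (assumed value)

    def __init__(self, clock=time.monotonic):
        self._clock = clock                  # injectable so tests can fake time
        self._events = defaultdict(deque)    # ip -> timestamps of recent events
        self._blocked_until = {}             # ip -> time the block expires

    def is_blocked(self, ip):
        until = self._blocked_until.get(ip)
        if until is None:
            return False
        if self._clock() >= until:
            # Block has expired: forget it along with the old history.
            del self._blocked_until[ip]
            self._events.pop(ip, None)
            return False
        return True

    def record(self, ip):
        """Record one reload or failed answer; return True if ip is now blocked."""
        if self.is_blocked(ip):
            return True
        now = self._clock()
        events = self._events[ip]
        events.append(now)
        # Discard events that have aged out of the window.
        while events and now - events[0] > self.WINDOW:
            events.popleft()
        if len(events) > self.MAX_EVENTS:
            self._blocked_until[ip] = now + self.BLOCK_FOR
            return True
        return False
```

The form handler would check is_blocked() before showing a question,
and call record() on every reload and every wrong answer.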

CAPTCHAs don't have to be images.  For most programmers who want to
roll their own, and if the CAPTCHA won't be used on a huge website
or a network of large websites, I would advise writing something
completely textual in nature.  It doesn't take much to emit
questions whose answers are obvious to humans, but which would be
nontrivial to write a program to solve.  Also, textual challenges
are friendlier to visually impaired readers.  For example, you
could ask your users:

    Which of these things is not like the others?
    cardinal robin bluejay slug chickadee

    Which of these words comes first alphabetically?
    discus street invasive keyboard mirror

    How much is nine take away two?  [____]

    How many E's are in the word "fifteen"?  [____]
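If you'd rather generate questions like those than hand-write each
one, here is a minimal sketch in Python (the approach translates
directly to PHP).  The word lists and function names are hypothetical
placeholders; a real site would want much larger lists so an attacker
can't simply harvest them all.  Each generator returns the question
text and the expected answer as a string, and make_question() uses
random.SystemRandom by default so the choices aren't predictable from
a seed.

```python
import random

# Hypothetical word lists; a real deployment needs far bigger ones.
BIRDS = ["cardinal", "robin", "bluejay", "chickadee", "sparrow", "finch"]
NOT_BIRDS = ["slug", "toaster", "granite", "violin"]
WORDS = ["discus", "street", "invasive", "keyboard", "mirror", "lantern", "pebble"]
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]
LETTER_WORDS = ["fifteen", "seventeen", "banana", "committee", "mississippi"]


def odd_one_out(rng):
    # Four birds plus one non-bird, shuffled together.
    odd = rng.choice(NOT_BIRDS)
    items = rng.sample(BIRDS, 4) + [odd]
    rng.shuffle(items)
    return "Which of these things is not like the others?\n" + " ".join(items), odd


def first_alphabetically(rng):
    # All words are lowercase, so min() gives alphabetical order.
    items = rng.sample(WORDS, 5)
    return "Which of these words comes first alphabetically?\n" + " ".join(items), min(items)


def subtraction(rng):
    # Keep b <= a so the answer is never negative.
    a = rng.randint(2, 10)
    b = rng.randint(1, a)
    return f"How much is {NUMBER_WORDS[a]} take away {NUMBER_WORDS[b]}?", str(a - b)


def count_letter(rng):
    # Pick a letter that actually occurs in the word.
    word = rng.choice(LETTER_WORDS)
    letter = rng.choice(sorted(set(word)))
    return f"How many {letter.upper()}'s are in the word \"{word}\"?", str(word.count(letter))


GENERATORS = [odd_one_out, first_alphabetically, subtraction, count_letter]


def make_question(rng=None):
    """Pick one question type at random and return (question, answer)."""
    rng = rng or random.SystemRandom()
    return rng.choice(GENERATORS)(rng)
```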

If you code ten different types of questions -- some of which may
draw from a large database, some of which may be more algorithmic in
nature, it's up to you -- then your attackers will have to spend
many hours and perhaps hit your site a very large number of times to
acquire data, so that they can write code that answers your
questions correctly.  That may be deterrent enough.
Also, I think this type of coding is more fun than boring old image
processing (your mileage may vary).
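Whatever question types you write, the server should remember the
expected answer itself and accept each answer only once, so a single
human-solved question can't be replayed for many submissions.  Below
is a minimal Python sketch under those assumptions; ChallengeStore,
the five-minute TTL, and the generator interface (a callable that
takes a random source and returns a (question, answer) pair) are
hypothetical.  In PHP the pending answers would more naturally live
in the session or a database table than in process memory.

```python
import random
import secrets
import time


class ChallengeStore:
    """Hypothetical store: issue a question from one of many generators,
    keep the expected answer server-side under a one-time token, and
    check the user's reply at most once."""

    def __init__(self, generators, ttl=300.0, clock=time.monotonic):
        self._generators = list(generators)
        self._ttl = ttl                       # seconds a question stays valid (assumed)
        self._clock = clock                   # injectable so tests can fake time
        self._rng = random.SystemRandom()     # unpredictable choice of question type
        self._pending = {}                    # token -> (expected answer, expiry time)

    def issue(self):
        gen = self._rng.choice(self._generators)
        question, answer = gen(self._rng)
        token = secrets.token_urlsafe(16)     # opaque value for a hidden form field
        self._pending[token] = (self._normalize(answer), self._clock() + self._ttl)
        return token, question

    def check(self, token, reply):
        # pop() makes every token single-use, whether the reply is right or wrong.
        entry = self._pending.pop(token, None)
        if entry is None:
            return False
        expected, expires = entry
        if self._clock() > expires:
            return False
        return self._normalize(reply) == expected

    @staticmethod
    def _normalize(text):
        # Collapse whitespace and ignore case, so " Robin " matches "robin".
        return " ".join(str(text).split()).lower()
```

Normalizing only whitespace and case keeps the comparison strict;
accepting "seven" for "7" as well would need an extra mapping.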

A CAPTCHA is one tool in your arsenal against attackers.  Like
everything else you do to keep attackers from damaging your site,
you need to decide what balance you think will be right:  how much
time and resources you will invest, both now and later, if an attack
forces you to increase your protection.
It's one step in an arms race, not a perfect shield, so don't think
you have to invest tremendous effort to make something whose first
draft is impervious, unless you're writing the next CAPTCHA that
will be used by Yahoo or Google.  :)
-- 
  Jamie McCarthy
 http://mccarthy.vg/
  jamie at mccarthy.vg


