MR 264746
JM 308537
AH 223683
Criteria for the Evaluation of CAPTCHA Techniques
A CAPTCHA (“Completely Automated Public Turing Test to Tell Computers and Humans Apart”) distinguishes between computers and human users by presenting a problem that a human can solve but a computer, even one using artificial intelligence (AI), cannot.
This report compares the effectiveness of various CAPTCHA techniques. But how can the effectiveness of a practical system be defined or measured? Is it effective when its problem is unsolvable for computers, regardless of how many humans fail it, too? Or is it effective when all humans have a chance to pass it, no matter how many AI systems might pass it as well? Clearly, an ideal CAPTCHA would keep out every computer but let every human pass. With this reference in mind, this assignment introduces an evaluation system that quantifies effectiveness in two parts: an assessment from the computer perspective and an assessment from the human perspective.
Criteria describing effectiveness from both points of view have been identified and weighted, and three different CAPTCHA techniques are evaluated against them. Each CAPTCHA is marked on each criterion with 1 to 5 points. After the respective weights are applied and the scores from the AI and human perspectives are summed, the maximum total score is 10 points.
It must be noted that general statements about difficulty or ease are hard to make, since these terms are subjective and depend strongly on the user (or on the respective AI strategy); we tried to generalize with a representative cross-section of society in mind. Although purely visual CAPTCHA techniques exclude potential users such as blind people (see http://www.w3.org/TR/turingtest ), the limited scope of this assignment did not allow other techniques to be evaluated.
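The scoring scheme described above can be sketched as follows. This is only a minimal sketch, not the implementation used for this report: the criterion names and weights are hypothetical placeholders, and it assumes that the weights within each perspective sum to 1, so that each perspective contributes at most 5 points and the total at most 10.

```python
# Hypothetical weights for illustration only; the report's own table defines the real ones.
# Assumption: weights within one perspective sum to 1, so each perspective yields 1..5 points.
AI_WEIGHTS = {"perturbation": 0.4, "offset": 0.3, "tilt": 0.3}
HUMAN_WEIGHTS = {"contrast": 0.5, "colour_difference": 0.2, "reading_skill": 0.3}


def weighted_score(marks, weights):
    """Return the weighted mean of per-criterion marks (each mark is 1..5)."""
    if set(marks) != set(weights):
        raise ValueError("marks and weights must cover the same criteria")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(not 1 <= m <= 5 for m in marks.values()):
        raise ValueError("each mark must lie between 1 and 5")
    return sum(marks[name] * weights[name] for name in weights)


def total_score(ai_marks, human_marks):
    """Sum of the AI-perspective and human-perspective scores (maximum 10)."""
    return (weighted_score(ai_marks, AI_WEIGHTS)
            + weighted_score(human_marks, HUMAN_WEIGHTS))


if __name__ == "__main__":
    ai = {"perturbation": 5, "offset": 1, "tilt": 2}
    human = {"contrast": 2, "colour_difference": 2, "reading_skill": 4}
    print(round(total_score(ai, human), 2))
```

Normalising the weights per perspective keeps the two halves comparable: a CAPTCHA that is perfect for humans but trivial for an AI cannot score more than about half of the maximum.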
AI Perspective
Since existing computer attacks have so far targeted letter-based CAPTCHAs, the evaluation criteria focus on that AI problem only. All criteria are listed with an example and their respective weighting in the table below.
Obviously, recognising letters under perfect conditions is not too hard for an AI, as the optical character recognition (OCR) software on the market today shows. A CAPTCHA therefore needs additional obstacles that make these patterns harder to recognise.
One of the simplest yet most effective techniques (if done right) is perturbation, which can make it very hard for an AI to recognise a letter. Perturbation ranges from a few lines in the background, which an AI can remove, up to a background with gradient colours, where even humans can hardly distinguish between letters and perturbation. Because it is so powerful, perturbation is weighted highest.
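To illustrate why a few thin background lines offer little protection, here is a minimal sketch of a neighbour-count filter. It is an assumption-laden toy, not an attack used in this report: the image is assumed to be already binarised (1 = dark pixel, 0 = background), and the function name and threshold are hypothetical.

```python
def remove_thin_noise(img, min_neighbours=3):
    """Drop dark pixels that have fewer than `min_neighbours` dark 8-neighbours.

    A one-pixel-wide line has at most two dark neighbours per pixel and is
    erased, while pixels inside a thicker glyph stroke survive.
    """
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not img[y][x]:
                continue
            neighbours = sum(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours >= min_neighbours:
                out[y][x] = 1
    return out


if __name__ == "__main__":
    image = [
        [1, 1, 1, 1, 1, 1, 1],  # thin perturbation line
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 1, 0, 0],  # thick "glyph" stroke
        [0, 0, 1, 1, 1, 0, 0],
        [0, 0, 1, 1, 1, 0, 0],
    ]
    for row in remove_thin_noise(image):
        print(row)
```

A filter this simple fails as soon as the perturbation is as thick as the strokes or crosses them, which is consistent with the high weight given to strong perturbation.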
Many CAPTCHAs use a row of characters aligned to some imaginary line (which does not have to be horizontal), while more sophisticated ones distribute the letters randomly over the image or offset them randomly from the line. This is also a good criterion for determining whether a CAPTCHA is hard for an AI to solve. With this technique the AI cannot look for familiar patterns along a line, but has to search the whole image. This is much more difficult for software, while, again, it is very easy for humans to do.
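The effect of character offset on a line-based search can be sketched with a simple row projection. This is a toy illustration under stated assumptions (a binarised image with 1 = dark pixel; the function names are hypothetical), not a description of any real attack tool.

```python
def ink_rows(img):
    """Return the indices of rows that contain at least one dark pixel."""
    return [y for y, row in enumerate(img) if any(row)]


def text_band_height(img):
    """Height of the smallest horizontal band that contains all dark pixels."""
    rows = ink_rows(img)
    return rows[-1] - rows[0] + 1 if rows else 0


if __name__ == "__main__":
    aligned = [
        [0, 0, 0, 0, 0],
        [1, 1, 0, 1, 1],
        [1, 1, 0, 1, 1],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    offset = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1],
    ]
    print(text_band_height(aligned), text_band_height(offset))
```

With aligned glyphs the band is as narrow as one glyph, so a recogniser only has to scan that strip; with offset glyphs the band grows to the whole image and the shortcut disappears.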
A similar technique is tilting some of the letters. When letters are tilted, an artificial intelligence not only has to be trained to recognise them in their normal orientation, it also has to learn all varieties of tilted letters. Though not impossible, this requires very sophisticated programming. This criterion is therefore weighted a little less than the previous ones. As with tilted characters, skewed symbols do not necessarily make learning harder for an AI; the CAPTCHA merely takes more time to solve, because the AI needs to learn all possible variations of a symbol.
Some CAPTCHAs use not only letters, but also numbers and special symbols. The AI has to learn all these characters. This is one of the weakest techniques unless combined with others; its weight is therefore relatively low.
A powerful technique for making recognition hard for an AI is to merge characters; we therefore assigned a higher weight to this characteristic. The opposite approach, which is equally difficult for computers to solve, is non-continuous characters, i.e. characters interrupted by the background or a similar colour. We assigned this criterion a weight similar to that of merged characters.
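Why merged and non-continuous characters are troublesome can be sketched with a naive column-projection segmenter, which splits an image wherever a column contains no dark pixel. This is an illustrative assumption about how a simple recogniser might isolate characters (binarised image, 1 = dark pixel; the function name is hypothetical), not a claim about any specific AI.

```python
def segment_columns(img):
    """Return (first, last) column index pairs for each run of inked columns."""
    width = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                       # a new run of inked columns begins
        elif not ink and start is not None:
            segments.append((start, x - 1))  # the run ended at the previous column
            start = None
    if start is not None:
        segments.append((start, width - 1))
    return segments


if __name__ == "__main__":
    separated = [[1, 1, 0, 1, 1],
                 [1, 1, 0, 1, 1]]
    merged = [[1, 1, 1, 1, 1],
              [1, 1, 0, 1, 1]]
    broken = [[1, 0, 1, 0, 1, 1]]
    print(segment_columns(separated))
    print(segment_columns(merged))
    print(segment_columns(broken))
```

Two separated glyphs yield two segments, two merged glyphs collapse into one, and a single interrupted glyph falls apart into several, so in both of the latter cases the pieces handed to the recogniser no longer correspond to characters.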
Human Perspective
An ideal CAPTCHA would be easy to pass for any human: of any age, with any knowledge, from any culture, with any language, with or without any disabilities. From a human point of view, effectiveness therefore describes the ease of solving the problem and is also a measure of how many people have a fair chance of passing it.
It must be noted that the assessment of a CAPTCHA can only be very subjective and depends on a number of abilities and faculties, whose study would require psychological and sociological investigation and cannot be discussed in detail within this assignment. We grouped various criteria into two categories, as shown in the table below: those concerned with senses (in this case, the sense of sight only) and those concerned with abilities and knowledge.
The criteria related to the sense of sight assess the ease of capturing information from the picture; for letter-based techniques that means picking out both the lettering and the perturbation from the background. A very important criterion is contrast, as lower contrast would significantly decrease the chances of capturing the content for virtually every user. As the colour difference factor affects only a relatively small part of the population (according to http://en.wikipedia.org/wiki/Color_blindness , less than 8% of humans suffer from anomalous trichromacy), it is weighted less. Clearness is not too important, either: lettering can be captured even when slightly blurred, although recognition is impaired.
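The contrast criterion was assessed by eye in this report. As one possible way to put a number on it, the sketch below computes the contrast ratio as defined in the W3C Web Content Accessibility Guidelines (WCAG 2.0) for two sRGB colours. Using this measure here is our own assumption and a swapped-in technique, not the method behind the marks in the table; the function names are hypothetical.

```python
def relative_luminance(rgb):
    """WCAG 2.0 relative luminance of an sRGB colour given as 0..255 integers."""
    def linear(channel):
        c = channel / 255.0
        # Undo the sRGB gamma encoding (threshold as given in WCAG 2.0).
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG 2.0 contrast ratio, from 1 (identical) up to 21 (black on white)."""
    lum_a, lum_b = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    text_colour = (90, 90, 90)          # hypothetical grey lettering
    background_colour = (200, 200, 200)  # hypothetical light background
    print(round(contrast_ratio(text_colour, background_colour), 2))
```

Because the ratio is based on luminance rather than hue, it captures the contrast criterion but not the separate colour difference criterion, which matters for users with anomalous colour vision.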
Once the information has been captured, it has to be ‘processed’. This requires certain abilities and knowledge. All letter-based techniques, which make up most of those commonly used, require certain reading skills; some are more difficult than others and require more experience. Reading skill is therefore weighted highest of all abilities. All other criteria described in the following have been given an equally low weight, as they affect only a small number of users.
Since the user is required to read and type in characters, a minimum ability to write is required, though web users without writing skills are unlikely.
If the CAPTCHA uses meaningful words rather than a sequence of random characters, users could understand a word even if its letters were totally distorted and unre_ogn_sable, provided it is a common word from a familiar language. This would give such users a slight advantage over users of other languages. A similar issue with text-based solutions is the use of certain alphabets or character sets; to Chinese or Russian users, Latin letters might not be familiar.
Some solutions might require logical thinking, e.g. by expecting the user to solve puzzles, riddles or trivia.
For a picture-based CAPTCHA ("What do you see?"), users need specific knowledge about the objects and their respective names, which could cause problems with very specific objects. The question “Which picture shows a Weißwurst?” is easy to answer for any Bavarian, but tremendously difficult for people from other parts of the world, who may not even know what a Weißwurst is or looks like.
Comparison and Evaluation of Different CAPTCHAs
In line with the coursework specification, three websites using CAPTCHAs were chosen. MVN-Forum was selected because its CAPTCHA (see a in the figure below) is quite hard for the visually impaired and also looked hard for an AI to read.
The second one was chosen because its CAPTCHA (see b in the figure below) looked very easy to solve for humans as well as for an artificial intelligence. It can be found on studivz.net.
A CAPTCHA that we considered very difficult for AIs to read (see c in the figure below) was chosen as the third example (see MSN-Passport). It turned out to be difficult for humans, too.
The CAPTCHAs of all three websites have been assessed against the previously discussed criteria; the results can be seen in detail in the table below. CAPTCHA a) has a strong perturbation, but its characters sit practically on an imaginary straight line without any offset. The characters are neither skewed nor merged, and only slightly tilted. From a human perspective, the poor contrast and low colour difference are notable. CAPTCHA b) shows a very regular perturbation that is therefore easy to crack, but has a very good character offset, making it difficult for an AI to make out a line of text. The characters are fairly tilted, but neither merged nor skewed. Its high contrast and colour difference make it quite easy for humans to read, although the characters are slightly blurred. CAPTCHA c) has a very strong perturbation. All characters are skewed and slightly tilted, which is a challenge for the AI. From a human point of view, the good contrast and the colour difference are striking. Due to the perturbation it requires a fair level of reading experience.
None of the CAPTCHAs requires any specific language or cultural knowledge. All examples require the user to type characters from the screen, which assumes a minimum of writing skill, but all use the Latin alphabet, even the Japanese version of website c). Merged characters did not occur on any of the assessed websites, nor did special characters.
Discussion of the Results
The results confirm only some of our expectations; others are disproved. It has to be mentioned that the results are not representative, as the characteristics and their respective weights were chosen according to what we thought would be most appropriate.
At first glance, CAPTCHA c) has scored as expected: it is very hard for an AI to recognise the characters because it uses all relevant techniques, such as a strong perturbation, skewed and tilted characters and even non-continuous ones. The human perspective score, on the other hand, is surprisingly high, given that the first impression suggests some humans will have difficulty solving it. But the 'messy impression' does not pull down the whole score. Overall, c) is the best CAPTCHA of this test, with 7 out of 10 points.
The second CAPTCHA, expected to be easily solvable for both humans and computers, was a surprise: its score is very good and close to that of c). Although this is mainly due to its excellent readability, it also causes an AI more difficulty than a).
Equally unexpected was the result for a), which ended up with the lowest score, a rank where b) was initially expected. It turned out that a) is apparently easier for an AI to solve than b), mostly due to its missing offset and slight tilt. Evidently, the outstanding background perturbation merely suggested quality; in practice it mainly makes it hard for humans to capture the information.
A general observation is that some criteria received identical marks for all CAPTCHAs; to improve the spread and resolution of the values, these should be left out in future investigations. Additionally, the average values for the human and AI criteria seem to be biased: the best AI score is still lower than the worst human score. This suggests that the respective criteria need further research.
To conclude, none of the investigated subjects is an ideal CAPTCHA; each of them has its own trade-offs and compromises between human-solvability and AI-solvability. It remains to be seen whether future techniques will improve user accessibility and at the same time keep up with AI developments.
References
- http://en.wikipedia.org/wiki/Captcha General Information
- http://sam.zoy.org/pwntcha/ Selection of various CAPTCHAs
- http://www.captcha.net/ Main-Page (with links to good CAPTCHAs)
- http://frikk.tk/comments-273-04.28.06.htm simple CAPTCHA implementation with PHP
- http://www.w3.org/TR/turingtest/ Inaccessibility of CAPTCHA
- https://accountservices.passport.net/reg.srf?id=2&sl=1&lc=1031 An audio CAPTCHA as alternative
- https://accountservices.passport.net/reg.srf?roid=2&sl=1&lc=1041