Wednesday, February 22, 2012

Contrived acronym in computer science and web development: Captcha

The term “captcha” was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University for a webpage element designed to tell humans and robots apart automatically: Captcha stands for Completely Automated Public Turing test to tell Computers and Humans Apart [1]. A captcha image, consisting of randomly generated characters squiggling inside a rectangle, is often encountered as a required field in a submission form. A user who misreads certain letters or symbols may get the interrogation vibe when asked whether she would mind doing it again (and again …). In a recent TechnoFiles column, David Pogue highlights how captchas act as efficiency barriers and suggests the following meaning: Computers Annoying People with Time-wasting Challenges that Howl for Alternatives [2].

If you haven't found an alternative yet, or if you would argue that captchas are nevertheless pretty good at capturing (or should I write captcharing?) machines deployed by ill-intentioned humans, then some interesting websites and JavaScript code will be helpful. Both client-side and server-side scripting have been explained and demonstrated [3,4]. Further, webpages containing forms with captcha images can be generated dynamically with various programming languages such as PHP, ASP, JSP, Perl, Python and Ruby [5].
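To make the server-side half a little more concrete, here is a minimal Python sketch of issuing and checking a captcha challenge. It is not the code from [3,4] or the service in [5]: rendering the squiggly image is left out, and the in-memory dictionary is an assumption standing in for a session store or database. The alphabet deliberately omits look-alike characters (0/O, 1/I), one way to spare users some misreadings, and each challenge is single use and expires after an assumed five minutes.

```python
import hmac
import secrets
import time

# Look-alike characters (0/O, 1/I) are left out on purpose to reduce misreadings.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


class CaptchaStore:
    """Keeps issued challenges on the server; the client only ever sees the image."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds  # assumed lifetime of a challenge
        # challenge id -> (answer, issued_at); abandoned entries are not purged in this sketch
        self._pending = {}

    def issue(self, length=6):
        """Create a random answer and an opaque id that the form carries back."""
        answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
        challenge_id = secrets.token_urlsafe(16)
        self._pending[challenge_id] = (answer, time.monotonic())
        # The answer goes to the image renderer, never into the HTML form itself.
        return challenge_id, answer

    def verify(self, challenge_id, response):
        """Check a user's response; every challenge can be tried only once."""
        entry = self._pending.pop(challenge_id, None)
        if entry is None:
            return False  # unknown or already used
        answer, issued_at = entry
        if time.monotonic() - issued_at > self.ttl:
            return False  # expired
        # Ignore surrounding whitespace and letter case; compare in constant time.
        typed = response.strip().upper().encode("utf-8")
        return hmac.compare_digest(answer.encode("ascii"), typed)
```

In use, `cid, text = store.issue()` yields a text for the image and an id for a hidden form field; `store.verify(cid, text.lower())` then succeeds once, and any second attempt with the same id fails.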

What about identifying and appreciating the honest human being who is interacting with your site, instead of targeting potential spambots all the time? Ben Hunt discusses a promising approach [6]. Invisibility is not only a strategy of spammers and spies; it can also be achieved by user-friendly technology, implemented as a backstage wizard that lets humans submit, sign in and hack away, as long as they do some finger work. Open creativity instead of captchability!
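Ben Hunt's own mechanism is best read in [6]; what follows is only a minimal Python sketch of the general idea of an invisible server-side check. It assumes a hypothetical hidden field (`typed`) that a client-side key handler, not shown here, sets to "1" once a real key is pressed. It also adds two common heuristics that are not taken from [6]: a honeypot field (`website`) hidden from people with CSS, and a minimum fill-in time of an assumed 3 seconds. All field names and thresholds are illustrative.

```python
import time

# Hypothetical field names and threshold; a real form defines its own.
KEYSTROKE_FIELD = "typed"    # set to "1" by a client-side key handler (not shown)
HONEYPOT_FIELD = "website"   # hidden from people via CSS, so it should stay empty
MIN_FILL_SECONDS = 3.0       # assumed lower bound for a person filling in the form


def looks_human(form, rendered_at, submitted_at=None):
    """Accept or reject a submission without asking the user to solve anything.

    form: mapping of submitted field names to values.
    rendered_at / submitted_at: timestamps in seconds (e.g. from time.time()).
    """
    if submitted_at is None:
        submitted_at = time.time()
    if form.get(HONEYPOT_FIELD, "").strip():
        return False  # something filled in a field no person can see
    if form.get(KEYSTROKE_FIELD) != "1":
        return False  # no real key press was recorded in the form
    # Bots often post instantly; people need at least a few seconds.
    return submitted_at - rendered_at >= MIN_FILL_SECONDS
```

The honest visitor never notices any of these checks, which is the point: the burden shifts from the person to the server.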

References and resources to explore
[1] www.captcha.net and www.google.com/recaptcha/captcha.
[2] TechnoFiles by David Pogue: Time to Kill Off Captchas. How the bot-proofing of the Internet is bringing humans down. Sci. Am. March 2012, 306 (3), page 23 [www.scientificamerican.com/article.cfm?id=time-to-kill-off-captchas].
[3] Simple JavaScript CAPTCHA Generator: typicalwhiner.com/190/simple-javascript-captcha-generator/.
[4] Implementation of Captcha in JavaScript: www.codeproject.com/Articles/42842/Implementation-of-Captcha-in-Javascript.
[5] Free CAPTCHA-Service: captchas.net.
[6] Ben Hunt: CAPTCHA Alternative? Try this Invisible Human Check for Web Form Validation [www.webdesignfromscratch.com/javascript/human-form-validation-check-trick].

Tuesday, February 14, 2012

Case sensitive URL distinction? Don't rely on it!

A Uniform Resource Locator (URL) is not case insensitive as a whole. At least the domain-name part of the URL string is interpreted without regard to case [1]. Of course, you can type a URL into the address field of your browser any way you want, and the same applies to the href attribute of an anchor tag in your HTML page. But the server hosting the targeted website may interpret file paths differently depending on the occurrence of upper- and lower-case letters in an otherwise identical character sequence [2]. Unless you know exactly how the server you are trying to access is set up and configured (Apache/Linux or other hosting software), you do not want to rely on either a case-sensitive or a case-insensitive interpretation of your query.
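The generic URI syntax (RFC 3986) treats the scheme and host as case insensitive and leaves the remaining components, including the path, case sensitive unless a particular scheme or server says otherwise. The following Python sketch normalizes only the parts that are safe to lower-case; the function name `normalize_case` is my own, and user information such as `User@` is deliberately left untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(url):
    """Lower-case the scheme and host of a URL; keep the path, query and fragment as typed."""
    parts = urlsplit(url)
    # netloc may look like "user:pass@Host.Example:8080"; only the host:port part is case insensitive.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment))
```

For example, `normalize_case("HTTP://WWW.Example.COM/Co.htm")` gives `"http://www.example.com/Co.htm"`, while `/Co.htm` and `/CO.htm` stay distinct, because whether they name the same file is up to the server.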

Obviously, the common concern is to reach a website without caring about upper- or lower-case typing and without ending up on a “404 Error File Not Found” page [3,4]. Here, I would like to emphasize the “mirror problem”: let us assume that the server holds multiple files whose names differ only by selective capitalization. This problem is not restricted to locating websites; it is a general issue of targeted search and annotation. For example, in fields such as chemistry, case-sensitive notation can be critical to distinguish between different materials: the symbols/formulae Co and CO represent the chemical element cobalt and carbon monoxide; CsI and CSi represent cesium iodide and silicon carbide. Within each pair, the notations differ by case only. Two files named Co.htm and CO.htm may not be correctly addressed or resolved as separate files when located in the same directory; on a case-insensitive file system they cannot even coexist. Such ambiguities are avoided, albeit at the cost of some overhead, by employing a more distinctive naming scheme. In our ThermoML file repository for molecular-composition-based open access to thermodynamic data and chemical publication hyperlinks, we chose a host-independent system of file names. For cobalt and carbon monoxide, the files happen to be Co_aaa.htm and CO_aax.htm, respectively. The x makes the difference.
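The rule behind the ThermoML suffixes is not spelled out here, so the following Python sketch shows a hypothetical alternative rather than our actual scheme. `case_collisions` finds names that a case-insensitive host would merge, and `case_safe_name` appends a suffix recording which letters are upper case (1) or lower case (0). Stems that differ only by case thus receive different suffixes, and their file names remain distinct even after lower-casing.

```python
from collections import defaultdict


def case_collisions(names):
    """Group file names that a case-insensitive file system would treat as one file."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def case_safe_name(stem, ext=".htm"):
    """Hypothetical naming scheme: encode the letter case of the stem in a suffix.

    Each letter contributes '1' (upper case) or '0' (otherwise); non-letters are skipped.
    """
    mask = "".join("1" if ch.isupper() else "0" for ch in stem if ch.isalpha())
    return f"{stem}_{mask}{ext}"
```

With this sketch, Co and CO become `Co_10.htm` and `CO_11.htm`, and CsI and CSi become `CsI_101.htm` and `CSi_110.htm`; `case_collisions` then reports no clashes for the four names, whereas it flags `Co.htm` and `CO.htm`.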

Keywords: name disambiguation, formula disambiguation, file names, identifiers, web hosting, Windows, Linux, UNIX, case standardization 

References and more on URL case sensitivity
[1] Bin-Blog: www.bin-co.com/blog/2007/10/case-sensitivity-in-urls/.
[2] wiseGEEK: www.wisegeek.com/are-urls-case-sensitive.htm.
[3] Ted Kuik: Case Sensitive URLs. Does capitalization matter? [www.coolnotions.com/Articles/Article_02.htm].
[4] Case-Sensitive URL's: www.infocellar.com/networks/internet/URL-case-sensitive.htm.