The traditional approach to transcribing large passages of text from paper documents was good old-fashioned copy typing. Although the ability to scan documents has been with us for some years, the result of a scan is an image rather than editable text.
This is no longer an issue because Optical Character Recognition is now a mature software technology which can analyse the image to extract editable text with a very high level of accuracy.
With the Microsoft Office Document Scanner, for example, text recognition is automatic, so you end up with the original image together with a text file which can be saved into Word or some other application.
This feature can be found in the Microsoft Office Tools section, under Microsoft Office on the 'All Programs' menu. It is installed as standard with Office 2003 but not with Office 2007, where it needs to be added separately from the Control Panel.
OmniPage is a rather more sophisticated product whose main benefit is that it preserves document layout and produces editable documents which look exactly like the original. This makes it ideal for applications such as reproducing forms to be filled in on the computer.
The software can also improve the appearance of poor-quality originals, recognises a number of languages, and can convert the text into an audio file so that it can be read aloud by the computer.
The recognition of handwriting is still somewhat experimental, and the most successful implementations have been on PDAs (hand-held computers) using special simplified alphabets in which each letter is written individually.
In fact, handwriting recognition is also built into MS-Office; it is best used with a graphics tablet rather than by trying to write with a mouse.
I think that it will be a while before any software can recognise my ‘joined-up’ handwriting.