Improving Access to Text
The IMPACT project (Improving Access to Text), started on 1 January 2008, aims to significantly improve the accessibility of historical printed text. The project will drive innovation in OCR technology and language technology for historical document processing and retrieval. IMPACT will also remove the barriers that stand in the way of the mass digitisation of European cultural heritage by sharing expertise and building digitisation capacity across Europe.
Objectives
- Significantly improve access to historical text by innovating OCR and language technology
- Share expertise and build capacity across Europe
- Ensure that tools and services will be sustained after the end of the project
Why IMPACT?
Digitised material is becoming available too slowly, in quantities that are too small and from too few sources. Issues that users and institutions currently face include:
- Material dated before 1900 is difficult to access in digital form, because state-of-the-art OCR software does not provide satisfactory results for old books, magazines and newspapers.
- Libraries, archives and other content-holding institutions across Europe lack experience and know-how in the process of digitisation, and the historical language barrier forms a further stumbling block.
Together, these issues cause inefficiency and slow down the process of making European cultural heritage available on the Internet.
IMPACT tools
To overcome these barriers to digitisation, IMPACT plans to innovate in the technology for text recognition and text enrichment. A cutting-edge text recognition system will be set up, based on an adaptive model that automatically tunes itself to each new book being digitised. Through the online Collaborative Correction web application linked to this system, volunteers across Europe will be able to help correct OCR results and so improve them further. In addition, IMPACT explores new approaches to image enhancement and segmentation, and to the use of language technology and historical lexica in OCR processing and information retrieval. The IMPACT tools will become available as interoperable web services integrated with a user-friendly platform.
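As a rough illustration of how a historical lexicon can support both OCR post-correction and retrieval, the following Python sketch uses a tiny hand-made lexicon. The lexicon entries, the function names and the similarity cutoff are all assumptions made for this example; it is not the IMPACT implementation, only a minimal sketch of the general idea.

```python
from difflib import get_close_matches

# Hypothetical toy lexicon: historical spelling variant -> modern form.
# A real historical lexicon would be far larger and language-specific.
HISTORICAL_LEXICON = {
    "vniuersity": "university",
    "musick": "music",
    "publick": "public",
    "shew": "show",
}


def correct_ocr_token(token, lexicon=HISTORICAL_LEXICON, cutoff=0.8):
    """Snap a possibly misrecognised token to the closest attested
    historical form, or return it lower-cased if nothing is close enough."""
    token = token.lower()
    matches = get_close_matches(token, list(lexicon), n=1, cutoff=cutoff)
    return matches[0] if matches else token


def normalise(token, lexicon=HISTORICAL_LEXICON):
    """Map a historical variant to its modern form (identity if unknown)."""
    token = token.lower()
    return lexicon.get(token, token)


def expand_query(term, lexicon=HISTORICAL_LEXICON):
    """Return the modern search term followed by every historical
    variant in the lexicon that maps to it."""
    term = term.lower()
    variants = sorted(v for v, modern in lexicon.items() if modern == term)
    return [term] + variants


if __name__ == "__main__":
    # "mufick" mimics a long s misread as "f" by an OCR engine.
    print(correct_ocr_token("mufick"))
    print(normalise("Publick"))
    print(expand_query("music"))
```

In this sketch, correction happens against attested historical forms rather than modern ones, so the original spelling of the text is preserved, while query expansion lets a user searching for a modern word also find its older variants.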
Sharing expertise and best practice
IMPACT will also improve the process of large-scale digitisation by sharing expertise and best practice. To this end, a number of strategic tools are being developed, including a website (www.impact-project.eu), a help desk, decision support tools and a training programme. In addition, a sustainable Centre of Competence will be set up to provide a central service entry point for all libraries, archives and museums involved in the digitisation of textual material.
Extension
In the current second phase of IMPACT (2010-2011), an additional 11 partners (four national libraries, four research centres and three universities) from France, Spain, Poland, Bulgaria, Slovenia and the Czech Republic have been added to the original consortium of 15 partners. The new partners will work on building historical lexica for their languages, providing datasets and disseminating project results in Southern and Eastern Europe.
Organisation
IMPACT is coordinated by the National Library of the Netherlands (KB) and brings together 26 national and regional libraries, research institutions and commercial suppliers from Europe, Israel and Russia. IMPACT is funded under the Seventh Framework Programme of the European Commission (FP7) and runs for four years (2008-2011).
More information is available on the IMPACT project website.