TY  - CONF
T1  - OCR Alternatives for Electronic Publishing of Digitised Documents
T2  - From Author to Reader: Challenges for the Digital Content Chain: Proceedings of the 9th ICCC International Conference on Electronic Publishing
Y1  - 2005
A1  - Stefan Pletschacher
AB  - This paper describes a general approach on how digitised documents may be automatically prepared for being stored and processed on various digital platforms. The focus is on documents that are not suitable for optical character recognition (OCR) methods but provide regular structures in the form of text-like blocks. By extracting a document immanent alphabet, preserving the graphical representations by means of vectorisation and based on these steps encoding the original document, it is possible to gather benefits of encoded text without the effort and the possible mistakes that arise from recognition methods. The use of the Extensible Markup Language (XML) for structural descriptions and Scalable Vector Graphics (SVG) for graphical representations enables a seamless integration into style sheet based output workflows for producing system specific layouts.
JA  - From Author to Reader: Challenges for the Digital Content Chain: Proceedings of the 9th ICCC International Conference on Electronic Publishing
T3  - ELPUB
PB  - Peeters Publishing Leuven
CY  - Leuven-Heverlee, Belgium
N1  - Conference held at Katholieke Universiteit Leuven
J1  - ELPUB2005
ID  - oai:elpub.id:215elpub2005
ER  - 