Chapter 13. Converting into TEI format

Unfortunately, many translation database and wordlist projects use their own format, so each one requires a specialized converter. Some adhere to standards, but most use a format of their own. A problem with standards is that there are too many to choose from. One aim of FreeDict is to encourage standardisation of dictionary data on the TEI format. We also hope that upstream projects (i.e. the projects where the actual dictionary data comes from) will make their data available in TEI XML, or even use TEI XML directly as their primary data format.

Technical notes on the Ergane databases

Ergane 5.0, which was the source for the first FreeDict databases, had an export function. Back then it was easy to take its output and convert it into TEI. Ergane 8.0 has since been released without any export function. Then as now, Ergane stores its data in Microsoft Access databases. With knowledge of the structure of the Ergane databases it is possible to implement a new export function.

At the heart of Ergane's approach to dictionary encoding is an index of word meanings. The meanings of every word are encoded as numbers, and the word meanings of different languages are mapped to the same index numbers. The translations of the words of language la1 into language la2 can therefore be found with this SQL query:

SELECT * FROM la1, la2 WHERE la1.EspKey = la2.EspKey
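
The join can be tried out without Access. The following is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for the Access files. The sample rows and meaning numbers are invented for illustration, and only the EspKey and XEntry columns are modelled; real Ergane tables have more columns.

```python
import sqlite3

# In-memory SQLite stands in for the Access databases (an assumption
# made for this sketch; Ergane itself uses Microsoft Access).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE la1 (EspKey INTEGER, XEntry TEXT);
    CREATE TABLE la2 (EspKey INTEGER, XEntry TEXT);
""")

# Invented sample rows: (meaning number, headword).
conn.executemany("INSERT INTO la1 VALUES (?, ?)",
                 [(1, "dog"), (2, "house"), (3, "tree")])
conn.executemany("INSERT INTO la2 VALUES (?, ?)",
                 [(1, "Hund"), (2, "Haus"), (4, "Wasser")])


def translations(conn):
    """Return (la1 headword, la2 headword) pairs sharing a meaning number."""
    query = """
        SELECT la1.XEntry, la2.XEntry
        FROM la1, la2
        WHERE la1.EspKey = la2.EspKey
        ORDER BY la1.XEntry
    """
    return conn.execute(query).fetchall()


pairs = translations(conn)
for source, target in pairs:
    print(source, "->", target)
```

Words without a counterpart under the same meaning number (here "tree" and "Wasser") drop out of the result, because the query is an inner join.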

Ergane states that it uses Esperanto as an intermediate language, but that is not really necessary. The Ergane database merely contains some special tables, and they are not specific to Esperanto.

Table 13.1. The 'Woordenboek' table structure

data type   column name   description
longint     keyno         primary key
longint     EspKey        meaning number
Text(510)   XEntry        orthography (the headword), encoding?
byte        Type          Part of Speech. For codes see conversion script.
char(2)     GType         Genus type. code?
byte        FType         Flexion type. codes?
Memo        Omschr        transliteration?
Byte        Freq
longint     Volgorde      sorting key
longint     Opm
longint     Opm2
Text(240)   Sortkey
Text(170)   Uitspraak     pronunciation?

Some Woordenboek tables of language databases have extra columns.
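
Once a headword and its translations have been read from the XEntry columns, an export function has to write them out as TEI. The following Python sketch shows one way to build a single entry. The helper name make_entry is hypothetical, and the element layout (entry, form, orth, sense, cit with type="trans", quote) is an assumption modelled on the TEI P5 dictionaries module, not something this chapter prescribes.

```python
import xml.etree.ElementTree as ET


def make_entry(headword, translations):
    """Build one TEI-style <entry> for a headword and its translations.

    The element layout is an assumption based on the TEI P5
    dictionaries module; adjust it to the target schema.
    """
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form")
    ET.SubElement(form, "orth").text = headword
    sense = ET.SubElement(entry, "sense")
    for translation in translations:
        cit = ET.SubElement(sense, "cit", {"type": "trans"})
        ET.SubElement(cit, "quote").text = translation
    return entry


# Invented sample data: one headword with one translation.
entry = make_entry("dog", ["Hund"])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A complete export would also need the TEI header and the surrounding document structure, and could map the Type and GType codes to grammatical information once their meaning is known.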