Unfortunately, many translation database and wordlist projects use their own format, so each one requires a specialized converter. Some adhere to standards, but most use a format of their own. A problem with standards is that there are too many to choose from. One aim of FreeDict is to move towards standardisation of dictionary data on the TEI format. We also hope that upstream projects (i.e. the projects where the actual dictionary data comes from) will make their data available in TEI XML, or even use TEI XML directly as their primary data format.
ding2tei.pl - conversion of the ding database (English/German) into TEI format
hd2tei.pl - conversion of the "hd" format (which dictfmt also understands) into TEI format
tab2tei.pl - conversion of tab delimited plain text file into TEI format
dict2tei.py - conversion of an already formatted dictd database into TEI format
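To illustrate the kind of work these converters do, here is a minimal Python sketch that turns one tab-delimited line into a TEI-style entry. It is not the code of tab2tei.pl. The function names are hypothetical, and the element layout (`entry`, `form`/`orth`, `sense`/`cit`/`quote`) is an assumption about how a translation pair could be encoded; a real converter would also have to write the TEI header and handle multiple translations per headword.

```python
import xml.etree.ElementTree as ET


def tab_line_to_entry(line: str) -> ET.Element:
    """Build an <entry> element from one 'headword<TAB>translation' line.

    The element layout used here is an assumption for illustration only.
    """
    headword, translation = line.rstrip("\n").split("\t", 1)
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form")
    ET.SubElement(form, "orth").text = headword
    sense = ET.SubElement(entry, "sense")
    cit = ET.SubElement(sense, "cit", {"type": "trans"})
    ET.SubElement(cit, "quote").text = translation
    return entry


def entry_to_xml(line: str) -> str:
    """Serialize the entry for one input line as an XML string."""
    return ET.tostring(tab_line_to_entry(line), encoding="unicode")


if __name__ == "__main__":
    print(entry_to_xml("Haus\thouse\n"))
```

Building the tree with an XML library rather than by string concatenation means that characters such as `&` or `<` in the source data are escaped automatically.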
Ergane 5.0, which was the source for the first FreeDict databases, had an export function. Back then it was easy to take its output and convert it into TEI. By now Ergane 8.0 is out, without any export function. Ergane used Microsoft Access databases then and still does. Using knowledge about the structure of the Ergane databases, it is possible to implement a new export function.
At the heart of Ergane's approach to dictionary encoding is an
index of word meanings. The meanings of every word are encoded as
numbers. The word meanings of different languages are mapped to the
same index numbers. It is possible to find the translations of the
words of language la1 in language
la2 by this SQL query:
SELECT * FROM la1, la2 WHERE la1.EspKey = la2.EspKey
Ergane states that it uses Esperanto as an intermediate language, but that is not really necessary: the meaning numbers alone link the languages. The Ergane database just contains some special tables, but they are not specific to Esperanto.
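The join on meaning numbers can be tried out without an Access database. The following sketch uses Python's built-in `sqlite3` module as a stand-in. The tables `la1` and `la2` are reduced to three columns, and the sample words are invented for illustration; only the column names `EspKey` and `XEntry` come from the Ergane table structure.

```python
import sqlite3

# In-memory SQLite database standing in for the Ergane Access files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE la1 (keyno INTEGER PRIMARY KEY, EspKey INTEGER, XEntry TEXT);
CREATE TABLE la2 (keyno INTEGER PRIMARY KEY, EspKey INTEGER, XEntry TEXT);
-- Invented sample data: EspKey is the shared meaning number.
INSERT INTO la1 (EspKey, XEntry)
    VALUES (1, 'house'), (1, 'home'), (2, 'dog'), (3, 'tree');
INSERT INTO la2 (EspKey, XEntry)
    VALUES (1, 'Haus'), (2, 'Hund'), (4, 'Katze');
""")


def translations(connection):
    """Return (la1 word, la2 word) pairs that share a meaning number."""
    cursor = connection.execute(
        "SELECT la1.XEntry, la2.XEntry FROM la1, la2 "
        "WHERE la1.EspKey = la2.EspKey "
        "ORDER BY la1.XEntry, la2.XEntry"
    )
    return cursor.fetchall()


pairs = translations(conn)
for source_word, target_word in pairs:
    print(source_word, "->", target_word)
```

Words whose meaning number has no counterpart in the other table ('tree' and 'Katze' in this sample) simply drop out of the result, and several words sharing one meaning number each get their own pair.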
Table 13.1. The 'Woordenboek' table structure
| data type | column name | description |
|---|---|---|
| longint | keyno | primary key |
| longint | EspKey | meaning number |
| Text(510) | XEntry | orthography (the headword), encoding? |
| byte | Type | Part of Speech. For codes see conversion script. |
| char(2) | GType | Grammatical gender (genus) type. code? |
| byte | FType | Inflection (flexion) type. codes? |
| Memo | Omschr | transliteration? |
| byte | Freq | |
| longint | Volgorde | sorting key |
| longint | Opm | |
| longint | Opm2 | |
| Text(240) | Sortkey | |
| Text(170) | Uitspraak | pronunciation? |
Some Woordenboek tables of language databases have extra columns.
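Because the set of columns can vary, a new export function may want to keep the documented columns apart from any additional ones. The following Python sketch shows one way to do that. The class `WoordenboekRow` and the function `row_from_mapping` are hypothetical names, and the choice of which fields are required is an assumption; the field names follow the table above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional

# Columns documented in the table above; anything else is kept as "extra".
KNOWN_COLUMNS = (
    "keyno", "EspKey", "XEntry", "Type", "GType", "FType", "Omschr",
    "Freq", "Volgorde", "Opm", "Opm2", "Sortkey", "Uitspraak",
)


@dataclass
class WoordenboekRow:
    # Assumed to be always present: key, meaning number and headword.
    keyno: int
    EspKey: int
    XEntry: str
    # Remaining documented columns are optional in this sketch.
    Type: Optional[int] = None
    GType: Optional[str] = None
    FType: Optional[int] = None
    Omschr: Optional[str] = None
    Freq: Optional[int] = None
    Volgorde: Optional[int] = None
    Opm: Optional[int] = None
    Opm2: Optional[int] = None
    Sortkey: Optional[str] = None
    Uitspraak: Optional[str] = None
    # Columns that only some language databases have.
    extra: Dict[str, Any] = field(default_factory=dict)


def row_from_mapping(raw: Mapping[str, Any]) -> WoordenboekRow:
    """Split a raw column-name-to-value mapping into known and extra columns."""
    known = {name: raw[name] for name in KNOWN_COLUMNS if name in raw}
    extra = {name: value for name, value in raw.items()
             if name not in KNOWN_COLUMNS}
    return WoordenboekRow(**known, extra=extra)
```

A row read from any database driver that yields column names with values can be passed in as a dictionary; unknown columns end up in `extra` instead of causing an error.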