Abstract
This section explains how to build most of a TEI-format dictionary from a file laid out the same way as a dictd-compliant .dict file. Beware that the .index file is ignored!
The approach relies on your dictionary being laid out in a very simple format. You will still have to construct the TEI header yourself, but most of the work for the actual content of the dictionary will be done for you.
If you have any dictionaries in dictd database format installed, you can open one of the dictionary-name.dict.dz files to have a look at the format and contents. To decompress a .dz file you need either the dictunzip tool that comes with dictd, or gunzip. The dictzip format extends gzip compression with extra header data, so gzip can also decompress such a file; it simply discards the extra header data.
Beyond the dictd header section you will notice that the file is a text file with a simple and predictable format.
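If you would rather peek into a compressed dictionary from a script, the following minimal sketch may help. It assumes Python's standard gzip module, which skips the optional extra header field that dictzip adds, and it assumes the file is UTF-8 text. The function name read_dict_head and the file name in the usage note are made up for illustration.

```python
import gzip


def read_dict_head(path, max_lines=20):
    """Return the first max_lines text lines of a .dict.dz (or plain gzip) file."""
    lines = []
    # errors="replace" keeps the sketch usable if the file is not valid UTF-8.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            lines.append(line.rstrip("\n"))
            if len(lines) >= max_lines:
                break
    return lines
```

Calling read_dict_head("freedict-eng-lat.dict.dz") should then show the beginning of the file, provided such a file exists in the current directory.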
When a dictd dictionary is built using dictfmt,
two files are created. The dictionary-name.dict
file, the one we are interested in here, contains the data that is
presented to the user when she asks for the translation or definition of
a word. The second file, dictionary-name.index, is a
listing of the position and length of the definitions in the .dict file.
Together they form an indexed database of headwords and definitions.
Here is a <comment/> commented snippet
from the freedict-eng-lat.dict file.
Example 7.1. A freedict-eng-lat.dict snippet
00-database-info
<comment/> A formatted string dictd knows about
    3. Apr. 2000 Database was converted to TEI format and checked into CVS
    9.Jan.2000
    Phonetics added (H.Ey) - machine generated from MBRODICT
    ( http://tcts.fpms.ac.be/synthesis/mbrdico )
    1.Jan 2000
    This Database was generated from ergane (http://www.travlang.com).- Thanks!
    Copyright (C) 1999 Horst Eyermann (Horst@freedict.de)
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.
    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
<comment/> A space (not TAB) indented info block

00-database-short
    English-Latin Freedict Dictionary
<comment/> The dictionary name, usually different than the file name

00-database-url
    http://www.freedict.org
<comment/> The website of the origin of this dictionary

ABC /eibiːsiː/
    abecedarium

Abyssinian /əbisiniən/
    Æthiops

Academy /əkædəmiː/
    Academia

Achaea /ətʃiə/
    Achaia

Achaia /ətʃiə/
    Achaia

Acheron /ətʃerən/
    Acheron

Actium /ækʃaim/
    Actium

Adam /ædəm/
    Adam

Adriatic /ədriætik/
    Hadria

Adriatic Sea /ədriætiksiə/
    Hadria

Aeneas /əniːz/
    Æneas

Aeolus /iːələs/
    Æolus

<comment/> Snipped

zither /ziðər/
    cithara

zone /zoun/
    zona

<comment/> Notice the empty lines between entries
So an entry has this format: a blank line precedes it, the headword starts at the beginning of a line (column 0), and the translation starts on the next line, indented by at least one space.
Like so:

Headword
    Translation

Headword2
    Translation2
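To make the layout rule concrete, here is a minimal sketch of a reader for this format. It is not how dict2tei.py works internally; the function name parse_entries is hypothetical, and the sketch assumes that any line starting with a space or a tab belongs to the preceding headword.

```python
def parse_entries(text):
    """Return a list of (headword, translation) pairs from the simple format."""
    entries = []
    headword = None
    body = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate entries
        if line[0] in " \t":
            # Indented line: part of the current entry's translation.
            if headword is not None:
                body.append(line.strip())
        else:
            # Column 0: a new headword starts, so store the previous entry.
            if headword is not None:
                entries.append((headword, "\n".join(body)))
            headword = line.rstrip()
            body = []
    if headword is not None:
        entries.append((headword, "\n".join(body)))
    return entries
```

Fed the two-entry layout shown above, the function returns one pair per headword, with a multi-line translation joined by newlines.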
Example 7.2. The dictd .index format
The corresponding .index file is built by the dictfmt tool and looks like this:
00-database-info	Q	QM
00-database-short	Qd	3
00-database-url	RV	q
abacus	BZv	3
abbess	Ban	e
abbey	BbG	b
abbot	Bbi	a
abbreviate	Bb9	BB
ABC	SA	i
abdicate	Bc/	q
abdication	Bdq	q
abdomen	BeV	BW
abductor	Bfs	k
aberration	BgR	x
abet	BhD	8
abhor	BiA	s
When running the dictd (or Serpento) dictionary server, these files are used for matching queries with headwords.
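The second and third columns of the index are the position and length mentioned earlier, written in a compact base-64 number notation. The sketch below decodes them, assuming the digit alphabet A-Z, a-z, 0-9, +, / with the most significant digit first, and assuming exactly three tab-separated columns per line. The names decode_number and parse_index_line are made up for this illustration.

```python
# Digit alphabet assumed for the index numbers: "A" is 0, "/" is 63.
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_number(text):
    """Decode a base-64 number, most significant digit first."""
    value = 0
    for char in text:
        value = value * 64 + B64_ALPHABET.index(char)
    return value


def parse_index_line(line):
    """Split one index line into (headword, offset, length)."""
    headword, offset, length = line.rstrip("\n").split("\t")
    return headword, decode_number(offset), decode_number(length)
```

Under these assumptions the first line of the example decodes to offset 16 and length 1036, which matches the long info block at the start of the .dict file.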
You may already have your headwords and their translations written in the simple format described above, or you may need to convert a spreadsheet or some other document into it. Because there are so many possible source formats, we cannot describe every conversion here.
Alternatively, you may have an existing dictd dictionary file, or you may be starting from scratch. If you are starting from scratch, we recommend using a template, as demonstrated later. If you have a lot of lexicographic, etymological or other information to add to your dictionary, we strongly suggest using a template or a full-fledged XML editor.
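As one illustration of the spreadsheet case, the sketch below converts CSV text into the simple format. It assumes a spreadsheet exported as CSV with the headword in the first column and the translation in the second; your own data may well be organised differently. The function name csv_to_simple_format is hypothetical.

```python
import csv
import io


def csv_to_simple_format(csv_text, indent="    "):
    """Turn two-column CSV text (headword, translation) into the simple format."""
    chunks = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip():
            continue  # skip empty or incomplete rows
        headword = row[0].strip()
        translation = row[1].strip()
        # Blank line above, headword in column 0, translation indented with spaces.
        chunks.append(f"\n{headword}\n{indent}{translation}\n")
    return "".join(chunks)
```

Writing the returned string to a file gives you input in the layout described earlier; any additional columns in the CSV are ignored by this sketch.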
Download the dict2tei.py Python script from the tools package on the FreeDict servers at SourceForge. Follow the instructions included in the package to install it and run it on your file.
All you need to do is run something like the following, and the rest should happen automatically:
dict2tei.py -f your-dict-format.dict -o same-working-name
Now, hopefully, all you have to do is mark up any extra entries and add the TEI header information. Please see the Writing TEI and Installing TEI sections.
Sometimes a match lists headwords that yield no entry when they are looked up. In that case it is likely that the index is sorted incorrectly. For a word to be found, the sort order of the index and the order in which the dict server searches for entries have to be exactly the same.
If they differ, the index can be sorted again with a command such as:
LC_ALL=C sort -t $'\t' -k1,1 -bdf broken.index >working.index
Note the LC_ALL=C: leaving it out makes sort use the collation rules of your current locale, which can produce a broken index.
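Before re-sorting, it can be useful to find out where an index is out of order. The sketch below approximates the comparison key that the sort options above select in the C locale: only the first tab-separated field, leading blanks ignored (-b), only ASCII letters, digits and spaces considered (-d), and lower case folded to upper case (-f). Whether this matches the dict server's own comparison in every detail is an assumption, and the names sort_key and first_unsorted are made up for illustration.

```python
def sort_key(index_line):
    """Approximate the key of `sort -t TAB -k1,1 -bdf` in the C locale."""
    headword = index_line.split(b"\t", 1)[0].lstrip(b" ")
    # Keep spaces and ASCII alphanumerics only, then fold case.
    kept = bytes(b for b in headword if b == 0x20 or (b < 128 and chr(b).isalnum()))
    return kept.upper()


def first_unsorted(index_lines):
    """Return the position of the first line that sorts before its predecessor, or None."""
    previous = None
    for position, line in enumerate(index_lines):
        key = sort_key(line)
        if previous is not None and key < previous:
            return position
        previous = key
    return None
```

The lines are handled as bytes, for example read with open("broken.index", "rb"), because the C locale compares raw byte values rather than characters.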