<?xml version='1.0' encoding='utf-8' ?>
<!--
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
               "/usr/share/sgml/docbook/dtd/xml/4.2/docbookx.dtd" [
<!ENTITY fdictUrl "http://www.freedict.org">
<!ENTITY fdictMail "freedict-beta@lists.sourceforge.net">
<!ENTITY MichEmail "bunk@imn.htwk-leipzig.de">
<!ENTITY PeteEmail "petergozz@users.sf.net">
]>
-->
<chapter id="dictd">
  <title>The Dictd Approach</title>
  <abstract>
    <para>This section explains how to build most of the TEI format dictionary
      from a file formatted the same way as a dictd compliant .dict file.
      Beware that the .index file is ignored!</para>
    <para>It relies on you laying out your dictionary with a very simple
      format. You will still have to construct the TEI header, but most of the
      work will be done for the actual content of the dictionary.</para>
  </abstract>

  <section id="dictd-format">
    <title>The Dictd Database Format</title> <para>If you have any dictionaries
      in dictd database format installed, you may open one of the
      <filename>dictionary-name.dict.dz</filename> files to have a look at the
      format and contents. You will need the tool <command>dictunzip</command>
      that comes with dictd or <command>gunzip</command> to uncompress a .dz
      file. The dictzip compression extends the gzip compression with special
      data, so the uncompression can be done by gzip, where the header data 
      is discarded.</para>
    <para>Beyond the dictd header section you will notice that the file is a
      text file with a simple and predictable format.</para>
    <para>When a dictd dictionary is built using <command>dictfmt</command>,
      two files are created. The <filename>dictionary-name.dict</filename>
      file, the one we are interested in here, contains the data that is
      presented to the user when she asks for the translation or definition of
      a word. The second file, <filename>dictionary-name.index</filename>, is a
      listing of the position and length of the definitions in the .dict file.
      Together they form an indexed database of headwords and definitions.
      <indexterm><primary>headword</primary></indexterm>
      <indexterm><primary>definition</primary></indexterm>
      <indexterm><primary>translation</primary></indexterm></para>

    <para>Here is a <emphasis>&lt;comment/&gt;</emphasis> commented snippet
	from the <filename>freedict-eng-lat.dict</filename> file.</para>
    <para><example id="eng-lat">
      <title>A freedict-eng-lat.dict snippet</title>
      <screen>
	  00-database-info     <lineannotation>&lt;comment/&gt; A formatted string dictd knows about</lineannotation>
   3. Apr. 2000 Database was converted to TEI format and checked
   into CVS 9.Jan.2000Phonetics added (H.Ey) - machine generated
   from MBRODICT( http://tcts.fpms.ac.be/synthesis/mbrdico
   )1.Jan 2000This Database was generated from ergane
   (http://www.travlang.com).- Thanks!Copyright (C) 1999 Horst
   Eyermann (Horst@freedict.de)This program is free software;
   you can redistribute it and/or modify it under the terms of
   the GNU General Public License as published bathe Free
   Software Foundation; either version 2 of the License, or(at
   your option) any later version. This program is distributed
   in the hope that it will be useful,but WITHOUT ANY WARRANTY;
   without even the implied warranty of MERCHANTABILITY or
   FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
   License for more details.You should have received a copy of
   the GNU General Public License along with this program; if
   not, write to the Free Software Foundation, Inc., 59 Temple
   Place - Suite 330, Boston, MA 02111-1307, USA.
<lineannotation>&lt;comment/&gt; A space (not TAB) indented info block</lineannotation>

00-database-short
   English-Latin Freedict Dictionary
<lineannotation>&lt;comment/&gt; The dictionary name, usually different than the file name</lineannotation>
00-database-url
   http://www.freedict.org
   <lineannotation>&lt;comment/&gt; The website of the origin of this dictionary</lineannotation>

ABC /eibiːsiː/
     abecedarium

Abyssinian /əbisiniən/
     &#195;thiops

Academy /əkædəmiː/
     Academia

Achaea /ətʃiə/
     Achaia

Achaia  /ətʃiə/
     Achaia

Acheron /ətʃerən/
     Acheron
     
Actium  /ækʃaim/
     Actium

Adam  /ædəm/
     Adam

Adriatic  /ədriætik/
     Hadria

Adriatic Sea  /ədriætiksiə/
     Hadria

Aeneas  /əniːz/
     &#195;neas

Aeolus  /iːələs/
     &#195;olus

	  <lineannotation>&lt;comment/&gt; Snipped</lineannotation>

zither /ziðər/
     cithara

zone /zoun/
     zona

	  <lineannotation>&lt;comment/&gt; Notice the empty lines between entries</lineannotation>
	</screen>
	<para>So, an entry has this format: Blank line above. Headword
          starts on the beginning of the line (column 0), the translation
          starts on the next line that is indented more than column 0.</para>

	<para>Like so:
	  <screen>
Headword
    Translation

Headword2
    Translation2
	  </screen>
	</para>
      </example>
    </para>

    <para>
      <example id="eng-lat-idx-eg">
	<title>The dictd .index format</title>
	<para>The corresponding .index file is built by the
	  <application>dictfmt</application> tool and looks like this:</para>
	<para id="eng-lat-index">
	  <screen>
00-database-info        Q       QM
00-database-short       Qd      3
00-database-url RV      q
abacus  BZv     3
abbess  Ban     e
abbey   BbG     b
abbot   Bbi     a
abbreviate      Bb9     BB
ABC     SA      i
abdicate        Bc/     q
abdication      Bdq     q
abdomen BeV     BW
abductor        Bfs     k
aberration      BgR     x
abet    BhD     8
abhor   BiA     s</screen>
	</para>
      </example>
    </para>
    <para>When running the dictd (or Serpento) dictionary server, these files
      are used for matching queries with headwords.
      <indexterm>
	<primary>database</primary>
	<secondary>dictd</secondary>
      </indexterm>
      <indexterm>
	<primary>00-database-info</primary>
      </indexterm>
      <indexterm>
	<primary>00-database-short </primary>
      </indexterm>
      <indexterm>
	<primary>00-database-url</primary>
      </indexterm>
    </para>
  </section>

  <section id="howto-process">
    <title>Converting dictd database format files into TEI</title>
    <para>Somehow you have your headwords and related translations written in
      the <link linkend="eng-lat">simple format</link> described above. You
      might need to convert a spread sheet or some other document into this
      format. As there are many possibilities we can not give you a description
      to do that.</para>
    <para>Otherwise you may have an existing dictd dictionary file
      or finally you may be starting from scratch. In that
      case we recommend you to use a template as demonstrated later. If you
      have much lexicographic, etymological or other information to add to
      your dictionary, we strongly suggest you to use a template or a fully
      fledged XML editor.</para>
    <para>Download the <filename>dict2tei.py</filename> python script
      from the tools package at the FreeDict servers at Sourceforge.</para>
    <para>Follow the instructions included in the package to install and
      run with your file.</para>
    <para>All you need to do is something like: <command>dict2tei.py
	-f your-dict-format.dict -o same-working-name</command> and
      the rest should happen automatically.</para>
    <para>Now hopefully all you have to do is markup any extra
      entries and add the TEI header information. Please see the
      <link linkend="write-tei">Writing TEI</link> and
      <link linkend="installTeiDTD">Installing TEI</link> sections.</para>
  </section>

  <section>
    <title>Sorting a .index file</title>
    <para>Sometimes a match lists headwords that yield no entry when they are
      looked up. In such case, it is likely that the index is sorted
      incorrectly. For a word to be looked up, the way the index is sorted and
      the way the dict server looks for entries have to be exactly the
      same.</para>
    <para>In such case it can be sorted again, using a command such as:</para>
    <para><command>LC_ALL=C sort -t $'\t' -k1,1 -bdf broken.index
	>working.index</command></para>
    <para>Note the <literal>LC_ALL=C</literal>: Leaving it out can produce a
      broken index.</para>
  </section>

</chapter>


