hns-sigml-convert-2011-04-12
----------------------------

For general information see the notes below for the previous
releases, dated 2011-01-19 and 2010-12-07.

This update has an improved sigmlinlib.jar, with better
HamNoSys 4 coverage.


hns-sigml-convert-2011-01-19
----------------------------

For general information see the notes below for the previous
release, dated 2010-12-07.

This update fixes the "missing initial symbol" bug in
gsigml2hnsu.command, and has a new version of sigmlinlib.jar with
various improvements to the conversions, in both directions,
between HNS and Gestural SiGML.


hns-sigml-convert-2010-12-07
----------------------------

These HNS/SiGML conversion routines are limited in various ways,
including:

- Quite inefficient -- assuming smallish input datasets.
- Very limited error checking/handling.
- Input/output formats could usefully be generalised.
- In particular, everything is expected to be UTF-8.
  (Although the code has some support for other encodings,
  and could be adapted to handle them if necessary.)
- Except in conversion to and from gestural SiGML, there is
  no proper XML processing, just naive text processing.
- Wrapper scripts (C-shell) are provided for Unix/Mac OS X
  only -- no Windows batch scripts so far.
- Very little documentation (although there's Javadoc for
  the source code).
- So far, the HNS-processing part doesn't deal with non-manual
  HNS.

How to use -- short form
------------------------
Either put the hsc-bin/ folder on the system's execution path,
or copy its contents into a directory that is already on that
path.  The Python scripts in the hsc-py/ folder should be copied
to the directory containing the HamNoSys files being processed.
See the descriptions of these folders below for more details.
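
As a quick check that the installation step worked, a short Python
sketch along the following lines could report which of the wrapper
scripts are visible on the execution path.  It is not part of the
distribution; the list of names is simply taken from the examples
in this file, and missing_commands is a hypothetical helper.

```python
import shutil

# Script names taken from the examples in this README.
COMMANDS = [
    "hnsu2sigml.command",
    "sigml2gsigml.command",
    "sigmldoc2signdocs.command",
    "gsigml2hnsu.command",
]

def missing_commands(names=COMMANDS):
    """Return the names that cannot be found on the execution path."""
    return [name for name in names if shutil.which(name) is None]

if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not on the execution path:", ", ".join(missing))
    else:
        print("All conversion scripts were found.")
```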

Examples of use
---------------

Assume the current directory contains a file, LSF_IdGlossHNS-tsv.txt,
containing HamNoSys data in TSV (tab-separated values) format,
similar to that recently provided by Annelies for the LSF lexicon.
And assume that this directory also contains copies of the Python
scripts from hsc-py/.


mergeCols.py
  Preprocesses the TSV file, attaching the ID of each sign to its
  gloss, and putting the HNS string in the first column, with the
  result in the file LSF_IdGlossHns-hnsu.txt .

(Alternatively, the permCols.py script could be used to rearrange
the columns of the original file, without merging the sign IDs
into their glosses.)
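
The merging step can be pictured with the following sketch.  It is
not the mergeCols.py source: it assumes (hypothetically) that the
input columns are ID, gloss and HNS in that order, and that the ID
is joined to the gloss with an underscore.  The real script asks
for its parameters interactively.

```python
def strip_quotes(entry):
    """Remove one pair of enclosing double quotes, if present."""
    if len(entry) >= 2 and entry[0] == '"' and entry[-1] == '"':
        return entry[1:-1]
    return entry

def merge_line(line, sep="_"):
    """Turn 'ID<TAB>gloss<TAB>HNS' into 'HNS<TAB>ID<sep>gloss'.

    The column order and the separator are assumptions made for
    this illustration only.
    """
    entries = line.rstrip("\n").split("\t")
    sign_id, gloss, hns = [strip_quotes(e) for e in entries[:3]]
    return hns + "\t" + sign_id + sep + gloss
```

A line with fewer than three columns raises ValueError here; a real
tool would probably want to report the offending line number too.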

% hnsu2sigml.command < LSF_IdGlossHns-hnsu.txt > LSF_IdGlossHns-h.sigml
  # Converts the HNS (Unicode) file to HNS SiGML.


% sigml2gsigml.command < LSF_IdGlossHns-h.sigml > LSF_IdGlossHns.sigml
  # Converts the HNS SiGML file to gestural SiGML.

% sigmldoc2signdocs.command LSF_IdGlossHns.sigml
  # Splits the SiGML file into a set of SiGML files, one per sign, in
  # a local directory called LSF_IdGlossHns-signs/.  The name of each
  # file is an ASCII-fied form of its gloss, prefixed with its
  # index number with respect to the original sequence.
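
The exact naming scheme used by sigmldoc2signdocs.command is not
described here.  As a rough illustration only, one plausible way
to build an ASCII-fied, index-prefixed file name in Python is
shown below; the zero-padded width, the underscore separator and
the replacement rules are all assumptions.

```python
import unicodedata

def asciify(gloss):
    """Reduce a gloss to ASCII letters, digits, '-' and '_'."""
    # Split accented letters into base letter + combining mark,
    # then drop everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", gloss)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace remaining awkward characters (spaces etc.) by '_'.
    return "".join(c if c.isalnum() or c in "-_" else "_" for c in ascii_only)

def sign_file_name(index, gloss, width=4):
    """Build e.g. '0007_etre_la.sigml' (hypothetical layout)."""
    return f"{index:0{width}d}_{asciify(gloss)}.sigml"
```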

stripIndices.py
  Can be used to rename all the files generated in the previous
  example by removing their index number prefixes.
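
The renaming can be sketched as follows.  This is not the
stripIndices.py source; it assumes that the index prefix is a run
of digits followed by "_" or "-", which may not match the names
actually generated.

```python
import os
import re

# Assumed prefix shape: digits followed by '_' or '-'.
INDEX_PREFIX = re.compile(r"^\d+[_-]")

def strip_index(name):
    """Return name without its leading index prefix, if it has one."""
    return INDEX_PREFIX.sub("", name, count=1)

def strip_indices(directory):
    """Rename the .sigml files in directory; return (old, new) pairs."""
    renamed = []
    for old in sorted(os.listdir(directory)):
        new = strip_index(old)
        if not old.endswith(".sigml") or new == old:
            continue
        target = os.path.join(directory, new)
        if os.path.exists(target):
            # Two signs share a gloss: leave the later file indexed.
            continue
        os.rename(os.path.join(directory, old), target)
        renamed.append((old, new))
    return renamed
```

Skipping a rename when the target already exists avoids silently
overwriting one sign with another that has the same gloss.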

% gsigml2hnsu.command < LSF_IdGlossHns.sigml > LSF_IdGlossHns-TEST-hnsu.txt
  # Converts the gestural SiGML file to a tab-separated HNS (Unicode)
  # file.  The result should be similar, modulo the ordering of
  # some HNS symbol pairs and one or two other variations, to the
  # original HNS file.
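
To get a feel for how close the round trip is, the two files can
be compared line by line.  The sketch below is only an
approximation of "similar modulo ordering": it treats two HNS
strings as reordered when they contain the same symbols the same
number of times, and assumes that both files list the signs in the
same order with the HNS in column 0.

```python
from collections import Counter

def compare_hns(original, round_tripped):
    """Classify a pair of HNS strings."""
    if original == round_tripped:
        return "identical"
    if Counter(original) == Counter(round_tripped):
        return "reordered"
    return "different"

def compare_files(path_a, path_b):
    """Tally the outcome for each pair of corresponding lines."""
    tally = Counter()
    with open(path_a, encoding="utf-8") as a, open(path_b, encoding="utf-8") as b:
        for line_a, line_b in zip(a, b):
            hns_a = line_a.rstrip("\n").split("\t")[0]
            hns_b = line_b.rstrip("\n").split("\t")[0]
            tally[compare_hns(hns_a, hns_b)] += 1
    return tally
```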

The commands

    symhns2hnsu.command
    hnsu2symhns.command

can be used to convert between symbolic HNS and HamNoSys Unicode strings.
The command

    sigml2gsigml.command -d SIGML_DIR

will convert a directory of SiGML files to gestural SiGML
with the results in a new directory called SIGML_DIR-g .


Manifest
--------

hsc-py/
-------
This directory contains a few auxiliary Python scripts for
processing TSV (tab-separated values) files:

mergeCols.py    Merges two or more columns from a TSV file into
                a single column, and strips enclosing quotes
                from individual entries.
permCols.py     Permutes the columns of a TSV file, and strips
                enclosing quotes from individual entries.
stripIndices.py Renames the SiGML files in a given directory by
                removing initial index numbers from their names.

These scripts are intended to be copied into the directory
containing the files to which they are to be applied.  They can
be invoked from the command line, or by double-clicking from the
desktop.  They obtain any necessary parameters interactively from
the console, and are short and simple enough to be easily adapted
to meet local needs.

hsc-bin/
--------
This directory contains some Java jars, including sigmlinlib.jar,
plus some (C) shell scripts, most of them acting as filters.
To run these scripts, either put this directory on the system's
execution path, or copy its contents to a directory already on
the execution path.

Each script typically deals with a sequence of signs in one
of these formats:

hnsu    A string of HamNoSys Unicode characters.
symhns  A comma-separated list of "ham..." symbol names.
hsigml  HNS SiGML, effectively symhns wrapped up in XML tags.
gsigml  Gestural SiGML, the form used as input to Animgen.
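
To make the relation between symhns and hsigml concrete, here is a
small Python sketch.  It only illustrates the idea of one empty
XML element per symbol name; the enclosing elements of a real HNS
SiGML document are not shown, and the function names are
hypothetical rather than taken from sigmlinlib.jar.

```python
def parse_symhns(text):
    """Split a comma-separated symhns entry into its symbol names."""
    names = [n.strip() for n in text.split(",") if n.strip()]
    bad = [n for n in names if not n.startswith("ham")]
    if bad:
        raise ValueError("not 'ham...' symbol names: " + ", ".join(bad))
    return names

def as_empty_elements(names):
    """Render each symbol name as an empty XML element."""
    return "".join("<{}/>".format(n) for n in names)
```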

A less important format, except maybe for testing, and for
importing old datasets, is also supported:

hns8    A string of classic 8-bit manual HNS codes,
        masquerading as Unicode characters with the equivalent
        codepoints in the range [0..256).
        (This classic encoding is similar to, but not quite the
        same as, that used in the old 8-bit Windows HamNoSys font).
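
The "masquerading" can be reproduced in Python with the latin-1
codec, which maps each byte value to the Unicode codepoint with
the same number.  This is a sketch of the idea only, with
hypothetical function names, not the code used by the converters.

```python
def bytes_to_hns8(raw):
    """Map each 8-bit code to the character with the same codepoint."""
    return raw.decode("latin-1")

def hns8_to_bytes(text):
    """Inverse mapping; raises UnicodeEncodeError if any codepoint
    is 256 or above."""
    return text.encode("latin-1")
```

Note that an hns8 string written to a UTF-8 file will use two
bytes for every code of 128 or above.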

Any non-SiGML file is expected to be TSV, with no enclosing quotes
on individual entries, with each line representing one sign, and
with the manual HNS in column 0 and the gloss in column 1.
The Python scripts in hsc-py/ can be useful in preprocessing
a TSV file so that it meets these constraints.
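
Since the converters do very little error checking, it may be
worth validating a file against these constraints first.  A
minimal sketch (check_line is a hypothetical helper, not part of
hsc-py/):

```python
def check_line(line):
    """Return a list of problems found in one line of a non-SiGML file."""
    problems = []
    entries = line.rstrip("\n").split("\t")
    if len(entries) < 2:
        problems.append("expected HNS in column 0 and gloss in column 1")
    for i, entry in enumerate(entries):
        if len(entry) >= 2 and entry[0] == '"' and entry[-1] == '"':
            problems.append("column {} has enclosing quotes".format(i))
    if entries[0] == "":
        problems.append("column 0 (manual HNS) is empty")
    return problems
```

An empty list means the line meets the constraints listed above;
the function does not check that column 0 holds valid HamNoSys.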

The Javadoc provides more details on the Java main classes directly
supporting these scripts.

hsc-source-and-doc/
-------------------
This directory contains the source code and Javadoc for the Java
modules in hsc-bin/.


Ralph Elliott
re@uea.ac.uk

--
