[smc-discuss] Fwd: [aspell-devel] Aspell's Future
Santhosh Thottingal
santhosh.thottingal at gmail.com
Mon Sep 12 01:03:58 PDT 2011
---------------------------- Original Message ----------------------------
Subject: [aspell-devel] Aspell's Future
From: "Kevin Atkinson" <kevina at gnu.org>
Date: Mon, September 12, 2011 1:18 pm
To: aspell-announce at gnu.org
aspell-devel at gnu.org
--------------------------------------------------------------------------
Aspell is not dead. In this post I will outline how I see Aspell moving
forward. Please circulate this post as you see fit, but keep in mind
that I still consider this document a draft, so expect the occasional
spelling (yes, even a spell checker doesn't catch everything) and
grammar mistakes, as well as perhaps a few (hopefully minor) factual errors.
Please direct all discussion to aspell-devel at gnu.org. If you are not
subscribed, don't worry; I will approve your posts in a timely fashion.
Please direct grammar and factual corrections directly to me at
kevina at gnu.org.
INTRO
=====
In recent years the development of Aspell has stagnated, but I have
never really lost interest in Aspell. The problem was that I just did
not know how to move forward in light of Hunspell slowly taking over
the role that I meant for Aspell to take. However, after giving it a
lot of thought, I have finally figured out how Aspell, and spell
checking in general on GNU/Linux and other Free Unix-like operating
systems, can move forward.
For a long time I had two goals for Aspell:
1. Do a better job of suggesting possible replacements for a
misspelled word than just about any other spell checker out there
for the English language (see http://aspell.net/test/cur/, also see
http://suggest.aspell.net).
2. Become the standard system spell checker for GNU/Linux and
other Free Unix-like operating systems.
I have succeeded in the first goal but, due to Hunspell slowly taking
over the role of a system spell checker [1], I'm failing on the second.
Unfortunately, the fact that Hunspell is taking over as a system spell
checker makes it increasingly difficult for users to take advantage of
Aspell's high suggestion quality. Right now Aspell is still in most
distributions, and users can still use it with many applications
(Open/LibreOffice, Firefox, Thunderbird, and Google Chrome being some
notable exceptions), but unless I do something this may no longer be
the case.
([1] See http://fedoraproject.org/wiki/Releases/FeatureDictionary and
http://wiki.ubuntu.com/ConsolidateSpellingLibs)
For a long time I thought about ways to regain Aspell's status as the
standard system spell checker, but after giving it a lot of thought I
have decided that this goal is no longer worth pursuing. One of
Hunspell's advantages over Aspell is that it has better support for
many languages thanks to its support for compounding and complex
morphology, so I thought that if I could add support for these
features I could support the same set of languages that Hunspell does,
and convince Linux distributions to consider Aspell over Hunspell as
the one true spell checker; however, after many years, I finally
decided that it wasn't worth it. Since Aspell was originally designed
to be able to support multiple backends, I briefly considered making
Hunspell a backend for Aspell; however, as Aspell's multiple backend
support has never really been tested, it would probably be more trouble
than it's worth, especially in light of Enchant
(http://www.abisource.com/projects/enchant/), which is a meta-spell
checker that already has working support for multiple backends.
Hence, the way forward for Aspell, and spell checking in general, is
to make Enchant the system spell checker. Using Enchant will not only
allow users to take advantage of Aspell's superior suggestion quality
for the English language, it will also add proper support for the
Finnish language by being able to use Voikko
(http://voikko.sourceforge.net/) instead of Hunspell, which at the
time of this writing has poor support for the Finnish language.
In addition, using a meta-spell checker as the system spell checker
will pave the way for more advanced forms of spell checking than either
Aspell or Hunspell supports, such as taking into account frequency or
context information.
The rest of this post will outline how I see Aspell moving forward by
making Enchant the system spell checker for GNU/Linux and other Free
Unix-like operating systems.
THE WAY FORWARD FOR ASPELL
==========================
As already mentioned, the way forward centers around making Enchant the
system spell checker. Here is how I see that happening:
1. Get any applications that use Hunspell directly to use Enchant.
The primary applications of concern are Firefox, Thunderbird,
LibreOffice (and maybe OpenOffice), and Google Chrome.
2. Convince the Enchant project (and all distributions using it) to
prefer Aspell over Hunspell for the English language.
3. Convert all applications that use Aspell directly to also use
Enchant.
4. Enhance Enchant so that it can better support both Aspell's and
Hunspell's advanced features. At a minimum, Enchant will need to be
able to work with encodings other than UTF-8 and provide some sort
of way for applications to talk to the backend spell checker
engine directly.
5. Once Enchant is sufficiently enhanced to support Aspell's features,
abolish the current C ABI and instead have applications use
Aspell through Enchant. Also convert the Aspell utility to be a
more generic Enchant front-end.
6. Eventually distill Aspell so that it is nothing but a plugin for
Enchant. Encourage Hunspell and other spell checkers to go the
same route.
Step (1) is probably the most difficult, as right now there is a lot
of inertia towards making Hunspell the one and only spell checker [1].
I believe this is a mistake. Hunspell is a good spell checker which
supports a lot of languages, but making any one spell checker engine
_the_ only spell checker is a mistake. Different spell checkers have
different strengths and weaknesses, and it does not make sense to have
one spell checker used for every language. Furthermore, right now the
Finnish language is not well supported by Hunspell, so for every
program that uses Hunspell directly, a plugin for Voikko, the Finnish
spell checker, needs to be written. If Enchant were used instead, this
would not be an issue. In addition, neither Aspell nor Hunspell is
well equipped to support languages such as Thai, which don't have any
sort of separation between words [2].
([2] There is a Thai dictionary for Hunspell, but it is only useful
once the words are already separated somehow, perhaps using the
zero-width space (ZWSP) marker. See
http://www.thaivisa.com/forum/topic/444360-thai-in-openoffice-on-ubuntu-lucid-lynx/,
http://openoffice.org/bugzilla/show_bug.cgi?id=43583)
Step (2) is technically very easy: it is simply a matter of adding a
line to the Enchant ordering file. However, right now there seems to
be the perception that Aspell is a legacy spell checker that needs to
be eventually eliminated [1], and that no one would want to use Aspell
over Hunspell unless they have to. So, even if the change makes it
into Enchant, I'm not sure that it will stick, as the various Linux
distributions might remove the line in their packaged versions.
Furthermore, changing the default spell checker will change the
personal dictionary used, which will likely lead to confusion. Thus,
(2) should likely only be done after (1) and, furthermore, I do not
want to be the one who has to push the change; I hope others can
eventually see Aspell's advantage for the English language and want to
use it.
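To make the ordering idea in Step (2) concrete, here is a minimal
Python sketch of how a meta-spell checker might pick a backend per
language. The "tag:backend,backend" line format is modeled on Enchant's
ordering file, but the parser, the function names, the "#" comment
handling, and the example entries are all assumptions made for
illustration, not Enchant's actual code or shipped configuration.

```python
def parse_ordering(text):
    """Parse lines of the form 'lang:backend1,backend2' into a dict."""
    ordering = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skipping '#' comments is an assumption of this sketch.
        if not line or line.startswith("#"):
            continue
        lang, _, backends = line.partition(":")
        ordering[lang.strip()] = [b.strip() for b in backends.split(",") if b.strip()]
    return ordering


def pick_backend(ordering, lang, installed):
    """Return the first installed backend listed for lang, then for its
    base language (en_US -> en), then for the '*' default; else None."""
    for key in (lang, lang.split("_")[0], "*"):
        for backend in ordering.get(key, []):
            if backend in installed:
                return backend
    return None


# Hypothetical ordering: default to Hunspell, prefer Aspell for English.
EXAMPLE = """
*:hunspell,aspell
en:aspell,hunspell
fi:voikko,hunspell
"""
```

With this ordering, pick_backend(parse_ordering(EXAMPLE), "en_US",
{"aspell", "hunspell"}) selects "aspell". A distribution that removed
the "en" line would silently fall back to the "*" default, which is the
risk described above.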
Step (3) will eventually happen on its own, again due to the
perception that Aspell is a legacy spell checker that needs to be
eventually eliminated. However, once (1) is done I will help push
(3).
Step (4) does not really depend on the previous steps and in fact
should happen in parallel with them. Furthermore, this step is
something I am willing to do most of the work for.
For (4), by supporting encodings other than UTF-8 I mean adding
support for conversion between UTF-8 and other encodings, so that users
of the Enchant library can use whatever encoding they are storing the
document in and the backend can use whatever encoding is most
efficient. This will avoid unnecessary conversions to and from UTF-8
when neither the users of the library nor the backend is using UTF-8
internally.
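As a rough illustration of the conversion layer described above, the
Python sketch below converts a word from the application's encoding to
the backend's encoding and skips the conversion entirely when the two
already match. The function name and parameters are hypothetical;
nothing here is part of Enchant's current API.

```python
import codecs


def to_backend(word_bytes, app_encoding, backend_encoding):
    """Convert a word from the application's encoding to the backend's.

    If both sides use the same encoding, the bytes are passed through
    untouched, so no intermediate conversion takes place.
    """
    # Normalize names so that e.g. 'latin1' and 'ISO-8859-1' compare equal.
    src = codecs.lookup(app_encoding).name
    dst = codecs.lookup(backend_encoding).name
    if src == dst:
        return word_bytes
    # Otherwise decode from the source encoding and re-encode for the backend.
    return word_bytes.decode(src).encode(dst)
```

An application storing its document in Latin-1 and talking to a Latin-1
backend would hit the pass-through branch, which is exactly the case
where a UTF-8-only interface forces two needless conversions.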
Steps (5) and (6) are more long-term goals for Aspell and are not
fundamental to the plan of making Enchant the system spell checker.
Rather, they will greatly simplify the Aspell library and make it
easier to maintain in the future.
THE WAY FORWARD IN GENERAL
==========================
Having all applications use Enchant will make it easy for newer and
better spell checkers to replace both Aspell and Hunspell.
Unfortunately, the current interface provided by Enchant is inadequate
for many advanced forms of spell checking. For example,
context-sensitive spell checking is impossible since words are fed in
one at a time, and sometimes in random order. Furthermore, Enchant
only gives a boolean response to the question "is the word correctly
spelled?"; when taking into account frequency information the answer
might be "maybe", as in: yes, it is correctly spelled, but it is not
that common a word and most likely you meant some other similarly
spelled word.
Therefore another long-term goal is:
7. Enhance Enchant to support more advanced forms of spell checking,
including:
* Context-sensitive spell checking.
* Flagging uncommon words, but not outright marking them as
misspelled.
* Taking into account local frequency information in the
document.
* Words with spaces in them, such as "de facto".
* Languages such as Thai, which do not have spaces between words [2].
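To illustrate the "maybe" answer and the flagging of uncommon words,
here is a small Python sketch of a check that returns one of three
verdicts instead of a boolean. The Verdict enum, the check function,
the frequency table, and the threshold are all made up for this
example; they are not an existing or proposed Enchant interface.

```python
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    UNCOMMON = "uncommon"      # spelled correctly, but possibly not what was meant
    MISSPELLED = "misspelled"


def check(word, frequencies, rare_below=5):
    """Classify a word using a dict mapping known words to frequency counts."""
    count = frequencies.get(word.lower())
    if count is None:
        return Verdict.MISSPELLED
    if count < rare_below:
        return Verdict.UNCOMMON
    return Verdict.CORRECT


# Toy frequency data, invented for illustration only.
FREQ = {"form": 900, "from": 1200, "fro": 2}
```

Here check("fro", FREQ) gives Verdict.UNCOMMON: the word exists, but an
application could still hint that "from" or "for" was more likely
intended, without marking the word as an outright error.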
I have some ideas on more advanced interfaces, but they are beyond the
scope of this post.
MAKING IT HAPPEN
================
I still have limited time to work on Aspell. I am still motivated to
move forward with Aspell, but if no one else seems to care I am
unlikely to spend much effort on it. That is, I would be a lot more
motivated if I got a sense that others would like to see Aspell
continue to be developed for its technical merits. As for what
those technical merits are, that is something I will be happy to
discuss in a follow-up post.
In addition, I am unlikely to move forward unless I see some movement
on the first step, that is, moving Hunspell-only applications towards
Enchant. Unfortunately, I do not have the time to push this goal
myself. Thus others will need to convince major projects such as
Libre/OpenOffice, Firefox, and Chrome to move away from using Hunspell
directly.
So, basically, none of this is going to happen without some effort
from others.
Feedback welcome.
_______________________________________________
Aspell-devel mailing list
Aspell-devel at gnu.org
https://lists.gnu.org/mailman/listinfo/aspell-devel