Warning

Updating the dictionaries packaged with the spell checker effectively forks them. If we release improved dictionaries in future (for example, with additional words or corrected errors), you may want to re-apply your updates to those new dictionaries.

Warning

Updating the spell checking feature (that is, installing an updated version of ephox-spelling.war) may overwrite your modified dictionaries. Back up your modified dictionaries before upgrading ephox-spelling.war.

Custom dictionaries can be added to Spell Checker Pro.


...

1. Identify the language(s) you want to update.

The dictionary .jar file for each supported language is located in your_web_server/ephox-spelling/WEB-INF/classes/dictionaries.

The examples in this document assume that you are updating the English (US) dictionary, en_us_4_0.jar.

Once you have found the .jar file, extract its contents with a file unzipping utility (a .jar file is a standard ZIP archive).
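If you prefer to script this step, the following is a minimal sketch using Python's standard zipfile module, which can read a .jar because it is a ZIP archive. It assumes Python 3.9 or later. The function name extract_jar and the paths are illustrative only and are not part of the spell checker. The returned list of top-level names doubles as the check in step 2.

```python
import zipfile
from pathlib import Path


def extract_jar(jar_path: str, dest_dir: str) -> list[str]:
    """Extract every entry of a .jar (ZIP) archive into dest_dir.

    Returns the sorted top-level names found in the archive,
    for example ['META-INF', 'com'] for a dictionary jar.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(jar_path) as jar:
        jar.extractall(dest)
        # Keep only the first path component of each entry name.
        top_level = {name.split("/", 1)[0] for name in jar.namelist()}
    return sorted(top_level)
```

For example, extract_jar("en_us_4_0.jar", r"c:\customdictionary") should return a list containing both "com" and "META-INF" if the extraction worked.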

2. If the file was extracted correctly, its contents will include two directories: com and META-INF.

Info

If a custom dictionary has already been created for this spell checker, a file named userdic.tlx will also be present. To extend the existing custom dictionary, do not delete this file; see the Modifying an Updated Dictionary section of this document.

If instead you want to remove the current custom dictionary, delete this file. For more information, see the Removing the Modifications from a Dictionary section of this document.

3. Using a plain text editor, create a file called userdic.tlx in the directory where you extracted the contents of the .jar file. This file holds the entries for the custom dictionary.

4. At the top of the userdic.tlx file, place the following line:

 

Code Block
#LID 24941
5. For each word you want to add to the custom dictionary, write the word on its own line in userdic.tlx, followed by a single tab character and the letter i. Each line should look like this:

 

Code Block
customword	i
6. Repeat step 5 for all the words you wish to add to the custom dictionary. When finished, save the userdic.tlx file.
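Steps 3 to 6 can also be scripted. The following is a minimal Python 3.9+ sketch that writes the required header line followed by one "word, tab, i" entry per word. The helper name write_userdic is hypothetical. The UTF-8 encoding and LF line endings are assumptions, because the steps above do not specify either.

```python
from pathlib import Path

# Required first line of userdic.tlx (step 4).
HEADER = "#LID 24941"


def write_userdic(directory: str, words: list[str]) -> Path:
    """Create userdic.tlx in `directory` with one "word<TAB>i" entry per word."""
    for word in words:
        # A tab or line break inside a word would corrupt the entry format.
        if not word or any(ch in word for ch in "\t\r\n"):
            raise ValueError(f"invalid dictionary word: {word!r}")
    path = Path(directory) / "userdic.tlx"
    # Assumption: UTF-8 encoding and LF line endings (not specified by the source).
    with path.open("w", encoding="utf-8", newline="\n") as f:
        f.write(HEADER + "\n")
        for word in words:
            f.write(f"{word}\ti\n")
    return path
```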

7. After saving userdic.tlx, rebuild the .jar file. To do this, run the Java jar command from the command line in the directory where you extracted the contents of the original .jar file.

Example
If the contents of the dictionary .jar file were extracted to c:\customdictionary and the location of the jar command is c:\java\bin\jar, then the following command would create a .jar file called en_us_4_0.jar.

 

Code Block
c:\customdictionary>c:\java\bin\jar cvf en_us_4_0.jar .
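The jar command above can be approximated in Python. The following is a minimal Python 3.9+ sketch that packages everything under the extraction directory into a new ZIP-format .jar. build_jar is a hypothetical helper and is not part of the spell checker. The jar tool normally writes its own manifest, whereas this sketch stores the extracted META-INF/MANIFEST.MF unchanged, so prefer the jar tool if in doubt.

```python
import zipfile
from pathlib import Path


def build_jar(source_dir: str, jar_path: str) -> list[str]:
    """Package every file under source_dir into a new .jar (ZIP) archive.

    Rough equivalent of running `jar cvf <jar_path> .` inside source_dir.
    Returns the entry names written to the archive.
    """
    src = Path(source_dir).resolve()
    out = Path(jar_path).resolve()
    names = []
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        for file in sorted(src.rglob("*")):
            # Skip directories and the archive being written, which may live in src.
            if file.is_dir() or file == out:
                continue
            arcname = file.relative_to(src).as_posix()  # ZIP entries use "/" separators
            jar.write(file, arcname)
            names.append(arcname)
    return names
```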
8. Move the newly built .jar file back to its original location, replacing the original file. In this example, you would replace your_web_server/ephox-spelling/WEB-INF/classes/dictionaries/en_us_4_0.jar with your updated copy of en_us_4_0.jar.

...

1. Perform steps 1 and 2 of the Creating New Updates to a Dictionary section above.

2. Open the userdic.tlx file in a plain text editor.

3. For each word you want to add to the custom dictionary, write the word on its own line in userdic.tlx, followed by a single tab character and the letter i. Each line should look like this:

 

Code Block
customword	i
4. Repeat step 3 for all the words you want to add to the custom dictionary. When finished, save the userdic.tlx file.
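When extending an existing userdic.tlx, it is easy to list a word twice. The following is a minimal Python 3.9+ sketch that appends only the words not already present. The helper name add_words is hypothetical. It assumes the file is UTF-8 and that lines starting with # (such as the "#LID 24941" header) are not word entries.

```python
from pathlib import Path


def add_words(userdic_path: str, new_words: list[str]) -> list[str]:
    """Append "word<TAB>i" entries to userdic.tlx, skipping words already listed.

    Returns the words that were actually added.
    """
    path = Path(userdic_path)
    text = path.read_text(encoding="utf-8")  # assumption: file is UTF-8
    existing = set()
    for line in text.splitlines():
        # Entries look like "word<TAB>i"; the "#LID ..." header is skipped.
        if line and not line.startswith("#"):
            existing.add(line.split("\t", 1)[0])
    # dict.fromkeys removes duplicates in new_words while keeping their order.
    added = [w for w in dict.fromkeys(new_words) if w not in existing]
    if added:
        with path.open("a", encoding="utf-8", newline="\n") as f:
            if text and not text.endswith("\n"):
                f.write("\n")  # start the first new entry on its own line
            f.writelines(f"{w}\ti\n" for w in added)
    return added
```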

5. After saving userdic.tlx, rebuild the .jar file. To do this, run the Java jar command from the command line in the directory where you extracted the contents of the original .jar file. The name of the new spell checker .jar file is specified in this step.

Example
If the contents of the dictionary .jar file were extracted to c:\customdictionary and the location of the jar command is c:\java\bin\jar, then the following command would create a jar file called en_us_4_0.jar.

 

Code Block
c:\customdictionary>c:\java\bin\jar cvf en_us_4_0.jar .
6. Move the newly built .jar file back to its original location, replacing the original file. In this example, you would replace your_web_server/ephox-spelling/WEB-INF/classes/dictionaries/en_us_4_0.jar with your updated copy of en_us_4_0.jar.

...

1. Perform steps 1 and 2 of the Creating New Updates to a Dictionary section above.

2. Delete the userdic.tlx file from the directory where the contents of the original .jar file were extracted.

3. After deleting userdic.tlx, rebuild the .jar file. To do this, run the Java jar command from the command line in the directory where you extracted the contents of the original .jar file. The name of the new spell checker .jar file is specified in this step.

Example
If the contents of the dictionary .jar file were extracted to c:\customdictionary, and the location of the jar command is c:\java\bin\jar, then the following command would create a .jar file called en_us_4_0.jar.

 

Code Block
c:\customdictionary>c:\java\bin\jar cvf en_us_4_0.jar .
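Before moving a rebuilt .jar into place, it can help to confirm whether it contains userdic.tlx. The file should be absent after the removal steps above and present after you create or modify a custom dictionary. The following is a minimal Python sketch. has_custom_dictionary is a hypothetical helper name.

```python
import zipfile


def has_custom_dictionary(jar_path: str) -> bool:
    """Return True if the dictionary .jar contains a userdic.tlx entry."""
    with zipfile.ZipFile(jar_path) as jar:
        # Match on the base name so the check works wherever the entry is stored.
        return any(name.rsplit("/", 1)[-1] == "userdic.tlx" for name in jar.namelist())
```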

...

Configuring the Custom Dictionary Feature

You must add configuration to your application.conf file. (Don't forget to restart the Java application server after updating the configuration.)

Adding the ephox.spelling.custom-dictionaries-path element activates the custom dictionary feature. It points to a directory on the server's file system that will contain the custom dictionary files and nothing else. It is a good idea to store these files near the application.conf file. For example, if application.conf is in /opt/ephox, the dictionary files could live in the sub-directory /opt/ephox/dictionaries.

Example

 

Code Block
ephox {
  spelling {
    custom-dictionaries-path = "/opt/ephox/dictionaries"
  }
}

Creating Custom Dictionary Files

One custom dictionary can be created for each language supported by the spell checker (see supported languages), as well as an additional "global" dictionary that contains words that are valid across all languages, such as trademarks.

A dictionary file for a particular language must be named with that language's code (see supported languages for the list of codes) followed by the suffix .txt, for example en.txt, en_gb.txt, fr.txt or de.txt.

The "global" dictionary file for language-independent words must be called "global.txt".

The server scans the dictionary directory set in the configuration above and loads the .txt file for each language, plus the global file, if present.
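To check a dictionary directory before restarting the server, the following minimal Python 3.9+ sketch maps each .txt file to its name without the suffix ("global" or a language code) and lists anything else. The configuration notes above say the directory should contain nothing else. scan_dictionary_dir is a hypothetical helper. It does not verify that a code is one of the supported languages, and it matches the .txt suffix case-sensitively.

```python
from pathlib import Path


def scan_dictionary_dir(directory: str) -> tuple[dict[str, Path], list[str]]:
    """Split a custom dictionary directory into dictionary files and stray entries.

    Returns (dictionaries, unexpected): dictionaries maps "global" or a language
    code such as "en" or "en_gb" to its file; unexpected lists every other entry.
    """
    dictionaries = {}
    unexpected = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file() and entry.suffix == ".txt":
            # "global.txt" -> "global", "en_gb.txt" -> "en_gb"
            dictionaries[entry.stem] = entry
        else:
            unexpected.append(entry.name)
    return dictionaries, unexpected
```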

Custom Dictionary File Format

A dictionary file must be a simple text file with:

  • one word on each line,
  • either Windows-style (CR+LF) or Linux-style (LF) line endings,
  • no comments or blank lines, and
  • saved in UTF-8 encoding, with or without a BOM (byte-order mark).

The last point is important for files created or edited on Windows or Mac systems, which may encode text files differently by default. However, Windows and Mac editors such as Windows Notepad can save files as UTF-8 if asked to do so; check your editor of choice for this option. Choosing the wrong encoding will cause problems with non-English letters such as umlauts and accents.
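The format rules above can be checked before deployment. The following is a minimal Python 3.9+ sketch. The helper name check_dictionary_file is hypothetical. Decoding with "utf-8-sig" accepts UTF-8 with or without a BOM, and splitlines() accepts both LF and CR+LF endings. The sketch does not detect comments because no comment syntax is documented, and it assumes a valid entry contains no whitespace.

```python
from pathlib import Path


def check_dictionary_file(path: str) -> list[str]:
    """Return a list of format problems in a custom dictionary file (empty if none)."""
    raw = Path(path).read_bytes()
    try:
        # "utf-8-sig" decodes UTF-8 and strips a leading BOM if there is one.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8 (byte offset {err.start})"]
    problems = []
    # splitlines() treats both LF and CR+LF as line endings.
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
        elif len(line.split()) > 1:
            # Assumption: an entry with internal whitespace is more than one word.
            problems.append(f"line {number}: more than one word")
    return problems
```

A file saved in a legacy encoding such as Latin-1 with an umlaut typically fails the UTF-8 check, which is the problem described above.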


NOTE for German and Finnish languages: Spell checking in German and Finnish will employ compound word spell checking. Compound words such as "Fußballtennis" will be assumed correct as long as the root words "Fußball" and "Tennis" are individually present in the dictionary. It is not necessary to add "Fußballtennis" separately.
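To illustrate the idea only, the following toy Python 3.9+ sketch tests whether a word can be split entirely into dictionary roots. It is not the spell checker's actual algorithm. It is a simple dynamic-programming split that ignores linking elements (such as the German -s- in some compounds) and any other rules a real compound checker may apply. is_compound_of is a hypothetical name.

```python
def is_compound_of(word: str, roots: set[str]) -> bool:
    """Toy check: can `word` be split entirely into words from `roots`?

    Case-insensitive via str.lower(); casefold() is avoided because it
    would turn "ß" into "ss".
    """
    lowered = {r.lower() for r in roots}
    w = word.lower()
    if not w:
        return False
    # reachable[i] is True when w[:i] can be built from roots.
    reachable = [True] + [False] * len(w)
    for end in range(1, len(w) + 1):
        for start in range(end):
            if reachable[start] and w[start:end] in lowered:
                reachable[end] = True
                break
    return reachable[len(w)]
```

With the roots "Fußball" and "Tennis", is_compound_of("Fußballtennis", {"Fußball", "Tennis"}) returns True, matching the example above.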

Verifying Custom Dictionary Functionality

If successfully configured, the custom dictionary feature will report dictionaries found in the application server's log at service startup.

Example

Code Block
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - Starting task (booting Ironbark)
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - using custom dictionary: [global] = 1 words
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - using custom dictionary: "en" = 3 words
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - using custom dictionary: "fr" = 2 words
2017-06-12 17:46:01 [main] INFO  com.ephox.ironbark.IronbarkBoot - Finished task (booting Ironbark)

The log above shows that three custom dictionaries were found: the language-independent "global" dictionary and one each for English and French, containing 1, 3 and 2 words respectively. Check that this report matches your expectations.
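The comparison can be partly automated with the following minimal Python 3.9+ sketch. dictionaries_from_log extracts dictionary names and word counts from log lines in the format shown in the example, which may differ between versions. counts_from_files counts the non-blank lines of each .txt file in the dictionary directory, assuming the server counts one word per line. Both function names are hypothetical.

```python
import re
from pathlib import Path

# Matches the example log format, e.g.
#   ... - using custom dictionary: "en" = 3 words
#   ... - using custom dictionary: [global] = 1 words
LOG_PATTERN = re.compile(r'using custom dictionary: (?:"([^"]+)"|\[([^\]]+)\]) = (\d+) words')


def dictionaries_from_log(lines) -> dict[str, int]:
    """Return {dictionary name: reported word count} for each matching log line."""
    found = {}
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            name = match.group(1) or match.group(2)
            found[name] = int(match.group(3))
    return found


def counts_from_files(directory: str) -> dict[str, int]:
    """Count non-blank lines in each .txt dictionary file (assumed one word per line)."""
    counts = {}
    for path in Path(directory).glob("*.txt"):
        text = path.read_text(encoding="utf-8-sig")
        counts[path.stem] = sum(1 for line in text.splitlines() if line.strip())
    return counts
```

If the two dictionaries are equal, the server picked up every file with the expected number of entries.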

Ongoing Dictionary Maintenance

After the initial deployment, each later addition or change to a custom dictionary requires a restart of the spell check service to take effect.