Monday, December 27, 2010

Sunday, December 5, 2010

SQL and Non-SQL dictionary components

The following dictionary components are stored in relational databases:

1. Morphology: definitions of parts of speech, grammatical categories, and so on
2. Word and phrase entries (lexicon)
3. Thesaurus
4. Lemmatization engine
5. N-grams
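
To make the relational side concrete, here is a minimal sketch of how word entries, parts of speech and thesaurus links could be laid out in SQL tables. Every table and column name below (`part_of_speech`, `entry`, `thesaurus_link`, `relation`) is a hypothetical illustration, not the dictionary's actual schema, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the real layout of the SQL dictionary.
SCHEMA = """
CREATE TABLE part_of_speech (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE entry (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    pos_id INTEGER NOT NULL REFERENCES part_of_speech(id)
);
CREATE TABLE thesaurus_link (
    entry_from INTEGER NOT NULL REFERENCES entry(id),
    entry_to   INTEGER NOT NULL REFERENCES entry(id),
    relation   TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with two entries and one thesaurus link."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO part_of_speech (id, name) VALUES (1, 'NOUN')")
    conn.executemany(
        "INSERT INTO entry (id, name, pos_id) VALUES (?, ?, ?)",
        [(1, "cat", 1), (2, "animal", 1)],
    )
    conn.execute(
        "INSERT INTO thesaurus_link (entry_from, entry_to, relation) "
        "VALUES (1, 2, 'hypernym')"
    )
    conn.commit()
    return conn


def linked_entries(conn, name):
    """Return (linked entry name, relation) pairs for the given word entry."""
    return conn.execute(
        """
        SELECT e2.name, t.relation
        FROM entry e1
        JOIN thesaurus_link t ON t.entry_from = e1.id
        JOIN entry e2 ON e2.id = t.entry_to
        WHERE e1.name = ?
        ORDER BY e2.name
        """,
        (name,),
    ).fetchall()
```

Calling `linked_entries(build_demo_db(), "cat")` walks the thesaurus link with a plain join, which is the kind of query that a relational store makes easy for the lexicon and thesaurus.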

The following dictionary parts are not stored in the SQL database:

1. Alphabet
2. Rules for morphological and syntactic analysis, text segmentation and translation
3. Stemmer

Read more about SQL Dictionary and Persistent Dictionary ORM

Getting started with grammatical dictionary SDK

The grammatical dictionary's C-style API comprises more than 180 functions, counting the wide-character and UTF-8 versions of each function separately (see sol_GetEntryName for an example).
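
One practical reason the wide and UTF-8 variants count as distinct functions is that the same entry name occupies buffers of different element type and size in each encoding. The sketch below does not call the SDK; `buffer_sizes` is a hypothetical helper, and it assumes a 4-byte `wchar_t` (typical on Linux; on Windows `wchar_t` is 2 bytes).

```python
def buffer_sizes(name):
    """Return (wide_bytes, utf8_bytes) needed to hold `name` plus a NUL terminator.

    Assumption: the wide version uses a 4-byte wchar_t, modelled here as UTF-32.
    """
    terminated = name + "\0"
    wide = terminated.encode("utf-32-le")  # 4 bytes per code point, no BOM
    utf8 = terminated.encode("utf-8")      # 1 to 4 bytes per code point
    return len(wide), len(utf8)


if __name__ == "__main__":
    for word in ("cat", "кошка"):
        print(word, buffer_sizes(word))
```

For a Cyrillic word such as "кошка", the UTF-8 buffer needs two bytes per letter, while the wide buffer needs one `wchar_t` per letter, so a caller has to size the output buffer according to the variant it uses.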

It can be difficult for a developer to choose the right function. The sample programs and their source code can help - read more about them.

Saturday, December 4, 2010

How to add new word entries to the dictionary

The Grammatical Dictionary SDK contains everything needed to extend the dictionary:

2. Basic Russian dictionary (more than 120,000 word entries)
3. Sample text files with word entry definitions
4. A shell script that loads the basic dictionary, parses the word entry definition file, merges the two, and stores the new dictionary data files in .../bin-linux.