
Notes on a Language Database

  Fred Curtis - November 2004
These notes are some thoughts on a database to capture information on words from many languages, with the aim of providing resources for learning languages.

[Introduction] [Limitations of dictionaries] [What is a word?] [Written vs. Spoken Language] [Modelling Variations] [References]


These notes grew out of a desire to build myself some computerised tools to help me learn new languages. Some of the areas where I thought computerised tools would help me were:

I'm not a linguist (someone with a deep knowledge of the structure of languages and how they change) nor someone with a knowledge of how languages should be taught. I'm a student trying to learn several languages and I'd like reference materials that let me merely dip into the meanings of a word or wade in amongst detail as I need.

Limitations of dictionaries

What is a word?

Written vs. Spoken Language

[...] Most importantly, spoken language is primary, not written language. Indeed, only spoken language can be truly considered "language." Writing is a collection of symbols meant to represent spoken language. It is not language in and of itself. Many written languages (Spanish, Dutch, etc.), will regularly undergo orthographic reforms to reflect changes in the spoken language. This has never been done for English (the spelling of which has never been regularized in the first place), so what we use for written language is actually largely based on the spoken language of several centuries ago.

[Merriam-Webster web page on pronunciation, sampled 2005-12-11]

Modelling Variations

Different readings of 水気: すいき, みずけ


In this view (ignoring, e.g., conjugated forms), a word is a grouping (fluid over time and geography) of meanings, orthographic forms and phonetic forms.

By tying a written (spoken) example to a spelling/word (sound/word) pair, we can record particular instances of a word (through the time/location details of the example text/speech), and effectively record shifts in meanings, spelling and pronunciation.
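As a rough illustration of the idea above, here is a minimal Python sketch. The class and field names (Example, Word, attest_spelling, spelling_span) are my own assumptions rather than part of any existing design, and the sample sentences are invented placeholders, not real citations. Each spelling or sound of a word is tied to the dated, located examples that attest it, so a shift over time can be read off the examples.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Example:
    """One instance of text or speech, with optional time/location details."""
    source: str
    year: Optional[int] = None
    place: Optional[str] = None


@dataclass
class Word:
    """A grouping of meanings, spellings and sounds (hypothetical structure)."""
    language: str
    meanings: list = field(default_factory=list)
    # spelling -> examples attesting the (spelling, word) pair
    spellings: dict = field(default_factory=dict)
    # sound -> examples attesting the (sound, word) pair
    sounds: dict = field(default_factory=dict)

    def attest_spelling(self, spelling: str, example: Example) -> None:
        self.spellings.setdefault(spelling, []).append(example)

    def attest_sound(self, sound: str, example: Example) -> None:
        self.sounds.setdefault(sound, []).append(example)

    def spelling_span(self, spelling: str):
        """Earliest and latest dated example of a spelling, or None if undated."""
        years = [e.year for e in self.spellings.get(spelling, [])
                 if e.year is not None]
        return (min(years), max(years)) if years else None


# Invented placeholder examples, for illustration only.
word = Word(language="en", meanings=["hue"])
word.attest_spelling("colour", Example("made-up sentence A", 1850, "London"))
word.attest_spelling("colour", Example("made-up sentence B", 1990, "Sydney"))
word.attest_spelling("color", Example("made-up sentence C", 1900, "Boston"))

print(word.spelling_span("colour"))  # (1850, 1990)
```

Keeping the examples on the pair, rather than on the word or the spelling alone, is what lets the same spelling belong to several words with different histories.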

Refinements: meanings, sounds and spellings may be more finely divided by adding context, e.g. a particular archaic saying may provide the only extant example of a particular meaning, spelling or pronunciation; a word may take on a particular meaning only in certain contexts; cf. slang vs. formal usages.
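One possible way to picture this refinement, sketched in Python: each recorded meaning, spelling or sound carries an optional context label. The names Attestation and values_in_context are hypothetical, and treating an unlabelled record as valid in every context is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attestation:
    word_id: str
    kind: str                      # "meaning", "spelling" or "sound"
    value: str
    context: Optional[str] = None  # e.g. "slang", "formal"; None = any context


def values_in_context(attestations, word_id, kind, context):
    """Values of one kind recorded for a word that apply in a given context.

    Records without a context label are assumed to apply everywhere.
    """
    found = []
    for a in attestations:
        if a.word_id == word_id and a.kind == kind \
                and a.context in (None, context) and a.value not in found:
            found.append(a.value)
    return found


records = [
    Attestation("en:cool", "meaning", "moderately cold"),
    Attestation("en:cool", "meaning", "fashionable", context="slang"),
]

print(values_in_context(records, "en:cool", "meaning", "formal"))
# ['moderately cold']
print(values_in_context(records, "en:cool", "meaning", "slang"))
# ['moderately cold', 'fashionable']
```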

This model doesn't capture the strong associations between related verb and noun forms (as occur, e.g., in English and Japanese). Nor does it explicitly represent etymological relationships, word transfer between languages, or forms unrelated by sound or orthography (e.g. humble/honorific verb forms in Japanese). These would seem to require additional relationships and possibly anchors to particular contexts.
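A sketch of what such additional relationships might look like, again in Python and again with invented names (Relation, RelationIndex, and the kind labels are all assumptions): a typed, directed link between two word identifiers, optionally anchored to a context.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str        # word identifier, here "language:form"
    target: str
    kind: str          # e.g. "humble-form", "honorific-form", "borrowed-from"
    context: str = ""  # optional anchor to a particular context


class RelationIndex:
    """Looks up related words by source word and relation kind."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def add(self, relation: Relation) -> None:
        self._by_source[relation.source].append(relation)

    def related(self, source: str, kind: str):
        return [r.target for r in self._by_source.get(source, [])
                if r.kind == kind]


index = RelationIndex()
# Forms unrelated by sound or orthography: "to say" in Japanese.
index.add(Relation("ja:言う", "ja:申す", "humble-form"))
index.add(Relation("ja:言う", "ja:おっしゃる", "honorific-form"))
# Word transfer between languages.
index.add(Relation("en:tsunami", "ja:津波", "borrowed-from"))

print(index.related("ja:言う", "honorific-form"))  # ['ja:おっしゃる']
```

Keeping relations outside the word records means new kinds of link (etymology, verb/noun pairs) can be added without changing the word structure itself.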

... to chase up: Defining Polysemous Words, by Peter Norvig


This page last changed: 3 Jan 2006