How FreeHAL Works

General

The computer application FreeHAL is the most advanced self-learning artificial intelligence available as free software (open source). The FreeHAL project was founded in 2006 by Tobias Schulz, and is focused on the further development of this artificial intelligence.

The application uses semantic networks and employs pattern recognition, stemming, part-of-speech databases and Hidden Markov Models in order to mimic human conversational behavior as closely as possible. In contrast to most chatbots, free or commercial, FreeHAL can add to its own knowledge: the program expands its knowledge base through typed communication with the user. It supports German and English, but at the moment an extensive database (in fact, a semantic network) exists only for German.

The Response Process

The user interacts with FreeHAL mostly through the graphical user interface. Behind the scenes FreeHAL starts two processes, one of which is the GUI, and the other is responsible for processing the user's input. All user input and any replies from FreeHAL are sent over an internal TCP/IP connection.
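As a rough illustration of this two-process design, the following Python sketch lets a front end and a back end exchange one line of text over a loopback TCP connection. For brevity both sides run in one program, with the back end in a thread. The line-based framing, all function names and the echo-style reply are assumptions made for the example; the text above does not describe FreeHAL's actual wire protocol.

```python
import socket
import threading


def process_input(text: str) -> str:
    """Hypothetical stand-in for the back end's processing stages."""
    return "You said: " + text


def serve_once(server: socket.socket) -> None:
    """Back-end side: accept one connection, read one line, send one reply."""
    conn, _addr = server.accept()
    with conn:
        with conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
            line = stream.readline().rstrip("\n")
            stream.write(process_input(line) + "\n")
            stream.flush()


def ask(port: int, text: str) -> str:
    """Front-end side: send one line and wait for the reply line."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        with client.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.write(text + "\n")
            stream.flush()
            return stream.readline().rstrip("\n")


def round_trip(text: str) -> str:
    """Start a throwaway back end on a free port and query it once."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        try:
            return ask(port, text)
        finally:
            worker.join()
```

Calling `round_trip("hello")` sends the text through the socket and returns the back end's reply. Keeping the user interface and the processing in separate processes means either side can be replaced as long as both agree on the messages they exchange.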

When the user enters a statement or a question, the text is sent to the back-end process, named runner.exe. This process then passes the text through several stages, described below.

Simple Pattern Matching

First, we attempt to replace certain colloquial phrases with equivalent standard language expressions, to simplify FreeHAL's later processing. The database includes common typing mistakes, which are also processed at this stage.
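A minimal sketch of such a replacement stage, in Python. The table entries are invented samples, not FreeHAL's real replacement lists, and the function name `normalize` is hypothetical.

```python
import re

# Hypothetical sample entries: colloquial forms and typing mistakes
# mapped to their standard equivalents.
REPLACEMENTS = {
    "what's": "what is",
    "gonna": "going to",
    "dunno": "do not know",
    "teh": "the",          # common typing mistake
    "recieve": "receive",  # common typing mistake
}

# One alternation pattern that matches any table key as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in REPLACEMENTS) + r")\b",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Replace each known colloquial phrase or typo with its standard form."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0).lower()], text)
```

For example, `normalize("I dunno what's in teh box")` yields "I do not know what is in the box". Doing this before tagging and parsing means the later stages only have to handle the standard forms.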

The Part of Speech Tagger

Next, each word is assigned a part of speech and a grammatical gender. Note that gender applies primarily to German. For known words this is done by consulting the part-of-speech files; for unrecognized words the Part of Speech Tagger is used. This module assigns an unknown word to a part of speech that is derived by applying statistics to existing text. In case of doubt, the user is asked to enter the part of speech explicitly.
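The lookup order described here (known words first, then a statistical guess, then the user) can be sketched in Python as follows. Note that the statistical step below is a simple word-ending frequency count chosen for brevity, not FreeHAL's own tagger; the lexicon, the sample data and all names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical stand-in for the part-of-speech files.
LEXICON = {"the": "article", "dog": "noun", "runs": "verb", "quickly": "adverb"}

# Tiny hand-tagged sample used to collect ending statistics (assumed data).
TAGGED_SAMPLE = [
    ("slowly", "adverb"), ("happily", "adverb"), ("walking", "verb"),
    ("singing", "verb"), ("kindness", "noun"), ("darkness", "noun"),
]

SUFFIX_LENGTH = 2


def build_suffix_stats(sample):
    """Count how often each word ending occurs with each part of speech."""
    stats = defaultdict(Counter)
    for word, tag in sample:
        stats[word[-SUFFIX_LENGTH:]][tag] += 1
    return stats


SUFFIX_STATS = build_suffix_stats(TAGGED_SAMPLE)


def tag_word(word: str) -> Optional[str]:
    """Return a part of speech, or None when the user would have to be asked."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]          # known word: use the lexicon entry
    counts = SUFFIX_STATS.get(word[-SUFFIX_LENGTH:])
    if counts:
        return counts.most_common(1)[0][0]  # most frequent tag for this ending
    return None                       # in doubt: ask the user
```

Here `tag_word("dog")` comes straight from the lexicon, `tag_word("quietly")` is guessed as an adverb from its ending, and a word with an unseen ending returns `None`, which is the point at which the user would be asked.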

The Parser

The Parser is responsible for arranging the natural language text, along with the Tagger's results, according to the text's component parts: subject, predicate, object, adverbial qualification etc.
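To make the idea of component parts concrete, here is a deliberately naive Python sketch that splits one tagged clause around its first verb. It is only an illustration under that simplifying assumption; the function name and the tag labels are hypothetical and a real parser has to handle far more than this.

```python
from typing import Dict, List, Tuple


def split_clause(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Naively sort the words of one clause into its component parts.

    Words before the first verb count as subject, verbs as predicate,
    adverbs as adverbial, and everything else after a verb as object.
    """
    parts: Dict[str, List[str]] = {
        "subject": [], "predicate": [], "object": [], "adverbial": [],
    }
    seen_verb = False
    for word, tag in tagged:
        if tag == "adverb":
            parts["adverbial"].append(word)
        elif tag == "verb":
            parts["predicate"].append(word)
            seen_verb = True
        elif seen_verb:
            parts["object"].append(word)
        else:
            parts["subject"].append(word)
    return parts
```

Given the tagged words of "the dog chases the cat quickly", the sketch puts "the dog" in the subject, "chases" in the predicate, "the cat" in the object and "quickly" in the adverbial.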

After parsing, personal and possessive pronouns are assigned to their corresponding nouns where this is possible. As the final parsing step, FreeHAL tries to derive, based on empirical values and past properties, exactly what kind of response is expected. This can be a confirmation, a factual answer, an enumeration of facts, a greeting, an insult or something similar.

The Database

Using the results of the Parser, the database is now searched for facts from which we can derive answers. The database is in the form of a semantic network. This is described in more detail below, under "Semantic Network".

Evaluation

If one or more facts are found that can be used to formulate a response, the final response is selected using a rating from the Evaluation module. If no answer is found, the program responds with a standard answer.
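The selection rule can be sketched in a few lines of Python. The rating is assumed here to be a single number per candidate, and the wording of the standard answer is invented; how the Evaluation module actually computes its ratings is not described in this text.

```python
from typing import List, Tuple

STANDARD_ANSWER = "I do not know anything about that yet."  # hypothetical wording


def choose_response(candidates: List[Tuple[str, float]]) -> str:
    """Return the highest-rated candidate, or the standard answer if none exist.

    Each candidate is a (text, rating) pair; a higher rating is better.
    """
    if not candidates:
        return STANDARD_ANSWER
    best_text, _best_rating = max(candidates, key=lambda item: item[1])
    return best_text
```

With two candidates rated 0.4 and 0.9, the one rated 0.9 is returned; with an empty list, the standard answer is returned.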

Formulation of the Answer

The Formulation module transforms the fact into a German (or English) sentence. At this stage the module has to apply correct capitalization, add grammatical articles and arrange verbs in the right order.
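A minimal Python sketch of this step for English only, assuming a fact is a plain (subject, verb, object) triple. The article rule is a rough first-letter heuristic and both function names are hypothetical; German output would additionally need gender, case and verb placement.

```python
def with_article(noun: str) -> str:
    """Prefix an indefinite article chosen by the first letter (rough heuristic)."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"


def formulate(subject: str, verb: str, obj: str) -> str:
    """Build a simple English sentence from a (subject, verb, object) fact."""
    sentence = f"{with_article(subject)} {verb} {with_article(obj)}"
    # Capitalize the first letter and close the sentence.
    return sentence[0].upper() + sentence[1:] + "."
```

For instance, `formulate("apple", "is", "fruit")` produces "An apple is a fruit."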

The Semantic Network

All data items that together make up FreeHAL's knowledge are stored as a semantic network. This makes it possible to represent nearly all of the relationships between terms and to draw conclusions from them efficiently. In principle, FreeHAL may follow connections from one term to another across any number of edges in the network.


[Image credit: Wikipedia]
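Following connections across several edges can be illustrated with a breadth-first search over a small graph, sketched here in Python. The miniature network and the function name are invented for the example, and breadth-first search is simply one common way to traverse such a graph; the text does not say which traversal FreeHAL uses.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical miniature network: term -> directly connected terms.
NETWORK: Dict[str, List[str]] = {
    "dachshund": ["dog"],
    "dog": ["mammal"],
    "mammal": ["animal"],
    "animal": [],
}


def find_path(network: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the shortest chain of terms, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in network.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

`find_path(NETWORK, "dachshund", "animal")` returns the chain dachshund, dog, mammal, animal, which is the kind of multi-edge conclusion ("a dachshund is an animal") that a single stored fact does not state directly.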

Each term is associated with all the facts and formulated sentences that contain the term or one of its synonyms. Subordinate clauses are treated as separate sentences in the semantic network and are formally but loosely labeled as such. Sentences are treated as facts when they involve only a basic statement; they are then stored without specific wording (e.g. "A = B" instead of "an A is a B").
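The association between terms and facts can be pictured as an index that maps each term, with synonyms folded into one canonical form, to the facts mentioning it. The Python sketch below assumes facts are plain strings split on whitespace and uses an invented synonym table; the names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical synonym table: synonym -> canonical term.
SYNONYMS: Dict[str, str] = {"automobile": "car"}


def canonical(term: str) -> str:
    """Lower-case a term and map a known synonym to its canonical form."""
    term = term.lower()
    return SYNONYMS.get(term, term)


def build_index(facts: List[str]) -> Dict[str, Set[int]]:
    """Map each canonical term to the positions of the facts that mention it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for position, fact in enumerate(facts):
        for word in fact.split():
            index[canonical(word)].add(position)
    return index


def facts_for(term: str, facts: List[str], index: Dict[str, Set[int]]) -> List[str]:
    """Return every fact that contains the term or one of its synonyms."""
    return [facts[i] for i in sorted(index.get(canonical(term), set()))]
```

With the facts "car = vehicle" and "automobile has wheels" in the list, a lookup for either "car" or "automobile" returns both.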

Each fact or sentence represents one or more links between subjects, objects, verbs, adverbial qualifications and question words. In addition, each link carries various classifications such as degree of truth (true, false, or uncertain in various shades) and source (usually a .pro file) and, in the case of fully formulated sentences, possible subordinate clauses. Formulated sentences are a bridging technology: at the moment FreeHAL cannot represent all of its stored knowledge as facts without full wording. The number of sentences still needed is being reduced continuously as development progresses.
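One way to picture such a record is the Python data class sketched below. The field names, the numeric encoding of the degree of truth (1.0 for true, 0.0 for false, values in between for uncertain) and the example file name are all assumptions for illustration; FreeHAL's actual storage layout is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    verb: str
    obj: str
    adverbial: Optional[str] = None
    question_word: Optional[str] = None
    truth: float = 1.0        # assumed scale: 1.0 true, 0.0 false, between = uncertain
    source: str = "unknown"   # e.g. the name of a .pro file
    subclauses: List["Fact"] = field(default_factory=list)

    def is_usable(self, threshold: float = 0.5) -> bool:
        """Treat a fact as usable when its truth value is above the threshold."""
        return self.truth > threshold


# "animals.pro" is a made-up file name used only for this example.
example = Fact("dog", "is", "animal", source="animals.pro")
```

Storing the truth value and the source alongside each link lets a later stage prefer well-supported facts and trace where a statement came from.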

Copyright © 2006 - 2011
Tobias Schulz and Contributors

The FreeHAL Software is distributed under the GNU GPL v3 license.

The FreeHAL Website (freehal.org and freehal.net) is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.