How FreeHAL Works
General
The computer application FreeHAL is the most advanced self-learning artificial intelligence available as free software (open source). The FreeHAL project was founded in 2006 by Tobias Schulz and focuses on the further development of this artificial intelligence.
The application uses semantic networks and employs pattern recognition, stemming, part-of-speech databases and Hidden Markov Models to mimic human conversational behavior as closely as possible. In contrast to most chatbots, free or commercial, FreeHAL is able to add to its own knowledge: the program expands its knowledge base through typed communication with the user. It supports German and English, but at the moment an extensive database (in fact, a semantic network) exists only for German.
The Response Process
The user interacts with FreeHAL mostly through the graphical user interface. Behind the scenes FreeHAL starts two processes, one of which is the GUI, and the other is responsible for processing the user's input. All user input and any replies from FreeHAL are sent over an internal TCP/IP connection.
When the user enters a statement or a question, this text is sent to the processing process, named runner.exe. This process then goes through several stages, further described below.
Simple Pattern Matching
First, FreeHAL attempts to replace certain colloquial phrases with equivalent standard-language expressions, to simplify later processing. The database also includes common typing mistakes, which are corrected at this stage.
The Part of Speech Tagger
Next, each word is assigned a part of speech and a grammatical gender. Note that gender applies primarily to German. For known words this is done by consulting the part-of-speech files; unrecognized words are passed to the Part of Speech Tagger. This module assigns each unknown word a part of speech, which is derived by applying statistics to existing text. In case of doubt, the user is asked to enter the part of speech explicitly.
The Parser
The Parser is responsible for arranging the natural language text, along with the Tagger's results, according to the text's component parts: subject, predicate, object, adverbial qualification etc.
After parsing, personal and possessive pronouns are assigned to their corresponding nouns, provided this is possible. As the final part of the parsing step, FreeHAL tries to derive, based on experience values and past properties, exactly what kind of response is expected. This can be a confirmation, a factual answer, an enumeration of facts, a greeting, an insult or something similar.
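The first stages described above (phrase replacement and part-of-speech assignment) can be sketched roughly as follows. This is a minimal Python sketch, not FreeHAL's actual code: the tables `REPLACEMENTS` and `LEXICON`, the function names, and the suffix heuristic in `guess_tag` are all assumptions made for illustration. In particular, the suffix heuristic only stands in for the statistical tagger, which the text does not describe in detail.

```python
import re

# Hypothetical replacement table: colloquial phrases and typing mistakes
# mapped to standard-language forms.
REPLACEMENTS = {
    "what's": "what is",
    "gonna": "going to",
    "teh": "the",
}

# Hypothetical lexicon standing in for the part-of-speech files.
LEXICON = {
    "what": "question word",
    "is": "verb",
    "the": "article",
    "cat": "noun",
}


def normalize(text):
    """Lowercase the input, split it into words, and apply the replacements."""
    words = re.findall(r"[\w']+", text.lower())
    result = []
    for word in words:
        # A replacement may expand one word into several.
        result.extend(REPLACEMENTS.get(word, word).split())
    return result


def guess_tag(word):
    """Crude stand-in for the statistical tagger: suffix rules only."""
    if word.endswith("ly"):
        return "adverb"
    if word.endswith("ing") or word.endswith("ed"):
        return "verb"
    return "noun"


def tag(words):
    """Look each word up in the lexicon; fall back to a guess if unknown."""
    return [(word, LEXICON.get(word) or guess_tag(word)) for word in words]
```

Calling `tag(normalize("What's teh cat?"))` first rewrites the input to the words `what is the cat` and then pairs each word with a part of speech. A real system would also ask the user when the guess is doubtful, as described above.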
The Database
Using the results of the Parser, the database is now searched for facts from which answers can be derived. The database takes the form of a semantic network, which is described in more detail below, under "The Semantic Network".
Evaluation
If one or more facts are found that can be used to formulate a response, the final response is selected using a rating from the Evaluation module. If no answer is found, the program responds with a standard answer.
Formulation of the Answer
The Formulation module transforms the fact into a German (or English) sentence. At this stage it has to apply correct capitalization, add grammatical articles and arrange verbs in the right order.
The Semantic Network
All data items that together make up FreeHAL's knowledge are stored in a so-called semantic network. This makes it possible to represent nearly all of the correlations between terms and to draw conclusions from them efficiently. In principle, FreeHAL can traverse connections from one term to another through any number of edges in the network.
Each term is associated with all the facts and formulated sentences that contain the term or one of its synonyms. Subordinate clauses are treated as separate sentences in the semantic network and are formally but loosely labeled as such. Sentences are treated as facts when they involve only a basic statement; they are then stored without specific wording (e.g. "A = B" instead of "an A is a B").
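The idea of terms linked by facts, with conclusions drawn by following any number of edges, can be illustrated with a small graph. This is a minimal Python sketch under stated assumptions: the class `SemanticNetwork`, its methods, and the use of breadth-first search are illustrative choices and are not taken from FreeHAL itself. Facts are stored in the wording-free form mentioned above, as (subject, verb, object) links.

```python
from collections import defaultdict, deque


class SemanticNetwork:
    """Toy network: each term maps to the facts in which it is the subject."""

    def __init__(self):
        # term -> list of (verb, object) edges
        self.edges = defaultdict(list)

    def add_fact(self, subject, verb, obj):
        """Store a fact as a directed edge from subject to object."""
        self.edges[subject].append((verb, obj))

    def path(self, start, goal):
        """Breadth-first search for a chain of facts from start to goal.

        Returns the list of (subject, verb, object) facts along the
        shortest chain, or None if the terms are not connected.
        """
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            term, chain = queue.popleft()
            if term == goal:
                return chain
            for verb, obj in self.edges.get(term, ()):
                if obj not in seen:
                    seen.add(obj)
                    queue.append((obj, chain + [(term, verb, obj)]))
        return None


net = SemanticNetwork()
net.add_fact("cat", "is", "mammal")
net.add_fact("mammal", "is", "animal")
chain = net.path("cat", "animal")
```

Here `chain` holds the two facts that connect "cat" to "animal" through "mammal", which is the kind of multi-edge conclusion the text describes. Synonym handling and subordinate clauses are left out of this sketch.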
Each fact or sentence represents one or more links between subjects, objects, verbs, adverbials and question words. In addition, the association carries various classifications such as degree of truth (true, false, or uncertain in various shades) and source (usually a .pro file), and, in the case of fully formulated sentences, possible subordinate clauses. Formulated sentences are a bridging technology: at the moment FreeHAL cannot represent all of its saved knowledge as facts without full wording. The number of sentences that still have to be stored in formulated form is being reduced continuously as development progresses.
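A fact record with the classifications listed above, together with a simple selection step in the spirit of the Evaluation stage, might look like the following. This is only a sketch: the field names, the numeric truth scale from 0.0 to 1.0, the `STANDARD_ANSWER` text, and the rule of ranking candidates by truth value alone are assumptions, since the text does not say how FreeHAL encodes or rates facts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    verb: str
    obj: str
    # Assumed scale: 1.0 is true, 0.0 is false, values between are uncertain.
    truth: float = 1.0
    # Where the fact came from, usually the name of a .pro file.
    source: str = ""
    adverbials: List[str] = field(default_factory=list)
    question_word: Optional[str] = None
    # Subordinate clauses, kept as separate facts.
    clauses: List["Fact"] = field(default_factory=list)


# Hypothetical fallback used when no fact matches.
STANDARD_ANSWER = "I do not know."


def select_answer(candidates):
    """Pick the candidate with the highest truth value, or fall back."""
    if not candidates:
        return STANDARD_ANSWER
    best = max(candidates, key=lambda fact: fact.truth)
    return f"{best.subject} {best.verb} {best.obj}"


facts = [
    Fact("a cat", "is", "a plant", truth=0.1),
    Fact("a cat", "is", "an animal", truth=0.9),
]
answer = select_answer(facts)
```

A real rating would presumably weigh more than the truth value, for example how well a fact matches the parsed question, and the final wording would be produced by the Formulation module rather than by simple string joining.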



