Intelligent Databases:
Easing Access to Information

 

Deborah Alpert Sleight
Educational Psychology
Michigan State University
sleightd@msu.edu

Spring 1993


Introduction

Information and its use have changed over the years. According to Richard Saul Wurman, in his book Information Anxiety, the term "information" used to mean communication that informed. Now it has come to mean any communication, whether or not it is understood.

"Data are facts; information is the meaning that human beings assign to these facts. Individual elements of data, by themselves, have little meaning; it's only when these facts are in some way put together or processed that the meaning begins to become clear." (Davis, W.S. and McCormack, A., The Information Age, quoted in Wurman, p. 38)

Information is produced more rapidly now than ever before. According to Wurman, a weekday edition of the New York Times contains more information than the average person was likely to come across in a lifetime in 17th century England.

Jobs today require more information, and more technical information, than ever before. But the people doing those jobs have little or no control over access to the information they need to do their jobs. "We are dependent on those who design information, on the news editors and producers who decide what news we will receive, and by decision makers ... who can restrict the flow of information." (Wurman, p. 34)

Information anxiety is that feeling "produced by the ever-widening gap between what we understand and what we think we should understand." (Wurman, p. 34) There are several situations that are likely to make us anxious: "not understanding information; feeling overwhelmed by the amount of information to be understood; not knowing if certain information exists; not knowing where to find information; and knowing where to find the information, but not having the key to access it." (Wurman, p. 44)

There are two ways of reducing information anxiety: finding information, and understanding the information you have found. Intelligent databases can help us find information, and, perhaps in the future, be able to help us make sense of it.

Case Study #1

David needs to find articles and books on "information anxiety," so he goes to the library and uses the online catalog to search. He starts by entering the subject term "information anxiety." The search reveals no such subject. He then tries just "information" as the subject, and sees a listing of categories containing the word "information."

David chooses one that says "information--management" as the most likely. This category gives him a huge list of books and articles on information management. Some of these may deal with information anxiety, but he'll have to look at each abstract to find out. And since some of the items don't have abstracts, he may have to guess from the title alone.

David concludes that he needs to narrow his search. He decides to enter the keywords--"information anxiety." This search reveals 5,153 items that contain the word "information" and 3,996 items that contain the word "anxiety." After looking at some of these items, David asks the system for help; it gives him a list of commands he may use.

He decides to enter the keywords "information and anxiety." This search produces items that contain both words, but not necessarily both as one concept. So for his sixth search David enters the keywords "information adj anxiety," causing the system to display only items that contain the word "information" adjacent to the word "anxiety."

David is finally successful in narrowing his search, but now only three items are listed. He realizes he must search again, using synonyms for "information anxiety," if he can find any. His searching has taken him 45 minutes so far, and all he has are three possibly appropriate items. He also has a neck ache and a back ache from sitting hunched over the computer for that long. David decides to check out the items he found, then go home, and perhaps search again tomorrow.

Case Study #2

David's friend Leah goes to a different university, one whose library has a different online cataloging system. She is also researching "information anxiety." Leah decides to search not only her library, but also several online databases around the country.

Leah clicks on a button to indicate the database she wants to search, then types in the relevant words, in this case "information anxiety." The database shows some items that contain both "information" and "anxiety," and some that contain the words together. Leah narrows her search by telling the database that the item that contains the phrase "information anxiety" is the example to follow.

The database not only displays the appropriate items, but also shows her a list of synonyms for "information anxiety" that she may select from for further searching. There turn out to be no useful synonyms. Leah is not sure what to do now, so she asks the database for help. The expert system component of the database emulates the expertise of an information searcher, and suggests that she refine her search by asking the database to justify its selections to her.

Because the searches have turned up only a few items on the topic she wants, and some items on different topics, Leah asks the database to show her what rules, words and weightings it used to find the appropriate items. The database tells her that it used the hundred most common important words (e.g., not articles or conjunctions) in her example document and ranked them in order of occurrence. Then it compared those words with all the documents in the database, and displayed those that had a high number of matches.

Since Leah can now see where the search went wrong, she clicks on the words, weightings and rules that are incorrect, and tells the database that these are negative examples, and not to display such items. She also increases the weightings of certain phrases which define information anxiety, and tells the database to search again.

This time the search displays more items, and more appropriate items. Leah is satisfied with the items for now, and has the database print out a history of her search, so she won't have to start from zero when she searches again later. Her search has taken her 15 minutes, and has provided more possibly appropriate items than David's search.

What was the difference between David's and Leah's searches? David was using a traditional full text database, and Leah was using an intelligent database.

Current Types of Databases

There are several types of databases in use today: full text, indexed keywords and hypertext links.

Full Text Databases

Full text databases use character string searches to match search terms entered by the user. If the string does not match any string in the database exactly, the database will return a "no match" message. The user will have to try variations on the string, such as plural, singular, past tense, present tense, and so on, to see if that variation of the string is actually in the database.

Character string:

Any combination of letters and numbers.
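
The behavior described above can be sketched in a few lines of Python. This is only an illustration of exact character string matching, not the code of any real catalog; the record titles and the function name search_exact are invented for the example.

```python
# Hypothetical records; a real full text database would hold whole documents.
records = [
    "Managing information overload in libraries",
    "Anxieties of the information age",
    "Test anxiety among college students",
]

def search_exact(term, docs):
    """Return the documents that contain `term` as an exact substring."""
    return [doc for doc in docs if term.lower() in doc.lower()]

# The singular and plural forms are different strings, so each
# finds a different record and misses the other.
print(search_exact("anxiety", records))
print(search_exact("anxieties", records))

# A phrase that appears in no record gives an empty list: the "no match" case.
print(search_exact("information anxiety", records) or "no match")
```

The user who types "anxiety" never sees the record that says "anxieties," which is why variations on the string have to be tried one at a time.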

Indexed Keywords

Some databases use indexed keywords to catalog their records. When each item is entered into the database, it is identified by keywords that the author or database manager has designated. ERIC has a large book documenting the keywords it allows. If the user knows the appropriate keywords, then it is fairly easy to search this kind of database. But if the user does not know the keywords, he or she has to start guessing, and refining the search by more guessing. Many full text databases also use indexed keywords, allowing the user to search by both keywords and character string.
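
A keyword index of this kind might be sketched as follows. The catalog entries, the keywords, and the function names build_index and lookup are all assumptions made for illustration; they are not taken from ERIC or any other real system.

```python
from collections import defaultdict

# Hypothetical catalog: each record carries keywords assigned by an indexer.
catalog = {
    "rec1": ["information management", "libraries"],
    "rec2": ["stress (psychology)", "information overload"],
    "rec3": ["information management", "decision making"],
}

def build_index(records):
    """Map each assigned keyword to the set of record ids that carry it."""
    index = defaultdict(set)
    for rec_id, keywords in records.items():
        for kw in keywords:
            index[kw].add(rec_id)
    return index

index = build_index(catalog)

def lookup(keyword):
    """Return sorted record ids for an exact keyword, or [] if it is not indexed."""
    return sorted(index.get(keyword, set()))

print(lookup("information management"))  # a keyword the indexer actually used
print(lookup("information anxiety"))     # a guess that is not in the allowed vocabulary
```

The second lookup returns nothing even though rec2 is probably relevant, which is the guessing problem described above.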

Hypertext Links

Hypertext databases provide links between non-hierarchical but related types of information, very much like a cross-referenced index, but available within the text. For example, if a user were researching the Civil War, and read an article about a particular battle, he or she could select the name of that battle, and cause items about that battle to be displayed. This allows the user to perform horizontal non-linear searches.

Hypertext databases are better than full text and indexed keyword databases at allowing nebulous searches, but the user has to follow the links the database authors created, which may or may not match the links the user has in mind. For example, the user may click on the name of a person mentioned in an article on economics, expecting to learn more about that person's contribution to economics, only to be shown the person's biography.

Traditional databases are adequate for precision searches if the user has a well-defined topic, because the searcher will already know the appropriate search terms and keywords. But if the topic is ill-defined, traditional databases become inadequate, since the user will have to guess at the appropriate search terms.

Hypertext links allow people to search in a more realistic way, in a model that Bates calls "berry picking."

...at each stage, with each different conception of the query, the user may identify useful information and references. In other words, the query is satisfied not by a single, final retrieved set, but by a series of selections of individual references and bits of information at each stage of the ever-modifying search. A bit-at-a-time retrieval of this sort is here called berry picking. (Bates, pp. 56-57)

As the user finds pertinent information, he or she may find related information and start looking in that direction. The other traditional types of databases can also allow berry picking, but the user will have a more difficult time finding related information. Frequently the only clues are a few related subject categories listed at the bottom of an only distantly related topic.

What is an Intelligent Database?

Intelligent databases have artificial intelligence components that provide help with the intellectual operation of the search, have ways of representing knowledge, and are based on connectionist neural network models.

Connectionist neural network:

A model that associates new information with similar information that is already known.

How Intelligent Databases Help the User Search

Many traditional databases provide the user little help with the mechanical operation of accessing the database, i.e., knowing which keys to press or which commands to enter. For example, the MSU library's MAGIC system doesn't even describe in its help function how to log off. Apparently it was designed only for users physically in the library, where logoff is not necessary. However, remote access is provided, so the help screen should mention important commands for the remote user to know.

Some traditional databases now provide interfaces that help the user with the mechanical operation of the database. The interfaces automate some procedures so that only one key press or button click is necessary to start them, and they provide menus for further ease of use. This kind of interface is called an information gateway.

Mechanical operations:

The mechanical, routine procedures necessary to start and quit the database.

Information gateway:

An interface that helps the user with the mechanical operation of the database.

But the real difficulty in using a database is the actual search, called intellectual operations. Intelligent databases provide knowledge gateways, which help the user find intelligible information, that is, knowledge.

Intellectual operations:

The actual searching of the database.

Knowledge gateway:

An interface that helps the user with the actual searching of a database.

Knowledge Representation

A computer can store, manipulate and retrieve information and knowledge. What is the difference between information and knowledge? Feigenbaum and McCorduck define knowledge as "information that is 'pared, shaped, interpreted, selected, and transformed' into a domain of expertise." (quoted in McFarland and Parker, p. 58)

For a computer to store, manipulate and retrieve information and knowledge, it must store them in precise ways, such as rules, frames, semantic nets and heuristic decision trees.

Rules

Rules are used to represent strategies, recommendations, directives, and other models for problem-solving. They consist of two parts: an IF section that specifies a condition, and a THEN part that specifies action to take if the condition is met. Both parts of rules can contain objects, values and attributes.

For example, a rule for turning right at a red light might be: "If there is no traffic approaching from the driver's left, then turn right." Rules may be weighted with a confidence value, which indicates how close to being true the rule is.
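
One way to picture such a rule in code is shown below. It is a minimal sketch, assuming a rule can be held as a condition, an action and a confidence value; the class name Rule, the fact name traffic_from_left and the confidence of 0.9 are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Rule:
    condition: Callable[[Dict[str, object]], bool]  # the IF part
    action: str                                     # the THEN part
    confidence: float = 1.0                         # weighting of the rule

# The right-turn rule from the text; the 0.9 confidence is an invented value.
right_turn = Rule(
    condition=lambda facts: facts.get("traffic_from_left") is False,
    action="turn right",
    confidence=0.9,
)

def apply_rule(rule, facts):
    """Return (action, confidence) if the IF part holds, otherwise None."""
    if rule.condition(facts):
        return rule.action, rule.confidence
    return None

print(apply_rule(right_turn, {"traffic_from_left": False}))
print(apply_rule(right_turn, {"traffic_from_left": True}))
```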

Frames

Frames are used to represent related knowledge about a narrow subject, particularly subjects that cluster around objects, concepts or events. For example, nursing tends to cluster around diseases and patient care.

A frame is "...a data structure which contains data hooks or slots for all information associated with an object or event...Slots may include default values, pointers to other frames, sets of rules, or sets of procedures for attaining values." (McFarland and Parker, p. 66) A slot may contain categories such as Name, Definition, Examples, Specialization and Analogies.
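
A frame with slots, default values and a pointer to another frame could be sketched like this. The class name Frame, the slot names and the slot contents are assumptions for illustration, loosely following the nursing example above.

```python
class Frame:
    """A frame: named slots, plus an optional pointer to a more general frame."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Look in this frame first, then fall back to defaults in parent frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        return None

# Slot contents below are invented examples.
disease = Frame("Disease", definition="A disorder of the body", care="monitor the patient")
influenza = Frame(
    "Influenza",
    parent=disease,
    definition="A viral respiratory infection",
    examples=["seasonal flu"],
)

print(influenza.get("definition"))  # filled in on the Influenza frame itself
print(influenza.get("care"))        # default found through the pointer to Disease
```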

Semantic Nets

Semantic nets were created as a model of associative memory in humans. Semantic nets (for networks) show relationships among things; these things may be objects, concepts or events. Things are represented in the net by a node, and the relationship between them is represented by a connecting line.

For example, one node in the net is MOON. Another node is MARS. They are connected by an arrow labeled "has." Other arrows are labeled "contains," "is a," and "is contained in." The relationship between the two nodes is "MARS has a MOON."

Semantic nets are usually hierarchical, for without some organization, the net would become tangled and difficult to decipher. For this reason, an arrow is used to indicate the hierarchy of the relationship (that is, who is the parent and who is the child). Saying "The MOON has MARS" would not make sense.
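
A small semantic net can be sketched as a list of labeled, directed links. In this illustration the "MARS has MOON" link comes from the example above; the other links and the function name related are assumptions added to make the sketch runnable.

```python
# Each link is (source node, label, target node); the arrow points from source to target.
links = [
    ("MARS", "has", "MOON"),
    ("MARS", "is a", "PLANET"),
    ("PLANET", "is contained in", "SOLAR SYSTEM"),  # invented link
]

def related(node, label):
    """Return the nodes reached from `node` by following arrows with the given label."""
    return [target for source, lbl, target in links if source == node and lbl == label]

print(related("MARS", "has"))  # follows the arrow from MARS to MOON
print(related("MOON", "has"))  # no arrow runs the other way, so nothing is found
```

Because the links are directed, the net can answer "what does MARS have?" but does not claim that the MOON has MARS.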

Semantic nets are easily converted into decision trees, with the nodes representing goals and the links representing decisions that result from attaining one goal, and that lead to another goal.

Heuristic Decision Trees

Decision trees are used in artificial intelligence "to show all possible consequences which can result from an initial situation." (McFarland and Parker, p. 67) If a problem is too complex, a decision tree would not be the appropriate representation, because the number of possible branches would become too numerous.

For example, a decision tree for approving or not approving a loan might offer several possible decisions (approve, cosign, deny) if a loan applicant's credit rating is greater than 5.00, and only one decision (deny) if the credit rating is less than 5.00.
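
The loan tree can be sketched as nested tests. Only the first branch point (the 5.00 credit rating) comes from the description above; the later branch points, the parameter names has_steady_income and has_cosigner, and the treatment of a rating of exactly 5.00 are assumptions made so that the sketch is complete.

```python
def loan_decision(credit_rating, has_steady_income, has_cosigner):
    """Walk a small decision tree and return 'approve', 'cosign', or 'deny'."""
    # First branch point, from the text: a rating below 5.00 leads only to denial.
    # (A rating of exactly 5.00 is treated here as passing; that is an assumption.)
    if credit_rating < 5.00:
        return "deny"
    # The branch points below are invented; the source does not give them.
    if has_steady_income:
        return "approve"
    if has_cosigner:
        return "cosign"
    return "deny"

print(loan_decision(7.5, has_steady_income=True, has_cosigner=False))
print(loan_decision(7.5, has_steady_income=False, has_cosigner=True))
print(loan_decision(3.0, has_steady_income=True, has_cosigner=True))
```

Each added question doubles the number of paths through the tree, which is why a very complex problem soon has too many branches for this representation.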


INFERENCE

One of the capabilities of artificial intelligence is inference, in which the computer draws conclusions from the facts and rules represented in forms the computer can use. During the inference process, the computer may derive new facts or rules.

Inference:

The process of deriving new facts and rules from known information.

"Computer programs that perform inference are called inference engines. The inference engine uses the knowledge presented and information provided to draw conclusions and make recommendations about a problem presented to the computer system." (McFarland and Parker, p. 56) The inference engine must decide which rules (contained in the knowledge base) are germane, how to use them, and in which order to apply them. It must also decide which information (contained in the information base) to use.

Inference engines can be used with any of the knowledge representation methods described above.

With decision trees, the inference proceeds along the various linear pathways established by the system developer: information from the database will be matched against the branch points in the decision tree, and the system will progress accordingly. In an object-oriented knowledge base (i.e., a semantic net), the pre-established patterns of inheritance within classes of objects trigger certain actions or events to occur ... Once invoked, these objects might then call specific functions or rules into action to reach a specific result or conclusion. In a production-rule system (i.e., a rule-based system), the database information will be used to evaluate the rules themselves to see if they are either true or false. This process of testing, or "firing," rules in an optimal sequence is aided by two major inferencing techniques ... forward chaining and backward chaining. (Bielawski and Lewand, p. 33)

Inheritance:

A node in a semantic net linked to another node with an "is a" link inherits all the properties of its parent. "MARS is a PLANET" means that Mars inherits the property of being a planet.
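
Inheritance along "is a" links might be sketched as follows. "MARS is a PLANET" is taken from the definition above; the CELESTIAL BODY node, the property strings and the function name inherited_properties are invented for the illustration.

```python
# "is a" links from child to parent, and properties stored on individual nodes.
is_a = {"MARS": "PLANET", "PLANET": "CELESTIAL BODY"}
properties = {
    "CELESTIAL BODY": {"located in space"},
    "PLANET": {"orbits a star"},
    "MARS": {"has a moon"},
}

def inherited_properties(node):
    """Collect the node's own properties plus those of every ancestor."""
    collected = set()
    while node is not None:
        collected |= properties.get(node, set())
        node = is_a.get(node)  # climb one "is a" link; None at the top of the net
    return collected

print(sorted(inherited_properties("MARS")))
```

Nothing about orbiting a star is stored on MARS itself; the property is found by climbing the "is a" links.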

Forward Chaining

Forward chaining is an example of deductive reasoning, that is, building a conclusion from data. When the inference engine is using the forward chaining technique, it compares information in the database with the IF part of the first active rule in the knowledge base. If the information matches the rule, the THEN part of the rule "fires," that is, it is added to the database about that particular problem, and the search narrows. Once a rule has fired, it becomes inactive for that session. This procedure is followed until the inference engine has come to all possible conclusions.

An example of forward chaining is the computer game of ANIMAL, where the computer asks a player some questions, trying to discover which animal the player is thinking of. The inference engine asks the player a question: "Does the animal have fins?" If the player answers "no," then the engine searches for the rule that applies to fins: "IF a creature has fins, THEN it is a fish." Since the answer was "no," the THEN part of the rule does not fire.

The inference engine asks another question: "Does the animal have wings?" To this question the player answers "yes." When the engine finds the rule that applies to wings ("IF an animal has wings, THEN it is a bird."), it discovers that the THEN part of the rule does apply, so it stores that rule in its database, and searches for questions pertaining to birds. It does not ask the questions about fins or wings again, since it already knows the answers; the rules applying to fins and wings become inactive so the engine won't have to waste time searching through those rules again.

Eventually the inference engine either guesses the animal, or gives up and asks the player for the name of the animal. This data is then added to the database. In this way the knowledge base is built, and the next time the engine will be able to guess that particular animal.
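
The firing loop described in this section can be sketched in Python. This is a minimal illustration, not the code of the ANIMAL game: it starts from facts already gathered rather than asking questions, and the penguin rule and the function name forward_chain are assumptions added for the example.

```python
def forward_chain(facts, rules):
    """Fire rules whose IF part is satisfied until no new conclusions appear."""
    known = set(facts)
    active = list(rules)
    fired = True
    while fired:
        fired = False
        for rule in list(active):
            conditions, conclusion = rule
            if conditions <= known:    # IF part matches the known facts
                known.add(conclusion)  # THEN part fires
                active.remove(rule)    # the rule becomes inactive for this session
                fired = True
    return known

rules = [
    ({"has fins"}, "is a fish"),
    ({"has wings"}, "is a bird"),
    ({"is a bird", "cannot fly"}, "is a penguin"),  # invented rule
]

print(sorted(forward_chain({"has wings", "cannot fly"}, rules)))
```

The fins rule never fires because its IF part is not among the facts, while the conclusion "is a bird" becomes a new fact that lets a further rule fire.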

Backward Chaining

If there is a lot of data involved (i.e., a lot of questions to be asked), forward chaining should be used in order to narrow the field of questions. If there is not a lot of data, then one could start with the goal and use backward chaining. In backward chaining the THEN part of the rule--the goal--is the starting point, and the IF section--the data--is the ending point.

Backward chaining is an example of inductive reasoning, starting with a conclusion and trying to figure out what its components are. Referring to the Animal game described above, backward chaining would take a known animal and try to figure out the rules to describe it.
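
Working from the THEN part back to the IF part can be sketched as a short recursive function. The rules, including the invented penguin rule, and the function name backward_chain are assumptions for illustration, and the sketch assumes the rules contain no circular definitions.

```python
def backward_chain(goal, facts, rules):
    """Return True if `goal` is a known fact or can be derived from the rules."""
    if goal in facts:
        return True
    for conditions, conclusion in rules:
        # Start from a rule whose THEN part is the goal,
        # then try to establish every item in its IF part.
        if conclusion == goal and all(backward_chain(c, facts, rules) for c in conditions):
            return True
    return False

rules = [
    ({"has fins"}, "is a fish"),
    ({"has wings"}, "is a bird"),
    ({"is a bird", "cannot fly"}, "is a penguin"),  # invented rule
]
facts = {"has wings", "cannot fly"}

print(backward_chain("is a penguin", facts, rules))
print(backward_chain("is a fish", facts, rules))
```

Only the rules that lead to the chosen goal are examined, which is why this approach suits problems where the goal is known and there is little data to sift.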


EXAMPLES OF INTELLIGENT DATABASES

DowQuest

The intelligent database described in Case Study #2 above is DowQuest, a search service maintained by Dow Jones. DowQuest is an example of an intelligent database that uses words, phrases, and sample documents from the user as examples to compare to the documents in its database. First the computer asks the user for important words or phrases to create a profile of the desired information.

Then it tries to match the profile with profiles of the documents in its database. Every word in each document in its database is analyzed. After eliminating common or noise words (e.g., "the" or "if"), the hundred most frequently occurring words in the document are used to assemble a profile of the document. This profile is then compared to the profile entered by the user.

Matching is performed by tallying word occurrences, combining scores for different words (basically using an implicit Boolean "or" between query words) and then normalizing the score to take into account the size of documents and length of query. DowQuest then sorts the documents by their scores and displays headlines in this order. Documents containing more of the query words and more of each particular word will be generally higher in significance. Also, documents with phrases (e.g., George Bush or nuclear fusion), where query words occur closer together, are ranked more highly than documents in which the query words are scattered. (Weyer, p. 44)
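
The general idea of profile matching can be sketched as below. This is not DowQuest's actual algorithm, which is not given here in enough detail to reproduce; it is a simplified illustration of tallying query words, normalizing by document and query length, and sorting. The noise word list, the sample documents and all function names are invented, and the bonus for query words that occur close together is left out.

```python
import re
from collections import Counter

NOISE_WORDS = {"the", "if", "a", "an", "and", "of", "to", "in", "is"}  # illustrative list

def profile(text, size=100):
    """Count the most frequent non-noise words in a text."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in NOISE_WORDS]
    return Counter(dict(Counter(words).most_common(size)))

def score(query, document):
    """Tally query-word occurrences, normalized by document and query length."""
    q, d = profile(query), profile(document)
    doc_len = sum(d.values())
    if not q or doc_len == 0:
        return 0.0
    hits = sum(d[word] for word in q)  # implicit "or": any query word contributes
    return hits / (doc_len * len(q))

def rank(query, documents):
    """Return document titles sorted from highest to lowest score."""
    return sorted(documents, key=lambda title: score(query, documents[title]), reverse=True)

docs = {
    "A": "Information anxiety grows as information piles up.",
    "B": "The anxiety of exams is common in students.",
    "C": "Gardening tips for the spring season.",
}
print(rank("information anxiety", docs))
```

Document A ranks first because it contains both query words and repeats one of them; document B contains only one query word, and document C contains none.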

The user looks at the documents displayed by the computer, and decides if any of them contain the information being sought. If so, the user indicates the most appropriate article, and the computer uses it as an example to improve the profile of the information the user wants. If none of the articles is appropriate, the user can add or change words in the profile, and have the computer search again.

Sometimes at least one of the documents displayed is similar enough to what the user wants that it can be used as feedback to the computer to improve the profile and narrow the search. But, as with any database, finding that first example article can take a long time.

DowQuest is a neural network model because "giving feedback via examples and then doing associative, adaptive classification is central to connectionist or neural network models..." (Weyer, p. 44)

Associative classification:

Comparing words in example documents with words in similar documents.

Topic

Another type of intelligent database is a program called Topic, from Verity, Inc. This program differs from DowQuest by using a more structured approach called building concept hierarchies. Stephen Weyer describes concept hierarchies as:

building a "topic" from other topics and patterns of words ... defining a concept or a "grammar" for a topic (e.g., "terrorist events") by figuring out components (i.e., who, what, whom, when, where, how - "attackers," "victims," "weapons"), and then the details in terms of specific words, Boolean connectives and weights or relevance factors (e.g., "Colonel Mustard" (0.8) "Molotov Cocktail" (0.8) "Ballroom" (0.2)). (Weyer, p. 45)

This approach allows the user to narrow a search manually by focusing on specific attributes.
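
A concept hierarchy with weighted evidence might be sketched like this. The phrases and weights come from Weyer's example; the component name "places," the way the weights are combined (taking the highest weight found) and the function names are assumptions, since the way Topic combines them is not described here.

```python
# A hypothetical topic: each component lists (phrase, weight) pairs as evidence.
terrorist_events = {
    "attackers": [("colonel mustard", 0.8)],
    "weapons": [("molotov cocktail", 0.8)],
    "places": [("ballroom", 0.2)],  # component name is invented
}

def component_scores(text, topic):
    """For each component, return the highest weight among its phrases found in the text."""
    lowered = text.lower()
    return {
        component: max((w for phrase, w in evidence if phrase in lowered), default=0.0)
        for component, evidence in topic.items()
    }

def topic_score(text, topic):
    """Overall relevance: the strongest component score (one simple way to combine)."""
    return max(component_scores(text, topic).values(), default=0.0)

report = "Colonel Mustard was seen in the ballroom."
print(component_scores(report, terrorist_events))
print(topic_score(report, terrorist_events))
```

Keeping a separate score for each component is what lets a user narrow the search to one attribute, such as the weapon, without rewriting the whole topic.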

Concept hierarchy:

Defining a concept or topic by its components and details.


FEATURES OF AN IDEAL INTELLIGENT DATABASE

An intelligent database may have some of the following features, which were gathered from various articles and books describing different systems; an ideal one would have all of them.

Feedback

An intelligent database adapts to user feedback, which may be examples of what the user is searching for or examples of what the user is not searching for. The system allows the user to copy and paste text from another document as an example of what to search for. These examples are stored as profiles for searches.

Interface

The interface is composed of windows, arranged in a hierarchical order so that the user may see what previous searches were made, and which words, documents and phrases were used as search terms. A mouse or pen may be used to allow a user to select search terms by pointing to them instead of typing them.

Help

Help is provided in several ways and on several levels. If the system does not understand a query, it questions the user until both are satisfied of mutual understanding. An expert system containing the expertise of an experienced information specialist is available, to guide the user through procedures and to answer questions. The system uses an intelligent tutor to monitor the user's interactions in order to detect when help is needed. If the user starts repeating commands or search terms, the tutor will intervene with suggestions.

Selecting Search Terms

The user can reformulate or expand a query using a thesaurus or relevant words from previous searches. An online thesaurus is available to both the system and the user, although the intelligent database can allow the user to select alternate search terms instead of the system using algorithms to select alternates.

Displaying Results of a Search

After a search, the system not only displays the hits, but also the weight or relevance of each hit, so the user will know which documents the system thinks are closer to the search profile. The most relevant paragraphs in the new documents are indicated, with the query words highlighted. If the user desires it, the system can justify its results, that is, can show how it arrived at those documents as most closely matching the search profile.

PROBLEMS WITH INTELLIGENT DATABASES

There are several problems with intelligent databases. As with any database, finding that first article can be time-consuming. Computers have problems with ambiguous uses of words, and with words that have multiple meanings. Currently it is impossible to build into a knowledge base a broad sense of the world, so context is a problem.

FUTURE FEATURES OF INTELLIGENT DATABASES

Currently there are very few intelligent databases in use; most are in the research stage. Most databases provide some help with the mechanical operations of working the database, and some help with the intellectual operations, that is, the search. But it would be useful for the computer to help even before the search, in asking the right questions.

Ideally, what a database should do, and what it might be able to do in the future, is to search by content, not just by string or keyword matching. It should be able to match overnight what the user is working on with other information in the user's company or in other databases in the world. And finally, it should evaluate and synthesize the information found, in order to help the user ask the right questions, so that the search for answers might begin.

 

BIBLIOGRAPHY for INTELLIGENT DATABASES

Bates, M. (1991). The Berry-Picking Search: User Interface Design. In Interfaces for Information Retrieval and Online Systems: the State of the Art, ed. M. Dillon. Greenwood Press, New York.

Bielawski, L. & Lewand, R. (1991). Intelligent Systems Design: Integrating Expert Systems, Hypermedia, and Database Technologies. John Wiley & Sons, New York.

Dillon, M., ed. (1991). Interfaces for Information Retrieval and Online Systems: the State of the Art. Greenwood Press, New York.

Fallows, J. (1992). "Hidden Powers" in The Atlantic, May 1992, pp. 114-117.

Gibbons, H. (1990). "The Instructional Potential of AI." In CBT Directions, February 1990, Weingarten Publications, Boston.

Glossbrenner, A. (1987). How to Look It Up Online: Get the Information Edge with Your Personal Computer. St. Martin's Press, New York.

Harter, S. (1986). Online Information Retrieval: Concepts, Principles, and Techniques. Academic Press, Inc., New York.

Hawkins, D. (1988). Applications of Artificial Intelligence (AI) and Expert Systems for Online Searching. Online, Vol. 12:1, pp. 31-44.

McFarland, T. and Parker, R. (1990). Expert Systems in Education and Training. Educational Technology Publications, Englewood Cliffs, NJ.

Parsaye, K., M. Chignell, S. Khoshafian and H. Wong (1989). Intelligent Databases: Object-Oriented, Deductive Hypermedia Technologies. John Wiley and Sons, New York.

Tufte, E. (1990). Envisioning Information. Graphics Press, Cheshire, Connecticut.

Tufte, E. (1983). The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut.

Turkle, S. (1984). The Second Self. Simon and Schuster, New York.

Weyer, S. (1989). Questing for the "DAO": DowQuest and Intelligent Text Retrieval. Online, Vol. 13:5, pp. 39-49.

Wurman, R. (1989). Information Anxiety. Bantam Books, New York, New York.

Zuboff, S. (1988). In the Age of the Smart Machine. Basic Books, Inc., Publishers, New York.


(c) Deborah Alpert Sleight, 1993
Permission is given to reprint for non-profit use providing credit is given.

Deborah Alpert Sleight
Educational Psychology
Michigan State University
East Lansing, MI 48824
sleightd@msu.edu
