[This article was published in the Practical Applications of Java 1999 Conference Proceedings.
This work was supported by the U.S. Army Medical Research and Materiel Command under Contract No. DAMD17-98-D-0022 and DAMD17-93-C-3141. The views, opinions and/or findings contained in this report are those of the authors and should not be construed as an official Department of the Army position, policy or decision unless so designated by other documentation.]
Further, it would be very efficient to support computational approaches to these activities using a single, reusable knowledge representation scheme. A perfect example of this reuse of knowledge is the Army-sponsored Breast Cancer Decision Guide presented at the 1998 Practical Applications of Prolog conference. In that application, various research results were distilled into a set of rules that can be applied to customize information about breast cancer for patients and their families.
The Generic Encapsulated Knowledge Object (GEKO) is a formal structure used to capture the results of research. It contains the rules and relationships between scientific variables that are the essence of a particular piece of research. Coupled with those rules in the GEKO are the human-readable descriptions, citations and other supporting information essential for verifying and understanding knowledge used in medical applications.
With the GEKO designed, researchers needed a means of creating GEKOs that describe a piece of research. Further, it was necessary to store GEKOs in a central archive and to enable both query and inferencing over the archive in support of the Army's two main objectives: understanding a body of research, and being able to reuse and apply the knowledge obtained by that research.
The Army Medical Knowledge Engineering System (AMKES) is the application that meets these needs. It is a classic three-tiered application, with a back-end database for storing GEKOs, client software for accessing the database, and server/network software for joining the two together.
The database is actually implemented in two layers. The GEKO format is a frame-based structure. That is, it is composed of a number of slots represented by name:value pairs. These slots can have other frames as their values and so on. The deepest layer of AMKES is a generic frame-based database implemented using Object Store's PSE/Pro.
On top of the generic frame database is a layer of database code that supports the actual structure of GEKO frames and the subframes contained within GEKOs. It is this layer that is exposed through an interface to its users.
The user interface is implemented using the Java Foundation Classes (JFC). The client talks to a separate interface that defines the services an AMKES client can access. This client interface is similar to the interface that defines the database services.
The client interface has two implementations. One connects to a central server over the Internet, and the other is designed for local use, in which the client interface is implemented directly with calls to the database interface.
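The arrangement of one client interface with a local and a networked implementation can be sketched roughly as follows. This is a minimal illustration, not the AMKES source: the names GekoDatabase, AmkesClient, AmkesServer, InMemoryDatabase, LocalAmkesClient and RemoteAmkesClient are all hypothetical, a GEKO is reduced to a string, and modern Java syntax is used for brevity.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical database-side interface; a GEKO is reduced to a string here.
interface GekoDatabase {
    void store(String key, String geko);
    String fetch(String key);
}

// Hypothetical interface defining the services a client can use.
interface AmkesClient {
    void saveGeko(String key, String geko);
    String loadGeko(String key);
}

// Hypothetical remote interface a central server would export over RMI.
interface AmkesServer extends Remote {
    void saveGeko(String key, String geko) throws RemoteException;
    String loadGeko(String key) throws RemoteException;
}

// Stand-in database so the sketch runs without a real archive.
class InMemoryDatabase implements GekoDatabase {
    private final Map<String, String> rows = new HashMap<>();
    public void store(String key, String geko) { rows.put(key, geko); }
    public String fetch(String key) { return rows.get(key); }
}

// Local implementation: calls the database interface directly.
class LocalAmkesClient implements AmkesClient {
    private final GekoDatabase db;
    LocalAmkesClient(GekoDatabase db) { this.db = db; }
    public void saveGeko(String key, String geko) { db.store(key, geko); }
    public String loadGeko(String key) { return db.fetch(key); }
}

// Networked implementation: forwards each call to a server stub.
class RemoteAmkesClient implements AmkesClient {
    private final AmkesServer server;
    RemoteAmkesClient(AmkesServer server) { this.server = server; }
    public void saveGeko(String key, String geko) {
        try {
            server.saveGeko(key, geko);
        } catch (RemoteException e) {
            throw new IllegalStateException("server unreachable", e);
        }
    }
    public String loadGeko(String key) {
        try {
            return server.loadGeko(key);
        } catch (RemoteException e) {
            throw new IllegalStateException("server unreachable", e);
        }
    }
}
```

Code written against AmkesClient does not need to know which of the two implementations it has been handed, which is the property the design relies on.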
The server handles multiple clients and manages the threads of execution for the database. Unfortunately, there has to be some knowledge of the particular database in the server implementation because RMI and Object Store's PSE/Pro each have their own ideas of which threads to run when. So a significant portion of the server is code devoted to making sure RMI's threads connect correctly with the database.
The server also manages the logging in and logging off of users, and various system functions such as backup and recovery of the database.
The primary advantage of the frame structure is the flexibility it gives for defining GEKO formats. It is not necessary to hard-wire into the code any of the slots particular to GEKOs, so the inevitable changes to GEKO format that are made as AMKES evolves are made without directly impacting any of the Java code implementing AMKES.
The frame class provides all of the methods you would expect for retrieving slots by name, adding slots, and handling slots that have multiple values (lists of other slots). Because the value in a slot is an object, slots can store any type of Java object, including the frame class defined for AMKES.
The source slot's value is a frame, with its own slots, such as 'Title' and 'Author'. The value of the 'Title' slot would be a text string, but the value of the 'Author' slot is another frame structure used to describe persons.
A person frame has slots for name, address, e-mail address, etc.
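A frame of this kind might look something like the sketch below. The class name Frame and its method names are assumptions made for illustration, not the actual AMKES API, and the code uses modern Java collections rather than what was available in 1999.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical frame: an ordered set of named slots whose values are
// arbitrary objects, including other frames.
class Frame {
    private final Map<String, Object> slots = new LinkedHashMap<>();

    // Set (or replace) a single-valued slot.
    void setSlot(String name, Object value) {
        slots.put(name, value);
    }

    // Append a value to a multi-valued slot, creating the list on first use.
    @SuppressWarnings("unchecked")
    void addToSlot(String name, Object value) {
        Object current = slots.get(name);
        if (!(current instanceof List)) {
            List<Object> list = new ArrayList<>();
            if (current != null) {
                list.add(current); // keep an existing single value
            }
            slots.put(name, list);
            current = list;
        }
        ((List<Object>) current).add(value);
    }

    // Retrieve a slot value by name, or null if the slot is absent.
    Object getSlot(String name) {
        return slots.get(name);
    }

    // Follow a path of slot names down through nested frames.
    Object getPath(String... names) {
        Object current = this;
        for (String name : names) {
            if (!(current instanceof Frame)) {
                return null; // path runs past a non-frame value
            }
            current = ((Frame) current).getSlot(name);
        }
        return current;
    }

    Iterable<String> slotNames() {
        return slots.keySet();
    }
}
```

With this sketch, a GEKO whose 'Source' slot holds a frame with 'Title' and 'Author' slots can be read with a call such as geko.getPath("Source", "Author", "Name").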
One key requirement of AMKES is the ability to share information between GEKOs. A perfect example is the 'Author' slot with its 'person' frame value. There might be many GEKOs in the archive with the same author, but within AMKES only one copy of each person frame is stored. The 'Author' slot really has just a key, indicating the actual person frame which has the information about that author.
This way, when a person changes e-mail address, that change is automatically 'known' to all the GEKOs that use that person frame.
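The key-based sharing described above can be illustrated with a small sketch. It assumes frames are plain maps and that keys are simple generated strings; FrameKey and SharedFrameStore are hypothetical names, and the real archive would sit in the database rather than in memory.

```java
import java.util.HashMap;
import java.util.Map;

// A reference to a shared frame, held in a slot in place of the frame itself.
final class FrameKey {
    final String id;
    FrameKey(String id) { this.id = id; }
}

// Hypothetical in-memory stand-in for the archive of shared frames.
class SharedFrameStore {
    private final Map<String, Map<String, Object>> frames = new HashMap<>();
    private int next = 1;

    // Store one copy of a frame and hand back the key that refers to it.
    FrameKey put(Map<String, Object> frame) {
        String id = "frame-" + next++;
        frames.put(id, frame);
        return new FrameKey(id);
    }

    // Look up the single stored copy for a key.
    Map<String, Object> resolve(FrameKey key) {
        return frames.get(key.id);
    }

    // Read a slot, following a key to the shared frame when the slot holds one.
    Object slotValue(Map<String, Object> frame, String slot) {
        Object value = frame.get(slot);
        return (value instanceof FrameKey) ? resolve((FrameKey) value) : value;
    }
}
```

Because every GEKO holds only the key, an update made to the stored person frame is seen by each GEKO the next time its 'Author' slot is resolved.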
Other major sub-frames used in AMKES are one describing citations and one describing the variables that are studied in a particular piece of research. The variable sub-frames are the heart of the system, because only when different research projects work with the same variables does the body of research begin to have value larger than the sum of its parts.
The schema definitions are also downloaded by the client software, and are used to drive the user interface for creating, editing and browsing GEKOs.
The frame browser has two main panels. The one on the left uses a JFC tree control that is mapped to the slots of the frame being displayed. Expanding and contracting the nodes lets the user see various levels of nesting depth in a GEKO. The right panel is used to view the contents of a slot and to edit it.
When the user wants to create a new GEKO, a blank browser is presented, with the tree control filled out according to the schema definition of a GEKO stored in the archive.
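One way a schema definition could drive the tree control is sketched below. It assumes, purely for illustration, that a schema arrives as a nested map in which a nested map denotes a sub-frame and anything else denotes a leaf slot; the class name SchemaTreeBuilder and that schema encoding are hypothetical.

```java
import java.util.Map;
import javax.swing.tree.DefaultMutableTreeNode;

// Hypothetical helper that turns a schema definition into tree nodes.
class SchemaTreeBuilder {
    // Build a node for the named frame; nested maps become child subtrees,
    // all other entries become leaf nodes labelled with the slot name.
    static DefaultMutableTreeNode build(String name, Map<String, ?> schema) {
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(name);
        for (Map.Entry<String, ?> entry : schema.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> subSchema = (Map<String, ?>) value;
                node.add(build(entry.getKey(), subSchema));
            } else {
                node.add(new DefaultMutableTreeNode(entry.getKey()));
            }
        }
        return node;
    }
}
```

The resulting root node can be passed to the JTree constructor that accepts a TreeNode, giving a blank browser whose tree mirrors the schema.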
Other user interface internal frames provided include a main browser for an archive, a query browser for posing QBE queries of the archive, and an inference browser that allows the user to try out different inferences using various variables as goals in both forward- and backward-chaining modes.
In many ways JFC provides a very satisfying approach to user interface construction, but the package's lack of maturity makes us wonder, at times, whether it was appropriate for this project. User interface glitches and annoyances have consumed more time than we would like, and the performance of the lightweight controls is disappointing.
From the programmer's perspective, the JFC is satisfying to work with, and it appears in the long run, as it matures, it will definitely be the way to go for user interfaces for Java applications.
All in all we're happy with JFC, but wish it wasn't so buggy and slow.
PSE/Pro works well, but has the disadvantage that it requires a post-processor. This means the objects that are stored in PSE/Pro, and the objects that use the objects in PSE/Pro must all go through the post-processing phase.
For example, the main class used by the frame database is called KnowledgeFrame. When the interface function is called to store a KnowledgeFrame in the archive, the interface first creates an instance of the class PSEKnowledgeFrame using a constructor that takes a KnowledgeFrame as an argument. When the user wants to retrieve a KnowledgeFrame, a PSEKnowledgeFrame is retrieved instead, and its make_KnowledgeFrame() method is called to get the corresponding KnowledgeFrame.
This approach allows us to keep the code and preprocessing associated with PSE/Pro isolated within the implementation of the AMKES GEKO database interface, and shielded from the rest of the application.
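The shadow-object pattern can be sketched as follows. KnowledgeFrame, PSEKnowledgeFrame and make_KnowledgeFrame() are the names given in the text; the field layout, the FrameArchive class, and the use of an ordinary map in place of the PSE/Pro database are assumptions made so the example runs on its own.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Application-level frame; in this sketch it never touches the database.
class KnowledgeFrame {
    final Map<String, Object> slots = new LinkedHashMap<>();
}

// Shadow class: only this class would go through the post-processor.
class PSEKnowledgeFrame {
    private final Map<String, Object> stored;

    // Copy the application frame's slots into the persistent shadow.
    PSEKnowledgeFrame(KnowledgeFrame frame) {
        this.stored = new LinkedHashMap<>(frame.slots);
    }

    // Rebuild an application frame from the persistent shadow.
    KnowledgeFrame make_KnowledgeFrame() {
        KnowledgeFrame frame = new KnowledgeFrame();
        frame.slots.putAll(stored);
        return frame;
    }
}

// Hypothetical archive interface implementation; the HashMap stands in for
// the persistent database and is not the PSE/Pro API.
class FrameArchive {
    private final Map<String, PSEKnowledgeFrame> database = new HashMap<>();

    void store(String key, KnowledgeFrame frame) {
        database.put(key, new PSEKnowledgeFrame(frame));
    }

    KnowledgeFrame retrieve(String key) {
        PSEKnowledgeFrame shadow = database.get(key);
        return shadow == null ? null : shadow.make_KnowledgeFrame();
    }
}
```

Callers of FrameArchive see only KnowledgeFrame, so nothing outside the archive implementation depends on the persistent class.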
But since we were using shadow objects, and sending them back and forth through RMI, the level of complexity and understanding required to know which objects were which became overwhelming. So we keep with each frame its hash key and use that for accessing the frame. This greatly simplifies the code and procedures for passing frames into and out of the archive and across the network to the client and back.
It also allowed for easy implementation of the shared objects of AMKES. When a GEKO refers to a person frame in its 'Author' slot, it just references the key of the particular frame. When the GEKO is retrieved, that key is used to retrieve the appropriate person frame as well.
This is perfect for a server-based system with multiple clients, as AMKES is. But it does require that the server go through the appropriate initialization steps when a new user logs onto the server.
Given that RMI was reported to have automatic support for HTTP tunneling through firewalls, we switched to RMI. RMI proved to be a powerful package and the networking worked just great, but our clients behind a firewall still could not access the server.
After many weeks of research and frustrating dealings with Sun technical support, we finally got to one of the RMI developers within Sun who proved to be extremely helpful.
RMI is built to automatically determine whether or not to use HTTP tunneling by attempting to contact the primary server address directly. But how those direct requests are handled varies from firewall to firewall and leads to heated debates on the appropriate TCP/IP codes to send back.
In any case, RMI only correctly identifies the existence of a firewall if the firewall behaves the way Sun's firewall product behaves. Otherwise, it simply thinks the server address is genuinely not there and doesn't bother to try HTTP tunneling.
This was all very frustrating since our clients already knew they were behind a firewall, knew they had to use a proxy host, and knew all the addresses and ports they needed. But there was at the time no feature in RMI to allow one to force HTTP tunneling.
Eventually the Sun developer told us how to force HTTP tunneling, so that is what we did, and now our clients behind firewalls are happily tunneling to and from AMKES. But a lot of time was lost in the project during this phase.
RMI created other problems for the application with its seemingly random way of starting new threads. Given that the multi-user database is based on threads, and RMI was creating new threads willy-nilly, it meant some fairly nasty code had to be written to keep the users happily connected to the database.
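One common way to reconcile a thread-sensitive database with a remoting layer that picks its own threads is to funnel all database work through a single dedicated thread. The sketch below shows that general technique only; it is not the code AMKES used, the class name DatabaseThread is hypothetical, and it relies on java.util.concurrent, which did not exist when AMKES was written.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical wrapper that runs every database operation on one thread,
// regardless of which thread the remote call arrived on.
class DatabaseThread {
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "amkes-db");
        t.setDaemon(true); // do not keep the JVM alive on exit
        return t;
    });

    // Submit work to the database thread and block until it completes.
    <T> T call(Callable<T> work) {
        try {
            return worker.submit(work).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for database", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("database call failed", e.getCause());
        }
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

A remote method implementation would wrap its database access in call(...), so the database always sees the same thread no matter how many threads the remoting layer creates.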
The XML tags identify the frames and their slots, and within each slot the name and the value. This means the actual slot names of a GEKO are not part of the tags, but are part of the contents. The result is that GEKOs and frames of various designs can all be read in or written out using the same XML tag set.
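A writer for such a generic tag set might look like the sketch below. The tag names frame, slot, name and value are assumptions, since the text does not give the actual AMKES tags, and frames are represented as plain maps for brevity.

```java
import java.util.Map;

// Hypothetical exporter that writes any frame with one fixed tag set.
class FrameXmlWriter {
    static String toXml(Map<String, ?> frame) {
        StringBuilder out = new StringBuilder();
        write(frame, out);
        return out.toString();
    }

    // Slot names appear as element content, never as tag names, so frames
    // of any design share the same four tags.
    private static void write(Map<String, ?> frame, StringBuilder out) {
        out.append("<frame>");
        for (Map.Entry<String, ?> entry : frame.entrySet()) {
            out.append("<slot><name>").append(escape(entry.getKey())).append("</name><value>");
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) value;
                write(nested, out); // a nested frame is written recursively
            } else {
                out.append(escape(String.valueOf(value)));
            }
            out.append("</value></slot>");
        }
        out.append("</frame>");
    }

    // Escape the characters that would otherwise break the markup.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Adding a new slot to a GEKO design changes only the content written between the name tags, so neither the writer nor a matching reader needs to change.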
The import/export facility makes it easy to back up and restore archives, and allows GEKOs to be exported for use in other applications.
The inference browser presents the user with a list of variables in an archive, and the user can provide values for variables and can select a variable as a goal. The inference browser first sees what rules can be fired for the given values, and then proceeds to look for rules that can be used to find a value for the goal variable. This leads to requirements for other variables in classic expert system backward chaining, and queries to the user when no rules can be found for a variable.
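The backward-chaining part of this process can be sketched in a few dozen lines. The example is a simplification under stated assumptions: rules are conjunctions of variable = value conditions with a single conclusion, the initial forward pass is omitted, and the names Rule, BackwardChainer and askUser are hypothetical rather than taken from AMKES.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical rule: if every condition variable has the required value,
// conclude that 'variable' has 'value'.
class Rule {
    final Map<String, String> conditions;
    final String variable;
    final String value;

    Rule(Map<String, String> conditions, String variable, String value) {
        this.conditions = new HashMap<>(conditions);
        this.variable = variable;
        this.value = value;
    }
}

class BackwardChainer {
    private final List<Rule> rules;
    private final Map<String, String> known;
    private final Function<String, String> askUser;

    BackwardChainer(List<Rule> rules, Map<String, String> given, Function<String, String> askUser) {
        this.rules = rules;
        this.known = new HashMap<>(given);
        this.askUser = askUser;
    }

    // Find a value for the goal variable, or null if none can be established.
    String solve(String goal) {
        return solve(goal, new HashSet<>());
    }

    private String solve(String goal, Set<String> inProgress) {
        if (known.containsKey(goal)) {
            return known.get(goal);
        }
        if (!inProgress.add(goal)) {
            return null; // already being pursued: avoid circular reasoning
        }
        boolean hasRule = false;
        String result = null;
        for (Rule rule : rules) {
            if (!rule.variable.equals(goal)) {
                continue;
            }
            hasRule = true;
            if (allHold(rule, inProgress)) {
                result = rule.value;
                break;
            }
        }
        inProgress.remove(goal);
        if (!hasRule) {
            result = askUser.apply(goal); // no rule concludes this variable
        }
        if (result != null) {
            known.put(goal, result);
        }
        return result;
    }

    // Each condition becomes a subgoal that is solved recursively.
    private boolean allHold(Rule rule, Set<String> inProgress) {
        for (Map.Entry<String, String> condition : rule.conditions.entrySet()) {
            if (!condition.getValue().equals(solve(condition.getKey(), inProgress))) {
                return false;
            }
        }
        return true;
    }
}
```

The askUser function is where an interactive browser would prompt for a value; in a batch setting it can simply return null to indicate that nothing is known.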
The inference browser is written in Java, which is a clean language for implementing this type of code.
JavaCC was used for both. JavaCC is a very slick tool in the vein of YACC and LEX that makes implementing the parsing parts of an application easy and enjoyable. All of the time we lost trying to figure out firewall bugs in RMI was made up for and more by the ease with which parsers were generated where needed for AMKES.
One of the biggest benefits of Java is the built-in support for interfaces, which has led to a very clean organization of the separate components of the application. The interface architecture has already enabled two major changes to the software with very little side-effect impact. These are the change from sockets to RMI for network communication, and the addition of support for a local version of AMKES that puts the database directly on the client's machine.
In both cases, the change simply required building another implementation of the involved interfaces.
All of the Java support products used, including PSE/Pro from Object Store and RMI, JavaCC and JFC from Sun, have worked extremely well given the youth of Java, as has Java itself. Programming is more fun today than it has been in years.