www.amzi.com |
2010-12-12 |
Contents
WelcomeTo read this newsletter on the Web: Read on the Web Welcome to the next Amzi! newsletter. In the last issue we introduced our new pricing schedules, introduced the topic of Domain Specific Languages (DSLs), and had a selection from the forum. First, let me thank all those who purchased the new individual licenses, and those institutions which purchased site licenses. For those wishing to purchase licenses, either as individuals or for your educational or commercial institution, please visit the Amzi! store. You can also download the free version of Amzi! Prolog without the Logic Server there as well. On DSLs, we covered the three steps towards implenting one. 1) Knowledge representation, 2) Reasoning engine, and 3) API for embedding the DSL. In this newsletter we'll introduce the sample decision tree domain and show one way to implement the knowledge representation. The 'From the Forum' section by Chip Eastham also discusses the importance of knowledge representation for building applications in Prolog. Domain-Specific LanguagesProlog is exceptionally well suited for developing domain-specific languages (DSL). In this newsletter we will illustrate this by implementing a system that automates the decision trees used by a bank working with Amzi!. Their use of decision trees makes sense for this application, as it has often been said that decision trees are the easiest form of knowledge representation for people to understand. The decision trees for this application are used to classify financial products based on the sorts of investments they contain. They are constructed so that each node has only a yes/no exit from it. Here are two examples. The first is a simple tree that determines whether a fund can be called a fixed income fund or not. While the first tree is simple, the criteria at node #2 is relatively complex, and involves determining the values of a number of facts pertinent to the decision. As with any software development, one of the key decisions is names. For this application we need names for the decision trees and names for the facts used by and derived from the trees. We've call the first tree fixed_income_relevance. The fact that it produces a value for we call fixed_income. It's value is either true or false depending on the path taken. The second tree is more complex and representative of the rest of the trees. It represents the second step in a multistep process used to fully define and classify a fund. The trees do refer to each other, so a node in one tree might be using a fact that is determined by another. This tree is given the name bond_universe and the fact it determines is bond_universe_classification with values like money_market or emerging_markets. There are three issues with representing these trees in code. How to represent:
The NodesFrames are a common way to represent complex data structures. A frame is composed of a collection of slots, each of which has values. For the nodes of the tree, we need a frame with slots for:
(Note that for this particular bank, the decision tree nodes are all yes/no decisions, so our Domain Specific Language for them reflects that. In the future, we might want to expand the decision tree model so that it can react to a wider variety of possibilities at each node.) Frame-like structures are easy to represent in Prolog using a combination of Prolog structures and lists. The structure holds the basic frame, and a list is used for the slots. Thus, the first node of the first tree can be represented like this: (If you are new to Prolog, read Adventure in Prolog or any of the other excellent online tutorials.) dtree_node(fixed_income_relevance:1, [ criteria = ( fixed_income_percentage >= 90 ), yes = fixed_income_relevance:2, no = fixed_income_relevance:4 ]). It represents the knowledge of the first node, which is if the fixed_income_percentage is greater or equal to 90, then go to node #2, otherwise go to node #4. The predicate, dtree_node/2, is used to define the node frames. The first argument is an identifier for the node. In this case it is the name of the decision tree followed by the number of the node. The second argument of dtree_node/2 is a list of attribute = value pairs for the slots, where the attributes are criteria, yes (where to go if the criteria is true), and no (where to go if the criteria is false). The CriteriaNode 2 from the first decision tree has a much more complicated criteria. We express it's node like this: dtree_node(fixed_income_relevance:2, [ criteria = ( convertibles_percentage =< 70 and asset_backed_percentage + mortgage_backed_percentage =< 90 and equity_percentage == 0 and cash_percentage + other_percentage =< 10 ), yes = fixed_income_relevance:3, no = fixed_income_relevance:4 ]). Here we can see the second design decision, which is how to express the criteria. We've decided on a notation using fact names, comparison operators and logical ands and ors. We do have to define 'and' and 'or' as Prolog infix operators, using the same precedence and associativity as the standard ',' and ';' Prolog operators. :- op(1000, xfy, and). :- op(1100, xfy, or). This notation makes the criteria easy to read and verify because it is very close to the original specification. It is also not difficult to write Prolog code in the reasoning engine that can evaluate the criteria expressed in this manner. The final two nodes of the first decision tree are terminal, or ending nodes. There are no criteria, just simply the setting of the value of a fact. The fact name we've used for this decision tree is fixed_income. Here's the two terminal nodes setting the two choices: dtree_node(fixed_income_relevance:3, [ terminal, fact = fixed_income, value = true ]). dtree_node(fixed_income_relevance:4, [ terminal, fact = fixed_income, value = false ]). We can similarly encode the nodes for the second decision tree as well. Here, for example, is the representation of node 3 of the second tree: dtree_node(bond_universe:3, [ criteria = ( junk_emerging_market_bond_exposure_percent >= 10 or convertibles_percentage >= 20 or asset_backed_percentage + mortgage_backed_percentage =< 20 ), yes = bond_universe:8, no = bond_universe:5 ]). It illustrates the use of 'or' in the criteria. The FactsThe next issue is representing the facts. They need to by tied to questions implied in the original document. Look again at node #1 of the fixed_income_relevance tree and notice it asks for the percentage which is fixed income. We represent that fact as fixed_income_percentage, and then create a frame structure for representing a fact and it's related prompts. Here is the frame for that first fact: fact(fixed_income_percentage, [ prompt = english : `What percentage is fixed income?` ]). Note that this frame structure can also accomodate additional slots for translations of the questions into other languages. This could also be made more complex by adding slots that are used for validation of the input, but for this simple prototype we leave that part out. Similarly, we can add the rest of the required facts for the first decision tree. fact(asset_backed_percentage, [ prompt = english : `What percentage is asset backed?` ]). fact(cash_percentage, [ prompt = english : `What percentage is cash?` ]). fact(convertibles_percentage, [ prompt = english : `What percentage is convertibles?` ]). fact(equity_percentage, [ prompt = english : `What percentage is equities?` ]). fact(mortgage_backed_percentage, [ prompt = english : `What percentage is mortgage backed?` ]). fact(other_percentage, [ prompt = english : `What percentage is others?` ]). What's NextThe next issue of this newsletter will go into implementing a reasoning engine that makes use of this knowledge representation. When asked to find the value of fixed_income, it will go to the first tree, start at node #1 and work it's way towards finding a value, asking questions as needed along the way. But for now, we've got knowledge being encoded in a way that is very close to the original specification. This makes it easier to encode the knowledge, and makes the programming task of mapping the original specifications into coded structures much less error prone and more readily adapted to changing business requirements. The knowledge representation has also been constrained to fit within standard Prolog syntax with the addition of definitions of the operators 'and' and 'or'. Another option would have been to design a completely unique syntax and then use Prolog's Definite Clause Grammar capability to parse that syntax into structures such as the ones we used. One could also make a graphical front end on the knowledge representation, so that the knowledge engineer could manipulate symbols directly, and then use the Logic Server API to call Prolog for the creation of the internal representation above. For example, it could be incorporated into the Eclipse tools for application development. A plug-in could be written that works with the Amzi! Eclipse plug-in to allow graphics and this representation to work together. Dennis Merritt From the Forum
|
The Amzi! Newsletter is sent periodically to Amzi! inc. customers and subscribers.
Click here if you do not want to receive future Amzi! Newsletters:
If you wish to contact us click here: Contact Us