
Machine Learning

(course 395)

Introduction to Machine Learning

Case-Based Reasoning

Maja Pantic


Prologue

These notes accompany the Machine Learning course (course 395), Computing Department, Imperial College London. The goal of this syllabus is to summarize the basics of machine learning and to provide a detailed explanation of case-based reasoning.

Part 1: Introduction to Machine Learning

This chapter introduces the term "machine learning" and defines what we mean when using it. It is a very short summary of the work of Mitchell [8].

Part 2: Case-Based Reasoning

This chapter discusses Case-Based Reasoning. It is an adaptation of the work of Pantic [10].

1. Introduction to Machine Learning

This chapter is concerned with the term "machine learning" and defines what we mean when using it. It also provides a short overview of a number of well-known learning paradigms. For a more thorough discussion of different learning algorithms, theoretical results, and applications, the reader is referred to [8].

1.1 What does the term "machine learning" denote?

Machine learning is inherently a multidisciplinary field. It draws on results from research fields as diverse as:

• Artificial Intelligence: AI forms a theoretical and methodological basis for learning symbolic representations of concepts, learning in terms of classification and pattern recognition problems, and learning by using prior knowledge together with training data as a guideline.

• Bayesian methods: Bayes' theorem forms the basis for calculating probabilities of hypotheses, the basis of the naïve Bayes classifier, and the basis of algorithms for estimating values of unobserved variables.

• Computational complexity theory: This theory imposes theoretical bounds on the inherent complexity of different learning tasks, measured in terms of computational effort, number of training examples, number of mistakes, etc.

• Control theory: This theory forms the theoretical foundation of procedures that learn to control processes in order to optimize predefined objectives and to predict the next state of the process they are controlling.

• Information theory: Measures of entropy and optimal codes are central to the issue of delimiting optimal training sequences for encoding a hypothesis.

• Philosophy: Philosophical arguments such as "the simplest hypothesis is the best" underlie the reasoning process of machine learning algorithms.

• Psychology: The view on human reasoning and problem-solving initiated many machine learning models (e.g., see the discussion on Case-Based Reasoning in chapter 2).

• Neurobiology: Information processing found in biological organisms motivated Artificial Neural Network models of learning (see chapter 4 in [8]).

According to the definition given by Mitchell [8], a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. For example, an automated facial expression classifier that classifies facial expressions in terms of user-defined interpretation labels (e.g., [11]) improves its performance, as measured by its ability to accomplish user-defined interpretations of facial expressions, through experience obtained by interacting with the user on the meanings that he/she associates with different facial expressions. In general, a well-defined learning problem must identify these three features: the class of tasks T, the measure of performance to be improved P, and the source of experience E. Once the learning problem is defined, the next step in designing a learning system is to delimit exactly:

• the type of knowledge to be learned,

• the representation of this target knowledge (i.e. the definition of the target function to be learned, which, when utilized, will produce for any new problem instance as input a trace of its solution as output), and

• the learning mechanism to apply.

Different representations of the target knowledge (the hypothesis space) are appropriate for learning different kinds of target functions. For each of these hypothesis representations, the corresponding learning algorithm takes advantage of a different underlying structure to organize the search through the hypothesis space. Therefore, deciding about the issues listed above involves searching a very large space of alternative approaches to determine the one that best fits the defined learning problem. In order to decide which machine learning algorithm will perform best for the given problem and the given target function, it is useful to analyze the relationships between the size and completeness of the hypothesis space, the number of training examples available, the prior knowledge held by the learner, and the confidence we can have that a hypothesis that is consistent with the training data will correctly generalize to unseen examples. Although learning is generally considered one of the basic facets of intelligence, not all AI techniques are capable of learning. Expert systems are an obvious example, at least in their most common form (see chapter 2 in [9]).

1.2 The primary machine learning approaches and algorithms

Decision Trees

Decision tree learning is one of the most widely used and practical methods for inductive inference. It is a method for approximating discrete-valued functions, in which a tree represents the learned function. In a nutshell, a decision tree is a discrete-valued functional mapping, i.e., a classifier. Each node in the decision tree specifies a test of some attribute of the query instance, and each branch descending from that node corresponds to one of the possible values of this attribute. An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, and then moving down the tree branch corresponding to the value of the attribute. This process is repeated for the sub-tree rooted at the new node until a leaf node is reached; the classification associated with this leaf is then returned. Several algorithms are available that can be used to construct a tree based on some data set. A typical example is the ID3 algorithm proposed in [13]. This is a greedy search algorithm that constructs the tree recursively and chooses at each step the attribute to be tested so that the separation of the data examples is optimal. This decision-tree learning method searches a complete hypothesis space (i.e. the space of all possible decision trees) and thus avoids the difficulties of restricted hypothesis spaces (i.e. that the target function might not be present in the hypothesis space). Its inductive bias is a preference for small trees over large trees. Experiments that compare decision-tree learning and other learning methods can be found in numerous papers, for example, in [5] and [17].
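To make the greedy construction concrete, the sketch below grows a tree recursively, at each step choosing the attribute whose test yields the highest information gain (the entropy-based criterion used by ID3). This is a minimal hypothetical Python implementation for illustration, not the original ID3 code; it assumes discrete attribute values and represents the tree as nested dictionaries.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Reduction in entropy obtained by splitting on `attribute`."""
    base = entropy([label for _, label in examples])
    partitions = {}
    for features, label in examples:
        partitions.setdefault(features[attribute], []).append(label)
    remainder = sum(len(part) / len(examples) * entropy(part)
                    for part in partitions.values())
    return base - remainder

def id3(examples, attributes):
    """examples: list of (feature_dict, label) pairs; returns a nested-dict tree."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:              # pure node: return the class
        return labels[0]
    if not attributes:                     # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    for value in {features[best] for features, _ in examples}:
        subset = [(f, l) for f, l in examples if f[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best])
    return tree
```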

Artificial Neural Networks

Artificial Neural Networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued target functions from examples. Algorithms such as backpropagation use gradient descent to tune network parameters to best fit a training set of input-output pairs. ANN learning is robust to errors in the training data and has been successfully applied to problems such as interpreting visual scenes, speech recognition, etc. (see chapter 4 in [8]).
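As a concrete illustration of gradient-descent parameter tuning, the following minimal sketch (a hypothetical example, not taken from [8]) trains a one-hidden-layer sigmoid network on the XOR input-output pairs using backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)               # hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)               # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5                                                     # learning rate

for step in range(5000):
    # forward pass through the network
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: gradients of the squared error via the chain rule
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates of all network parameters
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```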

Learning Sets of Rules

One of the most expressive and human-readable representations of a learned target function is a set of if-then rules that jointly define the function. One way to learn sets of rules is to learn a decision tree first and then translate the tree into an equivalent set of rules, one rule for each leaf node in the tree. A quite successful method for converting the learned tree into a set of rules is a technique called rule post-pruning, used by the C4.5 algorithm [13], which is an extension of the original ID3 algorithm. Another way to learn a set of rules is to apply a sequential covering algorithm, based upon the strategy of learning one rule, removing the data it covers, and then iterating this process. To elaborate, given an LSR (learn-single-rule) subroutine, invoke it on all the available training examples, remove any positive examples covered by the rule it learns, then invoke it again to learn a second rule based on the remaining training examples. Thus, a sequential covering algorithm sequentially learns a set of (disjunctive) rules that together cover the full set of positive examples. Because this algorithm carries out a greedy search, formulating a sequence of rules without backtracking, it is not guaranteed to find the smallest or best set of rules that covers the training examples. A prototypical sequential covering algorithm uses a general-to-specific beam search, which searches through the space of possible rules maintaining the k best candidates, generates descendants for each of these k candidates, and again reduces the resulting set to its k most promising members. This algorithm has been used by the CN2 program [4]. Many variations on this approach have been explored [8].
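The outer loop of sequential covering is simple enough to sketch directly. In the hypothetical Python sketch below, `learn_single_rule` stands in for any LSR subroutine (e.g. the general-to-specific beam search used by CN2), and the `Rule` fields are assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    covers: Callable[[dict], bool]   # does the rule's precondition match these features?
    target_class: object             # the class the rule predicts
    performance: float               # e.g. accuracy over the examples it covers

def sequential_covering(examples, learn_single_rule, min_performance=0.5):
    """examples: list of (features, label) pairs; returns the learned rule list."""
    rules, remaining = [], list(examples)
    while remaining:
        rule = learn_single_rule(remaining)   # greedily learn one rule
        if rule is None or rule.performance < min_performance:
            break                             # no sufficiently good rule remains
        rules.append(rule)
        # remove the positive examples the new rule covers, then iterate
        remaining = [(f, label) for f, label in remaining
                     if not (rule.covers(f) and label == rule.target_class)]
    return rules
```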

Inductive Logic Programming

The previous subsection discussed algorithms for learning sets of propositional (i.e. variable-free) rules. This subsection is concerned with learning rules that contain variables, in particular, learning first-order Horn theories. Inductive learning of first-order rules is also referred to as Inductive Logic Programming (ILP), because this process can be viewed as automatically inferring PROLOG programs from examples (PROLOG is a general-purpose, Turing-equivalent programming language in which programs are expressed as collections of Horn clauses). A variety of algorithms has been proposed for learning first-order rules. A typical example is FOIL, which is an extension of the sequential covering algorithms to first-order representations. Another approach to inductive logic programming is inverse deduction, which is based upon the simple observation that induction is just the inverse of deduction. In other words, the problem of induction is to find a hypothesis h that satisfies the constraint (∀⟨x_i, f(x_i)⟩ ∈ D) (B ∧ h ∧ x_i) ⊢ f(x_i), where B is general background information, x_1…x_n are descriptions of the instances in the training data D, f(x_1)…f(x_n) are the target values of the training instances, and the expression Z ⊢ C is read "C follows deductively from Z". A prototypical algorithm based upon the inverse deduction principle is CIGOL, introduced by Muggleton and Buntine in 1988, which uses inverse resolution, an operator that is the inverse of the deductive resolution operator introduced by Robinson in 1965 and commonly used for mechanical theorem proving.
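The entailment constraint itself is easy to demonstrate. The toy sketch below (hypothetical, and simplified to ground Horn clauses rather than the first-order clauses ILP actually manipulates) checks B ∧ h ∧ x_i ⊢ f(x_i) by forward chaining; the father/child facts are illustrative placeholders:

```python
def entails(facts, rules, goal):
    """Forward chaining over ground Horn clauses; rules are (premises, conclusion)."""
    known, changed = set(facts), True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return goal in known

B_and_xi = {"father(sharon,bob)", "female(sharon)"}   # background B plus instance x_i
h = [({"father(sharon,bob)"}, "child(bob,sharon)")]   # candidate hypothesis h
print(entails(B_and_xi, h, "child(bob,sharon)"))      # True: f(x_i) follows deductively
```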

Instance-Based Learning

In contrast to learning methods that construct a general, explicit description of the target function when training examples are provided, instance-based learning methods simply store the training examples. Generalizing beyond these examples is postponed until a new instance must be classified: given a new instance, its relations to the already stored examples are examined in order to assign a target function value (the classification) to the new instance. Due to this property, instance-based learning methods are also called lazy learning methods, as opposed to the eager learning methods represented by all other learning algorithms discussed in this section. Examples of instance-based learning include nearest-neighbor learning and locally weighted regression methods. Instance-based learning also includes case-based reasoning methods, which use more complex, symbolic representations for instances. An overview of the topic can be found in [8]. A survey of methods for locally weighted regression is given in [3]. Chapter 2 of this syllabus provides a detailed discussion of case-based reasoning. A key advantage of instance-based learning as a delayed, or lazy, learning method is that instead of estimating the target function once for the entire instance space, these methods can estimate it locally and differently for each new instance to be classified. Yet these methods are at a disadvantage because of their computational and memory/storage requirements.
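Nearest-neighbor learning, the simplest instance-based method, makes the lazy strategy concrete: training reduces to storing the examples, and all the work happens at query time. A minimal hypothetical sketch of k-nearest-neighbor classification:

```python
from collections import Counter
import math

def knn_classify(stored, query, k=3):
    """stored: list of (feature_vector, label) pairs; query: a feature vector."""
    def dist(a, b):                    # Euclidean distance between instances
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbors = sorted(stored, key=lambda ex: dist(ex[0], query))[:k]
    # majority vote among the k stored examples nearest to the query
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

examples = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
            ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]
print(knn_classify(examples, (1.1, 1.0)))  # prints "A"
```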

Genetic Algorithms

Genetic Algorithms (GA) are optimization techniques providing an approach to learning that is based loosely on simulated evolution. One thing that distinguishes GA from other optimization algorithms is that GA simultaneously work on large sets (populations) of possible solutions. The search for an appropriate hypothesis begins with a population of initial hypotheses. Members of the current population give rise to the next-generation population by means of operations such as random mutation and crossover, which are patterned after processes in biological evolution. At each iteration, the hypotheses in the current population are evaluated relative to a given measure of fitness, and the most fit members of the population are selected to produce new offspring that replace the least fit members of the population. To elaborate, the learning task of GA is to find the optimal hypothesis according to the predefined fitness function. Evolution-based computational approaches have been explored since the early days of computer science. Evolutionary programming as a method for finite-state machine evolution was developed by Fogel and colleagues in 1966. Genetic algorithms were first introduced by Holland in 1962. An overview of the subject can be found in [8] and [9]. GA are especially suited to tasks in which hypotheses are complex (e.g. sets of rules for robot control, sets of optimal routes, etc.) and in which the objective to be optimized may be an indirect function of the hypotheses. A variant of GA is genetic programming, in which the hypotheses being manipulated are computer programs rather than bit strings (though hypotheses may be represented by symbolic expressions or even computer programs, they are usually described by bit strings, whose interpretation depends on the actual application). Genetic programming has been demonstrated to learn programs for tasks such as simulated robot control and recognizing objects in visual scenes (including human facial expressions).
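The population loop is compact enough to sketch. The hypothetical example below evolves bit strings toward a toy fitness function (the number of 1 bits), using single-point crossover, random mutation, and truncation selection in which the fitter half of the population survives and produces offspring that replace the less fit half:

```python
import random

random.seed(0)
LENGTH, POP_SIZE = 20, 50
fitness = sum                                       # toy fitness: count of 1 bits

def crossover(a, b):
    point = random.randrange(1, LENGTH)             # single-point crossover
    return a[:point] + b[point:]

def mutate(bits, rate=0.02):
    return [1 - b if random.random() < rate else b for b in bits]

population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP_SIZE)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)      # evaluate fitness
    if fitness(population[0]) == LENGTH:            # optimal hypothesis found
        break
    parents = population[:POP_SIZE // 2]            # fitter half survives
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children                 # offspring replace the least fit

print(generation, fitness(population[0]))
```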

Reinforcement Learning

Reinforcement learning addresses the question of how an autonomous agent (see the syllabus on Intelligent Agents and Multi-Agent Systems), which senses and acts in its environment, can learn to choose optimal actions to accomplish its goals. This generic problem covers learning tasks such as controlling CAM tools and robots, optimizing operations in factories, searching the Internet, playing board games, etc. In a nutshell, reinforcement learning is reward hunting. Namely, each time a given agent performs an action in its environment, a trainer may provide a reward or penalty to indicate the desirability of the resulting state; the goal of the agent is to learn an action policy that maximizes the total reward it will receive from any starting state.

The reinforcement learning methodology fits a problem setting known as a Markov decision process, in which the outcome of applying any action to any state depends only on this action and this state, as opposed to being dependent on preceding actions or states. A prototypical reinforcement learning algorithm is Q-learning, in which the agent learns the evaluation function Q(s, a) representing the maximum expected cumulative reward the agent can achieve by applying action a to state s. Watkins introduced Q-learning in 1989 with the goal of acquiring optimal policies when the reward and action transition functions are unknown. A recent survey of the related methods is given by Sutton and Barto [18].
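The heart of Q-learning is a one-step update of the stored Q values after each observed transition: Q(s, a) ← Q(s, a) + α(r + γ max_a′ Q(s′, a′) − Q(s, a)). The hypothetical sketch below applies it to a toy deterministic environment, a chain of states in which only reaching the goal state yields a reward:

```python
import random
from collections import defaultdict

random.seed(0)
ALPHA, GAMMA = 0.1, 0.9                  # learning rate and discount factor
ACTIONS, GOAL = (-1, +1), 5              # move left/right on states 0..5

def step(state, action):
    """Toy environment: deterministic moves; reaching GOAL yields reward 1."""
    next_state = max(0, min(GOAL, state + action))
    return next_state, (1.0 if next_state == GOAL else 0.0)

Q = defaultdict(float)                   # Q[(state, action)], initially zero
for episode in range(200):
    state = 0
    while state != GOAL:
        # Q-learning is off-policy: even a purely random behavior policy
        # lets the agent learn the optimal greedy policy.
        action = random.choice(ACTIONS)
        next_state, reward = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# greedy policy learned for states 0..4: always move right, i.e. [1, 1, 1, 1, 1]
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)])
```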

1.3 Advantages and disadvantages of machine learning

The major advantage of a learning system is its ability to adapt to a changing environment. Of course, the existing machine-learning techniques are still far from enabling computers to learn nearly as well as people. Yet algorithms have been invented that are effective for certain types of learning tasks. In the late 90s, a formalized theoretical foundation of learning was established [8], and many practical computer programs have been developed to enable different types of learning. Machine learning algorithms have proven to be of great practical value, especially in:

• Data mining problems concerning large databases that may contain valuable implicit regularities that can be discovered automatically (for an overview of this topic, the reader is referred to the special issue on Intelligent Information Retrieval of the IEEE Intelligent Systems and Their Applications, vol. 14, no. 4, pp. 30-70).

• Poorly understood domains where humans might not have the well-established, generic knowledge needed to develop effective algorithms (e.g. in learning to play games or in learning to interpret human facial expressions in terms of emotions).

• Domains where the program must dynamically adapt to changing conditions (e.g. see the special issue on Self-adaptive Software of the IEEE Intelligent Systems and Their Applications).