
Feature Selection with the CLOP Package

Isabelle Guyon
ETH Zürich, Switzerland
isabelle@clopinet.com

Jiwen Li
University of Zürich, Switzerland
li@ifi.unizh.ch

Theodor Mader, Patrick A. Pletscher, Georg Schneider, Markus Uhr
ETH Zürich, Switzerland

March, 2006

Abstract

We used the datasets of the NIPS 2003 challenge on feature selection as part of the practical work of an undergraduate course on feature extraction. The students were provided with a toolkit implemented in Matlab. Part of the course requirements was that they should outperform given baseline methods. The results were beyond expectations: the students matched or exceeded the performance of the best challenge entries and achieved very effective feature selection with simple methods. We make available to the community the results of this experiment and the corresponding teaching material: http://clopinet.com/isabelle/Projects/ETH/Feature_Selection_w_CLOP.html.

1 Introduction

In recent years, it has been recognized by the machine learning and neural network communities that competitions are key to stimulating research and bringing improvement. Large conferences now regularly organize competitions (e.g. KDD, CAMDA, ICDAR, TREC, ICPR, CASP, and IJCNN). In 2003, we organized a competition on the theme of feature selection, the results of which were presented at a workshop on feature extraction [12]. The outcomes of that effort were compiled in a book including tutorial chapters and papers from the proceedings of that workshop [11]. The website of the challenge remains open for post-challenge submissions: www.nipsfsc.ecs.soton.ac.uk. Meanwhile, we have been organizing a second challenge on the theme of model selection: www.modelselect.inf.ethz.ch. As part of that effort, we have developed a Matlab toolkit based on the Spider package [19]. All this material constitutes a great teaching resource that we have exploited in a course on feature extraction: clopinet.com/isabelle/Projects/ETH.

We report on our teaching experience with two intentions: encouraging other teachers to use challenge platforms in their curricula, and providing graduate students with simple, competitive baseline methods to attack problems in machine learning. The particular theme of the class is feature extraction, which we define as the combination of feature construction and feature selection. These past few years, feature extraction and space dimensionality reduction problems have drawn a lot of interest. More than a passing fancy, this trend in the research community is driven by applications: bioinformatics, chemistry (drug design, cheminformatics), text processing, pattern recognition, speech processing, and machine vision provide machine learning problems in very high dimensional spaces, but often with comparably few examples. A lot of attention was given in class to feature selection because many successful applications of machine learning have been built upon a large number of very low level features (e.g. the "bag-of-words" representation in text processing, gene expression coefficients in cancer diagnosis, and QSAR features in cheminformatics). Our teaching strategy is to make students gain hands-on experience by working on large real-world datasets (those of the NIPS 2003 challenge [8]), rather than providing them with toy problems.

We reviewed in class basic machine learning techniques (linear predictors, neural networks, kernel methods, and decision trees), all of which are included in the provided software package. The part of the course devoted to feature construction included various transforms, such as the Fourier transform, and convolutional methods. We also reviewed a number of preprocessing methods involving noise modeling. The students could experiment with such methods using the Gisette dataset, a handwriting recognition task based on the MNIST data [16]. The rest of the curriculum covered feature selection methods (filters, wrappers, and embedded methods), as well as information-theoretic and ensemble methods. The datasets were introduced progressively in the homework to illustrate algorithms learned in class. The class requisites included making one complete challenge submission with results on the five datasets of the challenge. The students then had to present their methods and results in a poster presentation. Another requisite was to make a slide presentation of one of the chapters of the book reporting the results of the challenge.
In the remainder of the paper, we provide some details on the data and software used. We review and analyze the results obtained by the students and draw lessons of this experience to encourage others to use challenges as teaching resources.

2 Datasets and synopsis of the challenge

The NIPS 2003 challenge included five datasets (Table 1) from various application domains. All datasets are two-class classification problems. The data were split into three subsets: a training set, a validation set, and a test set. All three subsets were made available at the beginning of the challenge. The class labels for the validation set and the test set were withheld. The identity of the datasets and of the features (some of which were random features artificially generated) was kept secret during the challenge but has been revealed since then. The datasets were chosen to span a variety of domains and difficulties (the input variables are continuous or binary, sparse or dense; one dataset has unbalanced classes). One dataset (Madelon) was artificially constructed to illustrate a particular difficulty: selecting a feature set when no feature is informative by itself. We chose datasets that had sufficiently many examples to create a large enough test set to obtain statistically significant results [8]. To facilitate the assessment of feature selection methods, we introduced a number of artificial features called probes, drawn at random from a distribution resembling that of the real features, but carrying no information about the class labels. A good feature selection algorithm should eliminate most of the probes (a minimal sketch of how this criterion can be checked is given after Table 1). The details of data preparation can be found in a technical memorandum [8].

Table 1: NIPS 2003 challenge datasets. For each dataset we show the domain it was taken from, its type T (d=dense, s=sparse, or sb=sparse binary), the number of features #F, the percentage of probes %P, the number of examples in the training (#Tr), validation (#Va), and test (#Te) sets, the fraction of examples in the positive class %[+], and the ratio of the number of training examples to the number of features Tr/F. All problems are two-class classification problems.

Dataset    Domain               T    #F       %P   #Tr    #Va    #Te    %[+]   Tr/F
Arcene     Mass spectrometry    d    10^4     30   100    100    700    44     0.01
Dexter     Text classification  s    2*10^4   50   300    300    2000   50     0.015
Dorothea   Drug discovery       sb   10^5     50   800    350    800    10     0.008
Gisette    Digit recognition    d    5000     50   6000   1000   6500   50     1.2
Madelon    Artificial           d    500      96   2000   600    1800   50     4
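To make the probe-elimination criterion concrete, the fraction of probes among the selected features can be computed once the probe identities are known (they were revealed after the challenge). The following is a minimal Matlab sketch; the variables selected_idx (indices of the features kept by a selection algorithm) and probe_idx (indices of the artificial probe features) are illustrative names, not part of CLOP:

> % selected_idx: indices of the features kept by the selection algorithm (assumed given)
> % probe_idx:    indices of the artificial probe features (revealed post-challenge)
> n_selected    = numel(selected_idx);
> n_probes_kept = numel(intersect(selected_idx, probe_idx));
> fraction_probes = n_probes_kept / n_selected;   % close to 0 for a good selector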

The challenge participants could submit prediction results on the validation set and get their performance results and ranking on-line during a development period. The validation set labels were then revealed and the participants could make submissions of test set predictions, after having trained on both the training and the validation set. For details on the benchmark design, see [12]. For the class, the students had access to the training and validation set labels, but not the test labels. Thus far, the test set labels have not been released to the public, and we intend to leave it this way to keep an ongoing benchmark. The students made post-challenge submissions to the web site of the challenge (www.nipsfsc.ecs.soton.ac.uk) to obtain their performance on the test set.

The distributions of the original results of the challenge participants for the various datasets are represented in Figure 1.¹ The distribution is rather peaked for Gisette, indicating that this is perhaps the easiest task. Conversely, the results are very spread out for Dorothea, probably the hardest task of all. Madelon has a bimodal distribution, symptomatic of a particular difficulty, which was not overcome by all the participants. We will provide explanations in Section 6. These distributions guided us in setting standards for the students' accomplishments: we introduced the datasets in class in order of presumed complexity; we provided the students with baseline methods approximately in the tenth percentile of the best methods; and we asked the students to try to outperform the baseline method, giving them extra credit for outperforming the best challenge entry. In the remainder of the paper, we give details on the software toolbox provided and on the students' results.

3 Learning object package

The machine learning package we used for the class, called CLOP (Challenge Learning Object Package), is available from the website of the "performance prediction challenge": http://clopinet.com/isabelle/Projects/modelselect/Clop.zip. We have written a QuickStart guide² and FAQs are available.³ We present below only a high-level overview of the package.

¹We show the performance on the validation set because we have many more validation set entries than final test set entries. The results correlate well with those on the test set.

Figure 1: Distribution of the challenge participant results. We show histograms of the balanced error rate (BER) for the five tasks (panels: ARCENE, DEXTER, DOROTHEA, GISETTE, MADELON; horizontal axis: test error in %).
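The balanced error rate (BER) used to score the challenge is the average of the error rates on the positive and on the negative class. As a point of reference, here is a minimal Matlab sketch of the computation, assuming y_true and y_pred are vectors of +-1 labels (the variable names are illustrative, not part of CLOP):

> pos = (y_true == 1);  neg = (y_true == -1);
> err_pos = mean(y_pred(pos) ~= 1);    % error rate on the positive class
> err_neg = mean(y_pred(neg) ~= -1);   % error rate on the negative class
> ber = 0.5 * (err_pos + err_neg);     % balanced error rate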

Data and algorithm objects

The methods were implemented using the interface of the Spider package developed at the Max Planck Institute for Biological Cybernetics [19]. The Spider package, on top of which our package is built, uses Matlab (R) objects (The MathWorks, http://www.mathworks.com/). Two simple abstractions are used:

data: Data objects include two members X and Y, X being the input matrix (patterns in rows and features in columns), Y being the target matrix (i.e. one column of +-1 for binary classification problems).

algorithms: Algorithm objects represent learning machines (e.g. neural networks, kernel methods, decision trees) or preprocessors (for feature construction, data normalization, or feature selection). They are constructed from a set of hyper-parameters and have at least two methods: train and test. The train method adjusts the parameters of the model. The test method processes data using a trained model.

For example, you can construct a data object D:

> D = data(X, Y);

The resulting object has two members: D.X and D.Y. Models are derived from the class algorithm. They are constructed using a set of hyperparameters provided as a cell array of strings, for instance:

> hyperparam = {'h1=val1', 'h2=val2'};
> model0 = algorithm(hyperparam);

In this way, hyperparameters can be provided in any order or omitted. Omitted hyperparameters take default values. To find out about the default values and the allowed hyperparameter ranges, one can use the "default" method:

> default(algorithm)

The constructed model model0 can then be trained on data Dtrain and tested on data Dtest:

> [Dout, model1] = train(model0, Dtrain);
> Dout = test(model1, Dtest);

The trained model model1 is an object identical to model0, except that its parameters (some data members) have been updated by training. Matlab uses the convention that the object of a method is passed as the first argument as a means to identify which overloaded method to call. Hence, the "correct" train method for the class of model0 will be called. Since Matlab passes all arguments by value, model0 remains unchanged. By calling the trained and untrained model by the same name, the new model can overwrite the old one. Repeatedly calling the method "train" on the same model may have different effects depending on the model.

The output data object Dout stores the result of calling the test method on the input data in Dout.X. If the algorithm is a preprocessor, Dout.X will be the preprocessed data matrix. If the algorithm is a classifier, Dout.X will be a vector of discriminant values. The other member Dout.Y remains unchanged.

Saving a model is very simple since Matlab objects know how to save themselves:

> save('filename', 'modelname');

This feature is very convenient for making results reproducible, particularly in the context of a challenge.
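Putting these pieces together, the following is a minimal sketch of a complete train/test cycle. It assumes that matrices Xtrain, Ytrain, Xtest, Ytest are already loaded in the workspace and uses the neural learning object mentioned later in this paper with its default hyperparameters; the error computation at the end is plain Matlab, not a CLOP call:

> % Minimal end-to-end sketch (assumes Xtrain, Ytrain, Xtest, Ytest exist):
> Dtrain = data(Xtrain, Ytrain);              % wrap the training matrices in a data object
> Dtest  = data(Xtest,  Ytest);
> model0 = neural;                            % classifier constructed with default hyperparameters
> [Dout_tr, model1] = train(model0, Dtrain);  % model1 holds the trained parameters
> Dout_te = test(model1, Dtest);              % Dout_te.X contains discriminant values
> y_pred  = sign(Dout_te.X);                  % convert discriminant values to +-1 predictions
> err     = mean(y_pred ~= Dtest.Y);          % plain error rate on the test split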

Compound models: chains and ensembles

The Spider (with some CLOP extensions) provides ways of building more complex "compound" models from the basic algorithms with two abstractions:

chain: A chain is a learning object (having a train and a test method) constructed from an array of learning objects. Each array member takes the output of the previous member and feeds its output to the next member.

ensemble: An ensemble is also a learning object constructed from an array of learning objects. The trained learning machine performs a weighted sum of the predictions of the array members. The individual learning machines are all trained from the same input data. The voting weights are set to one by default. An interface is provided for user-defined methods of learning the voting weights.

Compound models behave like any other learning object: they can be trained and tested. In the following example, a chain object cm consists of a feature standardization for preprocessing followed by a neural network:

> cm=chain({standardize, neural});

While a chain is a "serial" structure of models, an ensemble is a "parallel" structure.

The following command creates an ensemble model em:

> em=ensemble({neural, kridge, naive});

To create more complex compound models, models of the same class with different hyperparameters, or different models altogether, can be combined in this way; chains can be part of ensembles and ensembles can be part of chains, as illustrated in the sketch below.
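As an illustration of such nesting, here is a minimal sketch using only the objects named above (standardize, neural, kridge, chain, and ensemble), with all hyperparameters left at their defaults; it is meant to show the mechanics rather than a recommended model, and reuses Dtrain and Dtest from the earlier sketch:

> % Two chains, each standardizing the features before a different classifier:
> c1 = chain({standardize, neural});
> c2 = chain({standardize, kridge});
> % An ensemble whose members are the two chains; predictions are averaged with unit weights:
> em2 = ensemble({c1, c2});
> [Dout, trained_em2] = train(em2, Dtrain);
> Dout = test(trained_em2, Dtest);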

4 CLOP objects provided for the class

It is easy to be overwhelmed when starting to use a machine learning package. CLOP is an extremely simplified package, limited to a few key methods, each having just a few hyper-parameters with good default values. This makes it suitable for teaching a machine learning class. More advanced students can venture to use other methods provided in the Spider package, on top of which CLOP is built. The CLOP modules correspond to methods having performed well in the feature