JMLR: Workshop and Conference Proceedings 42:69-80, 2015. HEPML 2014
Higgs Boson Discovery with Boosted Trees
Tianqi Chen (tqchen@cs.washington.edu)
University of Washington
Tong He (hetongh@sfu.ca)
Simon Fraser University
Editors: Glen Cowan, Cecile Germain, Isabelle Guyon, Balazs Kegl, David Rousseau

Abstract
The discovery of the Higgs boson is remarkable for its importance in modern physics research. The next step for physicists is to discover more about the Higgs boson from the data of the Large Hadron Collider (LHC). A fundamental and challenging task is to extract the signal of the Higgs boson from background noise. Machine learning is one important component in solving this problem. In this paper, we propose to solve the Higgs boson classification problem with a gradient boosting approach. Our model learns an ensemble of boosted trees that makes a careful tradeoff between classification error and model complexity. Physically meaningful features are further extracted to improve the classification accuracy. Our final solution obtained an AMS of 3.71885 on the private leaderboard, placing us in the top 2% of the Higgs boson challenge.
Keywords: Higgs Boson, Machine Learning, Gradient Boosting

1. Introduction
The Higgs boson is the last piece of the Standard Model of particle physics. In 2012, the ATLAS experiment (Aad et al., 2012) and the CMS experiment (Chatrchyan et al., 2012) claimed the discovery of the Higgs boson. Soon after the discovery, Peter Higgs and Francois Englert were recognized by the 2013 Nobel Prize in Physics. The next step for physicists is to discover more about the physical process with the data from the Large Hadron Collider (LHC). One aspect of the task is to distinguish the signal of the Higgs boson from similar background processes. However, this raises several challenges:

- The volume of data generated by the LHC is huge. Every year it generates about one billion events and three petabytes of raw data.
- The signal of the Higgs boson is extremely rare among the background noise.
- The different physical processes are too complex to be analysed deterministically.

Machine learning is regarded as one of the promising ways to tackle these challenges. To bring more machine learning techniques to this task, the Higgs Boson Machine Learning Challenge (http://www.kaggle.com/c/higgs-boson) was held. The challenge started in May 2014 and lasted for over 4 months. In total, 1,785 teams participated in the competition, making it one of the largest and most active competitions on the platform www.kaggle.com. The competition requires competitors to build a binary classifier for the detection of signals from Higgs boson-related events.

We take a gradient boosting approach to solve this problem. Our model learns an ensemble of boosted trees which makes a careful tradeoff between classification error and model complexity. The algorithm is implemented as a new software package called XGBoost, which offers fast training speed and good accuracy. Due to the effectiveness of the toolkit, it was adopted by many participants, including the top ones. We further improve the accuracy by introducing features that are physically meaningful. Our final solution obtained an AMS of 3.71885 on the private leaderboard, placing us in the top 2% of the Higgs boson challenge.

The rest of the paper is structured as follows. We discuss related work in Section 2. In Section 3, we discuss the regularized boosted tree model. Section 4 focuses on details of feature engineering. Section 5 contains experimental results. Finally, we conclude the paper in Section 6.

© 2015 T. Chen & T. He.

2. Related Work
During the competition, various models were introduced. The baseline method is a naive Bayes classifier. People working in particle physics also use tools named MultiBoost (Benbouzid et al., 2012) and TMVA (Hoecker et al., 2007). These two boosted tree implementations were used as the official benchmark scores. The model that won the challenge is an ensemble of neural networks (https://github.com/melisgl/higgsml), and the runner-up model (https://github.com/TimSalimans/HiggsML) is based on regularized greedy forest (Johnson and Zhang, 2014).

Our approach follows the path of gradient boosting (Friedman, 2001), which performs additive optimization in functional space. Gradient boosting has been successfully used in classification (Friedman et al., 2000) and learning to rank (Burges, 2010), as well as other fields. Due to its effectiveness, there have been many software packages that implement the algorithm, including the gbm package in R (Ridgeway, 2013) and scikit-learn (Pedregosa et al., 2011). Ours differs from the traditional gradient boosting method by introducing a regularization term to penalize the complexity of the function, making the result more robust to overfitting. The advantage of regularizing boosted trees is also discussed in (Johnson and Zhang, 2014).

3. Regularized Boosted Trees

3.1. Model Formalization
In the common supervised learning scenario, the data set can be represented by a set containing $n$ paired feature vectors and labels: $\mathcal{D} = \{(x_i, y_i)\}$ ($|\mathcal{D}| = n$). In the context of Higgs boson classification, $x_i \in \mathbb{R}^d$ is the vector of physics properties of the $i$-th event, while $y_i \in \{0, 1\}$ indicates whether it is a signal event. Because the relation between the physics features and the output $y_i$ can be very complicated, a simple linear model may not capture it. One possible approach is to enhance the input by dividing the input space into separate regions and modeling the probability distribution of $y_i$ in each region. However, the problem remains to find such regions of interest. Ideally, we would learn them from the data. Motivated by this idea, we propose to model the prediction score $\hat{y}_i$ given the input $x_i$ by the following functional form:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \qquad (1)$$

where $K$ is the number of functions and $\mathcal{F}$ is the space of functions containing the partition of the region and the score in each of them. Formally, we assume $\mathcal{F}$ to be a set of regression trees. Because our model introduces functions as parameters, it allows us to find functions $f_k \in \mathcal{F}$ that fit the data well when we train the model, thus finding the corresponding regions automatically.

3.2. Training Objective
To learn the set of functions used in the model, we define the following regularized objective function:

$$L(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k) \qquad (2)$$

where $l$ is a differentiable convex loss function that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$. The second term $\Omega$ measures the complexity of the model (i.e., the feature functions) to avoid overfitting. One important fact about Eq. (2) is that the objective is regularized, which means we are penalizing complicated models. With this objective function, a model employing simple and predictive functions will be selected as the best model.

Because the model includes functions as parameters, we cannot directly use traditional optimization methods in Euclidean space to find the solution. Instead, we train the model additively: at each iteration $t$, our proposed algorithm first searches over the functional space $\mathcal{F}$ to find a new function $f_t$ that optimizes the objective function, and then adds it to the ensemble. Formally, let $\hat{y}_i^{(t)}$ be the prediction of the $i$-th instance at the $t$-th iteration; we find $f_t$ to optimize the following objective:

$$L^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^{t} \Omega(f_i) = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \sum_{i=1}^{t} \Omega(f_i)$$

This objective means that we want to add the best function to improve the model. In the general setting, the above objective is still hard to optimize, so we approximate it using the second-order Taylor expansion:

$$L^{(t)} \simeq \sum_{i=1}^{n} \left[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \sum_{i=1}^{t} \Omega(f_i)$$

where $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$. We can remove the constant terms to obtain the following approximate objective at step $t$:

$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \qquad (3)$$

A general gradient boosting algorithm will iteratively add functions that optimize Eq. (3) for a number of user-specified iterations. An important difference between our objective and some traditional gradient boosting methods is the explicit regularization term, which stops the model from overfitting. Based on this framework, we will discuss our model in detail in the following subsections.

3.3. Learning the Functions
It remains to learn the function $f_t$ in each step. In this model, we consider a specific class of functions defined by the following procedure. First, the data is segmented into $T$ regions based on the input $x_i$, and then an independent score is assigned to each region. Formally, we define a mapping $q: \mathbb{R}^d \to \{1, 2, \cdots, T\}$ that maps the input to the index of the region, and a vector $w$ of scores in each region. The function is defined by

$$f_t(x) = w_{q(x)}$$

This function class includes the regression tree when $q$ represents the decision tree structure on $x_i$. We can also choose other forms of $q$ to apply prior knowledge about regions of interest if needed. Furthermore, we define the function complexity to be the following form:

$$\Omega(f_t) = \gamma T + \frac{\lambda}{2} \sum_{j=1}^{T} w_j^2 \qquad (4)$$

This regularization term penalizes the number of regions and the sum of squared scores of the regions. Learning too many regions and assigning too-large scores to leaves may cause overfitting and thus harm the accuracy of our model. We also introduce $\gamma$ and $\lambda$ as two parameters to make a balance.

Define $I_j = \{i \mid q(x_i) = j\}$ as the instance set of region $j$. We can rewrite Eq. (3) in the following way:

$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{\lambda}{2} \sum_{j=1}^{T} w_j^2 = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T \qquad (5)$$

Now the objective is the sum of $T$ independent quadratic functions of the elements of $w$. Assuming the partition of regions $q(x)$ is fixed, the optimal weight $w_j^*$ of region $j$ can be obtained by

$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \qquad (6)$$
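As a concrete illustration of the closed-form leaf weights in Eq. (6), the following sketch (an illustrative example, not the paper's implementation; the squared-error loss, the fixed region assignment, and all names are our assumptions) computes the per-instance gradient statistics and the resulting optimal weight for each region.

```python
# Illustrative sketch of Eq. (6): optimal leaf weights from per-instance
# gradient statistics. Squared-error loss l(y, yhat) = (y - yhat)^2 is
# assumed, so g_i = 2*(yhat_i - y_i) and h_i = 2.

def optimal_leaf_weights(y, y_pred, region, lam):
    """Return {region j: w_j*} where w_j* = -sum(g_i) / (sum(h_i) + lambda)."""
    g = [2.0 * (p - t) for t, p in zip(y, y_pred)]  # first-order gradients
    h = [2.0 for _ in y]                            # second-order gradients
    stats = {}
    for gi, hi, j in zip(g, h, region):
        G, H = stats.get(j, (0.0, 0.0))
        stats[j] = (G + gi, H + hi)
    return {j: -G / (H + lam) for j, (G, H) in stats.items()}

# Instances 0 and 1 fall in region 0; instance 2 falls in region 1.
y      = [1.0, 0.0, 1.0]
y_pred = [0.0, 0.0, 0.0]  # predictions before adding the new tree
w = optimal_leaf_weights(y, y_pred, region=[0, 0, 1], lam=1.0)
# w == {0: 0.4, 1: 2/3}
```

Note how regularization shrinks the weights: with $\lambda = 0$, region 0 would get the mean residual 0.5, but $\lambda = 1$ pulls it down to 0.4.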
The corresponding optimal objective function value is

$$\tilde{L}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T \qquad (7)$$

We write the objective as a function of $q$ since it depends on the structure of the mapping. Eq. (7) can be used as a scoring function to score the region partition specified by $q$. Figure 1 demonstrates a tree with a fixed structure. We learn the score $w$ on each leaf by Eq. (6).

Figure 1: Demonstration of a Decision Tree
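To make the scoring function concrete, here is a small sketch (the gradient values, region assignments, and function names are illustrative assumptions, not the paper's code) that evaluates Eq. (7) for a fixed partition $q$, so that candidate structures can be compared directly; lower scores are better.

```python
# Illustrative sketch of Eq. (7): score a fixed partition q by
# -1/2 * sum_j (sum_{i in I_j} g_i)^2 / (sum_{i in I_j} h_i + lambda) + gamma*T.

def structure_score(g, h, region, lam, gamma):
    stats = {}
    for gi, hi, j in zip(g, h, region):
        G, H = stats.get(j, (0.0, 0.0))
        stats[j] = (G + gi, H + hi)
    T = len(stats)  # number of regions (leaves)
    return -0.5 * sum(G * G / (H + lam) for G, H in stats.values()) + gamma * T

g = [-2.0, 0.0, -2.0]
h = [2.0, 2.0, 2.0]
# Compare a one-leaf tree against a two-leaf tree on the same gradients:
flat  = structure_score(g, h, [0, 0, 0], lam=1.0, gamma=0.0)  # -8/7
split = structure_score(g, h, [0, 1, 0], lam=1.0, gamma=0.0)  # -1.6
# Grouping the two instances with similar gradients lowers (improves) the score.
```

Setting $\gamma > 0$ would add a penalty per extra leaf, so a split is kept only when the improvement in the first term outweighs $\gamma$.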
Further, we apply a generic algorithm that enumerates over possible structures of $q$ with some heuristics and outputs the best $q$ found. In the next subsection, we will discuss how to do this efficiently when the $q$ are decision trees.
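For decision trees, enumerating structures amounts to trying split points and keeping the one whose Eq. (7) score is best. The sketch below is a simplified illustration of that idea on a single feature (an assumption on our part; the efficient algorithm used in the paper is described in the next subsection, and all names here are hypothetical).

```python
# Simplified sketch: greedy search over threshold splits on one feature,
# choosing the candidate whose Eq. (7)-style score is lowest (best).

def leaf_term(G, H, lam):
    return G * G / (H + lam)

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Try every threshold on feature x; return (best_score, threshold)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G_tot, H_tot = sum(g), sum(h)
    # Baseline: no split, a single leaf (T = 1).
    best = (-0.5 * leaf_term(G_tot, H_tot, lam) + gamma, None)
    GL = HL = 0.0
    for a, b in zip(order, order[1:]):
        GL += g[a]; HL += h[a]
        if x[a] == x[b]:
            continue  # identical feature values cannot be separated
        score = (-0.5 * (leaf_term(GL, HL, lam)
                         + leaf_term(G_tot - GL, H_tot - HL, lam))
                 + 2 * gamma)  # two leaves (T = 2)
        if score < best[0]:
            best = (score, (x[a] + x[b]) / 2.0)
    return best

x = [0.1, 0.9, 0.2, 0.8]
g = [-2.0, 2.0, -2.0, 2.0]  # negative gradients cluster at small x
h = [2.0, 2.0, 2.0, 2.0]
score, thr = best_split(x, g, h)
# The search recovers the threshold 0.5 separating the two gradient clusters.
```

Sorting once and sweeping left-to-right keeps the partial sums $G_L, H_L$ incremental, which is what makes this enumeration cheap in practice.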