JMLR: Workshop and Conference Proceedings 42:69-80, 2015 HEPML 2014

Higgs Boson Discovery with Boosted Trees

Tianqi Chen    tqchen@cs.washington.edu

University of Washington

Tong He    hetongh@sfu.ca

Simon Fraser University

Editors: Glen Cowan, Cecile Germain, Isabelle Guyon, Balazs Kegl, David Rousseau

Abstract

The discovery of the Higgs boson is remarkable for its importance in modern physics research. The next step for physicists is to learn more about the Higgs boson from the data of the Large Hadron Collider (LHC). A fundamental and challenging task is to extract the signal of the Higgs boson from background noise. Machine learning techniques are one important component in solving this problem. In this paper, we propose to solve the Higgs boson classification problem with a gradient boosting approach. Our model learns an ensemble of boosted trees that makes a careful tradeoff between classification error and model complexity. Physically meaningful features are further extracted to improve the classification accuracy. Our final solution obtained an AMS of 3.71885 on the private leaderboard, placing us in the top 2% in the Higgs boson challenge.

Keywords: Higgs Boson, Machine Learning, Gradient Boosting

1. Introduction

The Higgs boson is the last piece of the Standard Model of particle physics. In 2012, the ATLAS experiment (Aad et al., 2012) and the CMS experiment (Chatrchyan et al., 2012) claimed the discovery of the Higgs boson. Soon after the discovery, Peter Higgs and Francois Englert were awarded the 2013 Nobel Prize in Physics. The next step for physicists is to discover more about the physical process with the data from the Large Hadron Collider (LHC). One aspect of the task is to distinguish the signal of the Higgs boson from similar background processes. However, this raises several challenges: the volume of data generated by the LHC is huge, about one billion events and three petabytes of raw data every year; the signal of the Higgs boson is extremely rare among the background noise; and the different physical processes are too complex to be analysed deterministically. Machine learning is treated as one of the promising ways to tackle these challenges. To aggregate more machine learning techniques on this task, the Higgs Boson Machine Learning Challenge (http://www.kaggle.com/c/higgs-boson) was held. The challenge started in May 2014 and lasted for over 4 months. In total, 1,785 teams participated in the competition, one of the largest


and most active competitions on the platform www.kaggle.com. The competition required competitors to build a binary classifier for the detection of signals from Higgs boson-related events. We take a gradient boosting approach to solve this problem. Our model learns an ensemble of boosted trees which makes a careful tradeoff between classification error and model complexity. The algorithm is implemented as a new software package called XGBoost, which offers fast training speed and good accuracy. Due to the effectiveness of the toolkit, it was adopted by many participants, including the top ones. We further improve the accuracy by introducing features that are physically meaningful. Our final solution obtained an AMS of 3.71885 on the private leaderboard, placing us in the top 2% in the Higgs boson challenge.

The rest of the paper is structured as follows. We discuss related work in Section 2. In Section 3, we discuss the regularized boosted tree model. Section 4 focuses on details of feature engineering. Section 5 contains experimental results. Finally, we conclude the paper in Section 6.

2. Related Work

During the competition, various models were introduced. The baseline method is a naive Bayes classifier. People working in particle physics also use tools named MultiBoost (Benbouzid et al., 2012) and TMVA (Hoecker et al., 2007). These two boosted tree implementations were used as the official benchmark scores. The model that won the challenge is an ensemble of neural networks (https://github.com/melisgl/higgsml), and the runner-up model (https://github.com/TimSalimans/HiggsML) is based on regularized greedy forest (Johnson and Zhang, 2014).

Our approach follows the path of gradient boosting (Friedman, 2001), which performs additive optimization in functional space. Gradient boosting has been successfully used in classification (Friedman et al., 2000) and learning to rank (Burges, 2010), as well as other fields. Due to its effectiveness, there are many software packages that implement the algorithm, including the gbm package in R (Ridgeway, 2013) and scikit-learn (Pedregosa et al., 2011). Ours differs from the traditional gradient boosting method by introducing a regularization term to penalize the complexity of the function, making the result more robust to overfitting. The advantage of regularizing boosted trees is also discussed in (Johnson and Zhang, 2014).

3. Regularized Boosted Trees

3.1. Model Formalization

In the common supervised learning scenario, the data set can be represented by a set containing $n$ paired feature vectors and labels: $\mathcal{D} = \{(x_i, y_i)\}$ ($|\mathcal{D}| = n$). In the context of Higgs boson classification, $x_i \in \mathbb{R}^d$ is the vector of physical properties of the $i$-th event, while $y_i \in \{0, 1\}$ indicates whether it is a signal event.

Because the relation between the physics features and the output $y_i$ can be very complicated, a simple linear model may not capture it. One possible approach is to enhance the input by dividing the input space into separate regions and modeling the probability distribution of $y_i$ in each region. However, the problem remains to find such regions of interest. Ideally, we would learn them from the data. Motivated by this idea, we propose to model the prediction score $\hat{y}_i$ given the input $x_i$ by the following functional form:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \qquad (1)$$

where $K$ is the number of functions and $\mathcal{F}$ is the space of functions containing the partition of the region and the score in each of them. Formally, we assume $\mathcal{F}$ to be a set of regression trees. Because our model introduces functions as parameters, it allows us to find functions $f_k \in \mathcal{F}$ that fit the data well when we train the model, thus finding the corresponding regions automatically.
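As a minimal illustration of the functional form in Eq. (1) (our own sketch; the tree representation and names are assumptions, not from the paper), the prediction score is simply the sum of the scores returned by each regression tree in the ensemble:

```python
from typing import Callable, List
import numpy as np

# Each f_k in F maps a feature vector x in R^d to a real-valued score.
TreeFunction = Callable[[np.ndarray], float]

def predict_score(x: np.ndarray, trees: List[TreeFunction]) -> float:
    """Eq. (1): y_hat_i = phi(x_i) = sum_{k=1}^{K} f_k(x_i)."""
    return sum(f(x) for f in trees)

# Toy ensemble of two depth-1 "regression trees" (stumps) written by hand.
stump_a = lambda x: 0.3 if x[0] > 0.5 else -0.1
stump_b = lambda x: 0.2 if x[1] > 1.0 else -0.2

print(predict_score(np.array([0.7, 0.4]), [stump_a, stump_b]))  # 0.3 + (-0.2) = 0.1
```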

3.2. Training Objective

To learn the set of functions used in the model, we define the following regularized objective function.

$$\mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k) \qquad (2)$$

where $l$ is a differentiable convex loss function that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$. The second term measures the complexity of the model (i.e., the feature functions) to avoid overfitting. One important fact about Eq. (2) is that the objective is regularized, which means we are penalizing complicated models. With this objective function, a model employing simple and predictive functions will be selected as the best model.

Because the model includes functions as parameters, we cannot directly use traditional optimization methods in Euclidean space to find the solution. Instead, we train the model additively: at each iteration $t$, our proposed algorithm first searches over the functional space $\mathcal{F}$ to find a new function $f_t$ that optimizes the objective function, and then adds it to the ensemble. Formally, let $\hat{y}_i^{(t)}$ be the prediction of the $i$-th instance at the $t$-th iteration; we find $f_t$ to optimize the following objective:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{i=1}^{t} \Omega(f_i) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{i=1}^{t} \Omega(f_i)$$

This objective means that we want to add the best function to improve the model. In the general setting, the above objective is still hard to optimize, so we approximate it using a second-order Taylor expansion:

$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \right] + \sum_{i=1}^{t} \Omega(f_i)$$


where $g_i = \partial_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}^{(t-1)}\right)$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}^{(t-1)}\right)$ are the first- and second-order gradient statistics of the loss. We can remove the constant terms to obtain the following approximate objective at step $t$:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \qquad (3)$$

A general gradient boosting algorithm iteratively adds functions that optimize Eq. (3) for a number of user-specified iterations. An important difference between our objective and traditional gradient boosting methods is the explicit regularization term, which stops the model from overfitting. Based on this framework, we will discuss our model in detail in the following subsections.
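To make the roles of $g_i$ and $h_i$ concrete, here is a small illustrative sketch (not code from the paper; function and variable names are ours) that computes these statistics for the logistic loss and evaluates the approximate objective of Eq. (3) for a candidate function $f_t$:

```python
import numpy as np

def logistic_grad_hess(y, y_hat):
    """First- and second-order statistics g_i, h_i of the logistic loss
    l(y, y_hat) = y*log(1+exp(-y_hat)) + (1-y)*log(1+exp(y_hat)),
    taken w.r.t. the current prediction score y_hat (before the sigmoid)."""
    p = 1.0 / (1.0 + np.exp(-y_hat))    # predicted probability
    g = p - y                           # gradient
    h = p * (1.0 - p)                   # hessian
    return g, h

def approx_objective(g, h, f_t_values, omega=0.0):
    """Eq. (3): sum_i [g_i * f_t(x_i) + 0.5 * h_i * f_t(x_i)^2] + Omega(f_t)."""
    return np.sum(g * f_t_values + 0.5 * h * f_t_values ** 2) + omega

# Toy usage: labels, current scores, and the scores a candidate tree assigns.
y = np.array([1.0, 0.0, 1.0])
y_hat_prev = np.array([0.2, -0.1, 0.0])     # \hat{y}^{(t-1)}
g, h = logistic_grad_hess(y, y_hat_prev)
candidate = np.array([0.4, -0.3, 0.4])      # f_t(x_i) for each instance
print(approx_objective(g, h, candidate))
```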

3.3. Learning the Functions

It remains to learn the function $f_t$ in each step. In this model, we consider a specific class of functions defined by the following procedure: first the data is segmented into $T$ regions based on the input $x_i$, and then an independent score is assigned to each region. Formally, we define a mapping $q: \mathbb{R}^d \rightarrow \{1, 2, \dots, T\}$ that maps the input to the index of the region, and a vector $w$ of scores, one for each region. The function is defined by

$$f_t(x) = w_{q(x)}$$

This function class includes the regression tree when $q$ represents the decision tree structure on $x_i$. We can also choose other forms of $q$ to apply prior knowledge about regions of interest if needed.

Furthermore, we define the function complexity $\Omega$ to be the following form:

$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \qquad (4)$$

This regularization term penalizes the number of regions and the sum of squared scores of the regions. Learning too many regions or assigning too large scores to leaves may cause overfitting and thus harm the accuracy of our model. We introduce $\gamma$ and $\lambda$ as two parameters to balance the two terms. Define $I_j = \{i \mid q(x_i) = j\}$ as the instance set of region $j$. We can rewrite Eq. (3) in the following way:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \tfrac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T \qquad (5)$$

Now the objective is the sum of $T$ independent quadratic functions of the elements of $w$. Assuming the partition of regions $q(x)$ is fixed, the optimal weight $w_j^*$ of region $j$ can be obtained by

$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \qquad (6)$$


The corresponding optimal objective function value is

$$\mathcal{L}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T \qquad (7)$$

We write the objective as a function of $q$ since it depends on the structure of the mapping. Eq. (7) can be used as a scoring function to score the region partition specified by $q$. Figure 1 demonstrates a tree with a fixed structure; we learn the score $w$ on each leaf by Eq. (6).

Figure 1: Demonstration of a Decision Tree

Further, we apply a generic algorithm that enumerates over possible structures of $q$ with some heuristics and outputs the best $q$ found. In the next subsection, we will discuss how to do this efficiently when the $q$ are decision trees.
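The following short sketch (our own illustration, not code from the paper) computes, for a fixed partition $q$, the optimal leaf weights of Eq. (6) and the structure score of Eq. (7) from the per-instance statistics $g_i$, $h_i$:

```python
import numpy as np

def leaf_weights_and_score(g, h, region_index, T, lam=1.0, gamma=0.0):
    """Given a fixed partition q (region_index[i] = q(x_i) in {0, ..., T-1}),
    return the optimal leaf weights w_j* (Eq. 6) and the structure score (Eq. 7)."""
    weights = np.zeros(T)
    score = 0.0
    for j in range(T):
        in_region = (region_index == j)
        G_j, H_j = g[in_region].sum(), h[in_region].sum()
        weights[j] = -G_j / (H_j + lam)      # Eq. (6)
        score += G_j ** 2 / (H_j + lam)      # accumulate (sum g)^2 / (sum h + lambda)
    score = -0.5 * score + gamma * T         # Eq. (7)
    return weights, score

# Toy usage: 4 instances falling into T = 2 regions.
g = np.array([0.3, -0.7, 0.1, -0.2])
h = np.array([0.21, 0.21, 0.09, 0.16])
q = np.array([0, 0, 1, 1])
w, s = leaf_weights_and_score(g, h, q, T=2, lam=1.0)
print(w, s)
```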

3.4. Tree Growing Algorithm

Figure 2: Tree Splitting

The objective function in Eq. (7) can be used to find a good structure. Since there can be infinitely many possible candidates for the tree structure, we apply a greedy algorithm in practice. Figure 2 illustrates one step of the algorithm: splitting a leaf into two leaves. In each round, we greedily enumerate the features and choose to make a split on the feature that gives the maximum reduction calculated by Eq. (7). More specifically, let $I_L$ and $I_R$ be the instance sets of the left and right nodes and $I = I_L \cup I_R$; then the gain in loss reduction obtained by introducing the split is calculated by

$$gain = \frac{1}{2} \left[ \frac{\left( \sum_{i \in I_L} g_i \right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left( \sum_{i \in I_R} g_i \right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left( \sum_{i \in I} g_i \right)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma \qquad (8)$$

There are two important factors that we need to consider when designing the tree growing algorithm: 1) handling missing values, and 2) controlling complexity.

Data with missing values is a common problem in machine learning. One traditional way to handle missing values is to impute them with either the mean or the median of that feature dimension. However, this could require prior knowledge from the user, and the specified imputed values may not be the best choice. We simply enhance the function class to learn the best way to handle missing values. The idea is to learn a default direction for each node and guide an instance with a missing value along the default direction. The learning algorithm enumerates the default directions to find the better one for each node. This procedure is shown in Algorithm 1. It can also be viewed as learning the best imputation values, since different imputations implicitly give default directions. Algorithm 1 is recursively applied to tree nodes until some stopping condition (e.g. maximum depth) is met.

Algorithm 1: Tree Split Finding with Missing Value
  Input: I, instance set of the current node
  Input: I_k = {i in I | x_ik != missing}
  Input: d, feature dimension
  gain <- 0
  G <- sum_{i in I} g_i, H <- sum_{i in I} h_i
  for k = 1 to d do
    // enumerate: missing values go to the right
    G_L <- 0, H_L <- 0
    for j in sorted(I_k, ascending order by x_jk) do
      G_L <- G_L + g_j, H_L <- H_L + h_j
      G_R <- G - G_L, H_R <- H - H_L
      gain <- max(gain, G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - G^2/(H + lambda))
    end
    // enumerate: missing values go to the left
    G_R <- 0, H_R <- 0
    for j in sorted(I_k, descending order by x_jk) do
      G_R <- G_R + g_j, H_R <- H_R + h_j
      G_L <- G - G_R, H_L <- H - H_R
      gain <- max(gain, G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - G^2/(H + lambda))
    end
  end
  Output: split and default direction with the maximum gain

Model complexity is controlled by the regularization term in Eq. (4). The introduction of the complexity regularization means that the result of Eq. (8) can be negative when the training loss reduction is smaller than $\gamma$.
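As a rough illustration of Algorithm 1 (our own sketch under simplifying assumptions, e.g. a single dense feature column and no minimum-child-weight constraint), the code below scans one feature in both directions to choose the best threshold and the default direction for missing values:

```python
import numpy as np

def best_split_one_feature(x, g, h, lam=1.0):
    """Find the best threshold and default (missing-value) direction for one feature,
    in the spirit of Algorithm 1. x may contain np.nan for missing entries."""
    present = ~np.isnan(x)
    G, H = g.sum(), h.sum()                       # statistics over the whole node
    parent = G ** 2 / (H + lam)
    order = np.argsort(x[present])
    xs, gs, hs = x[present][order], g[present][order], h[present][order]

    best_gain, best_threshold, best_default = 0.0, None, None

    # Pass 1: missing values go to the RIGHT child; scan non-missing in ascending order.
    G_L = H_L = 0.0
    for x_j, g_j, h_j in zip(xs, gs, hs):
        G_L, H_L = G_L + g_j, H_L + h_j
        G_R, H_R = G - G_L, H - H_L               # right side absorbs the missing instances
        gain = G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam) - parent
        if gain > best_gain:
            best_gain, best_threshold, best_default = gain, x_j, "right"

    # Pass 2: missing values go to the LEFT child; scan non-missing in descending order.
    G_R = H_R = 0.0
    for x_j, g_j, h_j in zip(xs[::-1], gs[::-1], hs[::-1]):
        G_R, H_R = G_R + g_j, H_R + h_j
        G_L, H_L = G - G_R, H - H_R               # left side absorbs the missing instances
        gain = G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam) - parent
        if gain > best_gain:
            best_gain, best_threshold, best_default = gain, x_j, "left"

    return best_gain, best_threshold, best_default

# Toy usage on a single feature column.
x = np.array([0.5, np.nan, 1.2, 0.1, np.nan, 0.9])
g = np.array([0.3, -0.7, 0.1, -0.2, 0.4, -0.1])
h = np.array([0.2, 0.2, 0.1, 0.2, 0.2, 0.1])
print(best_split_one_feature(x, g, h))
```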


Optimizing the regularized objective naturally corresponds to a pruning heuristic for tree structures. We use a post-pruning strategy that first grows the tree to maximum depth and then recursively prunes all the leaf splits with negative gain. In practice, we find that this results in learning more conservative trees in the later iterations and makes the result more generalizable.
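A minimal sketch of this post-pruning step, using a hypothetical dictionary-based tree representation of our own (not the paper's data structure); split gains are assumed to have been computed with Eq. (8) during growing:

```python
def prune_negative_splits(node):
    """Bottom-up post-pruning: collapse any split whose gain (Eq. 8) is negative.
    `node` is either {"leaf": True, "weight": w} or
    {"leaf": False, "gain": g, "weight": w, "left": ..., "right": ...},
    where "weight" on an internal node is the Eq. (6) weight it would get as a leaf."""
    if node["leaf"]:
        return node
    node["left"] = prune_negative_splits(node["left"])
    node["right"] = prune_negative_splits(node["right"])
    # Only collapse when both children are leaves, so pruning proceeds bottom-up.
    if node["left"]["leaf"] and node["right"]["leaf"] and node["gain"] < 0.0:
        return {"leaf": True, "weight": node["weight"]}
    return node

# Toy usage: an inner split with negative gain is collapsed back into a leaf.
tree = {
    "leaf": False, "gain": 1.3, "weight": 0.0,
    "left": {"leaf": False, "gain": -0.2, "weight": 0.1,
             "left": {"leaf": True, "weight": 0.3},
             "right": {"leaf": True, "weight": -0.1}},
    "right": {"leaf": True, "weight": -0.4},
}
print(prune_negative_splits(tree))
```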

3.5. Time Complexity
