10-601 Machine Learning, Midterm Exam

Instructors: Tom Mitchell, Ziv Bar-Joseph

Monday 22nd October, 2012

There are 5 questions, for a total of 100 points.

This exam has 16 pages; make sure you have all pages before you begin. This exam is open book, open notes, but no computers or other electronic devices.

Good luck!

Name:                          Andrew ID:

Question                        Points   Score
Short Answers                   20
Comparison of ML algorithms     20
Regression                      20
Bayes Net                       20
Overfitting and PAC Learning    20
Total:                          100


Question 1. Short Answers

True False Questions.

(a) [1 point] We can get multiple local optimum solutions if we solve a linear regression problem by minimizing the sum of squared errors using gradient descent.

True False

Solution:

False

(b) [1 point] When a decision tree is grown to full depth, it is more likely to fit the noise in the data.

True False

Solution:

True

(c) [1 point] When the hypothesis space is richer, overfitting is more likely.

True False

Solution:

True

(d) [1 point] When the feature space is larger, overfitting is more likely.

True False

Solution:

True

(e) [1 point] We can use gradient descent to learn a Gaussian Mixture Model.

True False

Solution:

True

Short Questions.

(f) [3 points] Can you represent the following boolean function with a single logistic threshold unit (i.e., a single unit from a neural network)? If yes, show the weights. If not, explain why not in 1-2 sentences.

A   B   f(A,B)
1   1   0
0   0   0
1   0   1
0   1   0


Solution:

Yes, you can represent this function with a single logistic threshold unit, since it is linearly separable. Here is one example.

$F(A,B) = \mathbb{1}\{A - B - 0.5 > 0\}$   (1)
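As a quick sanity check of equation (1) (not part of the original exam), the sketch below hard-thresholds the pre-activation of a logistic unit with weights w_A = 1, w_B = -1 and bias -0.5 and verifies the truth table above; the helper name threshold_unit is hypothetical.

```python
# Minimal check of equation (1): f(A, B) = 1{A - B - 0.5 > 0}.
# Weights and bias are read off the equation: w_A = 1, w_B = -1, bias = -0.5.

def threshold_unit(a, b, w_a=1.0, w_b=-1.0, bias=-0.5):
    """Hard-threshold version of a logistic unit: output 1 iff the pre-activation is positive."""
    return int(w_a * a + w_b * b + bias > 0)

truth_table = {(1, 1): 0, (0, 0): 0, (1, 0): 1, (0, 1): 0}
for (a, b), target in truth_table.items():
    assert threshold_unit(a, b) == target, (a, b)
print("threshold unit reproduces f(A, B) on all four inputs")
```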


(g) [3 points] Suppose we clustered a set of N data points using two different clustering algorithms: k-means and Gaussian mixtures. In both cases we obtained 5 clusters and in both cases the centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters in the k-means solution be assigned to the same cluster in the Gaussian mixture solution? If no, explain. If so, sketch an example or explain in 1-2 sentences.

Solution:

Yes. k-means assigns each data point to a unique cluster based on its distance to the cluster center, whereas Gaussian mixture clustering gives a soft (probabilistic) assignment to each data point. Therefore, even if the cluster centers are identical in both methods, if the Gaussian mixture components have large variances (components are spread around their centers), points on the edges between clusters may be given different assignments in the Gaussian mixture solution.
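For intuition (this is illustrative, not from the exam), here is a small sketch of the hard vs. soft assignment distinction using scikit-learn's KMeans and GaussianMixture; the two-component toy data set is an assumption made for the example.

```python
# Sketch of hard vs. soft assignment: k-means gives each point exactly one cluster,
# while a Gaussian mixture gives a probability for every component, so border points
# can end up grouped differently even when the centers are essentially the same.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[-2.0, 0.0], scale=1.5, size=(100, 2)),  # wide component on the left
    rng.normal(loc=[+2.0, 0.0], scale=1.5, size=(100, 2)),  # wide component on the right
])

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft = gmm.predict_proba(X)  # soft (probabilistic) assignments, one row per point

# Points near the boundary have responsibilities close to 0.5 under the mixture,
# even though k-means splits them strictly by distance to the nearest center.
border = np.abs(soft[:, 0] - 0.5) < 0.1
print("k-means labels of border points:", np.unique(kmeans_labels[border]))
print("GMM responsibilities of one border point:", soft[border][:1])
```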

Circle the correct answer(s).

(h) [3 points] As the number of training examples goes to infinity, your model trained on that data will have: A. Lower variance B. Higher variance C. Same variance

Solution:

Lower variance

(i) [3 points] As the number of training examples goes to infinity, your model trained on that data

will have:

A. Lower bias B. Higher bias C. Same bias

Solution:

Same bias

(j) [3 points] Suppose you are given an EM algorithm that finds maximum likelihood estimates for a

model with latent variables. You are asked to modify the algorithm so that it finds MAP estimates instead. Which step or steps do you need to modify: A. Expectation B. Maximization C. No modification necessary D. Both

Solution:

Maximization


Question 2. Comparison of ML algorithms

Assume we have a set of data from patients who visited UPMC hospital during the year 2011. A set of features (e.g., temperature, height) has also been extracted for each patient. Our goal is to decide whether a new visiting patient has any of diabetes, heart disease, or Alzheimer's disease (a patient can have one or more of these diseases).

(a) [3 points] We have decided to use a neural network to solve this problem. We have two choices: either to train a separate neural network for each of the diseases or to train a single neural network with one output neuron for each disease, but with a shared hidden layer. Which method do you prefer? Justify your answer.

Solution:

1- Neural network with a shared hidden layer can capture dependencies between diseases.

It can be shown that in some cases, when there is a dependency between the output nodes, having a shared node in the hidden layer can improve the accuracy.

2- If there is no dependency between the diseases (output neurons), then we would prefer to have a separate neural network for each disease.

(b) [3 points] Some patient features are expensive to collect (e.g., brain scans) whereas others are not (e.g., temperature). Therefore, we have decided to first ask our classification algorithm to predict whether a patient has a disease, and if the classifier is 80% confident that the patient has a disease, then we will do additional examinations to collect the additional patient features. In this case, which classification methods do you recommend: neural networks, decision trees, or naive Bayes? Justify your answer in one or two sentences.

Solution:

We expect students to explain how each of these learning techniques can be used to output a confidence value (any of them can be modified to provide one). In addition, naive Bayes is preferable to the other choices since we can still use it for classification when the values of some of the features are unknown. We gave partial credit to those who mentioned neural networks because of their non-linear decision boundary, or decision trees because they give an interpretable answer.

(c) Assume that we use a logistic regression learning algorithm to train a classifier for each disease. The classifier is trained to obtain MAP estimates for the logistic regression weights W. Our MAP estimator optimizes the objective

$$W \leftarrow \arg\max_W \; \ln\Big[ P(W) \prod_l P(Y^l \mid X^l, W) \Big]$$

where l refers to the l-th training example. We adopt a Gaussian prior with zero mean for the weights $W = \langle w_1, \ldots, w_n \rangle$, making the above objective equivalent to:

$$W \leftarrow \arg\max_W \; -C \sum_i w_i^2 + \sum_l \ln P(Y^l \mid X^l, W)$$

Note C here is a constant, and we re-run our learning algorithm with different values of C. Please answer each of these true/false questions, and explain/justify your answer in no more than 2 sentences.

i. [2 points] The average log-probability of the training data can never increase as we increase C.

True False


Solution:

True. As we increase C, we give more weight to constraining the predictor. This makes our predictor less flexible in fitting the training data (over-constraining the predictor makes it unable to fit the training data).

ii. [2 points] If we start with C = 0, the average log-probability of test data will likely decrease as we increase C.

True False

Solution:

False. As we increase the value of C (starting from C = 0), we prevent our predictor from overfitting the training data, and thus we expect the accuracy of our predictor to increase on the test data.

iii. [2 points] If we start with a very large value of C, the average log-probability of test data can never decrease as we increase C.

True False

Solution:

False. Similar to the previous parts, if we over-constrain the predictor (by choosing a very large value of C), then it will not be able to fit the training data, which makes it perform worse on the test data.
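To make the role of C in part (c) concrete, here is a small numerical sketch on synthetic data using scikit-learn's LogisticRegression (the data and the specific C values are assumptions for illustration). Note that scikit-learn's C parameter is an inverse regularization strength, so increasing the exam's C corresponds to decreasing the scikit-learn value.

```python
# Sketch of part (c)'s trade-off on synthetic data. The exam's objective is
# argmax_W  -C * sum_i w_i^2 + sum_l log P(Y^l | X^l, W); scikit-learn's LogisticRegression
# uses an *inverse* regularization strength, so the exam's C grows as sklearn_C shrinks.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(int)

def avg_train_log_prob(sklearn_C):
    clf = LogisticRegression(C=sklearn_C, penalty="l2").fit(X, y)
    proba = clf.predict_proba(X)[np.arange(n), y]  # P(y_i | x_i, W) for each training example
    return np.mean(np.log(proba))

# Increasing the exam's C = decreasing sklearn's C: average training log-probability never goes up.
for sklearn_C in [100.0, 1.0, 0.01]:
    print(f"sklearn C={sklearn_C:>6}: avg train log-prob = {avg_train_log_prob(sklearn_C):.4f}")
```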


(d)

[Figure 1: Labeled training set, panels (a) and (b), with the decision boundary marked.]

i. [2 points] Figure 1(a) illustrates a subset of our training data when we have only two features: $X_1$ and $X_2$. Draw the decision boundary for the logistic regression that we explained in part (c).

Solution:

The decision boundary for logistic regression is linear. One candidate solution which classifies all the data correctly is shown in Figure 1. We will accept other possible solutions, since the decision boundary depends on the value of C (it is possible for the trained classifier to misclassify a few of the training data points if we choose a large value of C).

ii. [3 points] Now assume that we add a new data point, as shown in Figure 1(b). How does it change the decision boundary that you drew in Figure 1(a)? Answer this by drawing both the old and the new boundary.

Solution:

We expect the decision boundary to move a little toward the new data point.

(e) [3 points] Assume that we record information on all the patients who visit UPMC every day. However, for many of these patients we don't know if they have any of the diseases. Can we still improve the accuracy of our classifier using these data? If yes, explain how; if no, justify your answer.

Solution:

Yes, by using EM. In class, we showed how EM can improve the accuracy of our classifier using both labeled and unlabeled data. For more details, please see http://www.cs.cmu.edu/~tom/10601_fall2012/slides/GrMod3_10_9_2012.pdf, page 6.


Question 3. Regression

Consider real-valued variables X and Y. The Y variable is generated, conditional on X, from the following process:

$$\epsilon \sim N(0, \sigma^2)$$
$$Y = aX + \epsilon$$

where every $\epsilon$ is an independent variable, called a noise term, which is drawn from a Gaussian distribution with mean 0 and standard deviation $\sigma$. This is a one-feature linear regression model, where a is the only weight parameter. The conditional probability of Y has distribution $p(Y \mid X, a) \sim N(aX, \sigma^2)$, so it can be written as

$$p(Y \mid X, a) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} (Y - aX)^2 \right)$$

The following questions are all about this model.

MLE estimation

(a) [3 points] Assume we have a training dataset of n pairs $(X_i, Y_i)$ for $i = 1 \ldots n$, and $\sigma$ is known. Which ones of the following equations correctly represent the maximum likelihood problem for estimating a? Say yes or no to each one. More than one of them should have the answer "yes."

[Solution: no] $\arg\max_a \sum_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(Y_i - aX_i)^2\right)$

[Solution: yes] $\arg\max_a \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(Y_i - aX_i)^2\right)$

[Solution: no] $\arg\max_a \sum_i \exp\left(-\frac{1}{2\sigma^2}(Y_i - aX_i)^2\right)$

[Solution: yes] $\arg\max_a \prod_i \exp\left(-\frac{1}{2\sigma^2}(Y_i - aX_i)^2\right)$

[Solution: no] $\arg\max_a \frac{1}{2} \sum_i (Y_i - aX_i)^2$

[Solution: yes] $\arg\min_a \frac{1}{2} \sum_i (Y_i - aX_i)^2$

(b) [7 points] Derive the maximum likelihood estimate of the parameter a in terms of the training example $X_i$'s and $Y_i$'s. We recommend you start with the simplest form of the problem you found above.

Solution:


Use $F(a) = \frac{1}{2}\sum_i (Y_i - aX_i)^2$ and minimize F. Then

$$0 = \frac{\partial}{\partial a} \left[ \frac{1}{2} \sum_i (Y_i - aX_i)^2 \right] \quad (2)$$
$$= \sum_i (Y_i - aX_i)(-X_i) \quad (3)$$
$$= \sum_i aX_i^2 - X_i Y_i \quad (4)$$
$$a = \frac{\sum_i X_i Y_i}{\sum_i X_i^2} \quad (5)$$

Partial credit: 1 point for writing a correct objective, 1 point for taking the derivative, 1 point for getting the chain rule correct, 1 point for a reasonable attempt at solving for a. 6 points for correct up to a sign error. Many people got $\frac{\sum y_i}{\sum x_i}$ as the answer, by erroneously cancelling $x_i$ on top and bottom.

4 points for this answer when it is clear this cancelling caused the problem. If they explicitly derived $\frac{\sum x_i y_i}{\sum x_i^2}$ along the way, 6 points. If it is completely unclear where $\frac{\sum y_i}{\sum x_i}$ came from, sometimes worth only 3 points (based on the partial credit rules above).

Some people wrote a gradient descent rule. We intended to ask for a closed-form maximum likelihood estimate, not an algorithm to get it. (Yes, it is true that lectures never said there exists a closed-form solution for linear regression MLE. But there is. In fact, there is a closed-form solution even for multiple features, via linear algebra.) But we gave 4 points for getting the rule correct; 3 points for correct with a sign error.

For gradient descent/ascent, the signs are tricky. If you are using the log-likelihood, and thus maximization, you want gradient ascent, and thus add the gradient. If instead you are doing the minimization problem, and using gradient descent, you need to subtract the gradient. Either way, it comes out to $a \leftarrow a + \sum_i (y_i - a x_i) x_i$. Interpretation: $\sum_i (y_i - a x_i) x_i$ is the correlation of the data against the residual. In the case of positive x, y, if the data still correlates with the residual, that means predictions are too low, so you want to increase a.

Here is a lovely book chapter by Tufte (1974) on one-feature linear regression: http://www.edwardtufte.com/tufte/dapp/chapter3.html
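As a quick numerical check of equation (5) (not from the exam), the sketch below simulates data from the stated model and compares the closed-form estimate with the gradient-ascent update discussed in the grading notes; the sample size, true a, noise level, and step size eta are illustrative assumptions.

```python
# Numerical check of the MLE in equation (5), a_hat = sum_i X_i Y_i / sum_i X_i^2,
# on data simulated from the model Y = a*X + eps, eps ~ N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
n, a_true, sigma = 1000, 2.5, 1.0  # illustrative values, not from the exam
X = rng.normal(size=n)
Y = a_true * X + rng.normal(scale=sigma, size=n)

# Closed-form MLE from equation (5).
a_mle = np.sum(X * Y) / np.sum(X ** 2)

# Gradient ascent using the update a <- a + eta * sum_i (y_i - a x_i) x_i from the grading
# notes; eta is a small step size added here so the iteration converges.
a_gd, eta = 0.0, 5e-4
for _ in range(500):
    a_gd += eta * np.sum((Y - a_gd * X) * X)

print(f"true a = {a_true}, closed-form MLE = {a_mle:.4f}, gradient ascent = {a_gd:.4f}")
```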

MAP estimation

Let's put a prior on a. Assume $a \sim N(0, \lambda^2)$, so

$$p(a \mid \lambda) = \frac{1}{\sqrt{2\pi\lambda^2}} \exp\left( -\frac{1}{2\lambda^2} a^2 \right)$$

The posterior probability of a is

$$p(a \mid Y_1, \ldots, Y_n, X_1, \ldots, X_n, \lambda) = \frac{p(Y_1, \ldots, Y_n \mid X_1, \ldots, X_n, a)\, p(a \mid \lambda)}{\int p(Y_1, \ldots, Y_n \mid X_1, \ldots, X_n, a')\, p(a' \mid \lambda)\, da'}$$
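Since the denominator of the posterior does not depend on a, the MAP estimate can be found by maximizing the log-likelihood plus the log-prior given above. Below is a minimal numerical sketch of that idea; the grid search, the simulated data, and the chosen values of sigma and lambda are assumptions for illustration, not part of the exam.

```python
# The denominator of the posterior does not depend on a, so the MAP estimate maximizes
# log p(Y_1..Y_n | X_1..X_n, a) + log p(a | lambda). A simple grid search over a.
import numpy as np

rng = np.random.default_rng(1)
n, a_true, sigma, lam = 30, 2.5, 1.0, 0.5  # illustrative values, not from the exam
X = rng.normal(size=n)
Y = a_true * X + rng.normal(scale=sigma, size=n)

def log_posterior_unnormalized(a):
    log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (Y - a * X) ** 2 / (2 * sigma**2))
    log_prior = -0.5 * np.log(2 * np.pi * lam**2) - a**2 / (2 * lam**2)
    return log_lik + log_prior

grid = np.linspace(-5, 5, 10001)
a_map = grid[np.argmax([log_posterior_unnormalized(a) for a in grid])]
a_mle = np.sum(X * Y) / np.sum(X ** 2)
print(f"MLE = {a_mle:.4f}, MAP = {a_map:.4f}  (the zero-mean prior pulls the MAP estimate toward 0)")
```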
[PDF] mmm stock forecast

[PDF] mmm stock forecast 2019

[PDF] mmm stock forecast 2020

[PDF] mmm stock forecast 2025

[PDF] mmm stock forecast cnn

[PDF] mmm stock price history

[PDF] mmm stock quote

[PDF] mmm stock quote today

[PDF] mmm stock quote yahoo

[PDF] mmr vaccine

[PDF] mn 2016 election map

[PDF] mn birth certificate application

[PDF] mn immunization exemption form

[PDF] mndot

[PDF] mnemonic opcode table