Sebastian Raschka STAT 479: Machine Learning FS 2018

Data Preprocessing and Machine Learning with Scikit-Learn
(Computational Foundations Part 3/3)

Lecture 05
STAT 479: Machine Learning, Fall 2018
Sebastian Raschka
[Figure: machine learning workflow with the stages Preprocessing, Learning, Evaluation, and Prediction. Labeled raw data is split into a training dataset and a test dataset; a learning algorithm produces the final model, which predicts labels for new data. Preprocessing: feature extraction and scaling, feature selection, dimensionality reduction, sampling. Learning: model selection, cross-validation, performance metrics, hyperparameter optimization.]
Reading a Dataset from a Tabular Text File

[Figure: the three Iris flower classes: Iris-Versicolor, Iris-Virginica, Iris-Setosa]
Fisher, R.A. "The use of multiple measurements in taxonomic problems." Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950).

https://pandas.pydata.org
McKinney, Wes. "Data structures for statistical computing in python." Proceedings of the 9th Python in Science Conference. Vol. 445. 2010.
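As a rough illustration of reading a tabular text file, the sketch below parses comma-separated rows into a list of feature vectors and a list of labels. It uses only Python's built-in csv module so that it runs without extra packages; in the lecture's setting, pandas (for example `pandas.read_csv`) would be the usual tool. The in-memory text, the column names, and the variable names `X` and `y` are assumptions made for this example, not content from the slides.

```python
import csv
import io

# Hypothetical stand-in for a text file on disk: a header line plus a few
# rows in the column layout of the Iris dataset.
raw_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
4.9,3.0,1.4,0.2,Iris-setosa
"""

reader = csv.DictReader(io.StringIO(raw_text))

# Every column except the label column is treated as a numeric feature.
feature_names = [name for name in reader.fieldnames if name != "species"]

X, y = [], []
for row in reader:
    X.append([float(row[name]) for name in feature_names])
    y.append(row["species"])

print(feature_names)
print(X[0], y[0])
```

To read from an actual file, `io.StringIO(raw_text)` would be replaced by an open file handle.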
Basic Data Handling

http://rasbt.github.io/mlxtend/
Raschka, Sebastian. "MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack." The Journal of Open Source Software 3.24 (2018).

Python Classes

http://scikit-learn.org
Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12.Oct (2011): 2825-2830.

Scikit-learn Estimator API

[Figure: ① est.fit(X_train, y_train) fits the model to the training data and training labels; ② est.predict(X_test) returns predicted labels for the test data.]
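The fit/predict convention can be imitated with a small Python class. The sketch below is a hand-written toy classifier, not scikit-learn code: the class name `NearestCentroidSketch`, the attribute `centroids_`, and the data are all made up for illustration. It only mirrors the two calls named on the slide, `est.fit(X_train, y_train)` and `est.predict(X_test)`.

```python
class NearestCentroidSketch:
    """Hypothetical toy classifier that follows the fit/predict convention."""

    def fit(self, X, y):
        # Step 1: learn one centroid (feature-wise mean) per class label.
        grouped = {}
        for features, label in zip(X, y):
            grouped.setdefault(label, []).append(features)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, X):
        # Step 2: assign each example to the class with the closest centroid.
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            min(self.centroids_,
                key=lambda label: sq_dist(features, self.centroids_[label]))
            for features in X
        ]


# Made-up training and test data with two features per example.
X_train = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.9, 1.1], [4.1, 3.9]]

est = NearestCentroidSketch()
est.fit(X_train, y_train)
y_pred = est.predict(X_test)
print(y_pred)  # [0, 1]
```

Storing learned state in attributes set during `fit` and returning `self` keeps the object usable in chained calls, which is the pattern the estimator diagram describes.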
Issues with Subsampling

1.3 Resubstitution Validation and the Holdout Method

The holdout method is inarguably the simplest model evaluation technique; it can be summarized as follows. First, we take a labeled dataset and split it into two parts: a training and a test set. Then, we fit a model to the training data and predict the labels of the test set. The fraction of correct predictions, which can be computed by comparing the predicted labels to the ground truth labels of the test set, constitutes our estimate of the model's prediction accuracy. Here, it is important to note that we do not want to train and evaluate a model on the same training dataset (this is called resubstitution validation or resubstitution evaluation), since it would typically introduce a very optimistic bias due to overfitting. In other words, we cannot tell whether the model simply memorized the training data, or whether it generalizes well to new, unseen data. (On a side note, we can estimate this so-called optimism bias as the difference between the training and test accuracy.)

Typically, the splitting of a dataset into training and test sets is a simple process of random subsampling. We assume that all data points have been drawn from the same probability distribution (with respect to each class). And we randomly choose 2/3 of these samples for the training set and 1/3 of the samples for the test set. Note that there are two problems with this approach, which we will discuss in the next sections.

1.4 Stratification

We have to keep in mind that a dataset represents a random sample drawn from a probability distribution, and we typically assume that this sample is representative of the true population, more or less. Now, further subsampling without replacement alters the statistic (mean, proportion, and variance) of the sample. The degree to which subsampling without replacement affects the statistic of a sample is inversely proportional to the size of the sample. Let us have a look at an example using the Iris dataset, which we randomly divide into 2/3 training data and 1/3 test data as illustrated in Figure 1. (The source code for generating this graphic is available on GitHub.)

Figure 1: Distribution of Iris flower classes upon random subsampling into training and test sets. Panels: All samples (n = 150), Training samples (n = 100), Test samples (n = 50).
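To make the subsampling issue concrete, the sketch below compares plain random subsampling with a stratified split on 150 labels arranged like the Iris classes (50 per class). The helper names `random_split` and `stratified_split` and the seed value are assumptions for this example; only Python's standard library is used.

```python
import random
from collections import Counter, defaultdict

# 150 labels in the Iris class layout: 50 examples per class.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
indices = list(range(len(labels)))


def random_split(indices, train_fraction, seed):
    """Plain random subsampling without replacement."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def stratified_split(indices, labels, train_fraction, seed):
    """Split each class separately so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * train_fraction)
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test


rand_train, rand_test = random_split(indices, 2 / 3, seed=123)
strat_train, strat_test = stratified_split(indices, labels, 2 / 3, seed=123)

print("random     train:", Counter(labels[i] for i in rand_train))
print("random     test: ", Counter(labels[i] for i in rand_test))
print("stratified train:", Counter(labels[i] for i in strat_train))
print("stratified test: ", Counter(labels[i] for i in strat_test))
```

The random split generally yields unequal class counts in both subsets, while the stratified split assigns 33 examples per class to training and 17 per class to testing, because the 2/3 fraction is rounded within each class.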
Stratified Split

Normalization: Min-Max Scaling

$$x_{\text{norm}}^{[i]} = \frac{x^{[i]} - x_{\min}}{x_{\max} - x_{\min}}$$
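A minimal sketch of min-max scaling for a single feature follows. The function name `min_max_scale` and the input values are assumptions for illustration; the arithmetic is the usual rescaling of each value by the feature's minimum and range.

```python
def min_max_scale(values):
    """Rescale values to the interval [0, 1] using their min and max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("min-max scaling is undefined for a constant feature")
    return [(x - x_min) / (x_max - x_min) for x in values]


lengths_cm = [10.0, 20.0, 30.0]  # made-up feature values
scaled = min_max_scale(lengths_cm)
print(scaled)  # [0.0, 0.5, 1.0]
```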
Normalization: Standardization

$$x_{\text{std}}^{[i]} = \frac{x^{[i]} - \mu_x}{\sigma_x}$$
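Standardization can be sketched with Python's statistics module. The input values below are made up and chosen so that the mean is 5 and the population standard deviation is 2; the variable names are assumptions for this example.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up feature values

mu = statistics.mean(values)       # 5.0
sigma = statistics.pstdev(values)  # population standard deviation: 2.0

# Subtract the mean and divide by the standard deviation.
z_scores = [(x - mu) / sigma for x in values]
print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# After standardization the feature has zero mean and unit standard deviation.
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```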
Sample vs Population Standard Deviation

Sample standard deviation:

$$s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x^{[i]} - \bar{x}\right)^2}$$

Population standard deviation:

$$\sigma_x = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x^{[i]} - \mu_x\right)^2}$$
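The difference between the two estimators shows up clearly on a tiny sample. The sketch below uses `statistics.stdev` (divides by n - 1) and `statistics.pstdev` (divides by n) from Python's standard library; the three values are chosen for illustration.

```python
import statistics

values = [10, 20, 30]

sample_std = statistics.stdev(values)       # divides by n - 1
population_std = statistics.pstdev(values)  # divides by n

print(sample_std)                # 10.0
print(round(population_std, 1))  # 8.2
```

For large n the two values converge, so the choice matters mostly for small samples.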
Scaling Validation and Test Sets

Given 3 training examples:
- example1: 10 cm -> class 2
- example2: 20 cm -> class 2
- example3: 30 cm -> class 1

Estimate:
mean: 20 cm
standard deviation: 8.2 cm

Standardize (z scores):
- example1: -1.22 -> class 2
- example2: 0.00 -> class 2
- example3: 1.22 -> class 1

$$h(z) = \begin{cases} 2 & \text{if } z \leq 0.6 \\ 1 & \text{otherwise} \end{cases}$$

Given 3 NEW examples:
- example4: 5 cm -> class ?
- example5: 6 cm -> class ?
- example6: 7 cm -> class ?

Estimate "new" mean and std. from the new examples:
- example4: -1.22 -> class 2
- example5: 0.00 -> class 2
- example6: 1.22 -> class 1

Reuse the training mean and std. instead:
- example4: -1.84
- example5: -1.71
- example6: -1.59

Scikit-Learn Transformer API

[Figure: ① est.fit(X_train) learns the transformation parameters from the training data; ② est.transform(X_train) yields the transformed training data; ③ est.transform(X_test) yields the transformed test data.]
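The fit/transform pattern and the scaling example can be combined in one sketch. `StandardScalerSketch` is a hand-written, hypothetical class (not scikit-learn code) that stores the mean and population standard deviation in `fit` and reuses them in `transform`. The decision rule `h` is taken to be class 2 for z at or below 0.6 and class 1 otherwise, which is how the slide's rule is read here.

```python
import statistics


class StandardScalerSketch:
    """Hypothetical 1-D scaler that follows the fit/transform convention."""

    def fit(self, values):
        # Learn the parameters from the data passed to fit only.
        self.mean_ = statistics.mean(values)
        self.std_ = statistics.pstdev(values)  # population std (divide by n)
        return self

    def transform(self, values):
        # Apply the stored parameters; nothing is re-estimated here.
        return [(x - self.mean_) / self.std_ for x in values]


def h(z):
    """Decision rule from the example (assumed reading: z <= 0.6 -> class 2)."""
    return 2 if z <= 0.6 else 1


train_cm = [10, 20, 30]
new_cm = [5, 6, 7]

scaler = StandardScalerSketch().fit(train_cm)
z_train = scaler.transform(train_cm)

# Reuse the training parameters for the new examples.
z_new_reused = scaler.transform(new_cm)

# Re-estimate the parameters on the new examples (the mistake to avoid).
z_new_refit = StandardScalerSketch().fit(new_cm).transform(new_cm)

print([h(z) for z in z_train])       # [2, 2, 1]
print([h(z) for z in z_new_reused])  # [2, 2, 2]
print([h(z) for z in z_new_refit])   # [2, 2, 1]
```

Re-estimating the parameters on the new examples maps them onto the same z scores as the training examples, so the 7 cm example is assigned class 1 even though it is shorter than every training example.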
Categorical: Ordinal

Categorical: Nominal

One-hot Encoding
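The slide content for these topics did not survive extraction, so the sketch below uses made-up columns to show the two encodings in plain Python. The feature values, the ordering M < L < XL, and all variable names are assumptions for illustration.

```python
# Made-up example columns (not taken from the slides).
sizes = ["M", "L", "XL", "M"]             # ordinal: M < L < XL
colors = ["green", "red", "blue", "red"]  # nominal: no natural order

# Ordinal feature: map each category to an integer that respects the order.
size_order = {"M": 1, "L": 2, "XL": 3}  # assumed ordering
size_encoded = [size_order[s] for s in sizes]
print(size_encoded)  # [1, 2, 3, 1]

# Nominal feature: one-hot encode, one binary column per category.
categories = sorted(set(colors))  # ['blue', 'green', 'red']
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]
print(categories)
print(one_hot)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
```

Integer codes suit ordinal features because the numeric order carries meaning; for nominal features an integer code would imply an ordering that does not exist, which is why one binary column per category is used instead.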
Scikit-Learn Pipelines

[Figure: a pipeline chaining scaling, dimensionality reduction, and a learning algorithm. (Step 1) pipeline.fit(...) takes the training set and its class labels; it calls .fit(...) & .transform(...) on the scaling and dimensionality reduction steps and .fit(...) on the learning algorithm, which yields the predictive model. (Step 2) pipeline.predict(...) takes the test set; it calls .transform(...) on both preprocessing steps and .predict(...) on the predictive model, which returns class labels.]
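The chaining logic in the pipeline diagram can be sketched in plain Python. Everything below is hand-written and hypothetical (class names, data, and the choice of steps); it is not scikit-learn's implementation. In particular, `KeepFirstColumnsSketch` is only a trivial placeholder for the dimensionality reduction step, where a real pipeline would use a method such as PCA.

```python
class MinMaxScalerSketch:
    """Hypothetical transformer: rescales each column to [0, 1]."""

    def fit(self, X, y=None):
        columns = list(zip(*X))
        self.min_ = [min(col) for col in columns]
        # Fall back to 1.0 for constant columns to avoid division by zero.
        self.range_ = [(max(col) - min(col)) or 1.0 for col in columns]
        return self

    def transform(self, X):
        return [[(v - lo) / r for v, lo, r in zip(row, self.min_, self.range_)]
                for row in X]


class KeepFirstColumnsSketch:
    """Trivial placeholder for dimensionality reduction: keep k columns."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y=None):
        return self  # nothing to learn in this placeholder

    def transform(self, X):
        return [row[: self.k] for row in X]


class NearestCentroidSketch:
    """Hypothetical estimator: predicts the class with the closest centroid."""

    def fit(self, X, y):
        grouped = {}
        for row, label in zip(X, y):
            grouped.setdefault(label, []).append(row)
        self.centroids_ = {label: [sum(c) / len(rows) for c in zip(*rows)]
                           for label, rows in grouped.items()}
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return [min(self.centroids_,
                    key=lambda lab: sq_dist(row, self.centroids_[lab]))
                for row in X]


class PipelineSketch:
    """Chains transformers and a final estimator, as in the diagram."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def fit(self, X, y):
        # Step 1: fit and transform with each transformer, then fit the model.
        for t in self.transformers:
            X = t.fit(X, y).transform(X)
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        # Step 2: only transform (no refitting), then predict.
        for t in self.transformers:
            X = t.transform(X)
        return self.estimator.predict(X)


X_train = [[1.0, 10.0, 5.0], [2.0, 12.0, 1.0], [8.0, 30.0, 4.0], [9.0, 33.0, 2.0]]
y_train = [0, 0, 1, 1]
X_test = [[1.5, 11.0, 3.0], [8.5, 31.0, 3.0]]

pipe = PipelineSketch([MinMaxScalerSketch(), KeepFirstColumnsSketch(k=2)],
                      NearestCentroidSketch())
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
print(y_pred)  # [0, 1]
```

The key point is that `predict` never calls `fit` on the preprocessing steps, so the test set is transformed with parameters learned from the training set only.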
Model Selection: Simple Holdout Method

[Figure: the original dataset is split into a training set, a validation set, and a test set. A machine learning algorithm is fit on the training set to produce a predictive model, which is evaluated on the validation set; the hyperparameters are changed and the procedure is repeated. The final performance estimate is obtained by evaluating on the test set.]
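A minimal sketch of holdout-based model selection follows, using a tiny k-nearest-neighbor classifier written in plain Python so that the hyperparameter k can be tuned. The synthetic data, the split sizes, the candidate values of k, and the choice to refit on the combined training and validation sets before the final test evaluation are all assumptions for this example.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def accuracy(train, evaluation, k):
    """Fraction of evaluation examples whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in evaluation)
    return hits / len(evaluation)


# Synthetic, hypothetical data: two overlapping 1-D classes.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(60)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(60)]
rng.shuffle(data)

# Split the original dataset into training, validation, and test sets.
train, valid, test = data[:60], data[60:90], data[90:]

# Fit on the training set, evaluate on the validation set, change k, repeat.
candidate_ks = [1, 3, 5, 7, 9]  # odd values avoid ties with two classes
valid_acc = {k: accuracy(train, valid, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: valid_acc[k])

# Final performance estimate: the test set is used once, after k is chosen.
# Assumption: the final model uses training and validation data together.
test_acc = accuracy(train + valid, test, best_k)

print("validation accuracy per k:", valid_acc)
print("selected k:", best_k, "test accuracy:", round(test_acc, 3))
```

Keeping the test set out of the selection loop is what makes the final number an estimate of generalization performance rather than a by-product of tuning.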
Reading Assignments
• Python Machine Learning, 2nd ed.: