Big Data Analytics: Optimization and Randomization

Tianbao Yang (Department of Computer Science, The University of Iowa, IA, USA)
Qihang Lin (Department of Management Sciences, The University of Iowa, IA, USA)
Rong Jin (Department of Computer Science and Engineering, Michigan State University, MI, USA; Institute of Data Science and Technologies at Alibaba Group, Seattle, USA)

Tutorial @ SIGKDD 2015, Sydney, Australia, August 10, 2015
URL: http://www.cs.uiowa.edu/˜tyng/kdd15-tutorial.pdf

Some Claims
No:
  This tutorial is not an exhaustive literature survey.
  It is not a survey of different machine learning/data mining algorithms.
Yes:
  It is about how to efficiently solve machine learning/data mining problems (formulated as optimization) for big data.

Outline
Part I: Basics
Part II: Optimization
Part III: Randomization

Part I: Basics

Outline
1 Basics
  Introduction
  Notations and Definitions

Three Steps for Machine Learning
Data, Model, Optimization
[Figure: distance to the optimal objective versus iterations, comparing the convergence rates $0.5^T$, $1/T^2$, and $1/T$]

Big Data Challenge
Big Data
Big Model: 60 million parameters

Learning as Optimization
Ridge Regression Problem:
$$\min_{w \in \mathbb{R}^d} \; \underbrace{\frac{1}{n} \sum_{i=1}^{n} (y_i - w^\top x_i)^2}_{\text{Empirical Loss}} + \underbrace{\frac{\lambda}{2} \|w\|_2^2}_{\text{Regularization}}$$
$x_i \in \mathbb{R}^d$: $d$-dimensional feature vector
$y_i \in \mathbb{R}$: target variable
$w \in \mathbb{R}^d$: model parameters
$n$: number of data points

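The ridge regression objective above has a closed-form minimizer, which the following NumPy sketch computes. It is an illustration under stated assumptions rather than code from the tutorial: the function names `ridge_objective` and `ridge_solve`, the synthetic data, and the value `lam=0.1` are all made up for the example. Setting the gradient of the objective to zero gives the linear system $(\frac{2}{n} X^\top X + \lambda I)\, w = \frac{2}{n} X^\top y$.

```python
import numpy as np

def ridge_objective(w, X, y, lam):
    """(1/n) * sum_i (y_i - w^T x_i)^2 + (lam/2) * ||w||_2^2."""
    residual = y - X @ w
    return residual @ residual / len(y) + 0.5 * lam * (w @ w)

def ridge_solve(X, y, lam):
    """Minimizer of ridge_objective, obtained by setting its gradient to zero:
    ((2/n) X^T X + lam * I) w = (2/n) X^T y."""
    n, d = X.shape
    A = (2.0 / n) * (X.T @ X) + lam * np.eye(d)
    b = (2.0 / n) * (X.T @ y)
    return np.linalg.solve(A, b)

# Synthetic data (hypothetical): n = 200 points, d = 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.arange(1.0, 6.0) + 0.1 * rng.normal(size=200)

w_hat = ridge_solve(X, y, lam=0.1)
```

Forming and solving the $d \times d$ system costs on the order of $nd^2 + d^3$ operations, which is one reason the tutorial turns to iterative optimization when $n$ and $d$ are large.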
Learning as Optimization
Classification Problems:
$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \ell(y_i w^\top x_i) + \frac{\lambda}{2} \|w\|_2^2$$
$y_i \in \{+1, -1\}$: label
Loss function $\ell(z)$, with $z = y w^\top x$:
1. SVMs: (squared) hinge loss $\ell(z) = \max(0, 1 - z)^p$, where $p = 1, 2$
2. Logistic Regression: $\ell(z) = \log(1 + \exp(-z))$

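The two classification losses can be written directly as functions of the margin $z = y w^\top x$. The sketch below is only an illustration: the names `hinge_loss`, `logistic_loss`, and `regularized_risk` are assumptions, not identifiers from the tutorial.

```python
import numpy as np

def hinge_loss(z, p=1):
    """(Squared) hinge loss max(0, 1 - z)^p, for p = 1 or p = 2."""
    return np.maximum(0.0, 1.0 - z) ** p

def logistic_loss(z):
    """log(1 + exp(-z)), evaluated stably with logaddexp."""
    return np.logaddexp(0.0, -z)

def regularized_risk(w, X, y, loss, lam):
    """(1/n) * sum_i loss(y_i * w^T x_i) + (lam/2) * ||w||_2^2."""
    z = y * (X @ w)
    return loss(z).mean() + 0.5 * lam * (w @ w)
```

Using `np.logaddexp` avoids overflow in `exp(-z)` for large negative margins.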
Learning as Optimization
Feature Selection:
$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \ell(w^\top x_i, y_i) + \lambda \|w\|_1$$
$\ell_1$ regularization: $\|w\|_1 = \sum_{i=1}^{d} |w_i|$
$\lambda$ controls sparsity level

Learning as Optimization
Feature Selection using Elastic Net:
$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \ell(w^\top x_i, y_i) + \lambda \left( \|w\|_1 + \gamma \|w\|_2^2 \right)$$
Elastic net regularizer, more robust than $\ell_1$ regularizer

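For concreteness, the two feature-selection regularizers can be evaluated as follows. This is a small sketch with hypothetical function names (`l1_penalty`, `elastic_net_penalty`). It only computes the penalty terms and does not solve the optimization problem.

```python
import numpy as np

def l1_penalty(w, lam):
    """lam * ||w||_1."""
    return lam * np.abs(w).sum()

def elastic_net_penalty(w, lam, gamma):
    """lam * (||w||_1 + gamma * ||w||_2^2)."""
    return lam * (np.abs(w).sum() + gamma * (w @ w))

w = np.array([3.0, -4.0, 0.0])
```

With `gamma = 0` the elastic net penalty reduces to the $\ell_1$ penalty.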
Learning as Optimization
Multi-class/Multi-task Learning:
$$\min_{W} \; \frac{1}{n} \sum_{i=1}^{n} \ell(W x_i, y_i) + \lambda\, r(W)$$
$W \in \mathbb{R}^{K \times d}$
$r(W) = \|W\|_F^2 = \sum_{k=1}^{K} \sum_{j=1}^{d} W_{kj}^2$: Frobenius norm
$r(W) = \|W\|_* = \sum_i \sigma_i$: nuclear norm (sum of singular values)
$r(W) = \|W\|_{1,\infty} = \sum_{j=1}^{d} \|W_{:j}\|_\infty$: $\ell_{1,\infty}$ mixed norm

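The three matrix regularizers listed above can be computed with NumPy as sketched below. The function names are assumptions for illustration, and the $\ell_{1,\infty}$ norm follows the column-wise definition on the slide.

```python
import numpy as np

def frobenius_sq(W):
    """||W||_F^2: sum of squared entries."""
    return float(np.sum(W ** 2))

def nuclear_norm(W):
    """||W||_*: sum of singular values."""
    return float(np.linalg.svd(W, compute_uv=False).sum())

def l1_inf_norm(W):
    """||W||_{1,inf}: sum over columns j of the largest |W_kj| in column j."""
    return float(np.abs(W).max(axis=0).sum())

W = np.array([[3.0, 0.0],
              [0.0, -4.0]])
```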
Learning as Optimization
Regularized Empirical Loss Minimization
$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \ell(w^\top x_i, y_i) + R(w)$$
Both $\ell$ and $R$ are convex functions
Extensions to matrix cases are possible (sometimes straightforward)
Extensions to kernel methods can be combined with randomized approaches
Extensions to non-convex problems (e.g., deep learning) are in progress

Data Matrices and Machine Learning
The Instance-feature Matrix: $X \in \mathbb{R}^{n \times d}$
$$X = \begin{pmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top \end{pmatrix}$$

The output vector:
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \in \mathbb{R}^{n \times 1}$$
continuous $y_i \in \mathbb{R}$: regression (e.g., house price)
discrete, e.g., $y_i \in \{1, 2, 3\}$: classification (e.g., species of iris)

The Instance-Instance Matrix: $K \in \mathbb{R}^{n \times n}$
Similarity Matrix
Kernel Matrix

Some machine learning tasks are formulated on the kernel matrix:
Clustering
Kernel Methods

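As an illustration of an instance-instance matrix, the sketch below builds two $n \times n$ kernel matrices from an instance-feature matrix $X$. The choice of the linear and Gaussian (RBF) kernels, the `bandwidth` parameter, and the function names are assumptions for the example. The slides do not commit to a specific kernel here.

```python
import numpy as np

def linear_kernel(X):
    """K_ij = x_i^T x_j."""
    return X @ X.T

def rbf_kernel(X, bandwidth=1.0):
    """K_ij = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # n = 6 instances, d = 3 features
K = rbf_kernel(X)
```

The matrix has $n^2$ entries regardless of $d$, so storing it exactly becomes a memory problem as $n$ grows.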
Data Matrices and Machine Learning
The Feature-Feature Matrix: $C \in \mathbb{R}^{d \times d}$
Covariance Matrix
Distance Metric Matrix

Some machine learning tasks require the covariance matrix:
Principal Component Analysis
Top-k Singular Value (Eigen-Value) Decomposition of the Covariance Matrix

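Principal component analysis via the top-$k$ eigen-decomposition of the covariance matrix can be sketched as follows. The function name `pca_top_k`, the $1/n$ normalization of the covariance, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pca_top_k(X, k):
    """Top-k eigenvalues and eigenvectors of the d x d covariance matrix of X."""
    Xc = X - X.mean(axis=0)                 # center each feature
    C = (Xc.T @ Xc) / X.shape[0]            # covariance matrix, d x d
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
vals, vecs = pca_top_k(X, k=2)
```

The top eigenvalues of the covariance matrix equal the squared top singular values of the centered data matrix divided by $n$, which is why the slide mentions singular value and eigen-value decomposition together.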
Why Is Learning from Big Data Challenging?
High per-iteration cost
High memory cost
High communication cost
Large iteration complexity

Notations and Definitions

Norms
Vector $x \in \mathbb{R}^d$
Euclidean vector norm: $\|x\|_2 = \sqrt{x^\top x} = \sqrt{\sum_{i=1}^{d} x_i^2}$
$\ell_p$-norm of a vector: $\|x\|_p = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}$, where $p \ge 1$
1. $\ell_2$ norm: $\|x\|_2 = \sqrt{\sum_{i=1}^{d} x_i^2}$
2. $\ell_1$ norm: $\|x\|_1 = \sum_{i=1}^{d} |x_i|$
3. $\ell_\infty$ norm: $\|x\|_\infty = \max_i |x_i|$

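These vector norms map directly onto `numpy.linalg.norm`. The short sketch below uses a hypothetical helper `p_norm` that implements the general $\ell_p$ formula for comparison.

```python
import numpy as np

def p_norm(x, p):
    """(sum_i |x_i|^p)^(1/p) for p >= 1."""
    return (np.abs(x) ** p).sum() ** (1.0 / p)

x = np.array([3.0, -4.0])
l2 = np.linalg.norm(x, 2)         # Euclidean norm
l1 = np.linalg.norm(x, 1)         # sum of absolute values
linf = np.linalg.norm(x, np.inf)  # largest absolute value
```

As $p$ grows, `p_norm(x, p)` approaches the $\ell_\infty$ norm.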
Matrix Factorization
SVD: $X = U \Sigma V^\top$
$X_k = U_k \Sigma_k V_k^\top$: top-$k$ approximation
Pseudo inverse: $X^\dagger = V \Sigma^{-1} U^\top$
QR factorization: $X = QR$ ($n \ge d$)
$Q \in \mathbb{R}^{n \times d}$: orthonormal columns
$R \in \mathbb{R}^{d \times d}$: upper triangular matrix

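The factorizations on this slide are available in `numpy.linalg`. The sketch below assumes a random full-column-rank matrix with $n \ge d$. The data and the choice `k = 2` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # n = 8 >= d = 4

# Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Top-k approximation keeps the k largest singular triplets.
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Pseudo inverse from the SVD (valid here because all singular values are nonzero).
X_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Reduced QR: Q is n x d with orthonormal columns, R is d x d upper triangular.
Q, R = np.linalg.qr(X)
```

The spectral-norm error of the top-$k$ approximation equals the $(k+1)$-th singular value, which gives a quick sanity check.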
Norms
Matrix $X \in \mathbb{R}^{n \times d}$
Frobenius norm: $\|X\|_F = \sqrt{\mathrm{tr}(X^\top X)} = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{d} X_{ij}^2}$
Spectral (induced) norm of a matrix: $\|X\|_2 = \max_{\|u\|_2 = 1} \|Xu\|_2$
$\|X\|_2 = \sigma_1$ (maximum singular value)

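Both matrix norm identities can be checked numerically. The following is a small sketch on a random matrix. The data and variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))

fro = np.linalg.norm(X, 'fro')                   # Frobenius norm
fro_trace = np.sqrt(np.trace(X.T @ X))           # sqrt(tr(X^T X))

spec = np.linalg.norm(X, 2)                      # spectral norm
sigma_1 = np.linalg.svd(X, compute_uv=False)[0]  # maximum singular value

# ||X u||_2 never exceeds the spectral norm for unit vectors u.
U_rand = rng.normal(size=(3, 100))
U_rand /= np.linalg.norm(U_rand, axis=0)
max_sampled = np.linalg.norm(X @ U_rand, axis=0).max()
```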
Convex Optimization
$$\min_{x \in \mathcal{X}} f(x)$$
$\mathcal{X}$ is a convex domain: for any $x, y \in \mathcal{X}$, their convex combination $\alpha x + (1 - \alpha) y \in \mathcal{X}$
$f(x)$ is a convex function

Convex Function
$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y)$, $\forall x, y \in \mathcal{X}$, $\alpha \in [0, 1]$
$f(x) \ge f(y) + \nabla f(y)^\top (x - y)$, $\forall x, y \in \mathcal{X}$
local optimum is global optimum

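Both characterizations of convexity can be spot-checked numerically on a function known to be convex. The sketch below uses a sum of logistic losses as the test function. That choice, the random sampling scheme, and the tolerance `1e-12` are assumptions for illustration, and passing such a check is evidence of convexity, not a proof.

```python
import numpy as np

def f(x):
    """Sum of logistic losses log(1 + exp(-x_i)); a convex function."""
    return np.logaddexp(0.0, -x).sum()

def grad_f(x):
    """Gradient of f: d/dx log(1 + exp(-x)) = -1 / (1 + exp(x))."""
    return -1.0 / (1.0 + np.exp(x))

rng = np.random.default_rng(0)
violations = 0
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    alpha = rng.uniform()
    # Definition via convex combinations.
    zeroth = f(alpha * x + (1 - alpha) * y) <= alpha * f(x) + (1 - alpha) * f(y) + 1e-12
    # First-order condition: f lies above its tangent plane at y.
    first = f(x) >= f(y) + grad_f(y) @ (x - y) - 1e-12
    violations += not (zeroth and first)
```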
Convex vs Strongly Convex
Convex function:
$$f(x) \ge f(y) + \nabla f(y)^\top (x - y), \quad \forall x, y \in \mathcal{X}$$
Strongly convex function:
$$f(x) \ge f(y) + \nabla f(y)^\top (x - y) + \frac{\lambda}{2} \|x - y\|_2^2, \quad \forall x, y \in \mathcal{X}$$
$\lambda$: strong convexity constant
Global optimum is unique

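As a worked check of the strong convexity inequality, consider the regularizer $\frac{\lambda}{2}\|x\|_2^2$ from the ridge regression problem. The choice of this function as the example is ours. It satisfies the inequality with equality:

```latex
\begin{align*}
f(x) &= \tfrac{\lambda}{2}\|x\|_2^2, \qquad \nabla f(y) = \lambda y, \\
f(x) - f(y) - \nabla f(y)^\top (x - y)
  &= \tfrac{\lambda}{2}\|x\|_2^2 - \tfrac{\lambda}{2}\|y\|_2^2
     - \lambda y^\top x + \lambda \|y\|_2^2 \\
  &= \tfrac{\lambda}{2}\left( \|x\|_2^2 - 2 y^\top x + \|y\|_2^2 \right) \\
  &= \tfrac{\lambda}{2}\|x - y\|_2^2 .
\end{align*}
```

Adding a convex function to a $\lambda$-strongly convex one leaves the sum $\lambda$-strongly convex, so a convex empirical loss plus this regularizer is strongly convex with constant at least $\lambda$.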
Non-smooth function vs Smooth function
Non-smooth function
  Lipschitz continuous with some Lipschitz constant, e.g. the absolute loss $|x|$
  Subgradient: $f(x) \ge f(y) + \partial f(y)^\top (x - y)$
  [Figure: the non-smooth function $|x|$ with a sub-gradient]
Smooth function
  e.g. the logistic loss $\log(1 + \exp(-x))$, with some smoothness constant
  [Figure: $f(x) = \log(1 + \exp(-x))$ with its linear approximation $f(y) + f'(y)(x - y)$ at $y$ and a quadratic function]
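The subgradient inequality for $|x|$ and the linear and quadratic bounds for the logistic loss can be checked on a grid. In this sketch the function names, the grids, and the tolerance are assumptions. The smoothness constant `L = 0.25` is the maximum of the second derivative of $\log(1 + \exp(-x))$, and `np.sign` returns 0 at the kink, which is one valid subgradient of $|x|$ there (any value in $[-1, 1]$ would do).

```python
import numpy as np

def abs_subgradient(y):
    """A subgradient of |y|: sign(y), with 0 chosen at y = 0."""
    return np.sign(y)

def logistic(x):
    return np.logaddexp(0.0, -x)

def logistic_grad(x):
    return -1.0 / (1.0 + np.exp(x))

xs = np.linspace(-5.0, 5.0, 201)   # evaluation points x
ys = np.linspace(-5.0, 5.0, 41)    # expansion points y
tol = 1e-12

# Non-smooth case: |x| >= |y| + g * (x - y) for a subgradient g at y.
sub_ok = all(np.all(np.abs(xs) >= np.abs(y) + abs_subgradient(y) * (xs - y) - tol)
             for y in ys)

# Smooth case: the tangent line is a lower bound and the quadratic
# f(y) + f'(y)(x - y) + (L/2)(x - y)^2 is an upper bound.
L = 0.25
lower_ok = all(np.all(logistic(xs) >= logistic(y) + logistic_grad(y) * (xs - y) - tol)
               for y in ys)
upper_ok = all(np.all(logistic(xs) <= logistic(y) + logistic_grad(y) * (xs - y)
                      + 0.5 * L * (xs - y) ** 2 + tol)
               for y in ys)
```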