How to study statistics for data science?
Study inferential statistics: Once you've learnt descriptive statistics and probability, move on to inferential statistics.
Start with hypothesis testing, including t-tests and ANOVA, and then progress to regression analysis, including simple linear regression and multiple regression.
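As a concrete starting point for the regression step, here is a minimal sketch of simple linear regression fitted by ordinary least squares using only the Python standard library. The function name `simple_linear_regression` and the hours/scores data are made up for illustration.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = simple_linear_regression(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

Multiple regression extends the same idea to several predictors and is usually fitted with a numerical library rather than by hand.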
Statistical tools in quantitative research
Descriptive and Inferential Statistics
The two major areas of statistics are descriptive statistics, which describes the properties of sample and population data, and inferential statistics, which uses those properties to test hypotheses and draw conclusions.
Statistical tools in research
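Choose a significance level, then perform a statistical test that suits the data.
Check the resulting p-value.
If the p-value is smaller than the significance level, reject the null hypothesis in favor of the alternative hypothesis.
If the p-value is greater than or equal to the significance level, fail to reject the null hypothesis. This does not prove the null hypothesis is true; it only means the data do not provide enough evidence against it.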
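The decision rule above can be sketched in code. This example uses an exact two-sided permutation test for a difference in means rather than a t-test, because it needs only the standard library. The function name `permutation_p_value`, the two samples, and the significance level of 0.05 are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means.

    Enumerates every way of splitting the pooled data into groups of the
    original sizes and counts how often the absolute mean difference is
    at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits

alpha = 0.05                       # significance level (assumed)
group_a = [12, 14, 15, 16, 18]     # hypothetical sample A
group_b = [20, 21, 23, 24, 26]     # hypothetical sample B

p = permutation_p_value(group_a, group_b)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"p-value = {p:.4f}: {decision}")
```

The enumeration grows quickly with sample size, so this exact form is only practical for small samples; larger ones are usually handled with random resampling or a parametric test.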
What are the commonly used statistical tests in data science?
There are various statistical tests that can be used, depending on the type of data being analyzed.
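Some of the most common are t-tests (comparing the means of two groups), chi-squared tests (testing for association between categorical variables), and ANOVA (comparing the means of three or more groups).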
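To illustrate one of these tests, the sketch below computes a chi-squared test of independence for a 2x2 contingency table by hand, without a continuity correction. The function name `chi_squared_2x2` and the counts are hypothetical. With one degree of freedom, the p-value can be obtained from the complementary error function, so no external library is needed.

```python
import math

def chi_squared_2x2(table):
    """Chi-squared test of independence for a 2x2 table of observed counts.

    Returns (chi2, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = group, columns = outcome (yes, no)
table = [[30, 10],
         [20, 40]]

chi2, p_value = chi_squared_2x2(table)
print(f"chi2 = {chi2:.3f}, p-value = {p_value:.6f}")
```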
What are the statistical methods used in data science?
Two main statistical methods are used in data analysis: descriptive statistics, which summarizes data using measures such as the mean and median, and inferential statistics, which draws conclusions from data using statistical tests such as Student's t-test.
What type of statistics is needed for data science?
It shouldn't be a surprise that data scientists need to know statistics.
For example, data analysis requires descriptive statistics and probability theory, at a minimum.
These concepts will help you make better business decisions from data.
- Central tendency (measures of the center): mean (average of all values), median (middle value of the sorted data set), and mode (the most frequent value in the data set).
- Measures of the spread:
- Range: the difference between the largest and smallest values in a data set.
- Variance: the average squared deviation of the values from their mean (expected value).
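These summary measures can be computed with Python's built-in `statistics` module. The data set below is made up for illustration. The range is taken as the maximum minus the minimum, and both the population variance (divide by n) and the sample variance (divide by n - 1) are shown.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical data set

# Central tendency
mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)

# Spread
data_range = max(data) - min(data)                # largest minus smallest value
population_var = statistics.pvariance(data)       # squared deviations divided by n
sample_var = statistics.variance(data)            # squared deviations divided by n - 1

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
print("range:", data_range)
print("population variance:", round(population_var, 3))
print("sample variance:", sample_var)
```

Use the sample variance when the data are a sample drawn from a larger population, which is the usual case in data science work.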