Bayesian Analysis with Python, Chapter 4: Understanding and Predicting Data with Linear Regression Models
Simple Linear Regression
Continuous variable - a variable using real numbers or floats (dependent, predicted, outcome)
Independent variable - can be continuous or categorical (predictor, input)
We can model this relationship with linear regression. With multiple independent variables, we use multiple regression models.
The machine learning connection
Machine learning (ML) is the umbrella term for a collection of methods that automatically learn patterns in data. Regression is a supervised learning problem because we know both the x and y values. The question is how to generalize these observations to any future observation.
The core of linear regression models
Beta is the slope of the line: the change in y per unit change in x.
Alpha is the value of y when x = 0.
When we try to solve this problem, the classical approach is least squares. We can also use a Bayesian framework.
This has several advantages:
- we can obtain the best values of alpha and beta
- we can capture the uncertainty in the estimates of these parameters
The data vector y is assumed to be distributed as a Gaussian with mean alpha + beta x.
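The book fits this model with PyMC3; as a minimal sketch of the generative assumption only, we can simulate data from it and recover alpha and beta with closed-form least squares (all names and values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate y = alpha + beta * x + Gaussian noise
alpha_true, beta_true, eps = 2.5, 0.9, 0.5
x = rng.uniform(0, 10, size=200)
y = alpha_true + beta_true * x + rng.normal(0, eps, size=200)

# Closed-form least-squares estimates (the non-Bayesian baseline)
beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
```

A Bayesian fit would additionally place priors on alpha, beta, and the noise scale, and return full posterior distributions rather than point estimates.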
Linear models and high autocorrelation
By the definition of the model, our two parameters are going to be correlated: the posterior has a very narrow, diagonal shape (see the curse of dimensionality). The fact that the line is constrained to pass through the mean of the data is only true for the least squares method; using Bayesian methods, this constraint is relaxed. We will look at two approaches to fix this.
Modifying the data before running
The simple solution is to center the x data by subtracting the mean of the x values; x prime will then be centered at 0.
Centering the data can also help with interpreting the parameters: after centering, alpha is the value of y at the mean of x, which in some interesting cases is more meaningful than the value at x = 0.
To report the parameters back in the original scale, we transform them.
You can also standardize the data: subtract the mean and then divide by the standard deviation.
Standardizing the data allows us to talk in z-scores: a value of 1.3 in z-score units means 1.3 standard deviations above (or, if negative, below) the mean.
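A sketch of the back-transformation, using plain least squares as a stand-in for the Bayesian fit (the data and helper function are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1.5 + 2.0 * x + rng.normal(0, 1, 100)

def lstsq_fit(x, y):
    """Return (alpha, beta) from ordinary least squares."""
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - beta * x.mean(), beta

# Fit on the centered predictor: x' = x - mean(x)
x_c = x - x.mean()
a_c, b_c = lstsq_fit(x_c, y)
# Back to the original scale: beta is unchanged, alpha shifts
alpha = a_c - b_c * x.mean()
beta = b_c

# Fit on the standardized predictor: z = (x - mean) / sd
x_z = (x - x.mean()) / x.std()
a_z, b_z = lstsq_fit(x_z, y)
# Back to the original scale: rescale beta by sd, then shift alpha
beta2 = b_z / x.std()
alpha2 = a_z - beta2 * x.mean()
```

The same algebra applies to every posterior sample in a Bayesian fit, so transformed parameters keep their full uncertainty.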
Changing the sampling method
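A toy illustration of the problem that motivates changing the sampler: a random-walk Metropolis chain on a strongly correlated 2D Gaussian (a stand-in for the alpha/beta posterior; all numbers are made up) shows high lag-1 autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(11)

# Strongly correlated 2D Gaussian target, mimicking the narrow
# diagonal posterior of an uncentered linear model
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
prec = np.linalg.inv(cov)

def log_p(z):
    return -0.5 * z @ prec @ z

# Random-walk Metropolis sampler
z = np.zeros(2)
lp = log_p(z)
chain = np.empty((20_000, 2))
for i in range(len(chain)):
    prop = z + rng.normal(0, 0.5, 2)
    lp_prop = log_p(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        z, lp = prop, lp_prop
    chain[i] = z

# Lag-1 autocorrelation of the first coordinate: a value near 1
# means the chain crawls slowly along the diagonal ridge
a = chain[5_000:, 0]
lag1 = np.corrcoef(a[:-1], a[1:])[0, 1]
```

Gradient-based samplers like NUTS move through such ridges far more efficiently, which is why fewer steps are needed.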
By changing the sampling method we can alleviate the autocorrelation problem. NUTS can be slower per step, but it usually needs far fewer steps than Metropolis-Hastings.
Interpreting and visualizing the posterior
This section is mostly code and plots. The author gives good ideas on how to analyze the code and the data it produces.
Pearson correlation coefficient
This is a measure of the degree of linear dependence between two variables, often denoted r:
r = +1 == perfect positive linear correlation; when one variable goes up, the other goes up
r = -1 == perfect negative linear correlation; when one variable goes up, the other goes down
r = 0 == no linear correlation
The Pearson correlation coefficient is equal to the slope of the line when the standard deviation of x equals the standard deviation of y.
The coefficient of determination is the Pearson coefficient squared.
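These identities are easy to check numerically (the data below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(0, 0.8, 500)

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
r2 = r ** 2                   # coefficient of determination

# OLS slope equals r scaled by the ratio of standard deviations,
# so slope == r exactly when sd(x) == sd(y)
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
```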
Pearson coefficient from a multivariate Gaussian
A multivariate Gaussian is the generalization of the Gaussian distribution to more than one dimension. For two variables we need a 2x2 covariance matrix.
The main diagonal holds the variances (the squares of the standard deviations); the off-diagonal entries are sigma_x * sigma_y * rho, where rho is the Pearson correlation coefficient.
Since we don't know the values of the covariance matrix, we have to put priors on it. There are several options:
- Wishart Distribution
- LKJ Prior
- use manual priors
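Whichever prior we pick, the mapping from covariance-matrix entries to the Pearson coefficient can be sketched directly (the values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_x, sigma_y, rho = 1.0, 2.0, 0.8

# 2x2 covariance matrix: variances on the diagonal,
# sigma_x * sigma_y * rho off the diagonal
cov = np.array([[sigma_x**2,            sigma_x * sigma_y * rho],
                [sigma_x * sigma_y * rho, sigma_y**2]])

# Sampling from the multivariate Gaussian recovers rho empirically
samples = rng.multivariate_normal([0, 0], cov, size=50_000)
r_empirical = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
```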
Robust linear regression
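Before the details, a minimal sketch of robust regression with a Student's t likelihood, using SciPy maximum likelihood as a stand-in for the book's Bayesian fit (the data, starting values, and the nu = 1 + exp(...) shift are illustrative assumptions):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 50)
y[-3:] += 15.0  # inject a few gross outliers

def neg_loglik(params):
    """Negative log-likelihood of a Student's t regression."""
    alpha, beta, log_sigma, log_nu = params
    resid = y - (alpha + beta * x)
    # Shift nu so it stays above 1, echoing the shifted-exponential
    # prior idea: it keeps the tails from becoming arbitrarily heavy
    nu = 1.0 + np.exp(log_nu)
    return -stats.t.logpdf(resid, df=nu, scale=np.exp(log_sigma)).sum()

res = optimize.minimize(neg_loglik, x0=[0.0, 1.0, 0.0, 0.0],
                        method="Nelder-Mead",
                        options={"maxiter": 5000})
alpha_hat, beta_hat = res.x[:2]
```

An ordinary Gaussian fit would be pulled toward the outliers; the heavy-tailed t likelihood largely ignores them.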
Assuming our data is Gaussian is reasonable in most cases, but because of outliers the Gaussian assumption can fail. The Student's t-distribution can give reasonably robust inference, and these concepts apply to linear regression as well. You may need to use a shifted exponential prior on the normality parameter, as the unshifted one puts too much emphasis on extreme outliers or on data with few bulk points.
Hierarchical linear regression
This section contains code that shows how to do hierarchical linear regression.
Correlation, causation, and the messiness of life
Correlation does not imply causation
When we establish a linear relationship between two variables, the variables can be interchanged. This does not mean that x implies y or that y implies x. To establish that a correlation can be interpreted as causation, we need to add a physical mechanism to the problem.
Polynomial regression
This section discusses the code to fit a polynomial regression.
Interpreting the parameters of a polynomial regression
The beta coefficients are no longer slopes. While these models can be good at prediction, they aren't good for understanding the underlying processes; for polynomials of order two or higher, other models are often a better choice.
Polynomial regression - the ultimate model
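A quick NumPy sketch of the perfect-fit pitfall (the data is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 8)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 8)

# A degree-7 polynomial passes through all 8 points (zero training error)...
coeffs_hi = np.polyfit(x, y, deg=7)
train_err_hi = np.max(np.abs(np.polyval(coeffs_hi, x) - y))

# ...while a straight line leaves residual error but is far easier to
# interpret and typically generalizes better to unobserved data
coeffs_lo = np.polyfit(x, y, deg=1)
train_err_lo = np.max(np.abs(np.polyval(coeffs_lo, x) - y))
```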
It is possible to fit a polynomial to your data perfectly. But a model that fits your data perfectly will in general be a poor description of unobserved data. This is known as overfitting, and it is a problem for both statistics and machine learning. Lines are easier to interpret even if a cubic model fits the data better; a perfect fit does not mean a good model.
Multiple linear regression
So far we have been working with one dependent and one independent variable, but we can also have multiple independent variables. Beta is then a vector of coefficients.
In simple linear regression, we hope to find a straight line that fits our data. In multiple linear regression, we find a hyperplane of dimension m, where m is the number of independent variables.
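A minimal sketch with two predictors, using NumPy least squares in place of the Bayesian fit (all values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = rng.normal(size=(n, 2))           # two independent variables
beta_true = np.array([1.5, -0.7])     # beta is now a vector
alpha_true = 0.3
y = alpha_true + X @ beta_true + rng.normal(0, 0.1, n)

# Augment with a column of ones so the intercept is estimated too
X_aug = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]
```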
Confounding variables and redundant variables
We can sometimes predict y from x when we are really interested in z. When we omit the variable that is actually driving the relationship, the omitted variable is called a confounding variable. It can be left out for many reasons: it wasn't measured, or it was left out of our dataset.
Multicollinearity, or when the correlation is too high
To prove the point, we set two variables almost exactly equal. This model works and allows us to predict data very well, but it may be simpler to leave one variable out. Correlated and highly correlated variables are always possible in any dataset. How to deal with them is the question:
- remove one variable, it doesn't really matter which one
- create a new variable averaging the redundant variables
- use stronger priors
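The redundancy, and the averaging fix, can be sketched numerically (the data is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 1e-3, n)   # almost an exact copy of x1
y = 2.0 * x1 + rng.normal(0, 0.5, n)

# With both near-duplicates in the model, the individual coefficients
# are very poorly determined, but their sum is stable
X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
total = coef.sum()

# Averaging the redundant variables restores a single stable coefficient
x_avg = (x1 + x2) / 2
beta_avg = (x_avg @ y) / (x_avg @ x_avg)
```

In the Bayesian picture the same pathology shows up as a long, thin ridge in the joint posterior of the two coefficients.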