This is a short course in machine learning, aimed at those who are already proficient in the basics of statistical methodology and, in particular, in linear regression.

Some models fit previously seen data very well, but are bad at forecasting unseen data

A model used to predict unseen outputs given observed inputs

Choice of a regularization parameter

Use train-val-test splits to choose the amount of regularization
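
As a minimal sketch of this idea (with synthetic data and an illustrative grid of regularization parameters chosen here for the example), one can fit a ridge regression for several values of the penalty, pick the value that minimizes the validation error, and assess the chosen model once on the held-out test set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: 30 features, only the first 3 are informative.
n, p = 300, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.normal(scale=1.0, size=n)

# Train-validation-test split (60/20/20).
X_tr, y_tr = X[:180], y[:180]
X_va, y_va = X[180:240], y[180:240]
X_te, y_te = X[240:], y[240:]

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^{-1} X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

def mse(X, y, b):
    return float(np.mean((y - X @ b) ** 2))

# Choose the regularization parameter on the validation set only.
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
val_errors = [mse(X_va, y_va, ridge_fit(X_tr, y_tr, lam)) for lam in lambdas]
best_lam = lambdas[int(np.argmin(val_errors))]

# The test set is used exactly once, to assess the chosen model.
b_best = ridge_fit(X_tr, y_tr, best_lam)
print("best lambda:", best_lam, "test MSE:", round(mse(X_te, y_te, b_best), 3))
```

The key point of the split is that the test error is an honest estimate of out-of-sample performance precisely because the test set played no role in choosing the penalty.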

How to split the data in order to test and validate predictive models

A generalization of the algorithm used in boosted linear regressions

An algorithm to train high-dimensional linear regression models without overfitting

A gradient-boosted model where the base learners are decision trees
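
A hedged, self-contained sketch of the mechanism (not any particular library's implementation): with squared-error loss, the negative gradient is just the residual, so gradient boosting amounts to repeatedly fitting a small tree (here a depth-1 "stump") to the current residuals and adding a damped version of its predictions. The data, learning rate, and number of rounds below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D regression problem with a step-like target.
x = rng.uniform(-3, 3, size=200)
y = np.where(x > 0, 2.0, -1.0) + rng.normal(scale=0.3, size=200)

def fit_stump(x, r):
    """Depth-1 regression tree: best single threshold split for residuals r."""
    best = None
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lval, rval = best
    return lambda z, t=t, lval=lval, rval=rval: np.where(z <= t, lval, rval)

# Gradient boosting for squared error: fit stumps to residuals, add them up.
learning_rate, n_rounds = 0.3, 50
pred = np.full_like(y, y.mean())
for _ in range(n_rounds):
    residual = y - pred          # negative gradient of the squared-error loss
    stump = fit_stump(x, residual)
    pred += learning_rate * stump(x)

print("training MSE after boosting:", round(float(np.mean((y - pred) ** 2)), 4))
```

Each stump alone is a weak learner; the boosted sum of many damped stumps can fit the step function closely.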

A predictive model built by performing sample splits based on the input values

A classification model in which the scores are obtained by boosting

A method to validate and test predictive models that makes efficient use of the data
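
One such method is K-fold cross-validation: instead of reserving a single validation set, the sample is partitioned into K folds, and each fold takes a turn as the validation set while the remaining folds are used for estimation. A minimal sketch, again with synthetic data and a ridge penalty chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic linear-regression data.
n, p = 120, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

def cv_mse(X, y, lam, k=5):
    """K-fold cross-validated MSE of a ridge fit with penalty lam."""
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for fold in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False          # fit on the other k-1 folds
        b = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(X.shape[1]),
                            X[mask].T @ y[mask])
        errors.append(np.mean((y[fold] - X[fold] @ b) ** 2))
    return float(np.mean(errors))

lambdas = [0.01, 0.1, 1.0, 10.0]
scores = {lam: cv_mse(X, y, lam) for lam in lambdas}
best = min(scores, key=scores.get)
print("cross-validated MSEs:", scores, "-> chosen lambda:", best)
```

Every observation is used both for fitting and for validation, which is what makes cross-validation more data-efficient than a single train-validation split.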

It is advantageous to average the predictions from many different models
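
A small sketch of why (with synthetic data and bootstrap resampling as one illustrative way of producing many models): for squared-error loss, the averaged prediction can never do worse than the average of the individual models' errors, by convexity of the square.

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear data; fit many OLS models on bootstrap resamples of the training set.
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(scale=2.0, size=n)
X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]

preds = []
for _ in range(25):
    idx = rng.integers(0, len(y_tr), size=len(y_tr))   # bootstrap sample
    b = np.linalg.lstsq(X_tr[idx], y_tr[idx], rcond=None)[0]
    preds.append(X_te @ b)
preds = np.array(preds)

individual_mse = np.mean((preds - y_te) ** 2, axis=1)
ensemble_mse = float(np.mean((preds.mean(axis=0) - y_te) ** 2))

# Jensen's inequality: (average error)^2 <= average of squared errors,
# pointwise on the test set, hence also on average.
print("mean individual MSE:", round(float(individual_mse.mean()), 3))
print("ensemble MSE:", round(ensemble_mse, 3))
```

The gain is largest when the individual models make partly uncorrelated errors, which is the rationale behind bagging and model ensembles generally.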

What to do when production data does not come from the same distribution as the learning data
