Machine learning basics

Albert
4 min read · Jan 8, 2020

Machine learning and AI are often used interchangeably, but they are not exactly the same thing. Machine learning is a subset of artificial intelligence. In my previous article on artificial intelligence, I mentioned that machine learning is the backbone of AI, just as our capacity to learn serves as the backbone of human intelligence, among other cognitive abilities.

Machine learning refers to algorithms that computers use to learn from data, allowing them to make predictions on future, unseen data. Development began in the 1950s, but the field didn’t really thrive until the late 2000s. Computers before then were not powerful enough and didn’t have access to the huge amounts of data that machine learning algorithms require. Things improved in the late 2000s thanks to an explosion of extremely large data sets (known as big data) available for training models, along with fast processors that can run the algorithms.

Machine learning algorithms can be classified into three main categories: supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning

This group of algorithms uses labeled data to train models. Learning is performed on input-output training samples, where each input is paired with its known output (the label). The labeled data set serves as the basis for prediction: a learning algorithm trains a model on it, and the model then generates predictions for new data or a test data set.

Two popular techniques that supervised learning algorithms use to develop predictive models are classification and regression. These techniques also give the two kinds of supervised learning problems their names: classification problems and regression problems.

Classification problem

A classification problem is one where the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”. It’s useful for predicting a qualitative response by analyzing data and recognizing patterns. The output falls within a finite set of possible outcomes: for example, “spam” or “not spam” when filtering emails, or “fraudulent” or “authorized” when looking at transaction data. In short, classification builds a model from a training set whose class labels are known, then uses that model to assign class labels to new data. Where there are only two possible outcomes, we refer to it as binary classification. Where there are more than two, we call it multi-class classification.

Tip: with classification techniques, the output variable is qualitative, that is, it takes class labels.
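To make the idea concrete, here is a minimal sketch of a binary classifier using k-nearest neighbors, one of the algorithms listed later in this article. The toy email data and the two features (link count and exclamation-mark count) are made up purely for illustration, and the function name `knn_predict` is my own.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a class label by majority vote among the k nearest training points."""
    # Sort the labeled training samples by Euclidean distance to the query.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Keep the labels of the k closest samples and return the most common one.
    top_labels = [label for _, label in neighbors[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical toy data: (number of links, number of exclamation marks) per email.
train_X = [(0, 0), (1, 0), (0, 1), (7, 5), (8, 6), (6, 7)]
train_y = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

prediction_a = knn_predict(train_X, train_y, (7, 6))  # close to the "spam" examples
prediction_b = knn_predict(train_X, train_y, (1, 1))  # close to the "not spam" examples
print(prediction_a, prediction_b)
```

Note that the output is always one of the labels seen in training, which is exactly what makes this a classification problem.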

Regression problem

A regression problem is one where the output variable is a real or continuous value, such as “salary” or “weight”. Regression is useful for predicting, forecasting, and finding relationships between quantitative data. The output is a quantity determined by the inputs of the model rather than being confined to a set of possible labels. For example, the price of a house, which depends on inputs such as its size and location, can be any numerical value. Because a regression model predicts a quantity, the skill of the model must be reported as an error in those predictions (for example, the root mean squared error).

Tip: with regression techniques, the output variable is quantitative, that is, it takes continuous values.
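As a rough sketch of regression, the snippet below fits a straight line to a handful of house sizes and prices using ordinary least squares, then reports the model's skill as a root mean squared error (RMSE). The numbers are invented for illustration, and using a single input feature is a simplifying assumption.

```python
import math

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Hypothetical data: house size in square meters and price in thousands.
sizes = [50, 70, 90, 110, 130]
prices = [150, 200, 260, 310, 360]

slope, intercept = fit_line(sizes, prices)
predicted = [slope * s + intercept for s in sizes]
error = rmse(prices, predicted)

# Predict the price of an unseen 100 square meter house.
new_price = slope * 100 + intercept
print(slope, intercept, error, new_price)
```

Unlike the classifier, the prediction here can be any number, not just a value that appeared in the training data.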

Popular supervised learning algorithms include:

  • Linear regression
  • Logistic regression
  • Support vector machine
  • Random forest
  • Naive Bayes
  • K-nearest neighbor
  • Neural networks

Unsupervised learning

Unsupervised learning algorithms find structure in the data through common elements, similar attributes, naturally occurring trends, patterns, or relationships. Labels for the data instances or other forms of guidance for training are not necessary, so there is no labeled training data set with this approach. Because none of the data is presorted or classified beforehand, the learning task is often more complex and the processing more time-intensive. Unsupervised learning is a good fit for applications where data is cheap to obtain but labels are either expensive or not available.

Two popular unsupervised learning techniques are clustering and principal component analysis (PCA).

Clustering

Clustering, or cluster analysis, is used to find commonalities between data elements that are otherwise unlabeled. The goal of clustering is to find distinct groups or “clusters” within a data set. Using a machine learning algorithm, the model creates groups such that items in the same group generally have similar characteristics to each other.
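Here is a small sketch of k-means, one common clustering algorithm, written in plain Python. The points and the starting centroids are hypothetical, and picking the starting centroids by hand (rather than randomly) is an assumption made to keep the example reproducible.

```python
import math

def assign(points, centroids):
    """Group each point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, max_iterations=100):
    """Alternate between assigning points and moving centroids until stable."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical unlabeled data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

No labels are involved at any point: the groups emerge only from how close the points are to each other.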

Principal component analysis

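Principal component analysis (PCA) summarizes a large set of variables by reducing it to a smaller set of representative variables, called “principal components”. Each principal component is a linear combination of the original variables, chosen so that the first few components capture as much of the variation in the data as possible. The objective of this type of analysis is to identify patterns in data and express their similarities and differences through the correlations among the variables.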
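The sketch below finds only the first principal component of a tiny two-variable data set. It uses power iteration on the covariance matrix, which is one simple way to get the leading direction; real libraries typically use other numerical methods. The data is made up, and the code assumes the data is not constant (otherwise the covariance matrix would be all zeros).

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the leading principal direction and each row's score along it."""
    n = len(rows)
    dims = len(rows[0])
    # Center each variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix of the centered variables.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the direction to get one number per row.
    scores = [sum(c[j] * v[j] for j in range(dims)) for c in centered]
    return v, scores

# Hypothetical data: two strongly correlated variables.
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, scores = first_principal_component(rows)
print(direction)
print(scores)
```

Because the two variables rise together, a single score per row keeps most of the information that originally took two numbers to express.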

Popular unsupervised learning algorithms include:

  • Hierarchical and density based clustering
  • K-means clustering
  • Complete link
  • Average link
  • Ward clustering
  • Mixture models
  • Neural networks

Reinforcement learning

Reinforcement learning algorithms find ways of improving performance through trial-and-error (reinforcement). The machine is trained to make specific decisions by being exposed to an environment where it trains itself continually using trial and error. It consists of cycles in which a learning agent is presented with an input describing the current environmental state, responds with an action and receives some reward as an indication of the value of its action.

The machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions. Reinforcement learning is an important model of the behavior of humans and robots, and it has been applied to areas such as autonomous robot control, computer games, and marketing strategy optimization. Reinforcement learning systems often use supervised and unsupervised learning methods, such as regression, classification, and clustering, as building blocks.
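The state, action, and reward cycle described above can be sketched with tabular Q-learning, one well-known reinforcement learning algorithm that this article does not otherwise cover. The environment is a hypothetical five-cell corridor where the agent starts at the left end and is rewarded only on reaching the right end. The hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative choices, not recommendations.

```python
import random

N_STATES = 5        # cells 0..4 in a corridor; cell 4 is the goal
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            # Explore at random with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy policy for every non-goal cell.
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

Nobody tells the agent that moving right is correct. It discovers that on its own because actions leading toward the reward end up with higher estimated values.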
