Machine Learning for Modern Developers
C. Aaron Cois, PhD
Wanna chat?
@aaroncois
www.codehenge.net
github.com/cacois
Let’s talk about Machine Learning
The Expectation
The Sales Pitch
The Reaction
My Customers
The Definition
“Field of study that gives computers the ability
to learn without being explicitly programmed”
~ Arthur Samuel, 1959
That sounds like Artificial Intelligence
True
Machine Learning is a branch of Artificial Intelligence
ML focuses on systems that learn from data
Many AI systems are simply programmed to do one task really well, such as playing Checkers. That’s a solved problem; no learning required.
Isn’t that how Skynet starts?
Ya, probably
But it’s also how we do this…
…and this…
…and this
Isn’t this just statistics?
Machine Learning can take statistical analyses and make them automated and adaptive
Statistical and numerical methods are Machine Learning’s hammer
Supervised vs. Unsupervised
Supervised = System trained on human-labeled data (desired output known)
Unsupervised = System operates on unlabeled data (desired output unknown)
Supervised learning is all about generalizing a function or mapping between inputs and outputs
Supervised Learning Example: Complementary Colors
Training Data: pairs of color swatches, e.g. f(red) = green, f(violet) = yellow, …
Test Data: single color swatches to classify, e.g. red, green, …
Let’s Talk Data
Supervised Learning Example: Complementary Colors
input,output
red,green
violet,yellow
blue,orange
orange,blue
…
training_data.csv
red
green
yellow
orange
blue
…
test_data.csv
(First line indicates data fields)
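As a quick sketch of reading this layout (assuming the training_data.csv shown above), Python’s csv module can use that first line as field names:

import csv

# Read labeled training pairs; the header row names the fields
with open('training_data.csv') as f:
    reader = csv.DictReader(f)
    training_pairs = [(row['input'], row['output']) for row in reader]

print(training_pairs[:2])  # [('red', 'green'), ('violet', 'yellow')]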
Feature Vectors
A data point is represented by a feature vector
Ninja Turtle = [name, weapon, mask_color]
data point 1 = [michelangelo,nunchaku,orange]
data point 2 = [leonardo,katana,blue]
…
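A minimal sketch of the same idea in Python (names and values taken from the slide):

# Each data point is a feature vector: [name, weapon, mask_color]
feature_names = ['name', 'weapon', 'mask_color']
data_point_1 = ['michelangelo', 'nunchaku', 'orange']
data_point_2 = ['leonardo', 'katana', 'blue']

# Pairing names with values makes each feature easy to look up
print(dict(zip(feature_names, data_point_1)))
# {'name': 'michelangelo', 'weapon': 'nunchaku', 'mask_color': 'orange'}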
Feature Space
Feature vectors define a point in an n-dimensional feature space
[Scatter plot: a single point at (x, y) = (1.0, 0.5)]
If my feature vectors contain only 2 values, this defines a point in 2-D space: (x, y) = (1.0, 0.5)
High-Dimensional Feature Spaces
Most feature vectors have much higher dimensionality, such as:
FVlaptop = [name, screen size, weight, battery life, proc, proc speed, ram, price, hard drive, OS]
This means we can’t easily display it visually, but statistics and matrix math work just fine
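A small sketch of that point, with made-up numbers: we can’t plot a 7-dimensional laptop vector, but distance math works the same as in 2-D:

import numpy as np

# Two hypothetical laptops, numeric features only (illustrative values):
# [screen size, weight, battery life, proc speed, ram, price, hard drive]
laptop_a = np.array([13.3, 1.4, 10.0, 2.4, 8, 999, 256])
laptop_b = np.array([15.6, 2.1, 6.0, 2.9, 16, 1299, 512])

# Euclidean distance between the two feature vectors
print(np.linalg.norm(laptop_a - laptop_b))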
Feature Space Manipulation
Feature spaces are important!
Many machine learning tasks are solved by selecting the appropriate features to define a useful feature space
Task: Classification
Classification is the act of placing a new data point within a defined category
Supervised learning task
Ex. 1: Predicting customer gender from shopping data
Ex. 2: From features, classifying an image as a car or truck
Linear Classification
Linear classification uses a linear combination of features to classify objects:
y = f(w · x)
where y is the result, w is the weight vector, x is the feature vector, and w · x is their dot product
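A minimal sketch of that dot product in action (the weights here are made up for illustration):

import numpy as np

def linear_classify(w, x, threshold=0.0):
    # Classify x by comparing the weighted sum w . x to a threshold
    return 1 if np.dot(w, x) > threshold else 0

# Hypothetical weight vector and 3-feature data point
w = np.array([0.4, -0.2, 0.7])
x = np.array([1.0, 2.0, 0.5])
print(linear_classify(w, x))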
Linear Classification
Another way to think of this is that we want to draw a line (or hyperplane) that separates datapoints from different classes
Sometimes this is easy
Classes are well separated in this feature space
Both H1 and H2 accurately separate the classes.
Other times, less so
This decision boundary works for most data points, but we can see some incorrect classifications
Example: Iris Data
There’s a famous dataset published by R.A. Fisher in 1936 containing measurements of three types of Iris plants
You can download it yourself here:
http://archive.ics.uci.edu/ml/datasets/Iris
Example: Iris Data
Features:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class
Data:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
…
7.0,3.2,4.7,1.4,Iris-versicolor
…
6.8,3.0,5.5,2.1,Iris-virginica
…
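A quick sketch of parsing the downloaded file (assuming it’s saved locally as iris.data, the filename used at the UCI link):

import csv

# Each row: 4 numeric features followed by the class label
with open('iris.data') as f:
    rows = [row for row in csv.reader(f) if row]  # skip blank lines

features = [[float(v) for v in row[:4]] for row in rows]
labels = [row[4] for row in rows]
print(features[0], labels[0])  # [5.1, 3.5, 1.4, 0.2] Iris-setosa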
Data Analysis
We have 4 features in our vector (the 5th is the classification answer)
Which of the 4 features are useful for predicting class?
[Scatter plot: sepal length vs. sepal width]
Different feature spaces give different insight
[Scatter plot: sepal length vs. petal length]
[Scatter plot: petal length vs. petal width]
[Scatter plot: sepal width vs. petal width]
Half the battle is choosing the features that best represent the discrimination you want
Feature Space Transforms
The goal is to map data into an effective feature space
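As an illustration (a standard textbook example, not from the talk): two concentric rings of points can’t be separated by a line in (x, y), but mapping each point to its squared radius makes a simple threshold work:

import numpy as np

# Synthetic data: an inner ring (class 0) and an outer ring (class 1)
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
inner = np.column_stack([np.cos(angles[:50]), np.sin(angles[:50])]) * 1.0
outer = np.column_stack([np.cos(angles[50:]), np.sin(angles[50:])]) * 3.0

# Transform: map each (x, y) point to the single feature x^2 + y^2
r2_inner = (inner ** 2).sum(axis=1)  # all near 1.0
r2_outer = (outer ** 2).sum(axis=1)  # all near 9.0

# In the new feature space the classes separate with one threshold
print(r2_inner.max() < 5.0 < r2_outer.min())  # True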
Demo
Logistic Regression
Classification technique based on fitting a logistic curve to your data
Logistic Regression
P(Y | b, x) = 1 / (1 + e^-(b0 + b1x))
P(Y | b, x) is the probability of a data point being in a class (Class 1 vs. Class 2); b0 and b1 are the model weights
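A minimal sketch of that curve in Python (the weights b0 and b1 here are arbitrary illustrative values):

import math

def logistic(x, b0, b1):
    # Probability that x belongs to the positive class
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# The probability rises smoothly from 0 toward 1 as x grows
for x in [-4, -2, 0, 2, 4]:
    print(x, round(logistic(x, b0=0.0, b1=1.0), 3))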
More Dimensions!
Extending the logistic function into N dimensions:
P(Y | b, x) = 1 / (1 + e^-(b0 + b1x1 + b2x2 + … + bNxN))
Now x is a feature vector and b is a vector of weights: more weights!
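The same sketch, vectorized with NumPy so b and x can be any length (the weights are again made up):

import numpy as np

def logistic_nd(x, b0, b):
    # N-dimensional logistic: x and b are same-length vectors
    return 1.0 / (1.0 + np.exp(-(b0 + np.dot(b, x))))

# Hypothetical weights for a 4-feature data point (e.g. the iris features)
x = np.array([5.0, 3.6, 1.3, 0.25])
b = np.array([0.2, -0.1, 0.5, 0.3])
print(logistic_nd(x, b0=-2.0, b=b))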
Tools
Torch7
Demo: Logistic Regression (Scikit-Learn)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
iris = load_iris()
# set data
X, y = iris.data, iris.target
# train classifier
clf = LogisticRegression().fit(X, y)
# 'setosa' data point
observed_data_point = [[5.0, 3.6, 1.3, 0.25]]
# classify
print(clf.predict(observed_data_point))
# determine classification probabilities
print(clf.predict_proba(observed_data_point))
Learning
In all cases so far, “learning” is just a matter of finding the best values for your weights
Simply put: find the function that fits the training data best
More dimensions means more features we can consider
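As an illustration of weight-finding (a toy example, not from the talk), a bare-bones gradient ascent that learns logistic regression weights on 1-D data:

import numpy as np

# Toy training data: small x -> class 0, large x -> class 1
X = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])

b0, b1 = 0.0, 0.0  # start with arbitrary weights
learning_rate = 0.1

for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * X)))  # current predictions
    # Follow the gradient of the log-likelihood for each weight
    b0 += learning_rate * np.mean(y - p)
    b1 += learning_rate * np.mean((y - p) * X)

print(b0, b1)  # the weights that best fit the training data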
What are we doing?
Logistic regression is actually maximizing the likelihood of the training data
This is an indirect method, but often has good results
What we really want is to maximize the accuracy of our model
Support Vector Machines (SVMs)
Remember how a large number of lines could separate my classes?
Support Vector Machines (SVMs)
SVMs try to find the optimal classification boundary by maximizing the margin between classes
Bigger margins mean better classification of new data points
Points on the edge of a class are called Support Vectors
[Diagram: maximum-margin boundary with support vectors highlighted]
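A small sketch of inspecting those points with scikit-learn (using SVC, which exposes them; the LinearSVC in the next demo does not):

from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# A linear-kernel SVC records which training points are support vectors
clf = SVC(kernel='linear').fit(X, y)
print(clf.support_vectors_.shape)  # (n_support_vectors, n_features)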
Demo: Support Vector Machines (Scikit-Learn)
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
iris = load_iris()
# set data
X, y = iris.data, iris.target
# train classifier
clf = LinearSVC().fit(X, y)
# 'setosa' data point
observed_data_point = [[5.0, 3.6, 1.3, 0.25]]
# classify
print(clf.predict(observed_data_point))
Want to try it yourself?
Working code from this talk:
https://github.com/cacois/ml-classification-examples
Some great online courses
Coursera (Free!)
https://www.coursera.org/course/ml
Caltech (Free!)
http://work.caltech.edu/telecourse
Udacity (free trial)
https://www.udacity.com/course/ud675
AMA
@aaroncois
www.codehenge.net
github.com/cacois


Editor's Notes

  • #6 What some customers think
  • #7 What some people think
  • #20 And like any toolbox, the contents are tools – not processes, procedures, or algorithms. Machine Learning provides these components.
  • #21 Supervised learning algorithms are trained on labelled examples, i.e., input where the desired output is known. The supervised learning algorithm attempts to generalise a function or mapping from inputs to outputs which can then be used speculatively to generate an output for previously unseen inputs. Unsupervised learning algorithms operate on unlabelled examples, i.e., input where the desired output is unknown. Here the objective is to discover structure in the data (e.g. through a cluster analysis), not to generalise a mapping from inputs to outputs.
  • #37 Note: many possible boundaries between black and white dots
  • #49 plot_iris.py
  • #52 DEMO
  • #58 i.e. many logistic models can work the same on training data, some are better than others. We can’t tell.