Machine Learning for Modern Developers
C. Aaron Cois, PhD
Wanna chat?
@aaroncois
www.codehenge.net
github.com/cacois
Let’s talk about Machine Learning
The Expectation
The Sales Pitch
The Reaction
My Customers
The Definition
“Field of study that gives computers the ability
to learn without being explicitly programmed”
~ Arthur Samuel, 1959
That sounds like Artificial Intelligence
True
Machine Learning is a branch of Artificial Intelligence
ML focuses on systems that learn from data
Many AI systems are simply programmed to do one task really well, such as playing Checkers. That’s a solved problem; no learning required.
Isn’t that how Skynet starts?
Ya, probably
But it’s also how we do this…
…and this…
…and this
Isn’t this just statistics?
Machine Learning can take statistical analyses and make them automated and adaptive
Statistical and numerical methods are Machine Learning’s hammer
Supervised vs. Unsupervised
Supervised = System trained on human-labeled data (desired output known)
Unsupervised = System operates on unlabeled data (desired output unknown)
Supervised learning is all about generalizing a function or mapping between inputs and outputs
Supervised Learning Example: Complementary Colors
Training Data: pairs of color swatches, e.g. f(red) = green, f(violet) = yellow, …
Test Data: single color swatches to classify, e.g. red, green, …
Let’s Talk Data
Supervised Learning Example: Complementary Colors
input,output
red,green
violet,yellow
blue,orange
orange,blue
…
training_data.csv
red
green
yellow
orange
blue
…
test_data.csv
(First line indicates data fields)
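As a quick sketch of reading this layout (assuming the training_data.csv shown above), Python’s csv module can use that first line as field names:

import csv

# Read labeled training pairs; the header row names the fields
with open('training_data.csv') as f:
    reader = csv.DictReader(f)
    training_pairs = [(row['input'], row['output']) for row in reader]

print(training_pairs[:2])  # [('red', 'green'), ('violet', 'yellow')]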
Feature Vectors
A data point is represented by a feature vector
Ninja Turtle = [name, weapon, mask_color]
data point 1 = [michelangelo,nunchaku,orange]
data point 2 = [leonardo,katana,blue]
…
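A minimal sketch of the same idea in Python (names and values taken from the slide):

# Each data point is a feature vector: [name, weapon, mask_color]
feature_names = ['name', 'weapon', 'mask_color']
data_point_1 = ['michelangelo', 'nunchaku', 'orange']
data_point_2 = ['leonardo', 'katana', 'blue']

# Pairing names with values makes each feature easy to look up
print(dict(zip(feature_names, data_point_1)))
# {'name': 'michelangelo', 'weapon': 'nunchaku', 'mask_color': 'orange'}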
Feature Space
Feature vectors define a point in an n-dimensional feature space
[Scatter plot: a single point at (x, y) = (1.0, 0.5)]
If my feature vectors contain only 2 values, this defines a point in 2-D space: (x, y) = (1.0, 0.5)
High-Dimensional Feature Spaces
Most feature vectors have much higher dimensionality, such as:
FVlaptop = [name, screen size, weight, battery life, proc, proc speed, ram, price, hard drive, OS]
This means we can’t easily display it visually, but statistics and matrix math work just fine
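A small sketch of that point, with made-up numbers: we can’t plot a 7-dimensional laptop vector, but distance math works the same as in 2-D:

import numpy as np

# Two hypothetical laptops, numeric features only (illustrative values):
# [screen size, weight, battery life, proc speed, ram, price, hard drive]
laptop_a = np.array([13.3, 1.4, 10.0, 2.4, 8, 999, 256])
laptop_b = np.array([15.6, 2.1, 6.0, 2.9, 16, 1299, 512])

# Euclidean distance between the two feature vectors
print(np.linalg.norm(laptop_a - laptop_b))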
Feature Space Manipulation
Feature spaces are important!
Many machine learning tasks are solved by selecting the appropriate features to define a useful feature space
Task: Classification
Classification is the act of placing a new data point within a defined category
Supervised learning task
Ex. 1: Predicting customer gender from shopping data
Ex. 2: From features, classifying an image as a car or truck
Linear Classification
Linear classification uses a linear combination of features to classify objects:
y = f(w · x)
where y is the result, w is the weight vector, x is the feature vector, and w · x is their dot product
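A minimal sketch of that dot product in action (the weights here are made up for illustration):

import numpy as np

def linear_classify(w, x, threshold=0.0):
    # Classify x by comparing the weighted sum w . x to a threshold
    return 1 if np.dot(w, x) > threshold else 0

# Hypothetical weight vector and 3-feature data point
w = np.array([0.4, -0.2, 0.7])
x = np.array([1.0, 2.0, 0.5])
print(linear_classify(w, x))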
Linear Classification
Another way to think of this is that we want to draw a line (or hyperplane) that separates datapoints from different classes
Sometimes this is easy
Classes are well separated in this feature space
Both H1 and H2 accurately separate the classes.
Other times, less so
This decision boundary works for most data points, but we can see some incorrect classifications
Example: Iris Data
There’s a famous dataset published by R.A. Fisher in 1936 containing measurements of three types of Iris plants
You can download it yourself here:
http://archive.ics.uci.edu/ml/datasets/Iris
Example: Iris Data
Features:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class
Data:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
…
7.0,3.2,4.7,1.4,Iris-versicolor
…
6.8,3.0,5.5,2.1,Iris-virginica
…
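A quick sketch of parsing the downloaded file (assuming it’s saved locally as iris.data, the filename used at the UCI link):

import csv

# Each row: 4 numeric features followed by the class label
with open('iris.data') as f:
    rows = [row for row in csv.reader(f) if row]  # skip blank lines

features = [[float(v) for v in row[:4]] for row in rows]
labels = [row[4] for row in rows]
print(features[0], labels[0])  # [5.1, 3.5, 1.4, 0.2] Iris-setosa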
Data Analysis
We have 4 features in our vector (the 5th is the classification answer)
Which of the 4 features are useful for predicting class?
[Scatter plot: sepal length vs. sepal width]
Different feature spaces give different insight
[Scatter plot: sepal length vs. petal length]
[Scatter plot: petal length vs. petal width]
[Scatter plot: sepal width vs. petal width]
Half the battle is choosing the features that best represent the discrimination you want
Feature Space Transforms
The goal is to map data into an effective feature space
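As an illustration (a standard textbook example, not from the talk): two concentric rings of points can’t be separated by a line in (x, y), but mapping each point to its squared radius makes a simple threshold work:

import numpy as np

# Synthetic data: an inner ring (class 0) and an outer ring (class 1)
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
inner = np.column_stack([np.cos(angles[:50]), np.sin(angles[:50])]) * 1.0
outer = np.column_stack([np.cos(angles[50:]), np.sin(angles[50:])]) * 3.0

# Transform: map each (x, y) point to the single feature x^2 + y^2
r2_inner = (inner ** 2).sum(axis=1)  # all near 1.0
r2_outer = (outer ** 2).sum(axis=1)  # all near 9.0

# In the new feature space the classes separate with one threshold
print(r2_inner.max() < 5.0 < r2_outer.min())  # True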
Demo
Logistic Regression
Classification technique based on fitting a logistic curve to your data
Logistic Regression
P(Y | b, x) = 1 / (1 + e^-(b0 + b1x))
P(Y | b, x) is the probability of a data point being in a class (Class 1 vs. Class 2); b0 and b1 are the model weights
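A minimal sketch of that curve in Python (the weights b0 and b1 here are arbitrary illustrative values):

import math

def logistic(x, b0, b1):
    # Probability that x belongs to the positive class
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# The probability rises smoothly from 0 toward 1 as x grows
for x in [-4, -2, 0, 2, 4]:
    print(x, round(logistic(x, b0=0.0, b1=1.0), 3))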
More Dimensions!
Extending the logistic function into N dimensions:
P(Y | b, x) = 1 / (1 + e^-(b0 + b1x1 + b2x2 + … + bNxN))
Now x is a feature vector and b is a vector of weights: more weights!
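The same sketch, vectorized with NumPy so b and x can be any length (the weights are again made up):

import numpy as np

def logistic_nd(x, b0, b):
    # N-dimensional logistic: x and b are same-length vectors
    return 1.0 / (1.0 + np.exp(-(b0 + np.dot(b, x))))

# Hypothetical weights for a 4-feature data point (e.g. the iris features)
x = np.array([5.0, 3.6, 1.3, 0.25])
b = np.array([0.2, -0.1, 0.5, 0.3])
print(logistic_nd(x, b0=-2.0, b=b))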
Tools
Torch7
Demo: Logistic Regression (Scikit-Learn)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
iris = load_iris()
# set data
X, y = iris.data, iris.target
# train classifier
clf = LogisticRegression().fit(X, y)
# 'setosa' data point
observed_data_point = [[5.0, 3.6, 1.3, 0.25]]
# classify
print(clf.predict(observed_data_point))
# determine classification probabilities
print(clf.predict_proba(observed_data_point))
Learning
In all cases so far, “learning” is just a matter of finding the best values for your weights
Simply put: find the function that fits the training data best
More dimensions means more features we can consider
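As an illustration of weight-finding (a toy example, not from the talk), a bare-bones gradient ascent that learns logistic regression weights on 1-D data:

import numpy as np

# Toy training data: small x -> class 0, large x -> class 1
X = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])

b0, b1 = 0.0, 0.0  # start with arbitrary weights
learning_rate = 0.1

for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * X)))  # current predictions
    # Follow the gradient of the log-likelihood for each weight
    b0 += learning_rate * np.mean(y - p)
    b1 += learning_rate * np.mean((y - p) * X)

print(b0, b1)  # the weights that best fit the training data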
What are we doing?
Logistic regression is actually maximizing the likelihood of the training data
This is an indirect method, but often has good results
What we really want is to maximize the accuracy of our model
Support Vector Machines (SVMs)
Remember how a large number of lines could separate my classes?
Support Vector Machines (SVMs)
SVMs try to find the optimal classification boundary by maximizing the margin between classes
Bigger margins mean better classification of new data points
Points on the edge of a class are called Support Vectors
[Diagram: maximum-margin boundary with support vectors highlighted]
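A small sketch of inspecting those points with scikit-learn (using SVC, which exposes them; the LinearSVC in the next demo does not):

from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# A linear-kernel SVC records which training points are support vectors
clf = SVC(kernel='linear').fit(X, y)
print(clf.support_vectors_.shape)  # (n_support_vectors, n_features)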
Demo: Support Vector Machines (Scikit-Learn)
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
iris = load_iris()
# set data
X, y = iris.data, iris.target
# train classifier
clf = LinearSVC().fit(X, y)
# 'setosa' data point
observed_data_point = [[5.0, 3.6, 1.3, 0.25]]
# classify
print(clf.predict(observed_data_point))
Want to try it yourself?
Working code from this talk:
https://github.com/cacois/ml-classification-examples
Some great online courses
Coursera (Free!)
https://www.coursera.org/course/ml
Caltech (Free!)
http://work.caltech.edu/telecourse
Udacity (free trial)
https://www.udacity.com/course/ud675
AMA
@aaroncois
www.codehenge.net
github.com/cacois


Editor's Notes

  • #6 What some customers think
  • #7 What some people think
  • #20 And like any toolbox, the contents are tools – not processes, procedures, or algorithms. Machine Learning provides these components.
  • #21 Supervised learning algorithms are trained on labelled examples, i.e., input where the desired output is known. The supervised learning algorithm attempts to generalise a function or mapping from inputs to outputs which can then be used speculatively to generate an output for previously unseen inputs. Unsupervised learning algorithms operate on unlabelled examples, i.e., input where the desired output is unknown. Here the objective is to discover structure in the data (e.g. through a cluster analysis), not to generalise a mapping from inputs to outputs.
  • #37 Note: many possible boundaries between black and white dots
  • #49 plot_iris.py
  • #52 DEMO
  • #58 i.e. many logistic models can work the same on training data, some are better than others. We can’t tell.