Python Data Analysis - Second Edition

Learn how to apply powerful data analysis techniques with popular open source Python modules

Armando Fandango
March 2017

Book Details

ISBN-13: 9781787127487

Paperback: 330 pages

Book Description

Data analysis techniques generate useful insights from small and large volumes of data. Python, with its strong set of libraries, has become a popular platform to conduct various data analysis and predictive modeling tasks.

With this book, you will learn how to process and manipulate data with Python for complex analysis and modeling. You will perform data manipulations such as aggregating, concatenating, appending, cleaning, and handling missing values with NumPy and Pandas. The book covers how to store and retrieve data from various sources, such as SQL and NoSQL databases, CSV files, and HDF5. You will also learn how to visualize data with plotting libraries and explore advanced topics such as signal processing, time series, textual data analysis, machine learning, and social media analysis.
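As a rough illustration of the manipulations listed above, the sketch below concatenates (appends) two small DataFrames, fills a missing value, and aggregates by group with Pandas. The column names and data are hypothetical, and filling with zero is only one of several ways to handle missing values; this is not code taken from the book.

```python
import numpy as np
import pandas as pd

# Two hypothetical batches of records with the same columns;
# the January batch contains one missing value (NaN)
jan = pd.DataFrame({"region": ["north", "south", "north"],
                    "units": [10, np.nan, 7]})
feb = pd.DataFrame({"region": ["south", "north"],
                    "units": [5, 3]})

# Concatenate (append) the batches into one DataFrame with a fresh 0..n-1 index
sales = pd.concat([jan, feb], ignore_index=True)

# Clean: replace the missing value with 0 (dropna() would be another option)
sales["units"] = sales["units"].fillna(0)

# Aggregate: total units per region, returned as a Series indexed by region
totals = sales.groupby("region")["units"].sum()
print(totals)
```

`pd.concat` is used for appending here because `DataFrame.append`, available in older Pandas releases, was deprecated in Pandas 1.4 and removed in 2.0.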

The book covers a plethora of Python modules, such as matplotlib, statsmodels, scikit-learn, and NLTK. It also covers using Python with external environments such as R, Fortran, C/C++, and Boost libraries.

Table of Contents

Chapter 1: Getting Started with Python Libraries

Installing Python 3

Using IPython as a shell

Reading manual pages

Jupyter Notebook

NumPy arrays

A simple application

Where to find help and references

Listing modules inside the Python libraries

Visualizing data using Matplotlib

Summary

Chapter 2: NumPy Arrays

The NumPy array object

Creating a multidimensional array

Selecting NumPy array elements

NumPy numerical types

One-dimensional slicing and indexing

Manipulating array shapes

Creating array views and copies

Fancy indexing

Indexing with a list of locations

Indexing NumPy arrays with Booleans

Broadcasting NumPy arrays

Summary

References

Chapter 3: The Pandas Primer

Installing and exploring Pandas

The Pandas DataFrames

The Pandas Series

Querying data in Pandas

Statistics with Pandas DataFrames

Data aggregation with Pandas DataFrames

Concatenating and appending DataFrames

Joining DataFrames

Handling missing values

Dealing with dates

Pivot tables

Summary

References

Chapter 4: Statistics and Linear Algebra

Basic descriptive statistics with NumPy

Linear algebra with NumPy

Finding eigenvalues and eigenvectors with NumPy

NumPy random numbers

Creating a NumPy masked array

Summary

Chapter 5: Retrieving, Processing, and Storing Data

Writing CSV files with NumPy and Pandas

The binary .npy and pickle formats

Storing data with PyTables

Reading and writing Pandas DataFrames to HDF5 stores

Reading and writing to Excel with Pandas

Using REST web services and JSON

Reading and writing JSON with Pandas

Parsing RSS and Atom feeds

Parsing HTML with Beautiful Soup

Summary

Reference

Chapter 6: Data Visualization

The matplotlib subpackages

Basic matplotlib plots

Logarithmic plots

Scatter plots

Legends and annotations

Three-dimensional plots

Plotting in Pandas

Lag plots

Autocorrelation plots

Plot.ly

Summary

Chapter 7: Signal Processing and Time Series

The statsmodels modules

Moving averages

Window functions

Defining cointegration

Autocorrelation

Autoregressive models

ARMA models

Generating periodic signals

Fourier analysis

Spectral analysis

Filtering

Summary

Chapter 8: Working with Databases

Lightweight access with sqlite3

Accessing databases from Pandas

SQLAlchemy

Pony ORM

Dataset - databases for lazy people

PyMongo and MongoDB

Storing data in Redis

Storing data in memcache

Apache Cassandra

Summary

Chapter 9: Analyzing Textual Data and Social Media

Installing NLTK

About NLTK

Filtering out stopwords, names, and numbers

The bag-of-words model

Analyzing word frequencies

Naive Bayes classification

Sentiment analysis

Creating word clouds

Social network analysis

Summary

Chapter 10: Predictive Analytics and Machine Learning

Preprocessing

Classification with logistic regression

Classification with support vector machines

Regression with ElasticNetCV

Support vector regression

Clustering with affinity propagation

Mean shift

Genetic algorithms

Neural networks

Decision trees

Summary

Chapter 11: Environments Outside the Python Ecosystem and Cloud Computing

Exchanging information with Matlab/Octave

Installing rpy2 package

Interfacing with R

Sending NumPy arrays to Java

Integrating SWIG and NumPy

Integrating Boost and Python

Using Fortran code through f2py

PythonAnywhere Cloud

Summary

Chapter 12: Performance Tuning, Profiling, and Concurrency

Profiling the code

Installing Cython

Calling C code

Creating a process pool with multiprocessing

Speeding up embarrassingly parallel for loops with Joblib

Comparing Bottleneck to NumPy functions

Performing MapReduce with Jug

Installing MPI for Python

IPython Parallel

Summary

What You Will Learn

Install open source Python modules such as NumPy, SciPy, Pandas, statsmodels, scikit-learn, Theano, Keras, and TensorFlow on various platforms
Prepare and clean your data, and use it for exploratory analysis
Manipulate your data with Pandas
Retrieve and store your data using RDBMS, NoSQL databases, HDF5 files, and distributed filesystems such as HDFS
Visualize your data with open source libraries such as matplotlib, Bokeh, and Plotly
Learn about various machine learning methods such as supervised, unsupervised, probabilistic, and Bayesian
Understand signal processing and time series data analysis
Get to grips with graph processing and social network analysis
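To give a flavor of storing and retrieving data with an RDBMS, here is a minimal sketch that uses Python's built-in `sqlite3` module together with the Pandas functions `DataFrame.to_sql` and `pd.read_sql_query`. The in-memory database, the `weather` table, and its rows are hypothetical choices made only to keep the example self-contained.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Hypothetical example data
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"],
                   "temp_c": [4.5, 19.0, 27.3]})

# Store: write the DataFrame to a table, replacing it if it already exists
df.to_sql("weather", conn, index=False, if_exists="replace")

# Retrieve: run a parameterized SQL query and get the result as a DataFrame
warm = pd.read_sql_query(
    "SELECT city, temp_c FROM weather WHERE temp_c > ? ORDER BY temp_c",
    conn,
    params=(10,),
)
print(warm)

conn.close()
```

For `to_sql`, Pandas accepts a raw DBAPI connection only for SQLite; other databases are normally reached through a SQLAlchemy engine.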

Authors

Armando Fandango

Armando Fandango is an accomplished technologist with hands-on capabilities and senior executive-level experience with startups and large companies globally. Armando is spearheading Epic Engineering and Consulting Group as Chief Data Scientist. His work spans diverse industries, including FinTech, Banking, BioInformatics, Genomics, AdTech, Utilities and Infrastructure, Traffic and Transportation, Energy, Human Resources, and Entertainment.

Armando has worked for more than ten years on projects involving Predictive Analytics, Data Science, Machine Learning, Big Data, Product Engineering, and High-Performance Computing. His research interests span machine learning, deep learning, algorithmic game theory, and scientific computing. Armando has authored the book “Python Data Analysis - Second Edition” and has published research in international journals and conferences.

Recommended for You

Python: End-to-end Data Analysis (May 2017, 931 pages)
Python: Data Analytics and Visualization (Mar 2017, 866 pages)
Python GUI Programming Cookbook - Second Edition (May 2017, 444 pages)
Statistics for Machine Learning (Jul 2017, 442 pages)
Python High Performance - Second Edition (May 2017, 270 pages)
Python: Deeper Insights into Machine Learning (Aug 2016, 901 pages)
