PyData Seattle 2015 – Simulation, PySpark and Deep Learning Tutorials

One month ago, I got an email from a colleague:

The subject of his email was a single word: PyData

The email had a single EventBrite.com URL.

Intrigued, I clicked... and that's when my adventure into PyData began.

In this post, I’m going to take you through 3 of the tutorials at PyData Seattle 2015: Simulation, PySpark and Deep Learning.

Keynote on Friday (Jul-24)

One-sentence summary? An interesting 1-hour plug for code.org, but, oddly, with no mention of Python and not much about data analysis.

Who was the speaker? Hadi Partovi gave the keynote on Friday. Slick presentation and a great message.

What was the key message? The key message for me was the Hour of Code. Students (or in some cases entire schools) commit to a one-hour coding event to expose themselves to the awesome world of computing.

Where can I find out more? The keynote summary, code.org and the Hour of Code website are all great resources to learn more about this initiative.

Simplified statistics through simulation

One-sentence summary? Use Python's NumPy and SciPy to run simulations that build a better understanding of statistics.

Who was the speaker? Justin Bozonier gave this tutorial. Great speaker who actively engaged the audience without forcing participation (personally, I don't like forced participation).

What was the key message? If you'd like to explore some statistics or probability theory, simulate it using Python so you can see the concepts come to life.

Where can I find out more? All the tutorial material is uploaded to Justin's GitHub PyData2015 folder.

His PyData2015 IPython Notebook is a great way to experiment with the tutorial.

It covers:

  • Monte Carlo Simulation
  • Bootstrapping
  • Simulating a function
  • Simulating split tests
  • Simulating a probability puzzle
  • ...and a few bonuses.
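To get a feel for the approach (this is my own toy sketch, not code from Justin's notebook), here's a minimal NumPy bootstrap: resample some made-up page-load times with replacement and read a 95% confidence interval off the simulated means.

```python
import numpy as np

# Toy data: page-load times in seconds (made-up numbers, for illustration only)
observed = np.array([1.2, 0.9, 1.5, 2.1, 0.8, 1.1, 1.7, 1.3, 0.95, 1.4])

# Bootstrap: resample with replacement many times and record the mean each time
rng = np.random.RandomState(42)
n_resamples = 10000
boot_means = np.array([
    rng.choice(observed, size=len(observed), replace=True).mean()
    for _ in range(n_resamples)
])

# The middle 95% of the simulated means approximates a 95% confidence interval
low, high = np.percentile(boot_means, [2.5, 97.5])
print("Observed mean: %.3f" % observed.mean())
print("Bootstrap 95%% CI: (%.3f, %.3f)" % (low, high))
```

That's the spirit of the whole tutorial: instead of reaching for a formula, simulate the process and look at the distribution you get.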

A brief introduction to Distributed Computing with PySpark

One-sentence summary? A great hands-on intro to Spark and PySpark.

Who was the speaker? Holden Karau gave this hands-on tutorial. I found her to be an engaging and knowledgeable speaker! She explained concepts in understandable language (especially for those new to Spark, like me!).

What was the key message? The key message for me was that PySpark makes using Spark for large-scale data processing super intuitive and straightforward.

Where can I find out more? The tutorial slides are on SlideShare.

I especially liked the coverage of:

  • Comparison to Hadoop
  • Different parts of Spark
  • RDDs + lazy evaluation
  • DataFrames and working with tweets
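To give a flavor of the hands-on portion (my own minimal sketch, not taken from the slides; it assumes a local Spark install and a hypothetical tweets.txt file with one tweet per line), here's a word count over an RDD. The transformations are lazy, so nothing runs until the takeOrdered action at the end.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "tweet-wordcount")

# Transformations (filter, flatMap, map, reduceByKey) are lazy:
# they only describe the computation.
lines = sc.textFile("tweets.txt")  # hypothetical file, one tweet per line
words = (lines.filter(lambda line: line.strip())
              .flatMap(lambda line: line.lower().split())
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))

# Actions trigger the actual work (here across local cores; on a cluster,
# you'd just point the SparkContext at a different master).
top10 = words.takeOrdered(10, key=lambda kv: -kv[1])
print(top10)

sc.stop()
```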

Learn to Build an App to Find Similar Images using Deep Learning

One-sentence summary? An awesome tutorial offering hands-on experience with deep learning tasks through Dato's GraphLab platform.

Who was the speaker? Piotr Teterwak (from Dato) gave this fantastic tutorial. He was super knowledgeable and covered the history, motivation and difficulties of deep learning.

What was the key message? My key takeaway is that Dato's GraphLab platform makes it super easy to play with and inspect the complexities of deep learning. I also enjoyed Piotr's overview of the history, motivation and difficulties of the field.

Where can I find out more? The tutorial materials are available from Dato.

The IPython Notebooks from the tutorial (check out step 2 of the tutorial materials) are fantastic.

I really enjoyed the 3 exercises:

  1. Handwritten digit recognition on the MNIST dataset
  2. Implementing a dress recommender
  3. Deploying the dress recommender as a service
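To give a rough idea of the first exercise, here's a sketch in the spirit of the notebooks rather than their exact code. It assumes GraphLab Create is installed and that the MNIST data has already been saved locally as SFrames at the hypothetical paths below; the exact API details may differ from what the tutorial uses.

```python
import graphlab

# Hypothetical local copies of the MNIST data saved as SFrames
# (the tutorial notebooks fetch the data for you).
train = graphlab.SFrame('mnist_train_sframe')
test = graphlab.SFrame('mnist_test_sframe')

# Train a neural-network classifier to predict the digit label from the image.
model = graphlab.neuralnet_classifier.create(train, target='label',
                                             max_iterations=3)

# Evaluate accuracy on held-out data, then classify a few test images.
print(model.evaluate(test))
print(model.classify(test[:5]))
```

The dress recommender follows the same pattern, except it reuses the network's deep features to find nearest-neighbor images instead of predicting a label.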

Now it's your turn...

Ready to rock with these PyData tutorials?

Then download the tutorials to get rolling right away.

The simulation tutorial is probably the easiest to get set up. The deep learning one is slightly more challenging, but super rewarding!

Enjoy!


About the Author

Ray Li

Ray is a software engineer and data enthusiast who has been blogging for over a decade. He loves to learn, teach and grow. You’ll usually find him wrangling data, programming and lifehacking.
