libchaos / Learning-Data-Mining-with-Python-Second-Edition

Learning Data Mining with Python Second Edition by Packt

Learning Data Mining with Python - Second Edition

This is the code repository for Learning Data Mining with Python - Second Edition, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

About the Book

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. This book covers a large number of libraries available in Python, including the Jupyter Notebook, pandas, scikit-learn, and NLTK.

Instructions and Navigation

All of the code is organized into folders, one per chapter. Each folder name includes the chapter number, for example, Chapter02.

The code will look like the following:

import numpy as np

# Load the whitespace-delimited dataset file into a NumPy array
dataset_filename = "affinity_dataset.txt"
X = np.loadtxt(dataset_filename)
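To show what a call like this returns, here is a minimal sketch that writes a tiny stand-in file and loads it the same way. The file name `sample_affinity.txt` and its contents are made up for illustration and are not the book's dataset; the one-row-per-transaction layout is an assumption.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in data (not the book's dataset): each row is assumed to be
# one transaction and each column one item (1 = bought, 0 = not bought).
sample_rows = ["1 0 1 0 0", "0 1 1 0 1", "1 1 0 0 0"]

# Write the sample to a temporary directory so nothing in the repo is touched
sample_path = os.path.join(tempfile.mkdtemp(), "sample_affinity.txt")
with open(sample_path, "w") as f:
    f.write("\n".join(sample_rows) + "\n")

# np.loadtxt parses whitespace-delimited numbers into a float array by default
X = np.loadtxt(sample_path)
n_samples, n_features = X.shape
print("Loaded", n_samples, "rows and", n_features, "columns")
```

Because `np.loadtxt` returns floats unless told otherwise, passing `dtype=int` may be convenient if the values are only ever 0 or 1.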

It should come as no surprise that you’ll need a computer, or access to one, to complete the book. The computer should be reasonably modern, but it doesn’t need to be overpowered. Any modern processor (from about 2010 onwards) and 4 GB of RAM will suffice, and you can probably run almost all of the code on a slower system too.

The exception is the final two chapters. In these chapters, I step through using Amazon Web Services (AWS) to run the code. This will probably cost you some money, but the advantage is less system setup than running the code locally. If you don’t want to pay for those services, the tools used can all be set up on a local computer, but you will definitely need a modern system to run them: a processor built in 2012 or later and more than 4 GB of RAM.

I recommend the Ubuntu operating system, but the code should work well on Windows, macOS, or any other Linux variant. You may need to consult the documentation for your system to get some things installed, though. In this book, I use pip, a command-line tool for installing Python libraries, to install code.

Another option is to use Anaconda, which can be found online here:

http://continuum.io/downloads
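Whichever installer you use, it can help to confirm that the libraries are visible to your interpreter before opening the notebooks. The following is a minimal sketch, not part of the book's code; the list of import names is an assumption based on the libraries named in the book description, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names for libraries the book description mentions (assumed list).
# Note that scikit-learn is imported under the name "sklearn".
required = ["numpy", "pandas", "sklearn", "nltk"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```

Anything reported as not installed can then be added with pip or Anaconda, as described above.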

I have also tested all of the code using Python 3. Most of the code examples work on Python 2 with no changes. If you run into a problem and can’t get around it, send an email and we can offer a solution.
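If you are unsure which interpreter a notebook is running on, a quick check along these lines can tell you. This is only a sketch; `is_python3` is a hypothetical helper and not something the repository provides.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or later."""
    return version_info[0] >= 3


# sys.version starts with the version number, followed by build details
print("Running Python", sys.version.split()[0])
if not is_python3():
    print("The code was tested on Python 3; some examples may need changes.")
```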

About

License: MIT License


Languages

Jupyter Notebook 99.8%, Python 0.2%