duttashi / learnlp

natural language processing with python

Tutorial: Natural Language Processing in Python

This repo contains material for a one-semester course on Natural Language Processing with Python.

Audience


The target audience is students, researchers, developers, hobbyists, and anyone interested in learning more about Natural Language Processing and Text Analytics.

Some very basic knowledge of Python is assumed (if you have read a Python script before, you're good to go), but no previous NLP knowledge is required.

Environment Setup


The code has been tested only with Python 2.7 on 64-bit Windows 7.
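To compare your own machine against the tested configuration, you can print the interpreter version and operating system from Python itself. This is a minimal sketch using only the standard library; the function name describe_environment is illustrative and not part of this repo.

```python
import platform
import sys


def describe_environment():
    """Return basic facts about the running interpreter and operating system."""
    return {
        # Major.minor.micro of the interpreter running this script.
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
        # Operating system name and release as reported by the platform module.
        "os": platform.system(),
        "os_release": platform.release(),
        # Bitness of the Python interpreter (not necessarily of the OS).
        "bits": platform.architecture()[0],
    }


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("%s: %s" % (key, value))
```

If the reported Python version or operating system differs from the one above, the notebooks may still run, but they have not been tested there.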

Step 1 - Navigate to desktop and clone this repo

  • Open PowerShell: press the Windows and R keys together, then release them. A Run dialog box will open. Type “powershell” in the Run dialog box and click the OK button.

  • In PowerShell, type the command cd c:\users\yourusername\desktop (substituting your own user name for yourusername) and press the Enter key.

  • Now type the command git clone https://github.com/duttashi/learnlp

Step 2 - Install Anaconda and IPython Notebook

  • Download and install Anaconda from here. Choose the Python 2.7 version. Select the default options when prompted during the installation.

  • Launch IPython Notebook by typing jupyter notebook in PowerShell.

Step 3 - Check installed library versions

  • Click the New button in the notebook dashboard, then run the following code in a cell:

    # scipy
    import scipy
    print('scipy: %s' % scipy.__version__)

    # numpy
    import numpy
    print('numpy: %s' % numpy.__version__)

    # matplotlib
    import matplotlib
    print('matplotlib: %s' % matplotlib.__version__)

    # pandas
    import pandas
    print('pandas: %s' % pandas.__version__)

    # statsmodels
    import statsmodels
    print('statsmodels: %s' % statsmodels.__version__)

    # scikit-learn
    import sklearn
    print('sklearn: %s' % sklearn.__version__)

You should see output like the following:

scipy: 0.19.0

numpy: 1.12.1

matplotlib: 2.0.2

pandas: 0.20.1

statsmodels: 0.8.0

sklearn: 0.18.1
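The same check can be written as a loop, which also reports a missing library instead of stopping at the first failed import. This is a sketch rather than part of the course material; the names LIBRARIES and installed_versions are made up for illustration.

```python
import importlib

# Import names of the libraries checked in this step.
LIBRARIES = ["scipy", "numpy", "matplotlib", "pandas", "statsmodels", "sklearn"]


def installed_versions(names):
    """Map each module name to its __version__, or a note if it is unavailable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            # Most scientific packages expose __version__; fall back if one does not.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(LIBRARIES).items():
        print("%s: %s" % (name, version))
```

Your version numbers will likely differ from the ones listed above, depending on when you installed Anaconda.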

Step 4 - Install Deep Learning Libraries

In this step, we will install Python libraries used for deep learning, specifically: Theano, TensorFlow, and Keras.

NOTE: If you encounter an error while installing the deep learning libraries, check the Issues tab or search for answers on www.stackoverflow.com.

  • Install the Theano deep learning library by typing: conda install theano

Confirm that Theano is installed and working correctly by running the following in the IPython notebook:

# theano

import theano

print('theano: %s' % theano.__version__)

You should see output like:

theano: 0.9.0.dev-c697eeab84e5b8a74908da654b66ec9eca4f1291

  • Install Keras by typing: pip install keras

    Then check the version in the notebook:

    import keras
    print('keras: %s' % keras.__version__)

    You should see output like:

    Using TensorFlow backend.
    keras: 2.0.8

  • To install TensorFlow, first activate the conda environment for it by typing: activate tensorflow (this assumes an environment named tensorflow already exists). Your prompt should change to something like (tensorflow)C:>.

  • To install the CPU-only version of TensorFlow, enter the following command: (tensorflow)C:> pip install --ignore-installed --upgrade tensorflow

  • To install the GPU version of TensorFlow, enter the following command (on a single line): (tensorflow)C:> pip install --ignore-installed --upgrade tensorflow-gpu

  • Validate the installation by launching the IPython Notebook and running the following commands (tf.Session is part of the TensorFlow 1.x API):

    import tensorflow as tf
    hello = tf.constant('Hello, TensorFlow!')
    sess = tf.Session()
    print(sess.run(hello))

If the system outputs Hello, TensorFlow! then you are ready to begin writing TensorFlow programs.
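The "Using TensorFlow backend." message printed when importing Keras depends on how Keras is configured. As far as I know, Keras 2 reads the backend from the KERAS_BACKEND environment variable if it is set, otherwise from the "backend" key in ~/.keras/keras.json, and falls back to TensorFlow by default. The sketch below mirrors that lookup without importing Keras; the function name keras_backend is hypothetical, and the lookup order is an assumption you should check against your installed Keras version.

```python
import json
import os


def keras_backend(config_path=None, environ=None):
    """Guess which backend Keras will load, without importing Keras."""
    environ = os.environ if environ is None else environ
    # Assumption: the environment variable overrides the config file.
    if environ.get("KERAS_BACKEND"):
        return environ["KERAS_BACKEND"]
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    if os.path.exists(config_path):
        with open(config_path) as handle:
            return json.load(handle).get("backend", "tensorflow")
    # Assumption: with no config file, Keras 2 defaults to TensorFlow.
    return "tensorflow"


if __name__ == "__main__":
    print("Keras backend: %s" % keras_backend())
```

If this reports theano while you expected tensorflow (or the reverse), editing the "backend" entry in keras.json is the usual way to switch.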

Summary

Congratulations, you now have a working Python development environment for machine learning.

You can now learn and practice machine learning and deep learning on your workstation.

Where to go from here?

See the scripts folder, which contains IPython notebooks for further learning.

Here are some interesting questions and answers on StackOverflow. I recommend reading them in this order: on statistical knowledge, 1; on data mining, 2 and 3.

Enjoy and Keep Calm!

License: Other

