yhx189 / portfolio

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This directory contains a number of tools that may be helpful, or at
least interesting, in implementing the data analysis/mining part of
the project, and the trading strategy part of the project.

To use any of the tools, you need to add this directory to your PATH:

export PATH=$PATH:/path/to/this/directory


Where is the stock data?
------------------------

There are copies of the stock data in Oracle and in MySQL.  
You can use either database simply by setting the correct
environment variables:

PORTF_DBMS   = mysql or oracle  
PORTF_DB     = <appropriate db, "cs339" for oracle, "cs339data" for mysql>
PORTF_DBUSER = <a user account that can access that db,
                your oracle account for oracle, "cs339" for mysql>
PORTF_DBPASS = <password for that account, 
                your oracle password for oracle, "aPaSSwoRd!" for mysql>

This setup is shown in the files setup_ora.sh and setup_my.sh.  To use
the oracle copy of the stock data, for example, run

source ./setup_ora.sh

Once these variables are set up, you can use the wrapper library in
stock_data_access.pm to do your SQL queries on the data.  The wrapper
will hide many of the details of whether you're using Oracle or MySQL.

You only have SELECT access to the Oracle and MySQL stock databases.
You cannot create/drop tables or insert/update/delete data in the
existing tables.  Additional data you have should go into your own
database (your regular oracle account).

Regardless of whether you use MySQL or Oracle, you will find a table
called StocksDaily, which contains all of the recorded daily
information.  There are almost 3.5 million records in this table, so
be careful with your queries!

You will also find a table called StocksSymbols, which contains a
summary of information for each stock symbol in the database.

These two tables are also replicated in the Cassandra keyspace
"Stocks".  



Tools for accessing the historical stock data
---------------------------------------------

The following tools will let you manipulate data from the commmand
line.  If you first source the appropriate setup file (see
setup_ora.sh, above), these tools will run seamlessly on top of either
MySQL or Oracle. You can generally run these tools without arguments
to get help.

get_data.pl

   This one lets you extract data for a symbol from the database.  
   For example:

      get_data.pl --close --from="1/1/99" --to="12/31/00" AAPL

   will print two columns of data: timestamps and the close prices for
   the stock AAPL (Apple Computer) for the given two years.  If you
   add a --plot, you'll get a graphical view.

get_info.pl

   This one will give you summary statistics for a group of stocks.
   For example:

      get_info.pl --field=close --from="1/1/99" --to="12/31/00" AAPL IBM

   will print statistics about the close prices for Apple and IBM for
   those years.

get_random_symbol.pl

   Picks a random symbol for you.  Useful when you're doing random
   samples to test some trading strategy or predictor.


get_symbols.pl

   Gets all the available symbols.




Tools for simple analysis
-------------------------

get_info.pl 
   
   Described above.

get_covar.pl

   This will compute the covariance or correlation of two or more
   stocks, giving you a covariance (or correlation) matrix.  You can
   use it to find out which stocks tend to be correlated (or
   anti-correlated).  Two stocks that are correlated move together,
   which means if you invest in both of them, you get higher
   volatility.  Stocks that are anti-correlated move in opposite
   directions, which means if you invest in both of them, you'll get
   lower volatility.

   Example:

     get_covar.pl --field1=close --field2=close --from="1/1/99" --to="12/31/00" AAPL IBM G

   This will compute the covariance among the close prices of Apple,
   IBM, and GM over the period in question.  Add a --corrcoeff to get
   correlation coeefficients.

   Note that this is an expensive tool since it needs to join the
   StocksDaily table with itself.



Tools for predicton
-------------------

markov_symbol.pl
  
   This tool will attempt to predict future values of a stock based on
   past values using a Markov model that's continually trained.

   Example:

     markov_symbol.pl AAPL 16 1

   This will try to predict Apple's close prices from history using a
   1st order Markov model.  Because Markov models are discrete (state
   machines) and this is continuous data, we must choose how to
   discretize it.  The "16" indicates that we will use 16 levels.

stepify.pl
   
   Used by markov_symbol to discretize data

eval_pred.pl

   Used by markov_symbol to evaluate predictions

markov_online.pl
markov.pm
  
   This is the core of the Markov predictor



genetic_symbol.pl

   This tool will use genetic programming to try to evolve a predictor
   for the close prices of a given stock symbol.  This is an offline
   tool, in that what is returned is the predictor, not the
   prediction.

   Example:

     genetic_symbol.pl AAPL 10 

   This tries to find the best program for looking at the last 10
   Apple stock prices and predicting the next stock price (close) from
   them.  What is printed are the best programs of each generation.
   The programming language is Lisp/Scheme-like.  Each program is also
   measured by its fitness (prediction error - lower is better) and
   its structural complexity (lower is better).

genetic_predictor_online
pred.ini
   
   The genetic programming-based predictor and config files.

time_series_symbol.pl

   Evaluate a wide range of linear and nonlinear time series models
   for predicting close values of a stock symbol.  Run
   time_series_predictor_online to see the list of models.  All models
   must be prefaced with "AWAIT" or "MANAGED" in order to work.

   Example:
     
      time_series_symbol.pl AAPL 4 AWAIT 200 AR 16

   Looks at the first 200 closing values of Apple, fits an AR(16)
   model to them, and then uses that model to predict 4 days ahead for
   each following value.  The results are fed through an evaluation
   process, to result in a table showing error levels for predictors
   for each of the 4.  Note: the goal is to minimize the
   MeanSquareError with the model.

time_series_symbol_project.pl

   Similar to time_series_symbol.pl, but this one will simply present the
   historic data followed by the predictions.


time_series_evaluator_online
time_series_predictor_offline
time_series_predictor_online
time_series_project

    Used internally by time_series_symbol.pl, but you can run them too. 
    In particular, time_series_predictor_online/offline can print the actual
    predictions.
 

Example Trading Strategy
------------------------

shannon_ratchet.pl
    
    This implements the "Shannon Ratchet" trading strategy invented
    by Claude Shannon.  The Shannon Ratchet provides a positive rate of
    return for any stock that is a geometric random walk.   In fact, the 
    more volatile the stock, the greater the rate of return!!!  This kind of 
    random walk is what would be expected were the Efficient Market Hypothesis 
    true.  Shannon surprised people by finding a way to milk volatility.

    The idea is really simple.  Every day, you rebalance your
    portfolio to be 50% in the stock, and 50% in cash.  That's it.
    You'll see it is quite powerful, even with real stock data.  It
    works less well if you have to pay to do trades :)


    Example:

        shannon_ratchet.pl AAPL 1000 20

    This plays the Shannon Ratchet on Apple, starting with $1000 to
    invest, and assuming a $20 trading cost.  The result will show
    both the return with and without paying the trading cost.


Data Access Library
-------------------

stock_data_access.pm
  
    This implements a simple SQL query interface that hides the 
    details of interfacing with MySQL or Oracle.   You are, however
    responsible for providing SQL that is compatible with the 
    database you use.   This is used in all the "get_*" tools. 
  
    use stock_data_access.pm;

    print ExecStockSQL("TEXT", "select * from ...");

    Note that it is necessary in some cases to preface the 
    name of the tables being used in the stock database, and these
    can be different in MySQL or Oracle.   A helper function is 
    provided:

    $sql = "select ... from ".GetStockPrefix()."StocksDaily ..."



Tools for getting current and historical stock quotes
-----------------------------------------------------

quote.pl

   Given a list of symbols, this will retrieve current quotes
   for each of them

quotehist.pl

   Given a symbol, get historical market data from Internet sources,
   if possible.


Examples for accessing stock data and plotting from CGI
-------------------------------------------------------

plot_stock.pl
  
   A CGI script that shows how to get stock data, including plots
   from the logic tier of a web application.   This basically shows
   how to use the stock market data acccess library from within a 
   Perl CGI script.   

About


Languages

Language:Perl 95.4%Language:JavaScript 2.4%Language:CSS 2.0%Language:Shell 0.2%