freqtrade / technical

Various indicators developed or collected for Freqtrade


Phase Change Index

infiniteneurotech opened this issue · comments

Hello,

I'm working on writing the Phase Change Index in a form that will work with Freqtrade. Of course I wouldn't be doing this if it already existed somewhere but my searching hasn't given any results. To my mind the indicator appears to be very simple, but in practice it's quite a bit harder than I expected and my Python skills are really not up to the task. I'm hoping I can get some help and I've tried asking on Discord but I think that format isn't really suitable for my questions.

I'll try my best to explain what the indicator does. The indicator works by drawing an imaginary straight line between the previous close and close n periods ago. It then checks to see if the actual close values were above or below the theoretical gradient line and uses that information to determine how the market has been moving in the past. Tradingview has a version of the indicator, and I'm well aware of the issues with translating indicators from Pine. The publication by the original author is available online too if it would help for me to post a link.

The difficult part I've found is finding a way to access the previous n close values in a way that doesn't cause the indicator to look ahead. The indicator needs each of the n previous close values at every new bar to be able to calculate the gradient line. The obvious solutions are to use shift() or rolling(). However, shift() only provides access to a single value, but the indicator needs every value between the current close and close.shift(n). rolling() provides the lookback period, but I still need the actual previous n close values to write a function that could be applied using rolling().
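For reference, one pattern that might help here: rolling().apply() with raw=True hands your function the entire window of closes as a numpy array, so every value in the lookback is accessible at once. A minimal sketch (the sample series and the window_spread function are purely illustrative):

```python
import numpy as np
import pandas as pd

closes = pd.Series([1.0, 2.0, 2.0, 1.0, 5.0, 4.0, 6.0])

def window_spread(values):
    # `values` is a plain numpy array containing the whole window,
    # oldest value first - every close in the lookback is visible here
    return values.max() - values.min()

# raw=True hands each window to the function as a numpy array
spread = closes.rolling(5).apply(window_spread, raw=True)
print(spread.tolist())  # first 4 entries are NaN; last window [2, 1, 5, 4, 6] -> 5.0
```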

Here's what I have so far (keep in mind I'm very much still learning!):

def phase_change_index(self, dataframe: DataFrame, period):
    df = dataframe.copy()

    # Calculate momentum
    df['momentum'] = ta.MOM(df, period)

    # Define values in window
    values = list(range(0, 36))

    # Divide momentum value by number of items in window
    value_list = [i / 35 for i in values]

    # Multiply momentum by each value in list to create momentum factor, and create dataframe
    mom = [i * df['momentum'] for i in value_list]
    dfmom = pd.DataFrame(mom, index=values)
    data = dfmom.transpose()

    # Calculate values along a gradient line that connects start and end values of window period
    shift_close = df['close'].shift(36)
    gradient = data.apply(lambda x: x + shift_close)

I can create a dataframe of the deviations from the gradient using:

        # deviation = pd.DataFrame()
       
        # deviation[1] = df['close'].shift(4) - gradient[1]
        # deviation[2] = df['close'].shift(3) - gradient[2]
        # deviation[3] = df['close'].shift(2) - gradient[3]
        # deviation[4] = df['close'].shift(1) - gradient[4]

But surely there must be a better way, especially for longer lookback periods? The author suggests a period of 35 so coding it manually is a hassle.

The next step after calculating the deviation is to determine if the actual close values at each point on the line are above or below the gradient. For example, if close.shift(n) > gradient, the deviation value from that point on the line should be added to variable [up] else add to [down].

The PCI is then calculated as ([up].rolling(n).sum() / ([up].rolling(n).sum() + [down].rolling(n).sum())) * 100

How does this look? Does anyone have any suggestions for how this could be completed or improved? Am I on the right track at least or should I be using a different approach?

Thanks, any help is appreciated!

honestly, your description of the indicator raises more questions than it answers ... (ignoring the code for now - as that could lead things in the wrong direction right away).

Assuming a lookback of 35 for simplicity...

for point T - would you then have exactly one gradient (df[T] vs. df[T - 35]) - or would you have 35 gradients to compare to?
If it's one - it should be pretty simple (you get exactly one value per timepoint - so using .shift() should work just fine).

If you're looking for 35 gradients (assuming mid-dataframe) - then you'll need to keep in mind that as you approach "now" (the end of the dataframe) - the number of possible gradients will decrease - ending with exactly one gradient (Now - 35 candles ago).
You also need to be very careful how each of the lines is used to really avoid a look-ahead - but it's hard to say more without actually seeing / understanding what you're trying to do (the above is many words, but little (or confusing) information).

Thanks for your reply. The situation is the second one - there are 35 gradient points, and each gradient point is compared to its equivalent past close price, ie the first gradient value is subtracted from close[t-35], then gradient 2 - close[t-34] etc.

I realise my explanation is not overly clear - I've found it very hard to explain how it works and the original article took me a few attempts to understand. Not to be condescending but I found the best way for me to visualise the indicator's machinations was to go to a chart and draw a straight line 35 bars long between the current close and the close 35 bars ago. This is the gradient line, and has actual values on the chart. There will be close values above and below this line, and by subtracting the gradient value from the close and summing the values above and below, you can find out the proportion of close values that were greater or less than the gradient.

The purpose of the indicator is to identify when the market is moving between uptrend, downtrend, and consolidation phases (hence phase change index).

I hope that helps, my paper calculations for this indicator look very promising so it would be great to find a way to implement it in Freqtrade. Thanks

if one requirement is to look at all 35 gradient points (not a partial amount, like 34 or fewer) - then your indicator will stop 35 candles from the current candle.
you'll therefore only see "35 candles ago, we were moving up" - but have 0 information on what happened in between (this same thing is true for EVERY point in the dataframe ... even though it may not look like that).

your approach will probably look something like this to populate the gradient columns (i'm currently too lazy to calculate the actual gradient - and your "+" application doesn't actually do that):

for i in range(35):
    dataframe[f'gradient_{i}'] = dataframe['close'].shift(0-i)  ... dataframe['close'].shift(i)

you then have a list of 35 columns - each of which you'll compare against close (i assume).
(careful, the above intentionally looks into the future).

You can't really do this without looking into the future - as at any point T - you'll need 35 gradients, comparing "close vs close-35", "close+1 vs. close-34", ..."close+34 vs. close-1" ...).

In intermediate calculations, that's also not necessarily a problem.
However, when using the indicator, you must shift it forward by 35 candles, as the comparison result looks into the future by 35 candles.

To me, this sounds like a very visual indicator - which works well when looking at a chart - but not so much when using it in an automated fashion - unless you consider it a long-term indicator, and only use each value once the full values have populated (basically by shifting it forward by 35 candles).
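To sketch what that forward shift might look like (pci_raw is a made-up column name standing in for the intermediate result):

```python
import pandas as pd

# hypothetical column pci_raw: an indicator built with shifts into
# the future, e.g. via close.shift(-i)
df = pd.DataFrame({'pci_raw': [float(i) for i in range(40)]})

# pci_raw at row T used candles up to T+35, so only expose it 35
# candles later - at row T+35, everything it "saw" is now the past
df['pci'] = df['pci_raw'].shift(35)
```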

Thanks again for your response, that's really helpful. I modified your loop slightly with this:

def phase_change_index(self, dataframe: DataFrame, period):
    df = dataframe.copy()

    # Calculate momentum
    df['momentum'] = ta.MOM(df, period)

    data = pd.DataFrame()

    for i in range(35):
        data[f'gradient_{i}'] = df['close'].shift(0 - i) + (df['momentum'] * (i / 35))

    print(data)

When I print to see what's happening, it looks like the most recent gradient value is in column 0 position 0, then value 2 is in column 1 position -1, column 2 position -2 etc. It actually almost traces out a gradient line across the columns, which is kind of cool. And when I check the values against my hand calculations they're correct. We're winning!

With that data layout I'm not sure how to subtract the close and the gradient, as I'll need to iterate through both columns and rows. Here's my attempt below, in pseudocode since I don't know how to do it properly, but it should give an idea of what I'm trying to achieve:

df['up'] = 0.00
df['down'] = 0.00

for i in range(35):
    if df['close'].shift(0 - i) > df[f'gradient_{i}'].shift(0 - i):
        df['up'] = df['close'][i] - df[f'gradient_{i}'].shift(i)
    else:
        df['down'] = df[f'gradient_{i}'].shift(i) - df['close'][i]

Shifting the close price incrementally looks easy, but I don't know how to shift both row and column for the gradient values. Any help is much appreciated!

[edited for formatting]

well you have 35 gradients per candle - so shifting close again will be wrong.
you'll simply need to compare the single close 35 times (to the 35 gradients you created before - without any shift) - which should give you 35 x 2 new columns (up + down) per candle.

I don't think that's quite right though, I'm not comparing each gradient to a single close value, I need to compare each gradient to the close value at that point along the line - gradient.shift(1) compared with close.shift(1), gradient.shift(2) with close.shift(2) etc for the 35 points. That's what I meant by shifting the close as well. Unless I'm misunderstanding what you mean?

for each point T - you've got 35 gradients.

If you shift the close as well and compare it 35 times (so 35 * 35) - then you're doing the comparison twice.

Ok I think I see what you mean, I found a mistake in my code. Instead of:

for i in range(35):
    data[f'gradient_{i}'] = df['close'].shift(0 - i) + (df['momentum'] * (i / 35))

It should be:

for i in range(35):
    data[f'gradient_{i}'] = df['close'].shift(35) + (df['momentum'] * (i / 35))

Which gives me this output:

      gradient_0    gradient_1    gradient_2    gradient_3  ...   gradient_31   gradient_32   gradient_33   gradient_34
0            NaN           NaN           NaN           NaN  ...           NaN           NaN           NaN           NaN
1            NaN           NaN           NaN           NaN  ...           NaN           NaN           NaN           NaN
2            NaN           NaN           NaN           NaN  ...           NaN           NaN           NaN           NaN
3            NaN           NaN           NaN           NaN  ...           NaN           NaN           NaN           NaN
4            NaN           NaN           NaN           NaN  ...           NaN           NaN           NaN           NaN
...          ...           ...           ...           ...  ...           ...           ...           ...           ...
7311    26968.94  27003.831143  27038.722286  27073.613429  ...  28050.565429  28085.456571  28120.347714  28155.238857
7312    27124.91  27152.764571  27180.619143  27208.473714  ...  27988.401714  28016.256286  28044.110857  28071.965429
7313    26971.71  27001.623143  27031.536286  27061.449429  ...  27899.017429  27928.930571  27958.843714  27988.756857
7314    26962.88  26985.226857  27007.573714  27029.920571  ...  27655.632571  27677.979429  27700.326286  27722.673143
7315    27057.36  27085.813143  27114.266286  27142.719429  ...  27939.407429  27967.860571  27996.313714  28024.766857

Now each row is the whole gradient line at that close value (I think).

Which means I should be able to compare close.shift(0) with gradient_0, close.shift(1) with gradient_1, etc without doubling up.
I think it should look something like this, but it doesn't work and I don't know what's wrong:

for i, column in enumerate(data):
    if df['close'].shift(0 - i) > data[column].values:
        df['up'] = df['close'].shift(i) - data[column].values
    else:
        df['down'] = data[column].values - df['close'].shift(i)

Once I have the up and down columns populated, I think the rest of the indicator should simply be:

df['up_sum'] = df['up'].rolling(36).sum()
df['down_sum'] = df['down'].rolling(36).sum()

df['deviation_sum'] = df['up_sum'] + df['down_sum']

df['PCI'] = (df['up_sum'] / df['deviation_sum']) * 100

How does this look to you? I think it's nearly there. Thanks for your patience with my questions, I appreciate you taking the time to help.

well your up/down assignment must be vectorized (look at np.where if you need conditions per row)

the way you have it, it'll be a "last column wins" - which is not what you want
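a rough sketch of what the vectorized per-column pattern could look like (toy close/gradient values, column names purely illustrative):

```python
import numpy as np
import pandas as pd

# toy values: one close series and a single gradient column
df = pd.DataFrame({'close': [1.0, 2.0, 2.0, 1.0, 5.0]})
data = pd.DataFrame({'gradient_0': [1.0, 2.0, 3.0, 4.0, 5.0]})

# one up/down pair per gradient column; each np.where call is
# vectorized over all rows, so no per-row if/else is needed
for col in data.columns:
    diff = df['close'] - data[col]
    df[f'up_{col}'] = np.where(diff > 0, diff, 0.0)
    df[f'down_{col}'] = np.where(diff < 0, -diff, 0.0)

print(df['down_gradient_0'].tolist())  # -> [0.0, 0.0, 1.0, 3.0, 0.0]
```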

Ok, thanks. I've looked into np.where and tried this:


        deviation['up'] = np.where(data['gradient_1'] > df['close'].shift(34), data['gradient_1'] - df['close'].shift(34), 0)
        deviation['down'] = np.where(data['gradient_1'] < df['close'].shift(34), df['close'].shift(34) - data['gradient_1'], 0)

Which gives me the output:

              up        down
0       0.000000    0.000000
1       0.000000    0.000000
2       0.000000    0.000000
3       0.000000    0.000000
4       0.000000    0.000000
...          ...         ...
7311    0.000000  121.078857
7312  181.054571    0.000000
7313   38.743143    0.000000
7314    0.000000   72.133143
7315   87.183143    0.000000

I think this output means that each value in each row of the up column corresponds with gradient_1 - close.shift(34). And it will be updated with every new bar, right? So how do I use np.where to create a function where gradient_1 is compared with close.shift(34), gradient_2 with close.shift(33), gradient_3 with close.shift(32), etc? Each gradient calculation would need its own column, eg up_1, up_2, up_3, down_1, down_2, down_3... etc. Is that right, or is there a better way? Thanks again

well probably yes - you need 35 up/down values for every candle - which you then aggregate over - but i think you're messing up the calculations again - by again double-shifting close.

This is all VERY confusing - as the actual intent is totally unclear.
Quite honestly, you're best off first learning pandas before trying to write an algorithm which you can't clearly explain (which suggests you yourself are not entirely sure how it's actually supposed to work).

I do 100% understand how it works, what I lack is the knowledge of python to implement it. As I said right at the start, I know it's hard to explain and it took me a few goes to get my head around it. I'll try one last time to explain it. I don't want to waste your time, I realise you volunteer to help out for an open-source project but I know this will be really simple to calculate with the knowledge of Python and Freqtrade. I know I keep saying it but I do appreciate you taking the time to read all of this.

Perhaps a worked example would be the best way to explain, and to keep it simple I'll use a lookback period of 5 bars. Let's say you have 5 close prices - $1, $2, $2, $1, $5. The first step is to calculate the difference between the latest close and the close 5 bars ago - so 5 - 1 = $4 difference. That also happens to be what the momentum indicator does, so I thought I could use ta-lib or pandas-ta to calculate momentum.

The next step is to calculate an imaginary straight line between the close 5 periods ago and the current close, which is called the gradient. Take the momentum and divide by the number of intervals in the window (n-1 = 4), then add that incrementally to the first close value to calculate the points along the line. In this example, a momentum of 4 divided by 4 intervals = 1. We use n-1 rather than n because the first and last values of the gradient line have to equal the close values themselves. You'll see what I mean below.

To get the first point along the gradient line, you take close.shift(5) + (momentum*(0/4)), so 1 + (4*(0/4)) = 1. This is gradient value 1, which will be the same as close.shift(5).
For the second point you take close.shift(4) + (momentum*(1/4)) = 2 + (4*(1/4)) = 3.
For the third point, close.shift(3) + (momentum*(2/4)) = 2 + (4*(2/4)) = 4.
Fourth point, close.shift(2) + (momentum*(3/4)) = 1 + (4*(3/4)) = 4.
The last value will be the same as the close at the end of the gradient line, as that's close + (momentum*(4/4)) = 5.
( I know there's aliasing in the gradient line in this example, it's because of the low sample rate and round numbers, ignore it)

The next step is to compare the gradient line values with the actual close value at that point. We can call this the deviation. If close > gradient, then the value is close - gradient, and the result is saved in the up-deviation column. If gradient > close, then the value is gradient - close, and that is saved as a down deviation.

So for gradient 1, the close at that point on the gradient line is 1 and the gradient is 1, so the deviation is 0.
For gradient 2, the close at that point on the gradient line is 2 and the gradient is 3, so 3 - 2 = 1, which is stored in the down column because gradient > close.
For gradient 3, the close is 2 and the gradient is 4, so 4 - 2 = 2 in the down column because gradient > close.
For the fourth point, the gradient is 4 and the close is 1, so 4 - 1 = 3 in the down column.
Fifth point is 0 as it's the close itself so there's no deviation.

Now that the up and down columns are populated, we need to work out the sum of those columns. So the sum of down is 1+2+3 = 6. The sum of up is obviously 0 as there were no values stored.

Next add the sum of up to the sum of down. In this case that's 0 + 6 = 6.

Then the final step is to calculate the PCI by dividing the sum of up by (sum of up + sum of down), then multiplying by 100 to create a percentage value. So (0/6) * 100 = 0.
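To make the arithmetic concrete, here's a minimal standalone sketch of the single-window calculation (it joins the window's endpoints with one straight line, so the intermediate gradient values differ slightly from the per-point bases used above, but the final PCI for this example comes out the same):

```python
import numpy as np

def pci_window(closes):
    # PCI for a single window: the share of total deviation that
    # lies above the straight line joining the first and last close
    closes = np.asarray(closes, dtype=float)
    line = np.linspace(closes[0], closes[-1], len(closes))
    dev = closes - line
    up = dev[dev > 0].sum()
    down = -dev[dev < 0].sum()
    total = up + down
    return 100.0 * up / total if total else float('nan')

print(pci_window([1, 2, 2, 1, 5]))  # -> 0.0
```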

The final value tells us that 0% of the close values were > the gradient line. Without muddying the waters further by trying to explain the interpretation of the results, that's how the indicator is calculated. I can see I've sent a wall of text and I have a headache now, but I'm not sure I can explain it any other way. For the sake of completeness I've attached a pdf of the original publication of the indicator in case you want to read it.

Thanks again

phase_change_index(1).pdf

Looking at that paper raises more questions than answers ...

Looking at the below image - how is 35.41 + 1.8/0/4 = 35.41 ... and then the next line, it's 35.86, ...
it's wrong in both instances ... from a pure mathematical standpoint.

image

i really lack the patience/time to fully understand this - but this is a topic where chatbots might actually do a decent job.

Disclaimer: using chatGPT (GPT4), providing it with your explanation and the following query (without the PDF):

can you code the above in python, using pandas, assuming a dataframe with the columns "date, open, high, low, close, volume", thanks.

It was even nice, thanking for the great explanation 😆

image

import pandas as pd


def pci(data, lookback=5):
    # Calculate momentum
    data['momentum'] = data['close'] - data['close'].shift(lookback)

    # Calculate gradient values
    gradient_values = []
    for i in range(lookback):
        gradient_value = data['close'].shift(lookback) + (data['momentum'] * (i / (lookback - 1)))
        gradient_values.append(gradient_value)

    # Calculate deviations
    up_deviation = pd.DataFrame(index=data.index, columns=[f'up_dev_{i}' for i in range(lookback)])
    down_deviation = pd.DataFrame(index=data.index, columns=[f'down_dev_{i}' for i in range(lookback)])
    for i, gradient_value in enumerate(gradient_values):
        up_deviation[f'up_dev_{i}'] = (data['close'].shift(i) - gradient_value).where(data['close'].shift(i) > gradient_value, 0)
        down_deviation[f'down_dev_{i}'] = (gradient_value - data['close'].shift(i)).where(data['close'].shift(i) < gradient_value, 0)

    # Sum up and down deviations
    data['up_dev_sum'] = up_deviation.sum(axis=1)
    data['down_dev_sum'] = down_deviation.sum(axis=1)

    # Calculate PCI
    data['pci'] = (data['up_dev_sum'] / (data['up_dev_sum'] + data['down_dev_sum'])) * 100
    # data.drop(columns=['momentum', 'up_dev_sum', 'down_dev_sum'], inplace=True)

    return data


# Example data
data = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=20, freq='D'),
    'open': [0.5, 1.5, 1.5, 0.5, 4.5, 4.0, 5.0, 5.5, 6.0, 7.0, 8.0, 7.5, 7.0, 6.5, 9.0, 9.5, 8.5, 10.0, 11.0, 12.0],
    'high': [1.5, 2.5, 2.5, 1.5, 5.5, 5.0, 6.0, 6.5, 7.0, 8.0, 9.0, 8.5, 8.0, 7.5, 10.0, 10.5, 9.5, 11.0, 12.0, 13.0],
    'low': [0, 1, 1, 0, 4, 3.5, 4.5, 5.0, 5.5, 6.5, 7.5, 7.0, 6.5, 6.0, 8.5, 9.0, 8.0, 9.5, 10.5, 11.5],
    'close': [1, 2, 2, 1, 5, 4, 5, 6, 6, 7, 8, 8, 7, 7, 9, 10, 9, 10, 11, 12],
    'volume': [1000, 2000, 2000, 1000, 5000, 4000, 5000, 6000, 6000, 7000, 8000, 8000, 7000, 7000, 9000, 10000, 9000, 10000, 11000, 12000]
})

result = pci(data)
print(result)

Hahaha that's great, AI really will make us all redundant. I'll try it out. My only other concern is, from your perspective, does the indicator look forward at all? I'd like to be able to use it in backtesting. It looks ok to me but I've been wrong before...

Ok I've had a chance to test out the indicator and after a few tweaks to suit my strategy it's already providing some solid results. So thanks for your time and energy helping me with this - between you, me and ChatGPT we got there in the end 😆

Briefly re ChatGPT for Freqtrade if I may - did you actually just put my explanation in along with your instructions and it produced that completed output? No other steps? If so I know how I'll be building my indicators from now on!

Cheers

i used this post #322 (comment) - without the first paragraph - and with the query i posted above.

You can easily test out if an indicator is forward looking by calculating the indicator - then adding (or removing) a new row - and calculating it again (for the whole dataframe).

You'd expect that

  • no old values are changed, and the calculation yields the same result
  • the new row gets a value (it didn't have one before)
  • this repeats for an unlimited number of candles
  • this also repeats if old candles (beyond the lookback horizon) are removed, so the dataframe is kept at a constant length (obviously with the caveat that the "startup candles" won't have values calculated anymore)
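As a sketch, that check could look something like this (using a rolling mean as a stand-in indicator):

```python
import pandas as pd

def indicator(df):
    # stand-in for the real calculation - swap in the PCI here
    return df['close'].rolling(3).mean()

df = pd.DataFrame({'close': [1.0, 2.0, 2.0, 1.0, 5.0, 4.0]})
full = indicator(df)
partial = indicator(df.iloc[:-1])  # recompute without the newest candle

# a non-forward-looking indicator leaves all overlapping values unchanged
assert partial.equals(full.iloc[:-1])
```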

Ok no problems, I'll keep that in mind for the future. I think we can call this done now, thanks again for your help and patience