zhehaowang / sneaky

Feed and strategy for cross-venue Sneakers trading (Du, StockX).

strategy: market data research

zhehaowang opened this issue

What are some general characteristics of this market (in particular the venue Du)?
Given our non-trivial holding time, we should base our strategy on the key observations we can draw from our data.

We have observed:

Very few (model, size) pairs trade regularly

Most pairs have a wide spread, low volume and low liquidity. Data is often missing on one side. It would be extremely risky to get into illiquid positions.

The strategy report from the most complete feed run suggests:

total (style_id, size) pairs 34887
total (style_id, size) pairs 8725 with data
total (style_id, size) pairs 8725 with fresh data
total (style_id, size) pairs 4753 with fresh transactions
total (style_id, size) pairs 1820 satisfying profit cutoff ratio (bid to last) of 0.01

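We can consider trading about 13.6% of the scraped space (the 4753 of 34887 pairs with fresh transactions). Only 1820 pairs, about 5.2%, also satisfy the profit cutoff.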
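To make the funnel explicit, here is a minimal sketch that turns the counts from the report above into shares of the scraped space. The stage labels are shorthand chosen for this example and are not names used by strategy.py.

```python
# Funnel counts copied from the strategy report above.
funnel = [
    ("scraped", 34887),
    ("with data", 8725),
    ("with fresh data", 8725),
    ("with fresh transactions", 4753),
    ("satisfying profit cutoff", 1820),
]

total = funnel[0][1]

# Share of the scraped (style_id, size) space that survives each stage.
shares = {name: count / total for name, count in funnel}

for name, share in shares.items():
    print(f"{name:<26} {share:6.1%}")
```

The fresh-transactions stage is where the 13.6% comes from. The profit cutoff stage keeps roughly 5.2%.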

High volatility and price spikes are to be expected, even in the most heavily traded pairs

If we sort the data files by file size (a rough proxy for the number of transactions; we have too many files for a plain ls -Sl to work):

find . -name "*.json" -exec ls -l {} + | tr -s ' ' | cut -d' ' -f 5,9 | sort -s -n -k 1,1 | tail
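The same ranking can be sketched in Python, which sidesteps the argument-list limit entirely. This is only an illustrative equivalent of the pipeline above; `largest_json_files` is a hypothetical helper, not something that exists in the repo.

```python
import heapq
from pathlib import Path


def largest_json_files(root=".", n=10):
    """Return the n largest *.json files under root as (size_bytes, path).

    The result is sorted smallest first, so the largest file comes last,
    mirroring `sort -n | tail`.
    """
    sized = (
        (p.stat().st_size, str(p))
        for p in Path(root).rglob("*.json")
        if p.is_file()
    )
    return sorted(heapq.nlargest(n, sized))
```

Calling `largest_json_files(".")` from the data directory gives the ten largest files, which can then be fed to du_analyzer.py one by one.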

And look at some of our most liquid pairs:

./du_analyzer.py --style_id 554723-051 --size 7.0 --mode plot

[Figure 1: plot for style_id 554723-051, size 7.0]

Around 20191014 the current strategy would reach a drastically different decision than at any other time.
I cannot yet come up with an explanation for the spikes.

Another, more extreme example (a 2017 Valentine's Day issue):

./du_analyzer.py --style_id 881426-009 --size 7.0 --mode plot

[Figure 2: plot for style_id 881426-009, size 7.0]

./du_analyzer.py --style_id 881426-009 --size 7.0 --mode stats
        First Date:       2019-07-31T06:49:14.439100
        Last Date:        2019-12-24T06:48:50.548423
        Number of Sales:  160
        Sales / Day:      1.10
        High:             2699.00 CNY 385.80 USD
        Low:              1819.00 CNY 260.01 USD
        First:            1859.00 CNY 265.73 USD
        Last:             2109.00 CNY 301.46 USD
        Average:          2167.75 CNY 309.86 USD
        Stdev:            178.92

This would indicate filtering and sorting by mid-to-last can be quite misleading.
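As a rough illustration of how noisy this pair is, the sketch below derives a few relative dispersion figures from the stats output above. The choice of metrics (coefficient of variation, high-low range over average) is an assumption made for this example, not something du_analyzer.py reports.

```python
# Figures copied from the --mode stats output above (CNY).
high, low, last, average, stdev = 2699.00, 1819.00, 2109.00, 2167.75, 178.92

# Standard deviation relative to the average sale price.
cv = stdev / average

# Full high-low range relative to the average sale price.
range_ratio = (high - low) / average

# How far the last print sits from the average.
last_vs_avg = (last - average) / average

print(f"stdev / average:          {cv:.1%}")
print(f"(high - low) / average:   {range_ratio:.1%}")
print(f"(last - average) / avg:   {last_vs_avg:+.1%}")
```

With a high-low range of about 41% of the average, a single last print can land almost anywhere in that band, so any ratio anchored on it inherits that noise.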

We suspect our strategy is inherently biased towards riskier new releases

./strategy.py --start_from ../feed/merged.20191225.csv | grep "Release date" | tr -s ' ' | cut -d' ' -f 3 | sort
2008-11-28
2017-01-28
2017-06-10
2017-08-05
2017-10-07
2017-10-07
2017-11-21
2018-09-05
2019-01-22
2019-06-10
2019-08-24
2019-10-25
2019-11-07
2019-11-30
2019-12-06
2019-12-06
2019-12-07
2019-12-07
2019-12-07

As a rough estimate, using the default cut-off ratios, more than half of the candidates (11 of 19) were released this year, and those are tilted towards pairs that were released and started trading only days or weeks ago.
Our belief is that new issues generally "stabilize" at a price lower than the trading price of the first few days, and that this stabilization period is shorter than our holding time, so capturing the difference in new issues can be tricky.
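The tilt can be checked with a quick tally of the release dates listed above. This is a sketch only: the feed date is taken from the merged.20191225.csv file name, and the 30-day window for "recent" is an arbitrary assumption.

```python
from collections import Counter
from datetime import date

# Release dates copied from the strategy.py output above.
release_dates = """
2008-11-28 2017-01-28 2017-06-10 2017-08-05 2017-10-07 2017-10-07
2017-11-21 2018-09-05 2019-01-22 2019-06-10 2019-08-24 2019-10-25
2019-11-07 2019-11-30 2019-12-06 2019-12-06 2019-12-07 2019-12-07
2019-12-07
""".split()

feed_date = date(2019, 12, 25)  # assumed from the feed file name
recent_window_days = 30         # arbitrary cutoff for "recently released"

dates = [date.fromisoformat(d) for d in release_dates]

# Number of candidates per release year.
by_year = Counter(d.year for d in dates)
share_2019 = by_year[2019] / len(dates)

# Candidates released within the window before the feed date.
recent = [d for d in dates if (feed_date - d).days <= recent_window_days]

print(dict(sorted(by_year.items())))
print(f"released in 2019: {by_year[2019]} of {len(dates)} ({share_2019:.0%})")
print(f"released within {recent_window_days} days of the feed: {len(recent)}")
```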

I'm bearish on automated bids because of the volatility.
Backtesting (sim) would require non-trivial implementation effort, and given the data scarcity I doubt we could derive a meaningful conclusion.
Continuing to gather data wouldn't hurt, but I think it is too risky to automate bids: we aren't disciplined enough and don't have enough data.

I agree. The sneaker market is heavily affected by social influencers: for example, if Jay Chou posts some shoes on Instagram, their popularity rises, which drives up the sale price and volume. Those trends are hard to predict or track.

Yeah.
The other concern I had is that Du is probably not a price-time priority matching exchange, nor do they claim to be. Even if they are, it might not make sense to treat them as one: the levels seem so thin, even on the most traded pairs, that the price can jump between 2000 and 2400 a few times intraday, as in diagram 2. Maybe there is even seller prioritization here.

All of this would seem to increase the risk and affect whether we can derive any statistical conclusion at all.