JBKahn / django-sharding

A sharding library for Django

Release timeline/branch

4word opened this issue · comments

Do you have a timeframe you want to do the release in? I am currently running the latest master (as of about an hour ago) against the unit tests I have at work, just to make sure nothing crazy comes up that the library's tests didn't catch. If you plan on releasing soon, I can make sure we are on the release or master branch on our pre-prod system and report any strangeness.

The only thing I have noticed so far: running migrations appears to be a bit slower than on the version we were using, which was a month or two old. Running a trace, it appears to be spending a lot of time determining whether a migration should be run on a db, more time than it used to. Not sure if this is repeatable behavior though; I will make some new junk migrations and see how they go.

If I can release by next week that would be great. What's the best timeline for you as far as testing it out and being reasonably confident?

Hmm. And this is on the same version of Django? Interesting. If this is true and we can compare to another code base where it's not taking as long, we can try to deduce it from the trace of both the old and the new?

Does this account for the pre-migration signal that was added? That wasn't in my old version, although it may have been in yours. But then if it's in allow_migrate, that's interesting...

I am not 100% certain it's allow_migrate; that just seems to be called a lot in the trace. I don't have a trace from the version that was running faster for me. I will investigate some more and maybe check out one of the old versions of the branch I was using.

Timeline for us is almost immediate: we have 98%+ unit test coverage and are averaging over 20 executions per line during unit tests for all our sharded model code. I will run our end-to-end tests as well just to be ultra safe, but I can let you know of anything breaking by tomorrow evening. The e2e tests take a while since we do billing stuff in them.

Brilliant. If I were to upgrade our app, it would also be same day. If you can run your tests by tomorrow evening and give me a :+1:, then I'll run my app through it tomorrow as well and release it tomorrow night or the next morning, or asap if something comes up.

That sounds great. Will let you know; running our unit tests now and will kick off e2e tests in the morning (they take hours).

Adding a few small fixes, nothing that should break it: #41. Mostly linting. Our tests are now passing with that getattr change.

Our test suite is passing.

Our test suite is passing as well, but I can confirm the migration stuff is definitely slower than it was. Our test suite runs for ~10 minutes longer between the two versions with no other code changes, and one of my co-workers has already complained about it a few times this morning since they had new migrations to deal with.

I am betting this new code in the router is the culprit. Inspecting the stack in Python is super slow, and when we are doing migration stuff we are applying it to 512 shards plus our main default database, so this is being executed 512 times more often than it usually is.

        # New versions of Django use the router to make migrations with no hints.....
        making_migrations = any(['django/core/management/commands/makemigrations.py' in i[1] for i in inspect.stack()])
        if making_migrations:
            return True
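
For context on the cost: `inspect.stack()` builds a full record for every frame, which includes reading source lines from disk, so calling it once per shard adds up. A minimal sketch of a cheaper check, assuming only the filename of each calling frame matters, walks the raw frames instead. The names `called_from`, `path_fragment` and `is_making_migrations` are hypothetical and not part of django-sharding, and `sys._getframe` is a CPython implementation detail.

```python
import sys


def called_from(path_fragment):
    """Return True if any frame on the current call stack lives in a file
    whose path contains path_fragment.

    Unlike inspect.stack(), this never loads source lines; it only reads
    the filename already stored on each code object.
    """
    frame = sys._getframe(1)  # start at the caller of this function
    while frame is not None:
        if path_fragment in frame.f_code.co_filename:
            return True
        frame = frame.f_back
    return False


MAKEMIGRATIONS_PATH = 'django/core/management/commands/makemigrations.py'


def is_making_migrations():
    # Same path test as the router snippet, without the inspect.stack() overhead.
    return called_from(MAKEMIGRATIONS_PATH)
```

This keeps the stack-sniffing approach, so it shares that approach's fragility (for example, the path fragment assumes forward slashes); it only reduces the per-call cost.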

What's the reason for the code here?

I need a way to detect that it is making migrations, because they do not pass the model name as a hint. They didn't use to run through this code. I can try removing it, but it was exploding because of:
https://github.com/django/django/blob/1.10.1/django/core/management/commands/makemigrations.py#L105
Currently it needs the model to determine, during migration runs, whether it's supposed to create that model during the migration, so I cannot remove the dependency on having the hint there. It's an annoying move by the Django dev team to not pass any hint that you're in a makemigrations context or what you're making a migration for.

Interesting. Thanks for opening a ticket. I will take a look as well to see if there is something we can do that's more efficient. It is killer right now: making a migration for a single field on a single model takes like 12 minutes!

I'm not sure what to do, or why they created an API like that for allow_migrate.

So... looking at the Django source, I think you can just return True if the app has at least 1 sharded model in it and the model_name hint is not passed.

Even in dynamic usage, ALL Django calls to the method have the model_name, except during makemigrations and during any user-defined RunPython or RunSQL custom migration operations. We can easily document that you must pass a model_name if you use one of those migration operations.

So I think the check can be changed to something like:

if not hints.get('model_name') and ...code for checking the app has at least 1 sharded model here... :
    return True
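
To make the proposal concrete, here is a minimal sketch of that check inside a router method. It is illustrative only and not django-sharding's actual router: the `APPS_WITH_SHARDED_MODELS` registry and the class name are hypothetical stand-ins for the "app has at least 1 sharded model" test, and the per-model shard logic is left out.

```python
# Hypothetical registry: app labels that contain at least one sharded model.
# A real project would derive this from its model definitions.
APPS_WITH_SHARDED_MODELS = {'billing', 'accounts'}


class ShardedRouterSketch(object):
    """Illustrative router, not django-sharding's actual implementation."""

    def allow_migrate(self, db, app_label, **hints):
        model_name = hints.get('model_name')
        if not model_name and app_label in APPS_WITH_SHARDED_MODELS:
            # No model to inspect (e.g. during makemigrations): do not block.
            return True
        # Otherwise express no opinion here; the per-model shard logic
        # (omitted in this sketch) would make the real decision.
        return None
```

Returning `None` in the fall-through case means "no opinion" to Django, which then asks the next router or applies its default.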

It looks like all the makemigrations command is looking for is a hard stop, a reason from the router that the migrations should not even be considered. For example, if there is a read-only model in the app, you would not want Django to generate the migrations; the user should define them on their own (ideally, the user would put read-only models into their own app, but I think people not doing that are the reason for this check).

I would almost say to just return True if the model_name isn't defined, no matter what. That's basically the old behavior, correct? Since the method was never called, it was always just doing the migration?

Also, another option is to write our own makemigrations command that does nothing but set an environment variable we can check, call super, and then unset the variable. Then any makemigrations run would be caught and we could handle it.

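import os
from django.core.management.commands.makemigrations import Command as MakemigrationCommand

class Command(MakemigrationCommand):
    def handle(self, *args, **options):
        # Flag the makemigrations context so the router can check it cheaply.
        os.environ["MAKING_MIGRATIONS"] = "true"
        try:
            return super(Command, self).handle(*args, **options)
        finally:
            os.environ.pop("MAKING_MIGRATIONS", None)
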
> So... looking at the Django source, I think you can just return True if the app has at least 1 sharded model in it and the model_name hint is not passed.
>
> Even in dynamic usage, ALL Django calls to the method have the model_name, except during makemigrations and during any user-defined RunPython or RunSQL custom migration operations. We can easily document that you must pass a model_name if you use one of those migration operations.

Agreed, and I did write that. However, I don't want to return True in any case where I'm not sure; if anything, I should return None. The issue is that when you make a data migration/RunPython command without the hint, I don't want it to run the migration successfully with no warning. How would you even know that you'd forgotten, or where to look? I'm leaning toward overriding makemigrations for now.
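
If the makemigrations override goes ahead, the set/unset steps and the router-side check might look like the following. This is a minimal sketch, assuming the `MAKING_MIGRATIONS` environment variable from the earlier snippet; the helper names `making_migrations_flag` and `making_migrations_flag_set` are hypothetical.

```python
import os
from contextlib import contextmanager

FLAG = 'MAKING_MIGRATIONS'  # environment variable name from the thread


@contextmanager
def making_migrations_flag():
    """Set the flag for the duration of the block and always clear it."""
    os.environ[FLAG] = 'true'
    try:
        yield
    finally:
        # Runs even if the wrapped command raises, so the flag cannot leak.
        os.environ.pop(FLAG, None)


def making_migrations_flag_set():
    # A dictionary lookup, so it stays cheap even when called once per shard.
    return os.environ.get(FLAG) == 'true'
```

A router could call `making_migrations_flag_set()` in place of the stack inspection, while any other call that arrives without a model_name can still be treated as suspicious.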

I have one more PR to put in this evening that I would like to get put into the release, will open it up after I get some time to write unit tests.

Cool. 👍

On Thu, Sep 8, 2016, 7:09 PM Titus Peterson wrote:

> I have one more PR to put in this evening that I would like to get put into the release, will open it up after I get some time to write unit tests.

Alright, all the tests are passing on my project but I didn't get to this two days ago and I'm taking a few days off work. The day I get in I'm going to push it to our CI server and release it if I don't find anything.

Released.