mhoye / gitcoach

This is a coaching tool for Git intended to help identify codependent pieces of code.

Crashes with memory error

MattSeen opened this issue

Machine Spec

  • Windows 7 Professional 32 bit
  • 64 bit machine
  • 4 GB RAM

Software

  • Python 2.7.1
  • gitcoach 0.2.2

Traceback (most recent call last):
  File "C:\Python27\Scripts\gitlearn-script.py", line 9, in <module>
    load_entry_point('gitcoach==0.2.2', 'console_scripts', 'gitlearn')()
  File "C:\Python27\lib\site-packages\gitcoach-0.2.2-py2.7.egg\gitcoach\commands.py", line 39, in learn
    correlations = l.find_correlations(t1)
  File "C:\Python27\lib\site-packages\gitcoach-0.2.2-py2.7.egg\gitcoach\learn.py", line 24, in find_correlations
    correlations[combotuple] += 1
MemoryError

Thanks for reporting this. It seems like the obvious solution is to not hold all the training data in memory.
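For anyone following along, here is a rough sketch of the kind of in-memory counting that the failing line (`correlations[combotuple] += 1`) suggests. This is not gitcoach's actual code: the function name, the input shape (a list of file lists), and the choice to count pairs only are assumptions made for illustration, and it is written for Python 3 rather than the 2.7 used above.

```python
from collections import defaultdict
from itertools import combinations


def count_pairs_in_memory(commits):
    """Count how often each pair of files changes in the same commit.

    `commits` is an iterable of lists of file paths (hypothetical input shape).
    Every distinct pair ever seen stays in the dict, so memory grows with
    the number of distinct pairs, not with the number of commits.
    """
    correlations = defaultdict(int)
    for files in commits:
        # Sort and deduplicate so (a, b) and (b, a) share one key.
        for combo in combinations(sorted(set(files)), 2):
            correlations[combo] += 1
    return correlations


example = [["a.py", "b.py"], ["a.py", "b.py", "c.py"], ["c.py"]]
counts = count_pairs_in_memory(example)
print(counts[("a.py", "b.py")])  # prints 2
```

Nothing is ever evicted from the dictionary, which is why keeping the counts somewhere other than RAM looks like the natural fix.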

Out of curiosity: How large is your repository? How many commits?

Certainly makes sense.
Number of commits: 1384

Number of files: 5312

I've been testing on repos with more than 20K commits and about that number of files, so I'm surprised you're running out of memory.

I'll look into using an SQLite database instead of a pickle dump for storage; that should solve the slowness problems too.
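To make the idea concrete, here is a minimal sketch of what SQLite-backed counting could look like using only the standard-library `sqlite3` module. The table name, column names, and helper functions are all hypothetical; the schema that actually ended up in gitcoach may look quite different.

```python
import sqlite3
from itertools import combinations


def open_store(path=":memory:"):
    """Open (or create) a pair-count store. Pass a file path to keep it on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS correlations ("
        " file_a TEXT NOT NULL,"
        " file_b TEXT NOT NULL,"
        " times_seen INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (file_a, file_b))"
    )
    return conn


def record_commit(conn, files):
    """Increment the counter for every pair of files touched by one commit."""
    pairs = list(combinations(sorted(set(files)), 2))
    with conn:  # one transaction per commit; commits on success
        # Create missing rows first, then bump every row once.
        conn.executemany(
            "INSERT OR IGNORE INTO correlations (file_a, file_b) VALUES (?, ?)",
            pairs,
        )
        conn.executemany(
            "UPDATE correlations SET times_seen = times_seen + 1"
            " WHERE file_a = ? AND file_b = ?",
            pairs,
        )


def lookup(conn, file_a, file_b):
    """Return how many commits touched both files (0 if never seen together)."""
    a, b = sorted((file_a, file_b))
    row = conn.execute(
        "SELECT times_seen FROM correlations WHERE file_a = ? AND file_b = ?",
        (a, b),
    ).fetchone()
    return row[0] if row else 0
```

With this layout the counts live on disk instead of in a Python dict, so RAM stops being the limit. The number of rows still grows with the number of distinct pairs, though, so a handful of very large commits can still make the file big.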

Cool, can't wait to try it out.

OK, I redid storage with SQLite. Let's see if that works. I pushed it to my fork, so you can run setup.py install -f from there, or use this if you have pip:

pip install --upgrade git+git://github.com/tarmstrong/gitcoach.git

Try that out. If that works better, I'll merge it into master. Thanks!

I was just trying out your change. The gitlearn SQL file was getting ridiculously big: 7 GB when I stopped it.

I threw in a few prints and noticed that it's choking on several commits in our code base that add and remove a huge number of files.

Ah, that makes a lot of sense. Finding the combinations of a list with a lot of files in it would take a while.
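A quick back-of-the-envelope calculation shows how fast this blows up. The thread doesn't say whether gitcoach counts only pairs or combinations of every size, so the sketch below computes both; the helper names are made up for illustration, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def pair_count(n_files):
    """Number of unordered file pairs in a commit touching n_files files."""
    return comb(n_files, 2)


def subset_count(n_files):
    """Number of file combinations of size 2 or more (all sizes considered)."""
    return 2 ** n_files - n_files - 1


for n in (6, 100, 1000):
    print(n, pair_count(n))
# 6 15
# 100 4950
# 1000 499500

print(subset_count(6))   # 57
print(subset_count(20))  # 1048555
```

Even in the pairs-only case, a single commit that touches a thousand files contributes about half a million entries, which fits the storage file ballooning on a few huge commits.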

I added some code (see #16) that throws away commits with more than 6 files touched, and a command-line argument that lets you tweak that number. Give that a shot.
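The filtering idea can be sketched in a few lines. This is not the code from #16: the function name, the `max_files` parameter, and the input shape are assumptions; only the default cutoff of 6 comes from the description above.

```python
def filter_commits(commits, max_files=6):
    """Yield only commits that touch at most `max_files` files.

    `commits` is an iterable of lists of file paths (hypothetical input shape).
    Larger commits are skipped entirely rather than truncated.
    """
    for files in commits:
        if len(files) <= max_files:
            yield files


history = [
    ["a.py", "b.py"],
    ["f%d.py" % i for i in range(500)],  # e.g. a bulk import or mass rename
    ["c.py"],
]
kept = list(filter_commits(history))
print(len(kept))  # prints 2
```

Dropping big commits loses little signal, since a commit that touches hundreds of files usually says little about which specific files depend on each other.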

Sweet. The flag helps big time. Thank you.

Great -- thanks for testing it out!