JuliaLang / julia

The Julia Programming Language

Home Page: https://julialang.org/


activate LLVM37

vtjnash opened this issue · comments

this is an umbrella issue for tracking known issues with switching to a newer LLVM as the default (originally 3.5, now 3.7). please edit this post or add a comment below. close when someone edits deps/Versions.make to 3.7.0 (previously targeted 3.6.0)

current known issues:

  • requires editing deps/Versions.make - #14623
  • #7910 (upstream patch committed to 3.7 -- only an issue on Mac)
  • figure out performance regression (perhaps only run passes once; requires writing a TargetMachine wrapper and overloading addPassesToEmitMC, which in turn requires changes to LLVM to expose passes)
  • requires leaving behind LLVM33
  • would eliminate content from the "fixed in LLVM 3.5" issue label
  • #8137
  • #8999
  • #9280
  • #9339 (LLVM36-only issue)
  • #9967 (fixed in LLVM36)
  • #10377
  • #10394
  • #11085 linalg/arnoldi test on win64
  • #11083
  • #10806 (afaict, this is a non-fatal issue in valgrind. jwn)
  • #11003
  • #13106 (this issue is a non-fatal performance regression. jwn)
  • re-disable FPO

related:

  • #7779 coverage flags are broken with LLVM 3.5 on master

@Keno

Also good to note that 3.5.1 just entered rc phase: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-December/079673.html

[EDIT: Removed checklist as it was moved to the list above - @Keno]

We won't have backtraces on 3.5, isn't that pretty much a nogo?

ok, with my new commits, that pretty much covers tkelman's issues

Really cool

I am still getting the numbers test failure on latest master, win64. #7728 (comment)

that might be platform dependent. also, have you done a full rebuild of the sysimg? (fwiw, the numbers test seems to just hang for me)

Was on a sandy bridge, and I did make cleanall beforehand. I'm recompiling LLVM 3.5.0 from scratch now in case our compile flags have changed in some way.

they haven't (afaics), i just wanted to make sure you weren't pulling in any cached sys.dll code that might have been affected by the llvm copyprop bug

I'm going to guess 04893a1 or something similar may have fixed the numbers test failure with LLVM 3.5.0, but I'm getting linalg failures now - https://gist.github.com/tkelman/8c409a7083531765027c

i messed up the alignment in ca15a28, and some other stuff too. it was probably the cause of that failure. fixed in 8cea71e

I was wondering why the numbers test looked broken again, thought I was going a little nuts.

We should probably also take a look at performance metrics before flipping the switch. On my llvm-svn build, building the sys image (touch base/sysimg.jl && time make -j2) takes 4 minutes as compared to 2 minutes on LLVM3.3

That's because we run all the passes twice ;). But yes, that's a TODO item.

Derp. Well that makes sense then!

@Keno is that just applicable when building the sysimg?

No, MCJIT does not yet have a way to alter the passes that it runs as far as I'm aware (though I was promised this would be possible in the near future), this means that e.g. our SIMD lowering pass would not be run if we didn't run all the passes ourselves (I suppose we could just set MCJIT's opt level to None, but I think that actually also disabled optimizations in the backend).

ah, i see that now. It appears we would need to create a TargetMachine wrapper class and overload addPassesToEmitMC in order to set the passes list. Not very direct, but also not very difficult.

Yes, that would probably work.

so if someone patches that, and we merge your patches for #7910 into the https://github.com/JuliaLang/llvm branch, i think we might be ready to switch to llvm35

Sounds right to me.

Should we be shooting for 3.6.0 now that it's out? http://llvm.org/releases/

it looks like we could create a micro-branch to trivially address the remaining item on the above list and make the switch

Anybody else see a failure on the complex test with 3.6.0?

exception on 3: ERROR: LoadError: test failed: isequal(sqrt(complex(0.0,-0.0)),complex(0.0,-0.0))
 in expression: isequal(sqrt(complex(0.0,-0.0)),complex(0.0,-0.0))
 in error at ./error.jl:19
 in default_handler at ./test.jl:27
 in do_test at ./test.jl:50
 in runtests at /home/tkelman/Julia/julia/test/testdefs.jl:79
 in anonymous at ./multi.jl:833
 in run_work_thunk at ./multi.jl:584
 in anonymous at ./multi.jl:833
while loading complex.jl, in expression starting on line 7
        From worker 2:       * broadcast            in  27.41 seconds
ERROR: LoadError: LoadError: test failed: isequal(sqrt(complex(0.0,-0.0)),complex(0.0,-0.0))
 in expression: isequal(sqrt(complex(0.0,-0.0)),complex(0.0,-0.0))
 in anonymous at ./task.jl:1370
while loading complex.jl, in expression starting on line 7
while loading /home/tkelman/Julia/julia/test/runtests.jl, in expression starting on line 3

        From worker 3:       * complex             make[1]: *** [all] Error 1
make: *** [test] Error 2

tkelman@ygdesk:~/Julia/julia$ ./julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+3637 (2015-03-01 18:30 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 74c71fb* (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> versioninfo()
Julia Version 0.4.0-dev+3637
Commit 74c71fb* (2015-03-01 18:30 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY PENRYN)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.6.0

julia> sqrt(complex(0.0,-0.0))
0.0 + 0.0im

julia> isequal(sqrt(complex(0.0,-0.0)), complex(0.0,-0.0))
false

Hmm, that shouldn't occur: the relevant lines in sqrt are:

x, y = reim(z)
if x==y==0
    return Complex(zero(x),y)
end

Changing y from -0.0 to 0.0 isn't a valid (IEEE 754) transformation. A similar problem occurred in #9880 (comment).

Unfortunately, even after playing around with the pass order, I'm still seeing significant performance regressions in compile time (about 2x during the linalg testsuite, which is the most codegen heavy we have). We should do some profiling to see if this can't be improved.

Once everything is finally fixed and we make the switch, we should remember to go looking for places in the code or tests with LLVM-related todo's.

CI failure in #10741 points out that we are going to need a newer build toolchain on travis because LLVM now requires gcc > 4.7, whereas Travis is still on 4.6 (travis-ci/travis-ci#979, travis-ci/travis-ci#1379).

since we are still targeting msvc, i think we'll need to make sure our code is compatible with both

Both what? We can poke Elliot to help update the ppa for llvm when needed, we can probably use the toolchain ppa (which should have 4.8, 4.9, etc) to build there.

Presumably both gcc and vc++... On that note, LLVM now requires VS2013+, which supports a decent subset of C++11.

What is the status of pass control/ordering issue?

Do we still need this: "It appears we would need to create a TargetMachine wrapper class and overload addPassesToEmitMC in order to set the passes list. Not very direct, but also not very difficult."

Or did someone write that already?

that's item 3 above

Only run passes once (requires writing a TargetMachine wrapper and overloading addPassesToEmitMC) (requires changes to LLVM to expose passes)

also, I realized the proposed wrapper class was impossible to create in C++

I realized the proposed wrapper class was impossible to create in C++

Mind to clarify: does "impossible" refer to the fact that the real target implementations we need (X86Target, etc.) are all themselves derived from llvm::LLVMTargetMachine (via llvm::TargetMachine)?

requires changes to LLVM to expose passes

Are you aware of any existing patches?

One simple idea is to add EngineBuilder::setMCJITPassList(std::vector<Pass*>) and then have MCJIT add all of the passes each time emitObject is called, before calling addPassesToEmitMC.

Are you aware of any existing patches?

the existing mechanisms are deprecated, although the replacement is not yet written

One simple idea is to add EngineBuilder::setMCJITPassList(std::vector<Pass*>) and then have MCJIT add all of the passes each time emitObject is called, before calling addPassesToEmitMC.

yeah, that's the right part of the code that we would want to modify in a custom build of llvm to allow us to put our own pass list there instead of the pass list from the TargetMachine

is any of the new orcjit stuff in 3.6.0 and/or useful for us?

OrcJIT didn't make it into 3.6 AFAICT.

http://llvm.1065342.n5.nabble.com/Re-Modify-a-module-at-runtime-in-MCJIT-td79638.html

I saw this:
"The Orc layers provide a "removeModuleSet" method that enables you to delete the JIT state associated with a particular module (or set of modules) that you have added. This saves you from manually managing multiple MCJIT interfaces."

As something potentially useful out of OrcJIT.

Yes, Orc is quite good and also organizes some of the messy code we have in julia to deal with MCJIT quirks. I suspect we might move to it in the near future.

figure out performance regression compared to JIT

a quick profiling run on OS X with llvm3.6.0+Release+Asserts seems to indicate the following time losses:

  • addPassesToEmitMC (~8% of total function emission time)
  • ~PassManagerImpl (~3% of total function emission time)
  • FPPassManager::doInitialization (~1.5% of total emission time)
  • FPPassManager::doFinalization (~6% of total emission time)
  • PMTopLevelManager::findAnalysisPass / PMTopLevelManager::findAnalysisPassInto (~9% of total emission time, in Assertions build only)
  • FPM->run(f) (adds ~40% to total emission time, mostly in instruction combining, but this time is not being double-counted, since those passes don't rerun during emitObject)

some other info:

  • running the AsmPrinter ~5%
  • physical register allocation (greedy) ~5%
  • live variable analysis run ~2%
  • machine scheduling (processor pipeline optimization) ~5%
  • lowering llvm opcodes to sdnode opcodes ~20%
    • SDNode combination optimizations ~5%
  • relocation ~negligible
  • totals are given as a % of time spent in MCJIT::emitObject for a call to ./julia invalid_arg

edit: by comparison to llvm3.3, the total function optimization + emission time seems to be ~160 ms vs 360 ms and seems to be a combination of several existing passes becoming much more expensive (register allocation, instruction combining) and the addition of a few more intermediates (PassManager, assembly printing)

#10806 might also be good to have on this list, though it is unclear how serious of an issue it is.

@tkelman mentioned that codegen bugs in LLVM33 were holding up some of the important array changes. Could we just switch to LLVM36 and carry along patches as we already do now for LLVM33?

Specifically I was referring to #10525 (comment)

We don't have proposed patches yet for all of the not-yet-checked boxes in here, but I would be in favor of @Keno trying to bring his patches into the repo here to at least temporarily solve any problems that we know the solution to.

+1 to that.

I just tested LLVM 3.6.1 yesterday and it passes all the julia tests. =)

So there is one week left to nominate patches for 3.6.2

https://groups.google.com/d/msg/llvm-dev/wEnVJk3akns/IlgEvBGOpp8J

Re: exposing passes, looks like the LLILC folks are doing some useful work on top of ORC that we might be able to use if it gets upstreamed (or just carry locally, it's also MIT-licensed)? dotnet/llilc#685

3.7.0 is out, I'm guessing there's no really strong reason to try too hard to make 3.6 work other than for distro packagers.

please only add blocking issues to the list at the top, not performance regressions and such

Isn't the massive codegen performance regression the last blocking issue?

yes, but that is already in the list above, and is a regression in codegen itself, not in the generated code

Between fast-isel, my most recent commit and an LLVM patch which I'll submit upstream tomorrow, I've gotten the bootstrap building time on llvm-svn down from 10 min to 2min5s, with 3.3 taking about 1m45s. Need to do some more extensive benchmarking to see what the remaining regressions are.

edit: link to recent commit is 699dd2d

How big of an impact is the patch? Will we be waiting for 3.8, or can we carry the patch locally against 3.7.0?

11 lines

The best patches are small.

I think that means it's time to switch master to our own form of LLVM with that patch on it. What do you say, @Keno?

s/form/fork/

A patch file that applies to a release tarball of llvm would be preferable to relying on a git fork

Uh, why? We need to track LLVM SVN without being subject to random breakage whenever they decide to change APIs. Maintaining our own fork of LLVM that tracks upstream while rebasing a few patches and fixing breakage periodically seems far easier than anything else. This is literally what git rebase is for.

Do we actually need to track llvm svn on master? Picking an arbitrary unsupported development version of such a large dependency to pin to makes it impossible for distros to package julia. We would also need to leave these branches in place permanently as is, otherwise building past points in julia master history will not be reproducible for future bisecting.

If these patches are small we can nominate them for being backported to llvm 3.7.1. If we can structure our requirements as patch files relative to release versions of llvm rather than our own diverging fork that is only relevant to us, it makes for much better release stability discipline on our part.

I agree with @tkelman. If we want to continue providing nightly packages, better stick to a stable release (possibly with a few patches if needed). I can't impose on the Fedora infrastructure to rebuild a custom LLVM fork everyday with Julia.

Our first priority needs to be making it less impossible to use crucial projects like Cxx, Gallium and the threading branch – all of which need a fairly current version of LLVM. At this point, it really needs to be possible to use all of those with a vanilla master version of Julia. Otherwise these projects are never going to get the kind of testing and development that they need to become widely usable. As soon as there's a release of LLVM with which we can support all of those, we can use that. This might happen with LLVM 3.9, which should be out by the time Julia 0.5 is ready.

Regarding distros, we can maintain compatibility with whatever versions of LLVM they are willing to ship. However, the current policy of forcing programming languages to use whatever random version of LLVM they happen to package essentially guarantees that things are going to be broken and shitty. We have no control over that policy, but I refuse to let distro policies act as a boat anchor that slows down all of Julia development any more than it already has.

@StefanKarpinski I don't know if you're aware, but there has been a fair amount of effort by @staticfloat to set up nightly buildbots using llvm svn and to build nightlies with the Cxx.jl build configuration. Those have never built successfully. We need to fix that first before imposing that level of brokenness and difficulty in building on master.

This isn't just about distros, it's also about not getting in the business of indefinitely maintaining our own custom fork of llvm. We need to get things upstreamed and use upstream's release numbering as some indication that things have been tested and will remain stable within that release series.

Yes, of course, getting things working is necessary. I also guarantee you the fastest way to get things fixed is to switch Julia master over to using LLVM SVN.

And yes, of course we need to upstream patches to LLVM. But we need to do that either way, so it's irrelevant to this debate. Waiting for things to get upstreamed before using them is killing the progress of this project. It needs to stop now. If we keep Julia master on a released version of LLVM that means the earliest we can switch to a newer LLVM is 3.7.1, which might happen in November. But the patch might not make it into the 3.7.1, which means that the earliest we can update LLVM is February. Which is when Julia 0.5 is due – but even if LLVM 3.8 is released before Julia 0.5, it will be too risky to update LLVM that late in the release cycle. That would mean Julia 0.5 being released still using LLVM 3.3. That is totally unacceptable – LLVM 3.3 will be almost three years old at that point and being stuck on it has completely blocked crucial advances to the project.

I'm sorry, there's no other choice here: we need to start getting people using a newer version of LLVM, which requires Julia master using a forked copy of LLVM SVN for a while.

getting in the business of indefinitely maintaining our own custom fork of llvm.

Is there any serious LLVM user who doesn't maintain their own fork? We can't force upstream to take patches on any specific timeframe. If we have to maintain a fork in order to push things forward, then so be it. I'm happy to volunteer to help maintain the fork if it means that we can rid ourselves of the current morass of ifdefs, get some semblance of sanity for building from LLVM trunk on Windows, and get more people pushing hard on debugging improvements.

(IMHO improved debugging is by far the highest-leverage thing we can do right now to increase the velocity of base development, on which so much else rests).

cxx.jl build configuration. Those have never built successfully.

Is there an issue open about this?

Wasn't the proposal to switch master to LLVM 3.7 with patches now?

I don't see how using LLVM 3.7 with patches is better than a periodically updated fork. Since we do upstream fixes, tracking LLVM SVN means that the number of patches we need to maintain gets smaller as the changes get accepted upstream, and we get more testing with upstream changes for longer.

The model here should be that Julia master tracks LLVM master until we're ready for a release, at which point we can stabilize on the most recently released LLVM version, possibly with some patches.

There's a cxx.jl issue about providing nightlies for ci with more details. edit: JuliaInterop/Cxx.jl#73

3.7 with patches is better IMO because it's a known reproducible quantity as opposed to switching to an ephemeral git branch model. Switching back and forth between a fork and releases sounds like extra work when we know julia releases will have to use releases of llvm. That's a ways off right now, but I don't want to get in the situation that you can't build 3-month old master to bisect an issue because the branches have all changed.

For appveyor to continue working we also need to provide new windows llvm binaries for win32 and win64 every time llvm changes. If we get shared library llvm to work on windows (which it never has aiui) that might be fairly simple to provide via buildbot binaries. Otherwise we have to put static libs and headers in the nightly binaries or a separate deps upload, or manually build by hand repeatedly.

If we get shared library llvm to work on windows (which it never has aiui) that might be fairly simple to provide via buildbot binaries.

https://github.com/JuliaLang/llvm/pull/1 (builds fine, haven't run tests recently)

Some of the difficulties with building newer LLVM versions is just that they start to use kernel and compiler features that are newer than our ancient CentOS 5 buildbots! Once 0.4 final is released, we can experiment with CentOS 6 buildbots to see if they work any better. Note that this means people with glibc older than 2.12 will not be able to use our binaries, or build Julia from source. That's probably fine, as I think most scientific clusters (the most difficult customers of all) have moved on to CentOS/RH 6.X already. Not an assertion I can back up with hard data though. :P

Right now the llvm svn and cxx nightlies buildbots are separate jobs than the mainline release builders, so some of that centos 6 experimentation we could start now right?

If one is tracking the bleeding edge of LLVM, is it wise to use a wonderfully-solid-but-invariably-behind distro like CentOS?

Whether using LLVM 3.7 with patches is viable or not largely depends on how many patches are needed. We need at least this 11-line patch to fix the code gen regression, which is quite small. But that's not all we want to support. I think that more modifications are needed to support threading, Cxx and Gallium. I may be wrong about that at this point – I haven't kept track of everything that ended up in LLVM 3.7.

Keeping a rebased patch set on an LLVM fork is a strictly more flexible model than the official release + patches model. We can emulate the release + patches model simply by setting the LLVM fork to the commit of that release. The bisect argument strikes me as specious for a couple of reasons: even if we track LLVM SVN more closely than every release, we would still only update it every month or so, and bisect would work fine within that range of commits; also, the LLVM build system handles incremental rebuilds just fine, so it wouldn't be hard to make sure that bisect works well even with frequently changing LLVM source, as long as we've got our Makefiles right.

We need to build on an old system to make the binaries generic and possible to run anywhere newer. We rebuild gcc on the buildbots but have less control over kernel and glibc.

Everything I said about appveyor applies equally to the llvm package in the juliadeps ppa for travis too, we'll need to keep that frequently updated.

I don't think I explained the bisect issue clearly enough. If we start using an llvm fork on julia master, we need to treat the llvm branch like master or a release branch and not force push or rebase on it. Any new rebase should be a new branch, so previously-referred-to commits from past versions of julia can still be fetched.

edit: we should try our best to make it possible to always check out an arbitrary point in Julia's history and do a clean build from there, which referring to a git branch that has changed and may no longer build with Julia from source would make harder.

As Tony says, I'm not too afraid of getting a build environment that works, I think the bigger impact will be raising the hard limit of what systems Julia 0.5 works on. At our current release pace, it's entirely likely that every old curmudgeonly computer cluster out there will be compatible with glibc 2.12 by the time 0.5 goes final (har har) but it's hard to tell without some kind of hard data.

I'm also in agreement with @tkelman; we will need to start treating our llvm fork like openspecfun or openlibm, and ensuring that we don't accidentally rebase out old commits. If that just means that we create a new branch every time we want to rebase stuff to make it prettier (thereby forcing git to keep that old somewhat-redundant history around), that's just what we'll have to do.

I think the bigger impact will be raising the hard limit of what systems Julia 0.5 works on.

On that note, LLVM3.7 is the last release to support Windows XP as a runtime platform. (to be clear, I'm still in favor of bumping)

an LLVM patch which I'll submit upstream tomorrow

@Keno can you post a link?

Need to do some more extensive benchmarking to see what the remaining regressions are.

my profiling attempts earlier (#9336 (comment)) indicated that another 10% loss comes from having Assertions turned on while having two FunctionPassManagers and another 20% loss from constructing a new FPM for every module

That's not my concern though. I suspect MCJIT still has a couple more O(N^2) cases in the number of modules. Haven't submitted the patch yet, but it's just:

diff --git a/include/llvm/ExecutionEngine/SectionMemoryManager.h b/include/llvm/ExecutionEngine/SectionMemoryManager.h
index 0b0dcb0..446c887 100644
--- a/include/llvm/ExecutionEngine/SectionMemoryManager.h
+++ b/include/llvm/ExecutionEngine/SectionMemoryManager.h
@@ -84,6 +84,7 @@ public:

 private:
   struct MemoryGroup {
+      SmallVector<sys::MemoryBlock, 16> PendingMem;
       SmallVector<sys::MemoryBlock, 16> AllocatedMem;
       SmallVector<sys::MemoryBlock, 16> FreeMem;
       sys::MemoryBlock Near;
diff --git a/lib/ExecutionEngine/SectionMemoryManager.cpp b/lib/ExecutionEngine/SectionMemoryManager.cpp
index b22c6db..e319216 100644
--- a/lib/ExecutionEngine/SectionMemoryManager.cpp
+++ b/lib/ExecutionEngine/SectionMemoryManager.cpp
@@ -83,7 +83,7 @@ uint8_t *SectionMemoryManager::allocateSection(MemoryGroup &MemGroup,
   // Save this address as the basis for our next request
   MemGroup.Near = MB;

-  MemGroup.AllocatedMem.push_back(MB);
+  MemGroup.PendingMem.push_back(MB);
   Addr = (uintptr_t)MB.base();
   uintptr_t EndOfBlock = Addr + MB.size();

@@ -138,6 +138,13 @@ bool SectionMemoryManager::finalizeMemory(std::string *ErrMsg)
   // relocations) will get to the data cache but not to the instruction cache.
   invalidateInstructionCache();

+  // Now, remember that we have successfully applied the permissions to avoid
+  // having to apply them again.
+  CodeMem.AllocatedMem.append(CodeMem.PendingMem.begin(),CodeMem.PendingMem.end());
+  RODataMem.AllocatedMem.append(RODataMem.PendingMem.begin(),RODataMem.PendingMem.end());
+  CodeMem.PendingMem.clear();
+  RODataMem.PendingMem.clear();
+
   return false;
 }

@@ -145,7 +152,7 @@ std::error_code
 SectionMemoryManager::applyMemoryGroupPermissions(MemoryGroup &MemGroup,
                                                   unsigned Permissions) {

-  for (sys::MemoryBlock &MB : MemGroup.AllocatedMem)
+  for (sys::MemoryBlock &MB : MemGroup.PendingMem)
     if (std::error_code EC = sys::Memory::protectMappedMemory(MB, Permissions))
       return EC;

@@ -153,7 +160,7 @@ SectionMemoryManager::applyMemoryGroupPermissions(MemoryGroup &MemGroup,
 }

 void SectionMemoryManager::invalidateInstructionCache() {
-  for (sys::MemoryBlock &Block : CodeMem.AllocatedMem)
+  for (sys::MemoryBlock &Block : CodeMem.PendingMem)
     sys::Memory::InvalidateInstructionCache(Block.base(), Block.size());
 }

since the MCJIT MemoryManager can be easily replaced, can we copy that code into our local repo to avoid needing to patch llvm versions, but get the benefits immediately?

Yes. We can. Also, submitted upstream: http://reviews.llvm.org/D13156.

👍 that was a quick one :)

I have found another ~15% of performance and also finally managed to get a decent profile here (wasn't easy - naive profiling shows 50% of time spent in malloc with no backtrace, because the OS X system library doesn't have unwind info). I am hopeful that we can get rid of the pass initialization and destruction by doing tricks with ORCJIT (or even MCJIT). I'm also hopeful we might be able to squeeze out the stuff in between Pass init/deinit and ISel, which from looking at it doesn't seem like it needs to take that long. For those wondering, the overhead of just MCJIT is the stuff to the left of init/right of deinit.

(screenshot: profiling trace of function emission, 2015-10-08)

FWIW, I think I have now found all the O(N^2) cases in the number of modules. Current performance results are about a 1.5-2x regression on average on the test suite. One outlier is unicode at 2.5x.

Good news: Was able to get rid of pass initialization overhead. Bad news: Didn't do much for performance numbers. Will do more benchmarking tomorrow.

Better news: I was benchmarking wrong and there actually does seem to be 14% improvement, which is in line with what I would have expected looking at the profile above.

Even better news: I realized I was accidentally still using LLVM_ASSERTIONS for llvm-svn. With the combination of those two we're down to 1.3-1.5x, which is starting to get close to where we can start talking.

What exactly are you comparing, when you say you get a 2.5x regression on unicode? The performance compiling all of unicode.jl on LLVM 3.3 vs svn?
Are there any particular things in the files included by unicode.jl that are harder for 3.8 to deal with?

I'm getting a segfault on win64 with LLVM 3.7.0 in the linalg/triangular test, not much backtrace though:

#0  0x00000000778494ba in ntdll!RtlVirtualUnwind () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Can open a separate issue if anyone has any inkling which LLVM version we're going to want to end up using, either 3.7.0 + patches or a protected checkpoint branch of 3.8-devel.

@tkelman added to the tracker list above (re-disable FPO)

bisected to 127ee86, if that makes any sense

  • re-disable FPO

I'm getting a segfault at the very start of bootstrap now with LLVM 3.7.0 on both win32 and win64. Expected?

$ gdb -q --args /home/Tony/julia/usr/bin/julia-debug.exe -C x86-64 --output-ji `cygpath -w /home/Tony/julia/usr/lib/julia/inference0.ji` -f coreimg.jl
Reading symbols from /home/Tony/julia/usr/bin/julia-debug.exe...done.
(gdb) r
Starting program: /home/Tony/julia/usr/bin/julia-debug.exe -C x86-64 --output-ji D:\\cygwin64\\home\\Tony\\julia\\usr\\lib\\julia\\inference0.ji -f coreimg.jl
[New Thread 4344.0x2284]

Program received signal SIGSEGV, Segmentation fault.
0x00000001002c00ba in julia.new_0 ()
(gdb) bt
#0  0x00000001002c00ba in julia.new_0 ()
#1  0x000000006ea1dc8c in jl_apply (f=0x1011eec70, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/julia.h:1400
#2  0x000000006ea21e5b in jl_trampoline (F=0x1011eec70, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/builtins.c:1021
#3  0x000000006ea1038e in jl_apply (f=0x1011eec70, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/julia.h:1400
#4  0x000000006ea16ec6 in jl_apply_generic (F=0x1011eebf0, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/gf.c:1916
#5  0x000000006ea27f5d in jl_apply (f=0x1011eebf0, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/julia.h:1400
#6  0x000000006ea2841d in do_call (f=0x1011eebf0, args=0x1011f6448, nargs=2, eval0=0x0,
    locals=0x0, nl=0, ngensym=0) at /home/Tony/julia/src/interpreter.c:65
#7  0x000000006ea2908c in eval (e=0x1011eec30, locals=0x0, nl=0, ngensym=0)
    at /home/Tony/julia/src/interpreter.c:213
#8  0x000000006ea280b9 in jl_interpret_toplevel_expr (e=0x1011eec30)
    at /home/Tony/julia/src/interpreter.c:27
#9  0x000000006ea46aa9 in jl_toplevel_eval_flex (e=0x1011eec10, fast=1)
    at /home/Tony/julia/src/toplevel.c:525
#10 0x000000006ea46d6f in jl_parse_eval_all (fname=0x6f9a5e1d <system_image_path+1309> "boot.jl",
    len=8) at /home/Tony/julia/src/toplevel.c:575
#11 0x000000006ea46fbf in jl_load (fname=0x6f9a5e1d <system_image_path+1309> "boot.jl", len=8)
    at /home/Tony/julia/src/toplevel.c:615
#12 0x000000006ea332b5 in _julia_init (rel=JL_IMAGE_JULIA_HOME) at /home/Tony/julia/src/init.c:575
#13 0x000000006ea34a02 in julia_init (rel=JL_IMAGE_JULIA_HOME) at /home/Tony/julia/src/task.c:278
#14 0x0000000000402e0e in wmain (argc=1, argv=0x2d70b00, envp=0x2d73800)
    at /home/Tony/julia/ui/repl.c:605
#15 0x000000000040140c in __tmainCRTStartup ()
    at /usr/src/debug/mingw64-x86_64-runtime-4.0.2-1/crt/crtexe.c:329
#16 0x000000000040153b in mainCRTStartup ()
    at /usr/src/debug/mingw64-x86_64-runtime-4.0.2-1/crt/crtexe.c:212

since that commit (40d46e7) is still waiting for CI verification before it lands on master, I can confidently say that is not the problem.

fixed @tkelman's finding with 6ee80d3

Just to update: my kf/modulecoalescing branch now uses the same or less memory than 3.3. It also passes all tests, so we're getting pretty close. I'm currently running msan/asan/valgrind to make sure there aren't any subtle memory bugs remaining, as well as tracking down a small performance regression (a couple %). Getting close.

I don't think I've ever been so excited about a totally invisible change.

Once it's no longer necessary to go to build heroics to use Gallium, it won't be invisible for long 😄.

I say we pull the plug on 3.3 if it is only a small regression.

It's not a small regression until that branch has gone through CI and been merged. LLVM 3.7.0 doesn't bootstrap at all on win32 with current master.

Yes, I'm in the process of cleaning up the patch. Should be ready very soon. I have verified that all the remaining performance regressions are essentially due to LLVM itself rather than the new vs old JIT. Some of those look solvable, but that should be done as part of a general effort to improve performance, rather than ad hoc now (I don't think we have the infrastructure in place yet to adequately measure and track performance here - that's something we should work on).