JuliaLang / julia

The Julia Programming Language

Home Page: https://julialang.org/


activate LLVM37

vtjnash opened this issue · comments

this is an umbrella issue for tracking known issues with switching to a newer LLVM as the default (originally 3.5, now 3.7). please edit this post or add a comment below. close when someone edits deps/Versions.make to 3.7.0 (previously targeted 3.6.0)

current known issues:

  • requires editing deps/Versions.make - #14623
  • #7910 (upstream patch committed to 3.7 -- only an issue on Mac)
  • figure out performance regression (perhaps only run passes once; requires writing a TargetMachine wrapper and overloading addPassesToEmitMC, which in turn requires changes to LLVM to expose passes)
  • requires leaving behind LLVM33
  • would eliminate content from the "fixed in LLVM 3.5" issue label
  • #8137
  • #8999
  • #9280
  • #9339 (LLVM36-only issue)
  • #9967 (fixed in LLVM36)
  • #10377
  • #10394
  • #11085 linalg/arnoldi test on win64
  • #11083
  • #10806 (afaict, this is a non-fatal issue in valgrind. jwn)
  • #11003
  • #13106 (this issue is a non-fatal performance regression. jwn)
  • re-disable FPO

related:

  • #7779 coverage flags are broken with LLVM 3.5 on master

@Keno

Also good to note that 3.5.1 just entered rc phase: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-December/079673.html

[EDIT: Removed checklist as it was moved to the list above - @Keno]

We won't have backtraces on 3.5, isn't that pretty much a nogo?

ok, with my new commits, that pretty much covers tkelman's issues

Really cool

I am still getting the numbers test failure on latest master, win64. #7728 (comment)

that might be platform dependent. also, have you done a full rebuild of the sysimg? (fwiw, the numbers test seems to just hang for me)

Was on a sandy bridge, and I did make cleanall beforehand. I'm recompiling LLVM 3.5.0 from scratch now in case our compile flags have changed in some way.

they haven't (afaics), i just wanted to make sure you weren't pulling in any cached sys.dll code that might have been affected by the llvm copyprop bug

I'm going to guess 04893a1 or something similar may have fixed the numbers test failure with LLVM 3.5.0, but I'm getting linalg failures now - https://gist.github.com/tkelman/8c409a7083531765027c

i messed up the alignment in ca15a28, and some other stuff too. it was probably the cause of that failure. fixed in 8cea71e

I was wondering why the numbers test looked broken again, thought I was going a little nuts.

We should probably also take a look at performance metrics before flipping the switch. On my llvm-svn build, building the sys image (touch base/sysimg.jl && time make -j2) takes 4 minutes as compared to 2 minutes on LLVM3.3

That's because we run all the passes twice ;). But yes, that's a TODO item.

Derp. Well that makes sense then!

@Keno is that just applicable when building the sysimg?

No, MCJIT does not yet have a way to alter the passes that it runs as far as I'm aware (though I was promised this would be possible in the near future), this means that e.g. our SIMD lowering pass would not be run if we didn't run all the passes ourselves (I suppose we could just set MCJIT's opt level to None, but I think that actually also disabled optimizations in the backend).

ah, i see that now. It appears we would need to create a TargetMachine wrapper class and overload addPassesToEmitMC in order to set the passes list. Not very direct, but also not very difficult.

Yes, that would probably work.

so if someone patches that, and we merge your patches for #7910 into the https://github.com/JuliaLang/llvm branch, i think we might be ready to switch to llvm35

Sounds right to me.

Should we be shooting for 3.6.0 now that it's out? http://llvm.org/releases/

it looks like we could create a micro-branch to trivially address the remaining item on the above list and make the switch

Anybody else see a failure on the complex test with 3.6.0?

exception on 3: ERROR: LoadError: test failed: isequal(sqrt(complex(0.0,-0.0)),complex(0.0,-0.0))
 in expression: isequal(sqrt(complex(0.0,-0.0)),complex(0.0,-0.0))
 in error at ./error.jl:19
 in default_handler at ./test.jl:27
 in do_test at ./test.jl:50
 in runtests at /home/tkelman/Julia/julia/test/testdefs.jl:79
 in anonymous at ./multi.jl:833
 in run_work_thunk at ./multi.jl:584
 in anonymous at ./multi.jl:833
while loading complex.jl, in expression starting on line 7
        From worker 2:       * broadcast            in  27.41 seconds
ERROR: LoadError: LoadError: test failed: isequal(sqrt(complex(0.0,-0.0)),complex(0.0,-0.0))
 in expression: isequal(sqrt(complex(0.0,-0.0)),complex(0.0,-0.0))
 in anonymous at ./task.jl:1370
while loading complex.jl, in expression starting on line 7
while loading /home/tkelman/Julia/julia/test/runtests.jl, in expression starting on line 3

        From worker 3:       * complex             make[1]: *** [all] Error 1
make: *** [test] Error 2

tkelman@ygdesk:~/Julia/julia$ ./julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+3637 (2015-03-01 18:30 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 74c71fb* (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> versioninfo()
Julia Version 0.4.0-dev+3637
Commit 74c71fb* (2015-03-01 18:30 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY PENRYN)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.6.0

julia> sqrt(complex(0.0,-0.0))
0.0 + 0.0im

julia> isequal(sqrt(complex(0.0,-0.0)), complex(0.0,-0.0))
false

Hmm, that shouldn't occur: the relevant lines in sqrt are:

x, y = reim(z)
if x==y==0
    return Complex(zero(x),y)
end

Changing y from -0.0 to 0.0 isn't a valid (IEEE 754) transformation. A similar problem occurred in #9880 (comment).

Unfortunately, even after playing around with the pass order, I'm still seeing significant performance regressions in compile time (about 2x during the linalg testsuite, which is the most codegen heavy we have). We should do some profiling to see if this can't be improved.

Once everything is finally fixed and we make the switch, we should remember to go looking for places in the code or tests with LLVM-related todo's.

CI failure in #10741 points out that we are going to need a newer build toolchain on travis because LLVM now requires gcc > 4.7, whereas Travis is still on 4.6 (travis-ci/travis-ci#979, travis-ci/travis-ci#1379).

since we are still targeting msvc, i think we'll need to make sure our code is compatible with both

Both what? We can poke Elliot to help update the ppa for llvm when needed, we can probably use the toolchain ppa (which should have 4.8, 4.9, etc) to build there.

Presumably both gcc and vc++... On that note, LLVM now requires VS2013+, which supports a decent subset of C++11.

What is the status of pass control/ordering issue?

Do we still need this: "It appears we would need to create a TargetMachine wrapper class and overload addPassesToEmitMC in order to set the passes list. Not very direct, but also not very difficult."

Or did someone write that already?

that's item 3 above

Only run passes once (requires writing a TargetMachine wrapper and overloading addPassesToEmitMC) (requires changes to LLVM to expose passes)

also, I realized the proposed wrapper class was impossible to create in C++

I realized the proposed wrapper class was impossible to create in C++

Mind to clarify: does "impossible" refer to the fact that the real target implementations we need (X86Target, etc.) are all themselves derived from llvm::LLVMTargetMachine (via llvm::TargetMachine)?

requires changes to LLVM to expose passes

Are you aware of any existing patches?

One simple idea is to add EngineBuilder::setMCJITPassList(std::vector<Pass*>) and then have MCJIT add all of the passes each time emitObject is called, before calling addPassesToEmitMC.

Are you aware of any existing patches?

the existing mechanisms are deprecated, although the replacement is not yet written

One simple idea is to add EngineBuilder::setMCJITPassList(std::vector<Pass*>) and then have MCJIT add all of the passes each time emitObject is called, before calling addPassesToEmitMC.

yeah, that's the right part of the code that we would want to modify in a custom build of llvm to allow us to put our own pass list there instead of the pass list from the TargetMachine

is any of the new orcjit stuff in 3.6.0 and/or useful for us?

OrcJIT didn't make it into 3.6 AFAICT.

http://llvm.1065342.n5.nabble.com/Re-Modify-a-module-at-runtime-in-MCJIT-td79638.html

I saw this:
"The Orc layers provide a "removeModuleSet" method that enables you to delete the JIT state associated with a particular module (or set of modules) that you have added. This saves you from manually managing multiple MCJIT interfaces."

As something potentially useful out of OrcJIT.

Yes, Orc is quite good and also organizes some of the messy code we have in julia to deal with MCJIT quirks. I suspect we might move to it in the near future.

figure out performance regression compared to JIT

a quick profiling run on OS X with llvm3.6.0+Release+Asserts seems to indicate the following time losses:

  • addPassesToEmitMC (~8% of total function emission time)
  • ~PassManagerImpl (~3% of total function emission time)
  • FPPassManager::doInitialization (~1.5% of total emission time)
  • FPPassManager::doFinalization (~6% of total emission time)
  • PMTopLevelManager::findAnalysisPass / PMTopLevelManager::findAnalysisPassInto (~9% of total emission time, in Assertions build only)
  • FPM->run(f) (adds ~40% to total emission time, mostly in instruction combining, but this time is not being double-counted, since those passes don't rerun during emitObject)

some other info:

  • running the AsmPrinter ~5%
  • physical register allocation (greedy) ~5%
  • live variable analysis run ~2%
  • machine scheduling (processor pipeline optimization) ~5%
  • lowering llvm opcodes to sdnode opcodes ~20%
    • SDNode combination optimizations ~5%
  • relocation ~negligible
  • totals are given as a % of time spent in MCJIT::emitObject for a call to ./julia invalid_arg

edit: by comparison to llvm3.3, the total function optimization + emission time seems to be ~160 ms vs 360 ms and seems to be a combination of several existing passes becoming much more expensive (register allocation, instruction combining) and the addition of a few more intermediates (PassManager, assembly printing)

#10806 might also be good to have on this list, though it is unclear how serious of an issue it is.

@tkelman mentioned that codegen bugs in LLVM33 were holding up some of the important array changes. Could we just switch to LLVM36 and carry along patches as we already do now for LLVM33?

Specifically I was referring to #10525 (comment)

We don't have proposed patches yet for all of the not-yet-checked boxes in here, but I would be in favor of @Keno trying to bring his patches into the repo here to at least temporarily solve any problems that we know the solution to.

+1 to that.

I just tested LLVM 3.6.1 yesterday and it passes all the julia tests. =)

So there is one week left to nominate patches for 3.6.2

https://groups.google.com/d/msg/llvm-dev/wEnVJk3akns/IlgEvBGOpp8J

Re: exposing passes, looks like the LLILC folks are doing some useful work on top of ORC that we might be able to use if it gets upstreamed (or just carry locally, it's also MIT-licensed)? dotnet/llilc#685

3.7.0 is out, I'm guessing there's no really strong reason to try too hard to make 3.6 work other than for distro packagers.

please only add blocking issues to the list at the top, not performance regressions and such

Isn't the massive codegen performance regression the last blocking issue?

yes, but that is already in the list above, and is a regression in codegen itself, not in the generated code

Between fast-isel, my most recent commit and an LLVM patch which I'll submit upstream tomorrow, I've gotten the bootstrap building time on llvm-svn down from 10 min to 2min5s, with 3.3 taking about 1m45s. Need to do some more extensive benchmarking to see what the remaining regressions are.

edit: link to recent commit is 699dd2d

How big of an impact is the patch? Will we be waiting for 3.8, or can we carry the patch locally against 3.7.0?

11 lines

The best patches are small.

I think that means it's time to switch master to our own form of LLVM with that patch on it. What do you say, @Keno?

s/form/fork/

A patch file that applies to a release tarball of llvm would be preferable to relying on a git fork

Uh, why? We need to track LLVM SVN without being subject to random breakage whenever they decide to change APIs. Maintaining our own fork of LLVM that tracks upstream while rebasing a few patches and fixing breakage periodically seems far easier than anything else. This is literally what git rebase is for.

Do we actually need to track llvm svn on master? Picking an arbitrary unsupported development version of such a large dependency to pin to makes it impossible for distros to package julia. We would also need to leave these branches in place permanently as is, otherwise building past points in julia master history will not be reproducible for future bisecting.

If these patches are small we can nominate them for being backported to llvm 3.7.1. If we can structure our requirements as patch files relative to release versions of llvm rather than our own diverging fork that is only relevant to us, it makes for much better release stability discipline on our part.

I agree with @tkelman. If we want to continue providing nightly packages, better stick to a stable release (possibly with a few patches if needed). I can't impose on the Fedora infrastructure to rebuild a custom LLVM fork everyday with Julia.

Our first priority needs to be making it less impossible to use crucial projects like Cxx, Gallium and the threading branch – all of which need a fairly current version of LLVM. At this point, it really needs to be possible to use all of those with a vanilla master version of Julia. Otherwise these projects are never going to get the kind of testing and development that they need to become widely usable. As soon as there's a release of LLVM with which we can support all of those, we can use that. This might happen with LLVM 3.9, which should be out by the time Julia 0.5 is ready.

Regarding distros, we can maintain compatibility with whatever versions of LLVM they are willing to ship. However, the current policy of forcing programming languages to use whatever random version of LLVM they happen to package essentially guarantees that things are going to be broken and shitty. We have no control over that policy, but I refuse to let distro policies act as a boat anchor that slows down all of Julia development any more than it already has.

@StefanKarpinski I don't know if you're aware, but there has been a fair amount of effort by @staticfloat to set up nightly buildbots using llvm svn and to build nightlies with the Cxx.jl build configuration. Those have never built successfully. We need to fix that first before imposing that level of brokenness and difficulty in building on master.

This isn't just about distros, it's also about not getting in the business of indefinitely maintaining our own custom fork of llvm. We need to get things upstreamed and use upstream's release numbering as some indication that things have been tested and will remain stable within that release series.

Yes, of course, getting things working is necessary. I also guarantee you the fastest way to get things fixed is to switch Julia master over to using LLVM SVN.

And yes, of course we need to upstream patches to LLVM. But we need to do that either way, so it's irrelevant to this debate. Waiting for things to get upstreamed before using them is killing the progress of this project. It needs to stop now. If we keep Julia master on a released version of LLVM that means the earliest we can switch to a newer LLVM is 3.7.1, which might happen in November. But the patch might not make it into the 3.7.1, which means that the earliest we can update LLVM is February. Which is when Julia 0.5 is due – but even if LLVM 3.8 is released before Julia 0.5, it will be too risky to update LLVM that late in the release cycle. That would mean Julia 0.5 being released still using LLVM 3.3. That is totally unacceptable – LLVM 3.3 will be almost three years old at that point and being stuck on it has completely blocked crucial advances to the project.

I'm sorry, there's no other choice here: we need to start getting people using a newer version of LLVM, which requires Julia master using a forked copy of LLVM SVN for a while.

getting in the business of indefinitely maintaining our own custom fork of llvm.

Is there any serious LLVM user who doesn't maintain their own fork? We can't force upstream to take patches on any specific timeframe. If we have to maintain a fork in order to push things forward, then so be it. I'm happy to volunteer to help maintain the fork if it means that we can rid ourselves of the current morass of ifdefs, get some semblance of sanity for building from LLVM trunk on Windows, and get more people pushing hard on debugging improvements.

(IMHO improved debugging is by far the highest-leverage thing we can do right now to increase the velocity of base development, on which so much else rests).

cxx.jl build configuration. Those have never built successfully.

Is there an issue open about this?

Wasn't the proposal to switch master to LLVM 3.7 with patches now?

I don't see how using LLVM 3.7 with patches is better than a periodically updated fork. Since we do upstream fixes, tracking LLVM SVN means that the number of patches we need to maintain gets smaller as the changes get accepted upstream, and we get more testing with upstream changes for longer.

The model here should be that Julia master tracks LLVM master until we're ready for a release, at which point we can stabilize on the most recently released LLVM version, possibly with some patches.

There's a cxx.jl issue about providing nightlies for ci with more details. edit: JuliaInterop/Cxx.jl#73

3.7 with patches is better IMO because it's a known reproducible quantity as opposed to switching to an ephemeral git branch model. Switching back and forth between a fork and releases sounds like extra work when we know julia releases will have to use releases of llvm. That's a ways off right now, but I don't want to get in the situation that you can't build 3-month old master to bisect an issue because the branches have all changed.

For appveyor to continue working we also need to provide new windows llvm binaries for win32 and win64 every time llvm changes. If we get shared library llvm to work on windows (which it never has aiui) that might be fairly simple to provide via buildbot binaries. Otherwise we have to put static libs and headers in the nightly binaries or a separate deps upload, or manually build by hand repeatedly.

If we get shared library llvm to work on windows (which it never has aiui) that might be fairly simple to provide via buildbot binaries.

https://github.com/JuliaLang/llvm/pull/1 (builds fine, haven't run tests recently)

Some of the difficulties with building newer LLVM versions is just that they start to use kernel and compiler features that are newer than our ancient CentOS 5 buildbots! Once 0.4 final is released, we can experiment with CentOS 6 buildbots to see if they work any better. Note that this means people with glibc older than 2.12 will not be able to use our binaries, or build Julia from source. That's probably fine, as I think most scientific clusters (the most difficult customers of all) have moved on to CentOS/RH 6.X already. Not an assertion I can back up with hard data though. :P

Right now the llvm svn and cxx nightlies buildbots are separate jobs than the mainline release builders, so some of that centos 6 experimentation we could start now right?

If one is tracking the bleeding edge of LLVM, is it wise to use a wonderfully-solid-but-invariably-behind distro like CentOS?

Whether using LLVM 3.7 with patches is viable or not largely depends on how many patches are needed. We need at least this 11-line patch to fix the code gen regression, which is quite small. But that's not all we want to support. I think that more modifications are needed to support threading, Cxx and Gallium. I may be wrong about that at this point – I haven't kept track of everything that ended up in LLVM 3.7.

Keeping a rebased patch set on an LLVM fork is a strictly more flexible model than the official release + patches model. We can emulate the release + patches model simply by setting the LLVM fork to the commit of that release. The bisect argument strikes me as specious for a couple of reasons: even if we track LLVM SVN more closely than every release, we would still only update it every month or so, and bisect would work fine within that range of commits; also, the LLVM build system handles incremental rebuilds just fine, so it wouldn't be hard to make sure that bisect works well even with frequently changing LLVM source, as long as we've got our Makefiles right.

We need to build on an old system to make the binaries generic and possible to run anywhere newer. We rebuild gcc on the buildbots but have less control over kernel and glibc.

Everything I said about appveyor applies equally to the llvm package in the juliadeps ppa for travis too, we'll need to keep that frequently updated.

I don't think I explained the bisect issue clearly enough. If we start using an llvm fork on julia master, we need to treat the llvm branch like master or a release branch and not force push or rebase on it. Any new rebase should be a new branch, so previously-referred-to commits from past versions of julia can still be fetched.

edit: we should try our best to make it possible to always check out an arbitrary point in Julia's history and do a clean build from there, which referring to a git branch that has changed and may no longer build with Julia from source would make harder.

As Tony says, I'm not too afraid of getting a build environment that works, I think the bigger impact will be raising the hard limit of what systems Julia 0.5 works on. At our current release pace, it's entirely likely that every old curmudgeonly computer cluster out there will be compatible with glibc 2.12 by the time 0.5 goes final (har har) but it's hard to tell without some kind of hard data.

I'm also in agreement with @tkelman; we will need to start treating our llvm fork like openspecfun or openlibm, and ensuring that we don't accidentally rebase out old commits. If that just means that we create a new branch every time we want to rebase stuff to make it prettier (thereby forcing git to keep that old somewhat-redundant history around), that's just what we'll have to do.

I think the bigger impact will be raising the hard limit of what systems Julia 0.5 works on.

On that note, LLVM3.7 is the last release to support Windows XP as a runtime platform. (to be clear, I'm still in favor of bumping)

an LLVM patch which I'll submit upstream tomorrow

@Keno can you post a link?

Need to do some more extensive benchmarking to see what the remaining regressions are.

my profiling attempts earlier (#9336 (comment)) indicated that another 10% loss comes from having Assertions turned on while having two FunctionPassManagers and another 20% loss from constructing a new FPM for every module

That's not my concern though. I suspect MCJIT still has a couple more O(N^2) cases in the number of modules. Haven't submitted the patch yet, but it's just:

diff --git a/include/llvm/ExecutionEngine/SectionMemoryManager.h b/include/llvm/ExecutionEngine/SectionMemoryManager.h
index 0b0dcb0..446c887 100644
--- a/include/llvm/ExecutionEngine/SectionMemoryManager.h
+++ b/include/llvm/ExecutionEngine/SectionMemoryManager.h
@@ -84,6 +84,7 @@ public:

 private:
   struct MemoryGroup {
+      SmallVector<sys::MemoryBlock, 16> PendingMem;
       SmallVector<sys::MemoryBlock, 16> AllocatedMem;
       SmallVector<sys::MemoryBlock, 16> FreeMem;
       sys::MemoryBlock Near;
diff --git a/lib/ExecutionEngine/SectionMemoryManager.cpp b/lib/ExecutionEngine/SectionMemoryManager.cpp
index b22c6db..e319216 100644
--- a/lib/ExecutionEngine/SectionMemoryManager.cpp
+++ b/lib/ExecutionEngine/SectionMemoryManager.cpp
@@ -83,7 +83,7 @@ uint8_t *SectionMemoryManager::allocateSection(MemoryGroup &MemGroup,
   // Save this address as the basis for our next request
   MemGroup.Near = MB;

-  MemGroup.AllocatedMem.push_back(MB);
+  MemGroup.PendingMem.push_back(MB);
   Addr = (uintptr_t)MB.base();
   uintptr_t EndOfBlock = Addr + MB.size();

@@ -138,6 +138,13 @@ bool SectionMemoryManager::finalizeMemory(std::string *ErrMsg)
   // relocations) will get to the data cache but not to the instruction cache.
   invalidateInstructionCache();

+  // Now, remember that we have successfully applied the permissions to avoid
+  // having to apply them again.
+  CodeMem.AllocatedMem.append(CodeMem.PendingMem.begin(),CodeMem.PendingMem.end());
+  RODataMem.AllocatedMem.append(RODataMem.PendingMem.begin(),RODataMem.PendingMem.end());
+  CodeMem.PendingMem.clear();
+  RODataMem.PendingMem.clear();
+
   return false;
 }

@@ -145,7 +152,7 @@ std::error_code
 SectionMemoryManager::applyMemoryGroupPermissions(MemoryGroup &MemGroup,
                                                   unsigned Permissions) {

-  for (sys::MemoryBlock &MB : MemGroup.AllocatedMem)
+  for (sys::MemoryBlock &MB : MemGroup.PendingMem)
     if (std::error_code EC = sys::Memory::protectMappedMemory(MB, Permissions))
       return EC;

@@ -153,7 +160,7 @@ SectionMemoryManager::applyMemoryGroupPermissions(MemoryGroup &MemGroup,
 }

 void SectionMemoryManager::invalidateInstructionCache() {
-  for (sys::MemoryBlock &Block : CodeMem.AllocatedMem)
+  for (sys::MemoryBlock &Block : CodeMem.PendingMem)
     sys::Memory::InvalidateInstructionCache(Block.base(), Block.size());
 }

since the MCJIT MemoryManager can be easily replaced, can we copy that code into our local repo to avoid needing to patch llvm versions, but get the benefits immediately?

Yes. We can. Also, submitted upstream: http://reviews.llvm.org/D13156.

👍 that was a quick one :)

I have found another ~15% of performance and also finally managed to get a decent profile here (wasn't easy - naive profiling shows 50% of time spent in malloc with no backtrace, because the OS X system library doesn't have unwind info). I am hopeful that we can get rid of the pass initialization and destruction by doing tricks with ORCJIT (or even MCJIT). I'm also hopeful we might be able to squeeze out the stuff in between Pass init/deinit and ISel, which from looking at it doesn't seem like it needs to take that long. For those wondering, the overhead of just MCJIT is the stuff to the left of init/right of deinit.

(screenshot: profiling trace of function emission, 2015-10-08)

FWIW, I think I have now found all the O(N^2) cases in the number of modules. Current performance results are about a 1.5-2x regression on average on the test suite. One outlier is unicode at 2.5x.

Good news: Was able to get rid of pass initialization overhead. Bad news: Didn't do much for performance numbers. Will do more benchmarking tomorrow.

Better news: I was benchmarking wrong and there actually does seem to be 14% improvement, which is in line with what I would have expected looking at the profile above.

Even better news: I realized I was accidentally still using LLVM_ASSERTIONS for llvm-svn. With the combination of those two we're down to 1.3-1.5x, which is starting to get close to where we can start talking.

What exactly are you comparing, when you say you get a 2.5x regression on unicode? The performance compiling all of unicode.jl on LLVM 3.3 vs svn?
Are there any particular things in the files included by unicode.jl that are harder for 3.8 to deal with?

I'm getting a segfault on win64 with LLVM 3.7.0 in the linalg/triangular test, not much backtrace though:

#0  0x00000000778494ba in ntdll!RtlVirtualUnwind () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Can open a separate issue if anyone has any inkling which LLVM version we're going to want to end up using, either 3.7.0 + patches or a protected checkpoint branch of 3.8-devel.

@tkelman added to the tracker list above (re-disable FPO)

bisected to 127ee86, if that makes any sense

  • re-disable FPO

I'm getting a segfault at the very start of bootstrap now with LLVM 3.7.0 on both win32 and win64. Expected?

$ gdb -q --args /home/Tony/julia/usr/bin/julia-debug.exe -C x86-64 --output-ji `cygpath -w /home/Tony/julia/usr/lib/julia/inference0.ji` -f coreimg.jl
Reading symbols from /home/Tony/julia/usr/bin/julia-debug.exe...done.
(gdb) r
Starting program: /home/Tony/julia/usr/bin/julia-debug.exe -C x86-64 --output-ji D:\\cygwin64\\home\\Tony\\julia\\usr\\lib\\julia\\inference0.ji -f coreimg.jl
[New Thread 4344.0x2284]

Program received signal SIGSEGV, Segmentation fault.
0x00000001002c00ba in julia.new_0 ()
(gdb) bt
#0  0x00000001002c00ba in julia.new_0 ()
#1  0x000000006ea1dc8c in jl_apply (f=0x1011eec70, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/julia.h:1400
#2  0x000000006ea21e5b in jl_trampoline (F=0x1011eec70, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/builtins.c:1021
#3  0x000000006ea1038e in jl_apply (f=0x1011eec70, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/julia.h:1400
#4  0x000000006ea16ec6 in jl_apply_generic (F=0x1011eebf0, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/gf.c:1916
#5  0x000000006ea27f5d in jl_apply (f=0x1011eebf0, args=0xd7f078, nargs=2)
    at /home/Tony/julia/src/julia.h:1400
#6  0x000000006ea2841d in do_call (f=0x1011eebf0, args=0x1011f6448, nargs=2, eval0=0x0,
    locals=0x0, nl=0, ngensym=0) at /home/Tony/julia/src/interpreter.c:65
#7  0x000000006ea2908c in eval (e=0x1011eec30, locals=0x0, nl=0, ngensym=0)
    at /home/Tony/julia/src/interpreter.c:213
#8  0x000000006ea280b9 in jl_interpret_toplevel_expr (e=0x1011eec30)
    at /home/Tony/julia/src/interpreter.c:27
#9  0x000000006ea46aa9 in jl_toplevel_eval_flex (e=0x1011eec10, fast=1)
    at /home/Tony/julia/src/toplevel.c:525
#10 0x000000006ea46d6f in jl_parse_eval_all (fname=0x6f9a5e1d <system_image_path+1309> "boot.jl",
    len=8) at /home/Tony/julia/src/toplevel.c:575
#11 0x000000006ea46fbf in jl_load (fname=0x6f9a5e1d <system_image_path+1309> "boot.jl", len=8)
    at /home/Tony/julia/src/toplevel.c:615
#12 0x000000006ea332b5 in _julia_init (rel=JL_IMAGE_JULIA_HOME) at /home/Tony/julia/src/init.c:575
#13 0x000000006ea34a02 in julia_init (rel=JL_IMAGE_JULIA_HOME) at /home/Tony/julia/src/task.c:278
#14 0x0000000000402e0e in wmain (argc=1, argv=0x2d70b00, envp=0x2d73800)
    at /home/Tony/julia/ui/repl.c:605
#15 0x000000000040140c in __tmainCRTStartup ()
    at /usr/src/debug/mingw64-x86_64-runtime-4.0.2-1/crt/crtexe.c:329
#16 0x000000000040153b in mainCRTStartup ()
    at /usr/src/debug/mingw64-x86_64-runtime-4.0.2-1/crt/crtexe.c:212

since that commit (40d46e7) is still waiting for CI verification before it lands on master, I can confidently say that is not the problem.

fixed @tkelman's finding with 6ee80d3

Just to update: my kf/modulecoalescing branch now uses the same or less memory than 3.3. It also passes all tests, so we're getting pretty close. I'm currently running msan/asan/valgrind to make sure there aren't any subtle memory bugs remaining, as well as tracking down a small performance regression (a couple %). Getting close.

I don't think I've ever been so excited about a totally invisible change.

Once it's no longer necessary to go to build heroics to use Gallium, it won't be invisible for long 😄.

I say we pull the plug on 3.3 if it is only a small regression.

It's not a small regression until that branch has gone through CI and been merged. LLVM 3.7.0 doesn't bootstrap at all on win32 with current master.

Yes, I'm in the process of cleaning up the patch. Should be ready very soon. I have verified that all the remaining performance regressions are essentially due to LLVM itself rather than the new vs old JIT. Some of those look solvable, but that should be done as part of a general effort to improve performance, rather than ad hoc now (I don't think we have the infrastructure in place yet to adequately measure and track performance here - that's something we should work on).