relative using/import should search current directory

StefanKarpinski opened this issue 11 years ago · comments

When I do

# Main.jl
module Main
  using .Foo
end

and there's a file called Foo.jl in the same directory as Main.jl it should be loaded. I suspect that relative using should also not look in the global require places – i.e. Pkg.dir() and then LOAD_PATH. The same applies to import.

Stefan Karpinski commented 11 years ago

Oops.

Uwe Fechner commented 5 years ago

Bump.

Jeff Bezanson · Answer 1 · Tue Oct 22 2013 05:09:20 GMT+0800 (China Standard Time)

I agree it should not look in the global places, but perhaps it should not look at files at all. This doesn't seem to extend to multiple dots, e.g. using ..Foo looking in the parent directory.

Westley Argentum Hennigh-Palermo · Answer 2 · Tue Oct 22 2013 05:36:32 GMT+0800 (China Standard Time)

I would love to see a solution to this problem... But Jeff has a point.

It would be a little weird, but what if using took a second, optional parameter:

using Foo "../../extra-src/"

Tobias Knopp · Answer 3 · Sat Aug 16 2014 00:42:55 GMT+0800 (China Standard Time)

+1 for this.

Stefan Karpinski · Answer 4 · Sat Aug 16 2014 03:55:05 GMT+0800 (China Standard Time)

I'm not sure the relative path thing is a problem. You could, for example, have something like

# Foo.jl
module Foo
  using ..Bar
  using .Baz
end

# Bar.jl
module Bar
  # barfy stuff
end

# Foo/Baz.jl
module Baz
  # bazish stuff
end

That would allow relative imports of sibling modules to automatically work. Sure, it's strange if you do more dots than you're nested into modules, but just don't do that.

Stefan Schwarz · Answer 5 · Sat Aug 16 2014 05:21:57 GMT+0800 (China Standard Time)

Why not just allow using with a string?
Such as: using "Baz/Baz.jl".

Stefan Karpinski · Answer 6 · Sat Aug 16 2014 05:53:33 GMT+0800 (China Standard Time)

Because loading from a file is only a side effect of using when no such module exists. The normal case is that the module already exists. If you allow a string, you still have to map those strings to a module.

Stefan Schwarz · Answer 7 · Sat Aug 16 2014 05:59:19 GMT+0800 (China Standard Time)

But if the string, which is a path, qualifies a module, which it does due to its structure?

Stefan Schwarz · Answer 8 · Sat Aug 16 2014 06:21:33 GMT+0800 (China Standard Time)

Ok. Let me try to get this straight.

I do not have an opinion about this "import/include/using" oddity.

Don't you think that sooner or later, the python import strategy turns out to be best?
You made it possible to have an __init__ function within a module that is called automatically, but not first, which is quite strange, since I expected it to be called first, just like a BEGIN block in Perl.

I ran into exactly this problem: I arranged my code in a file-directory manner and tried to push! the load-lib path inside the init function. But it is not evaluated first, which was quite confusing to me, since I expected it to act like a CTor or something.

Let's just presume I know what using does semantically. Why not use using to publish several declarations inside the current namespace but give it the ability to assign some prefix for it?

I really liked the import idea of Java. Where dots marked a directory.

For instance:

using Baz.baz

or

using Baz.baz as bz

Does this make sense to you?

Cheers

Stefan

Tony Kelman · Answer 9 · Tue Aug 11 2015 15:07:26 GMT+0800 (China Standard Time)

Bump. Milestoning. #9079 (comment)

Steven G. Johnson · Answer 10 · Fri Aug 21 2015 10:25:32 GMT+0800 (China Standard Time)

It's a little frustrating to have #12695 merged in 0.4 but this slated for 0.5... I feel like it's going to bite people in 0.4 if there is no way to load modules from the current directory short of modifying the load path.

Jeff Bezanson · Answer 11 · Fri Aug 21 2015 10:30:45 GMT+0800 (China Standard Time)

We seem to have been getting by pretty well without that --- outside of the REPL, loading from the CWD is just a bug, and I doubt any packages depended on it. In the REPL, include is probably sufficient.

Maxim Berman · Answer 12 · Thu Sep 03 2015 22:18:45 GMT+0800 (China Standard Time)

I used to structure my code into submodules, with each file representing a module; back when the CWD was included in the load path, this allowed me to write, for instance, using Utils to load types and functions exported from Utils.jl. I can now replace this with include("Utils.jl"); using .Utils; however, this is inconvenient, e.g. if Utils defines types, because creating this type from module A would create an A.Utils.Type instead of a Utils.Type. What is the recommended way of organizing Julia code (with common functions and types) into subfiles? Should I add the current directory to the path anyway to keep the convenience of modules? Thanks.

Iain Dunning · Answer 13 · Fri Sep 04 2015 00:03:29 GMT+0800 (China Standard Time)

I have hit the same problem as @bermanmaxim, FWIW, and I've moved to just using include for everything instead.

Maxim Berman · Answer 14 · Fri Sep 04 2015 21:49:32 GMT+0800 (China Standard Time)

Thanks @IainNZ. Using includes seems indeed to be the standard way now. I guess it's the job of the main file of a module to include everything in the right order to make the subparts work (defining types before functions...) Using distinct modules had the advantage of making the dependencies of each file somewhat more explicit, e.g. putting utils.helperfunction to make clear that the function comes from utils, and not risking including things twice.

toivoh · Answer 15 · Sun Sep 06 2015 14:14:12 GMT+0800 (China Standard Time)

You can use includes in the main file and still structure your code into submodules. That's what I do in Debug.jl.

Jonathan Malmaud · Answer 16 · Mon Oct 26 2015 03:47:14 GMT+0800 (China Standard Time)

@bermanmaxim Not sure I understand the problem, everything seems to work how I'd expect:

module Parent
    export ParentT, ChildT

    module Child
        export ChildT
        type ChildT
        end
    end
    using .Child

    type ParentT
    end

end

module Test
    using Parent

    f(::ParentT)="parent"
    f(::ChildT)="child"
end

Test.f(Parent.ParentT()) # "parent"
Test.f(Parent.ChildT()) # "child"

Jonathan Malmaud · Answer 17 · Mon Oct 26 2015 04:25:22 GMT+0800 (China Standard Time)

Oh I see, it's this that's problematic:

A.jl:

module A
type Atype
end
end

B.jl:

module B
include("A.jl")
import .A: Atype
end

MyPkg.jl:

module MyPkg
include("A.jl")
include("B.jl")
end

MyPkg.B.Atype()  # MyPkg.B.A.Atype
MyPkg.A.Atype()  # MyPkg.A.Atype

You might hope to get around this by calling include only from the parent module:

module MyPkg
include("A.jl")
include("B.jl")
end

where
B.jl is now just

module B
import .A: Atype
end

so now

MyPkg.B.Atype() # MyPkg.A.Atype

as you want, but you're back to being reliant on the package entrypoint to manually take into account submodule dependencies:

MyPkg.jl:

module MyPkg
include("B.jl")
include("A.jl")
end

won't work.
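The duplication described above can be reproduced without any files. The following is a minimal sketch, assuming current Julia syntax (struct rather than the type keyword used in this 2015 comment); the nested module A inside B1 stands in for what include("A.jl") inside B would produce, and the names B1 and B2 are illustrative only.

```julia
module MyPkg
    module A
        struct Atype end
    end

    module B1
        # Stand-in for `include("A.jl")` inside B: a second, nested copy of A.
        module A
            struct Atype end
        end
        import .A: Atype
    end

    module B2
        # Refer to the sibling A that the parent module already defined.
        import ..A: Atype
    end
end

# B1 carries its own, unrelated copy of the type; B2 shares the parent's.
println(MyPkg.B1.Atype === MyPkg.A.Atype)  # false
println(MyPkg.B2.Atype === MyPkg.A.Atype)  # true
```

The B2 form only works if the parent has defined A before B2 is evaluated, which is the ordering burden the comment describes.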

Maxim Berman · Answer 18 · Mon Oct 26 2015 20:05:36 GMT+0800 (China Standard Time)

Thanks @malmaud, I have since followed @toivoh's advice and develop code in a structure similar to Debug.jl.

mattcbro · Answer 19 · Thu Oct 29 2015 16:05:06 GMT+0800 (China Standard Time)

The loss of having the current path in the LOAD_PATH list is distressing to me. I find myself having to add a statement of the form,

push!(LOAD_PATH, pwd())

to all my high-level scripts in order to get anything to work. In particular the solution,

include("A.jl")
using A

does not appear to work because it does not nest properly, i.e.
file B.jl

module B
include("A.jl")
using A
func2() = ("function 2", func1())

end

file A.jl

module A
export func1
func1() = "func 1"


end

If you try to include("B.jl") you get the error:
ERROR: LoadError: ArgumentError: A not found in path
in require at ./loading.jl:233
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:304
while loading /data/Projects/Energous/B.jl, in expression starting on line 3

However in a julia prompt you can type the contents of B.jl line by line without error if you don't include the module definition.

You might reasonably want to use module A without module B. However if B requires A you will be unable to use B unless you first include A at the highest level. So this means that you have to remember to include the text of every dependent module before you can use it, if they are all in the same working directory.

For my use case this FORCES me to explicitly add the current directory to the path for every script I run in my working directory.

Steven G. Johnson · Answer 20 · Fri Oct 30 2015 01:18:37 GMT+0800 (China Standard Time)

@mattcbro, you should do

include("A.jl")
using .A

to tell it you want the locally defined A module, not a global A module. This will work and does not require you to modify the LOAD_PATH.

using A is potentially wrong anyway because it could get confused if there is another module called A defined in the load path. So, your experience is actually an argument in favor of the current behavior, because it caught a bug that you otherwise might not have noticed.
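As a rough, self-contained sketch of the include-then-relative-using pattern suggested here: the script below writes the two files from the earlier comment into a temporary directory (the temporary directory is only an assumption to keep the example runnable anywhere) and loads B, which in turn includes A and refers to it as .A.

```julia
# Write A.jl and B.jl into a scratch directory so the example is self-contained.
dir = mktempdir()

write(joinpath(dir, "A.jl"), """
module A
export func1
func1() = "func 1"
end
""")

write(joinpath(dir, "B.jl"), """
module B
include("A.jl")   # path is resolved relative to B.jl; defines the module B.A
using .A          # relative: the A just defined inside B, not one on LOAD_PATH
func2() = ("function 2", func1())
end
""")

include(joinpath(dir, "B.jl"))
println(B.func2())
```

No change to LOAD_PATH is needed, because the leading dot makes using look only at the module that include just defined.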

Steven G. Johnson · Answer 21 · Fri Oct 30 2015 01:20:16 GMT+0800 (China Standard Time)

That being said, I still tend to agree with @StefanKarpinski that using .A (not using A) should look for A.jl in the current directory; it's annoying to have to manually include("A.jl"), though it's not a huge deal.

(If A.jl is in some other directory, of course, then you need the manual include.)

Jonathan Malmaud · Answer 22 · Fri Oct 30 2015 01:27:40 GMT+0800 (China Standard Time)

Is there a technical problem with having using .A search the current directory, or is it just a design decision at this point? I would definitely favor having that behavior.

Stefan Karpinski · Answer 23 · Fri Oct 30 2015 02:04:55 GMT+0800 (China Standard Time)

At this point I think it's just a design issue. The fact that you can load code from a parent directory with multiple leading dots is kind of strange. To me there's also the question of whether using .B occurring in module A should load B.jl in the current directory or load A/B.jl. The former would tend to keep directory structures pretty flat, which may be a good thing, while the latter would tend to make them more nested. While I generally favor flatter directory structures (consider how ridiculous Java project file trees are), this would seem to tend to put everything in the top-level directory:

# A.jl
module A
    using .B
    using .C
end

# B.jl
module B
    using .D
    using .E
end

And so on – all of A.jl, B.jl, C.jl, D.jl and E.jl would be in the top-level directory, even though it seems like maybe B and C belong in an A directory and maybe D and E belong in a B directory. Moreover, if you want to have a nested directory structure, how would you even express that?

Steven G. Johnson · Answer 24 · Fri Oct 30 2015 02:07:39 GMT+0800 (China Standard Time)

It looks like it would be pretty easy to implement: modify the eval_import_path_ function in src/toplevel.c to add an else if (m == jl_current_module) clause after the if (m == jl_main_module), which looks for a var.jl file in the current directory.

Steven G. Johnson · Answer 25 · Fri Oct 30 2015 02:09:17 GMT+0800 (China Standard Time)

@StefanKarpinski, I thought that the proposal was that using .B would look in the directory of the file that the using statement occurs in (or pwd in the REPL). That's what most people would think of as the "current" directory, and is the same as the directory used for include("B.jl").

If you want a nested directory, or any other directory structure, you would just do include("B/B.jl"); using .B manually as you do now. Doing using .B would only look for a B.jl file if B were not already defined.
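The "only look for the file if the module is not already defined" rule can be emulated by hand today. This is only a sketch of that idea, not the proposed implementation; the module name Helpers and the temporary file are assumptions made to keep it runnable.

```julia
# Create a small module file in a scratch directory.
dir = mktempdir()
path = joinpath(dir, "Helpers.jl")
write(path, """
module Helpers
export greet
greet() = "hello from Helpers"
end
""")

# Load the file only when the module does not exist yet in the current module.
isdefined(@__MODULE__, :Helpers) || include(path)   # loads the file
isdefined(@__MODULE__, :Helpers) || include(path)   # already defined: skipped

using .Helpers
println(greet())
```

With the proposal, the guard and the include would be implied by using .Helpers itself whenever Helpers.jl sits next to the file containing the using statement.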

Stefan Karpinski · Answer 26 · Fri Oct 30 2015 02:15:59 GMT+0800 (China Standard Time)

That was my original proposal, but I'm wondering how one would introduce a nested folder structure using this mechanism? It seems to me that there wouldn't be any way to do it. One option would be to have module A; using .A.B; end be special syntax that looks for "A/B.jl". That would allow having parts of A defined in a directory. Maybe I'm overthinking this.

Steven G. Johnson · Answer 27 · Fri Oct 30 2015 03:03:19 GMT+0800 (China Standard Time)

@StefanKarpinski, you would introduce a nested folder structure by doing include("B/B.jl"); using .B manually as now; see above. (I edited my post after replying, so maybe you didn't see the 2nd paragraph.)

mattcbro · Answer 28 · Fri Oct 30 2015 03:13:45 GMT+0800 (China Standard Time)

@stevengj OK, that works, thanks. However, please help me understand: what is the preferred paradigm for creating and using local modules? Do we really have to have both an include() statement along with a using or import statement?

Perhaps the idea is to have a master script that has all of your includes in them? How do you folks do this? I notice that one person simply uses includes instead of using or imports for their local work.

Steven G. Johnson · Answer 29 · Fri Oct 30 2015 03:18:57 GMT+0800 (China Standard Time)

@mattcbro, yes, you currently need both include and using. You don't need import (doing include effectively also does import).

I mostly just use include and don't bother with submodules. The only reason to use submodules is if you want to segregate your namespace, but in that case I normally don't want to do using (I just want import and qualified names). For example, in the PETSc.jl module we are using a PETSc.C module for the raw wrappers around the low-level C interface to keep these zillions of functions from polluting the PETSc namespace, but then we use the fully qualified names, e.g. we do C.foo(...) to call the foo function. Hence the C module has no exports and we don't need using C.
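A small sketch of the namespacing style described above, with made-up names (Wrapper, C, foo, double are all hypothetical and unrelated to the real PETSc.jl code): the submodule exports nothing and the parent calls into it with qualified names.

```julia
module Wrapper              # hypothetical package module
    module C                # raw low-level functions; nothing is exported
        foo(x) = 2x
    end

    # Defining the submodule already binds the name C here,
    # so neither `using .C` nor `import .C` is required.
    double(x) = C.foo(x)
end

println(Wrapper.double(3))
```

Because C has no exports, its functions cannot leak into the Wrapper namespace by accident.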

Stefan Karpinski · Answer 30 · Fri Oct 30 2015 14:28:13 GMT+0800 (China Standard Time)

Still having to use include seems to defeat a lot of the purpose of this change.

Scott P. Jones · Answer 31 · Fri Oct 30 2015 22:25:42 GMT+0800 (China Standard Time)

Would having to have the include also mean you have separate copies of the sub module, instead of a single (possibly pre-compiled) one? If so, that seems like the biggest drawback to me, not the extra typing required.

Steven G. Johnson · Answer 32 · Fri Oct 30 2015 23:20:33 GMT+0800 (China Standard Time)

No, you would only have one copy. The purpose would be to save typing the redundant include if you do using .Foo in the common case where Foo.jl is in the same directory. Saving on typing is the only question in this whole thread — there has been and will be no change in functionality.

Stefan Karpinski · Answer 33 · Sun Nov 01 2015 03:04:37 GMT+0800 (China Standard Time)

Saving typing isn't the only issue to me – I'd like to get to a point where you don't need to use include in normal code. To that end, I'd like for relative using to be the way to decompose a module into files and directories. But maybe we as a project don't want that. We should have a conversation about it that doesn't include lots of ill-informed handwringing by people who've barely used Julia about "modularity".

Steven G. Johnson · Answer 34 · Sun Nov 01 2015 03:25:13 GMT+0800 (China Standard Time)

@StefanKarpinski, I like being able to split a long file into pieces without needing to create a submodule (which forces me to either export things or use qualified names).

Stefan Karpinski · Answer 35 · Sun Nov 01 2015 03:27:15 GMT+0800 (China Standard Time)

Ok, maybe we need some other modularity mechanism then. E.g. something where each file gets its own scope and it's less likely for unexported globals to collide across files.

Jonathan Malmaud · Answer 36 · Sun Nov 01 2015 03:28:33 GMT+0800 (China Standard Time)

Well, I don't think anyone's talking about removing include from the language. But it seems bad for it to be a requirement for creating hierarchical packages where each file defines a module, which is a style a lot of people seem to like.

Stefan Karpinski · Answer 37 · Sun Nov 01 2015 03:54:46 GMT+0800 (China Standard Time)

Ok, I agree with that. But @stevengj's point is valid that having a separate module for each file is annoying because of exporting and importing, etc. One idea that was raised in a conversation I had at JuliaCon was that submodules would behave more like nested global scopes instead of independent scopes – i.e. this:

module A
    x = 1
    module B
        # x is visible here
        y = x + 1
    end
    # y is not visible here though
end

This would remove a lot of the annoyance of splitting things into submodules.
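For contrast, here is a sketch of what the same layout needs in Julia as it currently behaves, where a submodule does not see the parent's globals unless it imports them explicitly.

```julia
module A
    x = 1
    module B
        import ..A: x       # the parent's global has to be imported explicitly
        y = x + 1
    end
    # y is not visible here without qualification; it lives in B
    y_from_b = B.y
end

println(A.B.y)
```

Under the nested-scope idea, the import line would become unnecessary.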

Jonathan Malmaud · Answer 38 · Sun Nov 01 2015 04:21:33 GMT+0800 (China Standard Time)

I like that, but what if a subfile sets what it thinks is a global variable with a line like X=1, but actually X was previously defined in the including file and has now been clobbered in that outer scope? You would need to get in the habit of defining submodule globals with 'local' or whatever the equivalent keyword would end up being.

Stefan Karpinski · Answer 39 · Sun Nov 01 2015 05:15:11 GMT+0800 (China Standard Time)

Presumably it would work like other scopes and doing X = 1 in the submodule creates a new X binding local to that file.

Jonathan Malmaud · Answer 40 · Sun Nov 01 2015 05:18:41 GMT+0800 (China Standard Time)

Maybe I'm confused, but wouldn't it work the same way this works now:

function f()
  x=1
  let 
    x=2
  end
  x
end

julia> f()
2

Stefan Karpinski · Answer 41 · Sun Nov 01 2015 05:26:51 GMT+0800 (China Standard Time)

I was thinking about the "scope gap" between global and function scope:

julia> x = 1
1

julia> function f()
           x = 2
       end
f (generic function with 1 method)

julia> f()
2

julia> x
1

Jonathan Malmaud · Answer 42 · Sun Nov 01 2015 05:29:54 GMT+0800 (China Standard Time)

Ah, right. +1 from me on having submodules have that kind of scoping semantics.

Scott P. Jones · Answer 43 · Sun Nov 01 2015 05:32:19 GMT+0800 (China Standard Time)

Yes, that is more like what I had expected to happen when I first started using Julia.
👍 to @StefanKarpinski's idea

Jonathan Malmaud · Answer 44 · Mon Nov 02 2015 13:22:05 GMT+0800 (China Standard Time)

A related question is how to handle defining methods on generic functions defined in the parent module. This won't work right now, but maybe it should?

module A
function f end

module B
f(::Int)=1
end

module C
f(::Float64)=2
end

f(1) 

end
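With current semantics, each submodule would define its own unrelated f. A minimal sketch of how the intended behaviour can be obtained today is to import the parent's function explicitly in each submodule before adding methods to it.

```julia
module A
    function f end          # generic function with no methods yet

    module B
        import ..A: f       # extend the parent's f instead of creating B.f
        f(::Int) = 1
    end

    module C
        import ..A: f
        f(::Float64) = 2
    end

    result = (f(1), f(1.0))
end

println(A.result)
```

Both methods end up on the single function A.f, so the call in the parent dispatches to either submodule's definition.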

mattcbro · Answer 45 · Tue Nov 03 2015 05:18:26 GMT+0800 (China Standard Time)

@stevengj If I just use includes, does precompiling work? I've just started messing around with precompiling and I'm trying to figure out the best work flow. (Nice addition by the way).

Most the examples I've looked at, had the precompile() statement associated with a module.

Jonathan Malmaud · Answer 46 · Tue Nov 03 2015 05:30:29 GMT+0800 (China Standard Time)

If it helps to understand, include does nothing more than instruct Julia to copy/paste the code in the file into where the include statement is at runtime. Everything behaves exactly as if you had just copied the code from the file yourself to where the include statement is.
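One way to see this for yourself is the sketch below, which evaluates a file into a chosen module and checks where the definition lands. The file name helpers.jl and the module name Host are assumptions for illustration; Base.include(module, path) is used so that the example works from a single script.

```julia
# A file that defines a plain function, with no module wrapper.
dir = mktempdir()
path = joinpath(dir, "helpers.jl")
write(path, "helper() = 42\n")

module Host end

# Evaluate the file's contents in Host, as `include(path)` written inside Host would.
Base.include(Host, path)

println(Host.helper())                      # 42
println(isdefined(@__MODULE__, :helper))    # false: nothing was defined out here
```

The definitions belong to whichever module performs the include, which is why including the same file from two modules yields two independent copies.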

Steven G. Johnson · Answer 47 · Tue Nov 03 2015 12:32:29 GMT+0800 (China Standard Time)

Yes, precompiling works with include.

Deleted user · Answer 48 · Mon Jan 16 2017 01:47:55 GMT+0800 (China Standard Time)

I am just starting to learn Julia, and I quickly hit this issue due to the misleading instructions in Workflow Tips.

I gather from the discussion in this thread that the instructions should read include("./Tmp.jl") rather than import Tmp. Is that correct?

Steven G. Johnson · Answer 49 · Mon Jan 16 2017 06:21:37 GMT+0800 (China Standard Time)

@meowklaski, no need for the ./. But yes, if you have a module Tmp in Tmp.jl in the current directory, then include("Tmp.jl") will also import it. You can also do using Tmp after running include("Tmp.jl") if you want to import the exported names from Tmp.

Stefan Karpinski · Answer 50 · Thu Sep 07 2017 02:29:05 GMT+0800 (China Standard Time)

Based on some in-person brainstorming yesterday, we (mostly with @malmaud, @JeffBezanson) came up with the following scheme. It doesn't only apply to relative using/import, but also doesn't apply to all relative using/import, so it's a bit cross-cutting to what this issue was originally about. However, the end-point is quite similar to when we introduced the top-level code via the LOAD_PATH in effect: it allows one to have code following a certain convention to omit include calls and continue to work the same way, with the includes being implied by using/import statements.

If the module in an import resolves to a name that does not already exist and we are currently in the process of loading a prefix of that module, then if files with appropriate names exist, they will be included, and if that provides the desired modules, they will be used. If we are loading A for example, and encounter using .B or equivalently using A.B within the definition of the A module, then we will look for B.jl relative to the location of the source path of A in two places:

1. If joinpath(dirname(A_path), "B.jl") exists, load it; otherwise
2. If joinpath(dirname(A_path), "B", "B.jl") exists, load it; otherwise
3. Raise an error that B could not be found.

The premise is that a module is either provided by a single file of that name or a directory of that name with the file by that name as an entry-point. This implies a stack of paths to what one is currently loading, and you have to look through the whole stack for the innermost file you are currently loading which is a prefix of what you want to load. Some examples should help clarify:

While loading A from src/A.jl and B from src/B.jl:
- find A.B.C as src/C.jl or src/C/C.jl
While loading A from src/A.jl and B from src/B/B.jl:
- find A.B.C as src/B/C.jl or src/B/C/C.jl
- find A.D as src/D.jl or src/D/D.jl.
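The two-place lookup can be sketched as a small function. This is not code from Julia itself: find_submodule_file is a hypothetical helper, and it deliberately ignores the stack of currently loading paths mentioned above, handling only a single parent file.

```julia
# Hypothetical helper: locate the file for submodule `name`, given the source
# path of the module currently being loaded.
function find_submodule_file(parent_path::AbstractString, name::AbstractString)
    dir = dirname(parent_path)
    candidates = (joinpath(dir, name * ".jl"),          # e.g. src/B.jl
                  joinpath(dir, name, name * ".jl"))    # e.g. src/B/B.jl
    for candidate in candidates
        isfile(candidate) && return candidate
    end
    error("module $name could not be found next to $parent_path")
end

# Build the second example layout: src/A.jl, src/B/B.jl, src/B/C.jl.
src = joinpath(mktempdir(), "src")
mkpath(joinpath(src, "B"))
for rel in ("A.jl", joinpath("B", "B.jl"), joinpath("B", "C.jl"))
    touch(joinpath(src, rel))
end

a_path = joinpath(src, "A.jl")
b_path = find_submodule_file(a_path, "B")   # falls through to src/B/B.jl
c_path = find_submodule_file(b_path, "C")   # found next to B.jl as src/B/C.jl
println(b_path)
println(c_path)
```

Trying the flat file first and the directory second is what lets a project stay flat until a submodule grows enough to deserve its own directory.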

One will note that an absolute import like using A.B inside of module A can trigger this behavior; meanwhile a relative import like using ..B inside of top-level module A will not trigger this behavior, so relative names are not really the significant feature here.

Another thing to observe is that there are many potential file hierarchies for a given module hierarchy. On one hand, that could be a bit confusing, but on the other hand, forcing a deep hierarchy when everything can easily be contained in a few top-level files is quite annoying.

Stefan Karpinski · Answer 51 · Fri Sep 22 2017 03:34:06 GMT+0800 (China Standard Time)

As far as I can tell, this is a non-breaking change so it could be moved off the 1.0 milestone in a pinch. It would be very nice to have for 1.0 however, so I'll leave it here for now.

Stefan Karpinski · Answer 52 · Fri Nov 17 2017 04:24:37 GMT+0800 (China Standard Time)

Resolved: we don't have time for this now and it's a non-breaking feature.

Xiang Ji · Answer 53 · Tue May 01 2018 21:42:50 GMT+0800 (China Standard Time)

IMO the documentation on modules needs more clarity. It should explain how one normally splits a project into multiple files. Currently there's only a very brief section on "modules and files" and it doesn't explain the issue well at all. I didn't know the correct way to proceed in my project (first include, then using) until I found this issue via Google. The documentation has been excellent but it seems possible for this section to do better. Any real project needs to be split into multiple files and I believe many people would be wondering about this.

Xiu-zhe (Roger) Luo · Answer 54 · Tue May 01 2018 21:58:16 GMT+0800 (China Standard Time)

@x-ji Agreed, and I think when this issue is solved, people will be able to use using/import only, since local modules can be loaded directly through them.

Steven G. Johnson · Answer 55 · Wed May 02 2018 04:34:23 GMT+0800 (China Standard Time)

@x-ji, the manual does explain how one normally splits a project.
Most modules need to be split into multiple files, but not multiple submodules. You can just have a single module that includes multiple files. That's why there are no using statements or submodules in that section of the manual.

Xiang Ji · Answer 56 · Fri Sep 21 2018 21:14:52 GMT+0800 (China Standard Time)

@stevengj Could you point out where in the documentation https://docs.julialang.org/en/v1/manual/modules/ is it stated how to normally split a project? I don't think it's clearly explained at all, certainly nothing about your suggestion of using only includes. The documentation begins with an example about import, using and export, and only much later does it mention the concept of include. In no way does it make it clear that one is expected to use include as the default way to organize a project. Or are you talking about a completely different section of the manual?

The way you're suggesting, that for most projects one would just define one module and include all the other files in that one module (while paying attention to the include order and avoiding circular dependencies), and completely ignore the using and import mechanisms, is simply unintuitive for people who are used to most other languages. This also makes people feel a bit uneasy about code maintenance in large projects. OK, I can get it if this is "the Julia way". But at least please state it clearly in the documentation.

Also, after carefully reading through this thread, I can finally see that there are currently different approaches to project organization: one is to have no submodules at all and use includes only, another is to include and then use using on submodules. However, the documentation is unclear on both of them, which IMO is unfriendly to newcomers. It could point out the different approaches and give some examples.

Xiubo Zhang · Answer 57 · Fri Sep 21 2018 22:51:28 GMT+0800 (China Standard Time)

@x-ji I share your concerns about how Julia projects need to be organised differently from some other popular languages.

For me, having to manually "glue" together files in a package using include() in the package entry script was the most confusing part. In some other languages such as Python and Java, there exists a default convention that maps the names of the modules to the files that contain their definitions on the file system, so when a module is imported, the language runtime automatically knows where to find them. For Julia, my understanding is that this mapping exists for packages but not for modules, so the mapping has to be managed manually using all the include() statements.

In terms of the best practices for organising multi-file, multi-module Julia projects, I find the structures and strategies used by the Yao.jl package to be very sensible.

Steven G. Johnson · Answer 58 · Fri Sep 21 2018 23:31:08 GMT+0800 (China Standard Time)

@x-ji, see the section https://docs.julialang.org/en/v1/manual/modules/#Modules-and-files-1

Xiu-zhe (Roger) Luo · Answer 59 · Fri Sep 21 2018 23:50:24 GMT+0800 (China Standard Time)

Thanks @zhangxiubo for mentioning our package. I think when this issue is solved (as @StefanKarpinski proposed), we will be able to load files/modules locally without using include to organize them manually.

@stevengj I think what @x-ji wants is to let the compiler itself find the modules and load them, which is just what is proposed in this issue. I was concerned about this problem once: since I was used to Python and C/C++, I prefer to write out all the dependencies of the current script before I start implementing things. This helps those who are trying to read your code understand what you are doing.

There was a debate in discourse about whether we should use include for organizing files, or this should be solved by the compiler itself.

https://discourse.julialang.org/t/what-is-the-preferred-way-to-manage-multiple-files/8969

I think the reason we have to write

#ifndef MAIN_H
#define MAIN_H

// code

#endif // MAIN_H

for C/C++ each time, is just because the compiler cannot handle file dependencies itself. I don't want Julia to inherit this feature as well... The proposed way of loading modules looks quite similar to Rust's to me.

But unfortunately, at the moment (v1.0.0) if you want to let the compiler itself solve the file dependencies (without include), which means in each file

# A.jl
include("B.jl")
include("C.jl")

# B.jl
include("D/D.jl")

# C.jl
include("D/D.jl")

will cause an error... This syntax may have the following disadvantages:

- readability: it can be hard for others to know the file dependencies from a single file (you will have to find the dependencies in the uppermost file, which includes everything)
- stability: it may cause unexpected errors while developing in a team, when the source code is not organized well manually
- hard-to-solve order: when a file has multiple dependencies, it will be hard to work out the include order

@x-ji In my own experience, organizing files with include in Julia at the moment should usually follow a tree structure, which makes the dependencies more linear. Try not to use a deep hierarchy of modules. Most Julia projects just use one module and include files linearly (but the includes may have to follow an order).

Uwe Fechner · Answer 60 · Tue Jul 23 2019 19:27:29 GMT+0800 (China Standard Time)

When will the original issue be fixed? It should be very easy. It causes a lot of trouble for me when structuring my code. The workaround of using include and using does not always work (see: julia-vscode/julia-vscode#807 ).

Stefan Karpinski · Answer 61 · Wed Jul 24 2019 02:05:23 GMT+0800 (China Standard Time)

When someone gets around to it. I've thought about taking a crack at it several times in the past few months but haven't quite found the time. If someone else wants to give it a try, I agree that it shouldn't be that difficult.

Alok Singh · Answer 62 · Wed Dec 23 2020 16:55:44 GMT+0800 (China Standard Time)

@StefanKarpinski where is the relevant code for this? i'd like to work on it since i have some time free and would like to give back to the community

Steven G. Johnson · Answer 63 · Thu Dec 24 2020 04:49:01 GMT+0800 (China Standard Time)

@alok, using .Foo ultimately calls src/toplevel.c:eval_import_path, and I think that's probably where you would need to begin. In particular, I guess we want to handle the case where this line is currently throwing UndefVarError: Foo not defined.

Patrick Kidger · Answer 64 · Fri Dec 25 2020 09:11:15 GMT+0800 (China Standard Time)

I'm not sure this actually improves anything. This still duplicates D:

# A.jl
module A
import .B
import .C
end

# B.jl
module B
import .D
end

# C.jl
module C
import .D
end

# D.jl
module D
end

which still necessitates a fix, e.g.:

# A.jl
module A
import .B
import .C
end

# B.jl
import D
module B
import ..D
end

# C.jl
import D
module C
import ..D
end

# D.jl
module D
end

The main difference just being that some include statements got swapped for import statements. In doing so muddying what is at present a clear distinction between include for files and import for modules.

What was once an error may now instead be a subtle, silent bug. Especially for folks coming from languages like Python, for which the first code block I've posted will seem to work as they expect -- silently introducing a duplication bug.

Incidentally this also couples files and modules together in a way that people didn't sound keen on in discourse.

These all seem like issues, or is there something I'm missing? (It's easy to imagine versions of this that don't have any of these issues..)

Xiu-zhe (Roger) Luo · Answer 65 · Sat Dec 26 2020 05:56:08 GMT+0800 (China Standard Time)

@patrick-kidger why do you think so? Module D will only be loaded once if what is proposed here is implemented. Loading is different from including; these are two different concepts.

update: I think you may be talking about more of an implementation detail, namely when and where to eval the module rather than loading. Note that the eval of a module happens implicitly when you load it, which means it may not be evaluated again if the module has already been evaluated. But that is not about the proposal that @StefanKarpinski summarised above.

I tried to spend some time to implement this, I find this is more about where to eval (not loading) the sub-modules so that we can get the same symbol for the same file, and it seems there are two different cases:

1. loading relative module C from Main
2. loading relative module C inside a project module MyProject from file MyProject/src/C.jl

And if we follow how we currently load packages, which evals the package module in __toplevel__:

For case 1 this is simple: evaluating the module in __toplevel__ seems to work fine, since there can't be two D.jl files in the current working directory.
For case 2, evaluating the module in __toplevel__ seems to be problematic: another project module OtherProject could load C from OtherProject/src/C.jl, which is actually a different module, and that would cause MyProject's C to get replaced.

To resolve this issue, it seems to make sense to always eval the module inside a module chosen as follows:

If there is a parent module defined with the name of the directory, e.g. A is defined in A/A.jl and B is defined in A/B.jl, then B will be evaluated in A when it's loaded (with or without creating the symbol B in module A's global scope).
If the parent module does not have a path, which is usually the case for Main, it is evaluated in Base.__toplevel__.

The above implementation should give one the same object for a relative module, but now I have an implementation question:

What is the best way to eval B in module A without creating the symbol B in the global scope of A? I can currently only think of creating an implicit bare module A.__toplevel__ for this.

Stefan Karpinski · Answer 66 · Sat Dec 26 2020 22:37:27 GMT+0800 (China Standard Time)

I think @patrick-kidger makes a good point. In general, the way caching of modules loaded via import works is that there's a canonical place where the loaded module is found and if that module already exists, then the import gives it to you and otherwise some includes are done to load that module. In his example, when import .D occurs in B it includes D.jl in B, which is expected to define a module named D (otherwise it would be an error like it is for package import). This, of course, does nothing to create a module named D inside of C so when import .D occurs in C it would do the same thing, creating a duplicate, unrelated (but identically defined) module D inside of C. Of course, we could recall that D.jl has been loaded due to an import and remember the module that resulted and instead of including D.jl again, we could just create a new binding for D inside of C for the same module loaded as B.D. That's a bit weird, however, since the details of the resulting module depend on where it was loaded from first. For example it would have a different fullname:

julia> module A
           module B
               module D
               end
           end
       end
Main.A

julia> m = A.B.D
Main.A.B.D

julia> fullname(m)
(:Main, :A, :B, :D)

If A were a package instead of a module in Main we would have fullname(m) == (:A, :B, :D). Let's consider that situation but we can always take A == Main to see what would happen in Main. If we did what is the more current proposal in this issue, it would lead to a situation where if import .D occurs inside of B first you would get fullname(m) == (:A, :B, :D) whereas if import .D occurs inside of C first you would get fullname(m) == (:A, :C, :D). That violates the general principle that we try to maintain that import order should not matter.
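The point that a module's fullname is fixed by where it is defined, not by where it is later bound, can be checked with existing relative imports. This is a small sketch using only current Julia semantics; no auto-loading is involved, and the module names simply mirror the discussion.

```julia
module A
    module B
        module D end          # D is defined here, so its parent is A.B
    end
    module C
        import ..B: D         # C only gets a binding to the existing module
    end
end

println(A.C.D === A.B.D)              # the same module object under two bindings
println(fullname(A.C.D))              # ends in (:A, :B, :D), even when reached through C
println(parentmodule(A.C.D) === A.B)
```

If the first importer decided where D gets defined, swapping the roles of B and C would change the tail of that fullname, which is the order dependence being objected to.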

The obvious solution would be that you have to write import ..D in order to load D.jl in the current source directory, which would make D a sibling module to B and C instead of a child in each. In other words, it would actually cause the full name of D to be (:A, :D) rather than (:A, :B, :D) or (:A, :C, :D). That's fine as far as it goes, but what if import ..D occurred in the top-level package module? Then you'd have D as a sibling module to A rather than a child of it, i.e. the fullname of D would simply be (:D,) and it would be a top-level root module just like A and other package modules and Main are. It would, however, be accessible as D from inside of A since that's how it was loaded.

Is that a good idea? It's not entirely clear to me. It doesn't seem to cause any technical problems. We have, however, not generally encouraged people to write packages that define more than one top-level module even though it is possible to do so.

Stefan Karpinski · Answer 67 · Sat Dec 26 2020 22:57:32 GMT+0800 (China Standard Time)

If we want to take as principles:

1. Loading files via import alone cannot accidentally load the same file twice as two different modules.
2. Import order does not matter: the resulting module should be the same no matter who loads it first.

Then I think we can conclude that if we're loading A.B from B.jl in the top-level directory of package A and we see import .D, then we cannot load D.jl from the top-level directory, since if that works, then it would also work while loading A.C from C.jl, which either leads to a violation of the first rule if these imports produce separate modules or a violation of the second rule if these produce the same module, which would have to either have a fullname of A.B.D or A.C.D depending on the import order.

The other case to consider is when we want A.B.D and A.C.D to be different, which should be possible to express as well. Of course, they should have different source files, which means that they should be loaded from B/D.jl and C/D.jl, respectively.

Stefan Karpinski · Answer 68 · Sun Dec 27 2020 00:53:23 GMT+0800 (China Standard Time)

This leads to an interesting pickle:

while loading A from A.jl, if we see import .B we do want to load it from B.jl
while loading A.B from B.jl, if we see import .D we do not want to load it from D.jl
while loading A.B from B.jl, if we see import .D we do want to load it from B/D.jl

More generally, since two different submodules, B and C, can be defined in the same directory (the same file, even), we must include B and C as path components in the path where we look for the file to load to provide .D relative to each of them. That is, when we see import .D in modules A.B and A.C, we should look in a path that ends with B/D.jl and C/D.jl, respectively, rather than just D.jl, so that these imports are certain not to load the same path. At the same time, while we're loading A, if we see import .B we probably want to load it from B.jl rather than A/B.jl, although the latter would make things easily consistent, if more deeply nested than we probably want.

One approach that could accomplish this is to ignore the current source file's location and just locate all relative module imports relative to the source directory of the current package. That means that if the package A has src/subdir/file.jl as where the B module is defined, and that file has import .D in it, then the one and only place where D would be looked for is src/B/D.jl. While loading top-level code rather than a package, instead of starting at the package's src directory it would start in the current directory (pwd).
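The lookup rule in this approach depends only on the source root and the full name of the module being imported. The following is a rough sketch of that mapping; autoload_path is a hypothetical helper written for illustration, not an existing Julia function, and it assumes the first name component is the package root.

```julia
# Hypothetical helper, not part of Julia: map the full name of a module inside a
# package to the one file where an auto-loading import would look for it.
function autoload_path(srcroot::AbstractString, modname::Tuple{Vararg{Symbol}})
    length(modname) >= 2 || throw(ArgumentError("expected a submodule name such as (:A, :B)"))
    parts = map(String, modname[2:end])       # drop the package root, e.g. :A
    return joinpath(srcroot, parts[1:end-1]..., parts[end] * ".jl")
end

println(autoload_path("src", (:A, :B)))       # B.jl directly under src
println(autoload_path("src", (:A, :B, :D)))   # D.jl under src/B
println(autoload_path("src", (:A, :C, :D)))   # D.jl under src/C
```

Because the file in which the import statement appears is never an input, A.B.D and A.C.D can never resolve to the same path.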

Xiu-zhe (Roger) Luo · Answer 69 · Sun Dec 27 2020 06:20:52 GMT+0800 (China Standard Time)

If A were a package instead of a module in Main we would have fullname(m) == (:A, :B, :D). Let's consider that situation but we can always take A == Main to see what would happen in Main. If we did what is the more current proposal in this issue, it would lead to a situation where if import .D occurs inside of B first you would get fullname(m) == (:A, :B, :D) whereas if import .D occurs inside of C first you would get fullname(m) == (:A, :C, :D). That violates the general principle that we try to maintain that import order should not matter.

I think if we allow evaluating the module inside a __toplevel__ module of the current folder's module, this problem is automatically resolved. And this is probably more intuitive since then the module loading respects the file structure. Let me explain this with some examples:

now let me define loading and eval to make it clearer:

eval: evaluate the module file (e.g D.jl) in a parent module (this is Base.include)
loading: when the symbol of the target module is created, it may or may not trigger the evaluation of the module file (this is the Base.require)

I think the question is: which module is D's parent?

My proposal is to let each directory of a package have its own __toplevel__ module. For the above example, it means there are the following implicit modules:

MyPackage.__toplevel__
C.__toplevel__
Base.__toplevel__ (the global top-level module, which is currently provided)

now the proposed behavior is:

Case 1: package loading

the package has the following structure

.
├── LICENSE
├── Manifest.toml
├── Project.toml
├── README.md
├── src
│   ├── A.jl
│   ├── B.jl
│   ├── C
│   │   ├── C.jl
│   │   └── D.jl
│   ├── D.jl
│   └── MyPackage.jl
└── test
    └── runtests.jl

and the loading relationship:

# MyPackage.jl
module MyPackage
using .A
using .C
end

# A.jl
module A
using .D
end

# B.jl
module B
using .D
end

# C/C.jl
module C
using .D
end

# D.jl
module D
end

# C/D.jl
module D
end

For the above program, the expected behavior is:

A wants to load D from D.jl
B wants to load D from D.jl
C wants to load D from C/D.jl
MyPackage wants to load A from src/A.jl
MyPackage wants to load C from src/C/C.jl

When we load the package MyPackage, we traverse the module structure from top to bottom, which means that when we load MyPackage we never see B (or B's request to load D), and we can always expect to load a directory module first. As a result, the loading behavior becomes:

when A loads D, we evaluate D.jl in MyPackage.__toplevel__, so we get the module object as MyPackage.__toplevel__.D, then create a binding in A so that A.D === MyPackage.__toplevel__.D is true
when C loads D, we evaluate C/D.jl in MyPackage.C.__toplevel__, then create a binding D in C so that C.D === C.__toplevel__.D is true

Case 2: script loading

assume we have some scripts in a folder

.
├── A.jl
├── B.jl
├── C
│   ├── C.jl
│   └── D.jl
└── D.jl

and they have a similar loading relationship:

# A.jl
module A
using .D
using .C
end

# B.jl
module B
using .D
end

# C/C.jl
module C
using .D
end

# D.jl
module D
end

# C/D.jl
module D
end

now if we execute julia A.jl, the proposed behavior is:

load module A in Main (the current existing behavior)
load module D in Base.__toplevel__ and create binding A.D so that A.D === Base.__toplevel__.D
load module C in Base.__toplevel__ and create binding A.C so that A.C === Base.__toplevel__.C
load module D (from C/D.jl) in C.__toplevel__ and create binding C.D so that C.D === C.__toplevel__.D

Summary

Evaluating the module inside a __toplevel__ module is how we currently handle packages. Unlike packages, we only need to guarantee that a relative module is identical at runtime, so if we just apply this rule recursively to all local directories, the problem is resolved automatically. I think this behavior would be consistent with packages and intuitive to use.
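The per-directory __toplevel__ idea can be approximated in user code to see how it behaves. This is only a sketch under stated assumptions: DIR_TOPLEVEL, dir_toplevel and load_relative are hypothetical names, a plain Dict stands in for whatever the loader would really use, and the files are created in a temporary directory.

```julia
# Hypothetical registry: one hidden holder module per source directory.
const DIR_TOPLEVEL = Dict{String,Module}()

# Return the holder module for `dir`, creating it on first use.
dir_toplevel(dir) = get!(() -> Module(:__toplevel__), DIR_TOPLEVEL, realpath(dir))

# Evaluate `dir/<name>.jl` in the directory's holder unless `name` already exists there.
function load_relative(dir, name::Symbol)
    holder = dir_toplevel(dir)
    isdefined(holder, name) || Base.include(holder, joinpath(realpath(dir), string(name, ".jl")))
    return holder
end

# Recreate the D.jl and C/D.jl layout from the script example.
root = mktempdir()
mkdir(joinpath(root, "C"))
write(joinpath(root, "D.jl"), "module D\nconst origin = \"D.jl\"\nend\n")
write(joinpath(root, "C", "D.jl"), "module D\nconst origin = \"C/D.jl\"\nend\n")

seen_by_A = load_relative(root, :D)                  # first request evaluates D.jl
seen_by_B = load_relative(root, :D)                  # second request reuses it
seen_by_C = load_relative(joinpath(root, "C"), :D)   # a different directory, a different D

println(seen_by_A.D === seen_by_B.D)
println(seen_by_C.D === seen_by_A.D)
println(seen_by_C.D.origin)
```

A and B end up sharing one D, while C gets the D from its own directory, which matches the expected behavior listed for both cases.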

Patrick Kidger · Answer 70 · Sun Dec 27 2020 06:21:39 GMT+0800 (China Standard Time)

@StefanKarpinski

The obvious solution would be that you have to write import ..D in order to load D.jl in the current source directory, which would make D a sibling module to B and C instead of a child in each.

You highlight some potential inelegance issues with this approach. But additionally, what if B and C are not siblings of each other? It is then impossible for D to be a sibling to both. The basic problem being that in general there can be arbitrary mismatch between the filesystem tree and the module tree, and D can be imported from anywhere.

One approach that could accomplish this is to ignore the current source file's location and just locate all relative module imports relative to the source directory of the current package. That means that if the package A has src/subdir/file.jl as where the B module is defined, if it has import .D in it, then the one and only place where that would be looked for is src/B/D.jl. While loading top-level code rather than a package, instead of starting at the packages' src directory it would start in the current directory (pwd).

This seems complicated: names are coupled between the module B in src/subdir/file.jl and the filesystem src/B; it implies that B's imports depend upon the larger structure of the (potentially large) package, rather than just the part local to it; etc.

1. Loading files via `import` alone cannot accidentally load the same file twice as two different modules.

2. Import order does not matter — the resulting module should be the same no matter who loads it first;

These principles seem sound to me.

Taken together, they would seem to imply to me that the location of an import <file> statement cannot be used to determine the location of any modules (defined by <file>) within the module hierarchy: the tree of modules can be arbitrarily complicated, and import <file> statements can occur multiple times at arbitrary locations within it.

Here's a proposal: copy the pattern that's already used for packages. After all, files are essentially "mini-packages" within your package, so the wheel doesn't need reinventing here.

Explicitly: the first time an import <file> statement is encountered then load it at the global level. Then always insert a reference to that. Disambiguate each file(=module) by what's already disambiguating them in reality, i.e. their location within the filesystem.

So a fullname of the form (:PackageName, :__imports__, path-to-file...), where :PackageName may be :Main.

Nothing gets duplicated; import order is irrelevant; every module gets access to what it requests. And in fact dotted lookup still works, for the same reason that this works: module A end; module B import ..A end; B.A.

EDIT: encouragingly, it looks like @Roger-luo and I posted similar solutions at the same time!

By the way, I'm realising that I didn't @ you in my response to you in the (very rapidly growing) discussion over on discourse, which perhaps I should have done.

Stefan Karpinski · Answer 71 · Sun Dec 27 2020 06:55:16 GMT+0800 (China Standard Time)

By the way, I'm realising that I didn't @ you in my response to you in the (very rapidly growing) discussion over on discourse, which perhaps I should have done.

Don't worry, I'm paying attention to that thread as well, I'm just currently more interested in how to solve this specific issue than the broader discussion there, although I will probably post something there as well.

Stefan Karpinski · Answer 72 · Sun Dec 27 2020 07:01:41 GMT+0800 (China Standard Time)

This seems complicated

It's not really: if you need A.B.D then you load B/D.jl in the package A. That's it. Doesn't matter if the import is absolute or relative; the full name of the module determines the path it is defined at. Nothing else matters, including the path where it happens to be loaded from.

Xiu-zhe (Roger) Luo · Answer 73 · Sun Dec 27 2020 08:25:59 GMT+0800 (China Standard Time)

@StefanKarpinski won't this imply that if I want to use D from D.jl in B and C, I will need to write

module  A

import .D

module B
import ..D
end

module C
import ..D
end

end

which means I will need to write this import statement twice for every module like D? this seems redundant to me and for example user may not want the symbol D inside A at all. Should we let import ..D be loading D.jl at the same directory then?

And if I understand correctly, how current import D: a, b, c works is by storing the package module D in loaded_modules and evaluate it in __toplevel__ so that D does not get imported into the parent module, if we want to implement this for relative modules, we still need to have a directory __toplevel__ for evaluating the modules? or how do we hide the symbol D when it is loaded via using .D: a, b, c. I assume if we simply let every module contains a __toplevel__ module for module evaluation would make implementation a lot simpler? It seems the syntax convention does not matter - where the module gets evaluated in implementation matters here.

if we allow a __toplevel__ module to be defined in each module for evaluating relative modules, the original proposal will just work by changing Base.require

Stefan Karpinski · Answer 74 · Mon Dec 28 2020 00:45:24 GMT+0800 (China Standard Time)

I was thinking that the import .D inside of module A would not be required, but then it is quite weird that a binding would be created inside of A when A didn't create a binding, import anything or call include.

There are a couple of separate issues here:

Do these relative imports that auto-load essentially just act like an implicit include (in some module) followed by the very same import? Or do they act more like package imports where the imported module is loaded into some anonymous toplevel place and then bound just into where the import occurs?
When an auto-load import occurs, how is the file that is included determined? There are a few potential inputs: (a) the source root of the current package / pwd when in Main; (b) the fullname of the resulting module that will end up being bound; (c) the path of file in which the import occurs.

The existing relative import mechanism just navigates the module hierarchy and looks for modules and then does the import, so there is no notion of import <file>: the RHS of an import is a module, not a file. When you see import .X that means "look up the binding X in the current module and import it as a module." When you see import ..X that means "look up the binding X in the parent module of this module and import it as a module."
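The one-dot and two-dot lookups can be told apart when both the current module and its parent have a binding named X. This sketch uses only existing syntax (the as renaming needs Julia 1.6 or later); the module names Outer and Inner and the label constants are illustrative.

```julia
module Outer
    module X
        const label = "Outer.X"
    end
    module Inner
        module X
            const label = "Outer.Inner.X"
        end
        import .X as OneDot     # one dot: look up X in the current module (Inner)
        import ..X as TwoDots   # two dots: look up X in the parent module (Outer)
    end
end

println(Outer.Inner.OneDot.label)
println(Outer.Inner.TwoDots.label)
```

In both cases the right-hand side resolves to a module binding that already exists; no file path is consulted.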

I'm not sure what the right choice for decision 1 above is: on the one hand, it's less of a novel feature to have auto-load imports just be equivalent to inserting an include somewhere, but then the feature either only works for import .X or import ..X introduces spooky action at a distance in the parent module, which is bad. Perhaps it's better for the autoload feature to load the imported module like a package in total isolation and then import it. It's more different, but then that code is guaranteed to be a pure dependency of the code depending on it since it would have no way of even accessing the module into which it's being loaded.

Another consideration is that people have asked for the ability to load just a slice of a package without the rest of the package and if auto-loaded internal imports are loaded like packages, then they could also be loaded independently and imported by external code. The tricky part there would be knowing if it's ok to load the code like that or not: if you see some external code do import A.B.D does that require loading all of package A or is it ok to just load A/B/D.jl as a top-level mini-package and then provide it? Of course at that point there's an even stronger case to be made that this should be spun out into its own package, but it's worth connecting the two issues.

Regarding decision 2: I can see the appeal of import .X meaning import the file X.jl in the same directory as the current source file. Then import ..X would presumably mean import the file X.jl in the parent directory of the current source file, etc. One of the issues that troubles me is that module M; import .X; end in a source file would do something different from eval(M, :(import .X)) in a different source file. Note that this doesn't happen with packages: import X means the same thing in the same module no matter how it is evaluated. Perhaps, if we're going to have a file-oriented auto-load import mechanism it would be better to use the syntax import "X.jl" that was proposed early in this issue.

Here's another problem. Suppose import .D occurs in submodule B inside of package A. The fullname path to that module is A.B.D. What if I replace import .D with import A.B.D. Shouldn't that work in the same way? Of course this interacts with the idea of external imports being able to load slivers of a package like this: the external code cannot load D with import .D, it would have to load it with import A.B.D. But even within a package, it feels a bit wrong that importing a module using a relative path and importing it using an absolute path to the same module would do different things. Given that, how would we make sure that import .D and import A.B.D do the same thing? One option would be to strip off the common fullname for the current module and the module being imported and then treat import A.B.D as import D.jl. But then this feature is no longer so clear. It would seem more obvious if import A.B.D and import .D both loaded B/D.jl inside of package A because that's the fullname of the resulting module. Either that or use the import "D.jl" syntax instead and make it clear that the imported thing is a path.

Here's another issue to consider: what does import D.E (or import "D/E.jl") mean if D doesn't exist? Does it try to load D.jl or does it try to load D/E.jl?

Patrick Kidger · Answer 75 · Mon Dec 28 2020 01:40:34 GMT+0800 (China Standard Time)

... Or do they act more like package imports where the imported module is loaded into some anonymous toplevel place and then bound just into where the import occurs?

+1 on this, via:

Perhaps it's better for the autoload feature to load the imported module like a package in total isolation and then import it. It's more different, but then that code is guaranteed to be a pure dependency of the code depending on it since it would have no way of even accessing the module into which it's being loaded.

As this gives the kind of strict dependency-tracking that I think makes this kind of feature so useful in the first place.

Regarding decision 2: ... Perhaps, if we're going to have a file-oriented auto-load import mechanism it would be better to use the syntax import "X.jl" that was proposed early in this issue.

Also +1 for this syntax. Much clearer what's going on, avoids problems in the scenarios you highlight, and avoids ambiguity in scenarios like

# A.jl
module A
    module B end
    import .B
end

# B.jl
module B
end

Another consideration is that people have asked for the ability to load just a slice of a package ... if you see some external code do import A.B.D does that require loading all of package A or is it ok to just load A/B/D.jl as a top-level mini-package and then provide it?

I think either is fine, but this isn't a point I have strong opinions about. (If import A occurs afterwards, it should get access to the same D, of course.)

Here's another issue to consider: what does import D.E (or import "D/E.jl") mean if D doesn't exist? Does it try to load D.jl or does it try to load D/E.jl?

I'm not sure how trying to "load "D/E.jl"" is possible if "D doesn't exist". (Or why loading "D.jl" is on the cards.)

Xiu-zhe (Roger) Luo · Answer 76 · Mon Dec 28 2020 03:47:18 GMT+0800 (China Standard Time)

Do these relative imports that auto-load essentially just act like an implicit include (in some module) followed by the very same import?

If it's the same as an implicit include, then why not use include, which is more explicit? Explicit is better.

Or do they act more like package imports where the imported module is loaded into some anonymous toplevel place and then bound just into where the import occurs?

I think this is necessary to implement features like using .D: a, b, c since we are not supposed to create the binding here, but still need to evaluate D somewhere.

When an auto-load import occurs, how is the file that is included determined? There are a few potential inputs: (a) the source root of the current package / pwd when in Main; (b) the fullname of the resulting module that will end up being bound; (c) the path of file in which the import occurs.

Perhaps it's better for the autoload feature to load the imported module like a package in total isolation and then import it. It's more different, but then that code is guaranteed to be a pure dependency of the code depending on it since it would have no way of even accessing the module into which it's being loaded.

I agree, but then we may need to find a way to give the module file a unique identifier, since we won't have a UUID for it.

The tricky part there would be knowing if it's ok to load the code like that or not: if you see some external code do import A.B.D does that require loading all of package A or is it ok to just load A/B/D.jl as a top-level mini-package and then provide it?

I think this makes a lot of sense for meta-packages - they are just a combination of several packages for convenience, there is no need to load other parts. So this would help in speeding up the loading time a lot I think.

Here's another issue to consider: what does import D.E (or import "D/E.jl") mean if D doesn't exist? Does it try to load D.jl or does it try to load D/E.jl?

I think import D.E will error in this case since there is no module identifier D, but import "D/E.jl" should work since it is a path. The syntax difference between import A.B.C and import "path" I believe is about whether we should enforce a semantic about module identifier.

One possible case is there are two modules in one file

# A.jl
module A end
module B end

it is clear that we will only import A if we write import .A, but if we write import "A.jl" what does it mean? I don't see the path statement making sense here, and it would be strange if we allowed strings without allowing the use of arbitrary path names. If we are going to allow using a path, then the proper statement should be written as

using "path/to/file.jl": A, B
using "path/to/file.jl".A: a, b, c

which will evaluate the file in a top-level module (either globally or in parent module) then create bindings for A and B instead of just using "path/to/file.jl" and error.

But I agree, using a path string seems more correct to me, given that, unlike Python, we don't require modules to be attached to a file or directory, so we need the extra syntax to distinguish this. But I think we may want to consider using "D.jl" or import "D.jl" as invalid syntax and just error, since it does not say anything explicitly about the module identifier D (which is not the same as the path), and the file may contain several modules.

PS. @fonsp may be interested in the using <path> proposal, but maybe we should create a new issue about importing URLs if this issue is gonna provide the syntax eventually.

Fons van der Plas · Answer 77 · Mon Dec 28 2020 04:35:45 GMT+0800 (China Standard Time)

[opinion, feel free to hide]

Keep in mind that codebase management is a difficult-to-learn aspect of many programming languages, including julia (include vs import .A vs import A vs using A vs import A.B vs ]add A vs ...), and new users are exposed to it relatively early.

Right now, the difficulty is contained because you can teach (python) users that "in Julia, file/folder structure is only used in include". The using LocalPath.LocalFile syntax seems like a potential point of confusion, the ES6-style suggestions by @Roger-luo sound easier to explain (even simpler than the current syntax).

If it's about fewer keystrokes, why not add a macro to our vs code extension? (And of course I'm happy to make any Julia-specific additions to Pluto!)

Xiu-zhe (Roger) Luo · Answer 78 · Mon Dec 28 2020 04:45:57 GMT+0800 (China Standard Time)

I realize I was referring to @patrick-kidger's proposal of a new keyword from, which is the ES6 style that @fonsp mentioned and is used to specify a file path. I think this is better than using <path>, since: (a) the colon mark : is quite unreadable combined with the quote mark "; (b) the syntax is an addition to the current syntax, which might make @fonsp happy, as one does not have to learn it from the beginning; (c) it provides the semantic of asking for an explicit identifier from the user instead of just a file path. So it will look something like the following:

(update: I realized the from should be in the front so we can import multiple modules correctly.)

from "path/to/file_contains_A.jl" import A
from "path/to/file_contains_B_C.jl" import B, C 
from "path/to/file_contains_A.jl" using A: a, b
from "path/to/file_contains_A.jl" import A: name as foo

Tamas K. Papp · Answer 79 · Mon Dec 28 2020 17:41:30 GMT+0800 (China Standard Time)

Having reread the whole discussion, I am not sure that the original goal

there's a file called Foo.jl in the same directory as Main.jl it should be loaded

is the right solution given the changes to the loader since 2013.

I would argue for using / import to work only through the package loading mechanism: the symbols are looked up in the stacked environment, and then the loader takes care of paths etc.

using / import should never care about the current directory or the filesystem directly.

Uwe Fechner · Answer 80 · Mon Dec 28 2020 21:47:37 GMT+0800 (China Standard Time)

using / import should never care about the current directory or the filesystem directly.

I do not agree at all. Why should every module you load be part of a package? I find this very annoying if I just want to write an application and no package. I am very much in favour of implementing the original idea. Not doing it is a severe drawback of Julia compared to many other programming languages.

Tamas K. Papp · Answer 81 · Mon Dec 28 2020 22:04:53 GMT+0800 (China Standard Time)

Why should every module you load be part of a package?

This is not really a restriction, a "package" is a very lightweight thing as far as the loader is concerned — just a file ModuleName.jl is sufficient. Note that just putting all modules in src/ in a single package works fine, eg see how FinEtools.jl is organized. (That said, even an application should be a project, if not a package.)

Your use case is valid, it's just that using and import are not the right place to specify where the code is on the filesystem if it happens to be outside LOAD_PATH.

Uwe Fechner · Answer 82 · Mon Dec 28 2020 22:59:56 GMT+0800 (China Standard Time)

Well, why should I have to specify a load path if the module I want to load is present in the current directory? I just don't understand why that is needed. And the load path is ignored by vscode, a longstanding issue, no solution so far: julia-vscode/julia-vscode#307

Patrick Kidger · Answer 83 · Mon Dec 28 2020 23:45:10 GMT+0800 (China Standard Time)

Having reread the whole discussion, I am not sure that the original goal

there's a file called Foo.jl in the same directory as Main.jl it should be loaded

is the right solution given the changes to the loader since 2013.

I would argue for using / import to work only through the package loading mechanism: the symbols are looked up in the stacked environment, and then the loader takes care of paths etc.

using / import should never care about the current directory or the filesystem directly.

Does this admit a diamond dependency pattern? (More generally any DAG.)
It's not clear to me that it does.

EDIT: also, I think any symbol-based approach has ambiguity problems with the existing use of using/import to get access to modules, in the same way I outlined above.

Xiu-zhe (Roger) Luo · Answer 84 · Tue Dec 29 2020 02:12:42 GMT+0800 (China Standard Time)

Well, why should I have to specify a load path if the module I want to load is present in the current directory? I just don't understand why that is needed. And the load path is ignored by vscode, a longstanding issue, no solution so far: julia-vscode/julia-vscode#307

I agree with you.

This is not really a restriction, a "package" is a very lightweight thing as far as the loader is concerned — just a file ModuleName.jl is sufficient. Note that just putting all modules in src/ in a single package works fine, eg see how FinEtools.jl is organized. (That said, even an application should be a project, if not a package.)

Packages/projects are a much heavier thing compared to a single script file. If Julia were a statically compiled language, that would be fine, because you would have compile configs, etc. anyway. But Julia is a dynamic language, and there are people who want to use it for scripting, so asking everyone to put things in a project makes no sense, especially as this is not emphasized enough in the documentation and runs against efforts like Pluto that try to make a single file more accessible.

using / import should never care about the current directory or the filesystem directly.

Partially agree: using/import only says what identifier to import, because we have defined the argument of using/import to be a Julia identifier, so we need extra semantics to say from where. That's why I think we will need the from keyword as I proposed, which is fully compatible with previous Julia code, and we can implement forward/backward compatibility via a macro @from.

I tried to explain to @patrick-kidger my idea on this from keyword and evaluation in __toplevel__ proposal, maybe he has a better explanation than me if the above proposals do not convince you yet.

Patrick Kidger · Answer 85 · Tue Dec 29 2020 06:09:34 GMT+0800 (China Standard Time)

Not sure what I can add that hasn't already been said. At the very least, @Roger-luo and I have had a small discussion and have converged on what we think works best. Namely, access files by a syntax of the form from "B.jl" import some_obj, another_obj (or equivalently something like import "B.jl": some_obj, another_obj if introducing another keyword is undesired). Each file is evaluated in isolation and stored somewhere global (__toplevel__), to avoid duplication issues. Bindings are taken wrt this single evaluation of the file.

(If it would be helpful we can put together a more complete write-up on the problem / why this is the solution / the various alternate proposals.)

Patrick Kidger · Answer 86 · Tue Jan 05 2021 06:57:33 GMT+0800 (China Standard Time)

I'm interpreting those reactions as a request to offer the write-up. :)

So @Roger-luo and I have converged on what we think is the solution. What follows is a description of the problem, our proposed solution, and what goes wrong with the various alternate proposals.

We've also put together a macro-based implementation: FromFile.jl. It's a fully tested package, so we'll probably register it soon.

Specification

Problem

Files (as distinct from modules and packages) naturally exhibit a dependency structure. Getting access to one file from another currently relies on using include, usually in some "parent" file.

This has two major issues:

The dependency structure between files is not made explicit.
Topologically sorting the dependency structure (to determine include order) is a burden placed upon the developer.

In addition, it often necessitates an unnecessarily verbose include("file_containing_mymodule.jl"); import MyModule. (Which seems to be what initially prompted this issue.)
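To make the ordering burden concrete, here is a minimal sketch that is not taken from the thread. The file names a.jl and b.jl and the names helper and DOUBLED are hypothetical. b.jl silently depends on a.jl, so it only loads if the parent happens to include a.jl first.

```julia
# Hypothetical two-file layout written to a temporary directory.
dir = mktempdir()
write(joinpath(dir, "a.jl"), "helper(x) = 2x\n")
# b.jl uses `helper` but never states where it comes from.
write(joinpath(dir, "b.jl"), "const DOUBLED = helper(21)\n")

# Correct order: the parent has to know that a.jl comes before b.jl.
good = Module(:GoodOrder)
Base.include(good, joinpath(dir, "a.jl"))
Base.include(good, joinpath(dir, "b.jl"))

# Wrong order: b.jl fails to load because `helper` is not defined yet.
bad = Module(:BadOrder)
failed = try
    Base.include(bad, joinpath(dir, "b.jl"))
    false
catch
    true
end
```

Nothing in b.jl itself records the dependency, which is the first issue listed above; getting the two include calls in the right order is the second.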

Solution

The proposal is to extend the using/import syntax, by giving it a mode by which it can access files.

Each file loaded in this manner would be evaluated in total isolation, stored globally in its package (which may be Main), and a binding added to the current context.

If a file has already been loaded then it would be looked up in the global reference rather than being re-evaluated, which avoids duplication issues.

File identity is determined by filesystem location.

If the specified file does not exist in the specified location, then an error is raised.

Files are looked up relative to the filesystem location of the file in which the statement is written.

As every file then imports its own dependencies, both of the major issues previously identified are resolved.

Syntax

The suggested syntax is from "../folder/file.jl" import myobj1, myobj2, which would expect and require objects with names :myobj1, :myobj2 to be defined inside file.jl. These objects could be modules, functions, etc.

If all of myobj1, myobj2, etc. are modules, then import may be replaced with using to instead get access to all symbols exported by those modules. Likewise the other usual variants on this syntax are supported, ... import mymodule: myobj and so on.

Implementation

The above is essentially syntactic sugar for:

If PackageName.var"folder/file.jl" does not already exist:
- Create PackageName.var"folder/file.jl" as a module.
- include("folder/file.jl") into PackageName.var"folder/file.jl".
Evaluate one of the following expressions, according to the precise syntax used, where for readability we let m denote PackageName.var"folder/file.jl":
- from "folder/file.jl" import myobj1, myobj2:
  import m: myobj1, myobj2
- from "folder/file.jl" import mymodule: myobj1, myobj2:
  import m.mymodule: myobj1, myobj2
- from "folder/file.jl" import mymodule.myobj1, mymodule.myobj2:
  import m.mymodule.myobj1, m.mymodule.myobj2
- from "folder/file.jl" using mymodule1, mymodule2:
  using m.mymodule1, m.mymodule2
- from "folder/file.jl" using mymodule: myobj1, myobj2:
  using m.mymodule: myobj1, myobj2

Wrapping each file into a module is essentially necessary to isolate the contents of each file; however this is an implementation detail not exposed to users.
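The load-once behaviour described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the FromFile.jl implementation: the names FILE_MODULES and file_module are hypothetical, and the wrapping modules are kept in a Dict rather than as PackageName.var"folder/file.jl" bindings.

```julia
# Hypothetical registry: absolute file path => module wrapping that file.
const FILE_MODULES = Dict{String,Module}()

# Return the module wrapping `path`, evaluating the file only on first use.
function file_module(path::AbstractString)
    key = abspath(path)                  # file identity = filesystem location
    haskey(FILE_MODULES, key) && return FILE_MODULES[key]
    isfile(key) || error("no such file: $key")
    m = Module(Symbol(key))              # fresh namespace isolating the file
    Base.include(m, key)                 # evaluate the file inside that module
    FILE_MODULES[key] = m                # remember it so it is not re-evaluated
    return m
end

# Demonstration with a throwaway file.
path = joinpath(mktempdir(), "file.jl")
write(path, "myobj1 = 1\nmyobj2() = \"two\"\n")

m1 = file_module(path)   # evaluates file.jl
m2 = file_module(path)   # reuses the cached module
```

With a cache of this kind, a diamond (A needs B and C, and both need D) would evaluate D only once, because the second request for D finds the existing module.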

Alternate proposals

One proposal was to use import "../folder/file.jl", and to expect and require a module with name :file to be defined inside "file.jl". However this has additional limitations:

It does not naturally introduce any symbols into the current scope.
It does not mesh as well with current Julia, which allows for multiple modules in a file.
It requires defining a module of the same name as the file, which is a small amount of extra overhead.

One proposal was to use the syntax import "../folder/file.jl": myobj1, myobj2. However this makes it seem like import "../folder/file.jl" should also be valid, which it is not. (As we don't want to enforce a file<->module equivalence.)

One proposal was to use the syntax import .file or import ..file. However this has ambiguity issues, as the same syntax can be used to import modules in the same file. (Given the right module structure at the point it is invoked.)
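The ambiguity can be seen in code that is already valid today. In this small sketch (module and function names are hypothetical), the relative import resolves purely through module nesting, so giving the same spelling a second, file-based meaning would leave it unclear which one is intended if a Foo.jl also sat next to the source file.

```julia
module Outer
    # A nested module that could share its name with a file "Foo.jl".
    module Foo
        origin() = "nested module"
    end

    module Bar
        # Existing semantics: two dots mean "the module enclosing Bar",
        # so this is Outer.Foo. No file is consulted.
        import ..Foo
        source() = Foo.origin()
    end
end
```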

For the above reasons, introducing an additional keyword was seen as the neatest approach.

One proposal was to locate things in PackageName.__toplevel__ (or some other name like PackageName.__imports__), rather than just in PackageName. However this doesn't work with precompilation of packages, which produces errors due to the __toplevel__ module already being closed. This would mean that we don't pollute the main package namespace, though, so a way to have this work would be desirable.

One proposal was to use the syntax from ..folder.file import myobj1, myobj2. However the current syntax better supports getting the file from an arbitrary URI. For example a proposed extension was to accept URLs, if there is interest in this in the future.

One proposal was to try and hook into the existing package loading mechanism, using that to lookup symbols into paths. However doing so may have ambiguity issues as above, and would introduce substantial extra boilerplate in the form of Project.toml/Manifest.toml files potentially in every subfolder.

One proposal was to use import "file.jl" as a shortcut for include("file.jl"); import .file. However this does not offer a meaningful improvement in functionality, and in particular does not solve the two main problems identified at the start.

One proposal was to demand that the filesystem lookup should be done relative to the source root of the package, or pwd in the case of Main. (Rather than relative to the file in which the from statement is located.) However this means that each file now has nonlocal dependency, on the entire structure of the rest of the package; for example this makes moving whole folders of files much harder.

One proposal was to ignore the source file's location and use the current module's name to perform lookup wrt the source root of the package; i.e. to look in src/B/D.jl when encountering from "D.jl" import ... within the module B. However this lacks the required expressivity, as it can only express trees, not DAGs.

Stefan Karpinski · Answer 87 · Thu Jun 10 2021 23:49:47 GMT+0800 (China Standard Time)

As we don't want to enforce a file<->module equivalence.

I don't agree with this: I think we do want to enforce a file-module equivalence for this. Why wouldn't we? I think this correspondence is one of the popular features of both Python and Java which seem to be the primary inspirations here.

I'm reposting some responses to this proposal that I posted on discourse:

This may be superficial, but the leading from blah import syntax that’s proposed is far too Python-influenced and doesn’t fit with how imports work in Julia, which is import|using followed by an identifier of what module to import followed by names to import.
It has way too much flexibility and too many features: the ability to specify a file name and one or more modules and multiple names to import is way over the top. Imports are already an aspect of the language with too much surface area and too many variations, which we want to reduce, not increase even further. A proposal with this many variations is not going to fly.

Basically, I think we should have a file-module correspondence for this, which alleviates some of the featuriness. That still leaves some options, however. Here's one — which of these do we do?

Use the existing import .A syntax and map .A to an implicit file path
Introduce a new syntax like import "A.jl" and derive the module name implicitly from the file name
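If the second option were chosen, the implied module name could plausibly be computed along these lines. The helper implied_module_name is hypothetical, and the rule it applies (strip the directory and the .jl extension, then require a valid identifier) is an assumption rather than something specified in this thread.

```julia
# Hypothetical helper: map a file path to the module name it would imply.
function implied_module_name(path::AbstractString)
    stem, ext = splitext(basename(path))
    ext == ".jl" || error("expected a .jl file, got: $path")
    # A file such as "my-file.jl" has no usable module name.
    Base.isidentifier(stem) || error("not a valid module name: $stem")
    return Symbol(stem)
end

name = implied_module_name("../folder/A.jl")
```

One consequence of any such rule is that file names become constrained by identifier syntax, which the string-based from proposal above does not require.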

Another question is whether we want the module M ... end wrapper to be required or implicit. I favor it being implicit, but I also favored that back when we created packages in the first place but that's not what we ended up doing. Now it's a bit weird for packages to require the explicit module P ... end while these "little packages" don't require it.

Christopher Rackauckas · Answer 88 · Thu Jun 10 2021 23:56:37 GMT+0800 (China Standard Time)

Introduce a new syntax like import "A.jl" and derive the module name implicitly from the file name

That seems like a nice compromise between being explicit about code and file locations while cutting down on the overall number of steps.

pdeffebach · Answer 89 · Thu Jun 10 2021 23:59:22 GMT+0800 (China Standard Time)

I don't agree with this: I think we do want to enforce a file-module equivalence for this. Why wouldn't we? I think this correspondence is one of the popular features of both Python and Java which seem to be the primary inspirations here.

Being able to be flexible with files is a major benefit of Julia. Enforcing file-module equivalence would be very annoying to me, perhaps due to my lack of background in python and javascript. Commenting here so it doesn't look like there is a consensus towards one-module-one-file rules.

Petr Krysl · Answer 90 · Fri Jun 11 2021 00:20:56 GMT+0800 (China Standard Time)

So @Roger-luo and I have converged on what we think is the solution. What follows is a description of the problem, our

In my opinion this "solution" would increase the cognitive load on the developer, and most certainly on the user. Now I have to keep track of files? No thank you, I have had enough of it in python!

There must be a better way of addressing the perceived "problem".

Brian Chen · Answer 91 · Fri Jun 11 2021 00:29:24 GMT+0800 (China Standard Time)

This is an odd argument to me, as if anything include(...) requires more effort to keep track of files while also not giving any syntactic affordances of what external dependencies said files expect to exist. As mentioned before, this "spooky action at a distance" imposes additional mental overhead for readers of code and makes life difficult for automated tooling such as goto definition.

Petr Krysl · Answer 92 · Fri Jun 11 2021 00:39:56 GMT+0800 (China Standard Time)

Perhaps we have very different development models in mind.
When I hear "file" I imagine something that is totally open to outside influences.
Module on the other hand is a controlled environment, independent of the order
in which other modules have been evaluated. Certainly not true for files.

Brian Chen · Answer 93 · Fri Jun 11 2021 00:53:07 GMT+0800 (China Standard Time)

Those definitions make sense to me too. I think where we differ is that I think said lack of control is the problem. For example, how do you decide where external functions in an included file exist if that file can be included anywhere? This wouldn't be a problem if only "entrypoint" code (i.e. that which is run in the global namespace) was permitted to be uncontrolled, but currently "controlled" module code can freely include these uncontrolled files as well. Alternatively, files could be forced to declare all external dependencies, but that would be a larger breaking change.

Petr Krysl · Answer 94 · Fri Jun 11 2021 00:54:16 GMT+0800 (China Standard Time)

I would say that providing users with files is a bad idea in general. Provide users with modules!

Patrick Kidger · Answer 95 · Fri Jun 11 2021 00:55:27 GMT+0800 (China Standard Time)

Another +1 with @StefanKarpinski @ChrisRackauckas on import "file.jl".

Additionally +1 for keeping the module declaration implicit. Or rather, it's just the import statement that wraps a file into a module; it's not that every file is implicitly also a module.

Brian Chen · Answer 96 · Fri Jun 11 2021 01:05:51 GMT+0800 (China Standard Time)

I would say that providing users with files is a bad idea in general. Provide users with modules!

Here I'll invoke the "pit of success" argument. If most existing packages are using include and it takes more effort to use modules without include (i.e. creating locally scoped packages), is someone more likely to use modules or just include everywhere?

I would also add that sometimes you need to navigate through library code as well. Currently this requires a lot of backtracking up the include chain for somefile.jl for each item that is not explicitly imported. You could argue that library writers ought to be following conventions and grouping as many import declarations as possible in one place, but why not have the language help with that instead of expecting users to discover it themselves?

Petr Krysl · Answer 97 · Fri Jun 11 2021 01:46:37 GMT+0800 (China Standard Time)

Here I'll invoke the "pit of success" argument. If most existing packages are using include and it takes more effort to use modules without include (i.e. creating locally scoped packages), is someone more likely to use modules or just include everywhere?

I am sure I don't understand this: if I use a package, I don't care how many times or where it uses include!?

Brian Chen · Answer 98 · Fri Jun 11 2021 02:13:11 GMT+0800 (China Standard Time)

Let's say you're using a function bar from a package Foo and it throws an error (or gives an unexpected result). The stack trace looks like this:

...
[end-1] bar
[end]   foo

Just walking the stacktrace directly doesn't turn up anything, because the issue is in a function baz that bar calls before it calls foo. e.g:

function bar(x)
  res = baz(x)
  foo(res)
end

Okay, so where are all these functions defined?

- Foo.jl
- a.jl
  - bar()
- b.jl
  - baz()
- c.jl
  - foo()

(I've purposefully kept things flat, but in practice there would almost certainly be more nesting).
Following conventions, the library writer has probably set up the includes like this:

- Foo.jl
  - a.jl
    - b.jl
    - c.jl

Or like this:

- Foo.jl
  - b.jl
  - c.jl
  - a.jl

Great, so we just need to find where bar and then baz are defined! But without grepping the entire codebase, how do we do that? This is my point, one is forced to re-traverse from the top of the include DAG for every external reference that isn't provided by a child include. Not having any pointers back up the DAG means users are forced to reconstruct and hold the dependency graph in their heads with essentially zero help from the language.
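The traversal described above can be mimicked mechanically, which shows how much bookkeeping it involves. The sketch below recreates the second layout in a temporary directory and walks it from the top. The helper include_order is hypothetical, and it is only a crude text scan: it follows literal include("...") calls and nothing else.

```julia
# Hypothetical helper: files reachable from `entry` through literal
# include("...") calls, in the order they are first reached.
function include_order(entry::AbstractString, seen=String[])
    push!(seen, entry)
    for m in eachmatch(r"include\(\"([^\"]+)\"\)", read(entry, String))
        include_order(joinpath(dirname(entry), m.captures[1]), seen)
    end
    return seen
end

# Recreate the flat layout: Foo.jl includes b.jl, c.jl, then a.jl.
dir = mktempdir()
write(joinpath(dir, "Foo.jl"),
      "include(\"b.jl\")\ninclude(\"c.jl\")\ninclude(\"a.jl\")\n")
write(joinpath(dir, "a.jl"), "bar(x) = foo(baz(x))\n")
write(joinpath(dir, "b.jl"), "baz(x) = x + 1\n")
write(joinpath(dir, "c.jl"), "foo(x) = 2x\n")

order = include_order(joinpath(dir, "Foo.jl"))

# Locating `baz` means scanning every file reached so far for a definition.
defining = filter(f -> occursin(r"^baz\("m, read(f, String)), order)
```

Nothing in a.jl points at b.jl, so the search has to start again from Foo.jl for each unresolved name.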

Now, the above is not a hypothetical scenario, but something that happens in all languages. The question becomes how much weight you want to put on being able to navigate an (unfamiliar) codebase efficiently, and I'd argue that should be higher priority than saving some boilerplate.