HebiRobotics / MFL

A Java library for reading and writing MATLAB's MAT File format

AbstractArray constructor ignores the global argument

nedtwigg opened this issue

Yes, thank you for pointing out the bug. global is a keyword that defines variables as globally accessible.

https://mathworks.com/help/matlab/ref/global.html

Wow! Didn't realize MATLAB had this behavior. Happy to send a PR for this as well, once #22 gets sorted :)

Reading your link I'm realizing global probably only applies to root-level entries (which makes perfect sense). In that case I think global shouldn't be in the AbstractArray class at all, and should instead be treated like the name attribute.

What do you think about adding a global flag to NamedArray and maybe renaming the class to something semantically similar to MatlabVariable? It'd represent an array with a name and global flag that sits at the root level. This would match MATLAB semantics much better.

It's a good thought, but my gut is to leave it as-is. Here are a couple of reasons:

  • if a malformed MAT-file happens to have the global flag set in a non-root array, what would you do? Ignore it? Throw an error?
  • looking at the MatFile interface, the only way to get the isGlobal() value is through the Iterable<NamedArray> getEntries(). That's okay, but it's a little awkward for people who care about isGlobal() but want to access entries via map or index

It's not crazy to change it, but it does seem like a bit of an impedance mismatch with the file format. It's too bad that there's a mismatch between the underlying format and the semantic meaning, but there are too many MAT files out there to change it now ;-)

  • I'm ok with ignoring it. I believe that's what MATLAB is doing, and semantically it doesn't mean anything. For example, what would it mean if the 2nd variable of a cell array were global? There is no way to reference it without the cell array being global.

  • I assume that there are very few people who will ever care about this, so it probably doesn't need to have a particularly user friendly API. It'd be simple enough to iterate over the entries and create a globals map in user space. However, at this early stage I'm also not opposed to changing the interface to something that better matches (not sure what it'd be yet).
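As a rough illustration of the "globals map in user space" idea, here is a minimal sketch. The types below (SketchArray, SketchNamedArray) are hypothetical stand-ins written for this example, not the real MFL classes; they only assume an entry exposes getName(), getValue() and getValue().isGlobal() as described in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an array value that carries a global flag.
// This is NOT the MFL Array type.
final class SketchArray {
    private final boolean global;
    SketchArray(boolean global) { this.global = global; }
    boolean isGlobal() { return global; }
}

// Hypothetical stand-in for a root-level (name, value) pair.
final class SketchNamedArray {
    private final String name;
    private final SketchArray value;
    SketchNamedArray(String name, SketchArray value) {
        this.name = name;
        this.value = value;
    }
    String getName() { return name; }
    SketchArray getValue() { return value; }
}

final class GlobalsMapSketch {
    // Walks the root-level entries once and keeps only those whose value
    // has the global flag set, keyed by name and preserving file order.
    static Map<String, SketchArray> collectGlobals(Iterable<SketchNamedArray> entries) {
        Map<String, SketchArray> globals = new LinkedHashMap<>();
        for (SketchNamedArray entry : entries) {
            if (entry.getValue().isGlobal()) {
                globals.put(entry.getName(), entry.getValue());
            }
        }
        return globals;
    }

    public static void main(String[] args) {
        List<SketchNamedArray> entries = new ArrayList<>();
        entries.add(new SketchNamedArray("a", new SketchArray(false)));
        entries.add(new SketchNamedArray("b", new SketchArray(true)));
        // Only "b" is flagged, so only "b" ends up in the map.
        System.out.println(collectGlobals(entries).keySet());
    }
}
```

With the real library the loop would presumably run over getEntries(); the filtering logic would stay the same.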

There are quite a few things that are in every array header that only get used in a subset of cases:

  • name: only used at the root level, otherwise empty
  • global flag: only used at the root level, otherwise ignored
  • logical flag: only used for numerical matrices, otherwise ignored
  • complex flag: only used for numerical matrices, otherwise ignored
  • number of non-zero-values: only used for sparse matrices, otherwise ignored

So far I've tried to treat the MAT-File format as an implementation detail, and tried to keep the public API close to the actual MATLAB behavior. The global flag was an oversight on my part.

No MAT file generated by MATLAB should ever have the global flag set on a sub-element, so this would only come up when loading files created by 3rd party libraries that let users set invalid options.

I've committed the one-line constructor fix. I'll keep the issue open for discussion. The more I think about it, the more I lean towards moving the global flag to the root level.
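For readers who have not seen the commit, a bug of the kind named in the issue title could look roughly like the sketch below. This is a hypothetical reconstruction: the class, field and parameter names are assumptions made for illustration, not the actual MFL source.

```java
// Hypothetical base class illustrating the "constructor ignores the global
// argument" bug. None of these names are taken from the MFL code base.
abstract class SketchAbstractArray {
    private final int[] dims;
    private final boolean global;

    protected SketchAbstractArray(int[] dims, boolean global) {
        this.dims = dims.clone();
        // A buggy version would drop the argument here, for example by
        // writing `this.global = false;`. The fix is to store the value
        // that was actually passed in.
        this.global = global;
    }

    public boolean isGlobal() { return global; }

    public int getNumDimensions() { return dims.length; }
}

// Minimal concrete subclass so the sketch can be instantiated.
final class SketchMatrix extends SketchAbstractArray {
    SketchMatrix(int rows, int cols, boolean global) {
        super(new int[] { rows, cols }, global);
    }
}

final class ConstructorFixSketch {
    public static void main(String[] args) {
        SketchMatrix m = new SketchMatrix(2, 3, true);
        // With the fix in place the flag survives construction.
        System.out.println("global=" + m.isGlobal());
    }
}
```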

I'll create a PR for an API proposal.

One trick I'm doing right now is that I can treat a MatFile as a List<NamedArray>, and I can also do that for Struct. That way I can model the data as a tree of NamedArray, and I can fit everything into that tree, and the code doesn't have to be specific to root or child entries of a file.

If NamedArray becomes RootArray with a boolean isGlobal(), then I can't stuff the entire hierarchy into it anymore, and I have to treat children of a scalar struct differently than children of a file. Or I could stuff everything into RootArray, but it's the same problem where isGlobal() is present everywhere, but only actually matters for root entries.

That's just my use case though, and I'll be fine either way :)

I may be misunderstanding this, but is there really a difference between the two options? isGlobal() is still present everywhere and always false.

1) NamedArray

  • getName()
  • getValue().isGlobal() (always false)
  • getValue()

2) Variable

  • getName()
  • isGlobal() (always false)
  • getValue()

In Case 1 you could technically create an array that returns true, but IMO returning an incorrect value would be worse than returning a redundant / useless value.

Is there really a difference between the two options?

I don't think so, which is why I'm mildly in favor of the status quo :)

I meant for your use case ;)

I see benefits in following MATLAB semantics and making it harder for users to create invalid files, e.g., Mat5.newCell(1,1).set(0, myGlobalArray).
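To make the "harder to create invalid files" argument concrete, here is a small sketch of the second option, where the flag lives on the root-level wrapper instead of on every array. All types here (Value, Matrix, Cell, Variable) are hypothetical and written only for this example; they are not the MFL API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: note that it has no global flag at all.
interface Value {}

final class Matrix implements Value {
    final double[] data;
    Matrix(double... data) { this.data = data; }
}

final class Cell implements Value {
    private final List<Value> elements = new ArrayList<>();
    Cell add(Value element) {
        elements.add(element);
        return this;
    }
    int size() { return elements.size(); }
}

// Hypothetical root-level entry: the only place where a name and a
// global flag can be attached.
final class Variable {
    private final String name;
    private final boolean global;
    private final Value value;

    Variable(String name, boolean global, Value value) {
        this.name = name;
        this.global = global;
        this.value = value;
    }

    String getName() { return name; }
    boolean isGlobal() { return global; }
    Value getValue() { return value; }
}

final class RootFlagSketch {
    public static void main(String[] args) {
        // A cell element is a plain Value, so there is no way to mark it
        // global; only the enclosing root-level Variable carries the flag.
        Cell cell = new Cell().add(new Matrix(1, 2, 3));
        Variable variable = new Variable("c", true, cell);
        System.out.println(variable.getName() + " global=" + variable.isGlobal());
    }
}
```

The trade-off raised earlier in the thread still applies: once the flag lives on the wrapper, nested values and root entries are no longer the same type, so code that models a whole file as one uniform tree needs a separate case for the root.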

I'm also not particularly happy with the naming of NamedArray. Variable may also be a weird name for a class, but it could potentially clean this up a bit, i.e., a MatFile has a local or global Variable that consists of a name and an Array value.

for (Variable variable : matFile.getVariables()) { ... }

I'll just test around with the API a bit. If it doesn't make the API cleaner, I'm ok leaving it as is.

Please take a look at #27. I think it makes the API cleaner and somewhat more readable.

I wasn't sure whether to rename MatFile::getEntries(), MatFile::addArray, Mat5Writer::writeArray etc., so I left those the same for now.

Aside from doing a search-replace from NamedArray to Variable there shouldn't be any changes required to your existing code.

Did another iteration and changed it again to MatFile.Entry

I like the MatFile.Entry name, especially in the context of Mat5Writer.writeEntry. This will work well for my use case, and it definitely improves the public API semantics to better match the in-MATLAB experience. Good call :)

Merged in #27. Thanks!

I may do another intermediate 0.4 release, but believe this is getting very close to a releasable 1.0.