fsnotify / fsnotify

Cross-platform filesystem notifications for Go.

Add fanotify support

purpleidea opened this issue · comments

commented

Would there be any objections if someone sent in patches to add support for fanotify?

That would be great.

The only thing is, how does someone using Linux choose between them? We may need something along the lines of #104 first.

commented

Just wanted to get the approximate ACK, so that I can point people here if they're interested in fanotify benefits. Thanks!

@purpleidea I think it would be best to build out a fanotify Go wrapper as a separate repo and then look at integrating it after that.

@amir73il is working on a "super block watch" for Linux, providing "the ability to set a single (fanotify) watch on a root directory and get notified on all the legacy inotify events without the need to recursively add watches on all directories." https://lkml.org/lkml/2016/12/20/312

This could avoid the need for a user-space recursive watcher (#16) on modern Linux kernels.

commented

@nathany Thanks for the info! Looking forward to @amir73il's patches!

Cheers

Well, the patches are already out there on my GitHub (applied to kernel 4.9), but for those of you hoping for this functionality to land upstream, I suggest being patient.

I have no doubt it is going to be some time before this feature can be merged into an official kernel. My bet is that I will have to maintain it out of tree for a while, and only after real users show genuine interest in the feature will it be seriously considered for upstream.

This is where you guys can be of help. So far I have had only one guy rooting for my patches on LKML, and he has also tested them on his system.

When promoting a feature for upstream it is important to bring solid use cases that require the feature and argue that the same cannot be achieved by user library code and existing kernel functionality.

However, if you can't test my work on a distro kernel then it is going to be harder to claim that it is beneficial for your use cases. To solve this chicken-and-egg problem I plan to provide installable kernel modules for commonly used Linux distros, so using fanotify super block should be as easy as e.g.: apt-get install fsnotify-tools.

I cannot guarantee when I will get to providing this level of installation though, so if any of you out there are not afraid of building a custom kernel, I will gladly assist you if you want to test my patches.

Cheers.

Thanks Amir.

Perhaps another option to make the patched kernel available would be to maintain a Vagrant box built with Packer. That way we could test fanotify super block using a VirtualBox VM from any operating system.

Yes, that could work. And I promise to assist the person who volunteers to work on this setup.

commented

Amir, which kernel version would you like to target?

commented

@amir73il I have pinged some kernel engineers at my company to look into your patch. In the meantime, if you have a moment, could you look into and recommend an algorithm or suggest an improvement to the recursive file watching which I've implemented for mgmt? The code is available here:
https://github.com/purpleidea/mgmt/blob/master/recwatch/recwatch.go#L134

Cheers!

@tiwaana the question is moot. I would like to target the earliest kernel version possible, but since this is neither a bug fix nor a trivial improvement, some things have to happen first. Not all of them depend on me, and they need not happen in this order:

  1. Technical review of the patches (I am working on getting that)
  2. Design review of the patches (ditto)
  3. Review of the proposed kernel-user API
  4. Demonstrate a clear-cut benefit to the Linux user community
  5. Demonstrate no performance regressions for users not using the feature

@purpleidea thanks for the ping. If your company shows interest in the super block watch, that could be a game changer. Regarding your recursive watcher: I am new to golang and have zero knowledge of the fsnotify library, but it appears your code does not call addSubFolders() recursively from Init() beyond one level of depth, so if you never get events on the direct subfolders you will never add watchers for level-2 subdirs. I may be missing something, though. Also, I don't see any handling of Move events for dirs, unless the library handles that by generating a Rename/Create event pair.
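To illustrate the depth concern, here is a minimal Go sketch that collects every directory under a root at any depth, so that a watch could be added for each one up front. It is not the mgmt code; `collectDirs` is a hypothetical helper, and the `fmt.Println` call only stands in for "add a watch here".

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// collectDirs returns root and every directory below it, at any depth.
// A recursive watcher would add one watch per returned path.
func collectDirs(root string) ([]string, error) {
	var dirs []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // give up on the first unreadable entry
		}
		if d.IsDir() {
			dirs = append(dirs, path)
		}
		return nil
	})
	return dirs, err
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	dirs, err := collectDirs(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		os.Exit(1)
	}
	for _, dir := range dirs {
		fmt.Println(dir) // placeholder for "add a watch on dir"
	}
}
```

Note that a walk like this only covers directories that exist at startup. Directories created or moved in later still have to be picked up from events, which is where the race conditions of a userspace recursive watcher come from.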

commented

@amir73il is working on a "super block watch" for Linux, providing "the ability to set a single (fanotify) watch on a root directory and get notified on all the legacy inotify events without the need to recursively add watches on all directories." https://lkml.org/lkml/2016/12/20/312

You do know that fanotify supports recursive watch on (any, even bind) mountpoint with FAN_MARK_MOUNT, right?

@isage focus on the part 'all the legacy inotify events', namely create/move/delete.
A fanotify mount watch does not provide those events.

Linux fanotify added directory events (move/delete/etc) back in 2017:

https://lwn.net/Articles/717060/

@pabs3 I was not aware of any distro that picked up the patch you mentioned,
but actually, Linux did get fanotify directory events back in May.

@nathany, sorry I forgot to update you when the feature got merged upstream:
https://kernelnewbies.org/Linux_5.1#Improved_fanotify_for_better_file_system_monitorization

Man pages were already updated:
http://man7.org/linux/man-pages/man7/fanotify.7.html

On my GitHub, you can find a demo conversion of the inotifywait tool that uses a fanotify super block watch instead of a recursive inotify watch:
https://github.com/amir73il/inotify-tools/commits/fanotify_dirent

Please note that at this time, the feature lets the user listen to ALL directory events in the filesystem, and any sort of filtering by subtree would have to be implemented in user space.
Implementing a subtree filter in the kernel is on my roadmap, but I cannot promise anything yet.

Let me know if you are interested in using fanotify and if you have any questions.

Other than requiring a newer Linux kernel, is there any disadvantage to using fanotify? Could we detect support for fanotify and fallback to inotify if not available?

Would two or more people be interested in building out a stand-alone fanotify module/package, either in a separate repository or a subfolder of fsnotify? Then we could look at integrating it into fsnotify after that.

To detect support, you just need to call fanotify_init(FAN_REPORT_FID, 0).
If you do not get EINVAL, you can use the feature.
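A rough Go sketch of that probe, using only the standard `syscall` package on Linux. The names `probeFanotify` and `chooseBackend` are hypothetical, the constant is copied from `<linux/fanotify.h>`, and treating errors other than EINVAL (for example EPERM when the process lacks privileges) as "inconclusive, fall back to inotify" is my assumption rather than something stated in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// fanReportFID mirrors FAN_REPORT_FID from <linux/fanotify.h>.
const fanReportFID = 0x200

// probeFanotify calls fanotify_init(FAN_REPORT_FID, 0) and closes the
// descriptor straight away; only the error value is of interest.
func probeFanotify() error {
	fd, _, errno := syscall.Syscall(syscall.SYS_FANOTIFY_INIT, fanReportFID, 0, 0)
	if errno != 0 {
		return errno
	}
	return syscall.Close(int(fd))
}

// chooseBackend turns the probe result into a backend name plus a reason.
func chooseBackend(err error) (backend, reason string) {
	switch {
	case err == nil:
		return "fanotify", "FAN_REPORT_FID accepted"
	case errors.Is(err, syscall.EINVAL):
		return "inotify", "kernel rejected FAN_REPORT_FID"
	default:
		// Assumption: any other error (EPERM, ENOSYS, ...) is inconclusive.
		return "inotify", "probe inconclusive: " + err.Error()
	}
}

func main() {
	backend, reason := chooseBackend(probeFanotify())
	fmt.Printf("using %s (%s)\n", backend, reason)
}
```

Probing the syscall directly, rather than parsing the kernel version, should also cope with distro kernels that backport the feature.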

The disadvantage compared to recursive inotify is that there is no subtree-level filtering in the kernel.
When you set a watch with FAN_MARK_FILESYSTEM you get all events on the filesystem and need to filter them by path prefix in userspace.
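The userspace filter can be quite small. Below is a sketch in Go, assuming event paths have already been resolved to absolute paths; `underRoot` is a hypothetical helper name. It compares whole path components, so a watch on /srv/data does not accidentally match /srv/database.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// underRoot reports whether path is root itself or lies somewhere below it.
// filepath.Rel cleans both arguments, so trailing slashes do not matter.
func underRoot(root, path string) bool {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return false
	}
	// A relative path that climbs out of root starts with a ".." component.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	root := "/srv/data"
	for _, p := range []string{"/srv/data/a/b.txt", "/srv/database/x", "/srv/data"} {
		fmt.Printf("%-20s under %s: %v\n", p, root, underRoot(root, p))
	}
}
```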

At the moment, directory modification events are NOT supported along with FAN_MARK_MOUNT due to Linux VFS implementation constraints.

@amir73il any update on this issue?

@s3rj1k which updates are you expecting?
There is no timeline or any guarantee that in-kernel subtree filtering will ever be available, but that shouldn't matter; it's just an optimization.

The way I see it, the kernel code is ready and waiting for volunteers to implement the userspace recursive watcher. I even provided sample C code.

I forgot to mention in the answer to @nathany that, unlike inotify, fanotify requires CAP_SYS_ADMIN. Not sure if that is a problem for fsnotify.

@amir73il Hi, basic support for fanotify in fsnotify.
I actually need only the FS_MOUNT watcher, as it can be used for recursive watcher.

@s3rj1k you may think that (by bind mount?) you can trick fanotify into doing a recursive watch by watching FAN_MARK_MOUNT, but that is not true.
All processes (e.g. shells) that have already chdir'd into the directory tree will be holding a reference to the directory under the bind mount, and changes they make to the directory will not be intercepted by the "recursive" watcher.
And regardless, implementing directory events on FAN_MARK_MOUNT is technically challenging.
So complicated and not usable in many cases == not worth the effort.

My point is that implementing recursive watch should be possible in userspace with FAN_MARK_FILESYSTEM support from kernel.
Every event reports an fid, which can be used to decode its path, and then a userspace fsnotify daemon can determine whether anyone is interested in events on this path.
This is the approach of Mac OS X's fseventsd: it moves the complicated API to a userspace daemon, from which most users get their fs notifications.

Perhaps what I described is outside the scope of fsnotify library, but the library could do a simple filter by path on its own as well - it will just be less efficient when there are many event listeners on the system.
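As a sketch of what such a dispatch step could look like in Go: listeners register a subtree root, and each event is delivered to every listener whose subtree contains the event path. `Event`, `Dispatcher`, `Subscribe` and `Dispatch` are hypothetical names for illustration, not fsnotify API, and decoding the fid into a path is assumed to have happened already.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sync"
)

// Event is a hypothetical, already-decoded filesystem event.
type Event struct {
	Path string
	Op   string
}

// Dispatcher fans filesystem-wide events out to per-subtree listeners.
type Dispatcher struct {
	mu        sync.RWMutex
	listeners map[string][]chan<- Event // keyed by cleaned subtree root
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{listeners: make(map[string][]chan<- Event)}
}

// Subscribe registers ch for all events at or below root.
func (d *Dispatcher) Subscribe(root string, ch chan<- Event) {
	d.mu.Lock()
	defer d.mu.Unlock()
	root = filepath.Clean(root)
	d.listeners[root] = append(d.listeners[root], ch)
}

// Dispatch walks from ev.Path up to the top-level directory and delivers ev
// to each listener registered on the way. It returns the number of
// deliveries; events for listeners with a full channel are dropped.
func (d *Dispatcher) Dispatch(ev Event) int {
	d.mu.RLock()
	defer d.mu.RUnlock()
	delivered := 0
	dir := filepath.Clean(ev.Path)
	for {
		for _, ch := range d.listeners[dir] {
			select {
			case ch <- ev:
				delivered++
			default: // listener is not keeping up; drop instead of blocking
			}
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			return delivered
		}
		dir = parent
	}
}

func main() {
	d := NewDispatcher()
	ch := make(chan Event, 1)
	d.Subscribe("/srv/data", ch)
	d.Dispatch(Event{Path: "/srv/data/a/b.txt", Op: "create"})
	d.Dispatch(Event{Path: "/srv/database/x", Op: "create"})
	close(ch)
	for ev := range ch {
		fmt.Println(ev.Op, ev.Path)
	}
}
```

Looking up each ancestor directory in a map keeps the per-event cost proportional to path depth rather than to the number of listeners, which matters when many listeners share one filesystem watch.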

@amir73il I'll just watch / and filter out by path.
One use case is for watching changes in containers (LXC, LXD, ...)

@s3rj1k great, let me know if you run into issues. I expect that you will, because I know the functionality provided is minimal, but in order to improve it, kernel development has to be driven by userspace needs.

For LXC/LXD it depends on the storage backend.
For Directory backend you simply need to filter by path.
You should have no issues with LVM backend.
ZFS may be fine, but I never tried, so may require fixes.
Btrfs: watching a btrfs subvolume is currently not supported, see EXDEV error in
http://man7.org/linux/man-pages/man2/fanotify_mark.2.html
It's a challenge to fix that, but if there is a requirement I can look into it.

Another piece of functionality you may find missing compared to inotify is the filenames on create/delete/rename events.
I have patches for this functionality, but there is pushback from the upstream kernel on these patches, so again, the kernel feature needs to be driven by userspace needs.

@amir73il I actually plan on using it for Directory backend.
If you are interested, the related LXD issue is https://github.com/lxc/lxd/issues/6304
So this is actually pretty usable but in C :)

@s3rj1k Are you interested only in open/close events?
Those have been available in fanotify for a long time, regardless of my changes.

@amir73il Open/Close recursively, for antivirus on demand scanner.
Can current code do this recursively? If so, I'll try this tomorrow and report back.
Thanks for your quick support.

@s3rj1k I don't understand the question. Fanotify could always do that. If you are asking about fsnotify lib support then no, but you are talking to the wrong person.

Hmm, @nathany ? any news on initial fanotify support?

I played a bit with fanotify and go, here is a working example
https://github.com/s3rj1k/go-fanotify

@amir73il are there any examples with newer fanotify functionality?
FAN_MOVED_FROM, FAN_MOVED_TO, ... and friends

@s3rj1k see the link above to the inotify-tools global watch demo

@amir73il The latter example actually helps a lot for non-C people.
The manual states that FAN_MARK_MOUNT can't be used with FAN_REPORT_FID.
Is there a way to achieve similar functionality for FAN_REPORT_FID?

@s3rj1k you will have to be more specific than "similar functionality".
What you have is FAN_MARK_FILESYSTEM.
If the example code I shared is not enough to understand what it can do, I'm afraid there is not much more that I can do to assist.

@amir73il yes. I've set FAN_MARK_FILESYSTEM.
I was talking about some sort of recursive watch like the one FAN_MARK_MOUNT does.
I assume this should work similarly.
I'm still having trouble writing the Go code :)

@nathany basic (pre 5.1 kernel) fanotify support can be added using golang.org/x/sys/unix
You can check out https://github.com/s3rj1k/go-fanotify for example code.

It would be nice to have someone with more knowledge of C port those new fanotify structures to golang.
Sadly I can't get them to work :(

@amir73il You mentioned earlier

Another piece of functionality you may find missing compared to inotify is the filenames on create/delete/rename events.
I have patches for this functionality, but there is pushback from the upstream kernel on these patches, so again, the kernel feature needs to be driven by userspace needs.

Can you make this patch public?
I assume this patch is for legacy fanotify interface that was discussed in https://lwn.net/Articles/717060/?

For the record, the remaining bits of fanotify filesystem watch have been merged to kernel v5.9:
https://kernelnewbies.org/Linux_5.9#Core_.28various.29

Man pages were updated for using modes like FAN_REPORT_DFID_NAME, which most closely resembles the inotify event information:
https://www.man7.org/linux/man-pages/man2/fanotify_init.2.html

Btrfs: watching a btrfs subvolume is currently not supported, see EXDEV error in
http://man7.org/linux/man-pages/man2/fanotify_mark.2.html
It's a challenge to fix that, but if there is a requirement I can look into it.

@amir73il I'd like to use fanotify inside a Docker container on a btrfs filesystem. Does this mean that wouldn't work?

I'm not that familiar with how LXC/LXD is implemented on top of btrfs and so don't know if a "btrfs subvolume" would always be used when running Docker on btrfs or only in certain setups...

Hmm. I guess the other issue is that using this API within Docker would require granting CAP_SYS_ADMIN, which is discouraged since it's an overloaded permission which grants access to many things. It's a shame it's not able to use some more granular permission.

Since kernel v6.8, commit 30ad1938326b ("fanotify: allow "weak" fsid when watching a single filesystem"), inode watches are allowed on btrfs subvolumes, so fsnotifywatch --fanotify --recursive will work, but I am assuming that you wanted to use fsnotifywatch --filesystem? That is currently not supported on btrfs subvolumes.

Hmm. I guess the other issue is that using this API within Docker would require granting CAP_SYS_ADMIN, which is discouraged since it's an overloaded permission which grants access to many things. It's a shame it's not able to use some more granular permission.

This problem is a bit easier to solve using idmapped mounts.
I have a relatively simple kernel patch to allow setting filesystem marks inside a container without the need for global CAP_SYS_ADMIN:
https://github.com/amir73il/linux/commits/fanotify_userns/
The other side of the problem is that open_by_handle_at() requires global CAP_DAC_READ_SEARCH, so we need to make that userns-aware as well.
None of this is very controversial, I think, but I had other priorities and no one has made assertive requests for this.

@brauner do you remember anything that was holding this back?

Thanks for all the details @amir73il!

so fsnotifywatch --fanotify --recursive will work, but I am assuming that you wanted to use fsnotifywatch --filesystem?

I really just need to watch a specific directory. So before stumbling upon this thread, I would have said the former. However, your comment from a few years ago above seemed to suggest always using the latter (#114 (comment)), so I guess it depends on if your advice from back then still holds today.

@benmccann I am not sure if my comment is relevant to your use case.
If you want to watch a single directory within a container, you should be fine with just fsnotifywatch --fanotify.
You should also be fine with inotifywatch, as there is not much difference in this case.
Or did you mean that you want to watch a single directory recursively?
There are some advantages to watching --filesystem over --recursive, but mostly for very large directory trees. And as you noticed, --filesystem does not work on btrfs subvols and currently does not work inside an unprivileged container, so that's not an option for you.

I want to watch the directory recursively. Sorry for not making that clear.

I want to use fanotify both because the directory may be large and I'd like to avoid inotify limits and because I'd like to detect file moves and it appears that can be done in a reasonable way with fanotify whereas it is "inherently racy" with inotify (per the man pages)

OK, fsnotifywatch --fanotify --recursive will have similar limits, but following the paths of renamed files may be more reliable.
Please note, though, that fsnotifywatch mostly uses fanotify as a drop-in replacement for inotify.
For example, the new FAN_RENAME event, which replaces the disjoint FAN_MOVED_FROM/TO events, is not watched.
I am not sure whether that matters in practice for your use case; you will have to try and see.

fsnotifywatch --fanotify --recursive will have similar limits

Just to be sure I understood correctly, similar limits to --filesystem meaning it does not work inside an unprivileged container? Though it sounds like --recursive does work with btrfs subvols unlike --filesystem (#114 (comment))

No, I meant that --fanotify --recursive has scaling limitations similar to those of --inotify --recursive.
It does not have the limitations of --filesystem for working inside containers, but you need a very recent kernel (v6.8) for --fanotify --recursive to work on a btrfs subvol.

Is any of this related to implementing a fanotify backend in the fsnotify library? I don't want to come off as too much of a curmudgeon, but this is not a generic fanotify discussion thread, and having tons of off-topic stuff rather detracts from the purpose of this issue.

The answer is maybe. Your question is a bit broad for a yes-or-no answer.
The relation is this: if the fsnotify library abstraction was created to reflect inotify semantics, then the abstractions may need to be enhanced to get the full benefits of a fanotify filesystem watch. But if the fsnotify library already knows how to do filesystem watches on macOS and Windows, then it should be easier to implement a fanotify filesystem watch backend.

Is any of this related to implementing a fanotify backend in the fsnotify library?

I think it'd probably be helpful when implementing an fanotify backend to at least document some of the limitations such as whether it works on Docker and in what scenarios. It could potentially impact what fanotify APIs are chosen to build on top of as well. In any case, I can take some of this discussion elsewhere. Sorry if this discussion felt like noise