Upstream DSR and connector related metadata from Fides
TheAndrewJackson opened this issue · comments
The fides project has extended the fideslang models to included extra metadata for processing DSRs. Most notably the FidesopsMeta
model. All of the changes made to fideslang models in the dataset.py should be upstreamed into the fideslang project. This is to standardize on what DSR metadata should look like.
Acceptance Criteria
- Upstreamed all of the DSR fields from fides into fideslang
- Update the Taxonomy model to not strip out extra fields that are included in JSON payloads
- Update
fidesops_meta
tofides_meta
@ThomasLaPiana I have a couple spec related questions:
- Should
fidesctl_meta
be deprecated and or removed?- I'm not sure if this is used at all. If not, now would be a great time to delete it.
- Does it make sense to rename
fidesops_meta
todsr_meta
instead offides_meta
? (@pattisdr I'm curious of what you think of this idea as well)- I was wondering if this might be a little more self documenting. From my undestanding
fidesops_meta
was used for DSRs so we could just skip straight to that.
- I was wondering if this might be a little more self documenting. From my undestanding
hey @TheAndrewJackson my vote would be for this to be as generic as possible and I wonder if dsr_meta
is too specific because the information contained there could also be used for data maps too.
In other words, some of the information in fidesops_meta
includes how one table is linked to another. We use that to build a graph when we're running DSR's, to know that we need to visit table A and table B before we visit table C, but those relationships also describe the existing system in a sense.
But I'm pretty new to the -ctl side of things, so @ThomasLaPiana can better weigh in on whether fidesops_meta information can be shared between -ops and -ctl features.
@pattisdr That makes sense! Thanks for the input. I think fides_meta
makes more sense if it contains other info in addition what's related to DSRs.
Yep, I think fides_meta
is a good catch-all! With the projects merged, it is much easier (neigh, impossible?) for us to accidentally overwrite each other's keys in that field, so I don't think we need to have separate _meta
fields
also @SteveDMurphy can confirm whether or not we use fidesctl_meta
, I believe we do for some system scanning and maybe classification?
Hey @pattisdr I've re-assigned this from @TheAndrewJackson. It looks like some progress has already been made — when fidesplus
is released we should do a quickhandover :).
also @SteveDMurphy can confirm whether or not we use
fidesctl_meta
, I believe we do for some system scanning and maybe classification?
oof, sorry I missed this! But yes the fidesctl_meta
can currently be used by the scan system aws
functionality. I believe there is also a generic meta
field used that may be general purpose enough for our uses but open to other ideas as well 👍🏽
To simplify matters, we could keep our focus on combining just Dataset concepts here. scan system aws
functionality writes to Systems > fidesctl_meta but not Datasets > fidesctl_meta. And Fidesops never did anything related to Systems, so we don't necessarily need to touch Systems just yet.
So focusing on Datasets, we could combine fidesctl_meta
from fideslang and fidesops_meta
from fidesops into one field:
- Rename fideslang > Datasets.fidesctl_meta to be Datasets.fides_meta.
- Update fideslang > DatasetMetadata class to also have fields from fides > FidesopsDatasetMeta
- Downstream in fides, rename
ctl_datasets.fidesctl_meta
column tofides_meta
I think we should leave the generic meta
field alone for customer use, per this docstring:
"An optional object that provides additional information about the Dataset. You can structure the object however you like. It can be a simple set of
key: value
properties or a deeply nested hierarchy of objects. How you use the object is up to you: Fides ignores it."
To simplify matters, we could keep our focus on combining just Dataset concepts here.
scan system aws
functionality writes to Systems > fidesctl_meta but not Datasets > fidesctl_meta. And Fidesops never did anything related to Systems, so we don't necessarily need to touch Systems just yet.So focusing on Datasets, we could combine
fidesctl_meta
from fideslang andfidesops_meta
from fidesops into one field:
Rename fideslang > Datasets.fidesctl_meta to be Datasets.fides_meta.
- Update fideslang > DatasetMetadata class to also have fields from fides > FidesopsDatasetMeta
Downstream in fides, rename
ctl_datasets.fidesctl_meta
column tofides_meta
I think we should leave the generic
meta
field alone for customer use, per this docstring:"An optional object that provides additional information about the Dataset. You can structure the object however you like. It can be a simple set of
key: value
properties or a deeply nested hierarchy of objects. How you use the object is up to you: Fides ignores it."
Huge agree here!
- Rename fideslang > Datasets.fidesctl_meta to be Datasets.fides_meta.
- Update fideslang > DatasetMetadata class to also have fields from fides > FidesopsDatasetMeta
- Downstream in fides, rename
ctl_datasets.fidesctl_meta
column tofides_meta
For the fideslang change only, would it make sense to introduce fides_meta
across System
and Organization
(and anywhere else we need) as well so we can plan to deprecate the use of both fidesctl_meta
and fidesops_meta
without needing updates in both repo's again?
Looking at the code, maybe we can directly add meta
and fides_meta
to the FidesBase
class? That way we don't have to worry about this in the future, and I don't think it hurts for every object to have it? (every top-level object at least)
Quick backwards compat question: is it possible to make this field work if it's called fidesops_meta
or fides_meta
? I'd like to stay compatible with existing datasets, especially all the SaaS connector ones.
hey @NevilleS that's the very thing I'm mulling over now! What I'm considering is a validator that takes in fidesops_meta if it exists and replaces fides_meta with that value, and perhaps adds a deprecation warning there.
We also could move it over as fidesops_meta too, that is simpler.
Looking at the code, maybe we can directly add meta and fides_meta to the FidesBase class? That way we don't have to worry about this in the future, and I don't think it hurts for every object to have it? (every top-level object at least)
I think scope is starting to creep here, the PR's are getting large, so I'd prefer to keep this specific issue constrained to datasets. Happy to ticket a followup to address this if that's alright.
- My current work is renaming fidesctl_meta to be fides_meta on the fideslang Dataset model and also doing a data migration downstream in fides
ctl_datasets
to match. - Adding this same thing to Systems, Organizations would mean matching migrations downstream, and adding updates to the systems-related code that was writing to the fidesctl_meta field.
@pattisdr I can't argue with keeping a small scope 🙂 thanks for calling out the creep!
Agreed we should focus on datasets here
Great, reticketed here, we could do this last: https://github.com/ethyca/fideslang/issues/96