izar / pytm

A Pythonic framework for threat modeling

DataFlow object attributes need work (or moves)

colesmj opened this issue · comments

Many of the attributes defined for the DataFlow object belong elsewhere:

Correctly assigned:

  • source = varElement(None, required=True)
    
  • sink = varElement(None, required=True)
    
  • order = varInt(-1, doc="Number of this data flow in the threat model")
    
  • note = varString("")
    

Maybe correct:

  • isResponse = varBool(False, doc="Is a response to another data flow") --> Is this a dup with `responseTo`?
    
  • response = varElement(None, doc="Another data flow that is a response to this one") --> If this is non-empty, is either `isResponse` or `responseTo` needed (since it would be detectable as True if non-empty)?
    
  • responseTo = varElement(None, doc="Is a response to this data flow") --> Is this a dup with `isResponse`?
    
  • data = varData([], doc="Default type of data in incoming data flows") --> Does this represent the data sent by the source, or returned by the sink? I think this highlights a challenge: setting data on the flow rather than associating the source as the sender of the data and the sink as the replier.
    

Should be a property of the Source:

  • srcPort = varInt(-1, doc="Source TCP port")
    
  • isEncrypted = varBool(False, doc="Is the data encrypted") --> Clarification needed - is this data encryption independent of the protocol?
    
  • authenticatesDestination = varBool(False, doc="""Verifies the identity of the destination,
    for example by verifying the authenticity of a digital certificate.""")
    
  • checksDestinationRevocation = varBool(False, doc="""Correctly checks the revocation status
    of credentials used to authenticate the destination""")

Should be a property of the Sink:

  • usesSessionTokens = varBool(False)
    
  • authorizesSource = varBool(False)
    
  • usesLatestTLSversion = varBool(False) --> This will become out of date (TLS 1.2 to TLS 1.3 to whatever is next), and TLS is not the only option for secure protocol, so maybe this should be a list
    
  • implementsAuthenticationScheme = varBool(False)
    
  • authenticatedWith = varBool(False)
    
  • protocol = varString("", doc="Protocol used in this data flow") --> With this list, it is possible to check for `usesLatestTLSversion` state
    
  • dstPort = varInt(-1, doc="Destination TCP port")
    

Should be a property of either source or sink:

  • usesVPN = varBool(False) --> DataFlows are associated with a source (the initiator) and a sink (the target). It is either the source or sink that determines if a VPN is in use. The source may use one, or the sink may do so, or both, but the DataFlow would if anything inherit the state of this flag based on `source.usesVPN or sink.usesVPN`. 
    
  • implementsCommunicationProtocol = varBool(False) --> Sink always determines the protocol to be used by the source, but this may also apply to the source's comm stack
    

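The `source.usesVPN or sink.usesVPN` inheritance suggested above could be sketched as a derived property rather than a stored flag. This is illustrative only; the class shapes below are not pytm's actual API:

```python
# Hypothetical sketch: a DataFlow deriving usesVPN from its endpoints
# instead of storing the flag itself. Names are illustrative, not pytm API.
class Element:
    def __init__(self, name, usesVPN=False):
        self.name = name
        self.usesVPN = usesVPN

class DataFlow:
    def __init__(self, source, sink):
        self.source = source
        self.sink = sink

    @property
    def usesVPN(self):
        # Inherited state: true if either endpoint tunnels the traffic.
        return self.source.usesVPN or self.sink.usesVPN

client = Element("client", usesVPN=True)
server = Element("server")
flow = DataFlow(client, server)
print(flow.usesVPN)  # True
```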
Just another related note on source in a DataFlow - that is also (probably) the "initiator" of a comm flow, which is another important attribute to have available.

isResponse is related to responseTo. The first is a boolean and the second is a reference to another dataflow. Having both is meant to make creating simple models easier: you don't need to point to a specific response if it's unambiguous. The automatically assigned reference is used when drawing the DFD and can be used in threat conditions.

response is similar to responseTo, it's the other end of the relation.

Since we mark dataflows as either requests or responses data should be unambiguous already.

Sources already have a port, but it's treated as a default value for any dataflow coming out of them when the dataflow doesn't specify srcPort. Remember that there can be multiple dataflows coming out of a single element, and all could have different ports. The same applies to destination ports, and it's probably better to start with those, since source ports are most often chosen randomly.
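The defaulting rule described above could look like the following sketch. The helper name `effective_src_port` and the class shapes are hypothetical, not pytm's implementation:

```python
# Illustrative sketch of the defaulting rule: a dataflow's srcPort falls
# back to its source element's port when unset (-1 meaning "unset",
# matching the varInt(-1) defaults in the attribute list above).
UNSET = -1

class Source:
    def __init__(self, port=UNSET):
        self.port = port

class Flow:
    def __init__(self, source, srcPort=UNSET):
        self.source = source
        self.srcPort = srcPort

def effective_src_port(dataflow):
    # Per-flow value wins; otherwise inherit the source element's default.
    if dataflow.srcPort != UNSET:
        return dataflow.srcPort
    return dataflow.source.port

web = Source(port=443)
print(effective_src_port(Flow(web)))                # 443 (inherited default)
print(effective_src_port(Flow(web, srcPort=8443)))  # 8443 (explicit override)
```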

isEncrypted on the dataflow means the whole connection, not just the data. I agree it should be stated clearly, but it's also a good idea to add this to the (new) Data class.

authenticatesDestination and checksDestinationRevocation are also already defined at sources and these are defaults for all outgoing dataflows. A single process can create connections to different services with totally different properties.

Other sink attributes you mentioned also are treated as defaults. As for usesVPN - I think this should be represented as other boundaries and more dataflows. I have no idea what implementsCommunicationProtocol represents.

A minimum example:

from pytm import Actor, Dataflow, Server

client = Actor("client")
server = Server("server")

Dataflow(client, server, "request")
Dataflow(server, client, "response", isResponse=True)

a more verbose one:

client = Actor("client")
server = Server("server")

req = Dataflow(client, server, "request")
resp = Dataflow(server, client, "response", responseTo=req)

where the response property in the request will be assigned automatically.

Both examples will produce exactly the same output.
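The two-way link between the examples above could be wired up as in this sketch (illustrative, not pytm's source): assigning `responseTo` on the response also sets `response` on the request.

```python
# Sketch of the request/response pairing: passing responseTo=req when
# constructing the response automatically assigns the back-reference
# req.response. Class shape is illustrative only.
class Flow:
    def __init__(self, name, responseTo=None):
        self.name = name
        self.response = None
        self.responseTo = responseTo
        if responseTo is not None:
            responseTo.response = self  # back-reference assigned automatically

req = Flow("request")
resp = Flow("response", responseTo=req)
print(req.response.name)     # response
print(resp.responseTo.name)  # request
```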

See Issue #35 for how implementsCommunicationProtocol is used (incorrectly) now...

@nineinchnick Thanks for your feedback. Perhaps the response-related items are properly used in DataFlow, but I think my other suggested changes are still valid, for the reasons I called out. Properties such as authenticatesSource (excepting usesVPN - I agree it is weird) are important to know on the server as an object, and would be independent of the dataflow for any given port/process. But I'm willing to be convinced otherwise.

I agree about usesLatestTLSversion: it should be converted to a minimum required TLS version attribute of an Element, Dataflows should define the TLS version used, and we should check this similarly to the data classification level in https://github.com/izar/pytm/blob/master/pytm/pytm.py#L1269
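A minimal sketch of that proposed check, analogous to the classification-level comparison linked above. The attribute names (`minTLSVersion`, a per-flow TLS version) are suggestions here, not existing pytm API:

```python
# Hypothetical sketch: compare the TLS version a dataflow uses against
# the minimum an element requires, instead of a usesLatestTLSversion bool
# that goes stale. IntEnum gives an ordering, so comparisons work.
from enum import IntEnum

class TLSVersion(IntEnum):
    NONE = 0
    TLSv1_0 = 1
    TLSv1_1 = 2
    TLSv1_2 = 3
    TLSv1_3 = 4

def tls_below_minimum(flow_version, min_required):
    # True means the rule should produce a finding.
    return flow_version < min_required

print(tls_below_minimum(TLSVersion.TLSv1_1, TLSVersion.TLSv1_2))  # True
print(tls_below_minimum(TLSVersion.TLSv1_3, TLSVersion.TLSv1_2))  # False
```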

authorizesSource is already in every asset, like server and process; we should just remove it from the dataflow.

Question on removing existing attrs: if a model attempts to set a non-existent attr, is the behavior to add it to the instance of the object?
If so, then the rules that rely on DataFlow.usesLatestTLSversion should be changed to not use it - unless the matching (which is pair-wise) is not granular enough. That is, if a pair includes source and dataflow, and dataflow and sink, then adding the attr (due to a legacy model) could result in a bad finding, or at least a duplicate.

I was thinking of detecting when someone sets non-existing attributes to detect typos. This could be optional, similar to how we already detect duplicate dataflows.
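That typo detection could be done by intercepting attribute assignment, roughly as in this sketch (illustrative only; pytm's elements are built on var descriptors, and the names below are hypothetical):

```python
# Sketch of optional typo detection via __setattr__: warn when a model
# assigns an attribute the class does not declare, instead of silently
# adding it to the instance.
import warnings

class StrictElement:
    _declared = {"name", "usesVPN"}

    def __init__(self, name):
        self.name = name
        self.usesVPN = False

    def __setattr__(self, attr, value):
        if attr not in self._declared:
            warnings.warn(f"unknown attribute {attr!r}; possible typo")
        super().__setattr__(attr, value)

e = StrictElement("server")
e.useVPN = True  # typo: emits a warning instead of silently adding an attr
```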

Anyway, removals are always breaking changes so we need to tread carefully here. We haven't yet adopted semver officially but this would go into 2.0 if at all.

This may become a problem moving forward, imho. I can foresee (or would like to move us in the direction of) supporting the addition of attributes that do not yet have detection in rules. This could be metadata or documentation within a model, or attrs for which we don't yet have threats, or that threat definition may vary by consumer of a single model (the creator of a model is not always the consumer of the model).
Will want to discuss this more later.

So interesting - there is no rule that looks for implementsCommunicationProtocol, so this is now an attribute without a use. But should we remove it? Perhaps. Or leave it for a future use case.

I'm trying to address the TLS thingy here: #123. What's left in this issue?

I also suggest removing authenticatedWith in #119

There's no doubt that there is a lot of space for a semantic argument here of where any given attribute "belongs". I want to add to this discussion the dimension of "belongs according to whom?" .
Nowadays, security experts and developers still see things differently. When adopting a strictly OO approach, it may make sense to move some attributes around. But consider, for example, usesVPN. Yes, the dataflow would inherit it from the source/sink. But then the modeler has the added work of making it true for both source and sink (does it make sense to have it true on only one side? that to me translates to ambiguity). On the other hand, in my mind it makes perfect sense for a given Dataflow to yell "hey, I need to go over a VPN!" (to be exact, i have been thinking about VPCs, but potatos. It is a requirement of the dataflow, not a "favor" done to it by the sink/source.

For these reasons pytm should provide an unambiguous set of attrs and be easily extendable. Still, this is fuzzy, so keep them as few as possible. And if we don't have threats or docs for an attr, it should not be included.

We also already have isEncrypted, which should be true if the dataflow goes over a VPN. Encapsulation is just more assets and dataflows.