mitre-attack / attack-datasources

This repository contains analysis and research on the data sources currently listed in ATT&CK.

Questions about prior art and specific mappings

chris-counteractive opened this issue · comments

Thank you for this! We love ATT&CK, but the data sources sections have always felt a bit "loose" and left mostly as an exercise for the reader. The blog series and this repo prompted a couple questions I hoped you could discuss:

  1. Why not use/extend an existing schema for the abstractions?

    For example, STIX Cyber-observable Objects (SCO) cover some of the same ground, and link nicely with STIX-formatted intel ... like ATT&CK itself. The spec for the objects and their relationships reads a bit like your yaml data sources, and they can be reified with real data. Seems like STIX SCO is a natural fit, plus it has a well-thought-out relationship model, serialization format, extensions, etc.

    The Elastic Common Schema (ECS) is great too - it's permissively licensed, available for collaboration on github, has abstractions for many of the examples you provide (users, processes, etc.), and is already powering a lot of searches, visualizations, and analytics. We see it more in ops contexts, and it's perhaps a bit more flexible than SCOs. For example, you see it frequently merged with existing event data so you get the benefit of the abstractions without sacrificing the specificity of the original event.

    One of the beautiful things about ATT&CK is that it reduced bike-shedding over terminology and helped the infosec community focus - STIX and ECS have put in a lot of similar work, so it seems good to stand on the shoulders of giants. Naming things is hard, and it takes time to overcome intuitions (even at the top level: e.g., to my ear the phrase "data source" connotes the place you get the data, rather than an abstraction of the observable, but I'm just one guy 🙂).

    In any case, if ATT&CK leveraged one of these for the abstract entities, it seems you could save energy for more ATT&CK-specific work like mapping those to (sub-)techniques or to the actual concrete logs/artifacts (a rough sketch of what that could look like follows this list).

  2. Are there plans to be more specific about mappings to artifacts?

    Presumably the idea is that (sub-)techniques would eventually use these new abstract data sources to replace or augment the text in the current "Data Sources" section. Unfortunately, unless I'm missing something, the proposed model doesn't seem to have a way to capture links to the concrete logs/artifacts.

    For example, the mapping example in figure 13 in part 2 of the blog series illustrates this last step:

    [figure: data source mapping example]

    That is, it shows links from the data components to specific event logs on the right, and that last step is really useful ... but it doesn't actually live anywhere in this repo's proposed approach. For many teams that last leg is the hard part! If we took your schema, for example, maybe added something like:

    - name: Service
      definition: Information about software programs that run in the background ...
      example_artifacts:
        - {os: windows, artifact: Security Audit Event 4688}
        - {os: windows, artifact: Sysmon Event 1}
        - {os: windows, artifact: Prefetch file}
        - {os: linux, artifact: auditd SYSCALL event}
        - {os: linux, artifact: auditd EXECVE event}
        # etc

    Perhaps this is considered out of scope, but hopefully not; it'd be great to see something as authoritative as ATT&CK pointing folks to specific useful artifacts rather than just the abstraction. I'd love to hear your thoughts.
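
To make question 1 concrete, here's a minimal, purely hypothetical sketch of how the yaml in this repo could point at an existing abstraction instead of defining a brand-new one. The sco_parallel and ecs_parallel keys are invented for illustration only; the artifact IDs are the standard Windows service-install events:

# Purely illustrative sketch - not part of this repo's schema.
- name: Service
  definition: Information about software programs that run in the background ...
  sco_parallel: process type with the windows-service-ext extension   # STIX 2.1 SCO
  ecs_parallel: process.* field set                                   # nearest Elastic Common Schema analogue
  example_artifacts:
    - {os: windows, artifact: Security Audit Event 4697}   # a service was installed
    - {os: windows, artifact: System Event 7045}           # a new service was installed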

Thanks again for your hard work on this and all the related projects, I look forward to learning more!

Hey @chris-counteractive!

Thanks for reaching out - we love to hear that ATT&CK is helping you and how we can further improve it! Addressing your specific questions:

1. As you saw in the blogs, we are still researching + collecting feedback/contributions on how a more refined model for data sources should be captured in ATT&CK. We have looked at existing STIX structures (and will continue to do so), but as you said, in practice SCOs (though similar) are used a bit differently than what we are aiming for. Regarding ECS, I agree that there's a wealth of existing resources that can be leveraged, but we are wary of adopting a model fit to a specific technology stack (vice something that everyone can more directly leverage). That said, we do invite contributions and lessons learned, and we encourage individuals to translate the model into whatever format works best for them!
2. Similar to the above comment, our objective is to document a model at the right level of abstraction that can be used by all. We definitely invite extensions/adoptions to make those specific mappings you mentioned, but as you know every technology/implementation/deployment may have a different schema so these mappings are perhaps more appropriate at a more tailored/personal level. I think this image from the first part of the blog captures it best.
[image from part 1 of the blog series]

Thanks again for reaching out, and definitely keep the contributions and ideas coming!

Thanks for the reply, @jcwilliamsATmitre, and for the clarifications! A few thoughts:

in practice SCOs (though similar) are used a bit differently than what we are aiming for

Of course, and that's totally your prerogative - I respect the thought, research, and effort you've put in. A normalization schema for information sharing is a different use-case, but to paraphrase Gene Kranz in Apollo 13, "we don't care what it was designed to do, we care about what it can do." 😀 They've put a lot of effort into the abstractions and tooling and are committed to keeping them up to date, with no effort on your part. Extending it to cover the differences might be easier than green-field development. Plus, if it's designed well, anyone with STIX-aware stuff could pull in ATT&CK integrations with lower effort (same for ECS).

It's actually the diagram in your reply that brought this all to mind:

| data source, component, or relationship from this proposal | example STIX SCO parallels |
| --- | --- |
| Process | process type |
| Process Creation | process type, created_time property |
| Process Network Connection | process type, opened_connection_refs property |
| Process Created Process | process type, parent_ref and/or child_refs properties |
| User Created Process | process type, creator_user_ref property |
| User Executed Command | process type, creator_user_ref and command_line properties |
| Process connected to IP | process type, opened_connection_refs property; network-traffic type, dst_ref property |

... and so on. Even those that aren't built into the model are flexibly captured with SROs. Since you'd only use the abstractions (the type definitions, basically cribbing their nomenclature), you probably wouldn't need any of the fiddly bits like identifiers. Anyway, I know it's not easy, but I could see a future where you accidentally re-create a subset of this while trying to meet your design goals.
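
To ground a few rows of that table, here's roughly what a reified process observation could look like as STIX 2.1 SCOs - rendered as yaml for readability (real STIX serializes as JSON), with placeholder IDs and made-up values:

# Sketch only: STIX 2.1 SCOs shown as yaml; real STIX is JSON and uses full UUIDs.
- type: user-account
  id: user-account--0001
  account_login: alice
- type: ipv4-addr
  id: ipv4-addr--0002
  value: 203.0.113.7
- type: network-traffic
  id: network-traffic--0003
  protocols: [tcp]
  dst_ref: ipv4-addr--0002                 # "Process connected to IP"
- type: process
  id: process--0004
  pid: 1234
  created_time: "2020-01-01T12:45:00Z"
  command_line: notepad.exe
  creator_user_ref: user-account--0001     # "User Created Process" / "User Executed Command"
  opened_connection_refs: [network-traffic--0003]   # "Process Network Connection"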

I'm all for new models when there's clear improvement (e.g., ATT&CK itself, NIST CSF), and this might be a great case for one. I just like advocating for quality, open projects that might fit the bill, if only to avoid standards proliferation.

we are wary of adopting a model fit to a specific technology stack (vice something that everyone can more directly leverage)

Absolutely a sensible approach; I prefer open and platform-agnostic as much as anyone. To clarify about ECS in particular, though, I think it meets that standard. It's maintained by Elastic and includes some tooling to ease interaction with Elasticsearch, but it's more generally applicable:

  • it's released under the Apache 2.0 license, the same as most ATT&CK content (not the Elastic license)
  • it's a separate repo from the full Elastic stack, Beats, etc.
  • the schema docs (like this one for processes, for example) just describe event fields, and are only loosely coupled with Elasticsearch itself ... easily used in other contexts
  • the schema docs are remarkably similar to the data source yaml examples in this repo (a rough sketch of an ECS-shaped event follows this list)
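
For flavor, a Windows process-creation event shaped with ECS field names looks roughly like this (values invented; note how the original event ID survives in event.code alongside the abstraction):

# Illustrative only: ECS field names with made-up values.
event:
  category: process
  type: start
  code: "4688"               # the original Windows event ID rides along
process:
  pid: 1234
  name: notepad.exe
  command_line: notepad.exe C:\temp\notes.txt
  parent:
    pid: 4321
    name: explorer.exe
user:
  name: alice
host:
  os:
    family: windows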

In both cases (STIX and ECS) having a "sponsor" org that's not MITRE can be a feature not a bug so long as everything's fork-able, and the non-commercial core of the elastic stack is open source too, reducing the risk of vendor lock-in for those who use the other tooling.

We definitely invite extensions/adoptions to make those specific mappings you mentioned ... these mappings are perhaps more appropriate at a more tailored/personal level

I noticed that diagram too, and it could be that I'm just not reading it correctly. Wouldn't those community "extensions/adoptions" need to be able to "plug in" to the abstractions somehow? I suppose I saw that as a place for inbound PRs. If not, and I just maintain my own list somewhere, it seems like a lost opportunity to learn from how others are doing it. That also seems pretty close to the status quo, where teams continually re-discover various resources on this topic 😀 (shout out to Malware Archaeology).

I don't see a drawback to listing the most common concrete sources (as you show with Windows Security 4688s, Sysmon event 1s, etc.), particularly if you qualify them somehow ("sample artifacts," "example logs," or similar), but I could certainly be wrong.

If there's a better place for this type of discussion, please let me know. Thanks again!

Put another way, I love the idea of being able to "officially" say something like:

"To detect $TACTIC, ATT&CK suggests using data that maps to the $TYPES STIX SCO object type(s). Examples of such data include $EXAMPLES."

like so:

"To detect T1534.003, ATT&CK suggests using data that maps to the process STIX SCO object type. Examples of such data include Windows Security events with ID 4688."

... which wouldn't require nearly as much heavy lifting on your part 😀

First off, this is a perfect venue for these discussions! We definitely want others to be able to track and build on great ideas so thanks again for sharing.

Interesting point on the STIX SCOs - admittedly I am nowhere near an expert on the nuances of the standard, but I'll comb through the links you sent as well as talk to our resident SMEs next week. I do know that we aim to keep our content compliant with the standard so there may be less wiggle room, but I'll report back with more specifics later.

Another interesting point about the specific mappings - as I said, our objective for this repo was to build towards what would be documented within ATT&CK, but as you said, that documentation comes from raw evidence that could also benefit other users. We've seen some other projects (ex: https://github.com/OTRF/OSSEM-DM/blob/44618fa828d988837a069ef35830f67241293d53/docs/attack_ds_event_mappings.md) start to touch on those mappings, but I do agree that we can work to aggregate this data - maybe in https://github.com/mitre-attack/attack-datasources/tree/main/sub_techniques_research_reference?

Thanks, @jcwilliamsATmitre, we appreciate the engagement!

I thought it'd be helpful to re-phrase my questions as an affirmative proposal, to make clearer side-by-side comparisons to your proposal. Consider the following alternative:

amendment: use STIX SCO types as abstractions for data supporting ATT&CK detections and analysis

  1. For each ATT&CK (sub-)technique, list the abstract STIX Cyber-observable Object (SCO) types that support detecting that (sub-)technique. If new types are needed, create them using STIX SCO idioms and contribute them back to STIX. Store this in each (sub-)technique.
  2. For each SCO type mapped to one or more (sub-)techniques, curate examples of concrete artifacts corresponding to the type. Include, for example, specific log events and forensic data -- things that commonly reify ("make real") these abstract types. Source these from the community after listing core examples. Store separately, to re-use across techniques.

illustrations

The structural template might look something like:

(sub-)technique $TECHNIQUE can be detected using data fitting SCO type(s) $TYPES (with constraint(s) $CONSTRAINTS) commonly found using concrete artifacts $ARTIFACTS.

A parallel version of figure 13 in part 2 might look something like:

| (sub-)technique | SCO types and constraints | artifacts |
| --- | --- | --- |
| (maintained by ATT&CK) | (maintained by STIX (core) and ATT&CK (extensions)) | (maintained by ATT&CK, curated from community) |
| T1543.003 Windows Service | process type, windows-service-ext extension | Sysmon EID 1; Security EID 4688; Security EID 4697; System EID 7045 |
| T1543.003 Windows Service | windows-registry-key type, windows-registry-value type | Security EID 4688; Sysmon EID 12; Security EID 4657 |
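
If that table were captured as data rather than prose, one hypothetical yaml layout might be (field names invented here just to show the technique -> SCO type/constraint -> artifact chain):

# Hypothetical layout, not an ATT&CK or STIX format.
- technique: T1543.003        # Create or Modify System Process: Windows Service
  sco_types:
    - type: process
      constraints: [windows-service-ext extension present]
    - type: windows-registry-key
  example_artifacts:
    - Sysmon EID 1
    - Security EID 4688
    - Security EID 4697
    - System EID 7045
    - Sysmon EID 12
    - Security EID 4657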

general comparison

  • Both offer consistent language in the "data sources" section of each (sub-)technique.
  • Both offer some reusability across platforms.
  • Using STIX SCO types leverages existing nomenclature, vs. having to run through the abstraction process described in the second post of the blog series. That is, less repeated work.
  • Both would have to expand vocabulary to cover certain areas (e.g., cloud, ICS). Less so with ECS, which has more cloud-related vocab, but still many gaps. This can be seen as a feature, as it would push STIX/ECS to expand where needed.
  • Any necessary extensions will be able to leverage idioms and templates from existing types.
  • Internally consistent. For example, the proposal blog series has a "Process" as a data source, but also a "PowerShell Log" (see figure 9 in part 2). By using an existing abstraction language, we avoid the risk of conflating an abstract type ("Process") with an artifact type ("PowerShell Log").
  • Consistent with the STIX formatting of the rest of ATT&CK.
  • Concrete STIX-formatted intel will already match.
  • Facilitates collaboration with other stakeholders (e.g., OASIS and their community, which already includes MITRE members)
  • Reduces new terminology and naming burden - anyone using STIX is already "on board" (e.g., a process is an observable type, while a specific, concrete notepad.exe, PID 1234, started at 1245Z on 1 Jan 2020 is an observable, each with established properties and flexible relationship semantics -- we don't have to socialize new terminology like "data source," "data element," and "data component")

Many of these also apply to ECS.

use-cases

"how does this help me?"

Can help you answer questions like:

  • "We want to detect a specific (sub-)technique, what data should we work with?"
  • "We're investing in new capabilities, what artifacts will help me detect the most (sub-)techniques?"
  • "We gathered intelligence from a recent incident, what techniques might it relate to?"
  • ... etc.

If folks don't find this compelling, it'll be easier to close out this issue having made an actual proposal 😃 Thanks again!

Another related project we use a lot is the MITRE Cyber Analytics Repository (CAR) and its CARET tool. They also built a custom abstraction layer (their "data model") explicitly inspired by CyBOX (precursor to STIX SCOs), and mapped them to concrete artifacts ("sensors").

CAR enhances current-state ATT&CK by tying techniques to concrete data sources (via analytics and their data model), and demonstrates slick visualizations that can produce real insights. You can see at a glance what sensors give you the most bang for the buck with respect to analytic, technique, and group coverage.

It is, though, limited in that it:

  1. only includes Sysmon and Autoruns, leaving out most default logging and artifacts. They include osquery in the list of sensors, but it's not on the coverage map at all, meaning there's no coverage of anything non-Windows.
  2. only goes to the log source level, rather than the event type or field level, so there's no help for practitioners looking to find the data for the abstract model.

It seems like it's still active, though I'm not sure whether it'll be updated for the new ATT&CK v8. Your colleagues there may have some lessons learned about why to use (or not use) an existing abstraction vs. making a new one.

Also, regarding "compliance with the standard" with respect to STIX, I thought I'd note that ATT&CK does a nice job of extending STIX when needed using the x-mitre-object-name and x_field_name idioms, as documented here. That is, there's precedent for flexibility within the spec.

Sorry for another brain dump, but this took us down the rabbit hole! This is mostly for the benefit of those who use these other standards and tools and like to see how they interact. Thanks again!

Yeah, we've been interacting/cross-pollinating with @ikiril01 and others from CAR. Still TBD, but as you said there may be some interesting opportunities to explore!

And regarding STIX extensions, we looked into this before and it seems like a very reasonable route. I'll poke around the SCO idea and get back to you with more tangible specifics, though.

@chris-counteractive some great thoughts and discussion. I actually helped design and build much of STIX 2.x's SCOs (and CybOX back in the day), so I can speak a bit from my perspectives here:

  1. I absolutely support re-using existing data models as much as possible. I think STIX SCOs have great applications in this regard, although the one thing they currently lack is any notion of actions on objects (e.g., process creates). This is mostly because SCOs were initially designed to support sightings ("here's a dump of some stuff I saw on my systems") vs. other applications like analytics. ECS/CAR/OSSEM models may also be viable options here.

  2. Mapping Techniques/Sub-techniques --> data model entities/constraints --> artifacts is IMO the way to go in order to make this actionable and answer the common question of "where do I get the data I need to detect on technique X?". I would even go a step further in terms of constraints and detail specific actions to look for, along with object properties that must be collected (e.g., process.integrity_level for detecting on UAC Bypass); a rough sketch of one way to write that down follows this list. Specifying this in STIX would likely be possible, albeit with a custom object as you mentioned.

  3. I also support the notion of binning/tagging data sources by use case, as this can help users understand which data sources are useful for different facets of their defensive cyber ops.
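
A minimal sketch of what writing such a constraint down could look like - a hypothetical layout only, borrowing property names loosely from the CAR/STIX models rather than any real spec:

# Hypothetical constraint spec - not a real ATT&CK, CAR, or STIX format.
technique: T1548.002            # Abuse Elevation Control Mechanism: Bypass User Account Control
object: process
actions: [create]
required_properties:
  - integrity_level             # e.g., a High-integrity child spawned by a Medium-integrity parent
  - parent_image
  - command_line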

Regarding CAR, I've also been helping drive that work as of late, so I can add a few things:

  • One thing we've learned is that it's easier to use and map to a flattened data model, which is why the CAR data model is structured the way it is, with everything contained within a single object (i.e., no references); a rough contrast is sketched after this list.
  • We hope to add more non-Windows analytics in the near future, so sensors like OSQuery can actually be populated with things they detect.
  • When you say "only goes to the log source level, rather than the event type or field level" I'm assuming you're referring to not having mappings to the specific event ID/fields (e.g., Sysmon EID 1)?
  • We'll be integrating ATT&CK v8 in the near future, so stay tuned 👍
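
For anyone following along, here is a rough contrast between the two shapes (simplified sketches with made-up values, not exact CAR or STIX syntax):

# Flattened (CAR-style): one self-contained event object, no references.
object: process
action: create
fields:
  exe: notepad.exe
  parent_exe: explorer.exe
  command_line: notepad.exe C:\temp\notes.txt
---
# Reference-based (STIX-style): separate objects linked by *_ref properties.
- type: process
  id: process--parent           # placeholder id, not a real STIX UUID
  command_line: explorer.exe
- type: process
  id: process--child            # placeholder id
  parent_ref: process--parent
  command_line: notepad.exe C:\temp\notes.txt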

I appreciate the perspective, @ikiril01, along with all your work on STIX and CybOX! I'm excited to hear CAR is moving along full-speed to v8 and to non-Windows platforms, and yes, I was referring to CAR not reaching the event ID level ... if I'm mistaken, please let me know! We've gotten a lot of mileage out of the CARET tool with our clients, visualizing the Pareto principle at work when it comes to sensors -- thanks so much for all your efforts.

As I thought through this, I needed to explore it in detail to sort through some of the edges, so I created a repo that rounds out this idea of using SCOs and custom objects: https://github.com/counteractive/scope

There's a lot of detail in that readme (if you're patient enough to read it! 😃), but the TL;DR is the following:

  1. it proposes three custom SCOs, system (with extensions), session, and api, with arguments and numerical analysis to support them.
  2. it proposes a new SDO, a STIX Type object (stix-type) to describe STIX types themselves, rather than instances of the type. To paraphrase the readme:
    The idea is to capture details about STIX object types as concrete STIX objects, because it's only STIX objects that can contain real data and be the target of references. This provides some cool benefits:
    • You can store data that applies to all objects of that type. For you programmers, think static members in C++/Java or prototype properties in javascript.
    • With a first-class stix-type object, we could add data source information under the external_references field for the stix-type object pertaining to each SCO (or to a custom evidence_locations field, or whatever).
    • Relationships (SROs) could then refer to types rather than just instances. There's some detail here that I think is relevant to @ikiril01's comment about "actions on objects".
    • You can transmit the STIX specification as STIX data. It's like having a compiler written in the language itself.

It provides some specific examples and a tool that converts YAML SCO descriptions to HTML consistent with the official STIX spec. Basically, the idea is to flesh this out concretely against existing ATT&CK and see if it actually holds water.
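
To give a flavor without the whole readme, a trimmed, hypothetical stix-type object consistent with that summary might look like this - shown as yaml, and everything beyond the standard STIX common and external-reference properties is part of the scope proposal (the exact field names in the repo may differ):

# Hypothetical sketch of the proposed stix-type SDO - a STIX object describing
# the `process` type itself, so type-level data has a place to live.
type: stix-type
spec_version: "2.1"
id: stix-type--00000000-0000-4000-8000-00000000000a   # placeholder UUID
name: process
description: Describes the STIX process SCO type, not any particular process instance.
external_references:
  - source_name: windows-security-log
    description: Process creation events (Security EID 4688) commonly reify this type.
  - source_name: sysmon
    description: Sysmon EID 1 (process create) commonly reifies this type.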

No feelings will be hurt if I'm missing the mark, but hopefully this ups the bar for rigorous discussion of alternatives. Thanks again!

Admin note: closing all remaining issues and pull requests prior to archiving the repository