Get Dependency Graph and Task Config for each Task #1302

Open
HolzmanoLagrene opened this issue Jul 12, 2023 · 6 comments

Comments

@HolzmanoLagrene
Contributor

HolzmanoLagrene commented Jul 12, 2023

I have two feature requests for the API:

Allow fetching a graph of how Jobs and Evidence types are connected.

At the moment I have no clue which Jobs will actually get started if I process an evidence with a set of Jobs. It would be really cool to know beforehand which Jobs will get triggered based on the Evidence-Type and the initial Jobs selected.

For now I do this with some sort of twisted reflection and create a graph to get an impression of what is going to happen:

import inspect

import pathpy as pp  # assumed: the snippet uses `pp` without importing it
from turbinia.jobs.interface import TurbiniaJob
from turbinia.workers import TurbiniaTask

# Collect, per Job, its input/output Evidence types and the Tasks (with their
# TASK_CONFIG) that are visible in the Job's module.
# Note: __subclasses__() only sees Job classes whose modules are already imported.
result = {}
for subclass in TurbiniaJob.__subclasses__():
    evidence_in = []
    evidence_out = []
    tasks = {}
    for name, class_ in inspect.getmembers(inspect.getmodule(subclass), inspect.isclass):
        class_hierarchy = inspect.getmro(class_)
        if TurbiniaJob in class_hierarchy:
            evidence_in += [a.__name__ for a in class_.evidence_input]
            evidence_out += [a.__name__ for a in class_.evidence_output]
        elif TurbiniaTask in class_hierarchy:
            tasks[class_.__name__] = class_.TASK_CONFIG
    result[subclass.__name__] = {"evidence_in": evidence_in, "evidence_out": evidence_out, "tasks": tasks}

# Build a directed graph: Evidence -> Job -> Evidence, and Job -> Task.
n = pp.Network(directed=True)
for jobname, data in result.items():
    n.add_node(jobname, type="job")
    for ev_out in data["evidence_out"]:
        n.add_node(ev_out, type="evidence")
        n.add_edge(jobname, ev_out)
    for ev_in in data["evidence_in"]:
        n.add_node(ev_in, type="evidence")
        n.add_edge(ev_in, jobname)
    for taskname, config in data["tasks"].items():
        n.add_node(taskname, type="task", config=config)
        n.add_edge(jobname, taskname)

A visual representation looks something like this:
image

Allow fetching the Task Config for each Task

It is already possible to fetch the required and optional parameters for each evidence type, so it would be amazing to be able to fetch the possible task parameters for each Task as well.
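As a rough illustration of what such a lookup could return, the sketch below walks a class hierarchy and collects each task's `TASK_CONFIG`. The `Task` base class and the `Example*` tasks are stand-ins invented for this example so that it runs without Turbinia installed; with Turbinia, the base class would presumably be `turbinia.workers.TurbiniaTask`, as in the script above.

```python
# Stand-in base class (assumption); in Turbinia this role is played by
# turbinia.workers.TurbiniaTask, whose TASK_CONFIG the script above reads.
class Task:
    TASK_CONFIG = {}


class ExampleGrepTask(Task):  # hypothetical task
    TASK_CONFIG = {"filter_patterns": []}


class ExampleHashTask(Task):  # hypothetical task
    TASK_CONFIG = {"algorithm": "sha256"}


class ExampleFastHashTask(ExampleHashTask):  # hypothetical indirect subclass
    TASK_CONFIG = {"algorithm": "md5"}


def all_subclasses(cls):
    """Yield every direct and indirect subclass of cls exactly once."""
    seen = set()
    stack = list(cls.__subclasses__())
    while stack:
        sub = stack.pop()
        if sub in seen:
            continue
        seen.add(sub)
        yield sub
        stack.extend(sub.__subclasses__())


def task_configs(base):
    """Map each task class name to a copy of its TASK_CONFIG dict."""
    return {sub.__name__: dict(sub.TASK_CONFIG) for sub in all_subclasses(base)}


configs = task_configs(Task)
```

Unlike a plain `__subclasses__()` call, the recursive walk also picks up tasks that inherit from another task rather than directly from the base class.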

@aarontp
Member

aarontp commented Jul 24, 2023

Looks like an interesting script! Just in case you hadn't seen it, we have something similar in https://github.com/google/turbinia/blob/master/tools/turbinia_job_graph.py

Would it be helpful to have this in the API server, is having the job graph script enough, or are there things we could add to that script rather than adding it to the API server?

Another somewhat related feature request is to get this same graph for a given request after it has completed which would require tracking the same flow to understand more easily how things were processed.

Regarding getting the task config for each task given the evidence type: I think the part that is missing in order to do that is the Job -> Task mapping, which is currently done in each Job's create_tasks method, so there is no static mapping to the Task types. We could potentially add another attribute similar to evidence_input and evidence_output, though, and potentially even refactor out most of the create_tasks methods altogether. That being said, it should be easy to enumerate all tasks and their task config variables if that would be useful.
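One way to picture the static mapping idea is sketched below. Everything here is an assumption for illustration: `task_types` is a hypothetical attribute name that does not exist in Turbinia, and `Job`, `ExampleJob` and `ExampleTask` are stand-in classes rather than the real `TurbiniaJob` and task classes.

```python
class Job:
    """Stand-in job base class (not the real TurbiniaJob)."""

    evidence_input = []
    evidence_output = []
    task_types = []  # hypothetical attribute: task classes this job creates

    def create_tasks(self, evidence):
        """Default behaviour: one task of each declared type per evidence item."""
        return [task_type() for _ in evidence for task_type in self.task_types]


class ExampleTask:  # hypothetical task
    TASK_CONFIG = {"option": None}


class ExampleJob(Job):  # hypothetical job
    evidence_input = ["RawDisk"]      # illustrative evidence type names
    evidence_output = ["ReportText"]
    task_types = [ExampleTask]


def job_task_configs(job_cls):
    """Return the task configs for a job without instantiating anything."""
    return {t.__name__: t.TASK_CONFIG for t in job_cls.task_types}


mapping = job_task_configs(ExampleJob)
created = ExampleJob().create_tasks(["evidence-a", "evidence-b"])
```

With a declarative attribute like this, a generic `create_tasks` in the base class could cover the common case, and only jobs with unusual task creation logic would need to override it.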

@HolzmanoLagrene
Contributor Author

Yes, the possibility to get a graph of the Jobs that are going to be run would indeed be very interesting to me. Maybe it'll help if I describe the intended use case:

My idea is to get the Jobs that could possibly be run based on the Evidence-Type. As Jobs trigger other Jobs based on their Output-Types, it is not always clear from the beginning what can be done in the first place. E.g., if I want to run a Grep-Job, this can only be done if a PlasoFile-Output is generated. This type, however, is only created if I run the Plaso-Job in the first place. If I know beforehand which Jobs will be run, I can provide them with the appropriate parameters to do what I want.

To do this, getting a graph that shows me the dependencies between Evidence-Types, Jobs and Tasks is the first step. The second step would be to get the parameters for each Task.
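The "which Jobs will be triggered" question amounts to a reachability search over such a graph. The sketch below is a minimal breadth-first version under stated assumptions: the `JOBS` table is hand-written for illustration (the job and evidence type names are examples, not taken from Turbinia's actual definitions), and `reachable_jobs` is a hypothetical helper, not an existing API.

```python
from collections import deque

# Hypothetical job table: name -> (input evidence types, output evidence types).
JOBS = {
    "PlasoJob": ({"RawDisk"}, {"PlasoFile"}),
    "GrepJob": ({"PlasoFile"}, {"FilteredTextFile"}),
    "StringsJob": ({"RawDisk"}, {"TextFile"}),
    "UnrelatedJob": ({"Directory"}, {"ReportText"}),
}


def reachable_jobs(jobs, evidence_type, enabled=None):
    """Return the jobs triggered, directly or indirectly, by one evidence type.

    `enabled` optionally restricts the search to a subset of job names.
    """
    enabled = set(jobs) if enabled is None else set(enabled)
    seen_evidence = {evidence_type}
    queue = deque([evidence_type])
    triggered = []
    while queue:
        current = queue.popleft()
        for name in sorted(enabled):
            inputs, outputs = jobs[name]
            if current in inputs and name not in triggered:
                triggered.append(name)
                # Each new output type may in turn trigger further jobs.
                for out in sorted(outputs):
                    if out not in seen_evidence:
                        seen_evidence.add(out)
                        queue.append(out)
    return triggered


all_jobs = reachable_jobs(JOBS, "RawDisk")
without_plaso = reachable_jobs(JOBS, "RawDisk", enabled={"GrepJob", "StringsJob"})
```

In this toy table, `all_jobs` contains the grep job only because the Plaso job runs first and produces its input, while `without_plaso` never reaches it, which mirrors the Grep/Plaso dependency described above.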

So in a nutshell I would love to have accessible through the API:

  • A graph representation of the dependencies preferably as json
  • The possibility to get the Task-Config for each Task
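For the JSON part, one possible shape is a simple node/edge document. The sketch below assumes the input has the same layout as the `result` dict built by the script in the first comment; `graph_to_json` and the `nodes`/`edges` field names are made up for this example and are not an existing Turbinia format.

```python
import json


def graph_to_json(result):
    """Serialize {job: {"evidence_in", "evidence_out", "tasks"}} as nodes and edges."""
    nodes = {}
    edges = []
    for job, data in result.items():
        nodes[job] = {"id": job, "type": "job"}
        for ev in data["evidence_in"]:
            nodes.setdefault(ev, {"id": ev, "type": "evidence"})
            edges.append({"source": ev, "target": job})
        for ev in data["evidence_out"]:
            nodes.setdefault(ev, {"id": ev, "type": "evidence"})
            edges.append({"source": job, "target": ev})
        for task, config in data["tasks"].items():
            nodes[task] = {"id": task, "type": "task", "config": config}
            edges.append({"source": job, "target": task})
    document = {"nodes": list(nodes.values()), "edges": edges}
    return json.dumps(document, indent=2, sort_keys=True)


# Hypothetical input with illustrative names.
example = {
    "ExampleJob": {
        "evidence_in": ["RawDisk"],
        "evidence_out": ["ReportText"],
        "tasks": {"ExampleTask": {"option": None}},
    }
}
graph_json = graph_to_json(example)
```

This only works as long as every task config value is JSON-serializable; configs holding other Python objects would need a custom encoder.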

How should we proceed regarding both ideas?

@aarontp
Member

aarontp commented Sep 7, 2023

Would a static but regenerable representation of this data be OK instead of putting this into the API server? I.e., if we were to update https://github.com/google/turbinia/blob/master/tools/turbinia_job_graph.py to include .json output and the task configs, would that be good enough to meet your needs for this?

@HolzmanoLagrene
Contributor Author

It would be more than I was hoping for 🎉☺️

@HolzmanoLagrene
Contributor Author

What is the status of this? Has anyone had time to look into it yet?

@aarontp
Member

aarontp commented Feb 27, 2024

@HolzmanoLagrene Sorry, I haven't gotten a chance to do that yet, but I'll try to carve out some time sometime soon.
