Byte run extensions #5

Open
ajnelson opened this issue Sep 11, 2013 · 3 comments

@ajnelson
Member

(This issue is related to: adding XML namespaces for The SleuthKit and NTFS.)

Presently, the DFXML practice is to generate one byte_runs element per file, denoting the primary data location. I have run into several cases where the locations of data and metadata need to be reported separately:

  • In the Registry, the location of the metadata and data for a cell are both important to know. Also, small values are embedded directly into the metadata. I know other file systems do this embedding, too (XFS, IIRC).
  • In NTFS, it is perfectly legal for a file to have multiple $DATA attributes and several indices, but Fiwalk reports only one of each, depending on whether the file is a regular file or a directory.
  • I've needed the byte address of some NTFS metadata to check some strange results, but Fiwalk doesn't report it. I had to turn to the TSK command line tools, and saw that this lookup could be made more convenient, perhaps even codified in the DFXML schema.

I think it'd be worth having multiple byte_runs elements for individual files. An attribute can distinguish their types. If we were to use the File System Forensic Analysis (FSFA) book's nomenclature, this attribute would be category, with values file system, content, metadata, file name, and application. I think it would be more practical to use an attribute facet (I'm open to a different attribute name), with these values:

  • data - the contents, as interpreted now. For regular files, this would be the file contents; for directories, the raw directory listing data. If one were using the DFXML Python library and called fileobject.contents(), this would be the byte_runs element dumped.
  • metadata - the byte addresses of all the file system metadata for this file. In most file systems, one of the byte_run elements would be the directory entry that references the file. In NTFS, this would be all of the resident and non-resident MFT entries. In POSIX file systems, this would be the inode address.
  • auxiliary - Extra expected data. For example, in NTFS, some special file system files contain data streams and index attributes.
  • other - something unexpected, like an NTFS directory with embedded file content (a $DATA attribute).
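
As a rough illustration of how a consumer might read faceted runs, here is a minimal sketch using only the Python standard library. It is not the DFXML Python library's API. The sample fileobject, its offsets, and the helper name runs_by_facet are made up, the DFXML default namespace is left out for brevity, and treating a byte_runs element with no facet as "data" is an assumption based on the "as interpreted now" wording above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fileobject fragment; offsets and lengths are invented.
SAMPLE = """
<fileobject>
  <filename>example.txt</filename>
  <byte_runs facet="metadata">
    <byte_run img_offset="1024" len="1024"/>
  </byte_runs>
  <byte_runs>
    <byte_run file_offset="0" img_offset="4096" len="512"/>
    <byte_run file_offset="512" img_offset="8192" len="512"/>
  </byte_runs>
</fileobject>
"""


def runs_by_facet(fileobject):
    """Map each facet name to a list of (img_offset, len) tuples.

    Assumption: a byte_runs element without a facet attribute is the
    file's content, so it is filed under "data".
    """
    grouped = defaultdict(list)
    for byte_runs in fileobject.findall("byte_runs"):
        facet = byte_runs.get("facet", "data")
        for run in byte_runs.findall("byte_run"):
            grouped[facet].append((int(run.get("img_offset")), int(run.get("len"))))
    return dict(grouped)


facets = runs_by_facet(ET.fromstring(SAMPLE))
```

With this grouping, existing consumers that only want file contents can keep reading the "data" entry and ignore the rest.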

To illustrate what this would look like, consider this corner case: the NTFS security database, $Secure. (These values were drawn from the NPS "domexusers" image. Full istat output is viewable here.) This file has a regular $DATA attribute, and two indices.

<byte_runs facet="metadata">
...
</byte_runs>
<byte_runs facet="data" tsk:extract_arg="9-128-18" ntfs:attr_label="$SDS" ntfs:attr_type="$DATA">
...
</byte_runs>
<byte_runs facet="auxiliary" tsk:extract_arg="9-144-16" ntfs:attr_label="$SDH" ntfs:attr_type="$INDEX_ROOT">
...
</byte_runs>
<byte_runs facet="auxiliary" tsk:extract_arg="9-144-17" ntfs:attr_label="$SII" ntfs:attr_type="$INDEX_ROOT">
...
</byte_runs>
<byte_runs facet="auxiliary" tsk:extract_arg="9-160-19" ntfs:attr_label="$SDH" ntfs:attr_type="$INDEX_ALLOCATION">
...
</byte_runs>
<byte_runs facet="auxiliary" tsk:extract_arg="9-160-20" ntfs:attr_label="$SII" ntfs:attr_type="$INDEX_ALLOCATION">
...
</byte_runs>

Having multiple byte_runs makes three additional DFXML extensions look handy:

  • tsk:extract_arg, the triplet one would pass to icat to extract exactly this run's contents
  • ntfs:attr_type, the NTFS attribute-type name
  • ntfs:attr_label, the attribute name embedded in the MFT entry. (Discussions with somebody led to some confusion when I initially suggested this be called ntfs:attr_name, in keeping with the FSFA terminology. attr_label seems a bit safer from ambiguities.)
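
To show how these attributes might be consumed, here is a small sketch that reads the proposed attributes and assembles (but does not run) one icat command line per byte_runs element. The namespace URIs are placeholders, since none have been settled; the helper name icat_commands and the image path are hypothetical too.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the proposal does not fix these.
TSK_NS = "http://example.org/dfxml/tsk"
NTFS_NS = "http://example.org/dfxml/ntfs"

# Two of the $Secure runs from the example above, with the run lists omitted.
SAMPLE = f"""
<fileobject xmlns:tsk="{TSK_NS}" xmlns:ntfs="{NTFS_NS}">
  <byte_runs facet="data" tsk:extract_arg="9-128-18"
             ntfs:attr_label="$SDS" ntfs:attr_type="$DATA"/>
  <byte_runs facet="auxiliary" tsk:extract_arg="9-144-16"
             ntfs:attr_label="$SDH" ntfs:attr_type="$INDEX_ROOT"/>
</fileobject>
"""


def icat_commands(fileobject, image_path):
    """Return (attr_label, argv) pairs, one per byte_runs with an extract_arg.

    The argv lists are only built here, not executed. For a partitioned
    image, icat would also need its -o option with the partition offset.
    """
    commands = []
    for byte_runs in fileobject.findall("byte_runs"):
        extract_arg = byte_runs.get(f"{{{TSK_NS}}}extract_arg")
        if extract_arg is None:
            continue  # nothing to hand to icat for this run
        label = byte_runs.get(f"{{{NTFS_NS}}}attr_label")
        commands.append((label, ["icat", image_path, extract_arg]))
    return commands


commands = icat_commands(ET.fromstring(SAMPLE), "disk.img")
```

Each argv list could then be passed to subprocess.run to dump that attribute's contents.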

Does this look sufficiently useful to include in the base DFXML schema?

@ajnelson
Member Author

It seems another facet should be the filename structure's address, since the two have independent allocation statuses. I incorrectly lumped filename and metadata structures together in the above proposal.

So, the wrinkle that introduces: Some file systems, like FAT, overload the same data structures as filename and metadata. It may be better to use Boolean flags instead of a "facet" attribute: is_data, is_metadata_struct, is_filename_struct, is_auxiliary, is_other. Thoughts? API grumbles?
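
To make the API question concrete, here is one way the Boolean-flag variant could look on the Python side, sketched with enum.Flag. The class name RunRole and the helper to_boolean_attributes are hypothetical; only the is_* attribute names come from the list above.

```python
from enum import Flag, auto


class RunRole(Flag):
    """One bit per proposed Boolean attribute."""
    DATA = auto()
    METADATA_STRUCT = auto()
    FILENAME_STRUCT = auto()
    AUXILIARY = auto()
    OTHER = auto()


# FAT: a single directory entry serves as both filename and metadata structure.
fat_dirent = RunRole.METADATA_STRUCT | RunRole.FILENAME_STRUCT

# NTFS: a non-resident $DATA run is content only.
ntfs_content = RunRole.DATA


def to_boolean_attributes(role):
    """Render a set of roles as is_* XML attribute names and values."""
    return {
        f"is_{member.name.lower()}": "true"
        for member in RunRole
        if member in role
    }


fat_attrs = to_boolean_attributes(fat_dirent)
```

A single facet attribute cannot express the FAT case without inventing a combined value, whereas the flag set represents it directly. The cost is that consumers have to test several attributes instead of one.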

ajnelson added a commit to ajnelson/dfxml_schema that referenced this issue Nov 23, 2013
I didn't realize that the default for an unspecified maxOccurs is 1,
not "unbounded."  This rendered the use of "##other"-namespace elements
limited to a clearly incorrect degree.

This patch clarifies all remaining min/max element occurrences, found
with:

    grep minOccurs dfxml.xsd | grep -v maxOccurs

Except, one is left unspecified: the upper bound on <byte_run> elements
is in a proposed revision here:
<dfxml-working-group#5>.

Signed-off-by: Alex Nelson <[email protected]>
@ajnelson-nist ajnelson-nist added this to the v1.2.0 milestone Nov 3, 2017
@ajnelson-nist ajnelson-nist removed this from the v1.2.0 milestone Nov 15, 2017
@ajnelson-nist
Contributor

I'm removing this from v1.2.0 for now, because I'd like to do some testing with the @facet values. The Objects.py Python bindings support facets, but need a parser to generate facets aside from "data".

@ajnelson-nist
Contributor

The objects.py program has provided this experimental interface for a while:

This interface was discussed in the 2019 article "Standardization of file recovery classification and authentication," and at least one producer is under proposal. So, this feature will merge into develop for the 1.3.0 release.
