Byte run extensions #5

ajnelson · 2013-09-11T21:07:24Z

(This issue is related to: adding XML namespaces for The SleuthKit and NTFS.)

Presently, the DFXML practice has been to generate one byte_runs element per file, to denote the primary data location. I have encountered a few needs for some sort of cleaving of data from metadata while reporting both:

In the Registry, the location of the metadata and data for a cell are both important to know. Also, small values are embedded directly into the metadata. I know other file systems do this embedding, too (XFS, IIRC).
In NTFS, it is perfectly legal to have multiple $DATA attributes, several indices - but Fiwalk only reports one of each, dependent on whether it's a regular file or directory.
I've needed the byte address of some NTFS metadata before to check some strange results, but alas, Fiwalk doesn't report that. I needed to turn to the TSK command line tools, and saw how that could be made a little more convenient, even being worth codifying in the DFXML schema.

I think it'd be worth having multiple byte_runs elements for individual files. An attribute can distinguish their types. If we were to use the File System Forensic Analysis (FSFA) book's nomenclature, this attribute would be category, with values file system, content, metadata, file name, and application. I think it would be more practical to use an attribute facet (I'm open to a different attribute name), with these values:

data - the contents, as interpreted now. For regular files, this would be the file contents; for directories, the raw directory listing data. If one were using the DFXML Python library and called fileobject.contents(), this would be the byte_runs element dumped.
metadata - the byte addresses of all the file system metadata for this file. In most file systems, one of the byte_run elements would be the directory entry that references the file. In NTFS, this would be all of the resident and non-resident MFT entries. In POSIX file systems, this would be the inode address.
auxiliary - Extra expected data. For example, in NTFS, some special file system files contain data streams and index attributes.
other - something unexpected, like an NTFS directory with embedded file content (a $DATA attribute).

To illustrate what this would look like, consider this corner case: the NTFS security database, $Secure. (These values were drawn from the NPS "domexusers" image. Full istat output is viewable here.) This file has a regular $DATA attribute, and two indices.

<byte_runs facet="metadata">
...
</byte_runs>
<byte_runs facet="data" tsk:extract_arg="9-128-18" ntfs:attr_label="$SDS" ntfs:attr_type="$DATA">
...
</byte_runs>
<byte_runs facet="auxiliary" tsk:extract_arg="9-144-16" ntfs:attr_label="$SDH" ntfs:attr_type="$INDEX_ROOT">
...
</byte_runs>
<byte_runs facet="auxiliary" tsk:extract_arg="9-144-17" ntfs:attr_label="$SII" ntfs:attr_type="$INDEX_ROOT">
...
</byte_runs>
<byte_runs facet="auxiliary" tsk:extract_arg="9-160-19" ntfs:attr_label="$SDH" ntfs:attr_type="$INDEX_ALLOCATION">
...
</byte_runs>
<byte_runs facet="auxiliary" tsk:extract_arg="9-160-20" ntfs:attr_label="$SII" ntfs:attr_type="$INDEX_ALLOCATION">
...
</byte_runs>
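As a rough sketch of how a consumer might read this layout, the snippet below groups the byte_runs children of one file by facet using only the Python standard library. The namespace URIs, the enclosing fileobject sample, and the helper name group_by_facet are placeholders invented for illustration; the proposal does not fix any of them. Treating a missing facet as data is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical namespace URIs; the proposal does not define them.
NS_TSK = "http://example.org/tsk"
NS_NTFS = "http://example.org/ntfs"

# Abbreviated stand-in for the $Secure example (run contents omitted).
SAMPLE = f"""
<fileobject xmlns:tsk="{NS_TSK}" xmlns:ntfs="{NS_NTFS}">
  <filename>$Secure</filename>
  <byte_runs facet="metadata"/>
  <byte_runs facet="data" tsk:extract_arg="9-128-18" ntfs:attr_label="$SDS" ntfs:attr_type="$DATA"/>
  <byte_runs facet="auxiliary" tsk:extract_arg="9-144-16" ntfs:attr_label="$SDH" ntfs:attr_type="$INDEX_ROOT"/>
  <byte_runs facet="auxiliary" tsk:extract_arg="9-144-17" ntfs:attr_label="$SII" ntfs:attr_type="$INDEX_ROOT"/>
</fileobject>
"""


def group_by_facet(fileobject):
    """Map each facet value to the list of byte_runs elements carrying it."""
    groups = defaultdict(list)
    for brs in fileobject.findall("byte_runs"):
        # Assumption: a byte_runs element without a facet is treated as "data".
        groups[brs.get("facet", "data")].append(brs)
    return dict(groups)


fileobject = ET.fromstring(SAMPLE)
groups = group_by_facet(fileobject)
for facet, runs in groups.items():
    # Namespaced attributes are addressed as "{uri}localname" in ElementTree.
    labels = [run.get(f"{{{NS_NTFS}}}attr_label") for run in runs]
    print(facet, labels)
```

With this shape, a tool that only wants file contents can keep reading the data group and ignore the rest.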

Having multiple byte_runs makes three additional DFXML extensions look handy:

tsk:extract_arg, the triplet one would pass to icat to extract exactly this run's contents
ntfs:attr_type, the NTFS attribute-type name
ntfs:attr_label, the attribute name embedded in the MFT entry. (Discussions with somebody led to some confusion when I initially suggested this be called ntfs:attr_name in keeping with the FSFA terminology. attr_label seems a bit safer from ambiguities.)
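To show how tsk:extract_arg could be consumed, here is a small sketch that validates the triplet and builds an icat command line. The helper names, the image path, and the sector offset are hypothetical; only the "inode-type-id" address form and the -o option of icat are taken as given. The command is built as a list and not executed.

```python
from typing import List, NamedTuple, Optional


class ExtractArg(NamedTuple):
    """The three parts of an "inode-type-id" address as icat accepts it."""
    inode: int
    attr_type: int  # NTFS attribute type, e.g. 128 for $DATA, 144 for $INDEX_ROOT
    attr_id: int


def parse_extract_arg(value: str) -> ExtractArg:
    """Split a tsk:extract_arg value into integers, rejecting malformed input."""
    parts = value.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected inode-type-id, got {value!r}")
    inode, attr_type, attr_id = (int(part) for part in parts)
    return ExtractArg(inode, attr_type, attr_id)


def icat_command(image_path: str, extract_arg: str,
                 sector_offset: Optional[int] = None) -> List[str]:
    """Build (but do not run) an icat invocation for one byte_runs element."""
    parse_extract_arg(extract_arg)  # validation only; icat takes the raw string
    command = ["icat"]
    if sector_offset is not None:
        # -o gives the partition offset in sectors.
        command += ["-o", str(sector_offset)]
    command += [image_path, extract_arg]
    return command


# Hypothetical image path and offset, for illustration only.
print(icat_command("image.raw", "9-144-16", sector_offset=63))
```

The resulting list could be handed to subprocess.run by a caller that has The Sleuth Kit installed.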

Does this look sufficiently useful to include in the base DFXML schema?


ajnelson · 2013-10-11T20:32:48Z

It seems another facet should be the filename structure's address, since the two have independent allocation statuses. I incorrectly lumped filename and metadata structures together in the above proposal.

That introduces a wrinkle: some file systems, like FAT, use the same data structure for both the filename and the metadata. It may be better to use Boolean flags instead of a "facet" attribute: is_data, is_metadata_struct, is_filename_struct, is_auxiliary, is_other. Thoughts? API grumbles?
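One way to picture the flag-based alternative is with Python's enum.Flag, where a single byte run can carry several roles at once. This is only an illustration of the idea: the class name RunRole, the helper, and the choice to emit only the flags that are set are all assumptions, not part of any DFXML binding.

```python
from enum import Flag, auto
from typing import Dict


class RunRole(Flag):
    """Hypothetical roles mirroring the proposed is_* Boolean attributes."""
    DATA = auto()
    METADATA_STRUCT = auto()
    FILENAME_STRUCT = auto()
    AUXILIARY = auto()
    OTHER = auto()


_ATTRIBUTE_NAMES = {
    RunRole.DATA: "is_data",
    RunRole.METADATA_STRUCT: "is_metadata_struct",
    RunRole.FILENAME_STRUCT: "is_filename_struct",
    RunRole.AUXILIARY: "is_auxiliary",
    RunRole.OTHER: "is_other",
}


def to_xml_attributes(role: RunRole) -> Dict[str, str]:
    """Return the is_* attributes for a role, listing only flags that are set."""
    return {name: "true" for flag, name in _ATTRIBUTE_NAMES.items() if flag in role}


# A FAT directory entry serves as both the filename and the metadata structure.
fat_dirent = RunRole.METADATA_STRUCT | RunRole.FILENAME_STRUCT
print(to_xml_attributes(fat_dirent))
```

A single-valued facet attribute cannot express the FAT case without either duplicating the byte_runs element or inventing a combined value, which is the trade-off the flags are meant to avoid.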

I didn't realize that the default for an unspecified maxOccurs is 1, not "unbounded." This rendered the use of "##other"-namespace elements limited to a clearly incorrect degree.

This patch clarifies all remaining min/max element occurrences, found with:

grep minOccurs dfxml.xsd | grep -v maxOccurs

Except, one is left unspecified: the upper bound on <byte_run> elements is in a proposed revision here: <dfxml-working-group#5>.

Signed-off-by: Alex Nelson <[email protected]>
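The grep in the commit message above can be approximated structurally. The sketch below walks a schema and lists particles that state minOccurs but leave maxOccurs to its default of 1. The schema fragment is a made-up stand-in, not an excerpt from dfxml.xsd, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up schema fragment for illustration; not taken from dfxml.xsd.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fileobject">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="filename" minOccurs="0" maxOccurs="1"/>
        <xs:element name="byte_runs" minOccurs="0"/>
        <xs:any namespace="##other" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def implicit_max_occurs(xsd_text):
    """List (tag, identifier) pairs that set minOccurs but not maxOccurs."""
    root = ET.fromstring(xsd_text)
    hits = []
    for node in root.iter():
        if "minOccurs" in node.attrib and "maxOccurs" not in node.attrib:
            local_tag = node.tag.split("}")[-1]  # drop the "{namespace}" prefix
            identifier = node.get("name") or node.get("ref") or node.get("namespace")
            hits.append((local_tag, identifier))
    return hits


for tag, identifier in implicit_max_occurs(SAMPLE_XSD):
    print(f"{tag} {identifier}: maxOccurs defaults to 1")
```

Unlike the line-based grep, this also catches declarations whose attributes are split across several lines.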

ajnelson-nist · 2017-11-15T16:33:16Z

I'm removing this from v1.2.0 for now, because I'd like to do some testing with the @facet values. The Objects.py Python bindings support facets, but need a parser to generate facets aside from "data".

ajnelson-nist · 2022-08-05T13:21:31Z

The objects.py program has provided this experimental interface for a while:

Property ByteRuns.facet, with expected string values inode, name, data. Absent a value, data is assumed.
Property FileObject.data_brs, a backwards-compatible alias for FileObject.byte_runs.
New properties FileObject.inode_brs and FileObject.name_brs.
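The behavior described in that list can be mimicked with small stand-in classes. The code below is not the objects.py implementation; it is a self-contained sketch that reuses the property names from the list, with the attach method and the internal storage invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

KNOWN_FACETS = ("data", "inode", "name")


@dataclass
class ByteRuns:
    """Stand-in for a byte_runs element: (image offset, length) pairs plus a facet."""
    runs: List[Tuple[int, int]] = field(default_factory=list)
    facet: Optional[str] = None


class FileObject:
    """Stand-in that files each ByteRuns under its facet (hypothetical design)."""

    def __init__(self) -> None:
        self._brs: Dict[str, ByteRuns] = {}

    def attach(self, brs: ByteRuns) -> None:
        # Absent a facet value, "data" is assumed.
        facet = brs.facet or "data"
        if facet not in KNOWN_FACETS:
            raise ValueError(f"unexpected facet: {facet!r}")
        self._brs[facet] = brs

    @property
    def data_brs(self) -> Optional[ByteRuns]:
        return self._brs.get("data")

    # Both names resolve to the same property object.
    byte_runs = data_brs

    @property
    def inode_brs(self) -> Optional[ByteRuns]:
        return self._brs.get("inode")

    @property
    def name_brs(self) -> Optional[ByteRuns]:
        return self._brs.get("name")


fo = FileObject()
fo.attach(ByteRuns([(32256, 4096)]))                 # no facet given
fo.attach(ByteRuns([(16384, 1024)], facet="inode"))
print(fo.byte_runs is fo.data_brs, fo.inode_brs.facet, fo.name_brs)
```

Existing callers that read byte_runs keep seeing the content runs, while newer callers can ask for the inode or name runs explicitly.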

This interface was discussed in the 2019 article "Standardization of file recovery classification and authentication," and at least one producer is under proposal. So, this feature will merge into develop for the 1.3.0 release.

ajnelson mentioned this issue Mar 15, 2014

NTFS extra timestamps #16

Open

ajnelson-nist mentioned this issue Jul 5, 2017

Reducing duplicate data by condensing hard-linked files #27

Open

ajnelson-nist added this to the v1.2.0 milestone Nov 3, 2017

ajnelson-nist removed this from the v1.2.0 milestone Nov 15, 2017

sheldoug mentioned this issue Aug 5, 2022

Fiwalk: Expose allocation and addressing for inodes and dirents sleuthkit/sleuthkit#2739

Closed

ajnelson-nist mentioned this issue Aug 5, 2022

Fiwalk: Expose allocation and addressing for inodes and dirents sleuthkit/sleuthkit#2740

Merged
