Byte run extensions #5
It seems another facet should be the filename structure's address, since the two have independent allocation statuses. I incorrectly lumped filename and metadata structures together in the above proposal. So, the wrinkle that introduces: Some file systems, like FAT, overload the same data structures as filename and metadata. It may be better to use Boolean flags instead of a "facet" attribute: |
I didn't realize that the default for an unspecified maxOccurs is 1, not "unbounded." This rendered the use of "##other"-namespace elements limited to a clearly incorrect degree.

This patch clarifies all remaining min/max element occurrences, found with:

    grep minOccurs dfxml.xsd | grep -v maxOccurs

Except, one is left unspecified: the upper bound on <byte_run> elements is in a proposed revision here: <dfxml-working-group#5>.

Signed-off-by: Alex Nelson <[email protected]>
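For readers who want a structure-aware version of that `grep`, here is a minimal sketch in Python. It assumes only the standard library; the schema fragment `SAMPLE_XSD` and the helper `implicit_max_occurs` are hypothetical names made up for illustration and are not taken from `dfxml.xsd`. It lists every particle that sets `minOccurs` but leaves `maxOccurs` at its XML Schema default of 1.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment for illustration only; not copied from dfxml.xsd.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="example">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string" minOccurs="0"/>
        <xs:element name="b" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <xs:any namespace="##other" minOccurs="0" processContents="lax"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def implicit_max_occurs(root):
    """Return (tag, name) pairs for particles with minOccurs set and maxOccurs omitted."""
    found = []
    for node in root.iter():
        if "minOccurs" in node.attrib and "maxOccurs" not in node.attrib:
            # Shorten the namespaced tag so the report reads like the schema source.
            found.append((node.tag.replace(XS, "xs:"), node.get("name")))
    return found


hits = implicit_max_occurs(ET.fromstring(SAMPLE_XSD))
for tag, name in hits:
    print(tag, name)
```

Unlike the line-based `grep`, this also catches declarations whose attributes are wrapped across several lines.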
I'm removing this from v1.2.0 for now, because I'd like to do some testing with the |
The
This interface was discussed in the 2019 article "Standardization of file recovery classification and authentication," and at least one producer is under proposal. So, this feature will merge into |
(This issue is related to: adding XML namespaces for The SleuthKit and NTFS.)
Presently, the DFXML practice has been to generate one `byte_runs` element per file, to denote the primary data location. I have encountered a few needs for some sort of cleaving of data from metadata while reporting both:

I think it'd be worth having multiple `byte_runs` elements for individual files. An attribute can distinguish their types. If we were to use the File System Forensic Analysis (FSFA) book's nomenclature, this attribute would be `category`, with values `file system`, `content`, `metadata`, `file name`, and `application`. I think it would be more practical to use an attribute `facet` (I'm open to a different attribute name), with these values:

- `data` - the contents, as interpreted now. For regular files, this would be the file contents; for directories, the raw directory listing data. If one were using the DFXML Python library and called `fileobject.contents()`, this would be the byte_runs element dumped.
- `metadata` - the byte addresses of all the file system metadata for this file. In most file systems, one of the `byte_run` elements would be the directory entry that references the file. In NTFS, this would be all of the resident and non-resident MFT entries. In POSIX file systems, this would be the inode address.
- `auxiliary` - Extra expected data. For example, in NTFS, some special file system files contain data streams and index attributes.
- `other` - something unexpected, like an NTFS directory with embedded file content (a `$DATA` attribute).

To illustrate what this would look like, consider this corner case: the NTFS security database, `$Secure`. (These values were drawn from the NPS "domexusers" image. Full `istat` output is viewable here.) This file has a regular `$DATA` attribute, and two indices.

Having multiple byte_runs makes three additional DFXML extensions look handy:

- `tsk:extract_arg`, the triplet one would pass to `icat` to extract exactly this run's contents
- `ntfs:attr_type`, the NTFS attribute-type name
- `ntfs:attr_label`, the attribute name embedded in the MFT entry. (Discussions with somebody led to some confusion when I initially suggested this be called `ntfs:attr_name` in keeping with the FSFA terminology. `attr_label` seems a bit safer from ambiguities.)

Does this look sufficiently useful to include in the base DFXML schema?
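As a rough sketch of how a consumer might read several `byte_runs` elements per file, the Python below groups runs by the proposed `facet` attribute using only the standard library. Everything in `SAMPLE` is an assumption for illustration: the fragment omits the DFXML namespace for brevity, the offsets and lengths are made-up placeholders (not the "domexusers" values), and `runs_by_facet` is a hypothetical helper, not part of the DFXML Python library. Treating a missing `facet` as `data` is also an assumption, chosen to match today's single-element practice.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment: `facet` is the attribute proposed in this issue,
# and every offset and length here is a placeholder value.
SAMPLE = """\
<fileobject>
  <filename>$Secure</filename>
  <byte_runs facet="data">
    <byte_run offset="0" img_offset="1048576" len="4096"/>
    <byte_run offset="4096" img_offset="2097152" len="4096"/>
  </byte_runs>
  <byte_runs facet="metadata">
    <byte_run offset="0" img_offset="524288" len="1024"/>
  </byte_runs>
  <byte_runs facet="auxiliary">
    <byte_run offset="0" img_offset="3145728" len="512"/>
  </byte_runs>
</fileobject>
"""


def runs_by_facet(fileobject):
    """Map each facet value to a list of (img_offset, len) tuples."""
    grouped = defaultdict(list)
    for byte_runs in fileobject.findall("byte_runs"):
        # Assumption: an element without a facet is the primary data location.
        facet = byte_runs.get("facet", "data")
        for run in byte_runs.findall("byte_run"):
            grouped[facet].append((int(run.get("img_offset")), int(run.get("len"))))
    return dict(grouped)


grouped = runs_by_facet(ET.fromstring(SAMPLE))
for facet, runs in grouped.items():
    print(facet, runs)
```

A consumer that only cares about file contents could then read `grouped["data"]` and ignore the other facets, which is the separation of data from metadata the proposal is after.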