License: see cc-noderiv-license.txt
polytaxis is a specification of a universal (non-filetype-specific) tag file header.
-
Many filetypes do not support metadata, and adding metadata to new filetypes requires extra developer effort and planning. A unified metadata specification should ease new development and increase the availability of file metadata.
-
Disparate metadata implementations decrease compatibility and increase maintenance effort for every piece of software that interacts with metadata.
-
Individual metadata formats tend to be domain specific. This impedes developing general metadata-based file management and organization solutions.
As a concrete example, there are many software packages that will catalogue audio files, and there are many software packages that will catalogue image files, but few if not none that catalogue both.
-
Individual metadata formats are arbitrarily limited: allowing only certain tags, not supporting multiple values per tag, etc.
-
Individual metadata formats are overengineered: XMP stores structured, typed metadata. This makes interpreting and categorizing the data difficult without knowlege of the usage of the file.
polytaxis supports key/value tags as well as valueless-tags. The polytaxis header is always utf-8 encoded, and textfiles with a polytaxis header must also be utf-8 encoded.
The header has two formats - sized and unsized. The sized header begins with an explicitly specified length and can have extra allocated space so that tags can be modified without shifting the core file data. The unsized header has no unprintable characters and ends with a delimiter. The unsized header is intended to be used with purely textual data and is not recommended to be used with large files.
Polytaxis is designed to be mostly non-binary to ease implementation and debugging, and aid data recovery if something goes wrong.
In the following description, \n
indicates the new-line character (0x0A
). \0
indicates a null character (0x00
).
The polytaxis header is always the first data in a tagged file. The header starts with
polytaxis00
(the 'magic') which specifies that the header is present and the version of the header.
For sized headers, the magic is suffixed with a space and followed by a zero-padded 10-digit decimal integer length, \n
, the encoded tags, and at least one \0
if the amount written to this point is less than the indicated length.
The length is the number of bytes in the encoded tags and following padding and does not include the bytes from the start of the header through the new line after the length specifier. The total size of the header is thus 23 plus the indicated length. The post-header data starts at an offset equal to that value.
An example with no padding:
polytaxis00 0000000076
author=Whole Wheat Pasta
date=1988-13-32
critical
meeting-2014
format=FormA
DOCX HEADER
An example with 1 byte of padding:
polytaxis00 0000000077
author=Whole Wheat Pasta
date=1988-13-32
critical
meeting-2014
format=FormA
\0DOCX HEADER
An example with 436 bytes of padding:
polytaxis00 0000000512
author=Whole Wheat Pasta
date=1988-13-32
critical
meeting-2014
format=FormA
\0<any data here, for 435 bytes (skipping a bit)...>DOCX HEADER
For unsized headers, the magic is suffixed with u
then followed by \n
, the encoded tags, and a marker (<<<<\n
) to indicate the end of the tags.
A complete example:
polytaxis00u
author=Rendaw
date=1988-07-04
backup
diary
<<<<
This is my diary.
Today I wrote another standard specification. Nobody likes my standards but I don't care. I will show them.
Tags are written as part\n
or part=part\n
, where a part
is utf-8 text with the special sequences \
, =
, \n
, and <<<<\n
escaped by \
.
- Don't use
>
,<
, or!=
in tag keys. - When tagging a filetype that does not natively support polytaxis, suffix the filename with
.p
(ex:my_house.jpg.p
).
-
This is a library with common methods for interacting with the polytaxis header and files containing the header.
It also contains the utility
ptmod
for modifying the polytaxis header on files. -
polytaxis-unwrap
provides a compatibility layer for software that doesn't support polytaxis headers through a fuse filesystem. -
polytaxis-monitor
provides a service to monitor directories for changes and index any tags found within. It also proviesptq
to query the index from the command line. -
polytaxis-adventure
is a browser for thepolytaxis-monitor
index, built with Qt. -
ptmod-gui
is a graphical editor for polytaxis tags. -
polytaxis-utils
provides the following utilities:polytaxis-import
- Translates tags from filetype-specific metadata fields to polytaxis headers.polytaxis-cleanup
- Performs common tagged file manipulations.unpt
- Translates paths for use withpolytaxis-unwrap
.
© Rendaw, Zarbosoft 2015