Commit

…into CADC-12561
Adrian Damian authored and Adrian Damian committed Feb 9, 2024
2 parents 3fc90a6 + 71664dc commit 701e01f
Showing 3 changed files with 27 additions and 23 deletions.
2 changes: 1 addition & 1 deletion cadc-inventory/build.gradle
@@ -14,7 +14,7 @@ sourceCompatibility = 1.8

group = 'org.opencadc'

-version = '0.9.4'
+version = '0.10.0'

description = 'OpenCADC Storage Inventory core library'
def git_url = 'https://github.com/opencadc/storage-inventory'
@@ -384,15 +384,15 @@ public static void assertValidPathComponent(Class caller, String name, String te
boolean slash = (test.indexOf('/') >= 0);
boolean escape = (test.indexOf('\\') >= 0);
boolean percent = (test.indexOf('%') >= 0);
-boolean colon = (test.indexOf(":") >= 0);
+boolean colon = (test.indexOf(':') >= 0);
boolean semic = (test.indexOf(';') >= 0);
boolean amp = (test.indexOf('&') >= 0);
boolean dollar = (test.indexOf('$') >= 0);
boolean question = (test.indexOf('?') >= 0);
boolean sqopen = (test.indexOf('[') >= 0);
boolean sqclose = (test.indexOf(']') >= 0);

-if (space || slash || escape || percent || semic || amp || dollar || question || sqopen || sqclose) {
+if (space || slash || escape || percent || colon || semic || amp || dollar || question || sqopen || sqclose) {
String s = "invalid ";
if (caller != null) {
s += caller.getSimpleName() + ".";
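The substantive fix in this hunk is that `colon` was computed but never tested, so a name containing `:` passed the check. A minimal restructured sketch of the same rule (hypothetical class, not the cadc-inventory source) keeps the reserved characters in one string, so a newly added character cannot be detected and then left out of the final condition. It assumes `space` in the original tests the ASCII space character only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical restructuring of the path-component check; not the cadc-inventory source. */
public final class PathComponentCheck {

    // space, '/', '\', '%', ':', ';', '&', '$', '?', '[', ']' as tested in the hunk above
    private static final String RESERVED = " /\\%:;&$?[]";

    private PathComponentCheck() {
    }

    /** Returns each reserved character found in test, once, in order of first occurrence. */
    public static List<Character> findReserved(String test) {
        List<Character> found = new ArrayList<>();
        for (char c : test.toCharArray()) {
            if (RESERVED.indexOf(c) >= 0 && !found.contains(c)) {
                found.add(c);
            }
        }
        return found;
    }

    /** True if test contains none of the reserved characters. */
    public static boolean isValid(String test) {
        return findReserved(test).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isValid("foo.fits"));      // true
        System.out.println(isValid("cadc:foo.fits")); // false: ':' is now rejected
        System.out.println(findReserved("a:b?c"));    // [:, ?]
    }
}
```

Returning the offending characters rather than a bare boolean would also let the caller name them in the "invalid ..." message the original builds.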
44 changes: 24 additions & 20 deletions vault-quota/Design.md
@@ -5,8 +5,8 @@ The definitive source of content-length (file size) of a DataNode comes from the
In the case of a `vault` service co-located with a single storage site (`minoc`),
the new Artifact is visible in the database as soon as the PUT to `minoc` is
completed. In the case of a `vault` service co-located with a global SI, the new
-Artifact is visible in the database once it is synced from the site of the PUT to
-`minoc` to the global database by `fenwick` (or worst case: `ratik`).
+Artifact is visible in the database once it is synced from the site of the PUT to
+the global database by `fenwick` (or worst case: `ratik`).

## TODO
The design below only takes into account incremental propagation of space used
@@ -61,6 +61,9 @@ but there is nothing there right now.

## validation

+### ContainerNode vs child nodes discrepancies
+TODO: figure out how to validate ContainerNode sizes vs sum(child sizes) in a live system
+
### DataNode vs Artifact discrepancies
These can be validated in parallel by multiple threads, subdivide work by bucket.
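A minimal sketch of splitting the DataNode vs Artifact validation by bucket across threads. It assumes buckets are named by hex prefixes ("0" to "f" for one character, "00" to "ff" for two) and that a hypothetical `validateBucket` function runs the comparison for one prefix and returns its discrepancy count. `BucketParallelValidator` and its methods are illustrative names, not existing vault or inventory code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical driver for per-bucket validation; bucket naming is an assumption. */
public final class BucketParallelValidator {

    private static final String HEX = "0123456789abcdef";

    private BucketParallelValidator() {
    }

    /** All hex prefixes of the given length: 1 gives "0".."f", 2 gives "00".."ff". */
    public static List<String> bucketPrefixes(int length) {
        List<String> prefixes = new ArrayList<>();
        prefixes.add("");
        for (int i = 0; i < length; i++) {
            List<String> next = new ArrayList<>();
            for (String p : prefixes) {
                for (char c : HEX.toCharArray()) {
                    next.add(p + c);
                }
            }
            prefixes = next;
        }
        return prefixes;
    }

    /** Validates every bucket as an independent task on a fixed pool; returns the total count. */
    public static long validateAll(List<String> prefixes, int threads,
            Function<String, Long> validateBucket) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (String prefix : prefixes) {
                results.add(pool.submit(() -> validateBucket.apply(prefix)));
            }
            long total = 0;
            for (Future<Long> f : results) {
                try {
                    total += f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("bucket validation failed", e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> prefixes = bucketPrefixes(1);
        // stand-in validator: pretend only bucket "a" has discrepancies
        long n = validateAll(prefixes, 4, p -> p.equals("a") ? 2L : 0L);
        System.out.println(prefixes.size() + " buckets, " + n + " discrepancies");
    }
}
```

Using more buckets than threads keeps the pool busy when bucket sizes are uneven.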

@@ -73,37 +76,38 @@ else: ??
discrepancy 2: DataNode exists but Artifact does not
explanation: DataNode created, Artifact never (successfully) put
-evidence: dataNode.size == 0
-action: none
+evidence:
+action: set nodeSize = 0
discrepancy 3: DataNode exists but Artifact does not
explanation: deleted or lost Artifact
evidence: DataNode.size != 0 (deleted vs lost: DeletedArtifactEvent exists)
-action: fix DataNode.size
+action: fix nodeSize
-discrepancy 4: DataNode.size != Artifact.contentLength
+discrepancy 4: nodeSize != Artifact.contentLength
explanation: pending/missed Artifact event
action: fix DataNode and propagate delta to parent ContainerNode (same as incremental)
```
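A sketch of how discrepancies 2 to 4 could reduce to one size correction per DataNode plus a delta for its parent ContainerNode. `DataNodeCheck` and `Correction` are hypothetical names. It assumes an unset `nodeSize` is null and counts as 0 toward the parent, and that "fix nodeSize" for a missing Artifact means setting it to 0. Discrepancy 1 is not modeled because its rule is not shown in this hunk.

```java
/** Hypothetical per-pair size check for discrepancies 2-4; not vault-quota code. */
public final class DataNodeCheck {

    /** Outcome for one DataNode/Artifact pair. */
    public static final class Correction {
        public final boolean needed;        // DataNode.nodeSize must be written
        public final long expectedNodeSize; // value nodeSize should hold
        public final long parentDelta;      // change to propagate to the parent ContainerNode

        Correction(boolean needed, long expectedNodeSize, long parentDelta) {
            this.needed = needed;
            this.expectedNodeSize = expectedNodeSize;
            this.parentDelta = parentDelta;
        }
    }

    private DataNodeCheck() {
    }

    /**
     * @param nodeSize current DataNode nodeSize, or null if never set
     * @param contentLength Artifact.contentLength, or null if there is no Artifact
     */
    public static Correction check(Long nodeSize, Long contentLength) {
        // no Artifact (discrepancies 2 and 3): the DataNode should count as empty
        long expected = (contentLength == null) ? 0L : contentLength;
        // assumption: an unset nodeSize contributes 0 to the parent's size
        long current = (nodeSize == null) ? 0L : nodeSize;
        boolean needed = nodeSize == null || current != expected;
        return new Correction(needed, expected, expected - current);
    }

    public static void main(String[] args) {
        Correction lost = check(500L, null);  // discrepancy 3
        Correction stale = check(100L, 250L); // discrepancy 4
        System.out.println(lost.expectedNodeSize + " " + lost.parentDelta);   // 0 -500
        System.out.println(stale.expectedNodeSize + " " + stale.parentDelta); // 250 150
    }
}
```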

-This could be accomplished with a single query on on inventory.Artifact full outer join
-vospace.Node to get all the pairs. The more generic approach would be to do a merge join
-of two iterators:
-
-Iterator<Artifact> aiter = artifactDAO.iterator(vaultNamespace, bucket);
-Iterator<DataNode> niter = nodeDAO.iterator(vaultNamespace, bucket);
-
-The more generic dual iterator approach could be made to work if the inventory and vospace
-content are in different PG database or server - TBD.
+The most generic implementation is a merge join of two iterators (see ratik, tantar):
+```
+Iterator<Artifact> aiter = artifactDAO.iterator(vaultNamespace, bucket); // artifact.uri order
+Iterator<DataNode> niter = nodeDAO.iterator(vaultNamespace, bucket); // storageID order
+```
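A generic full outer merge join over two sorted iterators, the pattern the note points to in ratik and tantar. This is a sketch, not their code. It assumes each DataNode's storageID equals its Artifact's URI, so artifact.uri order and storageID order agree, and that keys are unique on each side. A pair with a null right side is an Artifact with no DataNode; a null left side is a DataNode with no Artifact.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical generic merge join; not the ratik/tantar implementation. */
public final class MergeJoin {

    private MergeJoin() {
    }

    /**
     * Walks two iterators sorted ascending by key and calls onPair(left, right),
     * passing null for the side that has no element with that key.
     */
    public static <L, R, K extends Comparable<K>> void fullOuterJoin(
            Iterator<L> left, Function<L, K> leftKey,
            Iterator<R> right, Function<R, K> rightKey,
            BiConsumer<L, R> onPair) {
        L l = left.hasNext() ? left.next() : null;
        R r = right.hasNext() ? right.next() : null;
        while (l != null || r != null) {
            int cmp;
            if (l == null) {
                cmp = 1;  // only right items remain
            } else if (r == null) {
                cmp = -1; // only left items remain
            } else {
                cmp = leftKey.apply(l).compareTo(rightKey.apply(r));
            }
            if (cmp < 0) {
                onPair.accept(l, null);
                l = left.hasNext() ? left.next() : null;
            } else if (cmp > 0) {
                onPair.accept(null, r);
                r = right.hasNext() ? right.next() : null;
            } else {
                onPair.accept(l, r);
                l = left.hasNext() ? left.next() : null;
                r = right.hasNext() ? right.next() : null;
            }
        }
    }

    public static void main(String[] args) {
        List<String> artifacts = Arrays.asList("a", "b", "d");
        List<String> nodes = Arrays.asList("b", "c", "d");
        // prints: a | null, b | b, null | c, d | d (one per line)
        fullOuterJoin(artifacts.iterator(), s -> s, nodes.iterator(), s -> s,
                (a, n) -> System.out.println(a + " | " + n));
    }
}
```

Because each side is read once in order, memory stays constant per bucket and the two iterators can come from different databases.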

## database changes required
note: all field and column names TBD
-* add `size` and `delta` fields to ContainerNode (transient)
-* add `size` field to DataNode (transient)
-* add `size` to the `vospace.Node` table
+note: fields in Node classes probably not transient but TBD
+* add `nodeSize` and `delta` fields to ContainerNode
+* add `nodeSize` field to DataNode (no size props in LinkNode!)
+* add `nodeSize` to the `vospace.Node` table
+* add `delta` to the `vospace.Node` table
* add `storageBucket` to DataNode
* add `storageBucket` to `vospace.Node` table (validation)
-* incremental sync query/iterator (ArtifactDAO?)
-* lookup DataNode by storageID (ArtifactDAO?)
+## cadc-inventory-db API required
+* incremental sync query/iterator: ArtifactDAO.iterator(Namespace ns, String uriBucketPrefix, Date minLastModified)?
+* lookup DataNode by storageID: NodeDAO.getDataNode(URI storageID)?
+* validate-by-bucket: use ArtifactDAO.iterator(String uriBucketPrefix, boolean ordered, Namespace ns)
+* validate-by-bucket: NodeDAO.dataNodeIterator(String storageBucketPrefix, boolean ordered)
* indices to support new queries
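One possible reading of the proposed `nodeSize` and `delta` columns, sketched in memory: a DataNode size change only adds to its parent's `delta`, and a later pass folds a container's `delta` into its `nodeSize` and hands the same amount to the next container up. `DeltaPropagation`, `Container` and the method names are hypothetical, not vospace classes, and the two-step scheme is an interpretation of "propagate delta to parent ContainerNode", not a confirmed design.

```java
/** Hypothetical in-memory model of nodeSize/delta propagation; not vospace code. */
public final class DeltaPropagation {

    /** Stand-in for a ContainerNode row with the proposed columns. */
    public static final class Container {
        public final Container parent; // null for the root
        public long nodeSize;          // committed size of everything below this container
        public long delta;             // pending change not yet folded into nodeSize

        public Container(Container parent) {
            this.parent = parent;
        }
    }

    private DeltaPropagation() {
    }

    /** A DataNode changed size: record only the difference on its parent. */
    public static void dataNodeChanged(Container parent, long oldSize, long newSize) {
        parent.delta += newSize - oldSize;
    }

    /** Fold a container's pending delta into nodeSize and pass it up one level. */
    public static void propagate(Container c) {
        long d = c.delta;
        if (d == 0) {
            return;
        }
        c.nodeSize += d;
        c.delta = 0;
        if (c.parent != null) {
            c.parent.delta += d;
        }
    }

    public static void main(String[] args) {
        Container root = new Container(null);
        Container dir = new Container(root);
        dataNodeChanged(dir, 0L, 1024L); // new 1024-byte file in dir
        propagate(dir);                  // dir.nodeSize = 1024, root.delta = 1024
        propagate(root);                 // root.nodeSize = 1024, root.delta = 0
        System.out.println(dir.nodeSize + " " + root.nodeSize + " " + root.delta);
    }
}
```

Under this reading a PUT updates only the direct parent's row, at the cost of ancestor sizes lagging until the propagation pass reaches them.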
