Commit: Move documents into /docs folder
andrew committed May 20, 2019
1 parent 5f2d27d commit e705ce8
Showing 11 changed files with 733 additions and 5 deletions.
11 changes: 9 additions & 2 deletions README.md
@@ -7,9 +7,16 @@

IPFS Package Managers Special Interest Group

-- [Package Management Glossary](glossary.md)
-- [Package Management Categories](categories.md)
-- [Package Manager list](package-managers)
+- [Package Management Glossary](docs/glossary.md)
+- [Package Management Categories](docs/categories.md)
+- [How IPFS Concepts map to package manager concepts](docs/concepts.md)
+- [Problems with Package Managers](docs/problems.md)
+- [Facilitating the Correct Abstractions](docs/abstractions.md)
+- [Package indexing and linking](docs/linking.md)
+- [Cladistic tree of depths of integration](docs/decentralization.md)
+- [Decentralized Publishing](docs/decentralization.md)
+- [Academic papers related to package management](docs/papers.md)

## Integrations

94 changes: 94 additions & 0 deletions docs/abstractions.md
@@ -0,0 +1,94 @@
# Facilitating the Correct Abstractions

(Acknowledging, of course, the hubris of the title -- we can only hope and try!)

To contribute meaningfully to advancing the state of package management, we must first understand package management.

To understand package management, we should first identify and understand its stages. These are the stages I would identify:

- *[human authorship phase is ready to produce a package]*
- pack content
- write release metadata (version name, etc)
- upload content and release metadata
- *[-- switch between producer to consumer --]*
- fetch release metadata
- transitive dependency resolution
- lockfile creation
- *[-- possible switch to even further downstream consumer --]*
- lockfile read and content fetch
- content unpack
- *[a new human authorship phase takes over!]*

![cycle](https://user-images.githubusercontent.com/627638/53887868-6f7ce580-4023-11e9-9109-803fbd5a06ef.jpg)

(The image is an earlier visualization of roughly the same concepts, but pictured with the authorship cycle closed.
Also note the image contains an "install" phase, which is elided in the list above
or perhaps equates to "content unpack", depending on your POV; several other steps
were combined rather than enumerated clearly.)
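To make the hand-offs concrete, here is a minimal sketch of the data each stage hauls to the next. All of these type and field names are hypothetical; none come from a real package manager:

```go
// Hypothetical sketch of the data hauled between stages.
package pm

// ReleaseMetadata is what "write release metadata" produces and what
// "fetch release metadata" later hauls back down: a version name plus
// enough information to locate and verify the packed content.
type ReleaseMetadata struct {
	Name         string
	Version      string
	ContentHash  string   // ideally a content identifier (CID)
	Dependencies []string // constraints, e.g. "libfoo >= 1.0"
}

// Lockfile is what "transitive dependency resolution" produces and what
// "lockfile read and content fetch" consumes: a fully pinned set of
// packages, ideally pinned by content identifier rather than by
// version number alone.
type Lockfile struct {
	Pinned map[string]string // package name -> content hash/CID
}
```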

Understanding these phases of package management, we can begin to identify the
key concepts of the APIs that haul data between each of the steps.
And understanding what key concepts and data need to be hauled between each
step gives us a roadmap for how IPFS/IPLD can help haul that data!

---

Now. There are many interesting things in the above:

- Never forget that rather than a list, there is actually a cycle once creation
gets involved. I won't talk about this more in this issue, but in the longest
run, it's incredibly important to mind how we can close this loop.

- Some of these phases are particularly clear in how they can relate to IPFS!
For example, uploading of packages and fetching of packages: clearly, these
operations can benefit from IPFS by treating it as a simple content bucket
that happens to be particularly well decentralized. Since this is already
clear, I also won't talk any more about this in this issue.

- You might have noticed I injected some Opinions into a few of the steps.
In particular, that ordering of transitive resolution vs lockfile creation
vs metadata fetch is not universally adopted! Some systems skip
the lockfile concept entirely and re-do dependency resolution every time they're used!
Some systems vary in what the lockfile contains (version numbers that are still
technically somewhat vague and need centralized/online translation into content,
versus content identifiers/hashes, etc). Of course, systems vary *wildly*
in terms of what information they actually act on and what exact logic they
use for transitive dependency resolution. And alarmingly, most systems don't
clearly separate metadata fetch from the resolution process at all.

---

That last set of things is what I really want to focus on.

I think my biggest takeaway by far from the last couple years of thinking about this whole domain is that segmenting resolve from all other operations is absolutely Of The Essence.
It's the point that never ceases to be contended, and for fundamental rather than incidental reasons: it is *correct* for different situations and packagers and user stories to use different resolution strategies.

It's also (what a coincidence) the key API concept that lets IPFS help other systems while keeping clear boundaries that let them get on with whatever locally contendable (e.g. language-specific) logic they need to.
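As a sketch of what that boundary could look like (hypothetical Go interfaces, not any real package manager's API, reusing the `ReleaseMetadata` and `Lockfile` types sketched earlier): resolution takes metadata as plain input and returns a lockfile, so the strategy can be swapped without touching transport:

```go
package pm

// MetadataSource is the "metadata fetch" step in isolation. It only
// hauls release metadata, so it could be backed by a registry's HTTP
// API, by IPFS/IPLD, or by a local mirror interchangeably.
type MetadataSource interface {
	Releases(pkg string) ([]ReleaseMetadata, error)
}

// Resolver is the "transitive dependency resolution" step in isolation.
// It acquires metadata only through the MetadataSource it is handed,
// which is what keeps the resolution strategy swappable per ecosystem
// and opens the door to reproducible resolve.
type Resolver interface {
	Resolve(roots []string, src MetadataSource) (Lockfile, error)
}
```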

---

But here we've got a bummer. Essentially no modern package manager I can think
of actually intentionally designed its resolve stage to be separate and pluggable.

The more we encourage separation of resolve from the steps that follow it,
the clearer the case becomes for every system to have lockfiles;
and the more systems have lockfiles, the happier we are,
because the jump from a lockfile to a content-addressable distribution system gets
more incremental and becomes more obviously a right choice.
But this is already widely understood and quite popular!

More interesting is what happens when we encourage separation of resolve from
the steps that *precede* it -- namely, from "metadata fetch".

If we can encourage a world of package managers with clearly delineated
boundaries between metadata fetch and the evaluation of transitive dependency resolution
upon that metadata, we both get clearer points for integrating IPFS/IPLD into
metadata distribution, AND we provide a huge boost toward enabling "reproducible resolve" --
an issue I've written more about [here](https://repeatr.io/vision/strategy/reproducible-resolve/) in the Timeless Stack docs --
which sets up the whole world nicely for bigger and better ecosystems of reproducible builds.

---

Thank you for coming to my Github issue / thinkpiece.

Where can we go from here? No idea: I just want to put all these thoughts out there to cook. We'll probably want to consider these in roadmapping anything beyond the most short-term basic content-bucket integrations; and perhaps start circulating concepts like separating resolve from metadata transport sooner rather than later, to prepare the ground for future work in that direction.
6 changes: 3 additions & 3 deletions blockers.md → docs/blockers.md
@@ -12,13 +12,13 @@ Filestore expects files to be immutable once added, so rsyncing updates to exist

Adding a directory of files to MFS means calling out to `ipfs files write` for every file; ideally there should be one command to write a directory of files to MFS.

An alternative approach may be to mount MFS as a FUSE filesystem (a la https://github.com/tableflip/ipfs-fuse).

### Updating rolling changes requires rehashing all files

If there is a regular cron job downloading updates to a mirror with rsync, there's currently no easy way to re-add only the files that have been added/changed/removed without rehashing every file in the whole mirror directory.

Mounting MFS as a FUSE filesystem (a la https://github.com/tableflip/ipfs-fuse) and rsyncing directly onto the FUSE mount may be one approach.

Alternatively, there could be an ipfs rsync command-line tool that talks directly to the rsync server protocol.

File renamed without changes.
226 changes: 226 additions & 0 deletions docs/concepts.md
@@ -0,0 +1,226 @@
# How IPFS Concepts map to package manager concepts

## The address of a package

HTTP is the most popular method for downloading the actual contents of a package: given a URL, the package manager client makes an HTTP request, and the response body is the package contents, which get saved to disk.

The package url usually contains the registry domain, package name and version number:

http://package-manager-registry.com/package-name/1.0.0.tar.gz

When you want to download a package using IPFS, rather than a URL that contains the name and version, you instead provide a cryptographic hash of the contents that you’d like to receive, for example:

/ipfs/QmTeHfjrEfVDUDRootgUF45eZoeVxKCy3mjNLA8q5fnc1B

You may notice that, unlike with the URL, there is no domain name. Because the hash uniquely describes the contents, IPFS doesn’t need to load it from a particular server; it can request the package from anyone else who is sharing it on IPFS, without worrying whether it has been tampered with.
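As a rough illustration, the two styles of download differ mainly in what the client has to know up front. This sketch uses the hypothetical registry URL and the example CID from above, and fetches the IPFS copy through a public HTTP gateway rather than a local node:

```go
package main

import (
	"io"
	"net/http"
	"os"
)

// fetch downloads the body at url into a local file at dest.
func fetch(url, dest string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	f, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// Registry style: the address names a server, a package, and a version;
	// you trust the server to hand back the right bytes.
	_ = fetch("http://package-manager-registry.com/package-name/1.0.0.tar.gz", "pkg-http.tar.gz")

	// IPFS style: the address is the hash of the contents themselves; any
	// gateway or peer can serve it, and the hash lets you verify the result.
	_ = fetch("https://ipfs.io/ipfs/QmTeHfjrEfVDUDRootgUF45eZoeVxKCy3mjNLA8q5fnc1B", "pkg-ipfs.tar.gz")
}
```

The gateway is just a convenience here; a client talking to a local IPFS node gets the same bytes and can verify them against the hash.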

## Indexing packages

To work out the address of a package that you’d like to download, you first need to find out a few details about it. This is done using a package index, which is one of the primary roles of a registry.

Let’s say you already know the name of the package that you’d like to grab, “libfoobar”. To construct an HTTP URL, you’re now only missing one part: the version number. Registries usually provide a way of finding out what the available version numbers for a package are. One way is with a JSON API over HTTP:

http://package-manager-registry.com/libfoobar/versions

That returns a list of available version numbers:

```
[
{
"number": "0.0.1",
...
},
{
"number": "0.2.0",
...
},
{
"number": "1.0.0",
...
},
{
"number": "1.0.1",
...
}
]
```


To enable clients to download packages over IPFS, you can provide the cryptographic hash (CID) of the contents of each version of the package along with its number:

```
[
{
"number": "0.0.1",
"cid": "/ipfs/QmTeHfjrEfVDUDRootgUF45eZoeVxKCy3mjNLA8q5fnc1B"
...
},
{
"number": "0.2.0",
"cid": "/ipfs/QmTeHfj83a09e6b0da3a6e1163ce53bd03eebfc1c507ds"
...
},
{
"number": "1.0.0",
"cid": "/ipfs/QmTeHfjf778979bb559fbd3c384d9692d9260d5123a7b3"
...
},
{
"number": "1.0.1",
"cid": "/ipfs/QmTeHfjrEfVDUDRootgUFc9cbd8e968edfa8f22a33cff7"
...
}
]
```
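As a sketch of the client side (the `Release` struct and its field names are hypothetical, mirroring the JSON above): decode the index, look up the wanted version, and hand its CID to IPFS for the actual download:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Release mirrors one entry of the hypothetical versions.json above.
type Release struct {
	Number string `json:"number"`
	CID    string `json:"cid"`
}

// cidFor scans the decoded index for a version number and returns its
// content address, which can then be fetched from any IPFS peer.
func cidFor(releases []Release, version string) (string, bool) {
	for _, r := range releases {
		if r.Number == version {
			return r.CID, true
		}
	}
	return "", false
}

func main() {
	raw := []byte(`[
		{"number": "1.0.0", "cid": "/ipfs/QmTeHfjf778979bb559fbd3c384d9692d9260d5123a7b3"},
		{"number": "1.0.1", "cid": "/ipfs/QmTeHfjrEfVDUDRootgUFc9cbd8e968edfa8f22a33cff7"}
	]`)
	var releases []Release
	if err := json.Unmarshal(raw, &releases); err != nil {
		panic(err)
	}
	if cid, ok := cidFor(releases, "1.0.1"); ok {
		fmt.Println("fetch with: ipfs get", cid)
	}
}
```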

IPFS doesn’t just have to store the package contents; it can also store the JSON list of available versions.

```
$ ipfs add versions.json
=> QmctG9GhPmwyjazcpseajzvSMsj7hh2RTJAviQpsdDBaxz
```

Doing this introduces some challenges. For one thing, the CID of a package’s list of versions isn’t human-readable, so you need a second way of finding that CID.

One way to solve that would be to create another index, mapping every package name to the CID of the JSON file listing that package’s versions:

```
[
{
"name": "libfoo",
"versions": "/ipfs/QmTeHfjrEfVDUDRootgUF45eZoeVxKCy3mjNLA8q5fnc1B"
...
},
{
"name": "libbar",
"versions": "/ipfs/QmTeHfj83a09e6b0da3a6e1163ce53bd03eebfc1c507ds"
...
},
{
"name": "libbaz",
"versions": "/ipfs/QmTeHfjf778979bb559fbd3c384d9692d9260d5123a7b3"
...
}
]
```

You could create these linked JSON files manually, but IPFS already has a technology for exactly this kind of linked data structure: IPLD.

Another challenge is that every time a new version of “libfoobar” is released, the contents of versions.json change, which produces a different hash when added to IPFS. The index of all the package names can be updated after each release as well, producing a merkle tree of package data.

But of course the index of all package names then has the same problem: it gets updated whenever any package has a new version released.

There are a couple of IPFS technologies to help with this: IPNS and DNSLink.

IPNS allows you to create a mutable link to content on IPFS. You can think of an IPNS name as a pointer to an IPFS hash, which may change in the future to point to a different IPFS hash.

For example, we could use IPNS to point to the hash of the JSON file of released versions of libfoo, taking the CID of versions.json and publishing it to IPNS:

```
$ ipfs name publish QmctG9GhPmwyjazcpseajzvSMsj7hh2RTJAviQpsdDBaxz
=> Published to QmSRzfkzkgofxg2cWKiqhTQRjscS4DC2c8bAD2TbECJCk6: /ipfs/QmctG9GhPmwyjazcpseajzvSMsj7hh2RTJAviQpsdDBaxz
```

Now we can use the IPNS address to load versions.json:

```
$ ipfs cat /ipns/QmSRzfkzkgofxg2cWKiqhTQRjscS4DC2c8bAD2TbECJCk6
=> [
{
"number": "0.0.1",
"cid": "/ipfs/QmTeHfjrEfVDUDRootgUF45eZoeVxKCy3mjNLA8q5fnc1B"
...
},
...
]
```

After a new version of libfoo is published, we can add the tarball to IPFS and edit versions.json to include it:

```
[
{
"number": "0.0.1",
"cid": "/ipfs/QmTeHfjrEfVDUDRootgUF45eZoeVxKCy3mjNLA8q5fnc1B"
...
},
{
"number": "0.2.0",
"cid": "/ipfs/QmTeHfj83a09e6b0da3a6e1163ce53bd03eebfc1c507ds"
...
},
{
"number": "1.0.0",
"cid": "/ipfs/QmTeHfjf778979bb559fbd3c384d9692d9260d5123a7b3"
...
},
{
"number": "1.0.1",
"cid": "/ipfs/QmTeHfjrEfVDUDRootgUFc9cbd8e968edfa8f22a33cff7"
...
},
{
"number": "2.0.0",
"cid": "/ipfs/Qm384d9692d9260d5123a7b3UFc9cbd8e968edfa8f2233"
...
}
]
```

and then add the updated versions.json to IPFS:

```
$ ipfs add versions.json
=> added QmcCmvc9K7fVeY9xQEj3xoa9HWnxs2M5zX97wTAvTQ61a9 versions.json
```

Then we can update that same IPNS address to point to the new copy of versions.json:

```
$ ipfs name publish QmcCmvc9K7fVeY9xQEj3xoa9HWnxs2M5zX97wTAvTQ61a9
=> Published to QmSRzfkzkgofxg2cWKiqhTQRjscS4DC2c8bAD2TbECJCk6: /ipfs/QmcCmvc9K7fVeY9xQEj3xoa9HWnxs2M5zX97wTAvTQ61a9
```

The same IPNS address now points to the new versions.json file:

```
$ ipfs cat /ipns/QmSRzfkzkgofxg2cWKiqhTQRjscS4DC2c8bAD2TbECJCk6
=> [
{
"number": "0.0.1",
"cid": "/ipfs/QmTeHfjrEfVDUDRootgUF45eZoeVxKCy3mjNLA8q5fnc1B"
...
},
...
{
"number": "2.0.0",
"cid": "/ipfs/Qm384d9692d9260d5123a7b3UFc9cbd8e968edfa8f2233"
...
}
]
```

DNSLink gives you similar functionality to IPNS but adds a dependency on DNS. It works by using a domain name instead of a hash, for example:

/ipns/package-manager-registry.com

To configure that domain name to point to a specific IPFS hash, you add a DNS TXT record with content in the following form:

dnslink=/ipfs/<CID for your content here>

When requesting content from a DNSLink, IPFS will look up the TXT record on that domain name and then fetch the content for the CID stored within it.
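For illustration, that lookup can be done with an ordinary DNS query. A minimal sketch, using the hypothetical domain from above (DNSLink records are conventionally published on the `_dnslink.` subdomain):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// resolveDNSLink looks up the TXT records on a domain's _dnslink
// subdomain and returns the first "dnslink=..." path it finds.
func resolveDNSLink(domain string) (string, error) {
	records, err := net.LookupTXT("_dnslink." + domain)
	if err != nil {
		return "", err
	}
	for _, rec := range records {
		if strings.HasPrefix(rec, "dnslink=") {
			return strings.TrimPrefix(rec, "dnslink="), nil
		}
	}
	return "", fmt.Errorf("no dnslink record on %s", domain)
}

func main() {
	path, err := resolveDNSLink("package-manager-registry.com")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("content lives at:", path) // e.g. /ipfs/<CID>
}
```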

DNSLink can be useful for adding human-readable names, as well as adding a layer of social trust with users already familiar with your domain name; and at the time of writing, DNSLink is quite a bit faster than using IPNS.

One downside of DNSLink is that updating a DNS record every time you wish to make a change can be fiddly: not every DNS provider has an API that can be used to automate the change, and in some cases DNS propagation can take hours for changes to be visible worldwide.

IPNS names, on the other hand, have the added benefit that they work purely using IPFS technology and can be resolved without needing any traditional infrastructure, as well as working “offline”.

## Checking for new versions of packages

## Publishing packages

TODO

## Verifying package contents

TODO