
2021 edgesys

Animesh Trivedi edited this page Apr 26, 2021 · 1 revision

Keynote notes from Animesh

edge storage keynote EdgeSys 2021

  • Move "compute" and "storage" close to edge
  • Remove centralized control
  • cloud = country/continent
  • core = state/province/city
  • edge = neighbourhood/building/city
  • shared infrastructure

Application = Drones : object detection and avoidance

"stateless" sutff is ok - container orchestration etc.

"database" at the cloud - then core/edge needs access to the remote cloud database

local execution + remote data execution >>> remote execution

Goal: data locality, make execution local!

Mechanism: replicate to all the nodes, from the cloud through core to edge, to make sure there is a local copy to run against. Challenges:

  • consistency with a very high replication factor
  • replication must be partial and dynamic ==> the replication factor is massive, 5-10x the usual; DC storage keeps 2-5 copies, while at the edge there might be ~10-100s of copies
  • consistency over slow WAN links ==> poor networking, not enough resources, heterogeneity
  • intermittent connectivity

Full replication might not be possible at the edge due to resource constraints. DCs only do full replication, not partial replication. What does partial replication mean here?

Me: Dynamic deployment => needs a high-throughput RM to manage shared resources under heavy context switching.

Local and Global queries (defining the storage access)

=========================== PathStore == local query execution

Multiple independent rings; the top-level ring in the cloud holds the final, all-inclusive persistent copy.

ME: weak-consistency database = is this good enough for the edge? Eventual consistency => reason: high latency (is it true?)

  • Pull-based model driven by the children. Parents do not keep track of data copies.
  • Dynamic replication: nodes can be added dynamically

PathStore - replication. The DB starts empty. How does it fetch data? Once the first query comes in, it pulls data on demand from the parent rings. CQL-based data touching and pulling.

Then finally the children push updates up in the background later, once (if) there is connectivity.
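A minimal sketch of the pull-on-demand / background-push pattern described above. All class and method names here are hypothetical; PathStore's actual implementation sits on Cassandra/CQL, not a Python dict.

```python
# Hypothetical sketch of PathStore-style pull-on-demand replication.
# A child ring starts empty; the first query for a key pulls the row
# from the parent ring, and local writes are pushed up asynchronously.

class Ring:
    def __init__(self, parent=None):
        self.parent = parent          # None for the root (cloud) ring
        self.store = {}               # local partial replica
        self.dirty = []               # writes pending background push

    def read(self, key):
        if key not in self.store and self.parent is not None:
            # First touch: pull the data down from the parent on demand.
            value = self.parent.read(key)
            if value is not None:
                self.store[key] = value
        return self.store.get(key)

    def write(self, key, value):
        self.store[key] = value
        self.dirty.append((key, value))   # pushed later, in the background

    def push_updates(self):
        # Background task: runs when connectivity to the parent is available.
        if self.parent is None:
            return
        for key, value in self.dirty:
            self.parent.write(key, value)
        self.dirty.clear()

cloud = Ring()
cloud.store["sensor:42"] = "calibrated"
edge = Ring(parent=cloud)

print(edge.read("sensor:42"))   # pulled on demand from the cloud ring
edge.write("sensor:42", "drifting")
edge.push_updates()             # the update reaches the cloud eventually
print(cloud.read("sensor:42"))
```

The edge ring stays a partial replica: only keys that were actually queried ever get copied down, which matches the "DB will be empty until the first query" behavior in the notes.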

====================

ME: In his example of PathStore, the compute queries are coming from the edge to the cloud. ME: Can we do horizontal querying? Or share data appropriately? Formula = {compute_fast, network_weak} and then "dynamically" find where the data copies should be.

ME: Also, so many copies are wasting so much space! Me: They use timestamps to detect conflicts.

====================

They have PathStore, SessionStore == no other system out there they can compare against.

Based on how you maintain the quorum, either reads or writes can be made very fast.
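The read/write trade-off in that remark is the classic quorum-overlap condition: with N replicas, a read quorum R and a write quorum W are guaranteed to intersect (and so see the latest write) whenever R + W > N. A small sketch of the arithmetic:

```python
# Quorum arithmetic behind the "reads or writes can be fast" trade-off.
# With N replicas, read quorum R and write quorum W overlap iff R + W > N.

def quorums_overlap(n, r, w):
    return r + w > n

N = 5
# Fast reads: read from 1 replica, but every write must reach all 5.
assert quorums_overlap(N, r=1, w=5)
# Fast writes: write to 1 replica, but every read must consult all 5.
assert quorums_overlap(N, r=5, w=1)
# Balanced majority quorums.
assert quorums_overlap(N, r=3, w=3)
# R=2, W=2 does NOT overlap with N=5: stale reads become possible.
assert not quorums_overlap(N, r=2, w=2)
```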

========= GLOBAL queries ==================

Now we want to query all/many nodes. The examples so far were for a single node doing pulling/pushing.

ME: I am wondering what the right compute API is here? A giant event-driven system that executes wherever the data is?

For global queries = touch multiple nodes. Flood the system. High overheads ==> Me: how to eliminate duplicate data execution?

API: What is your freshness requirement? Laxity. Based on that, process all data that is, say, up to 10 seconds old. Guaranteed to meet that requirement or better. The system gives you a result with that freshness.

They modified CQL to include WHERE now() - timestamp < 60 seconds, i.e. LAXITY=60s. (These are very small, dumb executions; we should do better!)
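The laxity predicate above can be read as a server-side freshness filter. A sketch of that filter in Python, with row layout and function names made up for illustration (this is not Feather's actual API):

```python
import time

# Sketch of the laxity-based freshness filter behind the modified CQL query
# (WHERE now() - timestamp < 60, LAXITY=60s). Rows newer than `laxity_s`
# seconds satisfy the freshness requirement; older rows would need a
# fresh pull from the source node. Names here are illustrative only.

def fresh_enough(rows, laxity_s, now=None):
    now = time.time() if now is None else now
    return [r for r in rows if now - r["timestamp"] < laxity_s]

now = 1_000_000.0
rows = [
    {"id": 1, "timestamp": now - 10},   # 10 s old
    {"id": 2, "timestamp": now - 45},   # 45 s old
    {"id": 3, "timestamp": now - 90},   # 90 s old: too stale for LAXITY=60s
]
print([r["id"] for r in fresh_enough(rows, laxity_s=60, now=now)])  # [1, 2]
```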

ME: How efficiently can we grep the freshness of data?

Feather system - filtering, aggregation, grouping, ordering, and limiting the result set. (Some similarity with Arrow?)

Gives you a measure of completeness in the presence of failures. For example, a full query could only have been executed 10 mins ago; right now we can only execute on 80% of the network.
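A minimal sketch of what such a completeness measure could look like: the query answer is annotated with the fraction of nodes that actually contributed. Function and variable names are hypothetical.

```python
# Sketch of the "completeness" measure: when some edge nodes are
# unreachable, a global query reports the fraction of the network
# it actually covered, alongside the partial answer.

def query_with_completeness(nodes, reachable):
    answered = [n for n in nodes if n in reachable]
    completeness = len(answered) / len(nodes)
    return answered, completeness

nodes = [f"edge-{i}" for i in range(10)]
reachable = set(nodes[:8])            # 2 of 10 nodes are down
answered, completeness = query_with_completeness(nodes, reachable)
print(completeness)                   # 0.8 -> "80% of the network"
```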

ME: they do not do much for mobility and tag.

Data pushed up every 60 seconds (!), that long!

Slide 44: latency profile (y-axis) vs. allowed staleness (x-axis).

== future ==

  • geo-consistency API: an area of interest in which the query should be executed.
  • What should be the storage API for the edge - a file system?
  • prefetching and mobility policies
  • no disjoint keys

====

Query origination direction: from cloud to edge, or sensors to cloud? Hierarchy: physical world. P2P is not going to fly -- how would it work at the edge?

Cassandra - sweet spot. No one knows what the killer app is. KV <-> DB. Middle ground. Was the CQL API a good middle point? Is a DB the right way to do it? They have a file system interface for programming?

Q: no killer app - no single API; Cassandra was a middle compromise. They are building a file system as well.
Q: no single application/service - wasting resources; depends on the tradeoff: resource isolation (more flexible) <-> performance (less isolation)

=== There is a GC policy. Only the root cloud is the final resting place. Other rings evict, LRU.
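A sketch of that eviction policy: intermediate rings cap their partial replica and evict least-recently-used entries, while the root ring never ejects anything. Capacity and names are made up for illustration.

```python
from collections import OrderedDict

# Sketch of the GC policy: only the root (cloud) ring is the final
# resting place; non-root rings evict their partial replica with LRU
# once a capacity limit is hit.

class LRURing:
    def __init__(self, capacity, is_root=False):
        self.capacity = capacity
        self.is_root = is_root
        self.store = OrderedDict()    # insertion order tracks recency

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        # The root never ejects; other rings evict the LRU entry.
        while not self.is_root and len(self.store) > self.capacity:
            self.store.popitem(last=False)

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)   # refresh recency on access
        return self.store.get(key)

edge = LRURing(capacity=2)
edge.put("a", 1)
edge.put("b", 2)
edge.get("a")               # "a" is now the most recently used
edge.put("c", 3)            # capacity exceeded: "b" (LRU) is ejected
print(sorted(edge.store))   # ['a', 'c']
```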

== How low can we go? Can we do 100 ms, 10 ms, 1 ms replication traffic? They used 60 sec as the replication cycle.
