Reorganize README.md
ghostwords committed Oct 12, 2023
1 parent f5fc8c4 commit d60a045
Showing 2 changed files with 137 additions and 125 deletions.
127 changes: 127 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,127 @@
# Architecture

Once a run is confirmed, scans get initialized in parallel. Each scan instance receives its portion of the site list.

```mermaid
stateDiagram-v2
[*] --> ConfirmRun
state fork1 <<fork>>
ConfirmRun --> fork1
fork1 --> BadgerInit1
fork1 --> BadgerInit2
fork1 --> BadgerInitN
state InitScans {
cr1: CreateDroplet
cr2: CreateDroplet
cr3: CreateDroplet
dep1: InstallDependencies
dep2: InstallDependencies
dep3: InstallDependencies
sta1: StartScan
sta2: StartScan
sta3: StartScan
state BadgerInit1 {
[*] --> cr1
cr1 --> dep1
dep1 --> UploadSiteList1
UploadSiteList1 --> sta1
sta1 --> [*]
}
--
state BadgerInit2 {
[*] --> cr2
cr2 --> dep2
dep2 --> UploadSiteList2
UploadSiteList2 --> sta2
sta2 --> [*]
}
--
state BadgerInitN {
[*] --> cr3
cr3 --> dep3
dep3 --> UploadSiteListN
UploadSiteListN --> sta3
sta3 --> [*]
}
}
state join1 <<join>>
BadgerInit1 --> join1
BadgerInit2 --> join1
BadgerInitN --> join1
join1 --> [*]
```
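The initialization flow above can be sketched in Bash. This is a minimal illustration under assumed names, not Badger Swarm's actual code: `sitelist.txt`, `NUM_SCANS`, and `init_scan` are invented for this sketch, and the droplet steps are stubbed out as comments.

```shell
#!/usr/bin/env bash
set -euo pipefail

NUM_SCANS=3

# Sample site list (in a real run this comes from Badger Sett).
printf '%s\n' example.com example.net example.org \
  example.edu a.example b.example > sitelist.txt

# Split the site list into NUM_SCANS roughly equal chunks:
# sites.00, sites.01, ... (GNU split)
split --number="l/${NUM_SCANS}" --numeric-suffixes sitelist.txt sites.

init_scan() {
  local chunk="$1"
  # CreateDroplet, InstallDependencies, UploadSiteList and
  # StartScan for this chunk would happen here.
  echo "initialized scan for ${chunk}"
}

# Fork: initialize every scan in parallel; join: wait for all of them.
for chunk in sites.*; do
  init_scan "$chunk" &
done
wait
```

Backgrounding each `init_scan` with `&` and then calling `wait` mirrors the fork/join pair in the diagram.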

The run is now resumable. Scans are checked for progress and status (errored/stalled/complete) in parallel.

- If a scan fails, its instance is deleted and the scan gets reinitialized.
- When a scan fails to make progress for long enough, it is considered stalled. Stalled scans get restarted, which usually means they continue after skipping the site they got stuck on.
- When a scan finishes, the results are extracted and the instance is deleted.

This continues until all scans finish.

```mermaid
stateDiagram-v2
[*] --> PollForStatus
state fork2 <<fork>>
PollForStatus --> fork2
fork2 --> CheckBadgerScan1
fork2 --> CheckBadgerScan2
fork2 --> CheckBadgerScanN
state ManageInProgressScans {
err1: CheckForFailure
err2: CheckForFailure
err3: CheckForFailure
pro1: ExtractProgress
pro2: ExtractProgress
pro3: ExtractProgress
sta1: CheckForStall
sta2: CheckForStall
sta3: CheckForStall
state CheckBadgerScan1 {
[*] --> err1
err1 --> pro1
pro1 --> sta1
sta1 --> [*]
}
--
state CheckBadgerScan2 {
[*] --> err2
err2 --> pro2
pro2 --> sta2
sta2 --> [*]
}
--
state CheckBadgerScanN {
[*] --> err3
err3 --> pro3
pro3 --> sta3
sta3 --> [*]
}
}
state join2 <<join>>
CheckBadgerScan1 --> join2
CheckBadgerScan2 --> join2
CheckBadgerScanN --> join2
state check1 <<choice>>
join2 --> check1
check1 --> PrintProgress : One or more scans still running
check1 --> MergeResults : All scans finished
PrintProgress --> PollForStatus
MergeResults --> [*]
```
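The polling loop above can be sketched as follows. This is a hedged illustration only: the per-scan state files and the `check_scan`/`all_done` helpers are invented for this sketch; the real script inspects live scan instances instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# One state file per scan: running, failed, stalled, or complete.
# Seeded here for illustration; real scans report status from their instances.
echo complete > scan1.state
echo running  > scan2.state
echo complete > scan3.state
scans=(scan1.state scan2.state scan3.state)

check_scan() {
  local scan="$1"
  case "$(cat "$scan")" in
    failed)   echo "reinitializing ${scan}" ;;  # delete the instance, start over
    stalled)  echo "restarting ${scan}" ;;      # skip the stuck site, keep going
    complete) : ;;  # results get extracted, instance gets deleted
    *)        : ;;  # still running; nothing to do
  esac
}

all_done() {
  local scan
  for scan in "$@"; do
    [ "$(cat "$scan")" = complete ] || return 1
  done
}

until all_done "${scans[@]}"; do
  for scan in "${scans[@]}"; do
    check_scan "$scan" &   # fork: check each scan in parallel
  done
  wait                     # join
  # PrintProgress would go here; simulate scan2 finishing so the
  # sketch terminates.
  echo complete > scan2.state
done
echo "all scans finished, merging results"
```

The `until all_done` loop plays the role of PollForStatus, and the choice node corresponds to `all_done` either failing (print progress, poll again) or succeeding (merge results).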

On completion, scan results are merged by Privacy Badger as if each result had been manually imported on the Manage Data tab of Privacy Badger's options page.
135 changes: 10 additions & 125 deletions README.md
@@ -4,136 +4,14 @@ Runs distributed [Badger Sett](https://github.com/EFForg/badger-sett) scans on D

![Badger Swarm demo recording](badger-swarm-screencast.gif)


## Architecture

Badger Swarm converts a Badger Sett scan of X sites into N Badger Sett scans of X/N sites. For example, a 10,000-site scan split across 20 instances becomes 20 parallel scans of 500 sites each. This makes medium scans complete as quickly as small scans, and large scans complete in a reasonable amount of time.

For more information, visit our [Introducing Badger Swarm: New Project Helps Privacy Badger Block Ever More Trackers](https://www.eff.org/deeplinks/2023/10/privacy-badger-learns-block-ever-more-trackers) blog post.

## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md).


## Setup
@@ -147,3 +25,10 @@ 7. Run `./main.sh` to initiate a new run.
7. Run `./main.sh` to initiate a new run.

Once you are told the run is resumable, you can stop the script with <kbd>Ctrl</kbd>-<kbd>C</kbd> and then later resume the in-progress run with `./main.sh -r`.


## Helpful Bash scripting links
- https://google.github.io/styleguide/shellguide.html
- https://mywiki.wooledge.org/
- https://tldp.org/LDP/abs/html/index.html
- https://www.shellcheck.net/
