More demos #45

Open
simonw opened this issue Dec 8, 2021 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments

@simonw
Owner

simonw commented Dec 8, 2021

Using this thread to collect demo ideas. I haven't decided yet whether to turn these into live demos or just write a piece of documentation suggesting people try them out.

simonw added the documentation label Dec 8, 2021
@simonw
Owner Author

simonw commented Dec 8, 2021

Scraped USGS earthquakes:

git clone https://github.com/simonw/usgs-scraper
cd usgs-scraper
git-history file usgs.db usgs.json --id id --convert '
features = json.loads(content)["features"]
for feature in features:
    item = {"id": feature["id"]}
    item.update(feature["properties"]) 
    item["latitude"] = feature["geometry"]["coordinates"][1]
    item["longitude"] = feature["geometry"]["coordinates"][0]
    yield item
'
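
To try the conversion logic outside git-history, here is a minimal sketch that wraps the same loop in a plain function. The sample feature is made up for illustration (it is not real USGS data) and only assumes the GeoJSON shape the snippet above relies on, with coordinates ordered as longitude, then latitude.

```python
import json


def convert(content):
    """Flatten each GeoJSON feature into one flat dict per earthquake."""
    features = json.loads(content)["features"]
    for feature in features:
        item = {"id": feature["id"]}
        item.update(feature["properties"])
        # GeoJSON positions are ordered [longitude, latitude, ...]
        item["latitude"] = feature["geometry"]["coordinates"][1]
        item["longitude"] = feature["geometry"]["coordinates"][0]
        yield item


# Hypothetical sample, not real USGS data
sample = json.dumps({
    "features": [
        {
            "id": "example1",
            "properties": {"mag": 1.5, "place": "Example place"},
            "geometry": {"type": "Point", "coordinates": [-122.5, 38.25, 7.1]},
        }
    ]
})

rows = list(convert(sample))
print(rows)
```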

@simonw
Owner Author

simonw commented Dec 10, 2021

https://github.com/adolph/getIncidentsGit has Houston incidents. A record looks like this:

{
  "ActiveIncidentDataTable": [
    {
      "Agency": "F",
      "Address": "WESTLAKE PARK BLVD",
      "CrossStreet": "BLK GRISBY RD",
      "KeyMap": "488B",
      "XCoord": "-95630227",
      "YCoord": "29783580",
      "CombinedResponse": "F",
      "CallTimeOpened": "12/06/2021 17:42",
      "IncidentType": "Automatic Alarm",
      "AlarmLevel": "0",
      "NumberOfUnits": "1",
      "Units": "E078;"
    }
  ]
}

So --convert of:

incidents = json.loads(content)["ActiveIncidentDataTable"]
for incident in incidents:
    x = incident.pop("XCoord")
    y = incident.pop("YCoord")
    incident["Latitude"] = float(y) / 1000000 if y else None
    incident["Longitude"] = float(x) / 1000000 if x else None
    call_time = incident.pop("CallTimeOpened")
    date, time = call_time.split(" ")
    mm, dd, yyyy = date.split("/")
    incident["CallTimeOpened"] = "{}-{}-{} {}".format(yyyy, mm, dd, time)
    incident["Units"] = [unit for unit in incident["Units"].split(";") if unit]
    yield incident

So the recipe is:

git-history file incidents.db incidents.json --convert '
incidents = json.loads(content)["ActiveIncidentDataTable"]
for incident in incidents:
    x = incident.pop("XCoord")
    y = incident.pop("YCoord")
    incident["Latitude"] = float(y) / 1000000 if y else None
    incident["Longitude"] = float(x) / 1000000 if x else None
    call_time = incident.pop("CallTimeOpened")
    date, time = call_time.split(" ")
    mm, dd, yyyy = date.split("/")
    incident["CallTimeOpened"] = "{}-{}-{} {}".format(yyyy, mm, dd, time)
    incident["Units"] = [unit for unit in incident["Units"].split(";") if unit]
    yield incident
' --id Address --id CrossStreet --id CallTimeOpened --ignore-duplicate-ids
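
As a quick sanity check, the conversion can be run on its own against the example record shown earlier. This is a sketch rather than part of the recipe: the `convert` wrapper is a hypothetical stand-in for the body passed to `--convert`, the record is trimmed to the fields the code touches, and it uses the dash-separated timestamp from the first snippet.

```python
import json


def convert(content):
    incidents = json.loads(content)["ActiveIncidentDataTable"]
    for incident in incidents:
        x = incident.pop("XCoord")
        y = incident.pop("YCoord")
        # Coordinates arrive as strings holding degrees multiplied by 1,000,000
        incident["Latitude"] = float(y) / 1000000 if y else None
        incident["Longitude"] = float(x) / 1000000 if x else None
        # Rewrite MM/DD/YYYY HH:MM as YYYY-MM-DD HH:MM so it sorts as text
        call_time = incident.pop("CallTimeOpened")
        date, time = call_time.split(" ")
        mm, dd, yyyy = date.split("/")
        incident["CallTimeOpened"] = "{}-{}-{} {}".format(yyyy, mm, dd, time)
        # "E078;" becomes ["E078"]: split on ";" and drop empty strings
        incident["Units"] = [unit for unit in incident["Units"].split(";") if unit]
        yield incident


# Trimmed copy of the record shown above
sample = json.dumps({
    "ActiveIncidentDataTable": [
        {
            "Address": "WESTLAKE PARK BLVD",
            "CrossStreet": "BLK GRISBY RD",
            "XCoord": "-95630227",
            "YCoord": "29783580",
            "CallTimeOpened": "12/06/2021 17:42",
            "Units": "E078;",
        }
    ]
})

row = next(convert(sample))
print(row["Latitude"], row["Longitude"], row["CallTimeOpened"], row["Units"])
```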

@simonw
Owner Author

simonw commented Dec 10, 2021

FARA from https://github.com/simonw/fara-history

git-history file fara.db \
  ../fara-history/FARA_All_Registrants.csv \
  --repo ../fara-history --id "Registration_Number" \
  --branch master --convert '
decoded = content.decode("utf-8")
reader = csv.DictReader(io.StringIO(decoded), dialect="excel")
for row in reader:
  yield dict((key.replace(" ", "_"), value) for key, value in row.items())
' --import io --import csv --ignore-duplicate-ids
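
The key renaming is what lets `--id "Registration_Number"` match, presumably because the CSV header uses spaces. A minimal sketch of that step on its own, using a made-up two-column CSV (the column names and values here are assumptions, not the real FARA file):

```python
import csv
import io


def convert(content):
    # The recipe above decodes content, so it is treated as bytes here
    decoded = content.decode("utf-8")
    reader = csv.DictReader(io.StringIO(decoded), dialect="excel")
    for row in reader:
        # Replace spaces in column names with underscores
        yield {key.replace(" ", "_"): value for key, value in row.items()}


# Hypothetical sample, not real FARA data
sample = b"Registration Number,Name\n1234,Example Registrant\n"

rows = list(convert(sample))
print(rows)
```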

@simonw
Owner Author

simonw commented Dec 10, 2021

https://github.com/mmarvick/scrape-seattle-city-light-outages

git-history file seattle-city-light.db outages.json \
  --id id --branch master

@simonw
Owner Author

simonw commented Dec 10, 2021

https://github.com/simonw/package-stats

git-history file stats.db stats.json --convert '
data = json.loads(content)
for key, counts in data.items():
    for date, count in counts.items():
        yield {
            "package": key,
            "date": date,
            "count": count
        }
' --id package --id date
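
This flattens a nested mapping of package to date to count into one row per (package, date) pair, which is why both columns are passed as `--id`. A small sketch with made-up package names and counts shows the shape of the output and checks that the pair is unique:

```python
import json


def convert(content):
    data = json.loads(content)
    for key, counts in data.items():
        for date, count in counts.items():
            yield {"package": key, "date": date, "count": count}


# Hypothetical sample; package names and counts are made up
sample = json.dumps({
    "example-package": {"2021-12-01": 10, "2021-12-02": 12},
    "other-package": {"2021-12-01": 3},
})

rows = list(convert(sample))
# Neither column is unique alone, but the (package, date) pair is
ids = [(row["package"], row["date"]) for row in rows]
print(rows)
print(len(ids) == len(set(ids)))
```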

And for releases in that same repo:

git-history file stats.db datasette-app-releases.json \
  --namespace releases \
  --convert '
    data = json.loads(content)
    if isinstance(data, dict) and data.get("message"):
        # Probably a rate-limiting error
        return
    for row in data:
        version = row["tag_name"]
        for asset in row["assets"]:
            yield {
                "version": version,
                "asset": asset["name"],
                "download_count": asset["download_count"]
            }
    ' --id version --id asset
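
The guard at the top is worth testing separately: when the file holds an error object instead of a list of releases, a bare `return` ends the generator, so that version of the file yields no rows. A sketch with two made-up payloads (the error text, tag name and asset name are assumptions for illustration):

```python
import json


def convert(content):
    data = json.loads(content)
    if isinstance(data, dict) and data.get("message"):
        # Probably a rate-limiting error: yield nothing for this version
        return
    for row in data:
        version = row["tag_name"]
        for asset in row["assets"]:
            yield {
                "version": version,
                "asset": asset["name"],
                "download_count": asset["download_count"],
            }


# Hypothetical payloads, not real API responses
error_payload = json.dumps({"message": "API rate limit exceeded"})
release_payload = json.dumps([
    {
        "tag_name": "0.1.0",
        "assets": [{"name": "Example.app.zip", "download_count": 5}],
    }
])

error_rows = list(convert(error_payload))
release_rows = list(convert(release_payload))
print(error_rows)
print(release_rows)
```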

@simonw
Owner Author

simonw commented Dec 20, 2021

https://github.com/abkfenris/sugarloaf looks like a fun demo for CSV data.
