diff --git a/404.html b/404.html index 9455f59a9..2ddd4914f 100644 --- a/404.html +++ b/404.html @@ -20,11 +20,11 @@ - +
- + diff --git a/adoption/index.html b/adoption/index.html index c8fe16060..1837bcfff 100644 --- a/adoption/index.html +++ b/adoption/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

# Frictionless Adoption

Projects and collaborations that use Frictionless.

The Frictionless Data project provides software and standards to work with data. On this page we share projects and collaborations that use Frictionless, including collaborations with the Frictionless Team and also community projects that use our toolkit.

TIP

If you use Frictionless in your work and want to share it with the community, please write to the Frictionless Team using any of the contacts provided on this site and we will add your project to this page.

# Pilot Collaborations

We work closely with data researchers and institutions to help them integrate Frictionless into their workflow. Click on individual Pilots to learn more.

BCO-DMO

A Pilot with the Biological and Chemical Oceanography Data Management Office (BCO-DMO).

PUDL

A pilot with the Public Utility Data Liberation project (PUDL), which aims to make US energy data easier to use.

Dryad

A pilot to add Frictionless Data Validation within Dryad, a curated resource that makes research data discoverable, freely reusable, and citable.

Data Readiness Group

A pilot with Dr. Philippe Rocca-Serra at Oxford's Data Readiness Group to remove the friction in reported scientific experimental results by applying the Data Package specifications.

Data Management for TEDDINET

A pilot to use Frictionless Data approaches to address data legacy issues facing the TEDDINET project, a research network addressing the challenges of transforming energy demand in our buildings.

Western Pennsylvania Regional Data Center

A pilot with the Western Pennsylvania Regional Data Center, a part of The University of Pittsburgh Center for Urban and Social Research, to showcase an implementation that expounds on the quality and description of datasets in CKAN-based open data portals.

UK Data Service

A pilot to use Frictionless Data software to assess and report on data quality and make a case for generating visualizations with ensuing data and metadata with UK data.

eLife

A pilot to explore the use of the goodtables library to validate all scientific research datasets hosted by eLife and make a case for open data reuse in the field of Life and BioMedical sciences.

University of Cambridge - Retinal Mosaics

A pilot to trial Frictionless software for packaging and reading data to support computational techniques to investigate development of the nervous system.

Pacific Northwest National Laboratory - Active Data Biology

A pilot to explore the use of Frictionless Data's specifications and software to generate schemas for tabular data and validate metadata stored as part of a biological application on GitHub.

Causa Natura - Pescando Datos

A pilot to explore the use of data validation software in the Causa Natura project to improve quality of data to support fisher communities and advocacy groups.

# Tool Fund Grantee Projects

As part of the Reproducible Research project, we awarded several projects with small grants to build new tooling for open research based on the Frictionless codebase. Click on individual Tool Fund profiles to learn more.

Schema Collaboration

Data managers and researchers collaborate to write packages and tabular schemas (by Carles Pina Estany).

Frictionless Data Package for InterMine

Add data package support to InterMine, an open-source biological data warehouse (by Nikhil Vats).

Frictionless Data for Wheat

Added Frictionless support to the Designing Future Wheat project data portal which houses large-scale wheat datasets (by Simon Tyrrell and Xingdong Bian).

Metrics in Context

Developing an open standard to describe metadata of scholarly metrics by using Frictionless specifications (by Asura Enkhbayar).

Analysis of spontaneous activity patterns in developing neural circuits using Frictionless Data tools

Evaluate the use of Frictionless Data as a common format for the analysis of neuronal spontaneous activity recordings in comparison to HDF5 (by Stephen Eglen and Alexander Shtyrov).

Neuroscience Experiments System Tool Fund

Adapt the existing export component of RIDC NeuroMat's Neuroscience Experiments System to conform to the Frictionless Data specifications (by João Alexandre Peschanski, Cassiano dos Santos and Carlos Eduardo Ribas).

Frictionless DarwinCore

A tool to convert DarwinCore Archives into Frictionless Data Packages (by André Heughebaert).

Frictionless Google Sheets Tool (WIP)

Prototype a Data Package import/export add-on to Google Sheets (by Stephan Max).

Frictionless Open Referral

Implement datapackage bundling of Open Referral CSV files, which contain human health and social services data (by Shelby Switzer and Greg Bloom).

Software Libraries Grantees

In 2017, six grantees were awarded funds to translate the Frictionless Python libraries into other programming languages. The awardees and languages were: Matt Thompson - Clojure; Ori Hoch - PHP; Daniel Fireman - Go; Georges Labrèche - Java; Oleg Lavrovsky - Julia; and Open Knowledge Greece - R. You can read more about each of them on the people page.

# Community Projects

The Frictionless Data project develops open source standards and software that can be re-used by anyone. Here is a list of projects that our community has created on top of Frictionless. If you would like your project to be featured here, let us know!

Libraries Hacked

Libraries Hacked is a project started in 2014 to promote the use of open data in libraries.

Open Data Blend

Open Data Blend is a set of open data services that aim to make large and complex UK open data easier to analyse.

Data Curator

Data Curator is a simple desktop data editor to help describe, validate and share usable open data.

HuBMAP

HuBMAP is creating an open, global atlas of the human body at the cellular level.

Etalab

Etalab, a department of the French interministerial digital service, launched schema.data.gouv.fr.

Nimble Learn - datapackage-m

A set of functions written in Power Query M for working with Tabular Data Packages in Power BI Desktop and Power Query for Excel.

Nimble Learn - Datapackage-connector

Power BI Custom Connector that loads one or more tables from Tabular Data Packages into Power BI.

Zegami

Zegami is using Frictionless Data specifications for data management and syntactic analysis on their visual data analysis platform.

Center for Data Science and Public Policy, Workforce Data Initiative

Supporting state and local workforce boards in managing and publishing data.

Cell Migration Standardization Organization

Using Frictionless Data specs to package cell migration data and load it into Pandas for data analysis and creation of visualizations.

Collections as Data Facets - Carnegie Museum of Art Collection Data

Use of Frictionless Data specifications in the release of Carnegie Museum of Art’s Collection Data for public access & creative use.

OpenML

OpenML is an online platform and service for machine learning, whose goal is to make ML and data analysis simple.

The Data Retriever

Data Retriever uses Frictionless Data specifications to generate and package metadata for publicly available data.

Tesera

Tesera uses Frictionless Data specifications to package data in readiness for use in different systems and components.

data.world

data.world uses Frictionless Data specifications to generate schema and metadata related to an uploaded dataset and containerize all three in a Tabular Data Package.

John Snow Labs

John Snow Labs uses Frictionless Data specifications to make data available to users for analysis.

Open Power System Data

Open Power System Data uses Frictionless Data specifications to make energy data available for analysis and modeling.

Dataship

Dataship used Frictionless Data specifications as the basis for its data analysis notebooks, which are easy to execute, edit and share.

European Commission

The European Commission launched a CSV schema validator using the tabular data package specification, as part of the ISA² Interoperability Testbed.

Validata

OpenDataFrance created Validata, a platform for local public administration in France to validate CSV files on the web, using the tabular data package specification.

# Find Frictionless Datasets

Where can I find Frictionless Datasets?

# Grant-funded work

# Frictionless Data for Reproducible Research

From September 2018 until December 2021, the Frictionless Data team focused on enhanced dissemination and training activities, and further iterations on our software and specifications via a range of collaborations with research partners. We aimed to use Frictionless tooling to resolve research data workflow issues, create a new wave of open science advocates, and teach about FAIR data management. This pivotal work was funded by the Alfred P. Sloan Foundation and overseen by the Frictionless team at the Open Knowledge Foundation. You can read more details about this grant here.

# Pilot Collaborations

Pilots are intensive, hands-on collaborations with research teams to resolve their research data management workflow issues with Frictionless Data software and specs. You can read about the Pilot projects on our blog.

# Tool Fund

The Tool Fund is a $5000 grant to develop an open tool for reproducible science or research built using the Frictionless Data codebase. Learn more by reading Tool Fund Blogs or by visiting the Tool Fund site.

# Fellows Programme

The Fellows Programme trains early career researchers to become champions of the Frictionless Data tools and approaches in their field. Read more about the Programme, including Fellows biographies and the programme syllabus, on the Fellows website.

# Data Institutions - Website Update

In 2021, we partnered with the Open Data Institute (ODI) to improve our existing documentation and add new features on Frictionless Data to create a better user experience for all. Drawing on a series of feedback sessions with our community members, we created our new documentation portal for the Frictionless Framework and several new tutorials. Read more about this grant here.

# Frictionless Field Guide

In 2017, OKF received funding from the Open Data Institute to create a Frictionless Data Field Guide. This guide provided step-by-step instructions for improving data publishing workflows. The field guide introduced new ways of working informed by the Frictionless Data suite of software that data publishers can use independently, or adapt into existing personal and organisational workflows. You can read more details about this work here.

# Data Package Integrations

In 2016, Google funded OKF to work on tool integration for Data Packages as part of our broader work on Frictionless Data to support the open data community. You can read more about this work here.

# Data Packages Development

In 2016, OKF received funding from The Alfred P. Sloan Foundation to work on a broad range of activities to enable better research and more effective civic tech through Frictionless Data. The funding targeted standards work, tooling, and infrastructure around “data packages” as well as piloting and outreach activities to support researchers and civic technologists in addressing real problems encountered when working with data. You can read more about this work here.


- + diff --git a/assets/js/10.0e8811e8.js b/assets/js/10.f7c1e786.js similarity index 99% rename from assets/js/10.0e8811e8.js rename to assets/js/10.f7c1e786.js index d01e634e0..658662a03 100644 --- a/assets/js/10.0e8811e8.js +++ b/assets/js/10.f7c1e786.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[10],{419:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos1.92f0c35b.png"},420:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos2.d04a929b.png"},421:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos3.41defd34.png"},422:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos4.8d95dae7.png"},423:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos5.aa82e072.png"},424:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos6.9e89ebcb.png"},572:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("h2",{attrs:{id:"context"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[e._v("#")]),e._v(" Context")]),e._v(" "),o("p",[e._v("Causa Natura is a non-profit organization based in Mexico. It supports public policies to allow management of natural resources respecting human rights, equity, efficiency and sustainability. This project, “Pescando Datos” seeks to advocate for improved public policies for more than just subsidies allocation, through the collection of, analysis, and visualization of data around subsidies available to fishing communities in Mexico.")]),e._v(" "),o("p",[e._v("After an extended period of analysis a web platform is being built in order to explore data and visualize it with launch due for later in 2017. 
Following a meeting at csv,conf after a presentation by Adrià Mercader on "),o("a",{attrs:{href:"https://www.youtube.com/watch?v=Gk2F4hncAgY&index=35&list=PLg5zZXwt2ZW5UIz13oI56vfZjF6mvpIXN",target:"_blank",rel:"noopener noreferrer"}},[e._v("‘Continuous Data Validation for Everybody’"),o("OutboundLink")],1),e._v(" we have piloted with Causa Natura to explore how our goodtables service can support the project. We spoke to Eduardo Rolón, Executive Director of Causa Natura and Gabriela Rodriguez who is working on the platform.")]),e._v(" "),o("h3",{attrs:{id:"problem-we-were-trying-to-solve"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#problem-we-were-trying-to-solve"}},[e._v("#")]),e._v(" Problem We Were Trying To Solve")]),e._v(" "),o("p",[e._v("Causa Natura are making a lot of freedom of Information requests in Mexico on information to do with fishers in order to understand how policies are impacting people. The data is needed to support a range of stakeholders from the many co-op fisher communities to advocacy organisations.")]),e._v(" "),o("blockquote",[o("p",[e._v("Eduardo Rolón: Advocacy organizations, either from CSOs or from the fisheries sector may be more interested in data that evaluates and supports policy recommendations. Fisher communities have more immediate needs, such as how to obtain better governmental services and support.")])]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: The data is important to us because Campaigns and decisions will be made based on the analysis on the data Causa Natura collected. To be able to do the required analysis we need good data.")])]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: Currently, there is a tedious process of cleaning to give us data that can be worked on. Much of the data Causa Natura was using came as PDFs and needed to be processed. We process a lot of PDFs and Excel files and there are a lot of problems getting the OCR to capture the information correctly to csv. 
For example, names are not consistent and this causes us a lot of problems.")])]),e._v(" "),o("h2",{attrs:{id:"the-work"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[e._v("#")]),e._v(" The Work")]),e._v(" "),o("h3",{attrs:{id:"software"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#software"}},[e._v("#")]),e._v(" Software")]),e._v(" "),o("p",[e._v("goodtables was an existing Python library and web application developed by Open Knowledge International to support the validation of tabular datasets both in terms of structure and also with respect to a published schema as described above. We introduced goodtables in a "),o("a",{attrs:{href:"http://okfnlabs.org/blog/2015/02/20/introducing-goodtables.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("blog post"),o("OutboundLink")],1),e._v(" earlier this year.")]),e._v(" "),o("p",[e._v("On top of that, Open Knowledge International has developed "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(", a web service for a continuous data validation that connects to different data sources to generate structure and content reports.")]),e._v(" "),o("h3",{attrs:{id:"what-did-we-do"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-did-we-do"}},[e._v("#")]),e._v(" What Did We Do")]),e._v(" "),o("p",[e._v("Let’s see how "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" has helped to identify source and structural errors in the Causa Natura pilot dataset:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(419),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("After we’ve signed in, we synchronize our GitHub repositories and activate the repository we want to validate ("),o("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-causanatura",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("https://github.com/frictionlessdata/pilot-causanatura"),o("OutboundLink")],1),e._v("):")]),e._v(" "),o("p",[o("img",{attrs:{src:a(420),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("Once the repository is activated, every time there is an update on the data hosted on GitHub, the service will generate a validation report. This is how one of these reports looks like:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(421),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("Here, we see that there are 59 valid tables, but the report has identified source and structural errors in 41 of the other tables hosted on the repository, including:")]),e._v(" "),o("ul",[o("li",[e._v("duplicate rows")]),e._v(" "),o("li",[e._v("duplicate headers")]),e._v(" "),o("li",[e._v("blank rows")]),e._v(" "),o("li",[e._v("missing values")])]),e._v(" "),o("p",[e._v("The full list of checks exercised by "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" can be found in the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/data-quality-spec/blob/master/spec.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Quality Spec"),o("OutboundLink")],1),e._v(". And the full report can be found "),o("a",{attrs:{href:"http://goodtables.io/github/frictionlessdata/pilot-causanatura/jobs/7",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("After identifying errors we went back do a manual cleanup of the data. 
As we mentioned, there is no need to run "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" validation manually - it happens on any GitHub push for all activated repositories:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(422),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("If we need to customize a validation process we can put a goodtables.yml configuration file on the repository root, allowing us to tweak settings like the actual checks to perform, limit of rows to check, etc:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(423),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("And instant feedback is available via GitHub commit statuses and a "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" badge that can be included in the README file:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(424),alt:"ADBio"}})]),e._v(" "),o("h2",{attrs:{id:"review"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#review"}},[e._v("#")]),e._v(" Review")]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: Right now I have not been using it extensively yet but I have a lot of faith that it could get incorporated in the process of importing data into the Github repository. It should be easy to introduce into our workflow. I really like the process of hooks after git-push as I’m trying to get the organization to use Github for new data. I really like the validation part and that a report is generated each time data is pushed. This is very important and very useful. This makes it easier for the people who are doing the cleaning of data who may not have experience with GitHub.")])]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: The web interface needs a lot of usability work. But the idea is awesome. 
There are problems and it is kind of hard to use at the moment as it takes a long time to sync repositories and the process is not clear, but i think it has a huge potential to make a difference to the work we are doing, mostly if people use Github to store data then it could make a difference.")])]),e._v(" "),o("h2",{attrs:{id:"next-steps"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#next-steps"}},[e._v("#")]),e._v(" Next Steps")]),e._v(" "),o("h3",{attrs:{id:"areas-for-further-work"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#areas-for-further-work"}},[e._v("#")]),e._v(" Areas for further work")]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: With continuous integration it would be very helpful to be notified with messages about the problems in the data. Perhaps emails notifications would be a good way to go, or integrations with other programs - Slack for example - would be fantastic.")])]),e._v(" "),o("p",[e._v("One thing to note is that all the errors shown following the analysis refer to the structure of the data files (missing headers, duplicate rows, etc). Including schema validation against some of the files would be a very logical next step in testing whether the contents of the data are what is expected). 
We are now planning to work with Causa Natura to take the steps to identify a subset of the data and create a base schema/data package that will be easily expandable and extendable.")]),e._v(" "),o("h3",{attrs:{id:"find-out-more"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#find-out-more"}},[e._v("#")]),e._v(" Find Out More")]),e._v(" "),o("p",[e._v("To explore for the yourself and collaborate, see the Pescando Datos project on "),o("a",{attrs:{href:"https://github.com/pescandodatos/datos",target:"_blank",rel:"noopener noreferrer"}},[e._v("github"),o("OutboundLink")],1),e._v(" and our goodtables "),o("a",{attrs:{href:"http://goodtables.io/github/frictionlessdata/pilot-causanatura",target:"_blank",rel:"noopener noreferrer"}},[e._v("reports"),o("OutboundLink")],1),e._v(" from the project.")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[10],{419:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos1.92f0c35b.png"},420:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos2.d04a929b.png"},421:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos3.41defd34.png"},422:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos4.8d95dae7.png"},423:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos5.aa82e072.png"},424:function(e,t,a){e.exports=a.p+"assets/img/pescandodatos6.9e89ebcb.png"},573:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("h2",{attrs:{id:"context"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[e._v("#")]),e._v(" Context")]),e._v(" "),o("p",[e._v("Causa Natura is a non-profit organization based in Mexico. It supports public policies to allow management of natural resources respecting human rights, equity, efficiency and sustainability. 
This project, “Pescando Datos” seeks to advocate for improved public policies for more than just subsidies allocation, through the collection of, analysis, and visualization of data around subsidies available to fishing communities in Mexico.")]),e._v(" "),o("p",[e._v("After an extended period of analysis a web platform is being built in order to explore data and visualize it with launch due for later in 2017. Following a meeting at csv,conf after a presentation by Adrià Mercader on "),o("a",{attrs:{href:"https://www.youtube.com/watch?v=Gk2F4hncAgY&index=35&list=PLg5zZXwt2ZW5UIz13oI56vfZjF6mvpIXN",target:"_blank",rel:"noopener noreferrer"}},[e._v("‘Continuous Data Validation for Everybody’"),o("OutboundLink")],1),e._v(" we have piloted with Causa Natura to explore how our goodtables service can support the project. We spoke to Eduardo Rolón, Executive Director of Causa Natura and Gabriela Rodriguez who is working on the platform.")]),e._v(" "),o("h3",{attrs:{id:"problem-we-were-trying-to-solve"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#problem-we-were-trying-to-solve"}},[e._v("#")]),e._v(" Problem We Were Trying To Solve")]),e._v(" "),o("p",[e._v("Causa Natura are making a lot of freedom of Information requests in Mexico on information to do with fishers in order to understand how policies are impacting people. The data is needed to support a range of stakeholders from the many co-op fisher communities to advocacy organisations.")]),e._v(" "),o("blockquote",[o("p",[e._v("Eduardo Rolón: Advocacy organizations, either from CSOs or from the fisheries sector may be more interested in data that evaluates and supports policy recommendations. Fisher communities have more immediate needs, such as how to obtain better governmental services and support.")])]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: The data is important to us because Campaigns and decisions will be made based on the analysis on the data Causa Natura collected. 
To be able to do the required analysis we need good data.")])]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: Currently, there is a tedious process of cleaning to give us data that can be worked on. Much of the data Causa Natura was using came as PDFs and needed to be processed. We process a lot of PDFs and Excel files and there are a lot of problems getting the OCR to capture the information correctly to csv. For example, names are not consistent and this causes us a lot of problems.")])]),e._v(" "),o("h2",{attrs:{id:"the-work"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[e._v("#")]),e._v(" The Work")]),e._v(" "),o("h3",{attrs:{id:"software"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#software"}},[e._v("#")]),e._v(" Software")]),e._v(" "),o("p",[e._v("goodtables was an existing Python library and web application developed by Open Knowledge International to support the validation of tabular datasets both in terms of structure and also with respect to a published schema as described above. 
We introduced goodtables in a "),o("a",{attrs:{href:"http://okfnlabs.org/blog/2015/02/20/introducing-goodtables.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("blog post"),o("OutboundLink")],1),e._v(" earlier this year.")]),e._v(" "),o("p",[e._v("On top of that, Open Knowledge International has developed "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(", a web service for a continuous data validation that connects to different data sources to generate structure and content reports.")]),e._v(" "),o("h3",{attrs:{id:"what-did-we-do"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-did-we-do"}},[e._v("#")]),e._v(" What Did We Do")]),e._v(" "),o("p",[e._v("Let’s see how "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" has helped to identify source and structural errors in the Causa Natura pilot dataset:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(419),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("After we’ve signed in, we synchronize our GitHub repositories and activate the repository we want to validate ("),o("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-causanatura",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/pilot-causanatura"),o("OutboundLink")],1),e._v("):")]),e._v(" "),o("p",[o("img",{attrs:{src:a(420),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("Once the repository is activated, every time there is an update on the data hosted on GitHub, the service will generate a validation report. 
This is how one of these reports looks like:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(421),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("Here, we see that there are 59 valid tables, but the report has identified source and structural errors in 41 of the other tables hosted on the repository, including:")]),e._v(" "),o("ul",[o("li",[e._v("duplicate rows")]),e._v(" "),o("li",[e._v("duplicate headers")]),e._v(" "),o("li",[e._v("blank rows")]),e._v(" "),o("li",[e._v("missing values")])]),e._v(" "),o("p",[e._v("The full list of checks exercised by "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" can be found in the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/data-quality-spec/blob/master/spec.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Quality Spec"),o("OutboundLink")],1),e._v(". And the full report can be found "),o("a",{attrs:{href:"http://goodtables.io/github/frictionlessdata/pilot-causanatura/jobs/7",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("After identifying errors we went back do a manual cleanup of the data. 
As we mentioned, there is no need to run "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" validation manually - it happens on any GitHub push for all activated repositories:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(422),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("If we need to customize a validation process we can put a goodtables.yml configuration file on the repository root, allowing us to tweak settings like the actual checks to perform, limit of rows to check, etc:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(423),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("And instant feedback is available via GitHub commit statuses and a "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" badge that can be included in the README file:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(424),alt:"ADBio"}})]),e._v(" "),o("h2",{attrs:{id:"review"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#review"}},[e._v("#")]),e._v(" Review")]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: Right now I have not been using it extensively yet but I have a lot of faith that it could get incorporated in the process of importing data into the Github repository. It should be easy to introduce into our workflow. I really like the process of hooks after git-push as I’m trying to get the organization to use Github for new data. I really like the validation part and that a report is generated each time data is pushed. This is very important and very useful. This makes it easier for the people who are doing the cleaning of data who may not have experience with GitHub.")])]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: The web interface needs a lot of usability work. But the idea is awesome. 
There are problems and it is kind of hard to use at the moment as it takes a long time to sync repositories and the process is not clear, but i think it has a huge potential to make a difference to the work we are doing, mostly if people use Github to store data then it could make a difference.")])]),e._v(" "),o("h2",{attrs:{id:"next-steps"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#next-steps"}},[e._v("#")]),e._v(" Next Steps")]),e._v(" "),o("h3",{attrs:{id:"areas-for-further-work"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#areas-for-further-work"}},[e._v("#")]),e._v(" Areas for further work")]),e._v(" "),o("blockquote",[o("p",[e._v("Gabriela Rodriguez: With continuous integration it would be very helpful to be notified with messages about the problems in the data. Perhaps emails notifications would be a good way to go, or integrations with other programs - Slack for example - would be fantastic.")])]),e._v(" "),o("p",[e._v("One thing to note is that all the errors shown following the analysis refer to the structure of the data files (missing headers, duplicate rows, etc). Including schema validation against some of the files would be a very logical next step in testing whether the contents of the data are what is expected). 
We are now planning to work with Causa Natura to take the steps to identify a subset of the data and create a base schema/data package that will be easily expandable and extendable.")]),e._v(" "),o("h3",{attrs:{id:"find-out-more"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#find-out-more"}},[e._v("#")]),e._v(" Find Out More")]),e._v(" "),o("p",[e._v("To explore for yourself and collaborate, see the Pescando Datos project on "),o("a",{attrs:{href:"https://github.com/pescandodatos/datos",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),o("OutboundLink")],1),e._v(" and our goodtables "),o("a",{attrs:{href:"http://goodtables.io/github/frictionlessdata/pilot-causanatura",target:"_blank",rel:"noopener noreferrer"}},[e._v("reports"),o("OutboundLink")],1),e._v(" from the project.")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/100.b75e047c.js b/assets/js/100.f3be7755.js similarity index 98% rename from assets/js/100.b75e047c.js rename to assets/js/100.f3be7755.js index f6d43d19d..d0b1b568c 100644 --- a/assets/js/100.b75e047c.js +++ b/assets/js/100.f3be7755.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[100],{631:function(e,t,n){"use strict";n.r(t);var o=n(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,n=e._self._c||t;return n("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[n("p",[n("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v5"),n("OutboundLink")],1),e._v(", which occurred virtually in May 2020, featured several talks about using Frictionless Data, and was also organized by two members of the Frictionless Data team, Lilly Winfree and Jo Barratt. csv,conf is a community conference that brings diverse groups together to discuss data topics, and features stories about data sharing and data analysis from science, journalism, government, and open source. 
Over the years we have had over a hundred different talks from a huge range of speakers, most of which you can still watch back on our "),n("a",{attrs:{href:"http://youtube.com/csvconf",target:"_blank",rel:"noopener noreferrer"}},[e._v("YouTube Channel"),n("OutboundLink")],1),e._v(".")]),e._v(" "),n("p",[e._v("COVID-19 threw a wrench in our plans for csv,conf,v5, and we ended up converting the conference to a virtual event. We were looking forward to our first conference in Washington DC, but unfortunately, like many other in-person events, this was not going to be possible in 2020. However, there were many positive outcomes of moving to a virtual conference. For instance, the number of attendees quadrupled (over 1000 people registered!) and people were able to attend from all over the world.")]),e._v(" "),n("p",[e._v("During the conference, there were several talks showcasing Frictionless Data. Two of the Frictionless Data Fellows, Monica Granados and Lily Zhao, presented a talk (“"),n("a",{attrs:{href:"https://youtu.be/tZmu5DGPRmA",target:"_blank",rel:"noopener noreferrer"}},[e._v("How Frictionless Data Can Help You Grease Your Data"),n("OutboundLink")],1),e._v("”) that had over 100 people watching live, which is many more than would have been at their talk in person. 
Other related projects gave talks that incorporated Frictionless Data, such as Christina Gosnell and Pablo Virgo from Catalyst Cooperative discussing “"),n("a",{attrs:{href:"https://youtu.be/ktLTC7SENHk",target:"_blank",rel:"noopener noreferrer"}},[e._v("Getting climate advocates the data they need."),n("OutboundLink")],1),e._v("” I also recommend watching “"),n("a",{attrs:{href:"https://youtu.be/3Ban-orpVtc",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data and Code for Reproducible Research"),n("OutboundLink")],1),e._v("” by Lisa Federer and Maryam Zaringhalam, and “"),n("a",{attrs:{href:"https://youtu.be/XV_jxbB1cBY",target:"_blank",rel:"noopener noreferrer"}},[e._v("Low-Income Data Diaries - How “Low-Tech” Data Experiences Can Inspire Accessible Data Skills and Tool Design"),n("OutboundLink")],1),e._v("” by David Selassie Opoku. You can see the full list of talks, with links to slides and videos, on the csv,conf website: "),n("a",{attrs:{href:"https://csvconf.com/speakers/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://csvconf.com/speakers/"),n("OutboundLink")],1),e._v(".")]),e._v(" "),n("p",[e._v("If you are planning on organizing a virtual event, you can read more about how csv,conf,v5 was planned here: "),n("a",{attrs:{href:"https://csvconf.com/going-online",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://csvconf.com/going-online"),n("OutboundLink")],1),e._v(".")]),e._v(" "),n("p",[e._v("We hope to see some of you next year for csv,conf,v6!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[100],{632:function(e,t,n){"use strict";n.r(t);var o=n(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,n=e._self._c||t;return n("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[n("p",[n("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v5"),n("OutboundLink")],1),e._v(", which 
occurred virtually in May 2020, featured several talks about using Frictionless Data, and was also organized by two members of the Frictionless Data team, Lilly Winfree and Jo Barratt. csv,conf is a community conference that brings diverse groups together to discuss data topics, and features stories about data sharing and data analysis from science, journalism, government, and open source. Over the years we have had over a hundred different talks from a huge range of speakers, most of which you can still watch back on our "),n("a",{attrs:{href:"http://youtube.com/csvconf",target:"_blank",rel:"noopener noreferrer"}},[e._v("YouTube Channel"),n("OutboundLink")],1),e._v(".")]),e._v(" "),n("p",[e._v("COVID-19 threw a wrench in our plans for csv,conf,v5, and we ended up converting the conference to a virtual event. We were looking forward to our first conference in Washington DC, but unfortunately, like many other in-person events, this was not going to be possible in 2020. However, there were many positive outcomes of moving to a virtual conference. For instance, the number of attendees quadrupled (over 1000 people registered!) and people were able to attend from all over the world.")]),e._v(" "),n("p",[e._v("During the conference, there were several talks showcasing Frictionless Data. Two of the Frictionless Data Fellows, Monica Granados and Lily Zhao, presented a talk (“"),n("a",{attrs:{href:"https://youtu.be/tZmu5DGPRmA",target:"_blank",rel:"noopener noreferrer"}},[e._v("How Frictionless Data Can Help You Grease Your Data"),n("OutboundLink")],1),e._v("”) that had over 100 people watching live, which is many more than would have been at their talk in person. 
Other related projects gave talks that incorporated Frictionless Data, such as Christina Gosnell and Pablo Virgo from Catalyst Cooperative discussing “"),n("a",{attrs:{href:"https://youtu.be/ktLTC7SENHk",target:"_blank",rel:"noopener noreferrer"}},[e._v("Getting climate advocates the data they need."),n("OutboundLink")],1),e._v("” I also recommend watching “"),n("a",{attrs:{href:"https://youtu.be/3Ban-orpVtc",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data and Code for Reproducible Research"),n("OutboundLink")],1),e._v("” by Lisa Federer and Maryam Zaringhalam, and “"),n("a",{attrs:{href:"https://youtu.be/XV_jxbB1cBY",target:"_blank",rel:"noopener noreferrer"}},[e._v("Low-Income Data Diaries - How “Low-Tech” Data Experiences Can Inspire Accessible Data Skills and Tool Design"),n("OutboundLink")],1),e._v("” by David Selassie Opoku. You can see the full list of talks, with links to slides and videos, on the csv,conf website: "),n("a",{attrs:{href:"https://csvconf.com/speakers/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://csvconf.com/speakers/"),n("OutboundLink")],1),e._v(".")]),e._v(" "),n("p",[e._v("If you are planning on organizing a virtual event, you can read more about how csv,conf,v5 was planned here: "),n("a",{attrs:{href:"https://csvconf.com/going-online",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://csvconf.com/going-online"),n("OutboundLink")],1),e._v(".")]),e._v(" "),n("p",[e._v("We hope to see some of you next year for csv,conf,v6!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/105.80399967.js b/assets/js/105.394d3853.js similarity index 99% rename from assets/js/105.80399967.js rename to assets/js/105.394d3853.js index 9ea025c94..fa0c43db8 100644 --- a/assets/js/105.80399967.js +++ b/assets/js/105.394d3853.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[105],{637:function(e,t,a){"use strict";a.r(t);var 
o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("em",[e._v("This grantee profile features Simon Tyrrell, Xingdong Bian, and Robert Davey for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")])]),e._v(" "),a("h2",{attrs:{id:"meet-the-grassroots-team"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#meet-the-grassroots-team"}},[e._v("#")]),e._v(" Meet the Grassroots team")]),e._v(" "),a("p",[e._v("Hi I’m Simon Tyrrell and I’m a research software engineer having spent most of my career in academia. My first degree was in Maths and I did my PhD in Cheminformatics, both done at the University of Sheffield. After some postdoctoral fellowships in Computational Chemistry, I now happily reside in the field of Bioinformatics here at the Earlham Institute (EI) writing software to a diet of tea and loud guitars, both listened to and played.")]),e._v(" "),a("p",[e._v("Xingdong Bian is a member of the "),a("a",{attrs:{href:"http://www.earlham.ac.uk/davey-group",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Infrastructure group"),a("OutboundLink")],1),e._v(", he joined the Earlham Institute in January 2010 and was involved in the development of EI’s Laboratory Information Management System (MISO) and the TGAC Browser. He has worked on solutions for data visualisation, managing servers, genomic databases and bioinformatics tools. Xingdong is now working mainly on the Grassroots project as a research software engineer. He has a BSc in Computer Science from the University of Sheffield and a MSc in Software Engineering from the University of York.")]),e._v(" "),a("p",[e._v("Robert Davey leads the Data Infrastructure group at the Earlham Institute and is the PI for the Grassroots project. 
He has a PhD in Computer Science from the University of East Anglia, undertaken at the Roberts lab in the "),a("a",{attrs:{href:"http://www.ncyc.co.uk",target:"_blank",rel:"noopener noreferrer"}},[e._v("National Collection of Yeast Cultures"),a("OutboundLink")],1),e._v(". Rob leads a number of large computing infrastructure development and deployment projects, is a certified "),a("a",{attrs:{href:"https://software-carpentry.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Software Carpentry"),a("OutboundLink")],1),e._v(" Instructor and Trainer, an editorial board member for Nature Scientific Data, and a "),a("a",{attrs:{href:"https://www.software.ac.uk",target:"_blank",rel:"noopener noreferrer"}},[e._v("Software Sustainability Institute"),a("OutboundLink")],1),e._v(" Fellow.")]),e._v(" "),a("p",[e._v("Together Xingdong and I work in Robert Davey’s team at the Earlham Institute developing Grassroots. This is a set of middleware tools for sharing bioinformatics data and services so that users and developers can do scientific analyses as easily as possible.")]),e._v(" "),a("h2",{attrs:{id:"how-did-you-first-hear-about-frictionless-data"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-did-you-first-hear-about-frictionless-data"}},[e._v("#")]),e._v(" How did you first hear about Frictionless Data?")]),e._v(" "),a("p",[e._v("We have always been big believers in the FAIR data principles and when we saw a tweet about the Frictionless Data tool fund, the more that we read about it, the more it seemed to be exactly what we were after! 
Even without the fund, it is likely to have been something that we would have looked to implement anyway.")]),e._v(" "),a("h2",{attrs:{id:"what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[e._v("#")]),e._v(" What specific issues are you looking to address with the Tool Fund?")]),e._v(" "),a("p",[e._v("As part of the Designing Future Wheat (DFW) project, we currently have two different repositories: the "),a("a",{attrs:{href:"https://opendata.earlham.ac.uk/",target:"_blank",rel:"noopener noreferrer"}},[e._v("DFW data portal"),a("OutboundLink")],1),e._v(", using "),a("a",{attrs:{href:"https://irods.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("iRODS"),a("OutboundLink")],1),e._v(" with "),a("a",{attrs:{href:"https://github.com/billyfish/eirods-dav",target:"_blank",rel:"noopener noreferrer"}},[e._v("mod_eirods_dav"),a("OutboundLink")],1),e._v(", and a "),a("a",{attrs:{href:"https://ckan.grassroots.tools/",target:"_blank",rel:"noopener noreferrer"}},[e._v("digital repository"),a("OutboundLink")],1),e._v(" using "),a("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),a("OutboundLink")],1),e._v(". Both of these contain a wide variety of heterogeneous data such as genetic sequences, field trial experiment results, images, spreadsheets, publications, etc., and we are trying to standardise how to expose these datasets and their associated metadata. This is where Frictionless Data comes in! 
The ability to have consistent methods of accessing this information should make it easier for other researchers and data scientists to access and do some great work with all of this data.")]),e._v(" "),a("h2",{attrs:{id:"how-can-the-open-data-open-source-or-open-science-communities-engage-with-the-work-you-are-doing"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-the-open-data-open-source-or-open-science-communities-engage-with-the-work-you-are-doing"}},[e._v("#")]),e._v(" How can the open data, open source, or open science communities engage with the work you are doing?")]),e._v(" "),a("p",[e._v("We firmly believe in open source and open data and everything that we create is freely available. We plan to build a selection of Frictionless Data tools and make them available on our existing data portals so people can try them out and give any feedback. These will be rolled out incrementally so that progress is visible from early on. Our initial set of work will focus on extending the DFW data portal that uses one of our existing tools, eirods-dav ("),a("a",{attrs:{href:"https://github.com/billyfish/eirods-dav",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/billyfish/eirods-dav"),a("OutboundLink")],1),e._v(") which is a tool for exposing the data in an iRODS repository in a user-friendly way with rich APIs for developers and data scientists too. 
So if anyone has any feedback, ideas, suggestions, rants 😃, please raise an issue at the GitHub repo; the more, the merrier!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[105],{638:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("em",[e._v("This grantee profile features Simon Tyrrell, Xingdong Bian, and Robert Davey for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")])]),e._v(" "),a("h2",{attrs:{id:"meet-the-grassroots-team"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#meet-the-grassroots-team"}},[e._v("#")]),e._v(" Meet the Grassroots team")]),e._v(" "),a("p",[e._v("Hi I’m Simon Tyrrell and I’m a research software engineer having spent most of my career in academia. My first degree was in Maths and I did my PhD in Cheminformatics, both done at the University of Sheffield. After some postdoctoral fellowships in Computational Chemistry, I now happily reside in the field of Bioinformatics here at the Earlham Institute (EI) writing software to a diet of tea and loud guitars, both listened to and played.")]),e._v(" "),a("p",[e._v("Xingdong Bian is a member of the "),a("a",{attrs:{href:"http://www.earlham.ac.uk/davey-group",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Infrastructure group"),a("OutboundLink")],1),e._v(", he joined the Earlham Institute in January 2010 and was involved in the development of EI’s Laboratory Information Management System (MISO) and the TGAC Browser. He has worked on solutions for data visualisation, managing servers, genomic databases and bioinformatics tools. 
Xingdong is now working mainly on the Grassroots project as a research software engineer. He has a BSc in Computer Science from the University of Sheffield and a MSc in Software Engineering from the University of York.")]),e._v(" "),a("p",[e._v("Robert Davey leads the Data Infrastructure group at the Earlham Institute and is the PI for the Grassroots project. He has a PhD in Computer Science from the University of East Anglia, undertaken at the Roberts lab in the "),a("a",{attrs:{href:"http://www.ncyc.co.uk",target:"_blank",rel:"noopener noreferrer"}},[e._v("National Collection of Yeast Cultures"),a("OutboundLink")],1),e._v(". Rob leads a number of large computing infrastructure development and deployment projects, is a certified "),a("a",{attrs:{href:"https://software-carpentry.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Software Carpentry"),a("OutboundLink")],1),e._v(" Instructor and Trainer, an editorial board member for Nature Scientific Data, and a "),a("a",{attrs:{href:"https://www.software.ac.uk",target:"_blank",rel:"noopener noreferrer"}},[e._v("Software Sustainability Institute"),a("OutboundLink")],1),e._v(" Fellow.")]),e._v(" "),a("p",[e._v("Together Xingdong and I work in Robert Davey’s team at the Earlham Institute developing Grassroots. This is a set of middleware tools for sharing bioinformatics data and services so that users and developers can do scientific analyses as easily as possible.")]),e._v(" "),a("h2",{attrs:{id:"how-did-you-first-hear-about-frictionless-data"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-did-you-first-hear-about-frictionless-data"}},[e._v("#")]),e._v(" How did you first hear about Frictionless Data?")]),e._v(" "),a("p",[e._v("We have always been big believers in the FAIR data principles and when we saw a tweet about the Frictionless Data tool fund, the more that we read about it, the more it seemed to be exactly what we were after! 
Even without the fund, it is likely to have been something that we would have looked to implement anyway.")]),e._v(" "),a("h2",{attrs:{id:"what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[e._v("#")]),e._v(" What specific issues are you looking to address with the Tool Fund?")]),e._v(" "),a("p",[e._v("As part of the Designing Future Wheat (DFW) project, we currently have two different repositories: the "),a("a",{attrs:{href:"https://opendata.earlham.ac.uk/",target:"_blank",rel:"noopener noreferrer"}},[e._v("DFW data portal"),a("OutboundLink")],1),e._v(", using "),a("a",{attrs:{href:"https://irods.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("iRODS"),a("OutboundLink")],1),e._v(" with "),a("a",{attrs:{href:"https://github.com/billyfish/eirods-dav",target:"_blank",rel:"noopener noreferrer"}},[e._v("mod_eirods_dav"),a("OutboundLink")],1),e._v(", and a "),a("a",{attrs:{href:"https://ckan.grassroots.tools/",target:"_blank",rel:"noopener noreferrer"}},[e._v("digital repository"),a("OutboundLink")],1),e._v(" using "),a("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),a("OutboundLink")],1),e._v(". Both of these contain a wide variety of heterogeneous data such as genetic sequences, field trial experiment results, images, spreadsheets, publications, etc., and we are trying to standardise how to expose these datasets and their associated metadata. This is where Frictionless Data comes in! 
The ability to have consistent methods of accessing this information should make it easier for other researchers and data scientists to access and do some great work with all of this data.")]),e._v(" "),a("h2",{attrs:{id:"how-can-the-open-data-open-source-or-open-science-communities-engage-with-the-work-you-are-doing"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-the-open-data-open-source-or-open-science-communities-engage-with-the-work-you-are-doing"}},[e._v("#")]),e._v(" How can the open data, open source, or open science communities engage with the work you are doing?")]),e._v(" "),a("p",[e._v("We firmly believe in open source and open data and everything that we create is freely available. We plan to build a selection of Frictionless Data tools and make them available on our existing data portals so people can try them out and give any feedback. These will be rolled out incrementally so that progress is visible from early on. Our initial set of work will focus on extending the DFW data portal that uses one of our existing tools, eirods-dav ("),a("a",{attrs:{href:"https://github.com/billyfish/eirods-dav",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/billyfish/eirods-dav"),a("OutboundLink")],1),e._v(") which is a tool for exposing the data in an iRODS repository in a user-friendly way with rich APIs for developers and data scientists too. 
So if anyone has any feedback, ideas, suggestions, rants 😃, please raise an issue at the GitHub repo; the more, the merrier!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/106.3819cf66.js b/assets/js/106.b974b400.js similarity index 98% rename from assets/js/106.3819cf66.js rename to assets/js/106.b974b400.js index 5260bf02e..ed170a288 100644 --- a/assets/js/106.3819cf66.js +++ b/assets/js/106.b974b400.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[106],{640:function(t,e,r){"use strict";r.r(e);var o=r(29),a=Object(o.a)({},(function(){var t=this,e=t.$createElement,r=t._self._c||e;return r("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[r("p",[t._v("We are hosting a virtual community hangout to share recent developments in the Frictionless Data community and it’s also an avenue to connect with other community members. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.")]),t._v(" "),r("p",[r("img",{attrs:{src:"/img/blog/community.jpg",alt:"Photo by Perry Grone on Unsplash"}})]),t._v(" "),r("p",[t._v("The hangout is scheduled to hold on "),r("strong",[t._v("27th August 2020 at 5 pm BST / 4 PM UTC")]),t._v(". 
If you would like to attend the hangout, "),r("a",{attrs:{href:"https://forms.gle/3wEGBy2q4Q6pdNfK8",target:"_blank",rel:"noopener noreferrer"}},[t._v("you can sign up for the event using this form"),r("OutboundLink")],1)]),t._v(" "),r("p",[t._v("Looking forward to seeing you there!")]),t._v(" "),r("h2",{attrs:{id:"community-hangout-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#community-hangout-recording"}},[t._v("#")]),t._v(" Community Hangout Recording")]),t._v(" "),r("p",[t._v("If you missed the community hangout and would like to catch up on what was discussed, here’s a recording of the hangout.")]),t._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/HezM-DPB2v4",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),t._v(" "),r("p",[t._v("Here is a short summary of what we were up to:")]),t._v(" "),r("ul",[r("li",[t._v("An RFC (request for comments) we are working on and other tools:\n"),r("ul",[r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/project/blob/master/rfcs/0006-software-structure.md",target:"_blank",rel:"noopener noreferrer"}},[t._v("Restructuring Libraries to Drivers and Toolkits"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/tabulator-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("tabulator-py"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-py"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/datopian/data.js",target:"_blank",rel:"noopener noreferrer"}},[t._v("data.js"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/datopian/jsv",target:"_blank",rel:"noopener noreferrer"}},[t._v("jsv"),r("OutboundLink")],1)]),t._v(" 
"),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("frictionless-py"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-swift",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-swift"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-swift",target:"_blank",rel:"noopener noreferrer"}},[t._v("tableschema-swift"),r("OutboundLink")],1)])])]),t._v(" "),r("li",[t._v("Discussion on Google Analytics vs alternatives, some of them being open-source.")])]),t._v(" "),r("h2",{attrs:{id:"technical-presentation-on-frictionless-py"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#technical-presentation-on-frictionless-py"}},[t._v("#")]),t._v(" Technical presentation on frictionless-py")]),t._v(" "),r("p",[t._v("We also made available a technical presentation of a new tool we are working on: "),r("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("frictionless-py"),r("OutboundLink")],1),t._v(". 
If you would like to delve deeper into the nuts and bolts of it, here it is for your enjoyment!")]),t._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/VPnC8cc6ly0",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);e.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[106],{637:function(t,e,r){"use strict";r.r(e);var o=r(29),a=Object(o.a)({},(function(){var t=this,e=t.$createElement,r=t._self._c||e;return r("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[r("p",[t._v("We are hosting a virtual community hangout to share recent developments in the Frictionless Data community and it’s also an avenue to connect with other community members. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.")]),t._v(" "),r("p",[r("img",{attrs:{src:"/img/blog/community.jpg",alt:"Photo by Perry Grone on Unsplash"}})]),t._v(" "),r("p",[t._v("The hangout is scheduled to hold on "),r("strong",[t._v("27th August 2020 at 5 pm BST / 4 PM UTC")]),t._v(". 
If you would like to attend the hangout, "),r("a",{attrs:{href:"https://forms.gle/3wEGBy2q4Q6pdNfK8",target:"_blank",rel:"noopener noreferrer"}},[t._v("you can sign up for the event using this form"),r("OutboundLink")],1)]),t._v(" "),r("p",[t._v("Looking forward to seeing you there!")]),t._v(" "),r("h2",{attrs:{id:"community-hangout-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#community-hangout-recording"}},[t._v("#")]),t._v(" Community Hangout Recording")]),t._v(" "),r("p",[t._v("If you missed the community hangout and would like to catch up on what was discussed, here’s a recording of the hangout.")]),t._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/HezM-DPB2v4",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),t._v(" "),r("p",[t._v("Here is a short summary of what we were up to:")]),t._v(" "),r("ul",[r("li",[t._v("An RFC (request for comments) we are working on and other tools:\n"),r("ul",[r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/project/blob/master/rfcs/0006-software-structure.md",target:"_blank",rel:"noopener noreferrer"}},[t._v("Restructuring Libraries to Drivers and Toolkits"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/tabulator-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("tabulator-py"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-py"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/datopian/data.js",target:"_blank",rel:"noopener noreferrer"}},[t._v("data.js"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/datopian/jsv",target:"_blank",rel:"noopener noreferrer"}},[t._v("jsv"),r("OutboundLink")],1)]),t._v(" 
"),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("frictionless-py"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-swift",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-swift"),r("OutboundLink")],1)]),t._v(" "),r("li",[r("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-swift",target:"_blank",rel:"noopener noreferrer"}},[t._v("tableschema-swift"),r("OutboundLink")],1)])])]),t._v(" "),r("li",[t._v("Discussion on Google Analytics vs alternatives, some of them being open-source.")])]),t._v(" "),r("h2",{attrs:{id:"technical-presentation-on-frictionless-py"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#technical-presentation-on-frictionless-py"}},[t._v("#")]),t._v(" Technical presentation on frictionless-py")]),t._v(" "),r("p",[t._v("We also made available a technical presentation of a new tool we are working on: "),r("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("frictionless-py"),r("OutboundLink")],1),t._v(". 
If you would like to delve deeper into the nuts and bolts of it, here it is for your enjoyment!")]),t._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/VPnC8cc6ly0",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);e.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/107.1d7289ba.js b/assets/js/107.224beb5f.js similarity index 99% rename from assets/js/107.1d7289ba.js rename to assets/js/107.224beb5f.js index 8387eefb2..57dc23078 100644 --- a/assets/js/107.1d7289ba.js +++ b/assets/js/107.224beb5f.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[107],{638:function(e,t,a){"use strict";a.r(t);var r=a(29),n=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("We are very excited to introduce the newest Fellows for Cohort 2 of the Frictionless Data "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reproducible Research Fellows Programme"),a("OutboundLink")],1),e._v("! Over the next nine months, these eight early career researchers will be learning about open science, data management, and how to use Frictionless Data tooling in their work to make their data more open and their research more reusable. As an introduction, each Fellow has written a short blog about themselves and their goals. Read below to meet the Fellows and click on their individual blogs to learn more about them!")]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/katerina-fellow.jpg",alt:"Katerina picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hi everyone, my name is Katerina Drakoulaki")]),e._v(", I am from Greece and Cyprus, and I’m currently doing my PhD at the National and Kapodistrian University of Athens. 
My PhD combines all my interests: linguistics, language disorders, music cognition, and working with children! Research reproducibility is important in order to reliably identify and provide intervention to children with difficulties. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-katerina/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Katerina here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/evelyn-fellow.jpg",alt:"Evelyn picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hello everybody! I’m Evelyn Night")]),e._v(", an MSc student at the "),a("a",{attrs:{href:"https://www.uonbi.ac.ke/",target:"_blank",rel:"noopener noreferrer"}},[e._v("University of Nairobi"),a("OutboundLink")],1),e._v(" and a research fellow at the "),a("a",{attrs:{href:"http://www.icipe.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("International Center of Insect Physiology and Ecology"),a("OutboundLink")],1),e._v(". Growing up in a tiny village in Kano plains of Western Kenya, I always had a passion for learning. Fast forward through the years I find my way into academia pursuing a master’s degree and characterizing insect pollinator communities using morphometric and molecular tools for my thesis. My goal is to improve agricultural research capacity in the country and to also enhance formation of policies that would ensure increase in agricultural productivity. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-evelyn/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Evelyn here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/dani-fellow.png",alt:"Dani picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hi everyone! I’m Dani")]),e._v(", a cognitive neuroscientist and open science enthusiast. I live and work in San Sebastian, a beautiful city by the sea in northern Spain. 
We have a responsibility to overcome the current incentive system in the Academy to provide more honest, accessible, and quality research. I look forward to learning more about Frictionless Data tools and incorporating them into my work so that my research is open to everyone. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-dani/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dani here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/kate-fellow.png",alt:"Kate picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hello hi! I’m Kate Bowie")]),e._v(", a 28-year-old midwesterner studying the human microbiome, or the collection of bacteria that live in and on the human body. As I dive deeper into the field of microbiome science, I am becoming an advocate for putting resources and time into improving research reproducibility. I wanted to become a Frictionless Fellow so that I could learn tools to help microbiome science data workflows become more reproducible and engage in the open science community. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-kate/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Kate here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/sam-fellow.jpeg",alt:"Sam picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hello! My name is Sam Wilairat.")]),e._v(" I am currently earning a Master of Library and Information Science degree (MLIS) and have an interest in data librarianship. As a fellow, I’m hoping to learn frictionless data principles and tools to ultimately promote them at my institution via education and outreach to researchers. I believe Open Science is the future and the more people embrace it, the more equitable and innovative research will be! 
Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-sam/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Sam here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/anne-fellow.png",alt:"Anne picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hey everyone, I’m Anne!")]),e._v(" I’m a graduate student based in Geneva, Switzerland that was born and bred in a few places across the United States (including New York, Chicago, Houston, and Washington DC!). Here in Switzerland, I study international institutions with the eye of an anthropologist or sociologist, through long-term ethnographic research. I’m excited to learn how to apply the Frictionless Data tools in my work throughout these nine months, and to experiment with new forms of conveying social science research in the process.")]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/ritwik-fellow.png",alt:"Ritwik picture",height:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hi Ritwik here!")]),e._v(" I am based near Delhi, India and am doing my masters in Sustainable buildings, Energy conservation and Climate Change from International Institute of Information Technology Hyderabad. It is very important that the research which is carried in this domain is reproducible and available to all so we can use it to spread awareness among people. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-ritwik/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Ritwik here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/jacqueline-fellow.jpg",alt:"Jacqueline picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hi! My name is Jacqueline.")]),e._v(" I am a Master’s Candidate and Interdisciplinary Innovation Fellow in the Department of Computer and Information Science at the University of Pennsylvania. 
I applied to be a Reproducible Research Fellow to build space into my research process for actively exploring open science and reproducibility issues. As a scientist, I consider it an obligation to share my knowledge as widely and freely as possible and to ensure that my findings can be vetted through replication studies and other important checks. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-jacqueline/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Jacqueline here."),a("OutboundLink")],1)])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[107],{640:function(e,t,a){"use strict";a.r(t);var r=a(29),n=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("We are very excited to introduce the newest Fellows for Cohort 2 of the Frictionless Data "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reproducible Research Fellows Programme"),a("OutboundLink")],1),e._v("! Over the next nine months, these eight early career researchers will be learning about open science, data management, and how to use Frictionless Data tooling in their work to make their data more open and their research more reusable. As an introduction, each Fellow has written a short blog about themselves and their goals. Read below to meet the Fellows and click on their individual blogs to learn more about them!")]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/katerina-fellow.jpg",alt:"Katerina picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hi everyone, my name is Katerina Drakoulaki")]),e._v(", I am from Greece and Cyprus, and I’m currently doing my PhD at the National and Kapodistrian University of Athens. 
My PhD combines all my interests: linguistics, language disorders, music cognition, and working with children! Research reproducibility is important in order to reliably identify and provide intervention to children with difficulties. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-katerina/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Katerina here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/evelyn-fellow.jpg",alt:"Evelyn picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hello everybody! I’m Evelyn Night")]),e._v(", an MSc student at the "),a("a",{attrs:{href:"https://www.uonbi.ac.ke/",target:"_blank",rel:"noopener noreferrer"}},[e._v("University of Nairobi"),a("OutboundLink")],1),e._v(" and a research fellow at the "),a("a",{attrs:{href:"http://www.icipe.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("International Center of Insect Physiology and Ecology"),a("OutboundLink")],1),e._v(". Growing up in a tiny village in Kano plains of Western Kenya, I always had a passion for learning. Fast forward through the years I find my way into academia pursuing a master’s degree and characterizing insect pollinator communities using morphometric and molecular tools for my thesis. My goal is to improve agricultural research capacity in the country and to also enhance formation of policies that would ensure increase in agricultural productivity. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-evelyn/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Evelyn here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/dani-fellow.png",alt:"Dani picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hi everyone! I’m Dani")]),e._v(", a cognitive neuroscientist and open science enthusiast. I live and work in San Sebastian, a beautiful city by the sea in northern Spain. 
We have a responsibility to overcome the current incentive system in the Academy to provide more honest, accessible, and quality research. I look forward to learning more about Frictionless Data tools and incorporating them into my work so that my research is open to everyone. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-dani/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dani here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/kate-fellow.png",alt:"Kate picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hello hi! I’m Kate Bowie")]),e._v(", a 28-year-old midwesterner studying the human microbiome, or the collection of bacteria that live in and on the human body. As I dive deeper into the field of microbiome science, I am becoming an advocate for putting resources and time into improving research reproducibility. I wanted to become a Frictionless Fellow so that I could learn tools to help microbiome science data workflows become more reproducible and engage in the open science community. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-kate/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Kate here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/sam-fellow.jpeg",alt:"Sam picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hello! My name is Sam Wilairat.")]),e._v(" I am currently earning a Master of Library and Information Science degree (MLIS) and have an interest in data librarianship. As a fellow, I’m hoping to learn frictionless data principles and tools to ultimately promote them at my institution via education and outreach to researchers. I believe Open Science is the future and the more people embrace it, the more equitable and innovative research will be! 
Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-sam/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Sam here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/anne-fellow.png",alt:"Anne picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hey everyone, I’m Anne!")]),e._v(" I’m a graduate student based in Geneva, Switzerland that was born and bred in a few places across the United States (including New York, Chicago, Houston, and Washington DC!). Here in Switzerland, I study international institutions with the eye of an anthropologist or sociologist, through long-term ethnographic research. I’m excited to learn how to apply the Frictionless Data tools in my work throughout these nine months, and to experiment with new forms of conveying social science research in the process.")]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/ritwik-fellow.png",alt:"Ritwik picture",height:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hi Ritwik here!")]),e._v(" I am based near Delhi, India and am doing my masters in Sustainable buildings, Energy conservation and Climate Change from International Institute of Information Technology Hyderabad. It is very important that the research which is carried in this domain is reproducible and available to all so we can use it to spread awareness among people. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-ritwik/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Ritwik here."),a("OutboundLink")],1)]),e._v(" "),a("hr"),e._v(" "),a("img",{attrs:{src:"/img/blog/jacqueline-fellow.jpg",alt:"Jacqueline picture",width:"100px",align:"right"}}),e._v(" "),a("p",[a("strong",[e._v("Hi! My name is Jacqueline.")]),e._v(" I am a Master’s Candidate and Interdisciplinary Innovation Fellow in the Department of Computer and Information Science at the University of Pennsylvania. 
I applied to be a Reproducible Research Fellow to build space into my research process for actively exploring open science and reproducibility issues. As a scientist, I consider it an obligation to share my knowledge as widely and freely as possible and to ensure that my findings can be vetted through replication studies and other important checks. Read more about "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-jacqueline/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Jacqueline here."),a("OutboundLink")],1)])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/109.73c03972.js b/assets/js/109.d3f9e9d9.js similarity index 99% rename from assets/js/109.73c03972.js rename to assets/js/109.d3f9e9d9.js index 0bfe20077..d2aad15b7 100644 --- a/assets/js/109.73c03972.js +++ b/assets/js/109.d3f9e9d9.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[109],{642:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("h2",{attrs:{id:"frictionless-framework"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-framework"}},[e._v("#")]),e._v(" Frictionless Framework")]),e._v(" "),a("p",[e._v("We are excited to announce our new high-level Python framework, frictionless-py: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py"),a("OutboundLink")],1),e._v(". Frictionless-py was created to simplify overall user-experience for working with Frictionless Data in Python. It provides several high-level improvements in addition to many low-level fixes. 
Read more details below, or watch this intro video by Frictionless developer Evgeny: "),a("a",{attrs:{href:"https://youtu.be/VPnC8cc6ly0",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://youtu.be/VPnC8cc6ly0"),a("OutboundLink")],1)]),e._v(" "),a("h2",{attrs:{id:"why-did-we-write-new-python-code"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-did-we-write-new-python-code"}},[e._v("#")]),e._v(" Why did we write new Python code?")]),e._v(" "),a("p",[e._v("Frictionless Data has been in development for almost a decade, with global users and projects spanning domains from science to government to finance. However, our main Python libraries ("),a("code",[e._v("datapackage")]),e._v(","),a("code",[e._v("goodtables")]),e._v(", "),a("code",[e._v("tableschema")]),e._v(","),a("code",[e._v("tabulator")]),e._v(") were originally built with some inconsistencies that have confused users over the years. We had started redoing our documentation for our existing code, and realized we had a larger issue on our hands - mainly that the disparate Python libraries had overlapping functionalities and we were not able to clearly articulate how they all fit together to form a bigger picture. We realized that overall, the existing user experience was not where we wanted it to be. 
Evgeny, the Frictionless Data technical lead developer, had been thinking about ways to improve the Python code for a while, and the outcome of that work is "),a("code",[e._v("frictionless-py")]),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"what-happens-to-the-old-python-code-datapackage-py-goodtables-py-tableschema-py-tabulator-py-how-does-this-affect-current-users"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-happens-to-the-old-python-code-datapackage-py-goodtables-py-tableschema-py-tabulator-py-how-does-this-affect-current-users"}},[e._v("#")]),e._v(" What happens to the old Python code ("),a("code",[e._v("datapackage-py")]),e._v(", "),a("code",[e._v("goodtables-py")]),e._v(", "),a("code",[e._v("tableschema-py")]),e._v(", "),a("code",[e._v("tabulator-py")]),e._v(")? How does this affect current users?")]),e._v(" "),a("p",[a("code",[e._v("Datapackage-py")]),e._v(" (see "),a("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py#datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("details"),a("OutboundLink")],1),e._v("), "),a("code",[e._v("tableschema-py")]),e._v(" (see "),a("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-py#tableschema-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("details"),a("OutboundLink")],1),e._v("), "),a("code",[e._v("tabulator-py")]),e._v(" (see "),a("a",{attrs:{href:"https://github.com/frictionlessdata/tabulator-py#tabulator-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("details"),a("OutboundLink")],1),e._v(") still exist, will not be altered, and will be maintained. If your project is using this code, these changes are not breaking and there is no action you need to take at this point. 
However, we will be focusing new development on "),a("code",[e._v("frictionless-py")]),e._v(", and encourage you to consider starting to experiment with or work with "),a("code",[e._v("frictionless-py")]),e._v(" during the last months of 2020 and migrate to it starting from 2021 "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/docs/target/migration-guide/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("(here is our migration guide)"),a("OutboundLink")],1),e._v(". The one important thing to note is that "),a("code",[e._v("goodtables-py")]),e._v(" has been subsumed by "),a("code",[e._v("frictionless-py")]),e._v(" (since version 3 of Goodtables). We will continue to bug-fix "),a("code",[e._v("goodtables@2.x")]),e._v(" in "),a("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py/tree/goodtables",target:"_blank",rel:"noopener noreferrer"}},[e._v("this branch"),a("OutboundLink")],1),e._v(" and it is also still available on "),a("a",{attrs:{href:"https://pypi.org/project/goodtables/",target:"_blank",rel:"noopener noreferrer"}},[e._v("PyPi"),a("OutboundLink")],1),e._v(" as it was before. Please note that "),a("code",[e._v("frictionless@3.x")]),e._v(" version’s API is not stable as we are continuing to work on it at the moment. We will release "),a("code",[e._v("frictionless@4.x")]),e._v(" by the end of 2020 to be the first SemVer/stable version.")]),e._v(" "),a("h2",{attrs:{id:"what-does-frictionless-py-do"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-does-frictionless-py-do"}},[e._v("#")]),e._v(" What does "),a("code",[e._v("frictionless-py")]),e._v(" do?")]),e._v(" "),a("p",[a("code",[e._v("Frictionless-py")]),e._v(" has four main functions for working with data: describe, extract, validate, and transform. 
These are inspired by typical data analysis and data management methods.")]),e._v(" "),a("p",[a("em",[e._v("Describe your data")]),e._v(": You can infer, edit and save metadata of your data tables. This is a first step for ensuring data quality and usability. Frictionless metadata includes general information about your data like textual description, as well as field types and other tabular data details.")]),e._v(" "),a("p",[a("em",[e._v("Extract your data")]),e._v(": You can read your data using a unified tabular interface. Data quality and consistency are guaranteed by a schema. Frictionless supports various file protocols like HTTP, FTP, and S3 and data formats like CSV, XLS, JSON, SQL, and others.")]),e._v(" "),a("p",[a("em",[e._v("Validate your data")]),e._v(": You can validate data tables, resources, and datasets. Frictionless generates a unified validation report, as well as supports a lot of options to customize the validation process.")]),e._v(" "),a("p",[a("em",[e._v("Transform your data")]),e._v(": You can clean, reshape, and transfer your data tables and datasets. 
Frictionless provides a pipeline capability and a lower-level interface to work with the data.")]),e._v(" "),a("p",[e._v("Additional features:")]),e._v(" "),a("ul",[a("li",[e._v("Powerful Python framework")]),e._v(" "),a("li",[e._v("Convenient command-line interface")]),e._v(" "),a("li",[e._v("Low memory consumption for data of any size")]),e._v(" "),a("li",[e._v("Reasonable performance on big data")]),e._v(" "),a("li",[e._v("Support for compressed files")]),e._v(" "),a("li",[e._v("Custom checks and formats")]),e._v(" "),a("li",[e._v("Fully pluggable architecture")]),e._v(" "),a("li",[e._v("The included API server")]),e._v(" "),a("li",[e._v("More than 1000+ tests")])]),e._v(" "),a("h2",{attrs:{id:"how-can-users-get-started"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-users-get-started"}},[e._v("#")]),e._v(" How can users get started?")]),e._v(" "),a("p",[e._v("We recommend that you begin by reading the "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1VyDx6C3pxF3Vab8MxH_sI86OTSNmYuDJ",target:"_blank",rel:"noopener noreferrer"}},[e._v("Getting Started Guide"),a("OutboundLink")],1),e._v(" and the "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1HGXJa7BWyEgoGZLkC6tKt2DMqgeHibEY",target:"_blank",rel:"noopener noreferrer"}},[e._v("Introduction Guide"),a("OutboundLink")],1),e._v(". 
We also have in depth documentation for "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1eIq1ZTUntJplRxkGHxmqlxZ0zyXCm0wU",target:"_blank",rel:"noopener noreferrer"}},[e._v("Describing Data"),a("OutboundLink")],1),e._v(", "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1is_PcpzFl42aWI2B2tHaBGj3jxsKZ_eZ",target:"_blank",rel:"noopener noreferrer"}},[e._v("Extracting Data"),a("OutboundLink")],1),e._v(", "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1cJSZlG_v6OI3I2FtnXdKOSPjhwZNjMK1",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validating Data"),a("OutboundLink")],1),e._v(", and "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1C4dFWDExyxzGIwLUovrDQZghZK4JK2PD",target:"_blank",rel:"noopener noreferrer"}},[e._v("Transforming Data"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"how-can-you-give-us-feedback"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-you-give-us-feedback"}},[e._v("#")]),e._v(" How can you give us feedback?")]),e._v(" "),a("p",[e._v("What do you think? Let us know your thoughts, suggestions, or issues by joining us in our community chat on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or by opening an issue in the "),a("code",[e._v("frictionless-py")]),e._v(" repo: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"faq-s"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#faq-s"}},[e._v("#")]),e._v(" FAQ’s")]),e._v(" "),a("h3",{attrs:{id:"where-s-the-documentation"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#where-s-the-documentation"}},[e._v("#")]),e._v(" Where’s the documentation?")]),e._v(" "),a("p",[e._v("Are you a new user? 
Start here: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/docs/target/getting-started/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("Getting Started"),a("OutboundLink")],1),e._v(" & "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/docs/target/introduction-guide/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("Introduction Guide"),a("OutboundLink")],1),a("br"),e._v("\nAre you an existing user? Start here: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/docs/target/migration-guide/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("Migration Guide"),a("OutboundLink")],1),a("br"),e._v("\nThe full list of documentation can be found here: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py#documentation",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py#documentation"),a("OutboundLink")],1)]),e._v(" "),a("h3",{attrs:{id:"what-s-the-difference-between-datapackage-and-frictionless"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-s-the-difference-between-datapackage-and-frictionless"}},[e._v("#")]),e._v(" What’s the difference between "),a("code",[e._v("datapackage")]),e._v(" and "),a("code",[e._v("frictionless")]),e._v("?")]),e._v(" "),a("p",[e._v("In general, "),a("code",[e._v("frictionless")]),e._v(" is our new generation software while "),a("code",[e._v("tabulator")]),e._v("/"),a("code",[e._v("tableschema")]),e._v("/"),a("code",[e._v("datapackage")]),e._v("/"),a("code",[e._v("goodtables")]),e._v(" are our previous generation software. "),a("code",[e._v("Frictionless")]),e._v(" has a lot of improvements over them. 
Please see this issue for the full answer and a code example: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues/428",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues/428"),a("OutboundLink")],1)]),e._v(" "),a("h3",{attrs:{id:"i-ve-spotted-a-bug-where-do-i-report-it"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#i-ve-spotted-a-bug-where-do-i-report-it"}},[e._v("#")]),e._v(" I’ve spotted a bug - where do I report it?")]),e._v(" "),a("p",[e._v("Let us know by opening an issue in the "),a("code",[e._v("frictionless-py")]),e._v(" repo: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues"),a("OutboundLink")],1),e._v(". For "),a("code",[e._v("tabulator")]),e._v("/"),a("code",[e._v("tableschema")]),e._v("/"),a("code",[e._v("datapackage")]),e._v(" issues, please use the corresponding issue tracker and we will triage it for you. Thanks!")]),e._v(" "),a("h3",{attrs:{id:"i-have-a-question-where-do-i-get-help"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#i-have-a-question-where-do-i-get-help"}},[e._v("#")]),e._v(" I have a question - where do I get help?")]),e._v(" "),a("p",[e._v("You can ask us questions in our Discord chat and someone from the main developer team or from the community will help you. Here is an invitation link: "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://discord.com/invite/j9DNFNw"),a("OutboundLink")],1),e._v(". 
We also have a Twitter account "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("(@frictionlessd8a)"),a("OutboundLink")],1),e._v(" and community calls where you can come meet the team and ask questions: "),a("a",{attrs:{href:"http://frictionlessdata.io/events/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://frictionlessdata.io/events/"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h3",{attrs:{id:"i-want-to-help-how-do-i-contribute"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#i-want-to-help-how-do-i-contribute"}},[e._v("#")]),e._v(" I want to help - how do I contribute?")]),e._v(" "),a("p",[e._v("Amazing, thank you! We always welcome community contributions. Start here ("),a("a",{attrs:{href:"https://frictionlessdata.io/contribute/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/contribute/"),a("OutboundLink")],1),e._v(") and here ("),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/CONTRIBUTING.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/blob/master/CONTRIBUTING.md"),a("OutboundLink")],1),e._v(") and you can also reach out to Evgeny (@roll) or Lilly (@lwinfree) on GitHub if you need help.")]),e._v(" "),a("h3",{attrs:{id:"additional-links-resources"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#additional-links-resources"}},[e._v("#")]),e._v(" Additional Links/Resources")]),e._v(" "),a("ul",[a("li",[e._v("Intro to "),a("code",[e._v("frictionless-py")]),e._v(" video: "),a("a",{attrs:{href:"https://youtu.be/VPnC8cc6ly0",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://youtu.be/VPnC8cc6ly0"),a("OutboundLink")],1)]),e._v(" "),a("li",[a("code",[e._v("frictionless-py")]),e._v(" repository: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Frictionless Data website: "),a("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/"),a("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[109],{643:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("h2",{attrs:{id:"frictionless-framework"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-framework"}},[e._v("#")]),e._v(" Frictionless Framework")]),e._v(" "),a("p",[e._v("We are excited to announce our new high-level Python framework, frictionless-py: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py"),a("OutboundLink")],1),e._v(". Frictionless-py was created to simplify overall user-experience for working with Frictionless Data in Python. It provides several high-level improvements in addition to many low-level fixes. Read more details below, or watch this intro video by Frictionless developer Evgeny: "),a("a",{attrs:{href:"https://youtu.be/VPnC8cc6ly0",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://youtu.be/VPnC8cc6ly0"),a("OutboundLink")],1)]),e._v(" "),a("h2",{attrs:{id:"why-did-we-write-new-python-code"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-did-we-write-new-python-code"}},[e._v("#")]),e._v(" Why did we write new Python code?")]),e._v(" "),a("p",[e._v("Frictionless Data has been in development for almost a decade, with global users and projects spanning domains from science to government to finance. 
However, our main Python libraries ("),a("code",[e._v("datapackage")]),e._v(","),a("code",[e._v("goodtables")]),e._v(", "),a("code",[e._v("tableschema")]),e._v(","),a("code",[e._v("tabulator")]),e._v(") were originally built with some inconsistencies that have confused users over the years. We had started redoing our documentation for our existing code, and realized we had a larger issue on our hands - mainly that the disparate Python libraries had overlapping functionalities and we were not able to clearly articulate how they all fit together to form a bigger picture. We realized that overall, the existing user experience was not where we wanted it to be. Evgeny, the Frictionless Data technical lead developer, had been thinking about ways to improve the Python code for a while, and the outcome of that work is "),a("code",[e._v("frictionless-py")]),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"what-happens-to-the-old-python-code-datapackage-py-goodtables-py-tableschema-py-tabulator-py-how-does-this-affect-current-users"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-happens-to-the-old-python-code-datapackage-py-goodtables-py-tableschema-py-tabulator-py-how-does-this-affect-current-users"}},[e._v("#")]),e._v(" What happens to the old Python code ("),a("code",[e._v("datapackage-py")]),e._v(", "),a("code",[e._v("goodtables-py")]),e._v(", "),a("code",[e._v("tableschema-py")]),e._v(", "),a("code",[e._v("tabulator-py")]),e._v(")? 
How does this affect current users?")]),e._v(" "),a("p",[a("code",[e._v("Datapackage-py")]),e._v(" (see "),a("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py#datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("details"),a("OutboundLink")],1),e._v("), "),a("code",[e._v("tableschema-py")]),e._v(" (see "),a("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-py#tableschema-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("details"),a("OutboundLink")],1),e._v("), "),a("code",[e._v("tabulator-py")]),e._v(" (see "),a("a",{attrs:{href:"https://github.com/frictionlessdata/tabulator-py#tabulator-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("details"),a("OutboundLink")],1),e._v(") still exist, will not be altered, and will be maintained. If your project is using this code, these changes are not breaking and there is no action you need to take at this point. However, we will be focusing new development on "),a("code",[e._v("frictionless-py")]),e._v(", and encourage you to consider starting to experiment with or work with "),a("code",[e._v("frictionless-py")]),e._v(" during the last months of 2020 and migrate to it starting from 2021 "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/docs/target/migration-guide/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("(here is our migration guide)"),a("OutboundLink")],1),e._v(". The one important thing to note is that "),a("code",[e._v("goodtables-py")]),e._v(" has been subsumed by "),a("code",[e._v("frictionless-py")]),e._v(" (since version 3 of Goodtables). 
We will continue to bug-fix "),a("code",[e._v("goodtables@2.x")]),e._v(" in "),a("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py/tree/goodtables",target:"_blank",rel:"noopener noreferrer"}},[e._v("this branch"),a("OutboundLink")],1),e._v(" and it is also still available on "),a("a",{attrs:{href:"https://pypi.org/project/goodtables/",target:"_blank",rel:"noopener noreferrer"}},[e._v("PyPi"),a("OutboundLink")],1),e._v(" as it was before. Please note that "),a("code",[e._v("frictionless@3.x")]),e._v(" version’s API is not stable as we are continuing to work on it at the moment. We will release "),a("code",[e._v("frictionless@4.x")]),e._v(" by the end of 2020 to be the first SemVer/stable version.")]),e._v(" "),a("h2",{attrs:{id:"what-does-frictionless-py-do"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-does-frictionless-py-do"}},[e._v("#")]),e._v(" What does "),a("code",[e._v("frictionless-py")]),e._v(" do?")]),e._v(" "),a("p",[a("code",[e._v("Frictionless-py")]),e._v(" has four main functions for working with data: describe, extract, validate, and transform. These are inspired by typical data analysis and data management methods.")]),e._v(" "),a("p",[a("em",[e._v("Describe your data")]),e._v(": You can infer, edit and save metadata of your data tables. This is a first step for ensuring data quality and usability. Frictionless metadata includes general information about your data like textual description, as well as field types and other tabular data details.")]),e._v(" "),a("p",[a("em",[e._v("Extract your data")]),e._v(": You can read your data using a unified tabular interface. Data quality and consistency are guaranteed by a schema. Frictionless supports various file protocols like HTTP, FTP, and S3 and data formats like CSV, XLS, JSON, SQL, and others.")]),e._v(" "),a("p",[a("em",[e._v("Validate your data")]),e._v(": You can validate data tables, resources, and datasets. 
Frictionless generates a unified validation report, as well as supports a lot of options to customize the validation process.")]),e._v(" "),a("p",[a("em",[e._v("Transform your data")]),e._v(": You can clean, reshape, and transfer your data tables and datasets. Frictionless provides a pipeline capability and a lower-level interface to work with the data.")]),e._v(" "),a("p",[e._v("Additional features:")]),e._v(" "),a("ul",[a("li",[e._v("Powerful Python framework")]),e._v(" "),a("li",[e._v("Convenient command-line interface")]),e._v(" "),a("li",[e._v("Low memory consumption for data of any size")]),e._v(" "),a("li",[e._v("Reasonable performance on big data")]),e._v(" "),a("li",[e._v("Support for compressed files")]),e._v(" "),a("li",[e._v("Custom checks and formats")]),e._v(" "),a("li",[e._v("Fully pluggable architecture")]),e._v(" "),a("li",[e._v("The included API server")]),e._v(" "),a("li",[e._v("More than 1000+ tests")])]),e._v(" "),a("h2",{attrs:{id:"how-can-users-get-started"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-users-get-started"}},[e._v("#")]),e._v(" How can users get started?")]),e._v(" "),a("p",[e._v("We recommend that you begin by reading the "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1VyDx6C3pxF3Vab8MxH_sI86OTSNmYuDJ",target:"_blank",rel:"noopener noreferrer"}},[e._v("Getting Started Guide"),a("OutboundLink")],1),e._v(" and the "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1HGXJa7BWyEgoGZLkC6tKt2DMqgeHibEY",target:"_blank",rel:"noopener noreferrer"}},[e._v("Introduction Guide"),a("OutboundLink")],1),e._v(". 
We also have in depth documentation for "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1eIq1ZTUntJplRxkGHxmqlxZ0zyXCm0wU",target:"_blank",rel:"noopener noreferrer"}},[e._v("Describing Data"),a("OutboundLink")],1),e._v(", "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1is_PcpzFl42aWI2B2tHaBGj3jxsKZ_eZ",target:"_blank",rel:"noopener noreferrer"}},[e._v("Extracting Data"),a("OutboundLink")],1),e._v(", "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1cJSZlG_v6OI3I2FtnXdKOSPjhwZNjMK1",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validating Data"),a("OutboundLink")],1),e._v(", and "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1C4dFWDExyxzGIwLUovrDQZghZK4JK2PD",target:"_blank",rel:"noopener noreferrer"}},[e._v("Transforming Data"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"how-can-you-give-us-feedback"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-you-give-us-feedback"}},[e._v("#")]),e._v(" How can you give us feedback?")]),e._v(" "),a("p",[e._v("What do you think? Let us know your thoughts, suggestions, or issues by joining us in our community chat on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or by opening an issue in the "),a("code",[e._v("frictionless-py")]),e._v(" repo: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"faq-s"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#faq-s"}},[e._v("#")]),e._v(" FAQ’s")]),e._v(" "),a("h3",{attrs:{id:"where-s-the-documentation"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#where-s-the-documentation"}},[e._v("#")]),e._v(" Where’s the documentation?")]),e._v(" "),a("p",[e._v("Are you a new user? 
Start here: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/docs/target/getting-started/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("Getting Started"),a("OutboundLink")],1),e._v(" & "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/docs/target/introduction-guide/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("Introduction Guide"),a("OutboundLink")],1),a("br"),e._v("\nAre you an existing user? Start here: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/docs/target/migration-guide/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("Migration Guide"),a("OutboundLink")],1),a("br"),e._v("\nThe full list of documentation can be found here: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py#documentation",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py#documentation"),a("OutboundLink")],1)]),e._v(" "),a("h3",{attrs:{id:"what-s-the-difference-between-datapackage-and-frictionless"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-s-the-difference-between-datapackage-and-frictionless"}},[e._v("#")]),e._v(" What’s the difference between "),a("code",[e._v("datapackage")]),e._v(" and "),a("code",[e._v("frictionless")]),e._v("?")]),e._v(" "),a("p",[e._v("In general, "),a("code",[e._v("frictionless")]),e._v(" is our new generation software while "),a("code",[e._v("tabulator")]),e._v("/"),a("code",[e._v("tableschema")]),e._v("/"),a("code",[e._v("datapackage")]),e._v("/"),a("code",[e._v("goodtables")]),e._v(" are our previous generation software. "),a("code",[e._v("Frictionless")]),e._v(" has a lot of improvements over them. 
Please see this issue for the full answer and a code example: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues/428",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues/428"),a("OutboundLink")],1)]),e._v(" "),a("h3",{attrs:{id:"i-ve-spotted-a-bug-where-do-i-report-it"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#i-ve-spotted-a-bug-where-do-i-report-it"}},[e._v("#")]),e._v(" I’ve spotted a bug - where do I report it?")]),e._v(" "),a("p",[e._v("Let us know by opening an issue in the "),a("code",[e._v("frictionless-py")]),e._v(" repo: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues"),a("OutboundLink")],1),e._v(". For "),a("code",[e._v("tabulator")]),e._v("/"),a("code",[e._v("tableschema")]),e._v("/"),a("code",[e._v("datapackage")]),e._v(" issues, please use the corresponding issue tracker and we will triage it for you. Thanks!")]),e._v(" "),a("h3",{attrs:{id:"i-have-a-question-where-do-i-get-help"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#i-have-a-question-where-do-i-get-help"}},[e._v("#")]),e._v(" I have a question - where do I get help?")]),e._v(" "),a("p",[e._v("You can ask us questions in our Discord chat and someone from the main developer team or from the community will help you. Here is an invitation link: "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://discord.com/invite/j9DNFNw"),a("OutboundLink")],1),e._v(". 
We also have a Twitter account "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("(@frictionlessd8a)"),a("OutboundLink")],1),e._v(" and community calls where you can come meet the team and ask questions: "),a("a",{attrs:{href:"http://frictionlessdata.io/events/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://frictionlessdata.io/events/"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h3",{attrs:{id:"i-want-to-help-how-do-i-contribute"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#i-want-to-help-how-do-i-contribute"}},[e._v("#")]),e._v(" I want to help - how do I contribute?")]),e._v(" "),a("p",[e._v("Amazing, thank you! We always welcome community contributions. Start here ("),a("a",{attrs:{href:"https://frictionlessdata.io/contribute/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/contribute/"),a("OutboundLink")],1),e._v(") and here ("),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/blob/master/CONTRIBUTING.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/blob/master/CONTRIBUTING.md"),a("OutboundLink")],1),e._v(") and you can also reach out to Evgeny (@roll) or Lilly (@lwinfree) on GitHub if you need help.")]),e._v(" "),a("h3",{attrs:{id:"additional-links-resources"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#additional-links-resources"}},[e._v("#")]),e._v(" Additional Links/Resources")]),e._v(" "),a("ul",[a("li",[e._v("Intro to "),a("code",[e._v("frictionless-py")]),e._v(" video: "),a("a",{attrs:{href:"https://youtu.be/VPnC8cc6ly0",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://youtu.be/VPnC8cc6ly0"),a("OutboundLink")],1)]),e._v(" "),a("li",[a("code",[e._v("frictionless-py")]),e._v(" repository: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Frictionless Data website: "),a("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/"),a("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/11.cec52a09.js b/assets/js/11.cfbbc0e4.js similarity index 99% rename from assets/js/11.cec52a09.js rename to assets/js/11.cfbbc0e4.js index 9e2ae8712..73303cc9d 100644 --- a/assets/js/11.cec52a09.js +++ b/assets/js/11.cfbbc0e4.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[11],{521:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-1.a484b939.png"},522:function(e,t){e.exports=""},523:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-3.f1a972a0.png"},524:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-4.7862ba68.png"},525:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-5.8895357e.png"},526:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-6.8f8bc957.png"},702:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("em",[e._v("Originally published on: "),o("a",{attrs:{href:"https://www.mysociety.org/2022/09/13/publishing-and-analysing-data-our-workflow/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.mysociety.org/2022/09/13/publishing-and-analysing-data-our-workflow/"),o("OutboundLink")],1)])]),e._v(" "),o("p",[e._v("I recently blogged about the data "),o("a",{attrs:{href:"https://www.mysociety.org/2022/09/13/we-want-you-to-build-on-our-local-climate-data-tell-us-what-you-need/",target:"_blank",rel:"noopener noreferrer"}},[e._v("we’re publishing and making use of in mySociety’s climate programme"),o("OutboundLink")],1),e._v(" (and how we want 
to help people make use of it!). This blog post explores behind the scenes how we’re managing that data, using the GitHub ecosystem and Frictionless Data standards to validate and publish data.")]),e._v(" "),o("h2",{attrs:{id:"how-we-re-handling-common-data-analysis-and-data-publishing-tasks"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#how-we-re-handling-common-data-analysis-and-data-publishing-tasks"}},[e._v("#")]),e._v(" How we’re handling common data analysis and data publishing tasks.")]),e._v(" "),o("p",[e._v("Generally we do all our data analysis in Python and Jupyter notebooks. While we have some analysis using R, we have more Python developers and projects, so this makes it easier for analysis code to be shared and understood between analysis and production projects.")]),e._v(" "),o("p",[e._v("Following the same basic ideas as (and stealing some folder structure from) the "),o("a",{attrs:{href:"https://drivendata.github.io/cookiecutter-data-science/",target:"_blank",rel:"noopener noreferrer"}},[e._v("cookiecutter data science"),o("OutboundLink")],1),e._v(" approach that each small project should live in a separate repository, we have a "),o("a",{attrs:{href:"https://github.com/mysociety/python-data-auto-template",target:"_blank",rel:"noopener noreferrer"}},[e._v("standard repository template"),o("OutboundLink")],1),e._v(" for working with data processing and analysis.")]),e._v(" "),o("p",[e._v("The template defines a folder structure, and standard config files for development in Docker and VS Code. A shared data_common library builds a base Docker image (for faster access to new repos), and common tools and utilities that are shared between projects for dataset management. This includes helpers for managing dataset releases, and for working with our charting theme. 
The use of Docker means that the development environment and the GitHub Actions environment can be kept in sync – and so processes can easily be shifted to a scheduled task as a GitHub Action.")]),e._v(" "),o("p",[e._v("The advantage of this common library approach is that it is easy to update the set of common tools from each new project, but because each project is pegged to a commit of the common library, new projects get the benefit of advances, while old projects do not need to be updated all the time to keep working.")]),e._v(" "),o("p",[e._v("This process can run end-to-end in GitHub – where the repository is created in GitHub, Codespaces can be used for development, automated testing and building happens with GitHub Actions and the data is published through GitHub Pages. The use of GitHub Actions especially means testing and validation of the data can live on Github’s infrastructure, rather than requiring additional work for each small project on our servers.")]),e._v(" "),o("h2",{attrs:{id:"dataset-management"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#dataset-management"}},[e._v("#")]),e._v(" Dataset management")]),e._v(" "),o("p",[e._v("One of the goals of this data management process is to make it easy to take a dataset we’ve built for our purposes, and make it easily accessible for re-use by others.")]),e._v(" "),o("p",[e._v("The data_common library contains a dataset command line tool – which automates the creation of various config files, publishing, and validation of our data.")]),e._v(" "),o("p",[e._v("Rather than reinventing the wheel, we use the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("frictionless data standard"),o("OutboundLink")],1),e._v(" as a way of describing the data. 
A repo will hold one or more "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data packages"),o("OutboundLink")],1),e._v(", which are a collection of "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data resources"),o("OutboundLink")],1),e._v(" (generally a CSV table). The dataset tool detects changes to the data resources, and updates the config files. Changes between config files can then be used for automated version changes.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(521),alt:"mysociety-img-1"}})]),e._v(" "),o("h2",{attrs:{id:"data-integrity"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#data-integrity"}},[e._v("#")]),e._v(" Data integrity")]),e._v(" "),o("p",[e._v("Leaning on the frictionless standard for basic validation that the structure is right, we use "),o("a",{attrs:{href:"https://docs.pytest.org/en/7.1.x/",target:"_blank",rel:"noopener noreferrer"}},[e._v("pytest"),o("OutboundLink")],1),e._v(" to run additional tests on the data itself. This means we define a set of rules that the dataset should pass (eg ‘all cells in this column contain a value’), and if it doesn’t, the dataset will not validate and will fail to build.")]),e._v(" "),o("p",[e._v("This is especially important because we have datasets that are fed by automated processes, read external Google Sheets, or accept input from other organisations. 
The "),o("a",{attrs:{href:"https://mysociety.github.io/uk_local_authority_names_and_codes/",target:"_blank",rel:"noopener noreferrer"}},[e._v("local authority codes dataset"),o("OutboundLink")],1),e._v(" has "),o("a",{attrs:{href:"https://github.com/mysociety/uk_local_authority_names_and_codes/tree/main/tests",target:"_blank",rel:"noopener noreferrer"}},[e._v("a number of tests"),o("OutboundLink")],1),e._v(" to check authorities haven’t been unexpectedly deleted, that the start date and end dates make sense, and that only certain kinds of authorities can be designated as the county council or combined authority overlapping with a different authority. This means that when someone submits a change to the source dataset, we can have a certain amount of faith that the dataset is being improved because the automated testing is checking that nothing is obviously broken.")]),e._v(" "),o("p",[e._v("The automated versioning approach means the defined structure of a resource is also a form of automated testing. Generally following the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/patterns/#data-package-version",target:"_blank",rel:"noopener noreferrer"}},[e._v("semver rules for frictionless data"),o("OutboundLink")],1),e._v(" (exception that adding a new column after the last column is not a major change), the dataset tool will try and determine if a change from the previous version is a MAJOR (backward compatibility breaking), MINOR (new resource, row or column), or PATCH (correcting errors) change. Generally, we want to avoid major changes, and the automated action will throw an error if this happens. If a major change is required, this can be done manually. 
The fact that external users of the file can peg their usage to a particular major version means that changes can be made knowing nothing is immediately going to break (even if data may become more stale in the long run).")]),e._v(" "),o("p",[o("img",{attrs:{src:a(522),alt:"mysociety-img-2"}})]),e._v(" "),o("h2",{attrs:{id:"data-publishing-and-accessibility"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#data-publishing-and-accessibility"}},[e._v("#")]),e._v(" Data publishing and accessibility")]),e._v(" "),o("p",[e._v("The frictionless standard allows an optional description for each data column. We make this required, so that each column needs to have been given a human readable description for the dataset to validate successfully. Internally, this is useful as enforcing documentation (and making sure you really understand what units a column is in), and means that it is much easier for external users to understand what is going on.")]),e._v(" "),o("p",[e._v("Previously, we were uploading the CSVs to GitHub repositories and leaving it as that – but GitHub isn’t friendly to non-developers, and clicking a CSV file opens it up in the browser rather than downloading it.")]),e._v(" "),o("p",[e._v("To help make data more accessible, we now publish a small GitHub Pages site for each repo, which allows small static sites to be built from the contents of a repository (the "),o("a",{attrs:{href:"https://everypolitician.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("EveryPolitician project"),o("OutboundLink")],1),e._v(" also used this approach). 
This means we can have fuller documentation of the data, better analytics on access, sign-posting to surveys, and better sign-posted links to downloading multiple versions of the data.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(523),alt:"mysociety-img-3"}})]),e._v(" "),o("p",[e._v("The automated deployment means we can also very easily create Excel files that packages together all resources in a package into the same file, and include the meta-data information about the dataset, as well as information about how they can tell us about how they’re using it.")]),e._v(" "),o("p",[e._v("Publishing in an Excel format acknowledges a practical reality that lots of people work in Excel. CSVs don’t always load nicely in Excel, and since Excel files can contain multiple sheets, we can add a cover page that makes it easier to use and understand our data by packaging all the explanations inside the file. We still produce both CSVs and XLSX files – and can now do so with very little work.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(524),alt:"mysociety-img-4"}})]),e._v(" "),o("p",[e._v("For developers who are interested in making automated use of the data, we also provide "),o("a",{attrs:{href:"https://github.com/mysociety/mysoc-dataset",target:"_blank",rel:"noopener noreferrer"}},[e._v("a small package"),o("OutboundLink")],1),e._v(" that can be used in Python or as a CLI tool to fetch the data, and instructions on the download page on "),o("a",{attrs:{href:"https://mysociety.github.io/composite_uk_imd/downloads/uk_index_xlsx/latest",target:"_blank",rel:"noopener noreferrer"}},[e._v("how to use it"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[o("img",{attrs:{src:a(525),alt:"mysociety-img-5"}})]),e._v(" "),o("p",[e._v("At mySociety Towers, we’re fans of "),o("a",{attrs:{href:"https://datasette.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datasette"),o("OutboundLink")],1),e._v(", a tool for exploring datasets. 
Simon Willison recently released "),o("a",{attrs:{href:"https://github.com/simonw/datasette-lite",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datasette Lite"),o("OutboundLink")],1),e._v(", a version that runs entirely in the browser. That means that just by publishing our data as a SQLite file, we can add a link so that people can explore a dataset without leaving the browser. You can even create shareable links for queries: for example, "),o("a",{attrs:{href:"https://lite.datasette.io/?url=https://mysociety.github.io/uk_local_authority_names_and_codes/data/uk_la_past_current/latest/uk_la_past_current.sqlite#/uk_la_past_current/uk_local_authorities_current?_facet=region&region=Scotland",target:"_blank",rel:"noopener noreferrer"}},[e._v("all current local authorities in Scotland"),o("OutboundLink")],1),e._v(", or "),o("a",{attrs:{href:"https://lite.datasette.io/?url=https://mysociety.github.io/composite_uk_imd/data/uk_index/latest/uk_index.sqlite#/uk_index/la_labels?_sort=label&_facet=label&label=1st+IMD+quintile",target:"_blank",rel:"noopener noreferrer"}},[e._v("local authorities in the most deprived quintile"),o("OutboundLink")],1),e._v(". This lets us do some very rapid prototyping of what a data service might look like, just by packaging up some of the data using our new approach.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(526),alt:"mysociety-img-6"}})]),e._v(" "),o("h2",{attrs:{id:"data-analysis"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#data-analysis"}},[e._v("#")]),e._v(" Data analysis")]),e._v(" "),o("p",[e._v("Something in use in a few of our repos is the ability to automatically deploy analysis of the dataset when it is updated.")]),e._v(" "),o("p",[e._v("Analysis of the dataset can be designed in a Jupyter notebook (including tables and charts) – and this can be re-run and published on the same GitHub Pages deploy as the data itself. 
For instance, the "),o("a",{attrs:{href:"https://mysociety.github.io/uk_ruc/",target:"_blank",rel:"noopener noreferrer"}},[e._v("UK Composite Rural Urban Classification"),o("OutboundLink")],1),e._v(" produces "),o("a",{attrs:{href:"https://mysociety.github.io/uk_ruc/analysis/background_and_analysis.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("this analysis"),o("OutboundLink")],1),e._v(". For the moment, this is just replacing previous automatic README creation – but in principle makes it easy for us to create simple, self-updating public charts and analysis of whatever we like.")]),e._v(" "),o("p",[e._v("Bringing it all back together and keeping people to up to date with changes")]),e._v(" "),o("p",[e._v("The one downside of all these datasets living in different repositories is making them easy to discover. To help out with this, we add all data packages to our "),o("a",{attrs:{href:"https://data.mysociety.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.mysociety.org"),o("OutboundLink")],1),e._v(" catalogue (itself a Jekyll site that updates via GitHub Actions) and have started a lightweight "),o("a",{attrs:{href:"https://data.mysociety.org/newsletter",target:"_blank",rel:"noopener noreferrer"}},[e._v("data announcement email list"),o("OutboundLink")],1),e._v(". 
If you have got this far, and want to see more of our data in future – "),o("a",{attrs:{href:"https://data.mysociety.org/newsletter",target:"_blank",rel:"noopener noreferrer"}},[e._v("sign up"),o("OutboundLink")],1),e._v("!")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[11],{521:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-1.a484b939.png"},522:function(e,t){e.exports=""},523:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-3.f1a972a0.png"},524:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-4.7862ba68.png"},525:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-5.8895357e.png"},526:function(e,t,a){e.exports=a.p+"assets/img/mysociety-img-6.8f8bc957.png"},703:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("em",[e._v("Originally published on: "),o("a",{attrs:{href:"https://www.mysociety.org/2022/09/13/publishing-and-analysing-data-our-workflow/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.mysociety.org/2022/09/13/publishing-and-analysing-data-our-workflow/"),o("OutboundLink")],1)])]),e._v(" "),o("p",[e._v("I recently blogged about the data "),o("a",{attrs:{href:"https://www.mysociety.org/2022/09/13/we-want-you-to-build-on-our-local-climate-data-tell-us-what-you-need/",target:"_blank",rel:"noopener noreferrer"}},[e._v("we’re publishing and making use of in mySociety’s climate programme"),o("OutboundLink")],1),e._v(" (and how we want to help people make use of it!). 
This blog post explores behind the scenes how we’re managing that data, using the GitHub ecosystem and Frictionless Data standards to validate and publish data.")]),e._v(" "),o("h2",{attrs:{id:"how-we-re-handling-common-data-analysis-and-data-publishing-tasks"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#how-we-re-handling-common-data-analysis-and-data-publishing-tasks"}},[e._v("#")]),e._v(" How we’re handling common data analysis and data publishing tasks.")]),e._v(" "),o("p",[e._v("Generally we do all our data analysis in Python and Jupyter notebooks. While we have some analysis using R, we have more Python developers and projects, so this makes it easier for analysis code to be shared and understood between analysis and production projects.")]),e._v(" "),o("p",[e._v("Following the same basic ideas as (and stealing some folder structure from) the "),o("a",{attrs:{href:"https://drivendata.github.io/cookiecutter-data-science/",target:"_blank",rel:"noopener noreferrer"}},[e._v("cookiecutter data science"),o("OutboundLink")],1),e._v(" approach that each small project should live in a separate repository, we have a "),o("a",{attrs:{href:"https://github.com/mysociety/python-data-auto-template",target:"_blank",rel:"noopener noreferrer"}},[e._v("standard repository template"),o("OutboundLink")],1),e._v(" for working with data processing and analysis.")]),e._v(" "),o("p",[e._v("The template defines a folder structure, and standard config files for development in Docker and VS Code. A shared data_common library builds a base Docker image (for faster access to new repos), and common tools and utilities that are shared between projects for dataset management. This includes helpers for managing dataset releases, and for working with our charting theme. 
The use of Docker means that the development environment and the GitHub Actions environment can be kept in sync – and so processes can easily be shifted to a scheduled task as a GitHub Action.")]),e._v(" "),o("p",[e._v("The advantage of this common library approach is that it is easy to update the set of common tools from each new project, but because each project is pegged to a commit of the common library, new projects get the benefit of advances, while old projects do not need to be updated all the time to keep working.")]),e._v(" "),o("p",[e._v("This process can run end-to-end in GitHub – where the repository is created in GitHub, Codespaces can be used for development, automated testing and building happens with GitHub Actions and the data is published through GitHub Pages. The use of GitHub Actions especially means testing and validation of the data can live on GitHub’s infrastructure, rather than requiring additional work for each small project on our servers.")]),e._v(" "),o("h2",{attrs:{id:"dataset-management"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#dataset-management"}},[e._v("#")]),e._v(" Dataset management")]),e._v(" "),o("p",[e._v("One of the goals of this data management process is to make it easy to take a dataset we’ve built for our purposes, and make it easily accessible for re-use by others.")]),e._v(" "),o("p",[e._v("The data_common library contains a dataset command line tool – which automates the creation of various config files, publishing, and validation of our data.")]),e._v(" "),o("p",[e._v("Rather than reinventing the wheel, we use the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("frictionless data standard"),o("OutboundLink")],1),e._v(" as a way of describing the data. 
A repo will hold one or more "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data packages"),o("OutboundLink")],1),e._v(", which are a collection of "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data resources"),o("OutboundLink")],1),e._v(" (generally a CSV table). The dataset tool detects changes to the data resources, and updates the config files. Changes between config files can then be used for automated version changes.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(521),alt:"mysociety-img-1"}})]),e._v(" "),o("h2",{attrs:{id:"data-integrity"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#data-integrity"}},[e._v("#")]),e._v(" Data integrity")]),e._v(" "),o("p",[e._v("Leaning on the frictionless standard for basic validation that the structure is right, we use "),o("a",{attrs:{href:"https://docs.pytest.org/en/7.1.x/",target:"_blank",rel:"noopener noreferrer"}},[e._v("pytest"),o("OutboundLink")],1),e._v(" to run additional tests on the data itself. This means we define a set of rules that the dataset should pass (eg ‘all cells in this column contain a value’), and if it doesn’t, the dataset will not validate and will fail to build.")]),e._v(" "),o("p",[e._v("This is especially important because we have datasets that are fed by automated processes, read external Google Sheets, or accept input from other organisations. 
The "),o("a",{attrs:{href:"https://mysociety.github.io/uk_local_authority_names_and_codes/",target:"_blank",rel:"noopener noreferrer"}},[e._v("local authority codes dataset"),o("OutboundLink")],1),e._v(" has "),o("a",{attrs:{href:"https://github.com/mysociety/uk_local_authority_names_and_codes/tree/main/tests",target:"_blank",rel:"noopener noreferrer"}},[e._v("a number of tests"),o("OutboundLink")],1),e._v(" to check authorities haven’t been unexpectedly deleted, that the start date and end dates make sense, and that only certain kinds of authorities can be designated as the county council or combined authority overlapping with a different authority. This means that when someone submits a change to the source dataset, we can have a certain amount of faith that the dataset is being improved because the automated testing is checking that nothing is obviously broken.")]),e._v(" "),o("p",[e._v("The automated versioning approach means the defined structure of a resource is also a form of automated testing. Generally following the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/patterns/#data-package-version",target:"_blank",rel:"noopener noreferrer"}},[e._v("semver rules for frictionless data"),o("OutboundLink")],1),e._v(" (exception that adding a new column after the last column is not a major change), the dataset tool will try and determine if a change from the previous version is a MAJOR (backward compatibility breaking), MINOR (new resource, row or column), or PATCH (correcting errors) change. Generally, we want to avoid major changes, and the automated action will throw an error if this happens. If a major change is required, this can be done manually. 
The fact that external users of the file can peg their usage to a particular major version means that changes can be made knowing nothing is immediately going to break (even if data may become more stale in the long run).")]),e._v(" "),o("p",[o("img",{attrs:{src:a(522),alt:"mysociety-img-2"}})]),e._v(" "),o("h2",{attrs:{id:"data-publishing-and-accessibility"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#data-publishing-and-accessibility"}},[e._v("#")]),e._v(" Data publishing and accessibility")]),e._v(" "),o("p",[e._v("The frictionless standard allows an optional description for each data column. We make this required, so that each column needs to have been given a human readable description for the dataset to validate successfully. Internally, this is useful as enforcing documentation (and making sure you really understand what units a column is in), and means that it is much easier for external users to understand what is going on.")]),e._v(" "),o("p",[e._v("Previously, we were uploading the CSVs to GitHub repositories and leaving it as that – but GitHub isn’t friendly to non-developers, and clicking a CSV file opens it up in the browser rather than downloading it.")]),e._v(" "),o("p",[e._v("To help make data more accessible, we now publish a small GitHub Pages site for each repo, which allows small static sites to be built from the contents of a repository (the "),o("a",{attrs:{href:"https://everypolitician.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("EveryPolitician project"),o("OutboundLink")],1),e._v(" also used this approach). 
This means we can have fuller documentation of the data, better analytics on access, sign-posting to surveys, and better sign-posted links to downloading multiple versions of the data.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(523),alt:"mysociety-img-3"}})]),e._v(" "),o("p",[e._v("The automated deployment means we can also very easily create Excel files that packages together all resources in a package into the same file, and include the meta-data information about the dataset, as well as information about how they can tell us about how they’re using it.")]),e._v(" "),o("p",[e._v("Publishing in an Excel format acknowledges a practical reality that lots of people work in Excel. CSVs don’t always load nicely in Excel, and since Excel files can contain multiple sheets, we can add a cover page that makes it easier to use and understand our data by packaging all the explanations inside the file. We still produce both CSVs and XLSX files – and can now do so with very little work.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(524),alt:"mysociety-img-4"}})]),e._v(" "),o("p",[e._v("For developers who are interested in making automated use of the data, we also provide "),o("a",{attrs:{href:"https://github.com/mysociety/mysoc-dataset",target:"_blank",rel:"noopener noreferrer"}},[e._v("a small package"),o("OutboundLink")],1),e._v(" that can be used in Python or as a CLI tool to fetch the data, and instructions on the download page on "),o("a",{attrs:{href:"https://mysociety.github.io/composite_uk_imd/downloads/uk_index_xlsx/latest",target:"_blank",rel:"noopener noreferrer"}},[e._v("how to use it"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[o("img",{attrs:{src:a(525),alt:"mysociety-img-5"}})]),e._v(" "),o("p",[e._v("At mySociety Towers, we’re fans of "),o("a",{attrs:{href:"https://datasette.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datasette"),o("OutboundLink")],1),e._v(", a tool for exploring datasets. 
Simon Willison recently released "),o("a",{attrs:{href:"https://github.com/simonw/datasette-lite",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datasette Lite"),o("OutboundLink")],1),e._v(", a version that runs entirely in the browser. That means that just by publishing our data as a SQLite file, we can add a link so that people can explore a dataset without leaving the browser. You can even create shareable links for queries: for example, "),o("a",{attrs:{href:"https://lite.datasette.io/?url=https://mysociety.github.io/uk_local_authority_names_and_codes/data/uk_la_past_current/latest/uk_la_past_current.sqlite#/uk_la_past_current/uk_local_authorities_current?_facet=region&region=Scotland",target:"_blank",rel:"noopener noreferrer"}},[e._v("all current local authorities in Scotland"),o("OutboundLink")],1),e._v(", or "),o("a",{attrs:{href:"https://lite.datasette.io/?url=https://mysociety.github.io/composite_uk_imd/data/uk_index/latest/uk_index.sqlite#/uk_index/la_labels?_sort=label&_facet=label&label=1st+IMD+quintile",target:"_blank",rel:"noopener noreferrer"}},[e._v("local authorities in the most deprived quintile"),o("OutboundLink")],1),e._v(". This lets us do some very rapid prototyping of what a data service might look like, just by packaging up some of the data using our new approach.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(526),alt:"mysociety-img-6"}})]),e._v(" "),o("h2",{attrs:{id:"data-analysis"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#data-analysis"}},[e._v("#")]),e._v(" Data analysis")]),e._v(" "),o("p",[e._v("Something in use in a few of our repos is the ability to automatically deploy analysis of the dataset when it is updated.")]),e._v(" "),o("p",[e._v("Analysis of the dataset can be designed in a Jupyter notebook (including tables and charts) – and this can be re-run and published on the same GitHub Pages deploy as the data itself. 
For instance, the "),o("a",{attrs:{href:"https://mysociety.github.io/uk_ruc/",target:"_blank",rel:"noopener noreferrer"}},[e._v("UK Composite Rural Urban Classification"),o("OutboundLink")],1),e._v(" produces "),o("a",{attrs:{href:"https://mysociety.github.io/uk_ruc/analysis/background_and_analysis.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("this analysis"),o("OutboundLink")],1),e._v(". For the moment, this is just replacing previous automatic README creation – but in principle it makes it easy for us to create simple, self-updating public charts and analysis of whatever we like.")]),e._v(" "),o("p",[e._v("Bringing it all back together and keeping people up to date with changes")]),e._v(" "),o("p",[e._v("The one downside of all these datasets living in different repositories is that they become harder to discover. To help out with this, we add all data packages to our "),o("a",{attrs:{href:"https://data.mysociety.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.mysociety.org"),o("OutboundLink")],1),e._v(" catalogue (itself a Jekyll site that updates via GitHub Actions) and have started a lightweight "),o("a",{attrs:{href:"https://data.mysociety.org/newsletter",target:"_blank",rel:"noopener noreferrer"}},[e._v("data announcement email list"),o("OutboundLink")],1),e._v(". 
If you have got this far, and want to see more of our data in future – "),o("a",{attrs:{href:"https://data.mysociety.org/newsletter",target:"_blank",rel:"noopener noreferrer"}},[e._v("sign up"),o("OutboundLink")],1),e._v("!")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/110.57ecfd91.js b/assets/js/110.a47179af.js similarity index 99% rename from assets/js/110.57ecfd91.js rename to assets/js/110.a47179af.js index 0f51d016a..2cc1357a5 100644 --- a/assets/js/110.57ecfd91.js +++ b/assets/js/110.a47179af.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[110],{643:function(e,t,a){"use strict";a.r(t);var n=a(29),r=Object(n.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("The theme of this year’s "),a("a",{attrs:{href:"http://www.openaccessweek.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Access Week"),a("OutboundLink")],1),e._v(" is “Open with Purpose: Taking Action to Build Structural Equity and Inclusion”. How can we be more purposeful in the open space? How can we work towards true equity and inclusion? The following blog is a compilation of the Fellows’ thoughts and reflections on this theme.")]),e._v(" "),a("h3",{attrs:{id:"katerina"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#katerina"}},[e._v("#")]),e._v(" Katerina")]),e._v(" "),a("p",[e._v("When I read this year’s theme I wondered how I could relate to it. Inclusion was the word that made it for me. At first I thought about how the Fellowship itself was inclusive for me, a person with a humanities background that had not had the chance to receive any institutional or structured support when it comes to programming and data management. Afterwards, what came to mind is how inclusive are the things I’m currently learning on the programme with regard with the populations I work in my clinical role. 
Cognitive accessibility is an effort to make online content more accessible to persons with overall cognitive difficulties, that is difficulties with memory, attention, and language. These are not rare difficulties, as they characterize individuals with learning difficulties (developmental language disorder, dyslexia), autism spectrum disorder, attention deficit-hyperactivity disorder (ADHD), dementia, aphasia and other cognitive difficulties following a stroke. I discovered a lot of initiatives and guidelines on how online content could be more accessible: using alternatives to text, such as figures, audio, or a simpler layout, making content appear in predictable ways, giving more time to individuals to interact with the content, focusing on readability of the content among others. In sum, many individuals among us have difficulties accessing online content in an optimal way. More information about what we can do about it "),a("a",{attrs:{href:"https://developer.mozilla.org/en-US/docs/Web/Accessibility/Cognitive_accessibility#WCAG_Guidelines",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://www.w3.org/WAI/cognitive/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h3",{attrs:{id:"dani"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dani"}},[e._v("#")]),e._v(" Dani")]),e._v(" "),a("p",[e._v("Once again, we see academia and the overall scientific research environment engaged in a discussion about who should bear the costs of scientific publications. Few have welcomed with open arms the "),a("a",{attrs:{href:"https://www.nature.com/articles/d41586-020-02959-1",target:"_blank",rel:"noopener noreferrer"}},[e._v("new agreement"),a("OutboundLink")],1),e._v(" between a few German institutions and the Nature Publishing group. 
The obvious gap between what the prestigious publishing group demands and what researchers can afford has turned the news into some sort of bad joke. However, it seems that many have by now accepted other, relatively cheaper Open Access publishing arrangements. At least, relatively cheaper for them. Research funding is nowadays so scarce and precarious in many countries that a simple article processing charge of 1200€ will prevent researchers from submitting to such a journal. No doubt there is good will in those who fight to make the current publishing model more open. However, I can’t help but feel there is a lack of awareness of the financial gap involved in setting an acceptable threshold for article processing charges that are based on the standards of the world’s major economic powers.")]),e._v(" "),a("h3",{attrs:{id:"sam"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#sam"}},[e._v("#")]),e._v(" Sam")]),e._v(" "),a("p",[e._v("Libraries spend an enormous amount of money paying journal subscription fees in order to give their patrons access to cutting-edge research. Imagine a world in which paywalls are a thing of the past and these thousands of dollars currently reserved at every library for journal subscription costs could be redistributed. Librarians need to support Open Access and to publicly reject the current systems in place that restrict access to information for the majority of the global community. Librarians should stop and ask themselves, what are the long-term effects of supporting the current system? What historic injustices are being perpetuated by paying for standard subscription-based journals? 
If librarianship is based upon providing equitable service to all information users, supporting Open Access is a necessity.")]),e._v(" "),a("h3",{attrs:{id:"anne"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#anne"}},[e._v("#")]),e._v(" Anne")]),e._v(" "),a("p",[e._v("My colleagues and I have been having interesting discussions about what Open Access means in the context of our respective disciplines, and so many of them have boiled down to funding models, and how to make sure that the (financial) incentives are in the right place. So when I approached these questions of structural equity and inclusion, I wondered how we can balance the ideals of open access that allow for creative collaboration, open knowledge, and more equitable contributions (all things that brought us all together at OKF) with the necessary requirements of funding and the pressure to publish. In my own discipline, these "),a("a",{attrs:{href:"https://savageminds.org/2015/05/27/open-access-what-cultural-anthropology-gets-right-and-american-anthropologist-gets-wrong/",target:"_blank",rel:"noopener noreferrer"}},[e._v("debates have been happening for a long time"),a("OutboundLink")],1),e._v(", and were recently brought to light because of an experimental Open Access journal called "),a("a",{attrs:{href:"https://anthrodendum.org/2018/06/13/hau-is-dead-long-live-oa-initiatives/",target:"_blank",rel:"noopener noreferrer"}},[e._v("HAU"),a("OutboundLink")],1),e._v(", which was founded by the late "),a("a",{attrs:{href:"https://www.nytimes.com/2020/09/04/books/david-graeber-dead.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("David Graeber"),a("OutboundLink")],1),e._v(". Furthermore, as a journalist, I tend not to equate open access with accessibility more generally, because making something available or open doesn’t necessarily mean that it will be used (let alone understood by a wider audience!). 
This is the integral role that journalism can play within the open access academic community, in my view: through increased data literacy, visualisation tools, and what I call “translation through storytelling”. This is what drew me to #dataviz, and why I’m creating interactive visualisations of human rights data from the United Nations with OKF. While the Universal Periodic Review is well-known for being one of the most inclusive and equitable venues at the UN, few know about it outside of Geneva. So as Open Access Week comes to a close, I’ve been starting to re-think the movement as “open, accessible, fundable, and understandable”. Maybe it’s not as catchy, but it’s what I hope to embody!")]),e._v(" "),a("h3",{attrs:{id:"ritwik"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#ritwik"}},[e._v("#")]),e._v(" Ritwik")]),e._v(" "),a("p",[e._v("Whenever I see terms such as ‘Open access’ and ‘Open Science’, I usually think about how we can make changes to the current research environment so as to extract meaning from open research space and allow people to learn more about this and move beyond conventional ‘Research Journals’. One of the ways we can empower structural and racial equity in research is by investing in Open Science infrastructures and services and capacity building for Open Science by including Open translation services and tools like GitHub to lower the language barrier. Not every potential reader of openly available science is fluent in English and automatic translation is not always correct, but mere information translations can still convey the overall meaning. We can take help from open source development programs to empower organisations like CC Extractor and other free local translation software so we can include languages like Spanish, Italian, Hindi, Japanese and other native languages so that everyone is able to break those barriers and understand literature promoted in different languages. 
Similarly provide sustainable funding mechanisms and foster decentralized, community-owned/-run non-profit open source initiatives in this space. Apply an inclusive, holistic approach to science and research in the sense of Open Scholarship to include human value education, open scholcomm and open education with a view on teaching in the seminar and classroom, etc. - basically the whole variety of research and teaching practices that define academic life, but still remain underrepresented in the larger debate around Open Science.")]),e._v(" "),a("h3",{attrs:{id:"evelyn"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#evelyn"}},[e._v("#")]),e._v(" Evelyn")]),e._v(" "),a("p",[e._v("‘Equity’ and ‘inclusion’ are two words that I know too well given the yawning gaps that exist between the haves and have-nots in the African society. Research indeed is the core of any society, identifying calamities and solving them in the most sustainable of ways. These two words therefore occupy an integral space in the open research arena since structural equity and inclusion would mean that research knowledge is given for free irrespective of any societal construct for productive downstream research. Although open access has been lauded for promoting access to high quality research at no costs, authors have so far faced sky high publishing costs that have quite limited the number of papers that make it to the open "),a("a",{attrs:{href:"https://www.researchprofessionalnews.com/rr-news-africa-pan-african-2020-10-open-access-publishing-fees-a-crisis-for-african-research/",target:"_blank",rel:"noopener noreferrer"}},[e._v("especially in low and middle income regions like Africa"),a("OutboundLink")],1),e._v(". The need to subsidize publishing costs to the open space is thus apparent with the overall goal of strengthening research capacity and impactful research especially for such regions to be at par with the rest of the world in research and development. 
Research societies and governments need to forge bilateral pacts whose main purpose is to encourage open access by introducing waivers on publishing costs and also curbing predatory journals that more often than not derail the reputation of scientists. Indeed, the achievement of structural equity and inclusion will require that the authors and users of scientific papers alike get to disseminate and access knowledge for free.")]),e._v(" "),a("h3",{attrs:{id:"kate"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#kate"}},[e._v("#")]),e._v(" Kate")]),e._v(" "),a("p",[e._v("In organizing my ideas for a coherent reflection on the theme of this year’s Open Access Week, I thought of recent news out of the United Kingdom. The UK’s National Institute for Health Research (NIHR) recently revealed "),a("a",{attrs:{href:"https://www.nihr.ac.uk/news/nihr-responds-to-the-governments-call-for-further-reduction-in-bureaucracy-with-new-measures/25633",target:"_blank",rel:"noopener noreferrer"}},[e._v("new measures"),a("OutboundLink")],1),e._v(" no longer requiring universities to have memberships to specific charters and concordats to receive grant funding. This may seem like a move towards removing roadblocks for funding; however, membership of these charters, such as the "),a("a",{attrs:{href:"https://www.advance-he.ac.uk/equality-charters/athena-swan-charter",target:"_blank",rel:"noopener noreferrer"}},[e._v("Athena SWAN Charter"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://www.advance-he.ac.uk/equality-charters/race-equality-charter",target:"_blank",rel:"noopener noreferrer"}},[e._v("Race Equality Charter"),a("OutboundLink")],1),e._v(", provides universities with strategies to identify and address institutional and cultural barriers. 
In a 2020 world in which the Open Access community picks a theme that specifically mentions “structural equity and inclusion” as its goals, institutes of power, like the UK’s NIHR, seem to be tone-deaf by no longer requiring charters to guide them in those structures. I commend the Open Access community for leading the way by prioritizing equity and inclusion in its pursuit to share knowledge, and I believe we should all challenge institutional frameworks, like the UK’s NIHR, to embrace the values of the open access community.")]),e._v(" "),a("p",[e._v("Jacqueline"),a("br"),e._v("\nAs a machine learning researcher, this year’s Open Access Week theme resonates. Open access, structural equity, and inclusion should be explicit goals in artificial intelligence (AI) research. To quote the "),a("a",{attrs:{href:"https://www.ajl.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Algorithmic Justice League"),a("OutboundLink")],1),e._v(", “Technology should serve all of us. Not just the privileged few.” However, the demographics of the AI community do not reflect societal diversity, and this can allow algorithms to reinforce harmful "),a("a",{attrs:{href:"https://www.technologyreview.com/2020/07/17/1005396/predictive-policing-algorithms-racist-dismantled-machine-learning-bias-criminal-justice/",target:"_blank",rel:"noopener noreferrer"}},[e._v("systemic biases"),a("OutboundLink")],1),e._v(". But even if we know who is writing the algorithms that affect our lives, we often don’t know how these predictive systems make their decisions. 
A recent "),a("a",{attrs:{href:"https://www.nature.com/articles/s41586-020-2766-y",target:"_blank",rel:"noopener noreferrer"}},[e._v("response"),a("OutboundLink")],1),e._v(" to a Google Health "),a("a",{attrs:{href:"https://www.nature.com/articles/s41586-019-1799-6",target:"_blank",rel:"noopener noreferrer"}},[e._v("closed source tool"),a("OutboundLink")],1),e._v(" for breast cancer screening argues that failing to release code and training data undermines the scientific value, transparency, and reproducibility of AI systems. Ironically, however, this well-worded argument lies behind a paywall that limits transparency by design. Competing views on closed access AI publishing are captured in the 2018 "),a("a",{attrs:{href:"https://openaccess.engineering.oregonstate.edu/",target:"_blank",rel:"noopener noreferrer"}},[e._v("boycott"),a("OutboundLink")],1),e._v(" of Nature Machine Intelligence, its "),a("a",{attrs:{href:"https://www.sciencemag.org/news/2018/05/why-are-ai-researchers-boycotting-new-nature-journal-and-shunning-others",target:"_blank",rel:"noopener noreferrer"}},[e._v("coverage"),a("OutboundLink")],1),e._v(" in the scientific media, and the journal’s "),a("a",{attrs:{href:"https://www.nature.com/articles/s42256-020-0144-y",target:"_blank",rel:"noopener noreferrer"}},[e._v("rebuttal"),a("OutboundLink")],1),e._v(". 
Whether you stand by "),a("a",{attrs:{href:"https://www.coalition-s.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Plan S"),a("OutboundLink")],1),e._v(" or not, open conversations around the ethics of access and transparency are important steps toward safe, equitable, and inclusive AI.")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[110],{644:function(e,t,a){"use strict";a.r(t);var n=a(29),r=Object(n.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("The theme of this year’s "),a("a",{attrs:{href:"http://www.openaccessweek.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Access Week"),a("OutboundLink")],1),e._v(" is “Open with Purpose: Taking Action to Build Structural Equity and Inclusion”. How can we be more purposeful in the open space? How can we work towards true equity and inclusion? The following blog is a compilation of the Fellows’ thoughts and reflections on this theme.")]),e._v(" "),a("h3",{attrs:{id:"katerina"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#katerina"}},[e._v("#")]),e._v(" Katerina")]),e._v(" "),a("p",[e._v("When I read this year’s theme I wondered how I could relate to it. Inclusion was the word that made it for me. At first I thought about how the Fellowship itself was inclusive for me, a person with a humanities background that had not had the chance to receive any institutional or structured support when it comes to programming and data management. Afterwards, what came to mind is how inclusive are the things I’m currently learning on the programme with regard with the populations I work in my clinical role. Cognitive accessibility is an effort to make online content more accessible to persons with overall cognitive difficulties, that is difficulties with memory, attention, and language. 
These are not rare difficulties, as they characterize individuals with learning difficulties (developmental language disorder, dyslexia), autism spectrum disorder, attention deficit-hyperactivity disorder (ADHD), dementia, aphasia and other cognitive difficulties following a stroke. I discovered a lot of initiatives and guidelines on how online content could be more accessible: using alternatives to text, such as figures, audio, or a simpler layout, making content appear in predictable ways, giving more time to individuals to interact with the content, focusing on readability of the content among others. In sum, many individuals among us have difficulties accessing online content in an optimal way. More information about what we can do about it "),a("a",{attrs:{href:"https://developer.mozilla.org/en-US/docs/Web/Accessibility/Cognitive_accessibility#WCAG_Guidelines",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://www.w3.org/WAI/cognitive/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h3",{attrs:{id:"dani"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dani"}},[e._v("#")]),e._v(" Dani")]),e._v(" "),a("p",[e._v("Once again, we see academia and the overall scientific research environment engaged in a discussion about who should bear the costs of scientific publications. Few have welcomed with open arms the "),a("a",{attrs:{href:"https://www.nature.com/articles/d41586-020-02959-1",target:"_blank",rel:"noopener noreferrer"}},[e._v("new agreement"),a("OutboundLink")],1),e._v(" between a few German institutions and the Nature Publishing group. The obvious gap between what the prestigious publishing group demands and what researchers can afford has turned the news into some sort of bad joke. However, it seems that many have by now accepted other, relatively cheaper Open Access publishing arrangements. At least, relatively cheaper for them. 
Research funding is nowadays so scarce and precarious in many countries that a simple article processing charge of 1200€ will prevent researchers from submitting to such a journal. No doubt there is good will in those who fight to make the current publishing model more open. However, I can’t help but feel there is a lack of awareness of the financial gap involved in setting an acceptable threshold for article processing charges that are based on the standards of the world’s major economic powers.")]),e._v(" "),a("h3",{attrs:{id:"sam"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#sam"}},[e._v("#")]),e._v(" Sam")]),e._v(" "),a("p",[e._v("Libraries spend an enormous amount of money paying journal subscription fees in order to give their patrons access to cutting-edge research. Imagine a world in which paywalls are a thing of the past and these thousands of dollars currently reserved at every library for journal subscription costs could be redistributed. Librarians need to support Open Access and to publicly reject the current systems in place that restrict access to information for the majority of the global community. Librarians should stop and ask themselves, what are the long-term effects of supporting the current system? What historic injustices are being perpetuated by paying for standard subscription-based journals? If librarianship is based upon providing equitable service to all information users, supporting Open Access is a necessity.")]),e._v(" "),a("h3",{attrs:{id:"anne"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#anne"}},[e._v("#")]),e._v(" Anne")]),e._v(" "),a("p",[e._v("My colleagues and I have been having interesting discussions about what Open Access means in the context of our respective disciplines, and so many of them have boiled down to funding models, and how to make sure that the (financial) incentives are in the right place. 
So when I approached these questions of structural equity and inclusion, I wondered how we can balance the ideals of open access that allow for creative collaboration, open knowledge, and more equitable contributions (all things that brought us all together at OKF) with the necessary requirements of funding and the pressure to publish. In my own discipline, these "),a("a",{attrs:{href:"https://savageminds.org/2015/05/27/open-access-what-cultural-anthropology-gets-right-and-american-anthropologist-gets-wrong/",target:"_blank",rel:"noopener noreferrer"}},[e._v("debates have been happening for a long time"),a("OutboundLink")],1),e._v(", and were recently brought to light because of an experimental Open Access journal called "),a("a",{attrs:{href:"https://anthrodendum.org/2018/06/13/hau-is-dead-long-live-oa-initiatives/",target:"_blank",rel:"noopener noreferrer"}},[e._v("HAU"),a("OutboundLink")],1),e._v(", which was founded by the late "),a("a",{attrs:{href:"https://www.nytimes.com/2020/09/04/books/david-graeber-dead.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("David Graeber"),a("OutboundLink")],1),e._v(". Furthermore, as a journalist, I tend not to equate open access with accessibility more generally, because making something available or open doesn’t necessarily mean that it will be used (let alone understood by a wider audience!). This is the integral role that journalism can play within the open access academic community, in my view: through increased data literacy, visualisation tools, and what I call “translation through storytelling”. This is what drew me to #dataviz, and why I’m creating interactive visualisations of human rights data from the United Nations with OKF. While the Universal Periodic Review is well-known for being one of the most inclusive and equitable venues at the UN, few know about it outside of Geneva. 
So as Open Access Week comes to a close, I’ve been starting to re-think the movement as “open, accessible, fundable, and understandable”. Maybe it’s not as catchy, but it’s what I hope to embody!")]),e._v(" "),a("h3",{attrs:{id:"ritwik"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#ritwik"}},[e._v("#")]),e._v(" Ritwik")]),e._v(" "),a("p",[e._v("Whenever I see terms such as ‘Open access’ and ‘Open Science’, I usually think about how we can make changes to the current research environment so as to extract meaning from the open research space and allow people to learn more about this and move beyond conventional ‘Research Journals’. One of the ways we can empower structural and racial equity in research is by investing in Open Science infrastructures and services and capacity building for Open Science by including Open translation services and tools like GitHub to lower the language barrier. Not every potential reader of openly available science is fluent in English, and automatic translation is not always correct, but even rough translations can still convey the overall meaning. We can take help from open source development programs to empower organisations like CC Extractor and other free local translation software so we can include languages like Spanish, Italian, Hindi, Japanese and other native languages so that everyone is able to break those barriers and understand literature promoted in different languages. Similarly, we should provide sustainable funding mechanisms and foster decentralized, community-owned/-run non-profit open source initiatives in this space. We should also apply an inclusive, holistic approach to science and research in the sense of Open Scholarship to include human value education, open scholcomm and open education with a view on teaching in the seminar and classroom, etc. 
- basically the whole variety of research and teaching practices that define academic life, but still remain underrepresented in the larger debate around Open Science.")]),e._v(" "),a("h3",{attrs:{id:"evelyn"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#evelyn"}},[e._v("#")]),e._v(" Evelyn")]),e._v(" "),a("p",[e._v("‘Equity’ and ‘inclusion’ are two words that I know too well given the yawning gaps that exist between the haves and have-nots in the African society. Research indeed is the core of any society, identifying calamities and solving them in the most sustainable of ways. These two words therefore occupy an integral space in the open research arena since structural equity and inclusion would mean that research knowledge is given for free irrespective of any societal construct for productive downstream research. Although open access has been lauded for promoting access to high quality research at no costs, authors have so far faced sky high publishing costs that have quite limited the number of papers that make it to the open "),a("a",{attrs:{href:"https://www.researchprofessionalnews.com/rr-news-africa-pan-african-2020-10-open-access-publishing-fees-a-crisis-for-african-research/",target:"_blank",rel:"noopener noreferrer"}},[e._v("especially in low and middle income regions like Africa"),a("OutboundLink")],1),e._v(". The need to subsidize publishing costs to the open space is thus apparent with the overall goal of strengthening research capacity and impactful research especially for such regions to be at par with the rest of the world in research and development. 
Research societies and governments need to forge bilateral pacts whose main purpose is to encourage open access by introducing waivers on publishing costs and also curbing predatory journals that more often than not derail the reputation of scientists. Indeed, the achievement of structural equity and inclusion will require that the authors and users of scientific papers alike get to disseminate and access knowledge for free.")]),e._v(" "),a("h3",{attrs:{id:"kate"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#kate"}},[e._v("#")]),e._v(" Kate")]),e._v(" "),a("p",[e._v("In organizing my ideas for a coherent reflection on the theme of this year’s Open Access Week, I thought of recent news out of the United Kingdom. UK’s National Institute for Health Research (NIHR) recently revealed "),a("a",{attrs:{href:"https://www.nihr.ac.uk/news/nihr-responds-to-the-governments-call-for-further-reduction-in-bureaucracy-with-new-measures/25633",target:"_blank",rel:"noopener noreferrer"}},[e._v("new measures"),a("OutboundLink")],1),e._v(" no longer requiring universities to have memberships to specific charters and concordats to receive grant funding. This may seem like a move towards removing roadblocks for funding; however, membership in these charters, such as the "),a("a",{attrs:{href:"https://www.advance-he.ac.uk/equality-charters/athena-swan-charter",target:"_blank",rel:"noopener noreferrer"}},[e._v("Athena SWAN Charter"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://www.advance-he.ac.uk/equality-charters/race-equality-charter",target:"_blank",rel:"noopener noreferrer"}},[e._v("Race Equality Charter"),a("OutboundLink")],1),e._v(", provides universities with strategies to identify and address institutional and cultural barriers. 
In a 2020 world in which the Open Access community picks a theme that specifically mentions “structural equity and inclusion” as its goals, institutes of power, like UK’s NIHR, seem to be tone-deaf by no longer requiring charters to guide them in those structures. I commend the Open Access community for leading the way by prioritizing equity and inclusion in its pursuit to share knowledge, and I believe we should all challenge institutional frameworks, like UK’s NIHR, to embrace the values of the open access community.")]),e._v(" "),a("p",[e._v("Jacqueline"),a("br"),e._v("\nAs a machine learning researcher, this year’s Open Access Week theme resonates. Open access, structural equity, and inclusion should be explicit goals in artificial intelligence (AI) research. To quote the "),a("a",{attrs:{href:"https://www.ajl.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Algorithmic Justice League"),a("OutboundLink")],1),e._v(", “Technology should serve all of us. Not just the privileged few.” However, the demographics of the AI community do not reflect societal diversity, and this can allow algorithms to reinforce harmful "),a("a",{attrs:{href:"https://www.technologyreview.com/2020/07/17/1005396/predictive-policing-algorithms-racist-dismantled-machine-learning-bias-criminal-justice/",target:"_blank",rel:"noopener noreferrer"}},[e._v("systemic biases"),a("OutboundLink")],1),e._v(". But even if we know who is writing the algorithms that affect our lives, we often don’t know how these predictive systems make their decisions. 
A recent "),a("a",{attrs:{href:"https://www.nature.com/articles/s41586-020-2766-y",target:"_blank",rel:"noopener noreferrer"}},[e._v("response"),a("OutboundLink")],1),e._v(" to a Google Health "),a("a",{attrs:{href:"https://www.nature.com/articles/s41586-019-1799-6",target:"_blank",rel:"noopener noreferrer"}},[e._v("closed source tool"),a("OutboundLink")],1),e._v(" for breast cancer screening argues that failing to release code and training data undermines the scientific value, transparency, and reproducibility of AI systems. Ironically, however, this well-worded argument lies behind a paywall that limits transparency by design. Competing views on closed access AI publishing are captured in the 2018 "),a("a",{attrs:{href:"https://openaccess.engineering.oregonstate.edu/",target:"_blank",rel:"noopener noreferrer"}},[e._v("boycott"),a("OutboundLink")],1),e._v(" of Nature Machine Intelligence, its "),a("a",{attrs:{href:"https://www.sciencemag.org/news/2018/05/why-are-ai-researchers-boycotting-new-nature-journal-and-shunning-others",target:"_blank",rel:"noopener noreferrer"}},[e._v("coverage"),a("OutboundLink")],1),e._v(" in the scientific media, and the journal’s "),a("a",{attrs:{href:"https://www.nature.com/articles/s42256-020-0144-y",target:"_blank",rel:"noopener noreferrer"}},[e._v("rebuttal"),a("OutboundLink")],1),e._v(". 
Whether you stand by "),a("a",{attrs:{href:"https://www.coalition-s.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Plan S"),a("OutboundLink")],1),e._v(" or not, open conversations around the ethics of access and transparency are important steps toward safe, equitable, and inclusive AI.")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/111.2eda7610.js b/assets/js/111.21dd805b.js similarity index 98% rename from assets/js/111.2eda7610.js rename to assets/js/111.21dd805b.js index a921804e4..f146be975 100644 --- a/assets/js/111.2eda7610.js +++ b/assets/js/111.21dd805b.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[111],{645:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h2",{attrs:{id:"did-you-miss-our-october-community-call"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#did-you-miss-our-october-community-call"}},[e._v("#")]),e._v(" Did you miss our October community call?")]),e._v(" "),r("p",[e._v("We had a great presentation by Keith Hughitt, who told us about his work on using Frictionless to create infrastructure for sharing biology and genomics data packages. You can watch his presentation here:"),r("br"),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/GpIgfJE9UGw",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})]),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-of-note-included"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-of-note-included"}},[e._v("#")]),e._v(" Other agenda items of note included:")]),e._v(" "),r("ul",[r("li",[e._v("We are hiring a community manager! 
The full details are here: "),r("a",{attrs:{href:"https://okfn.org/about/jobs/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://okfn.org/about/jobs/"),r("OutboundLink")],1)]),e._v(" "),r("li",[e._v("Help us prioritize adding new features to frictionless-py! You can vote on which features you want to see here: "),r("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues/486",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues/486"),r("OutboundLink")],1)]),e._v(" "),r("li",[e._v("You can read more about frictionless-py here: "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/10/08/frictionless-framework/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/blog/2020/10/08/frictionless-framework/"),r("OutboundLink")],1)])]),e._v(" "),r("h2",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on 19th November. You can sign up here: "),r("a",{attrs:{href:"https://forms.gle/5HeMrt2MDCYSYWxT8",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://forms.gle/5HeMrt2MDCYSYWxT8"),r("OutboundLink")],1),e._v(". We’ll discuss new features of frictionless-py, and there will also be time for your updates too. Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),r("h2",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("Here is the recording of the full call:"),r("br"),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/kZ4vy5zP0M0",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})]),e._v(" "),r("p",[e._v("As always, join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[111],{642:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h2",{attrs:{id:"did-you-miss-our-october-community-call"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#did-you-miss-our-october-community-call"}},[e._v("#")]),e._v(" Did you miss our October community call?")]),e._v(" "),r("p",[e._v("We had a great presentation by Keith Hughitt, who told us about his work on using Frictionless to create infrastructure for sharing biology and genomics data packages. 
You can watch his presentation here:"),r("br"),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/GpIgfJE9UGw",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})]),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-of-note-included"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-of-note-included"}},[e._v("#")]),e._v(" Other agenda items of note included:")]),e._v(" "),r("ul",[r("li",[e._v("We are hiring a community manager! The full details are here: "),r("a",{attrs:{href:"https://okfn.org/about/jobs/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://okfn.org/about/jobs/"),r("OutboundLink")],1)]),e._v(" "),r("li",[e._v("Help us prioritize adding new features to frictionless-py! You can vote on which features you want to see here: "),r("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues/486",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues/486"),r("OutboundLink")],1)]),e._v(" "),r("li",[e._v("You can read more about frictionless-py here: "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/10/08/frictionless-framework/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/blog/2020/10/08/frictionless-framework/"),r("OutboundLink")],1)])]),e._v(" "),r("h2",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on 19th November. You can sign up here: "),r("a",{attrs:{href:"https://forms.gle/5HeMrt2MDCYSYWxT8",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://forms.gle/5HeMrt2MDCYSYWxT8"),r("OutboundLink")],1),e._v(". We’ll discuss new features of frictionless-py, and there will also be time for your updates too. Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),r("h2",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("Here is the recording of the full call:"),r("br"),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/kZ4vy5zP0M0",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})]),e._v(" "),r("p",[e._v("As always, join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/112.072d253f.js b/assets/js/112.d1d4b6fc.js similarity index 96% rename from assets/js/112.072d253f.js rename to assets/js/112.d1d4b6fc.js index a1d343a45..b70888d42 100644 --- a/assets/js/112.072d253f.js +++ b/assets/js/112.d1d4b6fc.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[112],{644:function(t,e,o){"use strict";o.r(e);var a=o(29),n=Object(a.a)({},(function(){var t=this,e=t.$createElement,o=t._self._c||e;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[o("em",[t._v("By Tracy Teal; originally posted in the Dryad blog: "),o("a",{attrs:{href:"https://blog.datadryad.org/2020/11/18/frictionless-data/",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://blog.datadryad.org/2020/11/18/frictionless-data/"),o("OutboundLink")],1)])]),t._v(" "),o("p",[t._v("Guided by our commitment to make research data publishing more seamless and also re-usable, we are thrilled to partner with Open Knowledge Foundation and the Frictionless Data team to enhance our submission 
processes. Integrating the Frictionless Data toolkit, Dryad will be able to directly provide feedback to authors on the structure of tabular files uploaded. This will also allow for automated file level metadata to be created at upload and available for download for published datasets.")]),t._v(" "),o("p",[t._v("We are excited to get moving on this project and with support from the Sloan Foundation, Open Knowledge Foundation has just announced a job opening to contribute to this work. Please check out the posting and circulate it to any developers who may be interested in building out this functionality with us: "),o("a",{attrs:{href:"https://okfn.org/about/jobs/",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://okfn.org/about/jobs/"),o("OutboundLink")],1)]),t._v(" "),o("p",[o("em",[t._v("Stay tuned for a project update in July 2021!")])])])}),[],!1,null,null,null);e.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[112],{645:function(t,e,o){"use strict";o.r(e);var a=o(29),n=Object(a.a)({},(function(){var t=this,e=t.$createElement,o=t._self._c||e;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[o("em",[t._v("By Tracy Teal; originally posted in the Dryad blog: "),o("a",{attrs:{href:"https://blog.datadryad.org/2020/11/18/frictionless-data/",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://blog.datadryad.org/2020/11/18/frictionless-data/"),o("OutboundLink")],1)])]),t._v(" "),o("p",[t._v("Guided by our commitment to make research data publishing more seamless and also re-usable, we are thrilled to partner with Open Knowledge Foundation and the Frictionless Data team to enhance our submission processes. Integrating the Frictionless Data toolkit, Dryad will be able to directly provide feedback to authors on the structure of tabular files uploaded. 
This will also allow for automated file level metadata to be created at upload and available for download for published datasets.")]),t._v(" "),o("p",[t._v("We are excited to get moving on this project and with support from the Sloan Foundation, Open Knowledge Foundation has just announced a job opening to contribute to this work. Please check out the posting and circulate it to any developers who may be interested in building out this functionality with us: "),o("a",{attrs:{href:"https://okfn.org/about/jobs/",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://okfn.org/about/jobs/"),o("OutboundLink")],1)]),t._v(" "),o("p",[o("em",[t._v("Stay tuned for a project update in July 2021!")])])])}),[],!1,null,null,null);e.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/114.7a312779.js b/assets/js/114.78ae2f39.js similarity index 99% rename from assets/js/114.7a312779.js rename to assets/js/114.78ae2f39.js index 378254045..160e19faf 100644 --- a/assets/js/114.7a312779.js +++ b/assets/js/114.78ae2f39.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[114],{647:function(a,e,t){"use strict";t.r(e);var o=t(29),r=Object(o.a)({},(function(){var a=this,e=a.$createElement,t=a._self._c||e;return t("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[t("p",[a._v("Have you ever been looking at a dataset and had no idea what the data values mean? What units are being used? What does that acronym in the first column mean? What is the license for this data?")]),a._v(" "),t("p",[a._v("These are all very common issues that make data hard to understand and use. At Frictionless Data, we work to solve these issues by packaging data with its metadata - aka the description of the data. 
To help you package your data, we have "),t("a",{attrs:{href:"https://frictionlessdata.io/software/",target:"_blank",rel:"noopener noreferrer"}},[a._v("code in several languages"),t("OutboundLink")],1),a._v(" and a browser tool, called "),t("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Creator"),t("OutboundLink")],1),a._v(".")]),a._v(" "),t("p",[a._v("Our Reproducible Research Fellows recently learned all about packaging their data by using the Data Package Creator. To help others learn how they too can package their data, the Fellows wrote about packaging their data in blogs that you can read below!")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"data-package-is-valid-by-ouso-daniel-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-package-is-valid-by-ouso-daniel-cohort-1"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/ouso-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package is Valid! By Ouso Daniel"),t("OutboundLink")],1),a._v(" (Cohort 1)")]),a._v(" "),t("p",[a._v("“To quality-check the integrity of your data package creation, you must validate it before downloading it for sharing, among many things. The best you can get from that process is “Data package is valid!”. 
What about before then?”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"combating-other-people-s-data-by-monica-granados-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#combating-other-people-s-data-by-monica-granados-cohort-1"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/monica-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Combating other people’s data by Monica Granados"),t("OutboundLink")],1),a._v(" (Cohort 1)")]),a._v(" "),t("p",[a._v("“Follow the #otherpeoplesdata on Twitter and in it you will find a trove of data users trying to make sense of data they did not collect. While the data may be open, having no metadata or information about what variables mean, doesn’t make it very accessible….Without definitions and an explanation of the data, taking the data out of the context of my experiment and adding it to something like a meta-analysis is difficult. Enter Data packages. “")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"data-package-blog-by-lily-zhao-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-package-blog-by-lily-zhao-cohort-1"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lily-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Blog by Lily Zhao"),t("OutboundLink")],1),a._v(" (Cohort 1)")]),a._v(" "),t("p",[a._v('"When I started graduate school, I was shocked to learn that seafood is actually the most internationally traded food commodity in the world….However, for many developing countries being connected to the global seafood market can be a double-edged sword….Over the course of my master’s degree, I developed a passion for studying these issues, which is why I am excited to share with you my experience turning some of the data my collaborators into a packaged dataset using the Open Knowledge Foundation’s Datapackage tool.”')]),a._v(" "),t("hr"),a._v(" 
"),t("h3",{attrs:{id:"¿como-empaquetamos-datos-y-por-que-es-importante-organizar-la-bolsa-del-supermercado-by-sele-yang-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#¿como-empaquetamos-datos-y-por-que-es-importante-organizar-la-bolsa-del-supermercado-by-sele-yang-cohort-1"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sele-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("¿Cómo empaquetamos datos y por qué es importante organizar la bolsa del supermercado? By Sele Yang"),t("OutboundLink")],1),a._v(" (Cohort 1)")]),a._v(" "),t("p",[a._v("“Empaquetando datos sobre aborto desde OpenStreetMap Esta es una publicación para compartirles sobre el proceso y pasos para crear datapackages. ¿Qué es esto? Un datapackage es básicamente un empaquetado que agiliza la forma en que compartimos y replicamos los datos. Es como un contenedor de datos listo para ser transportado por la autopista del conocimiento (geeky, right).”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"so-you-want-to-get-your-data-package-validated-by-katerina-drakoulaki-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#so-you-want-to-get-your-data-package-validated-by-katerina-drakoulaki-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/katerina-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("So you want to get your data package validated? By Katerina Drakoulaki"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“Have you ever found any kind of dataset, (or been given one by your PI/collaborator) and had no idea what the data were about? 
During my PhD I’ve had my fair share of not knowing how code works, or how stimuli were supposed to be presented, or how data were supposed to be analysed….The datapackage tool tries to solve one of these issues, more specifically creating packages in which data make sense, and have all the explanations (metadata) necessary to understand and manipulate them.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"constructing-a-basic-data-package-in-python-by-jacqueline-maasch-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#constructing-a-basic-data-package-in-python-by-jacqueline-maasch-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/jacqueline-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Constructing a basic data package in Python by Jacqueline Maasch"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“As a machine learning researcher, I am constantly scraping, merging, reshaping, exploring, modeling, and generating data. Because I do most of my data management and analysis in Python, I find it convenient to package my data in Python as well. The screenshots below are a walk-through of basic data package construction in Python.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"sharing-data-from-your-own-scientific-publication-by-dani-alcala-lopez-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#sharing-data-from-your-own-scientific-publication-by-dani-alcala-lopez-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/dani-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Sharing data from your own scientific publication by Dani Alcalá-López"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“What better way to start working with open data than by sharing a Data Package from one of my own publications? 
In this tutorial, I will explain how to use the Frictionless Data tools to share tabular data from a scientific publication openly. This will make easier for anyone to reuse this data.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"data-package-blog-by-sam-wilairat-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-package-blog-by-sam-wilairat-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sam-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Blog by Sam Wilairat"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“As a library science student with an interest in pursuing data librarianship, learning how to create, manage, and share frictionless data is important. These past few months I’ve been learning about Frictionless Data and how to use Frictionless Data Tools to support reproducible research….To learn how to use the Frictionless Data Tools, I decided to pursue an independent project and am working on creating a comprehensive dataset of OER (open educational resources) health science materials that can be filtered by material type, media format, topic, and more.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"let-s-talk-data-packaging-by-evelyn-night-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#let-s-talk-data-packaging-by-evelyn-night-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/evelyn-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Let’s Talk Data Packaging by Evelyn Night"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“A few weeks ago I met data packages for the first time and I was intrigued since I had spent too much time in the past wrangling missing and inconsistent values. Packaging data therefore taught me that arranging and preserving data does not have to be tedious anymore. 
Here, I show how I packaged a bit of my data (unpublished) into a neat json document using the Data Package creator . I am excited to show you just how much I have come from knowing nothing to being able to package and extract the json output.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"data-packaging-human-rights-with-the-universal-periodic-review-by-anne-lee-steele-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-packaging-human-rights-with-the-universal-periodic-review-by-anne-lee-steele-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/anne-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("[Data]packaging human rights with the Universal Periodic Review by Anne Lee Steele"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“All of the records for the Universal Periodic Review have been uploaded online, and are available for the public. However, it’s not likely that the everyday user would be able to make heads or tails of what it actually means….The way I think about it, the Data Package is a way of explaining the categories used within the data itself, in case someone besides an expert is using them. 
While sections like “Recommendation” and “Recommending State” may be somewhat self-explanatory, I can imagine that this will get way more complicated with purely numerical data.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"creating-a-datapackage-for-microbial-community-data-and-a-phyloseq-object-by-kate-bowie-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#creating-a-datapackage-for-microbial-community-data-and-a-phyloseq-object-by-kate-bowie-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/kate-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Creating a datapackage for microbial community data (and a phyloseq object) by Kate Bowie"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“I study bacteria, and lucky for me, bacteria are everywhere….My lab often tries many different ways to handle the mock [bacteria] community, so it’s important that the analysis be documented and reproducible. To address this, I decided to generate a data package using a tool created by the Open Knowledge Foundation. 
Here is my experience creating a data package of our data, the metadata, and associated software.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"using-weather-and-rainfall-data-to-validate-by-ritwik-agarwal-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#using-weather-and-rainfall-data-to-validate-by-ritwik-agarwal-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/ritwik-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Using Weather and Rainfall Data to Validate by Ritwik Agarwal"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“I am using a data resource from Telangana Open Data…it is an open source data repository commissioned by the state government here in India and basically it archives and stores Weather, Topological, Agriculture and Infrastructure data which then can be used by research students and stakeholders keen to study and make reports in it….CSV files are very versatile, but cannot handle the metadata with all the necessary context. We need to make sure that people can find our data and the information they need to understand our data. That’s where the Data Package comes in! ”")])])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[114],{648:function(a,e,t){"use strict";t.r(e);var o=t(29),r=Object(o.a)({},(function(){var a=this,e=a.$createElement,t=a._self._c||e;return t("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[t("p",[a._v("Have you ever been looking at a dataset and had no idea what the data values mean? What units are being used? What does that acronym in the first column mean? What is the license for this data?")]),a._v(" "),t("p",[a._v("These are all very common issues that make data hard to understand and use. At Frictionless Data, we work to solve these issues by packaging data with its metadata - aka the description of the data. 
To help you package your data, we have "),t("a",{attrs:{href:"https://frictionlessdata.io/software/",target:"_blank",rel:"noopener noreferrer"}},[a._v("code in several languages"),t("OutboundLink")],1),a._v(" and a browser tool, called "),t("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Creator"),t("OutboundLink")],1),a._v(".")]),a._v(" "),t("p",[a._v("Our Reproducible Research Fellows recently learned all about packaging their data by using the Data Package Creator. To help others learn how they too can package their data, the Fellows wrote about packaging their data in blogs that you can read below!")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"data-package-is-valid-by-ouso-daniel-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-package-is-valid-by-ouso-daniel-cohort-1"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/ouso-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package is Valid! By Ouso Daniel"),t("OutboundLink")],1),a._v(" (Cohort 1)")]),a._v(" "),t("p",[a._v("“To quality-check the integrity of your data package creation, you must validate it before downloading it for sharing, among many things. The best you can get from that process is “Data package is valid!”. 
What about before then?”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"combating-other-people-s-data-by-monica-granados-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#combating-other-people-s-data-by-monica-granados-cohort-1"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/monica-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Combating other people’s data by Monica Granados"),t("OutboundLink")],1),a._v(" (Cohort 1)")]),a._v(" "),t("p",[a._v("“Follow the #otherpeoplesdata on Twitter and in it you will find a trove of data users trying to make sense of data they did not collect. While the data may be open, having no metadata or information about what variables mean, doesn’t make it very accessible….Without definitions and an explanation of the data, taking the data out of the context of my experiment and adding it to something like a meta-analysis is difficult. Enter Data packages. “")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"data-package-blog-by-lily-zhao-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-package-blog-by-lily-zhao-cohort-1"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lily-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Blog by Lily Zhao"),t("OutboundLink")],1),a._v(" (Cohort 1)")]),a._v(" "),t("p",[a._v('"When I started graduate school, I was shocked to learn that seafood is actually the most internationally traded food commodity in the world….However, for many developing countries being connected to the global seafood market can be a double-edged sword….Over the course of my master’s degree, I developed a passion for studying these issues, which is why I am excited to share with you my experience turning some of the data my collaborators into a packaged dataset using the Open Knowledge Foundation’s Datapackage tool.”')]),a._v(" "),t("hr"),a._v(" 
"),t("h3",{attrs:{id:"¿como-empaquetamos-datos-y-por-que-es-importante-organizar-la-bolsa-del-supermercado-by-sele-yang-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#¿como-empaquetamos-datos-y-por-que-es-importante-organizar-la-bolsa-del-supermercado-by-sele-yang-cohort-1"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sele-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("¿Cómo empaquetamos datos y por qué es importante organizar la bolsa del supermercado? By Sele Yang"),t("OutboundLink")],1),a._v(" (Cohort 1)")]),a._v(" "),t("p",[a._v("“Empaquetando datos sobre aborto desde OpenStreetMap Esta es una publicación para compartirles sobre el proceso y pasos para crear datapackages. ¿Qué es esto? Un datapackage es básicamente un empaquetado que agiliza la forma en que compartimos y replicamos los datos. Es como un contenedor de datos listo para ser transportado por la autopista del conocimiento (geeky, right).”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"so-you-want-to-get-your-data-package-validated-by-katerina-drakoulaki-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#so-you-want-to-get-your-data-package-validated-by-katerina-drakoulaki-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/katerina-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("So you want to get your data package validated? By Katerina Drakoulaki"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“Have you ever found any kind of dataset, (or been given one by your PI/collaborator) and had no idea what the data were about? 
During my PhD I’ve had my fair share of not knowing how code works, or how stimuli were supposed to be presented, or how data were supposed to be analysed….The datapackage tool tries to solve one of these issues, more specifically creating packages in which data make sense, and have all the explanations (metadata) necessary to understand and manipulate them.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"constructing-a-basic-data-package-in-python-by-jacqueline-maasch-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#constructing-a-basic-data-package-in-python-by-jacqueline-maasch-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/jacqueline-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Constructing a basic data package in Python by Jacqueline Maasch"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“As a machine learning researcher, I am constantly scraping, merging, reshaping, exploring, modeling, and generating data. Because I do most of my data management and analysis in Python, I find it convenient to package my data in Python as well. The screenshots below are a walk-through of basic data package construction in Python.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"sharing-data-from-your-own-scientific-publication-by-dani-alcala-lopez-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#sharing-data-from-your-own-scientific-publication-by-dani-alcala-lopez-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/dani-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Sharing data from your own scientific publication by Dani Alcalá-López"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“What better way to start working with open data than by sharing a Data Package from one of my own publications? 
In this tutorial, I will explain how to use the Frictionless Data tools to share tabular data from a scientific publication openly. This will make easier for anyone to reuse this data.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"data-package-blog-by-sam-wilairat-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-package-blog-by-sam-wilairat-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sam-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Blog by Sam Wilairat"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“As a library science student with an interest in pursuing data librarianship, learning how to create, manage, and share frictionless data is important. These past few months I’ve been learning about Frictionless Data and how to use Frictionless Data Tools to support reproducible research….To learn how to use the Frictionless Data Tools, I decided to pursue an independent project and am working on creating a comprehensive dataset of OER (open educational resources) health science materials that can be filtered by material type, media format, topic, and more.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"let-s-talk-data-packaging-by-evelyn-night-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#let-s-talk-data-packaging-by-evelyn-night-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/evelyn-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Let’s Talk Data Packaging by Evelyn Night"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“A few weeks ago I met data packages for the first time and I was intrigued since I had spent too much time in the past wrangling missing and inconsistent values. Packaging data therefore taught me that arranging and preserving data does not have to be tedious anymore. 
Here, I show how I packaged a bit of my data (unpublished) into a neat json document using the Data Package creator . I am excited to show you just how much I have come from knowing nothing to being able to package and extract the json output.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"data-packaging-human-rights-with-the-universal-periodic-review-by-anne-lee-steele-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-packaging-human-rights-with-the-universal-periodic-review-by-anne-lee-steele-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/anne-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("[Data]packaging human rights with the Universal Periodic Review by Anne Lee Steele"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“All of the records for the Universal Periodic Review have been uploaded online, and are available for the public. However, it’s not likely that the everyday user would be able to make heads or tails of what it actually means….The way I think about it, the Data Package is a way of explaining the categories used within the data itself, in case someone besides an expert is using them. 
While sections like “Recommendation” and “Recommending State” may be somewhat self-explanatory, I can imagine that this will get way more complicated with purely numerical data.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"creating-a-datapackage-for-microbial-community-data-and-a-phyloseq-object-by-kate-bowie-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#creating-a-datapackage-for-microbial-community-data-and-a-phyloseq-object-by-kate-bowie-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/kate-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Creating a datapackage for microbial community data (and a phyloseq object) by Kate Bowie"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“I study bacteria, and lucky for me, bacteria are everywhere….My lab often tries many different ways to handle the mock [bacteria] community, so it’s important that the analysis be documented and reproducible. To address this, I decided to generate a data package using a tool created by the Open Knowledge Foundation. 
Here is my experience creating a data package of our data, the metadata, and associated software.”")]),a._v(" "),t("hr"),a._v(" "),t("h3",{attrs:{id:"using-weather-and-rainfall-data-to-validate-by-ritwik-agarwal-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#using-weather-and-rainfall-data-to-validate-by-ritwik-agarwal-cohort-2"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/ritwik-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Using Weather and Rainfall Data to Validate by Ritwik Agarwal"),t("OutboundLink")],1),a._v(" (Cohort 2)")]),a._v(" "),t("p",[a._v("“I am using a data resource from Telangana Open Data…it is an open source data repository commissioned by the state government here in India and basically it archives and stores Weather, Topological, Agriculture and Infrastructure data which then can be used by research students and stakeholders keen to study and make reports in it….CSV files are very versatile, but cannot handle the metadata with all the necessary context. We need to make sure that people can find our data and the information they need to understand our data. That’s where the Data Package comes in! 
”")])])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/115.6bee2741.js b/assets/js/115.24c7eaf7.js similarity index 98% rename from assets/js/115.6bee2741.js rename to assets/js/115.24c7eaf7.js index f41eb489d..fef48f9a8 100644 --- a/assets/js/115.6bee2741.js +++ b/assets/js/115.24c7eaf7.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[115],{648:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("h2",{attrs:{id:"a-recap-from-our-december-community-call"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#a-recap-from-our-december-community-call"}},[e._v("#")]),e._v(" A recap from our December community call")]),e._v(" "),a("p",[e._v("We had a presentation about “using frictionless data package for web archive data package (WACZ format)”. More details in "),a("a",{attrs:{href:"https://github.com/frictionlessdata/forum/issues/69",target:"_blank",rel:"noopener noreferrer"}},[e._v("this GitHub issue"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("If you would like to dive deeper and watch Ilya’s presentation, you can find it here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/2aeRcEMmmSs",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("ul",[a("li",[e._v("People are interested in tools dealing with tools dealing with “small” data.")]),e._v(" "),a("li",[e._v("What are we up to for 2021? 
What’s the roadmap for Frictionless?\n"),a("ul",[a("li",[e._v("Specs are stable.")]),e._v(" "),a("li",[e._v("Always bet on JavaScript! We will keep focusing on working with tools for JavaScript. Its versatility is required for desktop apps, using dynamic frameworks like React, etc.\n"),a("ul",[a("li",[e._v("We will keep working on "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-js",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-js"),a("OutboundLink")],1)])])])])]),e._v(" "),a("li",[a("a",{attrs:{href:"https://github.com/datopian/data-cli",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/datopian/data-cli"),a("OutboundLink")],1),e._v(" Command line tool for working with data, Data Packages and the DataHub")]),e._v(" "),a("li",[a("a",{attrs:{href:"https://github.com/datopian/datapub",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/datopian/datapub"),a("OutboundLink")],1),e._v(" React-based framework for building data publishing workflows (esp for CKAN)")])]),e._v(" "),a("h2",{attrs:{id:"join-us-next-time"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-time"}},[e._v("#")]),e._v(" Join us next time!")]),e._v(" "),a("p",[e._v("Our next meeting will be announced in January 2021! You can "),a("a",{attrs:{href:"https://forms.gle/5HeMrt2MDCYSYWxT8",target:"_blank",rel:"noopener noreferrer"}},[e._v("sign up here"),a("OutboundLink")],1),e._v(" to be notified when the hangout will be scheduled. We’ll give some space to talk about geospatial data standards, coping with Covid and showing a member’s platform dedicated to open data hackathons!")]),e._v(" "),a("p",[e._v("As always, there will be time for your updates too. Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),a("h2",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/lquzoKn9Flo",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[115],{647:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("h2",{attrs:{id:"a-recap-from-our-december-community-call"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#a-recap-from-our-december-community-call"}},[e._v("#")]),e._v(" A recap from our December community call")]),e._v(" "),a("p",[e._v("We had a presentation about “using frictionless data package for web archive data package (WACZ format)”. 
More details in "),a("a",{attrs:{href:"https://github.com/frictionlessdata/forum/issues/69",target:"_blank",rel:"noopener noreferrer"}},[e._v("this GitHub issue"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("If you would like to dive deeper and watch Ilya’s presentation, you can find it here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/2aeRcEMmmSs",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("ul",[a("li",[e._v("People are interested in tools dealing with tools dealing with “small” data.")]),e._v(" "),a("li",[e._v("What are we up to for 2021? What’s the roadmap for Frictionless?\n"),a("ul",[a("li",[e._v("Specs are stable.")]),e._v(" "),a("li",[e._v("Always bet on JavaScript! We will keep focusing on working with tools for JavaScript. 
Its versatility is required for desktop apps, using dynamic frameworks like React, etc.\n"),a("ul",[a("li",[e._v("We will keep working on "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-js",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-js"),a("OutboundLink")],1)])])])])]),e._v(" "),a("li",[a("a",{attrs:{href:"https://github.com/datopian/data-cli",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/datopian/data-cli"),a("OutboundLink")],1),e._v(" Command line tool for working with data, Data Packages and the DataHub")]),e._v(" "),a("li",[a("a",{attrs:{href:"https://github.com/datopian/datapub",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/datopian/datapub"),a("OutboundLink")],1),e._v(" React-based framework for building data publishing workflows (esp for CKAN)")])]),e._v(" "),a("h2",{attrs:{id:"join-us-next-time"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-time"}},[e._v("#")]),e._v(" Join us next time!")]),e._v(" "),a("p",[e._v("Our next meeting will be announced in January 2021! You can "),a("a",{attrs:{href:"https://forms.gle/5HeMrt2MDCYSYWxT8",target:"_blank",rel:"noopener noreferrer"}},[e._v("sign up here"),a("OutboundLink")],1),e._v(" to be notified when the hangout will be scheduled. We’ll give some space to talk about geospatial data standards, coping with Covid and showing a member’s platform dedicated to open data hackathons!")]),e._v(" "),a("p",[e._v("As always, there will be time for your updates too. Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),a("h2",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/lquzoKn9Flo",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/116.0d0fad1f.js b/assets/js/116.85bdea59.js similarity index 98% rename from assets/js/116.0d0fad1f.js rename to assets/js/116.85bdea59.js index 97f1f54ed..734d8222b 100644 --- a/assets/js/116.0d0fad1f.js +++ b/assets/js/116.85bdea59.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[116],{650:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("Originally published: "),r("a",{attrs:{href:"https://blog.okfn.org/2021/01/12/partnering-with-odi-to-improve-frictionless-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2021/01/12/partnering-with-odi-to-improve-frictionless-data/"),r("OutboundLink")],1)]),e._v(" "),r("p",[r("em",[e._v("In the framework of the Open Data Institute’s 
"),r("a",{attrs:{href:"https://theodi.org/article/call-for-proposals-funding-to-develop-open-source-tools-for-data-institutions/",target:"_blank",rel:"noopener noreferrer"}},[e._v("fund to develop open source tools for data institutions"),r("OutboundLink")],1),e._v(", the "),r("a",{attrs:{href:"okfn.org"}},[e._v("Open Knowledge Foundation (OKF)")]),e._v(" has been awarded funds to improve the quality and interoperability of Frictionless Data.")])]),e._v(" "),r("p",[e._v("In light of our effort to make data open and accessible, we are thrilled to announce we will be partnering with the "),r("a",{attrs:{href:"https://theodi.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Institute (ODI)"),r("OutboundLink")],1),e._v(" to improve our existing documentation and add new features on "),r("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data"),r("OutboundLink")],1),e._v(" to create a better user experience for all."),r("br"),e._v("\nTo achieve this, we will be working with a cohort of users from our active and engaged community to create better documentation that fits their needs. Our main goal is to make it easier for current and future users to understand and make use of the Frictionless Data tools and data libraries to their fullest potential."),r("br"),e._v("\nWe know how frustrating it can be to try and use existing code (or learn new code) that has incomplete documentation and we don’t want that to be a barrier for our users anymore. 
This is why we are very grateful to the ODI for granting us the opportunity to improve upon our existing documentation.")]),e._v(" "),r("h2",{attrs:{id:"so-what-will-be-changing"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#so-what-will-be-changing"}},[e._v("#")]),e._v(" So, what will be changing?")]),e._v(" "),r("ul",[r("li",[e._v("We will have a new project overview section, to help our users understand how to use Frictionless Data for their specific needs.")]),e._v(" "),r("li",[e._v("We will improve the existing documentation, to make sure even brand new users can quickly understand everything.")]),e._v(" "),r("li",[e._v("We will have Tutorials, to showcase real users experience and have user-friendly examples.")]),e._v(" "),r("li",[e._v("We will add a FAQ session.")])]),e._v(" "),r("h2",{attrs:{id:"and-when-will-all-of-that-be-ready"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#and-when-will-all-of-that-be-ready"}},[e._v("#")]),e._v(" And when will all of that be ready?")]),e._v(" "),r("p",[e._v("Very soon! By the beginning of April everything will be online, so stay tuned (and frictionless)!")]),e._v(" "),r("h2",{attrs:{id:"call-for-user-feedback"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-for-user-feedback"}},[e._v("#")]),e._v(" Call for user feedback")]),e._v(" "),r("p",[e._v("Feedback from our community is crucial to us, and part of this grant will be used to fund an evaluation of the existing documentation by our users in the format of user feedback sessions."),r("br"),e._v("\nAre you using our Frictionless Data tools or our Python data library? Then we want to hear from you!"),r("br"),e._v("\nWe are currently looking for novice and intermediate users to help us review our documentation, in order to make it more useful for you and all our future users."),r("br"),e._v("\nFor every user session you take part into, you will be given £50 for your time and feedback."),r("br"),e._v("\nAre you interested? 
Then fill in "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSezZVuKjqnFL9CHtuWVjDwDu8Cv1gQCAIs85TtDYQUv1t9hVw/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"more-about-frictionless-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#more-about-frictionless-data"}},[e._v("#")]),e._v(" More about Frictionless Data")]),e._v(" "),r("p",[e._v("Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[116],{649:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("Originally published: "),r("a",{attrs:{href:"https://blog.okfn.org/2021/01/12/partnering-with-odi-to-improve-frictionless-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2021/01/12/partnering-with-odi-to-improve-frictionless-data/"),r("OutboundLink")],1)]),e._v(" "),r("p",[r("em",[e._v("In the framework of the Open Data Institute’s "),r("a",{attrs:{href:"https://theodi.org/article/call-for-proposals-funding-to-develop-open-source-tools-for-data-institutions/",target:"_blank",rel:"noopener noreferrer"}},[e._v("fund to develop open source tools for data institutions"),r("OutboundLink")],1),e._v(", the "),r("a",{attrs:{href:"okfn.org"}},[e._v("Open Knowledge Foundation (OKF)")]),e._v(" has been awarded funds to improve the quality and interoperability of Frictionless Data.")])]),e._v(" "),r("p",[e._v("In light of our effort to make data open and accessible, we are thrilled to 
announce we will be partnering with the "),r("a",{attrs:{href:"https://theodi.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Institute (ODI)"),r("OutboundLink")],1),e._v(" to improve our existing documentation and add new features on "),r("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data"),r("OutboundLink")],1),e._v(" to create a better user experience for all."),r("br"),e._v("\nTo achieve this, we will be working with a cohort of users from our active and engaged community to create better documentation that fits their needs. Our main goal is to make it easier for current and future users to understand and make use of the Frictionless Data tools and data libraries to their fullest potential."),r("br"),e._v("\nWe know how frustrating it can be to try and use existing code (or learn new code) that has incomplete documentation and we don’t want that to be a barrier for our users anymore. This is why we are very grateful to the ODI for granting us the opportunity to improve upon our existing documentation.")]),e._v(" "),r("h2",{attrs:{id:"so-what-will-be-changing"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#so-what-will-be-changing"}},[e._v("#")]),e._v(" So, what will be changing?")]),e._v(" "),r("ul",[r("li",[e._v("We will have a new project overview section, to help our users understand how to use Frictionless Data for their specific needs.")]),e._v(" "),r("li",[e._v("We will improve the existing documentation, to make sure even brand new users can quickly understand everything.")]),e._v(" "),r("li",[e._v("We will have Tutorials, to showcase real users experience and have user-friendly examples.")]),e._v(" "),r("li",[e._v("We will add a FAQ session.")])]),e._v(" "),r("h2",{attrs:{id:"and-when-will-all-of-that-be-ready"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#and-when-will-all-of-that-be-ready"}},[e._v("#")]),e._v(" And when will all of that be ready?")]),e._v(" 
"),r("p",[e._v("Very soon! By the beginning of April everything will be online, so stay tuned (and frictionless)!")]),e._v(" "),r("h2",{attrs:{id:"call-for-user-feedback"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-for-user-feedback"}},[e._v("#")]),e._v(" Call for user feedback")]),e._v(" "),r("p",[e._v("Feedback from our community is crucial to us, and part of this grant will be used to fund an evaluation of the existing documentation by our users in the format of user feedback sessions."),r("br"),e._v("\nAre you using our Frictionless Data tools or our Python data library? Then we want to hear from you!"),r("br"),e._v("\nWe are currently looking for novice and intermediate users to help us review our documentation, in order to make it more useful for you and all our future users."),r("br"),e._v("\nFor every user session you take part into, you will be given £50 for your time and feedback."),r("br"),e._v("\nAre you interested? Then fill in "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSezZVuKjqnFL9CHtuWVjDwDu8Cv1gQCAIs85TtDYQUv1t9hVw/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"more-about-frictionless-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#more-about-frictionless-data"}},[e._v("#")]),e._v(" More about Frictionless Data")]),e._v(" "),r("p",[e._v("Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. 
The project is funded by the Sloan Foundation.")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/117.72acabd5.js b/assets/js/117.85164eb1.js similarity index 99% rename from assets/js/117.72acabd5.js rename to assets/js/117.85164eb1.js index 450384b5d..d642facb9 100644 --- a/assets/js/117.72acabd5.js +++ b/assets/js/117.85164eb1.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[117],{649:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[t("em",[e._v("This blog is part of a series showcasing projects developed during the 2020 Tool Fund. The Tool Fund provided five mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.")])]),e._v(" "),t("h2",{attrs:{id:"what-problem-does-schema-collaboration-solve"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#what-problem-does-schema-collaboration-solve"}},[e._v("#")]),e._v(" What problem does Schema-Collaboration solve?")]),e._v(" "),t("p",[e._v("As a software engineer, I’ve spent more than a decade developing software used by researchers or data managers using different technologies. I have been involved in free software communities and projects for more than 20 years.")]),e._v(" "),t("p",[e._v("Whilst working for a polar research institute, we saw the opportunity to take advantage of Frictionless data packages to describe datasets in a machine readable way ready for publication. 
But it was difficult for data managers and researchers to collaborate effectively on this, particularly when one or both groups were not familiar with Frictionless schemas. We needed a way for researchers submitting datasets to get feedback from the data managers to ensure that the dataset’s schema was correct.")]),e._v(" "),t("h2",{attrs:{id:"how-does-schema-collaboration-make-collaborating-easier"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#how-does-schema-collaboration-make-collaborating-easier"}},[e._v("#")]),e._v(" How does Schema-Collaboration make collaborating easier?")]),e._v(" "),t("p",[e._v("The Frictionless "),t("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),t("OutboundLink")],1),e._v(" is a very good Web-based tool to create the schemas but it didn’t help out of the box on the collaboration part. The solution in this tool fund was to build a system that uses Data Package Creator to enable data managers and researchers to create and share dataset schemas, edit them, post messages and export the schemas in different formats (text, Markdown, PDF). To encourage collaboration within a project multiple researchers can work on the same schema. Being able to view the description in human-readable formats makes it easier to spot mistakes and to integrate with third-party data repositories.")]),e._v(" "),t("p",[e._v("From a data manager’s perspective the tool allows them to keep tabs on the datasets being managed and their progress. 
It prevents details getting lost in emails and hopefully provides a nicer interface to encourage better collaboration.")]),e._v(" "),t("p",[e._v("In other words: think of a very simplified “Google Docs” specialised for data packages.")]),e._v(" "),t("h2",{attrs:{id:"who-can-use-schema-collaboration"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#who-can-use-schema-collaboration"}},[e._v("#")]),e._v(" Who can use Schema-Collaboration?")]),e._v(" "),t("p",[e._v("The tool is designed to help data managers(*) and researchers document data packages. The documentation (which is based on Frictionless schemas) needs to be started by the data manager who then sends the URL to the researchers allowing them to edit the schema.")]),e._v(" "),t("p",[e._v("*: or anybody who wants to collaborate on creating a data package.")]),e._v(" "),t("p",[t("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/104922881-8e788c80-599b-11eb-9260-21b9a5747a8f.png",alt:"Data-packages"}}),t("br"),e._v(" "),t("em",[e._v("Data managers can view a list of datapackages within the Schema-Collaboration tool.")])]),e._v(" "),t("h2",{attrs:{id:"how-can-i-use-this-tool"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#how-can-i-use-this-tool"}},[e._v("#")]),e._v(" How can I use this tool?")]),e._v(" "),t("p",[e._v("To evaluate the tool it is possible to use the "),t("a",{attrs:{href:"https://carles.eu.pythonanywhere.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("public demo server"),t("OutboundLink")],1),e._v(" or to install it locally on a computer.")]),e._v(" "),t("p",[e._v("It was packaged in a Docker container to make it easier to install on servers. 
There is full "),t("a",{attrs:{href:"https://github.com/frictionlessdata/schema-collaboration/blob/master/docker/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation available"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("Once the tool is installed it is used via a Web browser both by data managers and researchers.")]),e._v(" "),t("p",[t("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/104923256-19598700-599c-11eb-9cc4-19bb7637fdaa.png",alt:"datapackage-detail"}}),t("br"),e._v(" "),t("em",[e._v("You can view details about the datapackage, including comments from the data manager or other users, and also edit the datapackage.")])]),e._v(" "),t("h2",{attrs:{id:"future-plans-for-schema-collaboration"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#future-plans-for-schema-collaboration"}},[e._v("#")]),e._v(" Future plans for Schema-Collaboration")]),e._v(" "),t("p",[e._v("We plan to install the schema-collaboration at the Swiss Polar Institute to be used to describe polar data sets.")]),e._v(" "),t("p",[e._v("In the upcoming January Frictionless Data community call (sign up "),t("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(" to join), I will do a demo and I would really appreciate feedback. Please feel free to use it and add issues (bugs or ideas) in the "),t("a",{attrs:{href:"https://github.com/frictionlessdata/schema-collaboration",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repository"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("h2",{attrs:{id:"tech-stack"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#tech-stack"}},[e._v("#")]),e._v(" Tech stack")]),e._v(" "),t("p",[e._v("For the curious: schema-collaboration is developed using Python and Django and uses the django-crispy-forms package to create the forms. 
It supports sqlite3 and MariaDB databases.")]),e._v(" "),t("h2",{attrs:{id:"thanks-to"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#thanks-to"}},[e._v("#")]),e._v(" Thanks to…")]),e._v(" "),t("p",[e._v("In order to integrate Data Package Creator with schema-collaboration some changes were needed in the Data Package Creator. Evgeny (@roll on GitHub/Discord) from the Frictionless Data project made the changes to Data Package Creator needed to achieve this and helped with the integration. Thank you very much!")]),e._v(" "),t("p",[t("strong",[e._v("Further reading:")])]),e._v(" "),t("p",[e._v("GitHub repository: "),t("a",{attrs:{href:"https://github.com/frictionlessdata/schema-collaboration",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/schema-collaboration"),t("OutboundLink")],1)]),e._v(" "),t("p",[e._v("Meet Carles Pina Estany: "),t("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/07/16/tool-fund-polar-institute/#meet-carles-pina-estany",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/blog/2020/07/16/tool-fund-polar-institute/#meet-carles-pina-estany"),t("OutboundLink")],1)])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[117],{650:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[t("em",[e._v("This blog is part of a series showcasing projects developed during the 2020 Tool Fund. The Tool Fund provided five mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. 
This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.")])]),e._v(" "),t("h2",{attrs:{id:"what-problem-does-schema-collaboration-solve"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#what-problem-does-schema-collaboration-solve"}},[e._v("#")]),e._v(" What problem does Schema-Collaboration solve?")]),e._v(" "),t("p",[e._v("As a software engineer, I’ve spent more than a decade developing software used by researchers or data managers using different technologies. I have been involved in free software communities and projects for more than 20 years.")]),e._v(" "),t("p",[e._v("Whilst working for a polar research institute, we saw the opportunity to take advantage of Frictionless data packages to describe datasets in a machine readable way ready for publication. But it was difficult for data managers and researchers to collaborate effectively on this, particularly when one or both groups were not familiar with Frictionless schemas. We needed a way for researchers submitting datasets to get feedback from the data managers to ensure that the dataset’s schema was correct.")]),e._v(" "),t("h2",{attrs:{id:"how-does-schema-collaboration-make-collaborating-easier"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#how-does-schema-collaboration-make-collaborating-easier"}},[e._v("#")]),e._v(" How does Schema-Collaboration make collaborating easier?")]),e._v(" "),t("p",[e._v("The Frictionless "),t("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),t("OutboundLink")],1),e._v(" is a very good Web-based tool to create the schemas but it didn’t help out of the box on the collaboration part. 
The solution in this tool fund was to build a system that uses Data Package Creator to enable data managers and researchers to create and share dataset schemas, edit them, post messages and export the schemas in different formats (text, Markdown, PDF). To encourage collaboration within a project multiple researchers can work on the same schema. Being able to view the description in human-readable formats makes it easier to spot mistakes and to integrate with third-party data repositories.")]),e._v(" "),t("p",[e._v("From a data manager’s perspective the tool allows them to keep tabs on the datasets being managed and their progress. It prevents details getting lost in emails and hopefully provides a nicer interface to encourage better collaboration.")]),e._v(" "),t("p",[e._v("In other words: think of a very simplified “Google Docs” specialised for data packages.")]),e._v(" "),t("h2",{attrs:{id:"who-can-use-schema-collaboration"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#who-can-use-schema-collaboration"}},[e._v("#")]),e._v(" Who can use Schema-Collaboration?")]),e._v(" "),t("p",[e._v("The tool is designed to help data managers(*) and researchers document data packages. 
The documentation (which is based on Frictionless schemas) needs to be started by the data manager who then sends the URL to the researchers allowing them to edit the schema.")]),e._v(" "),t("p",[e._v("*: or anybody who wants to collaborate on creating a data package.")]),e._v(" "),t("p",[t("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/104922881-8e788c80-599b-11eb-9260-21b9a5747a8f.png",alt:"Data-packages"}}),t("br"),e._v(" "),t("em",[e._v("Data managers can view a list of datapackages within the Schema-Collaboration tool.")])]),e._v(" "),t("h2",{attrs:{id:"how-can-i-use-this-tool"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#how-can-i-use-this-tool"}},[e._v("#")]),e._v(" How can I use this tool?")]),e._v(" "),t("p",[e._v("To evaluate the tool it is possible to use the "),t("a",{attrs:{href:"https://carles.eu.pythonanywhere.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("public demo server"),t("OutboundLink")],1),e._v(" or to install it locally on a computer.")]),e._v(" "),t("p",[e._v("It was packaged in a Docker container to make it easier to install on servers. 
There is full "),t("a",{attrs:{href:"https://github.com/frictionlessdata/schema-collaboration/blob/master/docker/README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation available"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("Once the tool is installed it is used via a Web browser both by data managers and researchers.")]),e._v(" "),t("p",[t("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/104923256-19598700-599c-11eb-9cc4-19bb7637fdaa.png",alt:"datapackage-detail"}}),t("br"),e._v(" "),t("em",[e._v("You can view details about the datapackage, including comments from the data manager or other users, and also edit the datapackage.")])]),e._v(" "),t("h2",{attrs:{id:"future-plans-for-schema-collaboration"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#future-plans-for-schema-collaboration"}},[e._v("#")]),e._v(" Future plans for Schema-Collaboration")]),e._v(" "),t("p",[e._v("We plan to install the schema-collaboration at the Swiss Polar Institute to be used to describe polar data sets.")]),e._v(" "),t("p",[e._v("In the upcoming January Frictionless Data community call (sign up "),t("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(" to join), I will do a demo and I would really appreciate feedback. Please feel free to use it and add issues (bugs or ideas) in the "),t("a",{attrs:{href:"https://github.com/frictionlessdata/schema-collaboration",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repository"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("h2",{attrs:{id:"tech-stack"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#tech-stack"}},[e._v("#")]),e._v(" Tech stack")]),e._v(" "),t("p",[e._v("For the curious: schema-collaboration is developed using Python and Django and uses the django-crispy-forms package to create the forms. 
It supports sqlite3 and MariaDB databases.")]),e._v(" "),t("h2",{attrs:{id:"thanks-to"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#thanks-to"}},[e._v("#")]),e._v(" Thanks to…")]),e._v(" "),t("p",[e._v("In order to integrate Data Package Creator with schema-collaboration some changes were needed in the Data Package Creator. Evgeny (@roll on GitHub/Discord) from the Frictionless Data project made the changes to Data Package Creator needed to achieve this and helped with the integration. Thank you very much!")]),e._v(" "),t("p",[t("strong",[e._v("Further reading:")])]),e._v(" "),t("p",[e._v("GitHub repository: "),t("a",{attrs:{href:"https://github.com/frictionlessdata/schema-collaboration",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/schema-collaboration"),t("OutboundLink")],1)]),e._v(" "),t("p",[e._v("Meet Carles Pina Estany: "),t("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/07/16/tool-fund-polar-institute/#meet-carles-pina-estany",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/blog/2020/07/16/tool-fund-polar-institute/#meet-carles-pina-estany"),t("OutboundLink")],1)])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/119.4464f1f2.js b/assets/js/119.2c4ebfcb.js similarity index 99% rename from assets/js/119.4464f1f2.js rename to assets/js/119.2c4ebfcb.js index d4e50592c..8d5983280 100644 --- a/assets/js/119.4464f1f2.js +++ b/assets/js/119.2c4ebfcb.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[119],{652:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("Have you ever heard a data horror story about Excel automatically changing all numbers into dates without so much as a warning? 
Have you ever accidentally entered a wrong data value into a spreadsheet, or accidentally deleted a cell? What if there was an easy way to detect errors in data types and content? Well there is! That is the main goal of Goodtables, the Frictionless data validation service, and also the "),t("code",[e._v("Frictionless-py")]),e._v(" "),t("code",[e._v("validate")]),e._v(" function. Interested in learning more about how you can validate your data? Read on to see how the Frictionless Fellows validated their research data and learn their tips and tricks!")]),e._v(" "),t("div",{staticClass:"custom-block tip"},[t("p",{staticClass:"custom-block-title"},[e._v("TIP")]),e._v(" "),t("p",[e._v("Click on the links below to read the whole blog.")])]),e._v(" "),t("h3",{attrs:{id:"don-t-you-wish-your-table-was-as-clean-as-mine-by-monica-granados-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#don-t-you-wish-your-table-was-as-clean-as-mine-by-monica-granados-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/monica-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Don’t you wish your table was as clean as mine? By Monica Granados"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“How many times have you gotten a data frame from a colleague or downloaded data that had missing values? Or it’s missing a column name? Do you wish you were never that person? Well introducing Goodtables – your solution to counteracting bad data frames! 
As part of the inaugural Frictionless Data Fellows, I took Goodtables out for a spin.”")]),e._v(" "),t("h3",{attrs:{id:"validando-datos-un-paquete-a-la-vez-by-sele-yang-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validando-datos-un-paquete-a-la-vez-by-sele-yang-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sele-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validando datos un paquete a la vez by Sele Yang"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“I worked with the database that I have been using for the programme, which is in my GitHub repository. It is a geographic database of abortion clinics downloaded from OpenStreetMap through OverpassTurbo….Goodtables is a very powerful tool that gives us the possibility of constant, simple validation to keep our databases in optimal condition, not only for our own work but also for their reproduction and use by other people.”")]),e._v(" "),t("h3",{attrs:{id:"tabular-data-before-you-use-the-data-by-ouso-daniel-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#tabular-data-before-you-use-the-data-by-ouso-daniel-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/ouso-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Tabular data: Before you use the data by Ouso Daniel"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“I want to talk about goodtables, a Frictionless data (FD) tool for validating tabular data sets. As hinted by the name, you only want to work on/with tabular data in good condition; the tool highlights errors in your tabular dataset, with the precision of the exact location of your error. 
Again, the beautiful thing about FD tools is that they don’t discriminate on your preferences, it encompasses the Linux-based CLI, Python, GUI folks, among other languages.”")]),e._v(" "),t("h3",{attrs:{id:"data-validation-of-my-interview-dataset-using-goodtables-by-lily-zhao-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-validation-of-my-interview-dataset-using-goodtables-by-lily-zhao-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lily-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Validation Of My Interview Dataset Using Goodtables by Lily Zhao"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“I used goodtables to validate the interview data we gathered as part of the first chapter of my PhD. These data were collected in Mo’orea, French Polynesia where we interviewed both residents and scientists regarding the future of research in Mo’orea….Amplifying local involvement and unifying the perspectives of researchers and coastal communities is critical not only in reducing inequity in science, but also in securing lasting coral reef health.”")]),e._v(" "),t("h3",{attrs:{id:"walking-through-the-frictionless-framework-by-jacqueline-maasch-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#walking-through-the-frictionless-framework-by-jacqueline-maasch-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/jacqueline-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Walking through the "),t("code",[e._v("frictionless")]),e._v(" framework by Jacqueline Maasch"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“While the GoodTables web server is a convenient tool for automated data validation, the frictionless framework allows for validation right within your Python scripts. We’ll demonstrate some key frictionless functionality, both in Python and command line syntax. 
As an illustrative point, we will use a CSV file that contains an invalid element – a remnant of careless file creation.”")]),e._v(" "),t("h3",{attrs:{id:"validating-your-data-before-sharing-with-the-community-by-dani-alcala-lopez-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validating-your-data-before-sharing-with-the-community-by-dani-alcala-lopez-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/dani-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validating your data before sharing with the community by Dani Alcalá-López"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Once we have decided to share our data with the rest of the world, it is important to make sure that other people will be able to reuse it. This means providing as much metadata as possible, but also checking that there are no errors in the data that might prevent others from benefiting from our data. Goodtables is a simple tool that you can use both on the web and in the command-line interface to carry out this verification process”")]),e._v(" "),t("h3",{attrs:{id:"goodtables-blog-by-sam-wilairat-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#goodtables-blog-by-sam-wilairat-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sam-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Goodtables blog by Sam Wilairat"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Now let’s try validating the same data using the Goodtables command line tool! ….Once the installation is complete, type “goodtables path/to/file.csv”. 
You will either receive a green message stating that the data is valid, or a red message, like the one I have shown below, showing that the data is not valid!”")]),e._v(" "),t("h3",{attrs:{id:"using-goodtables-to-validate-metadata-from-multiple-sequencing-runs-by-kate-bowie-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#using-goodtables-to-validate-metadata-from-multiple-sequencing-runs-by-kate-bowie-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/kate-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Using goodtables to validate metadata from multiple sequencing runs by Kate Bowie"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Here, I will show you how I used a schema and GoodTables to make sure my metadata files could be combined, so I can use them for downstream microbial diversity analysis….It’s extremely helpful that GoodTables pointed this ### [error] out, because if I tried to combine these metadata files in R with non-matching case as it is here, then it would create TWO separate columns for the metadata….Now I will be able to combine these metadata files together and it will make my data analysis pipeline a lot smoother.”")]),e._v(" "),t("h3",{attrs:{id:"reflecting-on-datafication-data-prep-and-utf-8-with-goodtables-io-by-anne-lee-steele-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reflecting-on-datafication-data-prep-and-utf-8-with-goodtables-io-by-anne-lee-steele-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/anne-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reflecting on ‘datafication’, data prep, and UTF-8 with goodtables.io by Anne Lee Steele"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Before I knew it, it was 2021, and revisiting my data in the new year has made me realize just how much time and efforts goes into cleaning, 
structuring, and formatting datasets – and how much more goes into making them understandable for others (i.e. through Frictionless’ data-package). I’d always thought of these processes as a kind of black box, where ‘data analysis’ simply happens. But in reality, it’s the fact that we’ve been spending so much time on preparatory work that points to how important these processes actually are: and how much goes into making sure that data can be used before analyzing it in the first place.”")]),e._v(" "),t("h3",{attrs:{id:"validate-it-the-goodtables-way-by-evelyn-night-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validate-it-the-goodtables-way-by-evelyn-night-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/evelyn-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validate it the GoodTables way! By Evelyn Night"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Errors may sometimes occur while describing data in a tabular format and these could be in the structure; such as missing headers and duplicated rows, or in the content for instance assigning the wrong character to a string. Some of these errors could be easily spotted by naked eyes and fixed during the data curation process while others may just go unnoticed and later impede some downstream analytical workflows. GoodTables are handy in flagging down common errors that come with tabular data handling as it recognises these discrepancies fast and efficiently to enable users debug their data easily. 
”")]),e._v(" "),t("h3",{attrs:{id:"using-the-frictionless-framework-for-data-validation-by-katerina-drakoulaki-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#using-the-frictionless-framework-for-data-validation-by-katerina-drakoulaki-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/katerina-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Using the frictionless framework for data validation by Katerina Drakoulaki"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Thus, similar to what the data package creator and "),t("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),t("OutboundLink")],1),e._v(" does, frictionless detects your variables and their names, and infers the type of data. However, it detected some of my variables as strings, when they are in fact integers. Of course, goodtables did not detect this, as my data were generally -in terms of formatting- valid. Not inferring the right type of data can be a problem both for future me, but also for other people looking at my data.”")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[119],{653:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("Have you ever heard a data horror story about Excel automatically changing all numbers into dates without so much as a warning? Have you ever accidentally entered a wrong data value into a spreadsheet, or accidentally deleted a cell? What if there was an easy way to detect errors in data types and content? Well there is! 
That is the main goal of Goodtables, the Frictionless data validation service, and also the "),t("code",[e._v("Frictionless-py")]),e._v(" "),t("code",[e._v("validate")]),e._v(" function. Interested in learning more about how you can validate your data? Read on to see how the Frictionless Fellows validated their research data and learn their tips and tricks!")]),e._v(" "),t("div",{staticClass:"custom-block tip"},[t("p",{staticClass:"custom-block-title"},[e._v("TIP")]),e._v(" "),t("p",[e._v("Click on the links below to read the whole blog.")])]),e._v(" "),t("h3",{attrs:{id:"don-t-you-wish-your-table-was-as-clean-as-mine-by-monica-granados-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#don-t-you-wish-your-table-was-as-clean-as-mine-by-monica-granados-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/monica-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Don’t you wish your table was as clean as mine? By Monica Granados"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“How many times have you gotten a data frame from a colleague or downloaded data that had missing values? Or it’s missing a column name? Do you wish you were never that person? Well introducing Goodtables – your solution to counteracting bad data frames! 
As part of the inaugural Frictionless Data Fellows, I took Goodtables out for a spin.”")]),e._v(" "),t("h3",{attrs:{id:"validando-datos-un-paquete-a-la-vez-by-sele-yang-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validando-datos-un-paquete-a-la-vez-by-sele-yang-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sele-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validando datos un paquete a la vez by Sele Yang"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“Yo trabajé con la base de datos que vengo utilizando para el programa que se encuentra en mi repositorio de Github. Es una base de datos geográficos sobre clínicas de aborto descargada desde OpenStreetMap a través de OverpassTurbo….Goodtables es una herramienta muy poderosa, que nos permite contar contar con la posibilidad de validación constante y de forma simple para mantener nuestras bases de datos en condiciones óptimas, no sólo para nuestro trabajo, sino también para la reproducción y uso de los mismos por otras personas.”")]),e._v(" "),t("h3",{attrs:{id:"tabular-data-before-you-use-the-data-by-ouso-daniel-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#tabular-data-before-you-use-the-data-by-ouso-daniel-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/ouso-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Tabular data: Before you use the data by Ouso Daniel"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“I want to talk about goodtables, a Frictionless data (FD) tool for validating tabular data sets. As hinted by the name, you only want to work on/with tabular data in good condition; the tool highlights errors in your tabular dataset, with the precision of the exact location of your error. 
Again, the beautiful thing about FD tools is that they don’t discriminate on your preferences, it encompasses the Linux-based CLI, Python, GUI folks, among other languages.”")]),e._v(" "),t("h3",{attrs:{id:"data-validation-of-my-interview-dataset-using-goodtables-by-lily-zhao-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-validation-of-my-interview-dataset-using-goodtables-by-lily-zhao-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lily-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Validation Of My Interview Dataset Using Goodtables by Lily Zhao"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“I used goodtables to validate the interview data we gathered as part of the first chapter of my PhD. These data were collected in Mo’orea, French Polynesia where we interviewed both residents and scientists regarding the future of research in Mo’orea….Amplifying local involvement and unifying the perspectives of researchers and coastal communities is critical not only in reducing inequity in science, but also in securing lasting coral reef health.”")]),e._v(" "),t("h3",{attrs:{id:"walking-through-the-frictionless-framework-by-jacqueline-maasch-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#walking-through-the-frictionless-framework-by-jacqueline-maasch-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/jacqueline-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Walking through the "),t("code",[e._v("frictionless")]),e._v(" framework by Jacqueline Maasch"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“While the GoodTables web server is a convenient tool for automated data validation, the frictionless framework allows for validation right within your Python scripts. We’ll demonstrate some key frictionless functionality, both in Python and command line syntax. 
As an illustrative point, we will use a CSV file that contains an invalid element – a remnant of careless file creation.”")]),e._v(" "),t("h3",{attrs:{id:"validating-your-data-before-sharing-with-the-community-by-dani-alcala-lopez-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validating-your-data-before-sharing-with-the-community-by-dani-alcala-lopez-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/dani-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validating your data before sharing with the community by Dani Alcalá-López"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Once we have decided to share our data with the rest of the world, it is important to make sure that other people will be able to reuse it. This means providing as much metadata as possible, but also checking that there are no errors in the data that might prevent others from benefiting from our data. Goodtables is a simple tool that you can use both on the web and in the command-line interface to carry out this verification process”")]),e._v(" "),t("h3",{attrs:{id:"goodtables-blog-by-sam-wilairat-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#goodtables-blog-by-sam-wilairat-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sam-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Goodtables blog by Sam Wilairat"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Now let’s try validating the same data using the Goodtables command line tool! ….Once the installation is complete, type “goodtables path/to/file.csv”. 
You will either receive a green message stating that the data is valid, or a red message, like the one I have shown below, showing that the data is not valid!”")]),e._v(" "),t("h3",{attrs:{id:"using-goodtables-to-validate-metadata-from-multiple-sequencing-runs-by-kate-bowie-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#using-goodtables-to-validate-metadata-from-multiple-sequencing-runs-by-kate-bowie-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/kate-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Using goodtables to validate metadata from multiple sequencing runs by Kate Bowie"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Here, I will show you how I used a schema and GoodTables to make sure my metadata files could be combined, so I can use them for downstream microbial diversity analysis….It’s extremely helpful that GoodTables pointed this ### [error] out, because if I tried to combine these metadata files in R with non-matching case as it is here, then it would create TWO separate columns for the metadata….Now I will be able to combine these metadata files together and it will make my data analysis pipeline a lot smoother.”")]),e._v(" "),t("h3",{attrs:{id:"reflecting-on-datafication-data-prep-and-utf-8-with-goodtables-io-by-anne-lee-steele-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reflecting-on-datafication-data-prep-and-utf-8-with-goodtables-io-by-anne-lee-steele-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/anne-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reflecting on ‘datafication’, data prep, and UTF-8 with goodtables.io by Anne Lee Steele"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Before I knew it, it was 2021, and revisiting my data in the new year has made me realize just how much time and efforts goes into cleaning, 
structuring, and formatting datasets – and how much more goes into making them understandable for others (i.e. through Frictionless’ data-package). I’d always thought of these processes as a kind of black box, where ‘data analysis’ simply happens. But in reality, it’s the fact that we’ve been spending so much time on preparatory work that points to how important these processes actually are: and how much goes into making sure that data can be used before analyzing it in the first place.”")]),e._v(" "),t("h3",{attrs:{id:"validate-it-the-goodtables-way-by-evelyn-night-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validate-it-the-goodtables-way-by-evelyn-night-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/evelyn-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validate it the GoodTables way! By Evelyn Night"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Errors may sometimes occur while describing data in a tabular format and these could be in the structure; such as missing headers and duplicated rows, or in the content for instance assigning the wrong character to a string. Some of these errors could be easily spotted by naked eyes and fixed during the data curation process while others may just go unnoticed and later impede some downstream analytical workflows. GoodTables are handy in flagging down common errors that come with tabular data handling as it recognises these discrepancies fast and efficiently to enable users debug their data easily. 
”")]),e._v(" "),t("h3",{attrs:{id:"using-the-frictionless-framework-for-data-validation-by-katerina-drakoulaki-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#using-the-frictionless-framework-for-data-validation-by-katerina-drakoulaki-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/katerina-goodtables-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Using the frictionless framework for data validation by Katerina Drakoulaki"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Thus, similar to what the data package creator and "),t("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),t("OutboundLink")],1),e._v(" does, frictionless detects your variables and their names, and infers the type of data. However, it detected some of my variables as strings, when they are in fact integers. Of course, goodtables did not detect this, as my data were generally -in terms of formatting- valid. 
Not inferring the right type of data can be a problem both for future me, but also for other people looking at my data.”")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/120.8f23ec46.js b/assets/js/120.9ab4a56a.js similarity index 98% rename from assets/js/120.8f23ec46.js rename to assets/js/120.9ab4a56a.js index ec91ba675..1188a4ea7 100644 --- a/assets/js/120.8f23ec46.js +++ b/assets/js/120.9ab4a56a.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[120],{654:function(e,t,r){"use strict";r.r(t);var a=r(29),o=Object(a.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h2",{attrs:{id:"a-recap-from-our-january-community-call"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#a-recap-from-our-january-community-call"}},[e._v("#")]),e._v(" A recap from our January community call")]),e._v(" "),r("p",[e._v("On January 28"),r("sup",[e._v("th")]),e._v(" we had our first Frictionless Data Community Call for 2021. It was great to see it was so well attended!")]),e._v(" "),r("p",[e._v("We heard a presentation by Carles Pina i Estany on schema-collaboration, a system that uses Data Package Creator to enable data managers and researchers to create and share dataset schemas, edit them, post messages and export the schemas in different formats (text, Markdown, PDF). 
Before this tool was developed, researchers communicated with a data manager via email for each datapackage they were publishing, which slowed down considerably the whole process, besides making it more difficult.")]),e._v(" "),r("p",[e._v("To discover more about schema-collaboration, have a look at it on "),r("a",{attrs:{href:"https://github.com/frictionlessdata/schema-collaboration/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),r("OutboundLink")],1),e._v(" or read "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/01/18/schema-collaboration/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the blog"),r("OutboundLink")],1),e._v(" Carles wrote about the project. If you would like to dive deeper and watch Carles’ presentation, you can find it here:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/_0cs25Fj_yU",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("ul",[r("li",[r("p",[r("a",{attrs:{href:"https://opendataday.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Day"),r("OutboundLink")],1),e._v(" is fast approaching! If you are organising something to celebrate open data on March 6"),r("sup",[e._v("th")]),e._v(", let us know! You still have a few days to apply for mini-grants for your community events.")])]),e._v(" "),r("li",[r("p",[r("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v6"),r("OutboundLink")],1),e._v(" is happening on May 4-5. If you want to give a talk, make sure to submit a proposal by February 28"),r("sup",[e._v("th")]),e._v(". 
More info "),r("a",{attrs:{href:"https://csvconf.com/submit/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")])])]),e._v(" "),r("h2",{attrs:{id:"news-from-the-community"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the community")]),e._v(" "),r("p",[e._v("Giuseppe Peronato and "),r("a",{attrs:{href:"https://cividi.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("cividi"),r("OutboundLink")],1),e._v(" started using Frictionless Data for data pipelines using (Geo-)Spatial datasets, e.g. raster data and GeoJSONs. You can have a look "),r("a",{attrs:{href:"https://github.com/datahq/dataflows/pull/153",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". They have also been looking more closely at the Creator’s UI library in a "),r("a",{attrs:{href:"https://github.com/gperonato/archive-forger",target:"_blank",rel:"noopener noreferrer"}},[e._v("prototype"),r("OutboundLink")],1),e._v(" with researchers, and releasing a "),r("a",{attrs:{href:"https://blog.datalets.ch/073/",target:"_blank",rel:"noopener noreferrer"}},[e._v("QGIS plugin"),r("OutboundLink")],1),e._v(" for Frictionless Data.")]),e._v(" "),r("p",[e._v("Thorben started working on the official vaccination publication by the German Federal Health Authority, which was replaced daily with a Data Package Pipeline saved as a Data Package by a GitHub Action. If you are interested, have a look "),r("a",{attrs:{href:"https://github.com/n0rdlicht/rki-vaccination-scraper",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on 25"),r("sup",[e._v("th")]),e._v(" February. 
Don’t miss the opportunity to get a code demonstration on "),r("a",{attrs:{href:"http://frictionless.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("frictionless.py"),r("OutboundLink")],1),e._v(" by our very own Evgeny Karev (@roll). You can "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("sign up here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),r("h2",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Z4-EM2RPKMA",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[120],{652:function(e,t,r){"use strict";r.r(t);var a=r(29),o=Object(a.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h2",{attrs:{id:"a-recap-from-our-january-community-call"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#a-recap-from-our-january-community-call"}},[e._v("#")]),e._v(" A recap from our January community call")]),e._v(" "),r("p",[e._v("On January 28"),r("sup",[e._v("th")]),e._v(" we had our first Frictionless Data Community Call for 2021. It was great to see it was so well attended!")]),e._v(" "),r("p",[e._v("We heard a presentation by Carles Pina i Estany on schema-collaboration, a system that uses Data Package Creator to enable data managers and researchers to create and share dataset schemas, edit them, post messages and export the schemas in different formats (text, Markdown, PDF). Before this tool was developed, researchers communicated with a data manager via email for each datapackage they were publishing, which slowed down considerably the whole process, besides making it more difficult.")]),e._v(" "),r("p",[e._v("To discover more about schema-collaboration, have a look at it on "),r("a",{attrs:{href:"https://github.com/frictionlessdata/schema-collaboration/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),r("OutboundLink")],1),e._v(" or read "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/01/18/schema-collaboration/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the blog"),r("OutboundLink")],1),e._v(" Carles wrote about the project. 
If you would like to dive deeper and watch Carles’ presentation, you can find it here:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/_0cs25Fj_yU",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("ul",[r("li",[r("p",[r("a",{attrs:{href:"https://opendataday.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Day"),r("OutboundLink")],1),e._v(" is fast approaching! If you are organising something to celebrate open data on March 6"),r("sup",[e._v("th")]),e._v(", let us know! You still have a few days to apply for mini-grants for your community events.")])]),e._v(" "),r("li",[r("p",[r("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v6"),r("OutboundLink")],1),e._v(" is happening on May 4-5. If you want to give a talk, make sure to submit a proposal by February 28"),r("sup",[e._v("th")]),e._v(". More info "),r("a",{attrs:{href:"https://csvconf.com/submit/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")])])]),e._v(" "),r("h2",{attrs:{id:"news-from-the-community"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the community")]),e._v(" "),r("p",[e._v("Giuseppe Peronato and "),r("a",{attrs:{href:"https://cividi.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("cividi"),r("OutboundLink")],1),e._v(" started using Frictionless Data for data pipelines using (Geo-)Spatial datasets, e.g. raster data and GeoJSONs. 
You can have a look "),r("a",{attrs:{href:"https://github.com/datahq/dataflows/pull/153",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". They have also been looking more closely at the Creator’s UI library in a "),r("a",{attrs:{href:"https://github.com/gperonato/archive-forger",target:"_blank",rel:"noopener noreferrer"}},[e._v("prototype"),r("OutboundLink")],1),e._v(" with researchers, and releasing a "),r("a",{attrs:{href:"https://blog.datalets.ch/073/",target:"_blank",rel:"noopener noreferrer"}},[e._v("QGIS plugin"),r("OutboundLink")],1),e._v(" for Frictionless Data.")]),e._v(" "),r("p",[e._v("Thorben started working on the official vaccination publication by the German Federal Health Authority, which was replaced daily with a Data Package Pipeline saved as a Data Package by a GitHub Action. If you are interested, have a look "),r("a",{attrs:{href:"https://github.com/n0rdlicht/rki-vaccination-scraper",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on 25"),r("sup",[e._v("th")]),e._v(" February. Don’t miss the opportunity to get a code demonstration on "),r("a",{attrs:{href:"http://frictionless.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("frictionless.py"),r("OutboundLink")],1),e._v(" by our very own Evgeny Karev (@roll). You can "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("sign up here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),r("h2",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Z4-EM2RPKMA",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/121.c7c0fbf6.js b/assets/js/121.1bb95626.js similarity index 97% rename from assets/js/121.c7c0fbf6.js rename to assets/js/121.1bb95626.js index ad770f068..b0c398aab 100644 --- a/assets/js/121.c7c0fbf6.js +++ b/assets/js/121.1bb95626.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[121],{653:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("HuBMAP ("),a("a",{attrs:{href:"https://portal.hubmapconsortium.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Human BioMolecular Atlas Program"),a("OutboundLink")],1),e._v(") is creating an open, global atlas of the human body at the cellular level. To do this, we’re incorporating data from dozens of different assay types, and as many institutions. 
Each assay type has its own metadata requirements, and Frictionless Table Schemas are an important part of our validation framework, to ensure that the metadata supplied by the labs is good.")]),e._v(" "),a("p",[e._v("That system has worked well, as far as it goes, but when there are errors, it’s a pain for the labs to read the error message, find the original TSV, scroll to the appropriate row and column, re-enter, re-save, re-upload… and hopefully not repeat! To simplify that process, we’ve made "),a("a",{attrs:{href:"https://pypi.org/project/tableschema-to-template/#description",target:"_blank",rel:"noopener noreferrer"}},[e._v("tableschema-to-template"),a("OutboundLink")],1),e._v(": it takes a Table Schema as input, and returns an Excel template with embedded documentation and some basic validations.")]),e._v(" "),a("p",[a("code",[e._v("pip install tableschema-to-template")])]),e._v(" "),a("p",[a("code",[e._v("ts2xl.py schema.yaml new-template.xlsx")])]),e._v(" "),a("p",[e._v("It can be used either as a command-line tool, or as a python library. Right now the generated Excel files offer pull-downs for enum constraints, and also check that floats, integers, and booleans are the correct format, and that numbers are in bounds. Adding support for regex pattern constraints is a high priority for us… What features are important to you? 
Issues and PRs are welcome at the "),a("a",{attrs:{href:"https://github.com/hubmapconsortium/tableschema-to-template",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repo"),a("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[121],{654:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("HuBMAP ("),a("a",{attrs:{href:"https://portal.hubmapconsortium.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Human BioMolecular Atlas Program"),a("OutboundLink")],1),e._v(") is creating an open, global atlas of the human body at the cellular level. To do this, we’re incorporating data from dozens of different assay types, and as many institutions. Each assay type has its own metadata requirements, and Frictionless Table Schemas are an important part of our validation framework, to ensure that the metadata supplied by the labs is good.")]),e._v(" "),a("p",[e._v("That system has worked well, as far as it goes, but when there are errors, it’s a pain for the labs to read the error message, find the original TSV, scroll to the appropriate row and column, re-enter, re-save, re-upload… and hopefully not repeat! To simplify that process, we’ve made "),a("a",{attrs:{href:"https://pypi.org/project/tableschema-to-template/#description",target:"_blank",rel:"noopener noreferrer"}},[e._v("tableschema-to-template"),a("OutboundLink")],1),e._v(": it takes a Table Schema as input, and returns an Excel template with embedded documentation and some basic validations.")]),e._v(" "),a("p",[a("code",[e._v("pip install tableschema-to-template")])]),e._v(" "),a("p",[a("code",[e._v("ts2xl.py schema.yaml new-template.xlsx")])]),e._v(" "),a("p",[e._v("It can be used either as a command-line tool, or as a python library. 
Right now the generated Excel files offer pull-downs for enum constraints, and also check that floats, integers, and booleans are the correct format, and that numbers are in bounds. Adding support for regex pattern constraints is a high priority for us… What features are important to you? Issues and PRs are welcome at the "),a("a",{attrs:{href:"https://github.com/hubmapconsortium/tableschema-to-template",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repo"),a("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/123.50916f86.js b/assets/js/123.3cc6b810.js similarity index 98% rename from assets/js/123.50916f86.js rename to assets/js/123.3cc6b810.js index b2cdd26d0..0b0afaff7 100644 --- a/assets/js/123.50916f86.js +++ b/assets/js/123.3cc6b810.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[123],{659:function(e,r,t){"use strict";t.r(r);var o=t(29),a=Object(o.a)({},(function(){var e=this,r=e.$createElement,t=e._self._c||r;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("h2",{attrs:{id:"a-recap-from-our-february-community-call"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#a-recap-from-our-february-community-call"}},[e._v("#")]),e._v(" A recap from our February community call")]),e._v(" "),t("p",[e._v("On this February Community Call we had a top notch code demonstration of the new "),t("a",{attrs:{href:"http://frictionless.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("frictionless.py"),t("OutboundLink")],1),e._v(" framework by our Frictionless Data senior developer Evgeny Karev. We had been looking very much forward to presenting the new framework to you all and we were very pleased that so many of you joined us. 
If you would like to know more about it, you can explore the new Frictionless Python framework through the "),t("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation portal"),t("OutboundLink")],1),e._v(" or on "),t("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("If you couldn’t make it to the call, or you are just curious and would like to go over the presentation again, here it is:")]),e._v(" "),t("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/WX4NbYmvu9M",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),t("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),t("p",[t("a",{attrs:{href:"https://opendataday.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Day"),t("OutboundLink")],1),e._v(" is fast approaching with over 200 events organised online on March 6"),t("sup",[e._v("th")]),e._v(". Together with the "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Fellows"),t("OutboundLink")],1),e._v(" we will be celebrating open research data. Join us online from 3pm UTC. "),t("a",{attrs:{href:"https://us02web.zoom.us/meeting/register/tZUvdeuspjMoGtK-rR8wV4IrnfEW_5-KdLkG",target:"_blank",rel:"noopener noreferrer"}},[e._v("RSVP here"),t("OutboundLink")],1),e._v(" for the link to join this virtual event. 
This event is open to everyone.")]),e._v(" "),t("h2",{attrs:{id:"join-us-next-month"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),t("p",[e._v("Our next meeting will be on 25"),t("sup",[e._v("th")]),e._v(" March. We will hear about Hackathons to facilitate the creation of web tools to create field-specific FAIR archive files from Oleg Lavrovsky and Giuseppe Peronato.")]),e._v(" "),t("p",[e._v("You can sign up "),t("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),t("h2",{attrs:{id:"call-recording"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),t("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),t("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/W0EHL6SSPcE",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),t("p",[e._v("As usual, you can join us on "),t("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),t("OutboundLink")],1),e._v(" or "),t("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),t("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);r.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[123],{656:function(e,r,t){"use strict";t.r(r);var o=t(29),a=Object(o.a)({},(function(){var e=this,r=e.$createElement,t=e._self._c||r;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("h2",{attrs:{id:"a-recap-from-our-february-community-call"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#a-recap-from-our-february-community-call"}},[e._v("#")]),e._v(" A recap from our February community call")]),e._v(" "),t("p",[e._v("On this February Community Call we had a top notch code demonstration of the new "),t("a",{attrs:{href:"http://frictionless.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("frictionless.py"),t("OutboundLink")],1),e._v(" framework by our Frictionless Data senior developer Evgeny Karev. We had been looking very much forward to presenting the new framework to you all and we were very pleased that so many of you joined us. 
If you would like to know more about it, you can explore the new Frictionless Python framework through the "),t("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation portal"),t("OutboundLink")],1),e._v(" or on "),t("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("If you couldn’t make it to the call, or you are just curious and would like to go over the presentation again, here it is:")]),e._v(" "),t("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/WX4NbYmvu9M",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),t("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),t("p",[t("a",{attrs:{href:"https://opendataday.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Day"),t("OutboundLink")],1),e._v(" is fast approaching with over 200 events organised online on March 6"),t("sup",[e._v("th")]),e._v(". Together with the "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Fellows"),t("OutboundLink")],1),e._v(" we will be celebrating open research data. Join us online from 3pm UTC. "),t("a",{attrs:{href:"https://us02web.zoom.us/meeting/register/tZUvdeuspjMoGtK-rR8wV4IrnfEW_5-KdLkG",target:"_blank",rel:"noopener noreferrer"}},[e._v("RSVP here"),t("OutboundLink")],1),e._v(" for the link to join this virtual event. 
This event is open to everyone.")]),e._v(" "),t("h2",{attrs:{id:"join-us-next-month"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),t("p",[e._v("Our next meeting will be on 25"),t("sup",[e._v("th")]),e._v(" March. We will hear about Hackathons to facilitate the creation of web tools to create field-specific FAIR archive files from Oleg Lavrovsky and Giuseppe Peronato.")]),e._v(" "),t("p",[e._v("You can sign up "),t("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),t("h2",{attrs:{id:"call-recording"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),t("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),t("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/W0EHL6SSPcE",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),t("p",[e._v("As usual, you can join us on "),t("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),t("OutboundLink")],1),e._v(" or "),t("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),t("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);r.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/124.6ad45b64.js b/assets/js/124.4132dc2c.js similarity index 99% rename from assets/js/124.6ad45b64.js rename to assets/js/124.4132dc2c.js index 69d5709f2..769dbe4bb 100644 --- a/assets/js/124.6ad45b64.js +++ b/assets/js/124.4132dc2c.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[124],{656:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("em",[e._v("This blog is part of a series showcasing projects developed during the 2020-2021 Tool Fund. The Tool Fund provided five mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.")])]),e._v(" "),a("p",[e._v("We are Simon Tyrrell and Xingdong Bian, both research software engineers, in Robert Davey’s Data Infrastructure and Algorithms group at the Earlham Institute. We built the "),a("a",{attrs:{href:"https://grassroots.tools/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Grassroots Infrastructure project"),a("OutboundLink")],1),e._v(" which aims to create an easily-deployable suite of computing middleware tools to help users and developers gain access to scientific data. This is part of the "),a("a",{attrs:{href:"https://designingfuturewheat.org.uk/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Designing Future Wheat (DFW)"),a("OutboundLink")],1),e._v(" project. 
There are two separate parts of this project that we have added Frictionless Data support to and we’ll now describe each of these in turn.")]),e._v(" "),a("h2",{attrs:{id:"why-add-frictionless-to-the-designing-future-wheat-project"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-add-frictionless-to-the-designing-future-wheat-project"}},[e._v("#")]),e._v(" Why add Frictionless to the Designing Future Wheat project?")]),e._v(" "),a("p",[e._v("The first part of the Tool Fund project we added Frictionless Data to is the DFW data portal which delivers large scale wheat datasets that are also tied to semantically marked-up metadata. These datasets are heterogeneous and vary from field trial information, sequencing data, through to phenotyping images, etc. Given the different needs of users of this data, there is an increasing need to be able to manage this data and its associated metadata to allow for as easy dissemination as possible. So the issue that we had was how can we standardize the methods to access this data/metadata and label it using both well-defined ontologies and standards to deliver consistent data packages to users in an interoperable way. This is where Frictionless Data came in, allowing data scientists a consistent, well-defined standard to use when building programs or workflows to access the data stored on the portal.")]),e._v(" "),a("p",[e._v("The portal uses a combination of an "),a("a",{attrs:{href:"https://irods.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("iRODS"),a("OutboundLink")],1),e._v(" repository, to store the data and metadata, and "),a("a",{attrs:{href:"https://httpd.apache.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Apache"),a("OutboundLink")],1),e._v(" to host the files with our in-house developed Apache module, mod_eirods_dav, linking the two together. 
It was this module that we added the Frictionless Data support to and further details are available in the "),a("a",{attrs:{href:"https://github.com/billyfish/eirods-dav#frictionless-data-support",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"how-does-the-new-frictionless-implementation-work"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-does-the-new-frictionless-implementation-work"}},[e._v("#")]),e._v(" How does the new Frictionless implementation work?")]),e._v(" "),a("p",[e._v("So what does it do? Well, it can generate a datapackage.json file automatically for any number of specified directories. These Data Packages can either be generated dynamically on each access or can optionally be written back to the iRODS repository and served like any other static file stored there. Since every iRODS repository can use different metadata keys for storing the information that the Data Packages require, the required key names are completely configurable by specifying the iRODS metadata keys to use in the mod_eirods_dav configuration file and you can do things like combining the values of multiple iRODS metadata keys with generic strings to produce the value that you want to use in the Data Package. Currently the Data Package’s name, title, description, authors, unique identifier and license details are all supported. For each entry within the Data Package’s resources array, the name, path checksum and size attributes are also stored.")]),e._v(" "),a("p",[e._v("As well as standard entries within the Data Package, we also added support for Tabular Data Packages. 
As with standard entries, all of the keys for the column names can be generated from setting the required directives within the module configuration file.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110128100-b5154a00-7dc6-11eb-8d8a-a915a49e6742.png",alt:"imgblog"}}),a("br"),e._v("\nFigure1: A Data Package generated automatically by mod_eirods_dav")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110128509-25bc6680-7dc7-11eb-8c2e-ff966169f9c5.png",alt:"imgblog2"}}),a("br"),e._v("\nFigure2: Tabular Data Package generated automatically by mod_eirods_dav")]),e._v(" "),a("h2",{attrs:{id:"adding-ckan-support"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#adding-ckan-support"}},[e._v("#")]),e._v(" Adding CKAN support")]),e._v(" "),a("p",[e._v("The second of the tools that we have implemented Frictionless Data support for is the DFW CKAN website. Primarily we use this to store publications from the project output. We currently have over 300 entries in there and since its collection is getting larger and larger, we needed a more manageable way of having better data integration, especially when using other systems through the projects by our collaborators.")]),e._v(" "),a("p",[e._v("So we built a simple Python Django webapp to do this:")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110128662-58fef580-7dc7-11eb-88c9-46e8e36b4def.png",alt:"imgblog3"}})]),e._v(" "),a("p",[e._v("By querying the REST API provided by CKAN and getting the datasets’ metadata as JSON output, followed by using the "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-ckan-mapper",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless CKAN Mapper"),a("OutboundLink")],1),e._v(", the JSON is converted into datapackage.json, to conform with Frictionless Data standard. 
If any of the resources under a dataset is CSV, the headings will be extracted as the "),a("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("tabular data package schema"),a("OutboundLink")],1),e._v(" and integrated into the datapackage.json file itself. As well as providing the datapackage.json file as a download through the Django web app, it is also possible to push the datapackage.json back to the CKAN as a resource file on the page. This requires the CKAN user key with the relevant permissions.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110128881-94012900-7dc7-11eb-9833-e46f351477be.png",alt:"imgblog4"}})]),e._v(" "),a("h2",{attrs:{id:"how-can-you-try-this-tool"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-you-try-this-tool"}},[e._v("#")]),e._v(" How can you try this tool?")]),e._v(" "),a("p",[e._v("The tool can be used by accessing its REST interface:")]),e._v(" "),a("ul",[a("li",[a("code",[e._v("/convert?q={ckan-dataset-id}")]),e._v(" - convert CKAN dataset json to datapackage json e.g. /convert?q=0c03fa08-2142-426b-b1ca-fa852f909aa6")]),e._v(" "),a("li",[a("code",[e._v("/convert_resources?q={ckan-dataset-id}")]),e._v(" - convert CKAN dataset json to datapackage json with resources, also if any of the resources files are CSV files, the tabular data package will be converted. e.g. 
/convert_resources?q=grassroots-frictionless-data-test")]),e._v(" "),a("li",[a("code",[e._v("/convert_push?q={ckan-dataset-id}&key={ckan-user-key}")]),e._v(" - push the generated datapackage.json to the CKAN entry."),a("br"),e._v("\nAn example REST query page:")])]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110129172-efcbb200-7dc7-11eb-9230-a70cbbd6d9cf.png",alt:"imgblog5"}})]),e._v(" "),a("p",[e._v("It is possible to have your own local deployment of the tool too by downloading the web app from its Github repository, installing the requirements, and running the server with")]),e._v(" "),a("p",[a("code",[e._v("$manage.py runserver 8000")])]),e._v(" "),a("p",[e._v("Our collaborators can utilise the datapackage.json and integrate the CKAN entries to their own tools or project with ease as it conforms to the Frictionless Data standard.")]),e._v(" "),a("h2",{attrs:{id:"next-steps-for-frictionlessly-designing-future-wheat"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#next-steps-for-frictionlessly-designing-future-wheat"}},[e._v("#")]),e._v(" Next Steps for Frictionlessly Designing Future Wheat")]),e._v(" "),a("p",[e._v("It has been a hugely positive step to implement support for Frictionless Data Packages and we’ve already used these packages ourselves after two of our servers decided to fall over within three days of each other! Our future plans are to add support for further metadata keys within the datapackage.json files and expose more datasets as Frictionless Data Packages. For the CKAN-side, there are a few improvements that can be made in future: firstly, make the base CKAN url configurable in a config file, so this can be used for any CKAN website. Secondly, create a docker file to include the whole Django app, so it is more portable and easier to be deployed. 
You can keep track of the project at the following links:")]),e._v(" "),a("ul",[a("li",[e._v("The Designing Future Wheat Data Portal: "),a("a",{attrs:{href:"https://opendata.earlham.ac.uk/wheat/under_license/toronto/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://opendata.earlham.ac.uk/wheat/under_license/toronto/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("The Designing Future Wheat publications portal: "),a("a",{attrs:{href:"https://ckan.grassroots.tools",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://ckan.grassroots.tools"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("mod_eirods_dav: "),a("a",{attrs:{href:"https://github.com/billyfish/eirods-dav",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/billyfish/eirods-dav"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("CKAN Frictionless Data web application: "),a("a",{attrs:{href:"https://github.com/TGAC/ckan-frictionlessdata",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/TGAC/ckan-frictionlessdata"),a("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[124],{657:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("em",[e._v("This blog is part of a series showcasing projects developed during the 2020-2021 Tool Fund. The Tool Fund provided five mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. 
This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.")])]),e._v(" "),a("p",[e._v("We are Simon Tyrrell and Xingdong Bian, both research software engineers, in Robert Davey’s Data Infrastructure and Algorithms group at the Earlham Institute. We built the "),a("a",{attrs:{href:"https://grassroots.tools/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Grassroots Infrastructure project"),a("OutboundLink")],1),e._v(" which aims to create an easily-deployable suite of computing middleware tools to help users and developers gain access to scientific data. This is part of the "),a("a",{attrs:{href:"https://designingfuturewheat.org.uk/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Designing Future Wheat (DFW)"),a("OutboundLink")],1),e._v(" project. There are two separate parts of this project that we have added Frictionless Data support to and we’ll now describe each of these in turn.")]),e._v(" "),a("h2",{attrs:{id:"why-add-frictionless-to-the-designing-future-wheat-project"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-add-frictionless-to-the-designing-future-wheat-project"}},[e._v("#")]),e._v(" Why add Frictionless to the Designing Future Wheat project?")]),e._v(" "),a("p",[e._v("The first part of the Tool Fund project we added Frictionless Data to is the DFW data portal which delivers large scale wheat datasets that are also tied to semantically marked-up metadata. These datasets are heterogeneous and vary from field trial information, sequencing data, through to phenotyping images, etc. Given the different needs of users of this data, there is an increasing need to be able to manage this data and its associated metadata to allow for as easy dissemination as possible. 
So the issue that we had was how can we standardize the methods to access this data/metadata and label it using both well-defined ontologies and standards to deliver consistent data packages to users in an interoperable way. This is where Frictionless Data came in, allowing data scientists a consistent, well-defined standard to use when building programs or workflows to access the data stored on the portal.")]),e._v(" "),a("p",[e._v("The portal uses a combination of an "),a("a",{attrs:{href:"https://irods.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("iRODS"),a("OutboundLink")],1),e._v(" repository, to store the data and metadata, and "),a("a",{attrs:{href:"https://httpd.apache.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Apache"),a("OutboundLink")],1),e._v(" to host the files with our in-house developed Apache module, mod_eirods_dav, linking the two together. It was this module that we added the Frictionless Data support to and further details are available in the "),a("a",{attrs:{href:"https://github.com/billyfish/eirods-dav#frictionless-data-support",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"how-does-the-new-frictionless-implementation-work"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-does-the-new-frictionless-implementation-work"}},[e._v("#")]),e._v(" How does the new Frictionless implementation work?")]),e._v(" "),a("p",[e._v("So what does it do? Well, it can generate a datapackage.json file automatically for any number of specified directories. These Data Packages can either be generated dynamically on each access or can optionally be written back to the iRODS repository and served like any other static file stored there. 
Since every iRODS repository can use different metadata keys for storing the information that the Data Packages require, the required key names are completely configurable by specifying the iRODS metadata keys to use in the mod_eirods_dav configuration file and you can do things like combining the values of multiple iRODS metadata keys with generic strings to produce the value that you want to use in the Data Package. Currently the Data Package’s name, title, description, authors, unique identifier and license details are all supported. For each entry within the Data Package’s resources array, the name, path checksum and size attributes are also stored.")]),e._v(" "),a("p",[e._v("As well as standard entries within the Data Package, we also added support for Tabular Data Packages. As with standard entries, all of the keys for the column names can be generated from setting the required directives within the module configuration file.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110128100-b5154a00-7dc6-11eb-8d8a-a915a49e6742.png",alt:"imgblog"}}),a("br"),e._v("\nFigure1: A Data Package generated automatically by mod_eirods_dav")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110128509-25bc6680-7dc7-11eb-8c2e-ff966169f9c5.png",alt:"imgblog2"}}),a("br"),e._v("\nFigure2: Tabular Data Package generated automatically by mod_eirods_dav")]),e._v(" "),a("h2",{attrs:{id:"adding-ckan-support"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#adding-ckan-support"}},[e._v("#")]),e._v(" Adding CKAN support")]),e._v(" "),a("p",[e._v("The second of the tools that we have implemented Frictionless Data support for is the DFW CKAN website. Primarily we use this to store publications from the project output. 
We currently have over 300 entries in there and since its collection is getting larger and larger, we needed a more manageable way of having better data integration, especially when using other systems through the projects by our collaborators.")]),e._v(" "),a("p",[e._v("So we built a simple Python Django webapp to do this:")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110128662-58fef580-7dc7-11eb-88c9-46e8e36b4def.png",alt:"imgblog3"}})]),e._v(" "),a("p",[e._v("By querying the REST API provided by CKAN and getting the datasets’ metadata as JSON output, followed by using the "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-ckan-mapper",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless CKAN Mapper"),a("OutboundLink")],1),e._v(", the JSON is converted into datapackage.json, to conform with Frictionless Data standard. If any of the resources under a dataset is CSV, the headings will be extracted as the "),a("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("tabular data package schema"),a("OutboundLink")],1),e._v(" and integrated into the datapackage.json file itself. As well as providing the datapackage.json file as a download through the Django web app, it is also possible to push the datapackage.json back to the CKAN as a resource file on the page. 
This requires the CKAN user key with the relevant permissions.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110128881-94012900-7dc7-11eb-9833-e46f351477be.png",alt:"imgblog4"}})]),e._v(" "),a("h2",{attrs:{id:"how-can-you-try-this-tool"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-you-try-this-tool"}},[e._v("#")]),e._v(" How can you try this tool?")]),e._v(" "),a("p",[e._v("The tool can be used by accessing its REST interface:")]),e._v(" "),a("ul",[a("li",[a("code",[e._v("/convert?q={ckan-dataset-id}")]),e._v(" - convert CKAN dataset json to datapackage json e.g. /convert?q=0c03fa08-2142-426b-b1ca-fa852f909aa6")]),e._v(" "),a("li",[a("code",[e._v("/convert_resources?q={ckan-dataset-id}")]),e._v(" - convert CKAN dataset json to datapackage json with resources, also if any of the resources files are CSV files, the tabular data package will be converted. e.g. /convert_resources?q=grassroots-frictionless-data-test")]),e._v(" "),a("li",[a("code",[e._v("/convert_push?q={ckan-dataset-id}&key={ckan-user-key}")]),e._v(" - push the generated datapackage.json to the CKAN entry."),a("br"),e._v("\nAn example REST query page:")])]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/110129172-efcbb200-7dc7-11eb-9230-a70cbbd6d9cf.png",alt:"imgblog5"}})]),e._v(" "),a("p",[e._v("It is possible to have your own local deployment of the tool too by downloading the web app from its Github repository, installing the requirements, and running the server with")]),e._v(" "),a("p",[a("code",[e._v("$manage.py runserver 8000")])]),e._v(" "),a("p",[e._v("Our collaborators can utilise the datapackage.json and integrate the CKAN entries to their own tools or project with ease as it conforms to the Frictionless Data standard.")]),e._v(" 
"),a("h2",{attrs:{id:"next-steps-for-frictionlessly-designing-future-wheat"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#next-steps-for-frictionlessly-designing-future-wheat"}},[e._v("#")]),e._v(" Next Steps for Frictionlessly Designing Future Wheat")]),e._v(" "),a("p",[e._v("It has been a hugely positive step to implement support for Frictionless Data Packages and we’ve already used these packages ourselves after two of our servers decided to fall over within three days of each other! Our future plans are to add support for further metadata keys within the datapackage.json files and expose more datasets as Frictionless Data Packages. For the CKAN-side, there are a few improvements that can be made in future: firstly, make the base CKAN url configurable in a config file, so this can be used for any CKAN website. Secondly, create a docker file to include the whole Django app, so it is more portable and easier to be deployed. You can keep track of the project at the following links:")]),e._v(" "),a("ul",[a("li",[e._v("The Designing Future Wheat Data Portal: "),a("a",{attrs:{href:"https://opendata.earlham.ac.uk/wheat/under_license/toronto/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://opendata.earlham.ac.uk/wheat/under_license/toronto/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("The Designing Future Wheat publications portal: "),a("a",{attrs:{href:"https://ckan.grassroots.tools",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://ckan.grassroots.tools"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("mod_eirods_dav: "),a("a",{attrs:{href:"https://github.com/billyfish/eirods-dav",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/billyfish/eirods-dav"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("CKAN Frictionless Data web application: "),a("a",{attrs:{href:"https://github.com/TGAC/ckan-frictionlessdata",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("https://github.com/TGAC/ckan-frictionlessdata"),a("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/125.f54a71c6.js b/assets/js/125.ac9d6293.js similarity index 99% rename from assets/js/125.f54a71c6.js rename to assets/js/125.ac9d6293.js index 6e5b24571..d40ae0278 100644 --- a/assets/js/125.f54a71c6.js +++ b/assets/js/125.ac9d6293.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[125],{657:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("The “reproducibility crisis” is a hot topic in scientific research these days. Can you reproduce published data from another laboratory? Can you follow the published scientific methods and get the same result? Unfortunately, the answer to these questions is often no.")]),e._v(" "),t("p",[e._v("One of the goals of Frictionless Data is to help researchers make their work more reproducible. To achieve this, we focus on making data more understandable (make sure to document your metadata!), of higher quality (via validation checks), and easier to reuse (by standardization and packaging).")]),e._v(" "),t("p",[e._v("As a test of these reproducibility measures, we tasked the Frictionless Fellows with reproducing each others’ data packages! This was a great learning experience for the Fellows and revealed some important lessons about how to make their data more (re)usable. 
Click on the blog links below to read more about their experiences!")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"reproduciendo-un-viaje-a-mo-rea-by-sele-yang-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reproduciendo-un-viaje-a-mo-rea-by-sele-yang-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sele-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reproduciendo un viaje a Mo’rea by Sele Yang"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“Mi viaje a través de los datos de Lily, me llevó a Mo’rea, Polinesia Francesa, desde donde ella, a través de diferentes herramientas, recopiló un total de 175 entrevistas entre residentes y también investigadores/as de la región…Para reproducir los datos de Lily, utilicé inicialmente el DataPackage Creator tool para cargar su información en bruto y así empezar a revisar las especificaciones de su data type creados de manera automática por la herramienta.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"packaging-ouso-s-data-by-lily-zhao-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#packaging-ouso-s-data-by-lily-zhao-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lily-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Packaging Ouso’s Data by Lily Zhao"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“This week I had the opportunity to work with my colleague’s data. He created a Datapackage which I replicated. 
In doing so, I learned a lot about the Datapackage web interface….Using these data Ouso and his co-authors evaluate the ability of high-resolution melting analysis to identify illegally targeted wildlife species.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"data-barter-real-life-data-interactions-by-ouso-daniel-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-barter-real-life-data-interactions-by-ouso-daniel-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/ouso-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Barter: Real-life data interactions by Ouso Daniel"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“Exchanging data packages and working backwards from them is an important test in the illustration of the overall goal of the Frictionless Data initiative. Remember, FD seeks to facilitate and promote open and reproducible research, consequently promoting collaboration. By trying to reproduce Monica’s work I was able to capture an error, which I highlighted for her attention, thus improved the work. 
Exactly how science is supposed to work!”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"on-readme-files-sharing-data-and-interoperability-by-anne-lee-steele-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#on-readme-files-sharing-data-and-interoperability-by-anne-lee-steele-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/anne-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("On README files, sharing data and interoperability by Anne Lee Steele"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“One of the goals of the Frictionless Data Fellowship has been to help us make our research more interoperable, which is another way of saying: something that other researchers can use, even if they have entirely different systems or tools with which they approach the same topic….What if researchers of all types wrote prototypical “data packages” about their research, that gave greater context to their work, or explained its wider relevance? In my fields, many researchers tend to find this in ‘the art of the footnote’, but this type of informal knowledge or context is not operationalized in any real way.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"using-frictionless-tools-to-help-you-understand-open-data-by-dani-alcala-lopez-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#using-frictionless-tools-to-help-you-understand-open-data-by-dani-alcala-lopez-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/dani-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Using Frictionless tools to help you understand open data by Dani Alcalá-López"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“A few weeks ago, the fellows did an interesting exercise: We would try to replicate each others DataPackages in pairs. 
We had spent some time before creating and validating DataPacakges with our own data. Now it was the time to see how would it be to work with someone else’s. This experience was intended to be a way for us to check how it was to be at the other side.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"validating-someone-else-s-data-by-katerina-drakoulaki-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validating-someone-else-s-data-by-katerina-drakoulaki-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/katerina-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validating someone else’s data! By Katerina Drakoulaki"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“The first thing I did was to go through the README file on my fellow’s repository. Since the repository was in a completely different field, I really had to read through everything very carefully, and think about the terms they used….Validating the data (to the extent that it was possible after all) was easy using the goodtables tools.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"reproducing-jacqueline-s-datapackage-and-revalidating-her-data-by-sam-wilairat-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reproducing-jacqueline-s-datapackage-and-revalidating-her-data-by-sam-wilairat-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sam-reproduce-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reproducing Jacqueline’s Datapackage and Revalidating her Data! By Sam Wilairat"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Using Jacqueline’s GitHub repository, Frictionless Data Package Creator, and Goodtables, I feel that I can confidently reuse her dataset for my own research purposes. 
While there was one piece of metadata missing from her dataset, her publicly published datapackage .JSON file on her repository helped me to quickly figure out how to interpret the unlabeled column. I also feel confident that the data is valid because after doing a visual scan of the dataset, I used the Goodtables tool to double check that the data was valid!”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"reproducing-a-data-package-by-jacqueline-maasch-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reproducing-a-data-package-by-jacqueline-maasch-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/jacqueline-pkg-reprod-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reproducing a data package by Jacqueline Maasch"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Is it easy to reproduce someone else’s data package? Sometimes, but not always. Tools that automate data management can standardize the process, making reproducibility simpler to achieve. However, accurately anticipating a tool’s expected behavior is essential, especially when mixing technologies.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"validating-data-from-daniel-alcala-lopez-by-evelyn-night-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validating-data-from-daniel-alcala-lopez-by-evelyn-night-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/evelyn-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validating data from Daniel Alcalá-López by Evelyn Night"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“In a fast paced research world where there’s an approximate increase of 8-9% in scientific publications every year, an overload of information is usually fed to the outside world. 
Unfortunately for us, most of this information is often wasted due to the reproducibility crisis marred by data or code that’s often locked away. We explored the question, ‘how reproducible is your data?’ by exchanging personal data and validating them according to the instructions that are outlined in the fellows’ recent goodtables blogs.”")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[125],{658:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("The “reproducibility crisis” is a hot topic in scientific research these days. Can you reproduce published data from another laboratory? Can you follow the published scientific methods and get the same result? Unfortunately, the answer to these questions is often no.")]),e._v(" "),t("p",[e._v("One of the goals of Frictionless Data is to help researchers make their work more reproducible. To achieve this, we focus on making data more understandable (make sure to document your metadata!), of higher quality (via validation checks), and easier to reuse (by standardization and packaging).")]),e._v(" "),t("p",[e._v("As a test of these reproducibility measures, we tasked the Frictionless Fellows with reproducing each others’ data packages! This was a great learning experience for the Fellows and revealed some important lessons about how to make their data more (re)usable. 
Click on the blog links below to read more about their experiences!")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"reproduciendo-un-viaje-a-mo-rea-by-sele-yang-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reproduciendo-un-viaje-a-mo-rea-by-sele-yang-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sele-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reproduciendo un viaje a Mo’rea by Sele Yang"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“Mi viaje a través de los datos de Lily, me llevó a Mo’rea, Polinesia Francesa, desde donde ella, a través de diferentes herramientas, recopiló un total de 175 entrevistas entre residentes y también investigadores/as de la región…Para reproducir los datos de Lily, utilicé inicialmente el DataPackage Creator tool para cargar su información en bruto y así empezar a revisar las especificaciones de su data type creados de manera automática por la herramienta.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"packaging-ouso-s-data-by-lily-zhao-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#packaging-ouso-s-data-by-lily-zhao-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lily-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Packaging Ouso’s Data by Lily Zhao"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“This week I had the opportunity to work with my colleague’s data. He created a Datapackage which I replicated. 
In doing so, I learned a lot about the Datapackage web interface….Using these data Ouso and his co-authors evaluate the ability of high-resolution melting analysis to identify illegally targeted wildlife species.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"data-barter-real-life-data-interactions-by-ouso-daniel-cohort-1"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#data-barter-real-life-data-interactions-by-ouso-daniel-cohort-1"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/ouso-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Barter: Real-life data interactions by Ouso Daniel"),t("OutboundLink")],1),e._v(" (Cohort 1)")]),e._v(" "),t("p",[e._v("“Exchanging data packages and working backwards from them is an important test in the illustration of the overall goal of the Frictionless Data initiative. Remember, FD seeks to facilitate and promote open and reproducible research, consequently promoting collaboration. By trying to reproduce Monica’s work I was able to capture an error, which I highlighted for her attention, thus improved the work. 
Exactly how science is supposed to work!”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"on-readme-files-sharing-data-and-interoperability-by-anne-lee-steele-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#on-readme-files-sharing-data-and-interoperability-by-anne-lee-steele-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/anne-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("On README files, sharing data and interoperability by Anne Lee Steele"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“One of the goals of the Frictionless Data Fellowship has been to help us make our research more interoperable, which is another way of saying: something that other researchers can use, even if they have entirely different systems or tools with which they approach the same topic….What if researchers of all types wrote prototypical “data packages” about their research, that gave greater context to their work, or explained its wider relevance? In my fields, many researchers tend to find this in ‘the art of the footnote’, but this type of informal knowledge or context is not operationalized in any real way.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"using-frictionless-tools-to-help-you-understand-open-data-by-dani-alcala-lopez-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#using-frictionless-tools-to-help-you-understand-open-data-by-dani-alcala-lopez-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/dani-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Using Frictionless tools to help you understand open data by Dani Alcalá-López"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“A few weeks ago, the fellows did an interesting exercise: We would try to replicate each others DataPackages in pairs. 
We had spent some time before creating and validating DataPacakges with our own data. Now it was the time to see how would it be to work with someone else’s. This experience was intended to be a way for us to check how it was to be at the other side.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"validating-someone-else-s-data-by-katerina-drakoulaki-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validating-someone-else-s-data-by-katerina-drakoulaki-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/katerina-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validating someone else’s data! By Katerina Drakoulaki"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“The first thing I did was to go through the README file on my fellow’s repository. Since the repository was in a completely different field, I really had to read through everything very carefully, and think about the terms they used….Validating the data (to the extent that it was possible after all) was easy using the goodtables tools.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"reproducing-jacqueline-s-datapackage-and-revalidating-her-data-by-sam-wilairat-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reproducing-jacqueline-s-datapackage-and-revalidating-her-data-by-sam-wilairat-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/sam-reproduce-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reproducing Jacqueline’s Datapackage and Revalidating her Data! By Sam Wilairat"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Using Jacqueline’s GitHub repository, Frictionless Data Package Creator, and Goodtables, I feel that I can confidently reuse her dataset for my own research purposes. 
While there was one piece of metadata missing from her dataset, her publicly published datapackage .JSON file on her repository helped me to quickly figure out how to interpret the unlabeled column. I also feel confident that the data is valid because after doing a visual scan of the dataset, I used the Goodtables tool to double check that the data was valid!”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"reproducing-a-data-package-by-jacqueline-maasch-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reproducing-a-data-package-by-jacqueline-maasch-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/jacqueline-pkg-reprod-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Reproducing a data package by Jacqueline Maasch"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“Is it easy to reproduce someone else’s data package? Sometimes, but not always. Tools that automate data management can standardize the process, making reproducibility simpler to achieve. However, accurately anticipating a tool’s expected behavior is essential, especially when mixing technologies.”")]),e._v(" "),t("hr"),e._v(" "),t("h3",{attrs:{id:"validating-data-from-daniel-alcala-lopez-by-evelyn-night-cohort-2"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#validating-data-from-daniel-alcala-lopez-by-evelyn-night-cohort-2"}},[e._v("#")]),e._v(" "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/evelyn-partner-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Validating data from Daniel Alcalá-López by Evelyn Night"),t("OutboundLink")],1),e._v(" (Cohort 2)")]),e._v(" "),t("p",[e._v("“In a fast paced research world where there’s an approximate increase of 8-9% in scientific publications every year, an overload of information is usually fed to the outside world. 
Unfortunately for us, most of this information is often wasted due to the reproducibility crisis marred by data or code that’s often locked away. We explored the question, ‘how reproducible is your data?’ by exchanging personal data and validating them according to the instructions that are outlined in the fellows’ recent goodtables blogs.”")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/126.2eac7835.js b/assets/js/126.88c9378d.js similarity index 98% rename from assets/js/126.2eac7835.js rename to assets/js/126.88c9378d.js index 1f70e05f4..cedf6bd64 100644 --- a/assets/js/126.2eac7835.js +++ b/assets/js/126.88c9378d.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[126],{658:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h2",{attrs:{id:"a-recap-from-our-march-community-call"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#a-recap-from-our-march-community-call"}},[e._v("#")]),e._v(" A recap from our March community call")]),e._v(" "),r("p",[e._v("On our last Frictionless Data community call on March 25"),r("sup",[e._v("th")]),e._v(", we dealt with a very current topic thanks to Thorben Westerhuys, who presented his project on Frictionless Vaccination data.")]),e._v(" "),r("p",[e._v("To compensate for the lack of time perspective in the government data, Thorben has developed a spatiotemporal tracker for state level covid vaccination data, which takes the data provided by the government, reformats it and makes it available to everyone in a structured, more machine readable form.")]),e._v(" "),r("p",[e._v("To discover more about this great project, have a look at it on "),r("a",{attrs:{href:"https://github.com/n0rdlicht/rki-vaccination-scraper",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),r("OutboundLink")],1),e._v(". 
If you would like to dive deeper and discover all the project’s applications, you can watch Thorben’s presentation here:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/TR0kNEC3bBM",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("p",[r("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v6"),r("OutboundLink")],1),e._v(" is happening on May 4-5. Registrations are open. Don’t forget to book your place!")]),e._v(" "),r("h2",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on April 29"),r("sup",[e._v("th")]),e._v(". We will hear a presentation from the Frictionless Fellows. You can sign up "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),r("h2",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/5cghp8KieLE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v(" ")]),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[126],{659:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h2",{attrs:{id:"a-recap-from-our-march-community-call"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#a-recap-from-our-march-community-call"}},[e._v("#")]),e._v(" A recap from our March community call")]),e._v(" "),r("p",[e._v("On our last Frictionless Data community call on March 25"),r("sup",[e._v("th")]),e._v(", we dealt with a very current topic thanks to Thorben Westerhuys, who presented his project on Frictionless Vaccination data.")]),e._v(" "),r("p",[e._v("To compensate for the lack of time perspective in the government data, Thorben has developed a spatiotemporal tracker for state level covid vaccination data, which takes the data provided by the government, 
reformats it and makes it available to everyone in a structured, more machine readable form.")]),e._v(" "),r("p",[e._v("To discover more about this great project, have a look at it on "),r("a",{attrs:{href:"https://github.com/n0rdlicht/rki-vaccination-scraper",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),r("OutboundLink")],1),e._v(". If you would like to dive deeper and discover all the project’s applications, you can watch Thorben’s presentation here:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/TR0kNEC3bBM",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("p",[r("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v6"),r("OutboundLink")],1),e._v(" is happening on May 4-5. Registrations are open. Don’t forget to book your place!")]),e._v(" "),r("h2",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on April 29"),r("sup",[e._v("th")]),e._v(". We will hear a presentation from the Frictionless Fellows. You can sign up "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),r("h2",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/5cghp8KieLE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v(" ")]),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/128.5f662b25.js b/assets/js/128.8b277324.js similarity index 99% rename from assets/js/128.5f662b25.js rename to assets/js/128.8b277324.js index ada62f042..661ba1edc 100644 --- a/assets/js/128.5f662b25.js +++ b/assets/js/128.8b277324.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[128],{661:function(e,t,o){"use strict";o.r(t);var a=o(29),n=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Originally published: "),o("a",{attrs:{href:"https://blog.okfn.org/2021/04/14/unveiling-the-new-frictionless-data-documentation-portal/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2021/04/14/unveiling-the-new-frictionless-data-documentation-portal/"),o("OutboundLink")],1)]),e._v(" "),o("p",[e._v("Have you used Frictionless Data documentation in the past and been 
confused or wanted more examples? Are you a brand new Frictionless Data user looking to get started learning?")]),e._v(" "),o("p",[e._v("We invite you all to read our new and improved "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation portal"),o("OutboundLink")],1),e._v("! Thanks to a "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/01/13/partnering-with-odi/",target:"_blank",rel:"noopener noreferrer"}},[e._v("fund that the Open Knowledge Foundation was awarded"),o("OutboundLink")],1),e._v(" from the "),o("a",{attrs:{href:"https://theodi.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Institute"),o("OutboundLink")],1),e._v(", we have completely reworked the guides of our "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Framework website"),o("OutboundLink")],1),e._v(" according to the suggestions from a cohort of users gathered in several feedback sessions throughout the months of February and March.")]),e._v(" "),o("p",[e._v("We cannot stress enough how precious those feedback sessions have been to us. They were an excellent opportunity to connect with our users and reflect together with them on how to make all our guides more useful for current and future users. The enthusiasm and engagement that the community showed for the process was great to see and reminded us that the link with the community should be at the core of open source projects.")]),e._v(" "),o("p",[e._v("We were amazed by the amount of extremely useful inputs that we got. 
While we are still digesting some of the suggestions and working out how to best implement them, we have made many changes to make the documentation a smoother, Frictionless experience.")]),e._v(" "),o("h2",{attrs:{id:"so-what-s-new"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#so-what-s-new"}},[e._v("#")]),e._v(" So what’s new?")]),e._v(" "),o("p",[e._v("A common theme from the feedback sessions was that it was sometimes difficult for novice users to understand the whole potential of the Frictionless specifications. To help make this clearer, we added a more detailed explanation, user examples and user stories to our "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/guides/introduction",target:"_blank",rel:"noopener noreferrer"}},[e._v("Introduction"),o("OutboundLink")],1),e._v(". We also added some extra installation tips and a troubleshooting section to our "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/guides/quick-start",target:"_blank",rel:"noopener noreferrer"}},[e._v("Quick Start guide"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("The users also suggested several code changes, like more realistic code examples, better explanations of functions, and the ability to run code examples in both the Command Line and Python. This last suggestion was prompted because most of the guides use a mix of Command Line and Python syntax, which was confusing to our users. We have clarified that by adding a switch in the code snippets that allows user to work with a pure Python Syntax or pure Command Line (when possible), as you can see "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/guides/basic-examples",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". 
We also put together an "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/faq/",target:"_blank",rel:"noopener noreferrer"}},[e._v("FAQ section"),o("OutboundLink")],1),e._v(" based on questions that were often asked on our "),o("a",{attrs:{href:"https://discord.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord chat"),o("OutboundLink")],1),e._v(". If you have suggestions for other common questions to add, let us know!")]),e._v(" "),o("p",[e._v("The documentation revamping process also included the publication of new tutorials. We worked on two new Frictionless tutorials, which are published under the Notebooks link in the navigation menu. While working on those, we got inspired by the feedback sessions and realised that it made sense to give our community the possibility to contribute to the project with some real life examples of Frictionless Data use. The user selection process has started and we hope to get the new tutorials online by the end of the month, so stay tuned!")]),e._v(" "),o("h2",{attrs:{id:"what-s-next"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-s-next"}},[e._v("#")]),e._v(" What’s next?")]),e._v(" "),o("p",[e._v("Our commitment to continually improving our documentation is not over with this project coming to an end! Do you have suggestions for changes you would like to see in our documentation? Please reach out to us or open a "),o("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/pulls",target:"_blank",rel:"noopener noreferrer"}},[e._v("pull request"),o("OutboundLink")],1),e._v(" to contribute. Everyone is welcome to contribute! 
Learn how to do it "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/development/contributing",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h2",{attrs:{id:"thanks-thanks-thanks"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#thanks-thanks-thanks"}},[e._v("#")]),e._v(" Thanks, thanks, thanks!")]),e._v(" "),o("p",[e._v("Once again, we are very grateful to the Open Data Institute for giving us the chance to focus on this documentation in order to improve it. We cannot thank enough all our users who took part in the feedback sessions, your contributions were precious.")]),e._v(" "),o("h2",{attrs:{id:"more-about-frictionless-data"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#more-about-frictionless-data"}},[e._v("#")]),e._v(" More about Frictionless Data")]),e._v(" "),o("p",[e._v("Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[128],{662:function(e,t,o){"use strict";o.r(t);var a=o(29),n=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Originally published: "),o("a",{attrs:{href:"https://blog.okfn.org/2021/04/14/unveiling-the-new-frictionless-data-documentation-portal/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2021/04/14/unveiling-the-new-frictionless-data-documentation-portal/"),o("OutboundLink")],1)]),e._v(" "),o("p",[e._v("Have you used Frictionless Data documentation in the past and been confused or wanted more examples? 
Are you a brand new Frictionless Data user looking to get started learning?")]),e._v(" "),o("p",[e._v("We invite you all to read our new and improved "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation portal"),o("OutboundLink")],1),e._v("! Thanks to a "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/01/13/partnering-with-odi/",target:"_blank",rel:"noopener noreferrer"}},[e._v("fund that the Open Knowledge Foundation was awarded"),o("OutboundLink")],1),e._v(" from the "),o("a",{attrs:{href:"https://theodi.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Institute"),o("OutboundLink")],1),e._v(", we have completely reworked the guides of our "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Framework website"),o("OutboundLink")],1),e._v(" according to the suggestions from a cohort of users gathered in several feedback sessions throughout the months of February and March.")]),e._v(" "),o("p",[e._v("We cannot stress enough how precious those feedback sessions have been to us. They were an excellent opportunity to connect with our users and reflect together with them on how to make all our guides more useful for current and future users. The enthusiasm and engagement that the community showed for the process was great to see and reminded us that the link with the community should be at the core of open source projects.")]),e._v(" "),o("p",[e._v("We were amazed by the amount of extremely useful inputs that we got. 
While we are still digesting some of the suggestions and working out how to best implement them, we have made many changes to make the documentation a smoother, Frictionless experience.")]),e._v(" "),o("h2",{attrs:{id:"so-what-s-new"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#so-what-s-new"}},[e._v("#")]),e._v(" So what’s new?")]),e._v(" "),o("p",[e._v("A common theme from the feedback sessions was that it was sometimes difficult for novice users to understand the whole potential of the Frictionless specifications. To help make this clearer, we added a more detailed explanation, user examples and user stories to our "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/guides/introduction",target:"_blank",rel:"noopener noreferrer"}},[e._v("Introduction"),o("OutboundLink")],1),e._v(". We also added some extra installation tips and a troubleshooting section to our "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/guides/quick-start",target:"_blank",rel:"noopener noreferrer"}},[e._v("Quick Start guide"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("The users also suggested several code changes, like more realistic code examples, better explanations of functions, and the ability to run code examples in both the Command Line and Python. This last suggestion was prompted because most of the guides use a mix of Command Line and Python syntax, which was confusing to our users. We have clarified that by adding a switch in the code snippets that allows user to work with a pure Python Syntax or pure Command Line (when possible), as you can see "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/guides/basic-examples",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". 
We also put together an "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/faq/",target:"_blank",rel:"noopener noreferrer"}},[e._v("FAQ section"),o("OutboundLink")],1),e._v(" based on questions that were often asked on our "),o("a",{attrs:{href:"https://discord.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord chat"),o("OutboundLink")],1),e._v(". If you have suggestions for other common questions to add, let us know!")]),e._v(" "),o("p",[e._v("The documentation revamping process also included the publication of new tutorials. We worked on two new Frictionless tutorials, which are published under the Notebooks link in the navigation menu. While working on those, we got inspired by the feedback sessions and realised that it made sense to give our community the possibility to contribute to the project with some real life examples of Frictionless Data use. The user selection process has started and we hope to get the new tutorials online by the end of the month, so stay tuned!")]),e._v(" "),o("h2",{attrs:{id:"what-s-next"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-s-next"}},[e._v("#")]),e._v(" What’s next?")]),e._v(" "),o("p",[e._v("Our commitment to continually improving our documentation is not over with this project coming to an end! Do you have suggestions for changes you would like to see in our documentation? Please reach out to us or open a "),o("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/pulls",target:"_blank",rel:"noopener noreferrer"}},[e._v("pull request"),o("OutboundLink")],1),e._v(" to contribute. Everyone is welcome to contribute! 
Learn how to do it "),o("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/development/contributing",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h2",{attrs:{id:"thanks-thanks-thanks"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#thanks-thanks-thanks"}},[e._v("#")]),e._v(" Thanks, thanks, thanks!")]),e._v(" "),o("p",[e._v("Once again, we are very grateful to the Open Data Institute for giving us the chance to focus on this documentation in order to improve it. We cannot thank enough all our users who took part in the feedback sessions, your contributions were precious.")]),e._v(" "),o("h2",{attrs:{id:"more-about-frictionless-data"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#more-about-frictionless-data"}},[e._v("#")]),e._v(" More about Frictionless Data")]),e._v(" "),o("p",[e._v("Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. 
The project is funded by the Sloan Foundation.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/129.015e158d.js b/assets/js/129.130b6b7a.js similarity index 99% rename from assets/js/129.015e158d.js rename to assets/js/129.130b6b7a.js index b6396045b..499c74f1f 100644 --- a/assets/js/129.015e158d.js +++ b/assets/js/129.130b6b7a.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[129],{721:function(e,t,r){"use strict";r.r(t);var a=r(29),o=Object(a.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On our last Frictionless Data community call on April 29"),r("sup",[e._v("th")]),e._v(" we had an interactive session with our great Frictionless Data Fellows: Daniel Alcalá López, Kate Bowie, Katerina Drakoulaki, Anne Lee, Jacqueline Maasch, Evelyn Night and Samantha Wilairat.")]),e._v(" "),r("p",[e._v("The Fellows are early career researchers recruited to become champions of the Frictionless Data tools and approaches in their field. During the nine months of their fellowship, which started in August 2020, the Fellows learned how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. It was a real pleasure to work with this amazing cohort. 
Sadly the fellowship is coming to an end, but we are sure we will hear a lot from them in the future.")]),e._v(" "),r("p",[e._v("You can learn more about them "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(", and read all the great blogs they wrote "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("If you would like to hear directly from the Fellows about their experience with Frictionless Data and what the fellowship meant for them, you can have a look at the presentation they made during the community call here below:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/uw7wqdiCP_g",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("p",[r("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v6"),r("OutboundLink")],1),e._v(" is happening on May 4-5. It is free and virtual - register "),r("a",{attrs:{href:"https://www.eventbrite.com/e/csvconfv6-tickets-144250211265",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". 
There are two Frictionless sessions:")]),e._v(" "),r("ul",[r("li",[e._v("May 4"),r("sup",[e._v("th")]),e._v(": Frictionless Data workshop led by the Reproducible Research fellows, don’t miss the opportunity to meet the Fellows again!")]),e._v(" "),r("li",[e._v("May 5"),r("sup",[e._v("th")]),e._v(": Frictionless Data for Wheat by Simon Tyrrell")])]),e._v(" "),r("p",[e._v("Full programme here: "),r("a",{attrs:{href:"https://csvconf.com/speakers",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://csvconf.com/speakers"),r("OutboundLink")],1)]),e._v(" "),r("h2",{attrs:{id:"news-from-the-community"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the Community")]),e._v(" "),r("p",[e._v("Oleg Lavrovsky presented instant APIs for small Frictionless Data-powered apps. "),r("a",{attrs:{href:"https://scene.rip/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Here"),r("OutboundLink")],1),e._v(" is an example app developed during the latest Swiss OpenGLAM hackathon. 
To know more about it, you can also check:")]),e._v(" "),r("ul",[r("li",[e._v("The "),r("a",{attrs:{href:"https://github.com/we-art-o-nauts/the-scene-lives",target:"_blank",rel:"noopener noreferrer"}},[e._v("source code"),r("OutboundLink")],1),e._v(" which uses "),r("a",{attrs:{href:"https://github.com/datahq/dataflows",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataFlows"),r("OutboundLink")],1),e._v(" for the aggregation, and the "),r("a",{attrs:{href:"https://github.com/rgieseke/pandas-datapackage-reader",target:"_blank",rel:"noopener noreferrer"}},[e._v("Pandas Data Package reader"),r("OutboundLink")],1),e._v(" as the basis for filtering.")]),e._v(" "),r("li",[e._v("The "),r("a",{attrs:{href:"https://hack.glam.opendata.ch/project/114",target:"_blank",rel:"noopener noreferrer"}},[e._v("project page"),r("OutboundLink")],1),e._v(" and slides which outline the motivation to collect and homogenize electronic art archives.")]),e._v(" "),r("li",[e._v("An "),r("a",{attrs:{href:"https://github.com/loleg/baumkataster-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("earlier attempt"),r("OutboundLink")],1),e._v(" which involves a city tree catalogue. The team is also building on this approach in several projects at "),r("a",{attrs:{href:"http://github.com/cividi",target:"_blank",rel:"noopener noreferrer"}},[e._v("cividi"),r("OutboundLink")],1),e._v(".")])]),e._v(" "),r("h2",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on May 27"),r("sup",[e._v("th")]),e._v(". We will hear a presentation from Simon Tyrrell on his Tool Fund project - Frictionless Data for Wheat. 
You can sign up "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),r("h2",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/sRJZnm7bUQc",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[129],{663:function(e,t,r){"use strict";r.r(t);var a=r(29),o=Object(a.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On our last Frictionless Data community call on April 29"),r("sup",[e._v("th")]),e._v(" we had an interactive session with our great Frictionless Data Fellows: Daniel Alcalá López, Kate Bowie, Katerina Drakoulaki, Anne Lee, Jacqueline Maasch, Evelyn Night and Samantha Wilairat.")]),e._v(" "),r("p",[e._v("The Fellows are early career researchers recruited to become champions of the Frictionless Data tools and approaches in their field. During the nine months of their fellowship, which started in August 2020, the Fellows learned how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. It was a real pleasure to work with this amazing cohort. 
Sadly the fellowship is coming to an end, but we are sure we will hear a lot from them in the future.")]),e._v(" "),r("p",[e._v("You can learn more about them "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(", and read all the great blogs they wrote "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("If you would like to hear directly from the Fellows about their experience with Frictionless Data and what the fellowship meant for them, you can have a look at the presentation they made during the community call here below:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/uw7wqdiCP_g",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("p",[r("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v6"),r("OutboundLink")],1),e._v(" is happening on May 4-5. It is free and virtual - register "),r("a",{attrs:{href:"https://www.eventbrite.com/e/csvconfv6-tickets-144250211265",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". 
There are two Frictionless sessions:")]),e._v(" "),r("ul",[r("li",[e._v("May 4"),r("sup",[e._v("th")]),e._v(": Frictionless Data workshop led by the Reproducible Research fellows, don’t miss the opportunity to meet the Fellows again!")]),e._v(" "),r("li",[e._v("May 5"),r("sup",[e._v("th")]),e._v(": Frictionless Data for Wheat by Simon Tyrrell")])]),e._v(" "),r("p",[e._v("Full programme here: "),r("a",{attrs:{href:"https://csvconf.com/speakers",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://csvconf.com/speakers"),r("OutboundLink")],1)]),e._v(" "),r("h2",{attrs:{id:"news-from-the-community"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the Community")]),e._v(" "),r("p",[e._v("Oleg Lavrovsky presented instant APIs for small Frictionless Data-powered apps. "),r("a",{attrs:{href:"https://scene.rip/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Here"),r("OutboundLink")],1),e._v(" is an example app developed during the latest Swiss OpenGLAM hackathon. 
To know more about it, you can also check:")]),e._v(" "),r("ul",[r("li",[e._v("The "),r("a",{attrs:{href:"https://github.com/we-art-o-nauts/the-scene-lives",target:"_blank",rel:"noopener noreferrer"}},[e._v("source code"),r("OutboundLink")],1),e._v(" which uses "),r("a",{attrs:{href:"https://github.com/datahq/dataflows",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataFlows"),r("OutboundLink")],1),e._v(" for the aggregation, and the "),r("a",{attrs:{href:"https://github.com/rgieseke/pandas-datapackage-reader",target:"_blank",rel:"noopener noreferrer"}},[e._v("Pandas Data Package reader"),r("OutboundLink")],1),e._v(" as the basis for filtering.")]),e._v(" "),r("li",[e._v("The "),r("a",{attrs:{href:"https://hack.glam.opendata.ch/project/114",target:"_blank",rel:"noopener noreferrer"}},[e._v("project page"),r("OutboundLink")],1),e._v(" and slides which outline the motivation to collect and homogenize electronic art archives.")]),e._v(" "),r("li",[e._v("An "),r("a",{attrs:{href:"https://github.com/loleg/baumkataster-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("earlier attempt"),r("OutboundLink")],1),e._v(" which involves a city tree catalogue. The team is also building on this approach in several projects at "),r("a",{attrs:{href:"http://github.com/cividi",target:"_blank",rel:"noopener noreferrer"}},[e._v("cividi"),r("OutboundLink")],1),e._v(".")])]),e._v(" "),r("h2",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on May 27"),r("sup",[e._v("th")]),e._v(". We will hear a presentation from Simon Tyrrell on his Tool Fund project - Frictionless Data for Wheat. 
You can sign up "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),r("h2",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/sRJZnm7bUQc",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/13.d6483d82.js b/assets/js/13.ab9d3f3d.js similarity index 99% rename from assets/js/13.d6483d82.js rename to assets/js/13.ab9d3f3d.js index cc001bded..905ddfa2c 100644 --- a/assets/js/13.d6483d82.js +++ b/assets/js/13.ab9d3f3d.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[13],{468:function(t,e,a){t.exports=a.p+"assets/img/figure-1.ab449321.png"},469:function(t,e,a){t.exports=a.p+"assets/img/figure-2.b334817a.png"},470:function(t,e,a){t.exports=a.p+"assets/img/figure-3.7144591c.png"},471:function(t,e,a){t.exports=a.p+"assets/img/figure-4.478e4974.gif"},472:function(t,e,a){t.exports=a.p+"assets/img/figure-5.e20a138b.gif"},605:function(t,e,a){"use strict";a.r(e);var o=a(29),r=Object(o.a)({},(function(){var t=this,e=t.$createElement,o=t._self._c||e;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[t._v("Errors in data are not uncommon. They also often get in the way of quick and timely data analysis for many data users. What if there was a way to quickly identify errors in your data to accelerate the process by which you fix them before sharing your data or using it for analysis?")]),t._v(" "),o("p",[t._v("In this section, we will learn how to carry out one-time data validation using")]),t._v(" "),o("ul",[o("li",[t._v("a free web tool called "),o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(",")]),t._v(" "),o("li",[t._v("the goodtables command line tool which you use in your local machine.")])]),t._v(" "),o("p",[t._v("Our working assumption is that you already know what a data schema and a data package are, and how to create them. 
If not, "),o("RouterLink",{attrs:{to:"/blog/2018/03/07/well-packaged-datasets/"}},[t._v("start here")]),t._v(".")],1),t._v(" "),o("h2",{attrs:{id:"one-time-data-validation-with-try-goodtables-io"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#one-time-data-validation-with-try-goodtables-io"}},[t._v("#")]),t._v(" One-time data validation with "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1)]),t._v(" "),o("p",[t._v("Now that you have your data package you may want to check it for errors. We refer to this process as data validation. Raw data is often ‘messy’ or ‘dirty’, which means it contains errors and irrelevant bits that make it inaccurate and difficult to quickly analyse and draw insight from existing datasets. "),o("strong",[t._v("Goodtables")]),t._v(" exists to identify structural and content errors in your tabular data so they can be fixed quickly. As with other tools mentioned in this field guide, goodtables aims to help data publishers improve the quality of their data before the data is shared elsewhere and used for analysis, or archived.")]),t._v(" "),o("p",[o("strong",[t._v("Types of errors identified in the validation process")])]),t._v(" "),o("p",[t._v("Here are some of the errors that "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" highlights. A more exhaustive list is available "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py#validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("table",[o("thead",[o("tr",[o("th",[o("strong",[t._v("Structural Errors")])]),t._v(" "),o("th")])]),t._v(" "),o("tbody",[o("tr",[o("td",[t._v("blank-header")]),t._v(" "),o("td",[t._v("There is a blank header name. 
All cells in the header row must have a value.")])]),t._v(" "),o("tr",[o("td",[t._v("duplicate-header")]),t._v(" "),o("td",[t._v("There are multiple columns with the same name. All column names must be unique.")])]),t._v(" "),o("tr",[o("td",[t._v("blank-row")]),t._v(" "),o("td",[t._v("Rows must have at least one non-blank cell.")])]),t._v(" "),o("tr",[o("td",[t._v("duplicate-row")]),t._v(" "),o("td",[t._v("Rows can’t be duplicated.")])]),t._v(" "),o("tr",[o("td",[t._v("extra-value")]),t._v(" "),o("td",[t._v("A row has more columns than the header.")])]),t._v(" "),o("tr",[o("td",[t._v("missing-value")]),t._v(" "),o("td",[t._v("A row has less columns than the header.")])]),t._v(" "),o("tr",[o("td",[o("strong",[t._v("Content Errors")])]),t._v(" "),o("td")]),t._v(" "),o("tr",[o("td",[t._v("schema-error")]),t._v(" "),o("td",[t._v("Schema is not valid.")])]),t._v(" "),o("tr",[o("td",[t._v("non-matching-header")]),t._v(" "),o("td",[t._v("The header’s name in the schema is different from what’s in the data.")])]),t._v(" "),o("tr",[o("td",[t._v("extra-header")]),t._v(" "),o("td",[t._v("The data contains a header not defined in the schema.")])]),t._v(" "),o("tr",[o("td",[t._v("missing-header")]),t._v(" "),o("td",[t._v("The data doesn’t contain a header defined in the schema.")])]),t._v(" "),o("tr",[o("td",[t._v("type-or-format-error")]),t._v(" "),o("td",[t._v("The value can’t be cast based on the schema type and format for this field.")])])])]),t._v(" "),o("p",[o("strong",[t._v("Load tabular data for one-time validation")])]),t._v(" "),o("p",[t._v("You can add a dataset for one-time validation on "),o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" in two ways:")]),t._v(" "),o("ul",[o("li",[t._v("If your tabular data is publicly available online, obtain a link to the tabular data you would like to validate and paste it in the "),o("strong",[t._v("{Source}")]),t._v(" 
section.")]),t._v(" "),o("li",[t._v("Alternatively, Click on the Upload file prompt in the "),o("strong",[t._v("{Source}")]),t._v(" section to load a tabular dataset from your local machine")])]),t._v(" "),o("p",[o("strong",[t._v("Validating data without a schema")])]),t._v(" "),o("p",[t._v("In this section we will illustrate how to check tabular data for structural errors on "),o("a",{attrs:{href:"https://try.goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" where a data schema is not available. For this tutorial we will use a "),o("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/goodtables-py/bc6470a970aacf65f20a3ddb7f71eb05a2a31c70/data/invalid-on-structure.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("sample CSV file with errors"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("p",[t._v("Copy and paste the file’s URL to the "),o("strong",[t._v("{Source}")]),t._v(" input. When you click on the "),o("strong",[t._v("{Validate}")]),t._v(" button, "),o("a",{attrs:{href:"https://try.goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" presents an exhaustive list of structural errors in your dataset.")]),t._v(" "),o("p",[o("img",{attrs:{src:a(468),alt:"Add dataset link in the Source field, or select the Upload file option"}}),o("br"),t._v(" "),o("em",[t._v("Figure 1: Add dataset link in the Source field, or select the Upload file option.")])]),t._v(" "),o("p",[t._v("If needed, you can disable two types of validation checks:")]),t._v(" "),o("ul",[o("li",[o("p",[t._v("Ignore blank rows"),o("br"),t._v("\nUse this checkbox to indicate whether blank rows should be considered as errors, or simply ignored. Check this option if missing data is a known issue that cannot be fixed immediately i.e. 
if you are not the owner/publisher of the data.")])]),t._v(" "),o("li",[o("p",[t._v("Ignore duplicate rows"),o("br"),t._v("\nUse this checkbox to indicate whether duplicate rows should be considered as errors, or simply ignored.")])])]),t._v(" "),o("p",[t._v("We will leave all boxes unchecked for our example. On validate, we receive a list of 12 errors as we can see in figure 7 below.")]),t._v(" "),o("p",[o("img",{attrs:{src:a(469),alt:"dataset errors outlined on try.goodtables.io"}}),o("br"),t._v(" "),o("em",[t._v("Figure 2: dataset errors outlined on "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(".")])]),t._v(" "),o("p",[o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" points us to specific cells containing errors so they can be fixed easily. We can use this list as a guide to fix all errors in our data manually, and run a second validation test to confirm that all issues are resolved. If there no validations could be found, the ensuing message will be as in figure 8 below:")]),t._v(" "),o("p",[o("img",{attrs:{src:a(470),alt:"valid data message on goodtables.io"}}),o("br"),t._v(" "),o("em",[t._v("Figure 3: valid data message on "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables.io"),o("OutboundLink")],1),t._v(".")])]),t._v(" "),o("p",[t._v("Improving data quality is an iterative process that should involve data publishers and maintainers. 
Tools such as "),o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" allow you to focus on complex errors like if the presented data is correct, instead of wasting time with simple (but very common) errors like incorrect date formats.")]),t._v(" "),o("p",[o("strong",[t._v("Validating tabular data with a schema")])]),t._v(" "),o("p",[t._v("A data schema contains information on the structure of your tabular data. Providing a data schema as part of the validation process on "),o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" makes it possible to check your dataset for content errors. For example, a schema contains information on fields and their assigned data types, making it possible to highlight misplaced data i.e. text in an amounts column where numeric data is expected. If you haven’t yet, learn how to create a data schema for your data collection before continuing with this section.")]),t._v(" "),o("p",[t._v("To test how this works, you can use:")]),t._v(" "),o("ul",[o("li",[t._v("any of the data packages from "),o("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages",target:"_blank",rel:"noopener noreferrer"}},[t._v("this Data Package collection on GitHub"),o("OutboundLink")],1),t._v(", which comprises of example data packages curated by the Frictionless Data team or")]),t._v(" "),o("li",[o("a",{attrs:{href:"http://datahub.io/core/",target:"_blank",rel:"noopener noreferrer"}},[t._v("any of the Core Data Packages on DataHub"),o("OutboundLink")],1),t._v(". The Core Data project provides essential data for data wranglers and data science community. 
Read more about it "),o("a",{attrs:{href:"https://datahub.io/docs/core-data",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),o("OutboundLink")],1),t._v(".")])]),t._v(" "),o("p",[t._v("In any given Data Package, the "),o("em",[t._v("datapackage.json")]),t._v(" file contains the schema and the data folder contains tabular data to be validated against the schema.")]),t._v(" "),o("p",[t._v("Often, you will find that you may be working in workflows that involve many datasets, which are updated regularly. In cases such as this, one-time validation on "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" is probably not the answer. But fear not! Goodtables has the ability to automate the validation process so that errors are checked for continually. Find out more in our continuous and "),o("a",{attrs:{href:"/blog/2018/03/12/automatically-validated-tabular-data"}},[t._v("automated data validation section")]),t._v(".")]),t._v(" "),o("h2",{attrs:{id:"one-time-data-validation-with-goodtables-command-line-tool"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#one-time-data-validation-with-goodtables-command-line-tool"}},[t._v("#")]),t._v(" One-time data validation with goodtables command line tool")]),t._v(" "),o("p",[t._v("The same validations that we’ve done on "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(", can also be done in your local machine using goodtables. This is especially useful for big datasets, or if your data is not publicly accessible online. However, this is a slightly technical task, which requires basic knowledge of the command line (CLI). 
If you don’t know how to use the CLI, or are a bit rusty, we recommend you to read the "),o("a",{attrs:{href:"https://tutorial.djangogirls.org/en/intro_to_command_line/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Introduction to the command-line tutorial"),o("OutboundLink")],1),t._v(" before proceeding.")]),t._v(" "),o("p",[t._v("For this section, you will need:")]),t._v(" "),o("ul",[o("li",[t._v("Python, a programming language which the goodtables command-line tool is written in - ["),o("a",{attrs:{href:"https://tutorial.djangogirls.org/en/python_installation/",target:"_blank",rel:"noopener noreferrer"}},[t._v("installation instructions"),o("OutboundLink")],1),t._v("]")]),t._v(" "),o("li",[t._v("PIP, a tool that allows you to install packages written in Python. Installing Python automatically installs PIP, but in case not - [installation instructions]")]),t._v(" "),o("li",[t._v("Basic knowledge on how to use the command-line (see the "),o("a",{attrs:{href:"https://tutorial.djangogirls.org/en/intro_to_command_line/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Introduction to the command-line"),o("OutboundLink")],1),t._v(" if you want to brush up your skills)")])]),t._v(" "),o("p",[t._v("Once Python is set up, open your "),o("strong",[t._v("Terminal")]),t._v(" and install goodtables using the package manager, PIP. The command "),o("code",[t._v("pip install goodtables")]),t._v(".")]),t._v(" "),o("p",[o("img",{attrs:{src:a(471),alt:"installing goodtables command-line tool with pip in Terminal"}}),o("br"),t._v(" "),o("em",[t._v("Figure 4: installing goodtables command-line tool with pip in Terminal.")])]),t._v(" "),o("p",[t._v("To validate a data file, type goodtables followed by the path to your file i.e. "),o("code",[t._v("goodtables path/to/file.csv")]),t._v(". 
You can pass multiple file paths one after the other, or even the path to a "),o("em",[t._v("datapackage.json")]),t._v(" file.")]),t._v(" "),o("p",[t._v("For our first example, we will download and check "),o("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py/blob/master/data/data.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("this simple location CSV data file"),o("OutboundLink")],1),t._v(" for errors. In the second instance, we will validate this "),o("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/goodtables-py/bc6470a970aacf65f20a3ddb7f71eb05a2a31c70/data/invalid-on-structure.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("Department of Data Expenses dataset, that contains errors"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("p",[o("img",{attrs:{src:a(472),alt:"Validating data files using goodtables in Terminal"}}),o("br"),t._v(" "),o("em",[t._v("Figure 5: Validating data files using goodtables in Terminal.")])]),t._v(" "),o("p",[t._v("You can see the list of options by running "),o("code",[t._v("goodtables --help")]),t._v(". The full documentation, including the list of validation checks that can be run, is available "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("on the goodtables-py repository on GitHub"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("p",[t._v("Congratulations, you now know how to validate your tabular data using the command-line!")]),t._v(" "),o("p",[t._v("If you regularly update your data or maintain many different datasets, running the validations manually can be time-consuming. The solution is to automate this process, so the data is validated every time it changes, ensuring the errors are caught as soon as possible. 
Find out how to do it in the “"),o("a",{attrs:{href:"/blog/2018/03/12/automatically-validated-tabular-data"}},[t._v("Automating the validation checks")]),t._v("” section.")])])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[13],{468:function(t,e,a){t.exports=a.p+"assets/img/figure-1.ab449321.png"},469:function(t,e,a){t.exports=a.p+"assets/img/figure-2.b334817a.png"},470:function(t,e,a){t.exports=a.p+"assets/img/figure-3.7144591c.png"},471:function(t,e,a){t.exports=a.p+"assets/img/figure-4.478e4974.gif"},472:function(t,e,a){t.exports=a.p+"assets/img/figure-5.e20a138b.gif"},604:function(t,e,a){"use strict";a.r(e);var o=a(29),r=Object(o.a)({},(function(){var t=this,e=t.$createElement,o=t._self._c||e;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[t._v("Errors in data are not uncommon. They also often get in the way of quick and timely data analysis for many data users. What if there was a way to quickly identify errors in your data to accelerate the process by which you fix them before sharing your data or using it for analysis?")]),t._v(" "),o("p",[t._v("In this section, we will learn how to carry out one-time data validation using")]),t._v(" "),o("ul",[o("li",[t._v("a free web tool called "),o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(",")]),t._v(" "),o("li",[t._v("the goodtables command line tool which you use in your local machine.")])]),t._v(" "),o("p",[t._v("Our working assumption is that you already know what a data schema and a data package are, and how to create them. 
If not, "),o("RouterLink",{attrs:{to:"/blog/2018/03/07/well-packaged-datasets/"}},[t._v("start here")]),t._v(".")],1),t._v(" "),o("h2",{attrs:{id:"one-time-data-validation-with-try-goodtables-io"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#one-time-data-validation-with-try-goodtables-io"}},[t._v("#")]),t._v(" One-time data validation with "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1)]),t._v(" "),o("p",[t._v("Now that you have your data package you may want to check it for errors. We refer to this process as data validation. Raw data is often ‘messy’ or ‘dirty’, which means it contains errors and irrelevant bits that make it inaccurate and difficult to quickly analyse and draw insight from existing datasets. "),o("strong",[t._v("Goodtables")]),t._v(" exists to identify structural and content errors in your tabular data so they can be fixed quickly. As with other tools mentioned in this field guide, goodtables aims to help data publishers improve the quality of their data before the data is shared elsewhere and used for analysis, or archived.")]),t._v(" "),o("p",[o("strong",[t._v("Types of errors identified in the validation process")])]),t._v(" "),o("p",[t._v("Here are some of the errors that "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" highlights. A more exhaustive list is available "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py#validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("table",[o("thead",[o("tr",[o("th",[o("strong",[t._v("Structural Errors")])]),t._v(" "),o("th")])]),t._v(" "),o("tbody",[o("tr",[o("td",[t._v("blank-header")]),t._v(" "),o("td",[t._v("There is a blank header name. 
All cells in the header row must have a value.")])]),t._v(" "),o("tr",[o("td",[t._v("duplicate-header")]),t._v(" "),o("td",[t._v("There are multiple columns with the same name. All column names must be unique.")])]),t._v(" "),o("tr",[o("td",[t._v("blank-row")]),t._v(" "),o("td",[t._v("Rows must have at least one non-blank cell.")])]),t._v(" "),o("tr",[o("td",[t._v("duplicate-row")]),t._v(" "),o("td",[t._v("Rows can’t be duplicated.")])]),t._v(" "),o("tr",[o("td",[t._v("extra-value")]),t._v(" "),o("td",[t._v("A row has more columns than the header.")])]),t._v(" "),o("tr",[o("td",[t._v("missing-value")]),t._v(" "),o("td",[t._v("A row has less columns than the header.")])]),t._v(" "),o("tr",[o("td",[o("strong",[t._v("Content Errors")])]),t._v(" "),o("td")]),t._v(" "),o("tr",[o("td",[t._v("schema-error")]),t._v(" "),o("td",[t._v("Schema is not valid.")])]),t._v(" "),o("tr",[o("td",[t._v("non-matching-header")]),t._v(" "),o("td",[t._v("The header’s name in the schema is different from what’s in the data.")])]),t._v(" "),o("tr",[o("td",[t._v("extra-header")]),t._v(" "),o("td",[t._v("The data contains a header not defined in the schema.")])]),t._v(" "),o("tr",[o("td",[t._v("missing-header")]),t._v(" "),o("td",[t._v("The data doesn’t contain a header defined in the schema.")])]),t._v(" "),o("tr",[o("td",[t._v("type-or-format-error")]),t._v(" "),o("td",[t._v("The value can’t be cast based on the schema type and format for this field.")])])])]),t._v(" "),o("p",[o("strong",[t._v("Load tabular data for one-time validation")])]),t._v(" "),o("p",[t._v("You can add a dataset for one-time validation on "),o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" in two ways:")]),t._v(" "),o("ul",[o("li",[t._v("If your tabular data is publicly available online, obtain a link to the tabular data you would like to validate and paste it in the "),o("strong",[t._v("{Source}")]),t._v(" 
section.")]),t._v(" "),o("li",[t._v("Alternatively, Click on the Upload file prompt in the "),o("strong",[t._v("{Source}")]),t._v(" section to load a tabular dataset from your local machine")])]),t._v(" "),o("p",[o("strong",[t._v("Validating data without a schema")])]),t._v(" "),o("p",[t._v("In this section we will illustrate how to check tabular data for structural errors on "),o("a",{attrs:{href:"https://try.goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" where a data schema is not available. For this tutorial we will use a "),o("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/goodtables-py/bc6470a970aacf65f20a3ddb7f71eb05a2a31c70/data/invalid-on-structure.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("sample CSV file with errors"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("p",[t._v("Copy and paste the file’s URL to the "),o("strong",[t._v("{Source}")]),t._v(" input. When you click on the "),o("strong",[t._v("{Validate}")]),t._v(" button, "),o("a",{attrs:{href:"https://try.goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" presents an exhaustive list of structural errors in your dataset.")]),t._v(" "),o("p",[o("img",{attrs:{src:a(468),alt:"Add dataset link in the Source field, or select the Upload file option"}}),o("br"),t._v(" "),o("em",[t._v("Figure 1: Add dataset link in the Source field, or select the Upload file option.")])]),t._v(" "),o("p",[t._v("If needed, you can disable two types of validation checks:")]),t._v(" "),o("ul",[o("li",[o("p",[t._v("Ignore blank rows"),o("br"),t._v("\nUse this checkbox to indicate whether blank rows should be considered as errors, or simply ignored. Check this option if missing data is a known issue that cannot be fixed immediately i.e. 
if you are not the owner/publisher of the data.")])]),t._v(" "),o("li",[o("p",[t._v("Ignore duplicate rows"),o("br"),t._v("\nUse this checkbox to indicate whether duplicate rows should be considered as errors, or simply ignored.")])])]),t._v(" "),o("p",[t._v("We will leave all boxes unchecked for our example. On validate, we receive a list of 12 errors as we can see in figure 7 below.")]),t._v(" "),o("p",[o("img",{attrs:{src:a(469),alt:"dataset errors outlined on try.goodtables.io"}}),o("br"),t._v(" "),o("em",[t._v("Figure 2: dataset errors outlined on "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(".")])]),t._v(" "),o("p",[o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" points us to specific cells containing errors so they can be fixed easily. We can use this list as a guide to fix all errors in our data manually, and run a second validation test to confirm that all issues are resolved. If there no validations could be found, the ensuing message will be as in figure 8 below:")]),t._v(" "),o("p",[o("img",{attrs:{src:a(470),alt:"valid data message on goodtables.io"}}),o("br"),t._v(" "),o("em",[t._v("Figure 3: valid data message on "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables.io"),o("OutboundLink")],1),t._v(".")])]),t._v(" "),o("p",[t._v("Improving data quality is an iterative process that should involve data publishers and maintainers. 
Tools such as "),o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" allow you to focus on complex errors like if the presented data is correct, instead of wasting time with simple (but very common) errors like incorrect date formats.")]),t._v(" "),o("p",[o("strong",[t._v("Validating tabular data with a schema")])]),t._v(" "),o("p",[t._v("A data schema contains information on the structure of your tabular data. Providing a data schema as part of the validation process on "),o("a",{attrs:{href:"https://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" makes it possible to check your dataset for content errors. For example, a schema contains information on fields and their assigned data types, making it possible to highlight misplaced data i.e. text in an amounts column where numeric data is expected. If you haven’t yet, learn how to create a data schema for your data collection before continuing with this section.")]),t._v(" "),o("p",[t._v("To test how this works, you can use:")]),t._v(" "),o("ul",[o("li",[t._v("any of the data packages from "),o("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages",target:"_blank",rel:"noopener noreferrer"}},[t._v("this Data Package collection on GitHub"),o("OutboundLink")],1),t._v(", which comprises of example data packages curated by the Frictionless Data team or")]),t._v(" "),o("li",[o("a",{attrs:{href:"http://datahub.io/core/",target:"_blank",rel:"noopener noreferrer"}},[t._v("any of the Core Data Packages on DataHub"),o("OutboundLink")],1),t._v(". The Core Data project provides essential data for data wranglers and data science community. 
Read more about it "),o("a",{attrs:{href:"https://datahub.io/docs/core-data",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),o("OutboundLink")],1),t._v(".")])]),t._v(" "),o("p",[t._v("In any given Data Package, the "),o("em",[t._v("datapackage.json")]),t._v(" file contains the schema and the data folder contains tabular data to be validated against the schema.")]),t._v(" "),o("p",[t._v("Often, you will find that you may be working in workflows that involve many datasets, which are updated regularly. In cases such as this, one-time validation on "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(" is probably not the answer. But fear not! Goodtables has the ability to automate the validation process so that errors are checked for continually. Find out more in our continuous and "),o("a",{attrs:{href:"/blog/2018/03/12/automatically-validated-tabular-data"}},[t._v("automated data validation section")]),t._v(".")]),t._v(" "),o("h2",{attrs:{id:"one-time-data-validation-with-goodtables-command-line-tool"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#one-time-data-validation-with-goodtables-command-line-tool"}},[t._v("#")]),t._v(" One-time data validation with goodtables command line tool")]),t._v(" "),o("p",[t._v("The same validations that we’ve done on "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),o("OutboundLink")],1),t._v(", can also be done in your local machine using goodtables. This is especially useful for big datasets, or if your data is not publicly accessible online. However, this is a slightly technical task, which requires basic knowledge of the command line (CLI). 
If you don’t know how to use the CLI, or are a bit rusty, we recommend you to read the "),o("a",{attrs:{href:"https://tutorial.djangogirls.org/en/intro_to_command_line/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Introduction to the command-line tutorial"),o("OutboundLink")],1),t._v(" before proceeding.")]),t._v(" "),o("p",[t._v("For this section, you will need:")]),t._v(" "),o("ul",[o("li",[t._v("Python, a programming language which the goodtables command-line tool is written in - ["),o("a",{attrs:{href:"https://tutorial.djangogirls.org/en/python_installation/",target:"_blank",rel:"noopener noreferrer"}},[t._v("installation instructions"),o("OutboundLink")],1),t._v("]")]),t._v(" "),o("li",[t._v("PIP, a tool that allows you to install packages written in Python. Installing Python automatically installs PIP, but in case not - [installation instructions]")]),t._v(" "),o("li",[t._v("Basic knowledge on how to use the command-line (see the "),o("a",{attrs:{href:"https://tutorial.djangogirls.org/en/intro_to_command_line/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Introduction to the command-line"),o("OutboundLink")],1),t._v(" if you want to brush up your skills)")])]),t._v(" "),o("p",[t._v("Once Python is set up, open your "),o("strong",[t._v("Terminal")]),t._v(" and install goodtables using the package manager, PIP. The command "),o("code",[t._v("pip install goodtables")]),t._v(".")]),t._v(" "),o("p",[o("img",{attrs:{src:a(471),alt:"installing goodtables command-line tool with pip in Terminal"}}),o("br"),t._v(" "),o("em",[t._v("Figure 4: installing goodtables command-line tool with pip in Terminal.")])]),t._v(" "),o("p",[t._v("To validate a data file, type goodtables followed by the path to your file i.e. "),o("code",[t._v("goodtables path/to/file.csv")]),t._v(". 
You can pass multiple file paths one after the other, or even the path to a "),o("em",[t._v("datapackage.json")]),t._v(" file.")]),t._v(" "),o("p",[t._v("For our first example, we will download and check "),o("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py/blob/master/data/data.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("this simple location CSV data file"),o("OutboundLink")],1),t._v(" for errors. In the second instance, we will validate this "),o("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/goodtables-py/bc6470a970aacf65f20a3ddb7f71eb05a2a31c70/data/invalid-on-structure.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("Department of Data Expenses dataset, that contains errors"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("p",[o("img",{attrs:{src:a(472),alt:"Validating data files using goodtables in Terminal"}}),o("br"),t._v(" "),o("em",[t._v("Figure 5: Validating data files using goodtables in Terminal.")])]),t._v(" "),o("p",[t._v("You can see the list of options by running "),o("code",[t._v("goodtables --help")]),t._v(". The full documentation, including the list of validation checks that can be run, is available "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("on the goodtables-py repository on GitHub"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("p",[t._v("Congratulations, you now know how to validate your tabular data using the command-line!")]),t._v(" "),o("p",[t._v("If you regularly update your data or maintain many different datasets, running the validations manually can be time-consuming. The solution is to automate this process, so the data is validated every time it changes, ensuring the errors are caught as soon as possible. 
Find out how to do it in the “"),o("a",{attrs:{href:"/blog/2018/03/12/automatically-validated-tabular-data"}},[t._v("Automating the validation checks")]),t._v("” section.")])])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/130.233f404d.js b/assets/js/130.7f175013.js similarity index 98% rename from assets/js/130.233f404d.js rename to assets/js/130.7f175013.js index 9994ad99e..451ed798c 100644 --- a/assets/js/130.233f404d.js +++ b/assets/js/130.7f175013.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[130],{664:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On our last Frictionless Data community call on May 29"),r("sup",[e._v("th")]),e._v(" we had Simon Tyrrell and Xingdong Bian from the Earlham Institute giving a presentation on Frictionless Data for Wheat. The project was developed during the Frictionless Toolfund 2020-2021.")]),e._v(" "),r("p",[e._v("Simon and Xingdong are part of the Designing Future Wheat, a research group studying how to increment the amount of wheat that is produced in a field in order to meet the global demand by 2050. To run the project, they collect a great amount of data and large scale datasets, which are shared with a great number of different users. Frictionless Data is used to make that data available, usable and interoperable for everyone.")]),e._v(" "),r("p",[e._v("You can learn more about the Designing Future Wheat project "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/03/05/frictionless-data-for-wheat/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". 
If you would like to dive deeper and discover all about the Frictionless implementation, you can watch Simon’s and Xingdong’s presentation here:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/E4Mw8cYlM88",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h1",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("p",[e._v("We are super happy to share with you "),r("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Repository - a Github Action for the continuous data validation of your repo"),r("OutboundLink")],1),e._v("."),r("br"),e._v("\nWe are actively looking for feedback, so please let us know what you think.")]),e._v(" "),r("h1",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on June 24"),r("sup",[e._v("th")]),e._v(". We will hear a presentation from"),r("br"),e._v("\nNikhil Vats on Frictionless Data Package for InterMine. You can sign up "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here."),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),r("h1",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/uC-whhwGiqk",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v(" ")]),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[130],{730:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On our last Frictionless Data community call on May 29"),r("sup",[e._v("th")]),e._v(" we had Simon Tyrrell and Xingdong Bian from the Earlham Institute giving a presentation on Frictionless Data for Wheat. The project was developed during the Frictionless Toolfund 2020-2021.")]),e._v(" "),r("p",[e._v("Simon and Xingdong are part of the Designing Future Wheat, a research group studying how to increment the amount of wheat that is produced in a field in order to meet the global demand by 2050. To run the project, they collect a great amount of data and large scale datasets, which are shared with a great number of different users. 
Frictionless Data is used to make that data available, usable and interoperable for everyone.")]),e._v(" "),r("p",[e._v("You can learn more about the Designing Future Wheat project "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/03/05/frictionless-data-for-wheat/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". If you would like to dive deeper and discover all about the Frictionless implementation, you can watch Simon’s and Xingdong’s presentation here:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/E4Mw8cYlM88",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h1",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("p",[e._v("We are super happy to share with you "),r("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Repository - a Github Action for the continuous data validation of your repo"),r("OutboundLink")],1),e._v("."),r("br"),e._v("\nWe are actively looking for feedback, so please let us know what you think.")]),e._v(" "),r("h1",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Our next meeting will be on June 24"),r("sup",[e._v("th")]),e._v(". We will hear a presentation from"),r("br"),e._v("\nNikhil Vats on Frictionless Data Package for InterMine. 
You can sign up "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here."),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),r("h1",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/uC-whhwGiqk",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v(" ")]),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/131.1373b305.js b/assets/js/131.46e9f4e2.js similarity index 98% rename from assets/js/131.1373b305.js rename to assets/js/131.46e9f4e2.js index 08027c693..64cbd2e9d 100644 --- a/assets/js/131.1373b305.js +++ b/assets/js/131.46e9f4e2.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[131],{663:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("Have you noticed some changes to our website? 
Building upon last year’s "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/05/01/announcing-new-website/",target:"_blank",rel:"noopener noreferrer"}},[e._v("website redesign"),a("OutboundLink")],1),e._v(", we have finished making some new changes that we are very excited to tell you about! When we started reviewing our documentation for the "),a("a",{attrs:{href:"http://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Python Framework"),a("OutboundLink")],1),e._v(" with the "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/01/13/partnering-with-odi/#so-what-will-be-changing",target:"_blank",rel:"noopener noreferrer"}},[e._v("support of the ODI"),a("OutboundLink")],1),e._v(" back in January, we quickly realised that our main website could benefit from some revamping as well, in order to make it more user-friendly and easier to navigate.")]),e._v(" "),a("p",[e._v("We needed to clarify the relationship between our main project website and the website of all our Frictionless standards, software, and specifications, which all had different layouts and visual styles. The harmonisation process is still ongoing, but we are already very happy with the fact that the new website offers a comprehensive view of all our tools.")]),e._v(" "),a("p",[e._v("It was important for us that people visiting our website for the very first time could quickly understand what Frictionless Data is and how it can be useful to them. We did that through a reorganisation of the homepage and the navigation, which was a bit confusing for some users. We also updated most of the text to better reflect the current status of the project, but also to clearly state what Frictionless Data is. 
Users should now be able to understand in a glance that Frictionless is composed of two main parts, "),a("a",{attrs:{href:"https://frictionlessdata.io/software/",target:"_blank",rel:"noopener noreferrer"}},[e._v("software"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://frictionlessdata.io/standards/",target:"_blank",rel:"noopener noreferrer"}},[e._v("standards"),a("OutboundLink")],1),e._v(", which make it more accessible for a broad range of people working with data.")]),e._v(" "),a("img",{attrs:{width:"1337",alt:"Schermata 2021-06-16 alle 15 03 47",src:"https://user-images.githubusercontent.com/74717970/122254960-f2ad6700-cecd-11eb-88dd-a5cd119eec45.png"}}),e._v(" "),a("p",[e._v("Users will also easily find examples of "),a("a",{attrs:{href:"https://staging.frictionlessdata.io/adoption/",target:"_blank",rel:"noopener noreferrer"}},[e._v("projects and collaborations that adopted Frictionless"),a("OutboundLink")],1),e._v(", which can be very useful to better understand the full potential of the Frictionless toolkit.")]),e._v(" "),a("p",[e._v("Our goal with this new website is to give visitors an easier way to learn about Frictionless Data, encourage them to try it out and join our great community. The new architecture should reflect that, and should make it easier for people to understand that Frictionless Data is a progressive open-source framework for building data infrastructure, aiming at making it easier to work with data. Being an open-source project, we welcome and cherish everybody’s contribution. Talking about that, we would love to hear your feedback! Let us know what you think about the new website, if you have any comments or if you see any further improvement we could make. 
We have created a "),a("a",{attrs:{href:"https://github.com/frictionlessdata/website/issues/198",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub issue"),a("OutboundLink")],1),e._v(" you can use to give us your thoughts.")]),e._v(" "),a("p",[e._v("Thank you!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[131],{661:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("Have you noticed some changes to our website? Building upon last year’s "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/05/01/announcing-new-website/",target:"_blank",rel:"noopener noreferrer"}},[e._v("website redesign"),a("OutboundLink")],1),e._v(", we have finished making some new changes that we are very excited to tell you about! When we started reviewing our documentation for the "),a("a",{attrs:{href:"http://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Python Framework"),a("OutboundLink")],1),e._v(" with the "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/01/13/partnering-with-odi/#so-what-will-be-changing",target:"_blank",rel:"noopener noreferrer"}},[e._v("support of the ODI"),a("OutboundLink")],1),e._v(" back in January, we quickly realised that our main website could benefit from some revamping as well, in order to make it more user-friendly and easier to navigate.")]),e._v(" "),a("p",[e._v("We needed to clarify the relationship between our main project website and the website of all our Frictionless standards, software, and specifications, which all had different layouts and visual styles. 
The harmonisation process is still ongoing, but we are already very happy with the fact that the new website offers a comprehensive view of all our tools.")]),e._v(" "),a("p",[e._v("It was important for us that people visiting our website for the very first time could quickly understand what Frictionless Data is and how it can be useful to them. We did that through a reorganisation of the homepage and the navigation, which was a bit confusing for some users. We also updated most of the text to better reflect the current status of the project, but also to clearly state what Frictionless Data is. Users should now be able to understand in a glance that Frictionless is composed of two main parts, "),a("a",{attrs:{href:"https://frictionlessdata.io/software/",target:"_blank",rel:"noopener noreferrer"}},[e._v("software"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://frictionlessdata.io/standards/",target:"_blank",rel:"noopener noreferrer"}},[e._v("standards"),a("OutboundLink")],1),e._v(", which make it more accessible for a broad range of people working with data.")]),e._v(" "),a("img",{attrs:{width:"1337",alt:"Schermata 2021-06-16 alle 15 03 47",src:"https://user-images.githubusercontent.com/74717970/122254960-f2ad6700-cecd-11eb-88dd-a5cd119eec45.png"}}),e._v(" "),a("p",[e._v("Users will also easily find examples of "),a("a",{attrs:{href:"https://staging.frictionlessdata.io/adoption/",target:"_blank",rel:"noopener noreferrer"}},[e._v("projects and collaborations that adopted Frictionless"),a("OutboundLink")],1),e._v(", which can be very useful to better understand the full potential of the Frictionless toolkit.")]),e._v(" "),a("p",[e._v("Our goal with this new website is to give visitors an easier way to learn about Frictionless Data, encourage them to try it out and join our great community. 
The new architecture should reflect that, and should make it easier for people to understand that Frictionless Data is a progressive open-source framework for building data infrastructure, aiming at making it easier to work with data. Being an open-source project, we welcome and cherish everybody’s contribution. Talking about that, we would love to hear your feedback! Let us know what you think about the new website, if you have any comments or if you see any further improvement we could make. We have created a "),a("a",{attrs:{href:"https://github.com/frictionlessdata/website/issues/198",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub issue"),a("OutboundLink")],1),e._v(" you can use to give us your thoughts.")]),e._v(" "),a("p",[e._v("Thank you!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/132.09f13c2b.js b/assets/js/132.be369adc.js similarity index 97% rename from assets/js/132.09f13c2b.js rename to assets/js/132.be369adc.js index 0d53af1f9..8ba3254da 100644 --- a/assets/js/132.09f13c2b.js +++ b/assets/js/132.be369adc.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[132],{662:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("We are very excited to announce that a new tool has been added to the Frictionless Data toolkit: Livemark. What is that? 
Livemark is a great tool that allows you to publish data articles very easily, giving you the possibility to see your data live on a working website in a blink of an eye.")]),e._v(" "),a("h2",{attrs:{id:"how-does-it-work"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-does-it-work"}},[e._v("#")]),e._v(" How does it work?")]),e._v(" "),a("p",[e._v("Livemark is a Python library generating a static page that extends Markdown with interactive charts, tables, scripts, and much much more. You can use the Frictionless framework as a "),a("code",[e._v("frictionless")]),e._v(" variable to work with your tabular data in Livemark.")]),e._v(" "),a("p",[e._v("Livemark offers a series of useful features, like automatically generating a table of contents and providing a scroll-to-top button when you scroll down your document. You can also customise the layout of your newly created webpage.")]),e._v(" "),a("h2",{attrs:{id:"how-can-you-get-started"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-you-get-started"}},[e._v("#")]),e._v(" How can you get started?")]),e._v(" "),a("p",[e._v("Livemark is very easy to use. We invite you to watch this great demo by developer Evgeny Karev:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/NMg-eCbO6L0",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v(" ")]),e._v(" "),a("p",[e._v("You can also have a look at the "),a("a",{attrs:{href:"https://frictionlessdata.github.io/livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation on GitHub"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"what-do-you-think"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-do-you-think"}},[e._v("#")]),e._v(" What do you think?")]),e._v(" "),a("p",[e._v("If you create a site using Livemark, please let us know! 
Frictionless Data is an open source project, therefore we encourage you to give us feedback. Let us know your thoughts, suggestions, or issues by joining us in our community chat on "),a("a",{attrs:{href:"https://discord.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or by opening an issue in the "),a("a",{attrs:{href:"https://github.com/frictionlessdata/livemark",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repo"),a("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[132],{729:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("We are very excited to announce that a new tool has been added to the Frictionless Data toolkit: Livemark. What is that? Livemark is a great tool that allows you to publish data articles very easily, giving you the possibility to see your data live on a working website in a blink of an eye.")]),e._v(" "),a("h2",{attrs:{id:"how-does-it-work"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-does-it-work"}},[e._v("#")]),e._v(" How does it work?")]),e._v(" "),a("p",[e._v("Livemark is a Python library generating a static page that extends Markdown with interactive charts, tables, scripts, and much much more. You can use the Frictionless framework as a "),a("code",[e._v("frictionless")]),e._v(" variable to work with your tabular data in Livemark.")]),e._v(" "),a("p",[e._v("Livemark offers a series of useful features, like automatically generating a table of contents and providing a scroll-to-top button when you scroll down your document. 
You can also customise the layout of your newly created webpage.")]),e._v(" "),a("h2",{attrs:{id:"how-can-you-get-started"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-you-get-started"}},[e._v("#")]),e._v(" How can you get started?")]),e._v(" "),a("p",[e._v("Livemark is very easy to use. We invite you to watch this great demo by developer Evgeny Karev:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/NMg-eCbO6L0",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v(" ")]),e._v(" "),a("p",[e._v("You can also have a look at the "),a("a",{attrs:{href:"https://frictionlessdata.github.io/livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation on GitHub"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"what-do-you-think"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-do-you-think"}},[e._v("#")]),e._v(" What do you think?")]),e._v(" "),a("p",[e._v("If you create a site using Livemark, please let us know! Frictionless Data is an open source project, therefore we encourage you to give us feedback. 
Let us know your thoughts, suggestions, or issues by joining us in our community chat on "),a("a",{attrs:{href:"https://discord.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or by opening an issue in the "),a("a",{attrs:{href:"https://github.com/frictionlessdata/livemark",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repo"),a("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/133.e77f594f.js b/assets/js/133.c345b419.js similarity index 99% rename from assets/js/133.e77f594f.js rename to assets/js/133.c345b419.js index a0cab78d9..61942291e 100644 --- a/assets/js/133.e77f594f.js +++ b/assets/js/133.c345b419.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[133],{666:function(e,t,o){"use strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Do you remember "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/user/73932",target:"_blank",rel:"noopener noreferrer"}},[e._v("Costas Simatos"),o("OutboundLink")],1),e._v("? He introduced the Frictionless Data community to the "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/interoperability-test-bed-repository",target:"_blank",rel:"noopener noreferrer"}},[e._v("Interoperability Test Bed"),o("OutboundLink")],1),e._v(" (ITB), an online platform that can be used to test systems against technical specifications — curious minds will find a recording of his presentation on the subject "),o("a",{attrs:{href:"https://www.youtube.com/watch?v=pJFsJW96fuA",target:"_blank",rel:"noopener noreferrer"}},[e._v("available on YouTube"),o("OutboundLink")],1),e._v(". 
Amongst the tools it offers, there is a "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/interoperability-test-bed-repository/solution/csvvalidator",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV validator"),o("OutboundLink")],1),e._v(" which relies on the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema specifications"),o("OutboundLink")],1),e._v(". Those specifications filled a gap that the "),o("a",{attrs:{href:"https://datatracker.ietf.org/doc/html/rfc4180",target:"_blank",rel:"noopener noreferrer"}},[e._v("RFC 4180"),o("OutboundLink")],1),e._v(" didn’t address by having a structured way of defining the content of individual fields in terms of data types, formats and constraints, which is a clear benefit of the Frictionless specifications as reported back in 2020 "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/interoperability-test-bed-repository/solution/interoperability-test-bed/news/table-schema-validator",target:"_blank",rel:"noopener noreferrer"}},[e._v("when a beta version of the CSV validator was launched"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("hr"),e._v(" "),o("p",[e._v("Frictionless specifications are flexible while allowing users to define unambiguously the expected content of a given field, therefore they were officially adopted to "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/interoperability-test-bed-repository/solution/interoperability-test-bed/news/test-bed-support-kohesio-pilot",target:"_blank",rel:"noopener noreferrer"}},[e._v("realise the validator for the Kohesio pilot phase of 2014-2020"),o("OutboundLink")],1),e._v(", "),o("a",{attrs:{href:"https://kohesio.eu/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Kohesio"),o("OutboundLink")],1),e._v(" being the "),o("em",[e._v("“Project Information Portal for Cohesion Policy”")]),e._v(". 
The Table Schema specifications made it easy and convenient for the Interoperability Test Bed to establish constraints and describe the data to be validated in a concise way based on an initial set of "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/kohesio-validator/specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV syntax rules"),o("OutboundLink")],1),e._v(", converting written and mostly non-technical definitions to their Frictionless equivalent. Using simple JSON objects, Frictionless specifications allowed the ITB to enforce data validation in multiple ways as can be observed from the "),o("a",{attrs:{href:"https://github.com/ISAITB/validator-resources-kohesio/blob/master/resources/schemas/schema.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema used for the CSV validator"),o("OutboundLink")],1),e._v(". The following list of items calls attention to the core aspects of the Table Schema standard that were taken advantage of:")]),e._v(" "),o("ul",[o("li",[e._v("Dates can be defined with string formatting (e.g. "),o("code",[e._v("%d/%m/%Y")]),e._v(" stands for "),o("code",[e._v("day/month/year")]),e._v(");")]),e._v(" "),o("li",[e._v("Constraints can indicate whether a column can contain empty values or not;")]),e._v(" "),o("li",[e._v("Constraints can also specify a valid range of values (e.g. "),o("code",[e._v('"minimum": 0.0')]),e._v(" and "),o("code",[e._v('"maximum": 100.0')]),e._v(");")]),e._v(" "),o("li",[e._v("Constraints can specify an enumeration of valid values to choose from (e.g. 
"),o("code",[e._v('"enum" : ["2014-2020", "2021-2027"]')]),e._v(").")]),e._v(" "),o("li",[e._v("Constraints can be specified in custom ways, such as with "),o("a",{attrs:{href:"https://en.wikipedia.org/wiki/Regular_expression",target:"_blank",rel:"noopener noreferrer"}},[e._v("regular expressions"),o("OutboundLink")],1),e._v(" for powerful string matching capabilities;")]),e._v(" "),o("li",[e._v("Data types can be enforced for any column;")]),e._v(" "),o("li",[e._v("Columns can be forced to adapt a specific name and a description can be provided for each one of them.")])]),e._v(" "),o("p",[e._v("Because these specifications can be expressed as portable text files, they became part of a multitude of tools to provide greater convenience to users and the validation process has been "),o("a",{attrs:{href:"https://www.itb.ec.europa.eu/docs/guides/latest/validatingCSV/index.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("documented extensively"),o("OutboundLink")],1),e._v(". JSON code snippets from the documentation highlight the fact that this format conveys all the necessary information in a readable manner and lets users extend the original specifications as needed. 
In this particular instance, the CSV validator can be used as a "),o("a",{attrs:{href:"https://hub.docker.com/repository/docker/isaitb/validator-kohesio",target:"_blank",rel:"noopener noreferrer"}},[e._v("Docker image"),o("OutboundLink")],1),e._v(", as part of a "),o("a",{attrs:{href:"https://www.itb.ec.europa.eu/csv-offline/kohesio/validator.zip",target:"_blank",rel:"noopener noreferrer"}},[e._v("command-line application"),o("OutboundLink")],1),e._v(", inside a "),o("a",{attrs:{href:"https://www.itb.ec.europa.eu/csv/kohesio/upload",target:"_blank",rel:"noopener noreferrer"}},[e._v("web application"),o("OutboundLink")],1),e._v(" and even as a "),o("a",{attrs:{href:"https://www.itb.ec.europa.eu/csv/soap/kohesio/validation?wsdl",target:"_blank",rel:"noopener noreferrer"}},[e._v("SOAP API"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Frictionless specifications were the missing piece of the puzzle that enabled the ITB to rely on a well-documented set of standards for their data validation needs. But there is more on the table (no pun intended): whether you need to manage files, tables or entire datasets, there are "),o("RouterLink",{attrs:{to:"/standards/"}},[e._v("Frictionless standards")]),e._v(" to cover you. As the growing "),o("RouterLink",{attrs:{to:"/adoption/"}},[e._v("list of adopters and collaborations")]),e._v(" demonstrates, there are many use cases to make a data project shine with Frictionless.")],1),e._v(" "),o("p",[e._v("Are you working on a great project that should become the next glowing star in the world of Frictionless Data? 
Feel free to "),o("RouterLink",{attrs:{to:"/work-with-us/get-help/"}},[e._v("reach out")]),e._v(" to spread the good news!")],1)])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[133],{665:function(e,t,o){"use strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Do you remember "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/user/73932",target:"_blank",rel:"noopener noreferrer"}},[e._v("Costas Simatos"),o("OutboundLink")],1),e._v("? He introduced the Frictionless Data community to the "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/interoperability-test-bed-repository",target:"_blank",rel:"noopener noreferrer"}},[e._v("Interoperability Test Bed"),o("OutboundLink")],1),e._v(" (ITB), an online platform that can be used to test systems against technical specifications — curious minds will find a recording of his presentation on the subject "),o("a",{attrs:{href:"https://www.youtube.com/watch?v=pJFsJW96fuA",target:"_blank",rel:"noopener noreferrer"}},[e._v("available on YouTube"),o("OutboundLink")],1),e._v(". Amongst the tools it offers, there is a "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/interoperability-test-bed-repository/solution/csvvalidator",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV validator"),o("OutboundLink")],1),e._v(" which relies on the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema specifications"),o("OutboundLink")],1),e._v(". 
Those specifications filled a gap that the "),o("a",{attrs:{href:"https://datatracker.ietf.org/doc/html/rfc4180",target:"_blank",rel:"noopener noreferrer"}},[e._v("RFC 4180"),o("OutboundLink")],1),e._v(" didn’t address by having a structured way of defining the content of individual fields in terms of data types, formats and constraints, which is a clear benefit of the Frictionless specifications as reported back in 2020 "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/interoperability-test-bed-repository/solution/interoperability-test-bed/news/table-schema-validator",target:"_blank",rel:"noopener noreferrer"}},[e._v("when a beta version of the CSV validator was launched"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("hr"),e._v(" "),o("p",[e._v("Frictionless specifications are flexible while allowing users to define unambiguously the expected content of a given field, therefore they were officially adopted to "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/interoperability-test-bed-repository/solution/interoperability-test-bed/news/test-bed-support-kohesio-pilot",target:"_blank",rel:"noopener noreferrer"}},[e._v("realise the validator for the Kohesio pilot phase of 2014-2020"),o("OutboundLink")],1),e._v(", "),o("a",{attrs:{href:"https://kohesio.eu/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Kohesio"),o("OutboundLink")],1),e._v(" being the "),o("em",[e._v("“Project Information Portal for Cohesion Policy”")]),e._v(". 
The Table Schema specifications made it easy and convenient for the Interoperability Test Bed to establish constraints and describe the data to be validated in a concise way based on an initial set of "),o("a",{attrs:{href:"https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/kohesio-validator/specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV syntax rules"),o("OutboundLink")],1),e._v(", converting written and mostly non-technical definitions to their Frictionless equivalent. Using simple JSON objects, Frictionless specifications allowed the ITB to enforce data validation in multiple ways as can be observed from the "),o("a",{attrs:{href:"https://github.com/ISAITB/validator-resources-kohesio/blob/master/resources/schemas/schema.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema used for the CSV validator"),o("OutboundLink")],1),e._v(". The following list of items calls attention to the core aspects of the Table Schema standard that were taken advantage of:")]),e._v(" "),o("ul",[o("li",[e._v("Dates can be defined with string formatting (e.g. "),o("code",[e._v("%d/%m/%Y")]),e._v(" stands for "),o("code",[e._v("day/month/year")]),e._v(");")]),e._v(" "),o("li",[e._v("Constraints can indicate whether a column can contain empty values or not;")]),e._v(" "),o("li",[e._v("Constraints can also specify a valid range of values (e.g. "),o("code",[e._v('"minimum": 0.0')]),e._v(" and "),o("code",[e._v('"maximum": 100.0')]),e._v(");")]),e._v(" "),o("li",[e._v("Constraints can specify an enumeration of valid values to choose from (e.g. 
"),o("code",[e._v('"enum" : ["2014-2020", "2021-2027"]')]),e._v(").")]),e._v(" "),o("li",[e._v("Constraints can be specified in custom ways, such as with "),o("a",{attrs:{href:"https://en.wikipedia.org/wiki/Regular_expression",target:"_blank",rel:"noopener noreferrer"}},[e._v("regular expressions"),o("OutboundLink")],1),e._v(" for powerful string matching capabilities;")]),e._v(" "),o("li",[e._v("Data types can be enforced for any column;")]),e._v(" "),o("li",[e._v("Columns can be forced to adapt a specific name and a description can be provided for each one of them.")])]),e._v(" "),o("p",[e._v("Because these specifications can be expressed as portable text files, they became part of a multitude of tools to provide greater convenience to users and the validation process has been "),o("a",{attrs:{href:"https://www.itb.ec.europa.eu/docs/guides/latest/validatingCSV/index.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("documented extensively"),o("OutboundLink")],1),e._v(". JSON code snippets from the documentation highlight the fact that this format conveys all the necessary information in a readable manner and lets users extend the original specifications as needed. 
In this particular instance, the CSV validator can be used as a "),o("a",{attrs:{href:"https://hub.docker.com/repository/docker/isaitb/validator-kohesio",target:"_blank",rel:"noopener noreferrer"}},[e._v("Docker image"),o("OutboundLink")],1),e._v(", as part of a "),o("a",{attrs:{href:"https://www.itb.ec.europa.eu/csv-offline/kohesio/validator.zip",target:"_blank",rel:"noopener noreferrer"}},[e._v("command-line application"),o("OutboundLink")],1),e._v(", inside a "),o("a",{attrs:{href:"https://www.itb.ec.europa.eu/csv/kohesio/upload",target:"_blank",rel:"noopener noreferrer"}},[e._v("web application"),o("OutboundLink")],1),e._v(" and even as a "),o("a",{attrs:{href:"https://www.itb.ec.europa.eu/csv/soap/kohesio/validation?wsdl",target:"_blank",rel:"noopener noreferrer"}},[e._v("SOAP API"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Frictionless specifications were the missing piece of the puzzle that enabled the ITB to rely on a well-documented set of standards for their data validation needs. But there is more on the table (no pun intended): whether you need to manage files, tables or entire datasets, there are "),o("RouterLink",{attrs:{to:"/standards/"}},[e._v("Frictionless standards")]),e._v(" to cover you. As the growing "),o("RouterLink",{attrs:{to:"/adoption/"}},[e._v("list of adopters and collaborations")]),e._v(" demonstrates, there are many use cases to make a data project shine with Frictionless.")],1),e._v(" "),o("p",[e._v("Are you working on a great project that should become the next glowing star in the world of Frictionless Data? 
Feel free to "),o("RouterLink",{attrs:{to:"/work-with-us/get-help/"}},[e._v("reach out")]),e._v(" to spread the good news!")],1)])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/134.0de7dba0.js b/assets/js/134.20dc089c.js similarity index 98% rename from assets/js/134.0de7dba0.js rename to assets/js/134.20dc089c.js index a5f01bcf1..4403d81cd 100644 --- a/assets/js/134.0de7dba0.js +++ b/assets/js/134.20dc089c.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[134],{665:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last Frictionless Data community call on June 24"),a("sup",[e._v("th")]),e._v(" we had Nikhil Vats giving a presentation on Frictionless Package for InterMine. The project was developed during the Frictionless Toolfund 2020-2021.")]),e._v(" "),a("p",[e._v("InterMine is an open source biological data warehouse that creates databases of biological data accessed by sophisticated web query tools. Nikhil worked on the Frictionless Data Package integration, which is extremely helpful for users, as it describes all the fields of their query, specifically: name of field, type of field, class path, field and class ontology link.")]),e._v(" "),a("p",[e._v("You can learn more about the Data Package for InterMine project "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/04/13/data-package-for-intermine/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". 
If you would like to dive deeper and discover all about the Frictionless implementation, you can watch Nikhil Vats’ presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/6Izm_W-hNKI",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("h3",{attrs:{id:"linked-data-support"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#linked-data-support"}},[e._v("#")]),e._v(" Linked data support")]),e._v(" "),a("p",[e._v("Nikhil’s presentation naturally led to a discussion on adding support for linked data and ontologies to Frictionless Data. On several occasions the community has shown interest in extending Frictionless specifications by incorporating standard attributes like ontology terms for improved interoperability. There have also been several discussions about supporting JSON-LD or RDF in the main specifications for improved data linking and querying. Would this help your work? 
Let us know what you think and if you are potentially interested in participating in this project.")]),e._v(" "),a("h3",{attrs:{id:"new-tool-livemark"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#new-tool-livemark"}},[e._v("#")]),e._v(" New tool: Livemark")]),e._v(" "),a("p",[e._v("We are super happy to share with you the newest entry in the Frictionless Data toolkit: Livemark - a static page generator with built-in tables and charts support (with support for data processing and validation with Frictionless): "),a("a",{attrs:{href:"https://frictionlessdata.github.io/livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.github.io/livemark/"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("To know more about it, check out "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/06/22/livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("our latest blog"),a("OutboundLink")],1),e._v(" (featuring a great demo by developer Evgeny Karev).")]),e._v(" "),a("p",[e._v("As usual, we would love to hear what you think, so please share your thoughts, comments and feedback with us.")]),e._v(" "),a("h1",{attrs:{id:"news-from-the-community"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the community")]),e._v(" "),a("p",[e._v("Michael Amadi from Nimble Learn presented the "),a("a",{attrs:{href:"https://www.opendatablend.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Blend project"),a("OutboundLink")],1),e._v(" - a set of open data services that aim to make large and complex UK open data easier to analyse. Open Data Blend’s bulk data API is built on the Frictionless Data specs. Keep an eye out for an upcoming blog with more details!")]),e._v(" "),a("p",[e._v("Frictionless contributor Peter Desmet proposed to start a Frictionless Data community on Zenodo. 
We are currently discussing the best way to do that on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" in the "),a("em",[e._v("datasets")]),e._v(" channel. Join us there if you are interested or have ideas!")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Our next meeting will be on July 29"),a("sup",[e._v("th")]),e._v(". We will hear a presentation from"),a("br"),e._v("\nDave Rowe on Public Libraries Open Data Schema. You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/4Kl_VBdbc5M",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[134],{664:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last Frictionless Data community call on June 24"),a("sup",[e._v("th")]),e._v(" we had Nikhil Vats giving a presentation on Frictionless Package for InterMine. The project was developed during the Frictionless Toolfund 2020-2021.")]),e._v(" "),a("p",[e._v("InterMine is an open source biological data warehouse that creates databases of biological data accessed by sophisticated web query tools. Nikhil worked on the Frictionless Data Package integration, which is extremely helpful for users, as it describes all the fields of their query, specifically: name of field, type of field, class path, field and class ontology link.")]),e._v(" "),a("p",[e._v("You can learn more about the Data Package for InterMine project "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/04/13/data-package-for-intermine/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". 
If you would like to dive deeper and discover all about the Frictionless implementation, you can watch Nikhil Vats’ presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/6Izm_W-hNKI",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("h3",{attrs:{id:"linked-data-support"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#linked-data-support"}},[e._v("#")]),e._v(" Linked data support")]),e._v(" "),a("p",[e._v("Nikhil’s presentation naturally led to a discussion on adding support for linked data and ontologies to Frictionless Data. On several occasions the community has shown interest in extending Frictionless specifications by incorporating standard attributes like ontology terms for improved interoperability. There have also been several discussions about supporting JSON-LD or RDF in the main specifications for improved data linking and querying. Would this help your work? 
Let us know what you think and if you are potentially interested in participating in this project.")]),e._v(" "),a("h3",{attrs:{id:"new-tool-livemark"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#new-tool-livemark"}},[e._v("#")]),e._v(" New tool: Livemark")]),e._v(" "),a("p",[e._v("We are super happy to share with you the newest entry in the Frictionless Data toolkit: Livemark - a static page generator with built-in tables and charts support (with support for data processing and validation with Frictionless): "),a("a",{attrs:{href:"https://frictionlessdata.github.io/livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.github.io/livemark/"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("To know more about it, check out "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/06/22/livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("our latest blog"),a("OutboundLink")],1),e._v(" (featuring a great demo by developer Evgeny Karev).")]),e._v(" "),a("p",[e._v("As usual, we would love to hear what you think, so please share your thoughts, comments and feedback with us.")]),e._v(" "),a("h1",{attrs:{id:"news-from-the-community"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the community")]),e._v(" "),a("p",[e._v("Michael Amadi from Nimble Learn presented the "),a("a",{attrs:{href:"https://www.opendatablend.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Blend project"),a("OutboundLink")],1),e._v(" - a set of open data services that aim to make large and complex UK open data easier to analyse. Open Data Blend’s bulk data API is built on the Frictionless Data specs. Keep an eye out for an upcoming blog with more details!")]),e._v(" "),a("p",[e._v("Frictionless contributor Peter Desmet proposed to start a Frictionless Data community on Zenodo. 
We are currently discussing the best way to do that on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" in the "),a("em",[e._v("datasets")]),e._v(" channel. Join us there if you are interested or have ideas!")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Our next meeting will be on July 29"),a("sup",[e._v("th")]),e._v(". We will hear a presentation from"),a("br"),e._v("\nDave Rowe on Public Libraries Open Data Schema. You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/4Kl_VBdbc5M",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/135.ba0f9f60.js b/assets/js/135.7f2aa5bc.js similarity index 98% rename from assets/js/135.ba0f9f60.js rename to assets/js/135.7f2aa5bc.js index 986d25f84..be20c219f 100644 --- a/assets/js/135.ba0f9f60.js +++ b/assets/js/135.7f2aa5bc.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[135],{668:function(e,t,o){"use strict";o.r(t);var n=o(29),a=Object(n.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("To say that I am proud of the "),o("RouterLink",{attrs:{to:"/blog/2020/09/01/hello-fellows-cohort2/"}},[e._v("second cohort of Frictionless Fellows")]),e._v(" is an understatement. Their insight, discussions, and breakthroughs have been a true joy to witness, and I feel so lucky to have had the chance to work and learn with each of them. Over the last 9 months, they not only learned about Frictionless Data tooling, how to make their research more reproducible, and how to advocate for open science, they also gave many presentations (some for the first time in public!), published papers, wrote dissertations, and gained confidence in their coding skills. I know each of them will be a leader in the open space, so keep an eye on them!")],1),e._v(" "),o("p",[e._v("As a final assignment, the Fellows have written blogs reflecting upon their experiences and what they learned during the programme. 
I’ve copied blurbs from each below, but be sure to click on the links to read more from each Fellow!")]),e._v(" "),o("ul",[o("li",[o("p",[o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/anne-final-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Endings, Beginnings, and Reflections - by Anne Lee Steele"),o("OutboundLink")],1),o("br"),e._v("\n“What came out of this fellowship, as my colleagues have said time and time again, is much more than I ever could have imagined. Over the course of the past year, I’ve had fascinating debates with my cohort, and learned about how different disciplines unpack complex debates surrounding transparency, openness, and accessibility (as well as many other things). I’ve learned how to engage with the universe of open knowledge, and have even started working on my own related projects! With the support of OKF, I’ve learned how to give presentations in public, and think about data in ways I never had before.”")])]),e._v(" "),o("li",[o("p",[o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/evelyn-final-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("A done deal - by Evelyn Night"),o("OutboundLink")],1),o("br"),e._v("\n“The fellowship was both exhilarating and educative. I got to engage in Open Science conversations, learned about and used frictionless tools like the Data Package Creator and Goodtables. I also navigated the open data landscape using CLI, Python, and git. I also got to engage in the Frictionless Community calls where software geniuses presented their work and also held Open science-centered conversations. These discussions enhanced my understanding of the Open Science movement and I felt a great honor to be involved in such meetings. 
I learned so much that the 9 months flew by.”")])]),e._v(" "),o("li",[o("p",[o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/jacqueline-final-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("A fellowship concludes - by Jacqueline Maasch"),o("OutboundLink")],1),o("br"),e._v("\n“It is hard to believe that my time as a Reproducible Research Fellow is over. I am most grateful for this program giving me a dedicated space in which to learn, a community with which to engage, and language with which to arm myself. I have been exposed to issues in open science that I had never encountered before, and have had the privilege of discussing these issues with people from across the world. I will miss the journal clubs the most!”")])]),e._v(" "),o("li",[o("p",[o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/katerina-final-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("My experience in the fellows program - a reflection - by Katerina Drakoulaki"),o("OutboundLink")],1),o("br"),e._v("\n“I got into the fellowship just with the hope of getting the opportunity to learn things I didn’t have the opportunity to learn on my own. That is, I did not have specific expectations, I was (and still am) grateful to be in. I feel that all the implicit expectations I might have had are all fulfilled. I got an amazing boost in my digital skills altogether and I know exactly why (no I did not gain a few IQ points). I was in a helpful community and I matured in a way that enabled me to have more of a growth mindset. I also saw other people ‘fail’, as in having their code not working and having to google the solution! I have to say all the readings, the discussions, the tutorials, the Frictionless tools have been amazing, but this shift in my mindset has been the greatest gift the fellowship has given me.”")])])]),e._v(" "),o("p",[e._v("Thank you Fellows! 
As a bonus, here are the reflections from the first cohort of Fellows: "),o("a",{attrs:{href:"https://blog.okfn.org/2020/06/09/reflecting-on-the-first-cohort-of-frictionless-data-reproducible-research-fellows/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2020/06/09/reflecting-on-the-first-cohort-of-frictionless-data-reproducible-research-fellows/"),o("OutboundLink")],1)])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[135],{666:function(e,t,o){"use strict";o.r(t);var n=o(29),a=Object(n.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("To say that I am proud of the "),o("RouterLink",{attrs:{to:"/blog/2020/09/01/hello-fellows-cohort2/"}},[e._v("second cohort of Frictionless Fellows")]),e._v(" is an understatement. Their insight, discussions, and breakthroughs have been a true joy to witness, and I feel so lucky to have had the chance to work and learn with each of them. Over the last 9 months, they not only learned about Frictionless Data tooling, how to make their research more reproducible, and how to advocate for open science, they also gave many presentations (some for the first time in public!), published papers, wrote dissertations, and gained confidence in their coding skills. I know each of them will be a leader in the open space, so keep an eye on them!")],1),e._v(" "),o("p",[e._v("As a final assignment, the Fellows have written blogs reflecting upon their experiences and what they learned during the programme. 
I’ve copied blurbs from each below, but be sure to click on the links to read more from each Fellow!")]),e._v(" "),o("ul",[o("li",[o("p",[o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/anne-final-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Endings, Beginnings, and Reflections - by Anne Lee Steele"),o("OutboundLink")],1),o("br"),e._v("\n“What came out of this fellowship, as my colleagues have said time and time again, is much more than I ever could have imagined. Over the course of the past year, I’ve had fascinating debates with my cohort, and learned about how different disciplines unpack complex debates surrounding transparency, openness, and accessibility (as well as many other things). I’ve learned how to engage with the universe of open knowledge, and have even started working on my own related projects! With the support of OKF, I’ve learned how to give presentations in public, and think about data in ways I never had before.”")])]),e._v(" "),o("li",[o("p",[o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/evelyn-final-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("A done deal - by Evelyn Night"),o("OutboundLink")],1),o("br"),e._v("\n“The fellowship was both exhilarating and educative. I got to engage in Open Science conversations, learned about and used frictionless tools like the Data Package Creator and Goodtables. I also navigated the open data landscape using CLI, Python, and git. I also got to engage in the Frictionless Community calls where software geniuses presented their work and also held Open science-centered conversations. These discussions enhanced my understanding of the Open Science movement and I felt a great honor to be involved in such meetings. 
I learned so much that the 9 months flew by.”")])]),e._v(" "),o("li",[o("p",[o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/jacqueline-final-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("A fellowship concludes - by Jacqueline Maasch"),o("OutboundLink")],1),o("br"),e._v("\n“It is hard to believe that my time as a Reproducible Research Fellow is over. I am most grateful for this program giving me a dedicated space in which to learn, a community with which to engage, and language with which to arm myself. I have been exposed to issues in open science that I had never encountered before, and have had the privilege of discussing these issues with people from across the world. I will miss the journal clubs the most!”")])]),e._v(" "),o("li",[o("p",[o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/katerina-final-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("My experience in the fellows program - a reflection - by Katerina Drakoulaki"),o("OutboundLink")],1),o("br"),e._v("\n“I got into the fellowship just with the hope of getting the opportunity to learn things I didn’t have the opportunity to learn on my own. That is, I did not have specific expectations, I was (and still am) grateful to be in. I feel that all the implicit expectations I might have had are all fulfilled. I got an amazing boost in my digital skills altogether and I know exactly why (no I did not gain a few IQ points). I was in a helpful community and I matured in a way that enabled me to have more of a growth mindset. I also saw other people ‘fail’, as in having their code not working and having to google the solution! I have to say all the readings, the discussions, the tutorials, the Frictionless tools have been amazing, but this shift in my mindset has been the greatest gift the fellowship has given me.”")])])]),e._v(" "),o("p",[e._v("Thank you Fellows! 
As a bonus, here are the reflections from the first cohort of Fellows: "),o("a",{attrs:{href:"https://blog.okfn.org/2020/06/09/reflecting-on-the-first-cohort-of-frictionless-data-reproducible-research-fellows/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2020/06/09/reflecting-on-the-first-cohort-of-frictionless-data-reproducible-research-fellows/"),o("OutboundLink")],1)])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/137.ee1d8ee5.js b/assets/js/137.9900e0e5.js similarity index 98% rename from assets/js/137.ee1d8ee5.js rename to assets/js/137.9900e0e5.js index a75cda13e..6eb85c93d 100644 --- a/assets/js/137.ee1d8ee5.js +++ b/assets/js/137.9900e0e5.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[137],{671:function(e,t,o){"use strict";o.r(t);var r=o(29),a=Object(r.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Are you looking for a way to automate the validation workflows of your datasets? Look no further, Frictionless Repository is here!")]),e._v(" "),o("p",[e._v("We are very excited to announce that a new tool has been added to the Frictionless Data toolkit: Frictionless Repository. This is a Github Action allowing the continuous data validation of your repository and it will ensure the quality of your data by reporting any problems you might have with your datasets in no time.")]),e._v(" "),o("h2",{attrs:{id:"how-does-it-work"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#how-does-it-work"}},[e._v("#")]),e._v(" How does it work?")]),e._v(" "),o("p",[e._v("Every time you add or update any tabular data file in your repository, Frictionless Repository runs a validation. Missing header? Data type mismatch? You will get a neat, visual, human-readable validation report straight away, which will show any problems your data may have. 
The report lets you spot immediately where the error occurred, making it extremely easy to correct it. You can even get a Markdown Badge to display in your repository to show that your data is valid.")]),e._v(" "),o("p",[e._v("Frictionless Repository only requires a simple installation. It is completely serverless, and it doesn’t rely on any third-party hardware except for the Github infrastructure.")]),e._v(" "),o("h2",{attrs:{id:"let-s-go"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#let-s-go"}},[e._v("#")]),e._v(" Let’s go!")]),e._v(" "),o("p",[e._v("Before you get started, have a look at developer Evgeny Karev’s demo:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/kXA4hmuF57c",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v(" ")]),e._v(" "),o("p",[e._v("We also encourage you to check out the dedicated "),o("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation website"),o("OutboundLink")],1),e._v(", to get more detailed information.")]),e._v(" "),o("h2",{attrs:{id:"what-do-you-think"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-do-you-think"}},[e._v("#")]),e._v(" What do you think?")]),e._v(" "),o("p",[e._v("If you use Frictionless Repository, please let us know! Frictionless Data is an open source project, therefore we encourage you to give us feedback. 
Let us know your thoughts, suggestions, or issues by joining us in our community chat on "),o("a",{attrs:{href:"https://discord.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),o("OutboundLink")],1),e._v(" or by opening an issue in the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/repository",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repo"),o("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[137],{668:function(e,t,o){"use strict";o.r(t);var r=o(29),a=Object(r.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Are you looking for a way to automate the validation workflows of your datasets? Look no further, Frictionless Repository is here!")]),e._v(" "),o("p",[e._v("We are very excited to announce that a new tool has been added to the Frictionless Data toolkit: Frictionless Repository. This is a Github Action allowing the continuous data validation of your repository and it will ensure the quality of your data by reporting any problems you might have with your datasets in no time.")]),e._v(" "),o("h2",{attrs:{id:"how-does-it-work"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#how-does-it-work"}},[e._v("#")]),e._v(" How does it work?")]),e._v(" "),o("p",[e._v("Every time you add or update any tabular data file in your repository, Frictionless Repository runs a validation. Missing header? Data type mismatch? You will get a neat, visual, human-readable validation report straight away, which will show any problems your data may have. The report lets you spot immediately where the error occurred, making it extremely easy to correct it. 
You can even get a Markdown Badge to display in your repository to show that your data is valid.")]),e._v(" "),o("p",[e._v("Frictionless Repository only requires a simple installation. It is completely serverless, and it doesn’t rely on any third-party hardware except for the Github infrastructure.")]),e._v(" "),o("h2",{attrs:{id:"let-s-go"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#let-s-go"}},[e._v("#")]),e._v(" Let’s go!")]),e._v(" "),o("p",[e._v("Before you get started, have a look at developer Evgeny Karev’s demo:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/kXA4hmuF57c",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v(" ")]),e._v(" "),o("p",[e._v("We also encourage you to check out the dedicated "),o("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation website"),o("OutboundLink")],1),e._v(", to get more detailed information.")]),e._v(" "),o("h2",{attrs:{id:"what-do-you-think"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-do-you-think"}},[e._v("#")]),e._v(" What do you think?")]),e._v(" "),o("p",[e._v("If you use Frictionless Repository, please let us know! Frictionless Data is an open source project, therefore we encourage you to give us feedback. 
Let us know your thoughts, suggestions, or issues by joining us in our community chat on "),o("a",{attrs:{href:"https://discord.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),o("OutboundLink")],1),e._v(" or by opening an issue in the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/repository",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repo"),o("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/138.ad482374.js b/assets/js/138.850d2795.js similarity index 99% rename from assets/js/138.ad482374.js rename to assets/js/138.850d2795.js index 6b3be36b9..d77ba14fb 100644 --- a/assets/js/138.ad482374.js +++ b/assets/js/138.850d2795.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[138],{670:function(e,t,o){"use strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("em",[e._v("The Frictionless Data Reproducible Research "),o("a",{attrs:{href:"http://fellows.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fellows Program"),o("OutboundLink")],1),e._v(", supported by the Sloan Foundation, aims to train graduate students, postdoctoral scholars, and early career researchers how to become champions for open, reproducible research using Frictionless Data tools and approaches in their field.")])]),e._v(" "),o("h3",{attrs:{id:"apply-today-to-join-the-third-cohort-of-frictionless-data-fellows"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#apply-today-to-join-the-third-cohort-of-frictionless-data-fellows"}},[e._v("#")]),e._v(" Apply today to join the Third Cohort of Frictionless Data Fellows!")]),e._v(" "),o("p",[e._v("Fellows will learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and 
how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content. In addition to mentorship, we are providing Fellows with stipends of $5,000 to support their work and time during the nine-month long Fellowship. We welcome applications using this "),o("a",{attrs:{href:"https://forms.gle/3t9EoHKWYUnBdzHF8",target:"_blank",rel:"noopener noreferrer"}},[e._v("form"),o("OutboundLink")],1),e._v(" from 4th August until 31st August 2021, with the Fellowship starting in October. We value diversity and encourage applicants from communities that are under-represented in science and technology, people of colour, women, people with disabilities, and LGBTI+ individuals. Questions? Please read the "),o("a",{attrs:{href:"https://fellows.frictionlessdata.io/apply",target:"_blank",rel:"noopener noreferrer"}},[e._v("FAQ"),o("OutboundLink")],1),e._v(", and feel free to email us ("),o("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[e._v("frictionlessdata@okfn.org")]),e._v(") if your question is not answered in the FAQ.")]),e._v(" "),o("h3",{attrs:{id:"frictionless-data-for-reproducible-research"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-data-for-reproducible-research"}},[e._v("#")]),e._v(" Frictionless Data for Reproducible Research")]),e._v(" "),o("p",[e._v("The Fellowship is part of the "),o("a",{attrs:{href:"http://frictionlessdata.io/adoption/#frictionless-data-for-reproducible-research/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data for Reproducible Research"),o("OutboundLink")],1),e._v(" project at "),o("a",{attrs:{href:"https://okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Foundation"),o("OutboundLink")],1),e._v(", and is the third iteration. 
Frictionless Data aims to reduce the friction often found when working with data, such as when data is poorly structured, incomplete, hard to find, or is archived in difficult to use formats. This project, funded by the Sloan Foundation and the Open Knowledge Foundation, applies our work to data-driven research disciplines, in order to help researchers and the research community resolve data workflow issues. At its core, Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The core specification, the Data Package, is a simple and practical “container” for data and metadata. The Frictionless Data approach aims to address identified needs for improving data-driven research such as generalized, standard metadata formats, interoperable data, and open-source tooling for data validation.")]),e._v(" "),o("h3",{attrs:{id:"fellowship-program"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#fellowship-program"}},[e._v("#")]),e._v(" Fellowship program")]),e._v(" "),o("p",[e._v("During the Fellowship, our team will be on hand to work closely with you as you complete the work. We will help you learn Frictionless Data tooling and software, and provide you with resources to help you create workshops and presentations. Also, we will announce Fellows on the project website and will be publishing your blogs and workshops slides within our network channels. We will provide mentorship on how to work on an Open project, and will work with you to achieve your Fellowship goals. 
You can read more about the first two cohorts of the Programme in the Fellows blog: "),o("a",{attrs:{href:"http://fellows.frictionlessdata.io/blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://fellows.frictionlessdata.io/blog/"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"how-to-apply"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#how-to-apply"}},[e._v("#")]),e._v(" How to apply")]),e._v(" "),o("p",[e._v("The Fund is open to early career research individuals, such as graduate students and postdoctoral scholars, anywhere in the world, and in any scientific discipline. Successful applicants will be enthusiastic about reproducible research and open science, have some experience with communications, writing, or giving presentations, and have some technical skills (basic experience with Python, R, or Matlab for example), but do not need to be technically proficient. If you are interested, but do not have all of the qualifications, we still encourage you to "),o("a",{attrs:{href:"https://forms.gle/3t9EoHKWYUnBdzHF8",target:"_blank",rel:"noopener noreferrer"}},[e._v("apply"),o("OutboundLink")],1),e._v(". We welcome applications using this "),o("a",{attrs:{href:"https://forms.gle/3t9EoHKWYUnBdzHF8",target:"_blank",rel:"noopener noreferrer"}},[e._v("form"),o("OutboundLink")],1),e._v(" from 4th August until 31st August 2021.")]),e._v(" "),o("p",[e._v("If you have any questions, please email the team at "),o("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[e._v("frictionlessdata@okfn.org")]),e._v(" and check out the "),o("a",{attrs:{href:"https://fellows.frictionlessdata.io/apply",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fellows FAQ section"),o("OutboundLink")],1),e._v(". 
"),o("a",{attrs:{href:"https://forms.gle/3t9EoHKWYUnBdzHF8",target:"_blank",rel:"noopener noreferrer"}},[e._v("Apply"),o("OutboundLink")],1),e._v(" soon, and share with your networks!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[138],{672:function(e,t,o){"use strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("em",[e._v("The Frictionless Data Reproducible Research "),o("a",{attrs:{href:"http://fellows.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fellows Program"),o("OutboundLink")],1),e._v(", supported by the Sloan Foundation, aims to train graduate students, postdoctoral scholars, and early career researchers how to become champions for open, reproducible research using Frictionless Data tools and approaches in their field.")])]),e._v(" "),o("h3",{attrs:{id:"apply-today-to-join-the-third-cohort-of-frictionless-data-fellows"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#apply-today-to-join-the-third-cohort-of-frictionless-data-fellows"}},[e._v("#")]),e._v(" Apply today to join the Third Cohort of Frictionless Data Fellows!")]),e._v(" "),o("p",[e._v("Fellows will learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content. In addition to mentorship, we are providing Fellows with stipends of $5,000 to support their work and time during the nine-month long Fellowship. 
We welcome applications using this "),o("a",{attrs:{href:"https://forms.gle/3t9EoHKWYUnBdzHF8",target:"_blank",rel:"noopener noreferrer"}},[e._v("form"),o("OutboundLink")],1),e._v(" from 4th August until 31st August 2021, with the Fellowship starting in October. We value diversity and encourage applicants from communities that are under-represented in science and technology, people of colour, women, people with disabilities, and LGBTI+ individuals. Questions? Please read the "),o("a",{attrs:{href:"https://fellows.frictionlessdata.io/apply",target:"_blank",rel:"noopener noreferrer"}},[e._v("FAQ"),o("OutboundLink")],1),e._v(", and feel free to email us ("),o("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[e._v("frictionlessdata@okfn.org")]),e._v(") if your question is not answered in the FAQ.")]),e._v(" "),o("h3",{attrs:{id:"frictionless-data-for-reproducible-research"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-data-for-reproducible-research"}},[e._v("#")]),e._v(" Frictionless Data for Reproducible Research")]),e._v(" "),o("p",[e._v("The Fellowship is part of the "),o("a",{attrs:{href:"http://frictionlessdata.io/adoption/#frictionless-data-for-reproducible-research/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data for Reproducible Research"),o("OutboundLink")],1),e._v(" project at "),o("a",{attrs:{href:"https://okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Foundation"),o("OutboundLink")],1),e._v(", and is the third iteration. Frictionless Data aims to reduce the friction often found when working with data, such as when data is poorly structured, incomplete, hard to find, or is archived in difficult to use formats. This project, funded by the Sloan Foundation and the Open Knowledge Foundation, applies our work to data-driven research disciplines, in order to help researchers and the research community resolve data workflow issues. 
At its core, Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The core specification, the Data Package, is a simple and practical “container” for data and metadata. The Frictionless Data approach aims to address identified needs for improving data-driven research such as generalized, standard metadata formats, interoperable data, and open-source tooling for data validation.")]),e._v(" "),o("h3",{attrs:{id:"fellowship-program"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#fellowship-program"}},[e._v("#")]),e._v(" Fellowship program")]),e._v(" "),o("p",[e._v("During the Fellowship, our team will be on hand to work closely with you as you complete the work. We will help you learn Frictionless Data tooling and software, and provide you with resources to help you create workshops and presentations. Also, we will announce Fellows on the project website and will be publishing your blogs and workshops slides within our network channels. We will provide mentorship on how to work on an Open project, and will work with you to achieve your Fellowship goals. You can read more about the first two cohorts of the Programme in the Fellows blog: "),o("a",{attrs:{href:"http://fellows.frictionlessdata.io/blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://fellows.frictionlessdata.io/blog/"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"how-to-apply"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#how-to-apply"}},[e._v("#")]),e._v(" How to apply")]),e._v(" "),o("p",[e._v("The Fund is open to early career research individuals, such as graduate students and postdoctoral scholars, anywhere in the world, and in any scientific discipline. 
Successful applicants will be enthusiastic about reproducible research and open science, have some experience with communications, writing, or giving presentations, and have some technical skills (basic experience with Python, R, or Matlab for example), but do not need to be technically proficient. If you are interested, but do not have all of the qualifications, we still encourage you to "),o("a",{attrs:{href:"https://forms.gle/3t9EoHKWYUnBdzHF8",target:"_blank",rel:"noopener noreferrer"}},[e._v("apply"),o("OutboundLink")],1),e._v(". We welcome applications using this "),o("a",{attrs:{href:"https://forms.gle/3t9EoHKWYUnBdzHF8",target:"_blank",rel:"noopener noreferrer"}},[e._v("form"),o("OutboundLink")],1),e._v(" from 4th August until 31st August 2021.")]),e._v(" "),o("p",[e._v("If you have any questions, please email the team at "),o("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[e._v("frictionlessdata@okfn.org")]),e._v(" and check out the "),o("a",{attrs:{href:"https://fellows.frictionlessdata.io/apply",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fellows FAQ section"),o("OutboundLink")],1),e._v(". 
"),o("a",{attrs:{href:"https://forms.gle/3t9EoHKWYUnBdzHF8",target:"_blank",rel:"noopener noreferrer"}},[e._v("Apply"),o("OutboundLink")],1),e._v(" soon, and share with your networks!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/14.b66cad77.js b/assets/js/14.0962b179.js similarity index 99% rename from assets/js/14.b66cad77.js rename to assets/js/14.0962b179.js index 74ae15d8d..0a192ceb3 100644 --- a/assets/js/14.b66cad77.js +++ b/assets/js/14.0962b179.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[14],{431:function(e,t,a){e.exports=a.p+"assets/img/openml-dashboard-intro.4d6e718b.png"},432:function(e,t,a){e.exports=a.p+"assets/img/openml-upload-data.79f2aa75.png"},433:function(e,t,a){e.exports=a.p+"assets/img/openml-dataset-list.9aa35113.png"},434:function(e,t,a){e.exports=a.p+"assets/img/openml-dataset-overview.14685b05.png"},582:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("a",{attrs:{href:"http://openml.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenML"),o("OutboundLink")],1),e._v(" is an online platform and service for machine learning, whose goal is to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. 
People can upload and share data sets and questions (prediction tasks) on OpenML that they then collaboratively solve using machine learning algorithms.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://www.youtube.com/embed/1N3qATxXrpE",target:"_blank",rel:"noopener noreferrer"}},[o("img",{attrs:{src:a(431),alt:""}}),o("OutboundLink")],1),o("br"),e._v(" "),o("em",[e._v("A brief introduction to openML")])]),e._v(" "),o("p",[e._v("We offer "),o("a",{attrs:{href:"https://www.openml.org/guide/api",target:"_blank",rel:"noopener noreferrer"}},[e._v("open source tools"),o("OutboundLink")],1),e._v(" to download data into your "),o("a",{attrs:{href:"https://www.openml.org/guide/integrations",target:"_blank",rel:"noopener noreferrer"}},[e._v("favorite machine learning environments"),o("OutboundLink")],1),e._v(" and work with it. You can then upload your results back onto the platform so that others can learn from you. If you have data, you can use OpenML to get insights on what machine learning method works well to answer your question. Machine Learners can use OpenML to find interesting data sets and questions that are relevant for others and also for machine learning research (e.g. learning how algorithms behave on different types of data sets).")]),e._v(" "),o("p",[e._v("Users typically store their data in all kinds of formats, which makes it hard to simplify the data upload process on OpenML. Currently we only allow data in ARFF format. We are looking to make it as easy as possible for users to upload data, download and work with data from OpenML while keeping the datasets in machine readable formats and availing metadata in easy to read formats for our users. We also like to avail datasets from other services on OpenML. Most of these external sources currently contain data in varied formats, but some i.e. 
"),o("a",{attrs:{href:"https://data.world/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.world"),o("OutboundLink")],1),e._v(" have started adopting and using "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data packages"),o("OutboundLink")],1),e._v(". You can read more about data.world’s adoption and use of data packages "),o("RouterLink",{attrs:{to:"/blog/2017/04/11/dataworld/"}},[e._v("here")]),e._v(" and "),o("a",{attrs:{href:"https://meta.data.world/try-this-frictionless-data-world-ad36b6422ceb",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")],1),e._v(" "),o("p",[o("a",{attrs:{href:"https://biteable.com/watch/upload-data-to-openml-1575659/4500a42627a119f548c7cb0ec3ec4a25ee8a576f",target:"_blank",rel:"noopener noreferrer"}},[o("img",{attrs:{src:a(432),alt:""}}),o("OutboundLink")],1),o("br"),e._v(" "),o("em",[e._v("Learn how to upload data on OpenML in 1 minute")])]),e._v(" "),o("p",[e._v("We first heard about the Frictionless Data project through "),o("a",{attrs:{href:"https://schoolofdata.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("School of Data"),o("OutboundLink")],1),e._v(". One of the OpenML core members is also involved in School of Data and used data packages in one of the open data workshops from School of Data Switzerland. In the coming months, we are looking to adopt "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data specifications"),o("OutboundLink")],1),e._v(" to improve user friendliness on OpenML. We hope to make it possible for users to upload and connect datasets in "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data packages format"),o("OutboundLink")],1),e._v(". 
This will be a great shift because it would enable people to easily build and share machine learning models trained on any dataset in the frictionless data ecosystem.")]),e._v(" "),o("p",[e._v("OpenML currently works with tabular data in Attribute Relation File Format ("),o("a",{attrs:{href:"https://weka.wikispaces.com/ARFF+%28stable+version%29",target:"_blank",rel:"noopener noreferrer"}},[e._v("ARFF"),o("OutboundLink")],1),e._v(") accompanied by metadata in an XML or JSON file. It is actually very similar to Frictionless Data’s "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("tabular data package"),o("OutboundLink")],1),e._v(" specification, but with ARFF instead of csv.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(433),alt:""}}),o("br"),e._v(" "),o("em",[e._v("Image of dataset list on OpenML")])]),e._v(" "),o("p",[e._v("ARFF (Attribute-Relation File Format) is a CSV file with a header that lists the names of the attributes (columns) and their data types. Especially the latter is very important to do data analysis. For instance, say that you have a column with values 1,2,3. It is very important to know whether that is just a number (1,2,3 ice creams), a rank (1st, 2nd, 3rd place), or a category (item 1, item 2, item 3). This is missing from CSV data. ARFF also allows to connect multiple tables together, although we don’t really use this right now.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(434),alt:""}}),o("br"),e._v(" "),o("em",[e._v("Image of a dataset overview on openML")])]),e._v(" "),o("p",[e._v("The metadata is free-form information about the dataset. It is mostly key-value data, although some values are more structured. It is stored in our database and exported to simple JSON or XML. "),o("a",{attrs:{href:"https://www.openml.org/d/2/json",target:"_blank",rel:"noopener noreferrer"}},[e._v("Here’s an example"),o("OutboundLink")],1),e._v(". 
It covers basic information (textual description of the dataset, owner, format, license, et al) as well as statistics (number of instances, number of features, number of missing values, details about the data distribution, and results of simple machine learning algorithms run on the data), and summary statistics (mainly used for the quick overview plots).")]),e._v(" "),o("p",[e._v("We firmly believe that if data packages become the go-to specification for sharing data in scientific communities, accessibility to data that’s currently ‘hidden’ in data platforms and university libraries will improve vastly, and are keen to adopt and use the specification on OpenML in the coming months.")]),e._v(" "),o("p",[e._v("Interested in contributing to our quest to adopt the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data package specification"),o("OutboundLink")],1),e._v(" as an import and export option for data on the OpenML platform? 
"),o("a",{attrs:{href:"https://github.com/openml/OpenML/issues/482",target:"_blank",rel:"noopener noreferrer"}},[e._v("Start here"),o("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[14],{431:function(e,t,a){e.exports=a.p+"assets/img/openml-dashboard-intro.4d6e718b.png"},432:function(e,t,a){e.exports=a.p+"assets/img/openml-upload-data.79f2aa75.png"},433:function(e,t,a){e.exports=a.p+"assets/img/openml-dataset-list.9aa35113.png"},434:function(e,t,a){e.exports=a.p+"assets/img/openml-dataset-overview.14685b05.png"},581:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("a",{attrs:{href:"http://openml.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenML"),o("OutboundLink")],1),e._v(" is an online platform and service for machine learning, whose goal is to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. 
People can upload and share data sets and questions (prediction tasks) on OpenML that they then collaboratively solve using machine learning algorithms.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://www.youtube.com/embed/1N3qATxXrpE",target:"_blank",rel:"noopener noreferrer"}},[o("img",{attrs:{src:a(431),alt:""}}),o("OutboundLink")],1),o("br"),e._v(" "),o("em",[e._v("A brief introduction to openML")])]),e._v(" "),o("p",[e._v("We offer "),o("a",{attrs:{href:"https://www.openml.org/guide/api",target:"_blank",rel:"noopener noreferrer"}},[e._v("open source tools"),o("OutboundLink")],1),e._v(" to download data into your "),o("a",{attrs:{href:"https://www.openml.org/guide/integrations",target:"_blank",rel:"noopener noreferrer"}},[e._v("favorite machine learning environments"),o("OutboundLink")],1),e._v(" and work with it. You can then upload your results back onto the platform so that others can learn from you. If you have data, you can use OpenML to get insights on what machine learning method works well to answer your question. Machine Learners can use OpenML to find interesting data sets and questions that are relevant for others and also for machine learning research (e.g. learning how algorithms behave on different types of data sets).")]),e._v(" "),o("p",[e._v("Users typically store their data in all kinds of formats, which makes it hard to simplify the data upload process on OpenML. Currently we only allow data in ARFF format. We are looking to make it as easy as possible for users to upload data, download and work with data from OpenML while keeping the datasets in machine readable formats and availing metadata in easy to read formats for our users. We also like to avail datasets from other services on OpenML. Most of these external sources currently contain data in varied formats, but some i.e. 
"),o("a",{attrs:{href:"https://data.world/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.world"),o("OutboundLink")],1),e._v(" have started adopting and using "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data packages"),o("OutboundLink")],1),e._v(". You can read more about data.world’s adoption and use of data packages "),o("RouterLink",{attrs:{to:"/blog/2017/04/11/dataworld/"}},[e._v("here")]),e._v(" and "),o("a",{attrs:{href:"https://meta.data.world/try-this-frictionless-data-world-ad36b6422ceb",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")],1),e._v(" "),o("p",[o("a",{attrs:{href:"https://biteable.com/watch/upload-data-to-openml-1575659/4500a42627a119f548c7cb0ec3ec4a25ee8a576f",target:"_blank",rel:"noopener noreferrer"}},[o("img",{attrs:{src:a(432),alt:""}}),o("OutboundLink")],1),o("br"),e._v(" "),o("em",[e._v("Learn how to upload data on OpenML in 1 minute")])]),e._v(" "),o("p",[e._v("We first heard about the Frictionless Data project through "),o("a",{attrs:{href:"https://schoolofdata.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("School of Data"),o("OutboundLink")],1),e._v(". One of the OpenML core members is also involved in School of Data and used data packages in one of the open data workshops from School of Data Switzerland. In the coming months, we are looking to adopt "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data specifications"),o("OutboundLink")],1),e._v(" to improve user friendliness on OpenML. We hope to make it possible for users to upload and connect datasets in "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data packages format"),o("OutboundLink")],1),e._v(". 
This will be a great shift because it would enable people to easily build and share machine learning models trained on any dataset in the frictionless data ecosystem.")]),e._v(" "),o("p",[e._v("OpenML currently works with tabular data in Attribute Relation File Format ("),o("a",{attrs:{href:"https://weka.wikispaces.com/ARFF+%28stable+version%29",target:"_blank",rel:"noopener noreferrer"}},[e._v("ARFF"),o("OutboundLink")],1),e._v(") accompanied by metadata in an XML or JSON file. It is actually very similar to Frictionless Data’s "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("tabular data package"),o("OutboundLink")],1),e._v(" specification, but with ARFF instead of csv.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(433),alt:""}}),o("br"),e._v(" "),o("em",[e._v("Image of dataset list on OpenML")])]),e._v(" "),o("p",[e._v("ARFF (Attribute-Relation File Format) is a CSV file with a header that lists the names of the attributes (columns) and their data types. Especially the latter is very important to do data analysis. For instance, say that you have a column with values 1,2,3. It is very important to know whether that is just a number (1,2,3 ice creams), a rank (1st, 2nd, 3rd place), or a category (item 1, item 2, item 3). This is missing from CSV data. ARFF also allows to connect multiple tables together, although we don’t really use this right now.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(434),alt:""}}),o("br"),e._v(" "),o("em",[e._v("Image of a dataset overview on openML")])]),e._v(" "),o("p",[e._v("The metadata is free-form information about the dataset. It is mostly key-value data, although some values are more structured. It is stored in our database and exported to simple JSON or XML. "),o("a",{attrs:{href:"https://www.openml.org/d/2/json",target:"_blank",rel:"noopener noreferrer"}},[e._v("Here’s an example"),o("OutboundLink")],1),e._v(". 
It covers basic information (textual description of the dataset, owner, format, license, etc.) as well as statistics (number of instances, number of features, number of missing values, details about the data distribution, and results of simple machine learning algorithms run on the data), and summary statistics (mainly used for the quick overview plots).")]),e._v(" "),o("p",[e._v("We firmly believe that if data packages become the go-to specification for sharing data in scientific communities, accessibility to data that’s currently ‘hidden’ in data platforms and university libraries will improve vastly, and are keen to adopt and use the specification on OpenML in the coming months.")]),e._v(" "),o("p",[e._v("Interested in contributing to our quest to adopt the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data package specification"),o("OutboundLink")],1),e._v(" as an import and export option for data on the OpenML platform? "),o("a",{attrs:{href:"https://github.com/openml/OpenML/issues/482",target:"_blank",rel:"noopener noreferrer"}},[e._v("Start here"),o("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/140.d5db7070.js b/assets/js/140.08503e80.js similarity index 98% rename from assets/js/140.d5db7070.js rename to assets/js/140.08503e80.js index 7f286d984..27bf2ed05 100644 --- a/assets/js/140.d5db7070.js +++ b/assets/js/140.08503e80.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[140],{672:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("What happens to scientific data after it is generated? The answer is complicated - sometimes that data is shared with other researchers, sometimes it is hidden away on a private hard drive. 
Sharing research data is a key part of open science, the movement to make research more accessible and usable by everyone to drive faster advances in science. A great way to share research data is to upload it to a repository, but simply uploading data is not the final step here. Ideally, the uploaded data will be of high quality - that is, it won’t have errors or missing data, and it will have enough descriptive information that other researchers can also use it! Over the last 6 months, we collaborated with the data repository Dryad to make it easier for researchers to upload their high quality data for sharing.")]),e._v(" "),t("p",[t("a",{attrs:{href:"https://datadryad.org/stash",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dryad"),t("OutboundLink")],1),e._v(" is a community-led data repository that allows researchers to submit data from any field, which not only promotes open science, but also helps researchers comply with open data policies from funders and journals. Because Dryad accepts all kinds of data, they need to curate that data for quality and ensure that the data does not present risk, and have comprehensive metadata to reuse the data. We quickly realized our shared goals, and formed a Pilot collaboration to add Frictionless validation functionality to the Dryad data upload page. Both teams agreed how important it is to give researchers immediate feedback about their data as they are submitting it so they can make edits in that moment, and learn about data best practices.")]),e._v(" "),t("p",[e._v("The outcome of this collaboration is a revamped upload page for the Dryad application. Researchers uploading tabular data (CSV, XLS, XLSX) under 25MB will have the files automatically validated using the Frictionless tool. 
These checks are based on the built-in validation of Frictionless Framework (read the validation guide "),t("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/guides/validation-guide",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v("), and include checking for data errors such as blank cells, missing headers, or incorrectly formatted data. The Frictionless report will help guide researchers on which issues should be resolved, allowing researchers to edit and re-upload files before submitting their dataset for curation and publication.")]),e._v(" "),t("p",[t("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/128690898-2095f1c7-060d-4398-ac92-33f65c068c4c.png",alt:"Screen Shot 2021-08-06 at 8 10 41 AM"}}),t("br"),e._v(" "),t("em",[e._v("When a data file is uploaded, researchers can see if the data passed the Tabular Data Checks or if there are any issues. Clicking to “View 1 Issues” shows more details describing the error.")])]),e._v(" "),t("p",[t("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/128690994-16be9845-59ec-4f3b-9b76-28a163dfa1e3.png",alt:"Screen Shot 2021-08-06 at 8 12 01 AM"}}),t("br"),e._v(" "),t("em",[e._v("This uploaded data file has a blank header. With this information, the researcher can fix the error and re-upload the data.")])]),e._v(" "),t("p",[e._v("This work was funded by the Sloan Foundation as part of the Frictionless Data for Reproducible Research project. This project was truly collaboratory - most of the technical work was completed by contractor Cassiano Reinert Novais dos Santos with supervision and support from the Dryad team: Daniella Lowenberg, Scott Fisher, Ryan Scherle, and the CDL UX team (Rachael Hu and John Kratz); as well as support from the Frictionless team, Evgeny Karev, Lilly Winfree, and Sara Petti. 
If you have any feedback on the Dryad upload page, please let us know!")])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[140],{671:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("What happens to scientific data after it is generated? The answer is complicated - sometimes that data is shared with other researchers, sometimes it is hidden away on a private hard drive. Sharing research data is a key part of open science, the movement to make research more accessible and usable by everyone to drive faster advances in science. A great way to share research data is to upload it to a repository, but simply uploading data is not the final step here. Ideally, the uploaded data will be of high quality - that is, it won’t have errors or missing data, and it will have enough descriptive information that other researchers can also use it! Over the last 6 months, we collaborated with the data repository Dryad to make it easier for researchers to upload their high quality data for sharing.")]),e._v(" "),t("p",[t("a",{attrs:{href:"https://datadryad.org/stash",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dryad"),t("OutboundLink")],1),e._v(" is a community-led data repository that allows researchers to submit data from any field, which not only promotes open science, but also helps researchers comply with open data policies from funders and journals. Because Dryad accepts all kinds of data, they need to curate that data for quality and ensure that the data does not present risk, and have comprehensive metadata to reuse the data. We quickly realized our shared goals, and formed a Pilot collaboration to add Frictionless validation functionality to the Dryad data upload page. 
Both teams agreed how important it is to give researchers immediate feedback about their data as they are submitting it so they can make edits in that moment, and learn about data best practices.")]),e._v(" "),t("p",[e._v("The outcome of this collaboration is a revamped upload page for the Dryad application. Researchers uploading tabular data (CSV, XLS, XLSX) under 25MB will have the files automatically validated using the Frictionless tool. These checks are based on the built-in validation of Frictionless Framework (read the validation guide "),t("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/guides/validation-guide",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v("), and include checking for data errors such as blank cells, missing headers, or incorrectly formatted data. The Frictionless report will help guide researchers on which issues should be resolved, allowing researchers to edit and re-upload files before submitting their dataset for curation and publication.")]),e._v(" "),t("p",[t("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/128690898-2095f1c7-060d-4398-ac92-33f65c068c4c.png",alt:"Screen Shot 2021-08-06 at 8 10 41 AM"}}),t("br"),e._v(" "),t("em",[e._v("When a data file is uploaded, researchers can see if the data passed the Tabular Data Checks or if there are any issues. Clicking to “View 1 Issues” shows more details describing the error.")])]),e._v(" "),t("p",[t("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/128690994-16be9845-59ec-4f3b-9b76-28a163dfa1e3.png",alt:"Screen Shot 2021-08-06 at 8 12 01 AM"}}),t("br"),e._v(" "),t("em",[e._v("This uploaded data file has a blank header. With this information, the researcher can fix the error and re-upload the data.")])]),e._v(" "),t("p",[e._v("This work was funded by the Sloan Foundation as part of the Frictionless Data for Reproducible Research project. 
This project was truly collaborative - most of the technical work was completed by contractor Cassiano Reinert Novais dos Santos with supervision and support from the Dryad team: Daniella Lowenberg, Scott Fisher, Ryan Scherle, and the CDL UX team (Rachael Hu and John Kratz); as well as support from the Frictionless team, Evgeny Karev, Lilly Winfree, and Sara Petti. If you have any feedback on the Dryad upload page, please let us know!")])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/141.94553000.js b/assets/js/141.b4624c5c.js similarity index 98% rename from assets/js/141.94553000.js rename to assets/js/141.b4624c5c.js index cb22d9498..e3da9e494 100644 --- a/assets/js/141.94553000.js +++ b/assets/js/141.b4624c5c.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[141],{675:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On our last Frictionless Data community call on August 12"),r("sup",[e._v("th")]),e._v(" we had Dave Rowe (aka Libraries Hacked) giving a presentation on Frictionless Data standards and tooling for public libraries’ data.")]),e._v(" "),r("p",[e._v("Libraries Hacked is a project promoting open data in libraries and creating digital prototypes from that data. Public libraries hold a lot of data, but this data is often not shared and it is lacking common standards for data sharing. With the introduction of data schemas, Dave developed a series of tools to show libraries what they could do with their data. 
For example, Dave demonstrated membership mapping, libraries maps and a "),r("a",{attrs:{href:"https://www.mobilelibraries.org/map",target:"_blank",rel:"noopener noreferrer"}},[e._v("mobile libraries dashboard"),r("OutboundLink")],1),e._v(" that displays mobile libraries vans, estimates their location and automatically generates paper timelines.")]),e._v(" "),r("p",[e._v("You can learn more about the Libraries Hacked project "),r("a",{attrs:{href:"https://www.librarieshacked.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". If you would like to dive deeper and discover all about what you can do with Frictionless library data, , you can watch Dave Rowe’ presentation here::")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/R0U9Iwd8J00",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("h3",{attrs:{id:"frictionless-hackathon-in-october"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-hackathon-in-october"}},[e._v("#")]),e._v(" Frictionless Hackathon in October!")]),e._v(" "),r("p",[e._v("Join the Frictionless Data community for a two-day virtual event to create new project prototypes based on existing Frictionless open source code. It’s going to be fun!"),r("br"),e._v("\nWe need to decide on a date to hold this event, and are currently considering Thursday and Fridays in October. 
You can vote on Discord."),r("br"),e._v("\nKeep an eye on the website for more info: "),r("a",{attrs:{href:"https://frictionlessdata.io/hackathon/#what-s-a-hackathon",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/hackathon/#what-s-a-hackathon"),r("OutboundLink")],1)]),e._v(" "),r("h3",{attrs:{id:"recruiting-the-3rd-cohort-of-frictionless-fellows"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#recruiting-the-3rd-cohort-of-frictionless-fellows"}},[e._v("#")]),e._v(" Recruiting the 3rd cohort of Frictionless Fellows")]),e._v(" "),r("p",[e._v("Are you an early career researcher interested in Open Science? We are recruiting the 3rd cohort of Frictionless Fellows! During their 9-month Fellowship, Fellows will lead training workshops, host events at universities and in labs, and write blogs and other communications content. You will be mentored by Frictionless Data product manager Lilly Winfree, PhD and we will help you learn Frictionless Data tooling and software. Applications are open until August 31"),r("sup",[e._v("st")]),e._v("."),r("br"),e._v("\nMore info "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/08/02/apply-fellows/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),r("br"),e._v("\nYou can apply via this "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSdR1Qz5GL5A1BrqgFxDBOXScvNoS5AeyCWixNwtcApXUttT8Q/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("form"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h1",{attrs:{id:"join-us-in-2-weeks"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-in-2-weeks"}},[e._v("#")]),e._v(" Join us in 2 weeks!")]),e._v(" "),r("p",[e._v("Yes, that’s right, August is our lucky month, we don’t have one, but two community calls! Our next meeting will be in just 2 weeks, on August 26"),r("sup",[e._v("th")]),e._v(". 
We will hear a presentation from"),r("br"),e._v("\nAmber York and Adam Shepherd from BCO-DMO on Frictionless Data Pipelines. You can sign up "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),r("h1",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/IGhcP2dDNIg",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[141],{673:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On our last Frictionless Data community call on August 12"),r("sup",[e._v("th")]),e._v(" we had Dave Rowe (aka Libraries Hacked) giving a presentation on Frictionless Data standards and tooling for public libraries’ data.")]),e._v(" "),r("p",[e._v("Libraries Hacked is a project promoting open data in libraries and creating digital prototypes from that data. Public libraries hold a lot of data, but this data is often not shared and it is lacking common standards for data sharing. With the introduction of data schemas, Dave developed a series of tools to show libraries what they could do with their data. For example, Dave demonstrated membership mapping, libraries maps and a "),r("a",{attrs:{href:"https://www.mobilelibraries.org/map",target:"_blank",rel:"noopener noreferrer"}},[e._v("mobile libraries dashboard"),r("OutboundLink")],1),e._v(" that displays mobile libraries vans, estimates their location and automatically generates paper timelines.")]),e._v(" "),r("p",[e._v("You can learn more about the Libraries Hacked project "),r("a",{attrs:{href:"https://www.librarieshacked.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". 
If you would like to dive deeper and discover all about what you can do with Frictionless library data, , you can watch Dave Rowe’ presentation here::")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/R0U9Iwd8J00",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),r("h3",{attrs:{id:"frictionless-hackathon-in-october"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-hackathon-in-october"}},[e._v("#")]),e._v(" Frictionless Hackathon in October!")]),e._v(" "),r("p",[e._v("Join the Frictionless Data community for a two-day virtual event to create new project prototypes based on existing Frictionless open source code. It’s going to be fun!"),r("br"),e._v("\nWe need to decide on a date to hold this event, and are currently considering Thursday and Fridays in October. You can vote on Discord."),r("br"),e._v("\nKeep an eye on the website for more info: "),r("a",{attrs:{href:"https://frictionlessdata.io/hackathon/#what-s-a-hackathon",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/hackathon/#what-s-a-hackathon"),r("OutboundLink")],1)]),e._v(" "),r("h3",{attrs:{id:"recruiting-the-3rd-cohort-of-frictionless-fellows"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#recruiting-the-3rd-cohort-of-frictionless-fellows"}},[e._v("#")]),e._v(" Recruiting the 3rd cohort of Frictionless Fellows")]),e._v(" "),r("p",[e._v("Are you an early career researcher interested in Open Science? We are recruiting the 3rd cohort of Frictionless Fellows! 
During their 9-month Fellowship, Fellows will lead training workshops, host events at universities and in labs, and write blogs and other communications content. You will be mentored by Frictionless Data product manager Lilly Winfree, PhD and we will help you learn Frictionless Data tooling and software. Applications are open until August 31"),r("sup",[e._v("st")]),e._v("."),r("br"),e._v("\nMore info "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/08/02/apply-fellows/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),r("br"),e._v("\nYou can apply via this "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSdR1Qz5GL5A1BrqgFxDBOXScvNoS5AeyCWixNwtcApXUttT8Q/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("form"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h1",{attrs:{id:"join-us-in-2-weeks"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-in-2-weeks"}},[e._v("#")]),e._v(" Join us in 2 weeks!")]),e._v(" "),r("p",[e._v("Yes, that’s right, August is our lucky month, we don’t have one, but two community calls! Our next meeting will be in just 2 weeks, on August 26"),r("sup",[e._v("th")]),e._v(". We will hear a presentation from"),r("br"),e._v("\nAmber York and Adam Shepherd from BCO-DMO on Frictionless Data Pipelines. You can sign up "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),r("h1",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/IGhcP2dDNIg",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/142.3e6ea283.js b/assets/js/142.b178ea6a.js similarity index 98% rename from assets/js/142.3e6ea283.js rename to assets/js/142.b178ea6a.js index 535e709f6..55e6d03d4 100644 --- a/assets/js/142.3e6ea283.js +++ b/assets/js/142.b178ea6a.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[142],{674:function(e,t,o){"use strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("On our last Frictionless Data community call on August 26"),o("sup",[e._v("th")]),e._v(" we had Amber York and Adam Shepherd from BCO-DMO giving a presentation on Frictionless Data Pipelines for Ocean Science.")]),e._v(" "),o("p",[e._v("BCO-DMO is a biological and chemical oceanography data management office, working with scientists to make sure that their data is publicly available and archived for everyone else to 
use.")]),e._v(" "),o("p",[e._v("BCO-DMO processes around 500 datasets a year, with all sorts of variability. In the beginning the staff was writing ad hoc scripts and software to process that data, but that quickly became a challenge, as the catalogue continued to grow in both size and the variety of data types it curates.")]),e._v(" "),o("p",[e._v("Having worked for several years with Frictionless Data, BCO-DMO identified the Data Package Pipelines (DPP) project in the Frictionless toolkit as key to overcoming those challenges and achieving its data curation goals."),o("br"),e._v("\nTogether with the Frictionless Data team at Open Knowledge Foundation, BCO-DMO developed Laminar, a web application to create Frictionless Data Package Pipelines. Laminar helps data managers process data efficiently while recording the provenance of their activities to support reproducibility of results")]),e._v(" "),o("p",[e._v("You can learn more on the project "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/02/10/frictionless-data-pipelines-for-open-ocean/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". 
If you would like to dive deeper and discover all about Frictionless Data Pipelines, you can watch Amber York’s and Adam Shepherd’s presentation:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/R0U9Iwd8J00",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),o("h3",{attrs:{id:"frictionless-hackathon-on-7-8-october"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-hackathon-on-7-8-october"}},[e._v("#")]),e._v(" Frictionless Hackathon on 7-8 October!")]),e._v(" "),o("p",[e._v("Join the Frictionless Data community for a two-day virtual event to create new project prototypes based on existing Frictionless open source code. It’s going to be fun!"),o("br"),e._v("\nWe are currently accepting project submissions, so if you have a cool project in mind, using based on existing Frictionless open source code, this could be an excellent opportunity to prototype it, together with other Frictionless users from all around the world. You can pitch anything - your idea doesn’t need to be complete/fully planned. We can also help you formulate a project if you have an idea but aren’t sure about it. 
You can also submit ideas for existing projects you need help with!")]),e._v(" "),o("p",[e._v("Use "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSdd41pbfWaCYQHkQNTaf49kht1cUg7_Tg-NzqdP11pHWrD7yA/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(" to submit your project."),o("br"),e._v("\nKeep an eye "),o("a",{attrs:{href:"https://frictionlessdata.io/hackathon/",target:"_blank",rel:"noopener noreferrer"}},[e._v("on the website"),o("OutboundLink")],1),e._v(" for more info.")]),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Our next meeting will be on September 30"),o("sup",[e._v("th")]),e._v(", exceptionally one hour later than usual. We will hear a presentation from Daniella Lowenberg and Cassiano Reinert Novais dos Santos on the Frictionless Data validation implemented for the Dryad application.")]),e._v(" "),o("p",[e._v("You can sign up "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/-y6njoJPMbE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v("As usual, you can join us on "),o("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),o("OutboundLink")],1),e._v(" or "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[142],{670:function(e,t,o){"use strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("On our last Frictionless Data community call on August 26"),o("sup",[e._v("th")]),e._v(" we had Amber York and Adam Shepherd from BCO-DMO giving a presentation on Frictionless Data Pipelines for Ocean Science.")]),e._v(" "),o("p",[e._v("BCO-DMO is a biological and chemical oceanography data management office, working with scientists to make sure that their data is publicly available and archived for everyone else to use.")]),e._v(" "),o("p",[e._v("BCO-DMO processes around 500 datasets a year, with all sorts of variability. 
In the beginning the staff was writing ad hoc scripts and software to process that data, but that quickly became a challenge, as the catalogue continued to grow in both size and the variety of data types it curates.")]),e._v(" "),o("p",[e._v("Having worked for several years with Frictionless Data, BCO-DMO identified the Data Package Pipelines (DPP) project in the Frictionless toolkit as key to overcoming those challenges and achieving its data curation goals."),o("br"),e._v("\nTogether with the Frictionless Data team at Open Knowledge Foundation, BCO-DMO developed Laminar, a web application to create Frictionless Data Package Pipelines. Laminar helps data managers process data efficiently while recording the provenance of their activities to support reproducibility of results")]),e._v(" "),o("p",[e._v("You can learn more on the project "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/02/10/frictionless-data-pipelines-for-open-ocean/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". 
If you would like to dive deeper and discover all about Frictionless Data Pipelines, you can watch Amber York’s and Adam Shepherd’s presentation:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/R0U9Iwd8J00",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),o("h3",{attrs:{id:"frictionless-hackathon-on-7-8-october"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-hackathon-on-7-8-october"}},[e._v("#")]),e._v(" Frictionless Hackathon on 7-8 October!")]),e._v(" "),o("p",[e._v("Join the Frictionless Data community for a two-day virtual event to create new project prototypes based on existing Frictionless open source code. It’s going to be fun!"),o("br"),e._v("\nWe are currently accepting project submissions, so if you have a cool project in mind, using based on existing Frictionless open source code, this could be an excellent opportunity to prototype it, together with other Frictionless users from all around the world. You can pitch anything - your idea doesn’t need to be complete/fully planned. We can also help you formulate a project if you have an idea but aren’t sure about it. 
You can also submit ideas for existing projects you need help with!")]),e._v(" "),o("p",[e._v("Use "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSdd41pbfWaCYQHkQNTaf49kht1cUg7_Tg-NzqdP11pHWrD7yA/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(" to submit your project."),o("br"),e._v("\nKeep an eye "),o("a",{attrs:{href:"https://frictionlessdata.io/hackathon/",target:"_blank",rel:"noopener noreferrer"}},[e._v("on the website"),o("OutboundLink")],1),e._v(" for more info.")]),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Our next meeting will be on September 30"),o("sup",[e._v("th")]),e._v(", exceptionally one hour later than usual. We will hear a presentation from Daniella Lowenberg and Cassiano Reinert Novais dos Santos on the Frictionless Data validation implemented for the Dryad application.")]),e._v(" "),o("p",[e._v("You can sign up "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/-y6njoJPMbE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v("As usual, you can join us on "),o("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),o("OutboundLink")],1),e._v(" or "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/143.fec8edd4.js b/assets/js/143.8f069550.js similarity index 98% rename from assets/js/143.fec8edd4.js rename to assets/js/143.8f069550.js index c62cb74db..d74962505 100644 --- a/assets/js/143.fec8edd4.js +++ b/assets/js/143.8f069550.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[143],{673:function(e,t,o){"use strict";o.r(t);var r=o(29),n=Object(r.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("The Frictionless Data Online Hackathon is fast approaching and we just can’t wait for it to start!")]),e._v(" "),o("p",[e._v("If you are not sure yet whether to participate or not, bear in mind that it will be a great opportunity to test some of the newest Frictionless tools, like Livemark, Repository, play around with Frictionless-py and other new Frictionless code. 
It will also be a great chance for you to meet other Frictionless users and contributors from all around the world and build a project prototype together.")]),e._v(" "),o("p",[o("strong",[e._v("Not convinced yet? Go and explore the proposed projects on the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/event/1#top",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dashboard"),o("OutboundLink")],1),e._v("! You will see, there is a project for every taste, so surely there must be one that sounds right for you!")])]),e._v(" "),o("p",[e._v("Are you a big fan of geodata? In that case you will probably want to join the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/9",target:"_blank",rel:"noopener noreferrer"}},[e._v("frictionless-geojson team"),o("OutboundLink")],1),e._v(", who is planning to create a frictionless-py plugin to add support for reading, writing and inlining geojson. If you are a devoted CKAN user who would like to see more Frictionless functionalities in it, you may decide to join the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/8",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data package manager for CKAN project"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("In case you read our "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/06/22/livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("blog about Livemark"),o("OutboundLink")],1),e._v(" and have been intrigued by this new Frictionless tool ever since, your moment has come! 
You can finally try it out by joining the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/11",target:"_blank",rel:"noopener noreferrer"}},[e._v("Citation Context Reports"),o("OutboundLink")],1),e._v(", the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/3",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dataset List"),o("OutboundLink")],1),e._v(" project, or the "),o("a",{attrs:{href:"http://frictionless-hackathon.herokuapp.com/project/12",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Community Insights"),o("OutboundLink")],1),e._v(" project. If you are interested in datasets discoverability and linkage, you may want to join the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/10",target:"_blank",rel:"noopener noreferrer"}},[e._v("Things not Datasets"),o("OutboundLink")],1),e._v(" team.")]),e._v(" "),o("p",[e._v("Oh, and please let us know in advance if you are a big bugs smasher! You will be a coveted participant for all projects and we need to make sure everybody gets a fair share of your skills, including us in our effort to improve the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/4",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Python Framework"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("But enough of describing the projects, instead hear about them directly from the people who proposed them:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/kO0YflqQQzI",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v("Hurry up to register for the hackathon if you haven’t done so yet, you can do it only until the end of this week via "),o("a",{attrs:{href:"https://forms.gle/Xr4gcnQnhShMJrWeA",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("this form"),o("OutboundLink")],1)]),e._v(" "),o("p",[e._v("More information on the Frictionless Data Hackathon is available on the "),o("a",{attrs:{href:"https://frictionlessdata.io/hackathon/",target:"_blank",rel:"noopener noreferrer"}},[e._v("dedicated webpage"),o("OutboundLink")],1),e._v(". You can also follow news on the day itself through "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(": #FrictionlessHackathon and #FrictionlessHack2021.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[143],{675:function(e,t,o){"use strict";o.r(t);var r=o(29),n=Object(r.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("The Frictionless Data Online Hackathon is fast approaching and we just can’t wait for it to start!")]),e._v(" "),o("p",[e._v("If you are not sure yet whether to participate or not, bear in mind that it will be a great opportunity to test some of the newest Frictionless tools, like Livemark, Repository, play around with Frictionless-py and other new Frictionless code. It will also be a great chance for you to meet other Frictionless users and contributors from all around the world and build a project prototype together.")]),e._v(" "),o("p",[o("strong",[e._v("Not convinced yet? Go and explore the proposed projects on the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/event/1#top",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dashboard"),o("OutboundLink")],1),e._v("! You will see, there is a project for every taste, so surely there must be one that sounds right for you!")])]),e._v(" "),o("p",[e._v("Are you a big fan of geodata? 
In that case you will probably want to join the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/9",target:"_blank",rel:"noopener noreferrer"}},[e._v("frictionless-geojson team"),o("OutboundLink")],1),e._v(", who is planning to create a frictionless-py plugin to add support for reading, writing and inlining geojson. If you are a devoted CKAN user who would like to see more Frictionless functionalities in it, you may decide to join the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/8",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data package manager for CKAN project"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("In case you read our "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/06/22/livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("blog about Livemark"),o("OutboundLink")],1),e._v(" and have been intrigued by this new Frictionless tool ever since, your moment has come! You can finally try it out by joining the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/11",target:"_blank",rel:"noopener noreferrer"}},[e._v("Citation Context Reports"),o("OutboundLink")],1),e._v(", the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/3",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dataset List"),o("OutboundLink")],1),e._v(" project, or the "),o("a",{attrs:{href:"http://frictionless-hackathon.herokuapp.com/project/12",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Community Insights"),o("OutboundLink")],1),e._v(" project. If you are interested in datasets discoverability and linkage, you may want to join the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/10",target:"_blank",rel:"noopener noreferrer"}},[e._v("Things not Datasets"),o("OutboundLink")],1),e._v(" team.")]),e._v(" "),o("p",[e._v("Oh, and please let us know in advance if you are a big bugs smasher! 
You will be a coveted participant for all projects and we need to make sure everybody gets a fair share of your skills, including us in our effort to improve the "),o("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/project/4",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Python Framework"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("But enough of describing the projects, instead hear about them directly from the people who proposed them:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/kO0YflqQQzI",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v("Hurry up to register for the hackathon if you haven’t done so yet, you can do it only until the end of this week via "),o("a",{attrs:{href:"https://forms.gle/Xr4gcnQnhShMJrWeA",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1)]),e._v(" "),o("p",[e._v("More information on the Frictionless Data Hackathon is available on the "),o("a",{attrs:{href:"https://frictionlessdata.io/hackathon/",target:"_blank",rel:"noopener noreferrer"}},[e._v("dedicated webpage"),o("OutboundLink")],1),e._v(". 
You can also follow news on the day itself through "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(": #FrictionlessHackathon and #FrictionlessHack2021.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/144.ec36c430.js b/assets/js/144.b22aad7a.js similarity index 98% rename from assets/js/144.ec36c430.js rename to assets/js/144.b22aad7a.js index f6f031f94..21d1546c4 100644 --- a/assets/js/144.ec36c430.js +++ b/assets/js/144.b22aad7a.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[144],{676:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On our last Frictionless Data community call on September 30"),a("sup",[e._v("th")]),e._v(" we had Daniella Lowenberg from Dryad and developer Cassiano Reinert Novais dos Santos giving a presentation on the Frictionless Data integration into Dryad.")]),e._v(" "),a("p",[e._v("Dryad is a community-led repository that makes research data discoverable, freely reusable, and citable. To ensure the quality of the submitted data, Dryad needs to curate it. It therefore made total sense to integrate the Frictionless Data validation functionality to its uploading page.")]),e._v(" "),a("p",[e._v("A pilot was started at the beginning of 2021 to add an automatic tabular data validation check to all uploaded files under 25MB, and it went live in June 2021. Since then, more than 11000 research data files have been validated, and around 1000 failed the validation test. 
98,4% of the researchers whose files failed, managed to fix their errors easily and resubmit their data.")]),e._v(" "),a("p",[e._v("All the code of the Frictionless Data integration is open source and lives in the "),a("a",{attrs:{href:"https://github.com/orgs/CDL-Dryad/repositories",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dryad GitHub repository"),a("OutboundLink")],1),e._v(", so go and have a look if you want and please let us know if you have any feedback.")]),e._v(" "),a("p",[e._v("You can learn more on the project "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/08/09/dryad-pilot/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". If you would like to dive deeper and discover all about the Frictionless Data validation functionality integrated into Dryad, you can watch Daniella Lowenberg’s and Cassiano Reinert Novais dos Santos’ presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/IHVUjWGh2oY",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("h3",{attrs:{id:"frictionless-hackathon-on-7-8-october"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-hackathon-on-7-8-october"}},[e._v("#")]),e._v(" Frictionless Hackathon on 7-8 October!")]),e._v(" "),a("p",[e._v("Join the Frictionless Data community for a two-day virtual event to create new project prototypes based on existing Frictionless open source code. 
It’s going to be fun!"),a("br"),e._v("\nGo and explore the dashboard to know more about all the projects we plan to work on."),a("br"),e._v("\nFor general information, just go to the "),a("a",{attrs:{href:"https://frictionlessdata.io/hackathon/",target:"_blank",rel:"noopener noreferrer"}},[e._v("dedicated page"),a("OutboundLink")],1),e._v("."),a("br"),e._v("\nWe are accepting last minute registrations "),a("a",{attrs:{href:"https://forms.gle/ZhrVfSBrNy2UPRZc9",target:"_blank",rel:"noopener noreferrer"}},[e._v("via this form"),a("OutboundLink")],1),e._v(", so hurry up if you want to be on board!")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Our next meeting will be on October 28"),a("sup",[e._v("th")]),e._v(". We will hear a presentation from Michael Amadi on Open Data Blend datasets powered by Frictionless Data.")]),e._v(" "),a("p",[e._v("Ahead of our next call, you can learn more about Open Data Blend "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/07/12/open-data-blend/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/V3SJcq_XYIA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[144],{674:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On our last Frictionless Data community call on September 30"),a("sup",[e._v("th")]),e._v(" we had Daniella Lowenberg from Dryad and developer Cassiano Reinert Novais dos Santos giving a presentation on the Frictionless Data integration into Dryad.")]),e._v(" "),a("p",[e._v("Dryad is a community-led repository that makes research data discoverable, freely reusable, and citable. To ensure the quality of the submitted data, Dryad needs to curate it. 
It therefore made total sense to integrate the Frictionless Data validation functionality to its uploading page.")]),e._v(" "),a("p",[e._v("A pilot was started at the beginning of 2021 to add an automatic tabular data validation check to all uploaded files under 25MB, and it went live in June 2021. Since then, more than 11000 research data files have been validated, and around 1000 failed the validation test. 98,4% of the researchers whose files failed, managed to fix their errors easily and resubmit their data.")]),e._v(" "),a("p",[e._v("All the code of the Frictionless Data integration is open source and lives in the "),a("a",{attrs:{href:"https://github.com/orgs/CDL-Dryad/repositories",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dryad GitHub repository"),a("OutboundLink")],1),e._v(", so go and have a look if you want and please let us know if you have any feedback.")]),e._v(" "),a("p",[e._v("You can learn more on the project "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/08/09/dryad-pilot/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". 
If you would like to dive deeper and discover all about the Frictionless Data validation functionality integrated into Dryad, you can watch Daniella Lowenberg’s and Cassiano Reinert Novais dos Santos’ presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/IHVUjWGh2oY",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("h3",{attrs:{id:"frictionless-hackathon-on-7-8-october"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-hackathon-on-7-8-october"}},[e._v("#")]),e._v(" Frictionless Hackathon on 7-8 October!")]),e._v(" "),a("p",[e._v("Join the Frictionless Data community for a two-day virtual event to create new project prototypes based on existing Frictionless open source code. It’s going to be fun!"),a("br"),e._v("\nGo and explore the dashboard to know more about all the projects we plan to work on."),a("br"),e._v("\nFor general information, just go to the "),a("a",{attrs:{href:"https://frictionlessdata.io/hackathon/",target:"_blank",rel:"noopener noreferrer"}},[e._v("dedicated page"),a("OutboundLink")],1),e._v("."),a("br"),e._v("\nWe are accepting last minute registrations "),a("a",{attrs:{href:"https://forms.gle/ZhrVfSBrNy2UPRZc9",target:"_blank",rel:"noopener noreferrer"}},[e._v("via this form"),a("OutboundLink")],1),e._v(", so hurry up if you want to be on board!")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Our next meeting will be on October 28"),a("sup",[e._v("th")]),e._v(". 
We will hear a presentation from Michael Amadi on Open Data Blend datasets powered by Frictionless Data.")]),e._v(" "),a("p",[e._v("Ahead of our next call, you can learn more about Open Data Blend "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/07/12/open-data-blend/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/V3SJcq_XYIA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/145.2bef085b.js b/assets/js/145.60921a8f.js similarity index 99% rename from assets/js/145.2bef085b.js rename to assets/js/145.60921a8f.js index 70922607f..614032a7e 100644 --- a/assets/js/145.2bef085b.js +++ b/assets/js/145.60921a8f.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[145],{677:function(t,e,a){"use strict";a.r(e);var o=a(29),r=Object(o.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("p",[t._v("The first (of many we hope!) Frictionless Data Hackathon is over, and it was great! Many thanks to all who helped make it such a success the past week.")]),t._v(" "),a("p",[t._v("The prize for the best project, voted by the participants, went to the DPCKAN team. Well done André, Andrés, Carolina, Daniel, Francisco and Gabriel!"),a("br"),t._v(" "),a("em",[t._v("”I feel pretty happy after this frictionless hackathon experience. We’ve grown in 2 days more than it could have been possible in one month. The knowledge and experience exchange was remarkable.”")]),t._v(", said the winning team.")]),t._v(" "),a("p",[t._v("It was also great to see participants who had never taken part in a hackathon before being enthusiastic about it. "),a("em",[t._v("”I loved the helpfulness of the community members, as well as the diversity of participants.”")])]),t._v(" "),a("p",[a("em",[t._v("“It was such a great opportunity to network with other people interested in data quality and open data!”")])]),t._v(" "),a("p",[a("em",[t._v("”It was amazing to see a weightless tool used in development. I want to learn more about it and integrate it into my projects.”")])]),t._v(" "),a("p",[t._v("Over 20 people signed up for the hackathon from Africa, Asia, Europe, South America and North America. We had a very diverse audience and saw a lot of new faces. 
The event ran from 7th to 8th October on our Discord server. The result of those 2 days of intense collaboration were four great projects:")]),t._v(" "),a("h2",{attrs:{id:"dpckan"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dpckan"}},[t._v("#")]),t._v(" DPCKAN")]),t._v(" "),a("p",[t._v("The DPCKAN project was proposed by a team working on the data portal of the state of Minas Gerais in Brazil. To ensure quality metadata and automate the publishing process, the team decided to develop a tool that would allow publishing and updating datasets described with Frictionless Standards in a CKAN instance.")]),t._v(" "),a("p",[t._v("The main objectives for the hackathon were to refine the package update functions and clean up the documentation.")]),t._v(" "),a("p",[t._v("You can check out the project’s "),a("a",{attrs:{href:"https://github.com/dados-mg/dpckan",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub repository"),a("OutboundLink")],1),t._v(" to see the improvements that were made during the hackathon.")]),t._v(" "),a("h2",{attrs:{id:"frictionless-tutorials"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-tutorials"}},[t._v("#")]),t._v(" Frictionless Tutorials")]),t._v(" "),a("p",[t._v("The main objective of this project was to write new tutorials using the Python Frictionless Framework. 
The team not only created a tutorial, but also wrote "),a("a",{attrs:{href:"https://docs.google.com/document/d/1zbWMmIeU8DUwzGaEih0JGJ-DMGug5-2UksRN1x4fvj8/edit?usp=sharing",target:"_blank",rel:"noopener noreferrer"}},[t._v("more detailed instructions"),a("OutboundLink")],1),t._v(" on how to create new tutorials for future contributors.")]),t._v(" "),a("p",[t._v("You can have a look at the tutorial written during the hackathon "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1tTtynfnExykcTYon1j6Y8OgzQZEXpQvP?usp=sharing",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("h2",{attrs:{id:"covid-tracker"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#covid-tracker"}},[t._v("#")]),t._v(" Covid tracker")]),t._v(" "),a("p",[t._v("The main objective of this project was to test Livemark, one of the newest Frictionless tools, with real data and provide an example of all its functionalities. Besides the charts and tables, the information is available on an interactive map, which also takes into account the accuracy of the official data.")]),t._v(" "),a("p",[t._v("You can have a look at the Covid Tracker "),a("a",{attrs:{href:"https://covid-tracker.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("h2",{attrs:{id:"frictionless-community-insight"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-community-insight"}},[t._v("#")]),t._v(" Frictionless Community Insight")]),t._v(" "),a("p",[t._v("The objective of this project, proposed by the Frictionless core team, was to build a "),a("a",{attrs:{href:"https://livemark.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Livemark"),a("OutboundLink")],1),t._v(" website telling a story about the Frictionless Data community using the data from the community survey we ran in September.")]),t._v(" "),a("p",[t._v("The main goals for the hackathon were to 
clean the data from the survey, visualise it and display it as a story on the Livemark website.")]),t._v(" "),a("p",[t._v("You can have a look at the "),a("a",{attrs:{href:"https://community-insights.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("draft website"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("Four other great projects started the hackathon but did not finish it:")]),t._v(" "),a("p",[a("strong",[t._v("Dataset List")]),t._v(", another Livemark project to list all the datapackages on GitHub, "),a("strong",[t._v("Frictionless Geojson")]),t._v(", an extension to add GeoJSON read and write support in frictionless-py, "),a("strong",[t._v("Improve Frictionless Data Python Framework")]),t._v(", a project to get familiar with the codebase, and "),a("strong",[t._v("Citation Context Reports")]),t._v(", a project to create Frictionless data schemas for scholarly citations data.")]),t._v(" "),a("p",[t._v("Interestingly, one of the participants started off his own project during the hackathon, building a Discord matrix bridge to allow Frictionless users and contributors to join the community Discord chat using an Open standard. Even if the Matrix did not participate in the voting, it still is a notable project. If you are interested in knowing more about it you can have a look at "),a("a",{attrs:{href:"https://github.com/frictionlessdata/project/issues/698",target:"_blank",rel:"noopener noreferrer"}},[t._v("this GitHub issue"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("On the last day of the hackathon, one hour before the end of the event, the teams pitched their projects. 
Here’s a recording of the event if you missed it and want to have a look:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/PKRKldaUB5U",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),t._v(" "),a("p",[t._v("Thanks again to all those who took part in the hackathon and contributed with their time and enthusiasm to make it so great. We can’t wait for the next hack already!")])])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[145],{676:function(t,e,a){"use strict";a.r(e);var o=a(29),r=Object(o.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("p",[t._v("The first (of many we hope!) Frictionless Data Hackathon is over, and it was great! Many thanks to all who helped make it such a success the past week.")]),t._v(" "),a("p",[t._v("The prize for the best project, voted by the participants, went to the DPCKAN team. Well done André, Andrés, Carolina, Daniel, Francisco and Gabriel!"),a("br"),t._v(" "),a("em",[t._v("”I feel pretty happy after this frictionless hackathon experience. We’ve grown in 2 days more than it could have been possible in one month. The knowledge and experience exchange was remarkable.”")]),t._v(", said the winning team.")]),t._v(" "),a("p",[t._v("It was also great to see participants who had never taken part in a hackathon before being enthusiastic about it. "),a("em",[t._v("”I loved the helpfulness of the community members, as well as the diversity of participants.”")])]),t._v(" "),a("p",[a("em",[t._v("“It was such a great opportunity to network with other people interested in data quality and open data!”")])]),t._v(" "),a("p",[a("em",[t._v("”It was amazing to see a weightless tool used in development. 
I want to learn more about it and integrate it into my projects.”")])]),t._v(" "),a("p",[t._v("Over 20 people signed up for the hackathon from Africa, Asia, Europe, South America and North America. We had a very diverse audience and saw a lot of new faces. The event ran from 7th to 8th October on our Discord server. The result of those 2 days of intense collaboration were four great projects:")]),t._v(" "),a("h2",{attrs:{id:"dpckan"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dpckan"}},[t._v("#")]),t._v(" DPCKAN")]),t._v(" "),a("p",[t._v("The DPCKAN project was proposed by a team working on the data portal of the state of Minas Gerais in Brazil. To ensure quality metadata and automate the publishing process, the team decided to develop a tool that would allow publishing and updating datasets described with Frictionless Standards in a CKAN instance.")]),t._v(" "),a("p",[t._v("The main objectives for the hackathon were to refine the package update functions and clean up the documentation.")]),t._v(" "),a("p",[t._v("You can check out the project’s "),a("a",{attrs:{href:"https://github.com/dados-mg/dpckan",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub repository"),a("OutboundLink")],1),t._v(" to see the improvements that were made during the hackathon.")]),t._v(" "),a("h2",{attrs:{id:"frictionless-tutorials"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-tutorials"}},[t._v("#")]),t._v(" Frictionless Tutorials")]),t._v(" "),a("p",[t._v("The main objective of this project was to write new tutorials using the Python Frictionless Framework. 
The team not only created a tutorial, but also wrote "),a("a",{attrs:{href:"https://docs.google.com/document/d/1zbWMmIeU8DUwzGaEih0JGJ-DMGug5-2UksRN1x4fvj8/edit?usp=sharing",target:"_blank",rel:"noopener noreferrer"}},[t._v("more detailed instructions"),a("OutboundLink")],1),t._v(" on how to create new tutorials for future contributors.")]),t._v(" "),a("p",[t._v("You can have a look at the tutorial written during the hackathon "),a("a",{attrs:{href:"https://colab.research.google.com/drive/1tTtynfnExykcTYon1j6Y8OgzQZEXpQvP?usp=sharing",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("h2",{attrs:{id:"covid-tracker"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#covid-tracker"}},[t._v("#")]),t._v(" Covid tracker")]),t._v(" "),a("p",[t._v("The main objective of this project was to test Livemark, one of the newest Frictionless tools, with real data and provide an example of all its functionalities. Besides the charts and tables, the information is available on an interactive map, which also takes into account the accuracy of the official data.")]),t._v(" "),a("p",[t._v("You can have a look at the Covid Tracker "),a("a",{attrs:{href:"https://covid-tracker.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("h2",{attrs:{id:"frictionless-community-insight"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-community-insight"}},[t._v("#")]),t._v(" Frictionless Community Insight")]),t._v(" "),a("p",[t._v("The objective of this project, proposed by the Frictionless core team, was to build a "),a("a",{attrs:{href:"https://livemark.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Livemark"),a("OutboundLink")],1),t._v(" website telling a story about the Frictionless Data community using the data from the community survey we ran in September.")]),t._v(" "),a("p",[t._v("The main goals for the hackathon were to 
clean the data from the survey, visualise it and display it as a story on the Livemark website.")]),t._v(" "),a("p",[t._v("You can have a look at the "),a("a",{attrs:{href:"https://community-insights.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("draft website"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("Four other great projects started the hackathon but did not finish it:")]),t._v(" "),a("p",[a("strong",[t._v("Dataset List")]),t._v(", another Livemark project to list all the datapackages on GitHub, "),a("strong",[t._v("Frictionless Geojson")]),t._v(", an extension to add GeoJSON read and write support in frictionless-py, "),a("strong",[t._v("Improve Frictionless Data Python Framework")]),t._v(", a project to get familiar with the codebase, and "),a("strong",[t._v("Citation Context Reports")]),t._v(", a project to create Frictionless data schemas for scholarly citations data.")]),t._v(" "),a("p",[t._v("Interestingly, one of the participants started off his own project during the hackathon, building a Discord matrix bridge to allow Frictionless users and contributors to join the community Discord chat using an Open standard. Even if the Matrix did not participate in the voting, it still is a notable project. If you are interested in knowing more about it you can have a look at "),a("a",{attrs:{href:"https://github.com/frictionlessdata/project/issues/698",target:"_blank",rel:"noopener noreferrer"}},[t._v("this GitHub issue"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("On the last day of the hackathon, one hour before the end of the event, the teams pitched their projects. 
Here’s a recording of the event if you missed it and want to have a look:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/PKRKldaUB5U",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),t._v(" "),a("p",[t._v("Thanks again to all those who took part in the hackathon and contributed with their time and enthusiasm to make it so great. We can’t wait for the next hack already!")])])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/146.e3253dea.js b/assets/js/146.35d328f6.js similarity index 98% rename from assets/js/146.e3253dea.js rename to assets/js/146.35d328f6.js index 5e1c99489..b41c6f5c8 100644 --- a/assets/js/146.e3253dea.js +++ b/assets/js/146.35d328f6.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[146],{678:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On our last Frictionless Data community call on October 28"),a("sup",[e._v("th")]),e._v(" we had Michael Amadi from Nimble Learn giving a presentation on Open Data Blend and their Frictionless Data journey.")]),e._v(" "),a("p",[e._v("Open Data Blend is a set of open data services that aim to make large and complex UK open data easier to analyse. The Open Data Blend datasets have two interfaces: a UI and an API, both powered by Frictionless Data. 
The datasets themselves are built on top of three Frictionless Data specifications: data package, data resource and table schema; and they incorporate some Frictionless Data patterns.")]),e._v(" "),a("p",[e._v("The project addresses some of the main open data challenges:")]),e._v(" "),a("ul",[a("li",[e._v("Large data volumes that are difficult to manage due to their size")]),e._v(" "),a("li",[e._v("Overwhelming complexity in data analysis")]),e._v(" "),a("li",[e._v("Open data shared in sub-optimal file formats for data analysis (e.g. PDFs)")]),e._v(" "),a("li",[e._v("When companies and organisation aggregate data, refine it and add value to it, they often don’t openly share the cleaned data")])]),e._v(" "),a("p",[e._v("You can learn more on the project "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/07/12/open-data-blend/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". If you would like to dive deeper and discover all about how Open Data Blend uses the Frictionless Data toolkit, you can watch Michael Amadi’s presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/avAXe3SUEKI",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h1",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("ul",[a("li",[e._v("Senior developer Evgeny Karev presented Livemark at PyData on October 29"),a("sup",[e._v("th")]),e._v(". 
If you missed it and want to have a look, check out the recording "),a("a",{attrs:{href:"https://zoom.us/rec/play/yyFTEAW3_v4cPGUNbiHS95-vlgICgNYeVdK_N9VHOdHxLDoKbTE9EZvbVpZMjIV8-WAr3qmZ9vZPoVsU.QXvKRI1hOrCwv8Lg?startTime=1635487241000&_x_zm_rtaid=iuuaYWHFSEec21FRLG7Cig.1635861744121.d2b5a7e329a988e4ea49b64e3d6e66b6&_x_zm_rhtaid=460",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(" (for Livemark jump at 1:03:03).")]),e._v(" "),a("li",[e._v("The third cohort of Frictionless Fellows has officially kicked off mid-October. You will get to meet them next year during one of our community calls. Meanwhile, stay tuned to know more about them!")]),e._v(" "),a("li",[e._v("We don’t have any presentation planned for the December community call yet. Would you like to present something? Drop us a line to let us know!")])]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is one week earlier than usual (to avoid conflict with American Thanksgiving), on November 18"),a("sup",[e._v("th")]),e._v(". We will hear a presentation from Peter Desmet on Frictionless Data exchange format for camera trapping data.")]),e._v(" "),a("p",[e._v("You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/MFffZRM8qjs",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[146],{677:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On our last Frictionless Data community call on October 28"),a("sup",[e._v("th")]),e._v(" we had Michael Amadi from Nimble Learn giving a presentation on Open Data Blend and their Frictionless Data journey.")]),e._v(" "),a("p",[e._v("Open Data Blend is a set of open data services that aim to make large and complex UK open data easier to analyse. The Open Data Blend datasets have two interfaces: a UI and an API, both powered by Frictionless Data. 
The datasets themselves are built on top of three Frictionless Data specifications: data package, data resource and table schema; and they incorporate some Frictionless Data patterns.")]),e._v(" "),a("p",[e._v("The project addresses some of the main open data challenges:")]),e._v(" "),a("ul",[a("li",[e._v("Large data volumes that are difficult to manage due to their size")]),e._v(" "),a("li",[e._v("Overwhelming complexity in data analysis")]),e._v(" "),a("li",[e._v("Open data shared in sub-optimal file formats for data analysis (e.g. PDFs)")]),e._v(" "),a("li",[e._v("When companies and organisation aggregate data, refine it and add value to it, they often don’t openly share the cleaned data")])]),e._v(" "),a("p",[e._v("You can learn more on the project "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/07/12/open-data-blend/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". If you would like to dive deeper and discover all about how Open Data Blend uses the Frictionless Data toolkit, you can watch Michael Amadi’s presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/avAXe3SUEKI",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h1",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("ul",[a("li",[e._v("Senior developer Evgeny Karev presented Livemark at PyData on October 29"),a("sup",[e._v("th")]),e._v(". 
If you missed it and want to have a look, check out the recording "),a("a",{attrs:{href:"https://zoom.us/rec/play/yyFTEAW3_v4cPGUNbiHS95-vlgICgNYeVdK_N9VHOdHxLDoKbTE9EZvbVpZMjIV8-WAr3qmZ9vZPoVsU.QXvKRI1hOrCwv8Lg?startTime=1635487241000&_x_zm_rtaid=iuuaYWHFSEec21FRLG7Cig.1635861744121.d2b5a7e329a988e4ea49b64e3d6e66b6&_x_zm_rhtaid=460",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(" (for Livemark jump at 1:03:03).")]),e._v(" "),a("li",[e._v("The third cohort of Frictionless Fellows has officially kicked off mid-October. You will get to meet them next year during one of our community calls. Meanwhile, stay tuned to know more about them!")]),e._v(" "),a("li",[e._v("We don’t have any presentation planned for the December community call yet. Would you like to present something? Drop us a line to let us know!")])]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is one week earlier than usual (to avoid conflict with American Thanksgiving), on November 18"),a("sup",[e._v("th")]),e._v(". We will hear a presentation from Peter Desmet on Frictionless Data exchange format for camera trapping data.")]),e._v(" "),a("p",[e._v("You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/MFffZRM8qjs",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/147.e57196d8.js b/assets/js/147.8f111805.js similarity index 98% rename from assets/js/147.e57196d8.js rename to assets/js/147.8f111805.js index c0620d8cd..270ca2118 100644 --- a/assets/js/147.e57196d8.js +++ b/assets/js/147.8f111805.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[147],{680:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On our last Frictionless Data community call on November 18"),a("sup",[e._v("th")]),e._v(" we had Peter Desmet from the Research Institute for Nature and Forest (INBO) giving a presentation on Frictionless Data exchange format for camera trapping data.")]),e._v(" "),a("p",[e._v("Camera trapping is a non-invasive wildlife monitoring technique generating more and more data in the last few years. 
Darwin Core, a well established standard in the biodiversity field, does not capture the full scope of camera trapping data (e.g. it does not express your camera setup) and it is therefore not ideal. To tackle this problem, the camera trapped data package was developed, using Frictionless Data standards. The camera trapped data package is both a "),a("strong",[e._v("model")]),e._v(" and a "),a("strong",[e._v("format")]),e._v(" to exchange camera trapping data, and it is designed to capture all the essential data and metadata of camera trap studies.")]),e._v(" "),a("p",[e._v("The camera trap data package model includes:")]),e._v(" "),a("ul",[a("li",[e._v("Metadata about the project")]),e._v(" "),a("li",[e._v("Deployments info about the location, the camera and the time")]),e._v(" "),a("li",[e._v("Media including the file url, the timestamp and if it is a sequence")]),e._v(" "),a("li",[e._v("Observation about the file (Is it blank? What kind of animal can we see? etc…)")])]),e._v(" "),a("p",[e._v("The format is similar to a Frictionless Data data package. 
It includes: "),a("strong",[e._v("metadata")]),e._v(" about the project and the data package structure, "),a("strong",[e._v("csv files")]),e._v(" for the deployments, the media captured in the deployments, and the observations in those media.")]),e._v(" "),a("p",[e._v("If you would like to dive deeper and discover all about the Frictionless Data exchange format for camera trapping data, you can watch Peter Desmet’s presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Pi_kbQ_KYiM",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("You can also find Peter’s presentation deck "),a("a",{attrs:{href:"https://speakerdeck.com/peterdesmet/camtrap-dp-using-frictionless-standards-for-a-camera-trapping-data-exchange-format",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("p",[e._v("We are part of the organisation of the FOSDEM DevRoom Open Research Tools & Technologies this year too. We would love to have someone from the Frictionless community giving a talk. If you are interested please let us know! We are very happy to help you structure your idea, if needed. Calls for participation will be issued soon. 
Keep an eye on "),a("a",{attrs:{href:"https://fosdem.org/2022/news/2021-11-02-devroom-cfp/",target:"_blank",rel:"noopener noreferrer"}},[e._v("this page"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is one week earlier than usual, on December 16"),a("sup",[e._v("th")]),e._v(", because of the Winter holidays. Keith Hughitt is going to present some ideas around representing data processing flows as a DAG inside of a datapackage.json, and tools for interacting with and visualizing such DAGs.")]),e._v(" "),a("p",[e._v("You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/DQ4hpARBVSE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[147],{678:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On our last Frictionless Data community call on November 18"),a("sup",[e._v("th")]),e._v(" we had Peter Desmet from the Research Institute for Nature and Forest (INBO) giving a presentation on Frictionless Data exchange format for camera trapping data.")]),e._v(" "),a("p",[e._v("Camera trapping is a non-invasive wildlife monitoring technique generating more and more data in the last few years. Darwin Core, a well established standard in the biodiversity field, does not capture the full scope of camera trapping data (e.g. it does not express your camera setup) and it is therefore not ideal. To tackle this problem, the camera trapped data package was developed, using Frictionless Data standards. The camera trapped data package is both a "),a("strong",[e._v("model")]),e._v(" and a "),a("strong",[e._v("format")]),e._v(" to exchange camera trapping data, and it is designed to capture all the essential data and metadata of camera trap studies.")]),e._v(" "),a("p",[e._v("The camera trap data package model includes:")]),e._v(" "),a("ul",[a("li",[e._v("Metadata about the project")]),e._v(" "),a("li",[e._v("Deployments info about the location, the camera and the time")]),e._v(" "),a("li",[e._v("Media including the file url, the timestamp and if it is a sequence")]),e._v(" "),a("li",[e._v("Observation about the file (Is it blank? What kind of animal can we see? etc…)")])]),e._v(" "),a("p",[e._v("The format is similar to a Frictionless Data data package. 
It includes: "),a("strong",[e._v("metadata")]),e._v(" about the project and the data package structure, "),a("strong",[e._v("csv files")]),e._v(" for the deployments, the media captured in the deployments, and the observations in those media.")]),e._v(" "),a("p",[e._v("If you would like to dive deeper and discover all about the Frictionless Data exchange format for camera trapping data, you can watch Peter Desmet’s presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Pi_kbQ_KYiM",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("You can also find Peter’s presentation deck "),a("a",{attrs:{href:"https://speakerdeck.com/peterdesmet/camtrap-dp-using-frictionless-standards-for-a-camera-trapping-data-exchange-format",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("p",[e._v("We are part of the organisation of the FOSDEM DevRoom Open Research Tools & Technologies this year too. We would love to have someone from the Frictionless community giving a talk. If you are interested please let us know! We are very happy to help you structure your idea, if needed. Calls for participation will be issued soon. 
Keep an eye on "),a("a",{attrs:{href:"https://fosdem.org/2022/news/2021-11-02-devroom-cfp/",target:"_blank",rel:"noopener noreferrer"}},[e._v("this page"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is one week earlier than usual, on December 16"),a("sup",[e._v("th")]),e._v(", because of the Winter holidays. Keith Hughitt is going to present some ideas around representing data processing flows as a DAG inside of a datapackage.json, and tools for interacting with and visualizing such DAGs.")]),e._v(" "),a("p",[e._v("You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/DQ4hpARBVSE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/148.f186d227.js b/assets/js/148.e091719e.js similarity index 99% rename from assets/js/148.f186d227.js rename to assets/js/148.e091719e.js index 528746cb4..8e252b588 100644 --- a/assets/js/148.f186d227.js +++ b/assets/js/148.e091719e.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[148],{679:function(e,a,t){"use strict";t.r(a);var i=t(29),r=Object(i.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("We are very excited to introduce you to the 3rd cohort of "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Reproducible Research Fellows"),t("OutboundLink")],1),e._v("! Over the coming months, this group of six early career researchers will be learning about open science, data management, and how to use Frictionless Data tooling in their work to make their data more open and their research more reusable. Keep an eye on them, as they are on their way to becoming champions of reproducibility! For now, go and read the introductory blogs they wrote about themselves to know more about them and their goals for this fellowship.")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/GQ.jpeg",width:"200px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hi, everyone! My name is Guo-Qiang Zhang")]),e._v(", and I am from China. Right after I finished my residency training in Pediatrics, I joined Prof Bright I. Nwaru’s group and started my doctoral studies at Krefting Research Centre in University of Gothenburg (Sweden). 
My doctoral project is to look at the effects of sex hormones on women’s health (especially asthma), utilizing epidemiological methods as well as evidence synthesis tools (e.g., systematic review, umbrella review).")]),e._v(" "),t("p",[e._v("In my first year of doctoral studies, I had the opportunity to participate in the course “Reproducibility in Medical Research” led by Prof Nwaru. It was my first time to hear about Open Science and research reproducibility. As a “fresh” full-time doctoral student full of passion for medical research, I felt overwhelmed by waves of frustration when I came to know the reproducibility crisis. After spending some time with my frustration, I came to realize that in fact I can do something. In my first project, my colleagues and I conducted an umbrella review on a highly controversial topic on the impact of menopausal hormone therapy on women’s health. We put extensive efforts into making the review process as transparent as possible: we developed beforehand protocols for data extraction and statistical analysis, documented key steps of the review process, verified data in the published literature, and made all datasets and R scripts publicly available."),t("br"),e._v("\nTo keep on reading about Guo-Qiang click "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-guo-qiang/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Victoria.jpeg",width:"140px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hi all! My name is Victoria")]),e._v(". I’m a physics graduate student and recovering engineer living in Berlin. 
I grew up mainly in my family’s native country of Singapore, but consider myself an American, and am still workshopping a straightforward answer to “where are you from.”")]),e._v(" "),t("p",[e._v("In my past life I worked in materials QA testing; currently, I’m at the German Aerospace Centre designing laser systems in the THz range - a type of non-visible light that hangs out on the electromagnetic spectrum between infrared and microwave.")]),e._v(" "),t("p",[e._v("My Open Science journey has just begun and I’m stoked! I started to get interested in topics around data transparency and accessibility after a series of escalating frustrations with information dynamics in medical technology, beginning in my own field of gas sensing, then discovering similar disparities in tangential fields."),t("br"),e._v("\nRead more about Victoria "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-victoria/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Zarena.jpeg",width:"140px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hello everybody! My name is Zarena")]),e._v(". I grew up in the Kyrgyz Republic, yet spent half of my life studying and working abroad. Currently, I am a Research Assistant for the project Creating Culturally Appropriate Research Ethics in Central Asia (CARE) at Nazarbayev University in Kazakhstan. I am also a Mad activist and an interdisciplinary human rights researcher. I like to consider my research activities going beyond academia to encompass and make an effect on broader socio-political structures.")]),e._v(" "),t("p",[e._v("Although I believe that life would not progress without frictions, when it comes to science and research, I feel, ‘frictions’ - manifested in a form of paywalls, bureaucratic and corporate management, or other structural barriers - should be deconstructed. 
So, I am joining the Frictionless Data Fellowship Programme with the purpose to learn more about open and FAIR research."),t("br"),e._v("\nLearn more about Zarena "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-zarena/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Melvin.jpeg",width:"120px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hi everyone, my name is Melvin Ochieng")]),e._v(", and I’m a pathologist and up-coming soil scientist. I was born in Kenya in a town called Eldoret that is famous for producing Kenyan marathon champions. I was raised in Kenya in my early childhood days and in Tanzania afterwards. I like to consider myself both Kenyan and Tanzanian at heart because the two countries took part in molding the person I am today. I am currently a masters student at University of Mohammed VI polytechnic in Morocco, studying fertilizer science and technology. Over the past two years, my research focused on potato cyst nematode (PCN) which is a quarantine pest that had been reported in Kenya in 2015.")]),e._v(" "),t("p",[e._v("I’m excited to start this journey as a Frictionless Data fellow with my fellows for this cohort. I just recently found out about open science and I couldn’t be more excited to learn more about this concept and how it will influence me as a researcher. Advancement in technology has opened up the world in so many ways and made possible extensive networks for collaborations globally. Notably, the problems the world is facing today require a global/collaborative approach to solve. 
Therefore, reproducible research is of key importance in promoting this collaboration."),t("br"),e._v("\nTo know more about Melvin click "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-melvin/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Kevin-Photo.jpeg",width:"140px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hello! My name is Kevin Kidambasi")]),e._v("(KK). I was born and raised in Vihiga County of western Kenya. Currently, I live in Nairobi, the capital city of Kenya. I am a master’s student in Jomo Kenyatta University of Agriculture and Technology (JKUAT) registered at the department of Biochemistry. My MSc research at the International Centre of Insect Physiology and Ecology (icipe) focuses on the role of haematophagous camel-specific biting keds (Hippobosca camelina) in disease transmission in Laisamis, Marsabit County of northern Kenya. My broad research interest focuses on studying host-pathogen interactions to understand infection mechanisms of diseases in order to discover novel control and treatment targets.")]),e._v(" "),t("p",[e._v("I am interested in improving research reproducibility because it allows other researchers to confirm the accuracy of my data and correct any bias as well as validate the relevance of the conclusions drawn from the results. This also allows data to be analyzed in different ways and thus, give new insights and lead the research in new directions. In addition, improving research reproducibility would allow the scientific community to understand how the conclusions of a study were made and pinpoint out any mistakes in data analyses. In general, research reproducibility enhances openness, research collaboration, and data accessibility which in turn increase public trust in science and hence permits their participation and support for research. 
This enables public understanding of how research is conducted and its importance."),t("br"),e._v("\nRead more about Kevin "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-kevin/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Lindsay.jpeg",width:"120px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Greetings! My name is Lindsay Gypin")]),e._v(", she/her. I grew up in Denver, Colorado and began my career as a K-12 educator. I taught high school English and worked as a school librarian before becoming disillusioned with the politicization of public education and determining my skills were better suited for work in public libraries. Attending library school after having worked in libraries for so many years, I found myself drawn to courses in the research data management track of librarianship, and in qualitative research methods.I recently became a Data Services Librarian at the University of North Carolina Greensboro, where I hope to assist scholars in making their research data more open and accessible.")]),e._v(" "),t("p",[e._v("For some time, I have wanted to build a reproducible workflow to uncover systemic bias in library catalogs. 
I’m hoping the Fellows Programme will help me build the foundation to do so."),t("br"),e._v("\nTo learn more about Lindsay click "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-lindsay/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[148],{681:function(e,a,t){"use strict";t.r(a);var i=t(29),r=Object(i.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("We are very excited to introduce you to the 3rd cohort of "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Reproducible Research Fellows"),t("OutboundLink")],1),e._v("! Over the coming months, this group of six early career researchers will be learning about open science, data management, and how to use Frictionless Data tooling in their work to make their data more open and their research more reusable. Keep an eye on them, as they are on their way to becoming champions of reproducibility! For now, go and read the introductory blogs they wrote about themselves to know more about them and their goals for this fellowship.")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/GQ.jpeg",width:"200px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hi, everyone! My name is Guo-Qiang Zhang")]),e._v(", and I am from China. Right after I finished my residency training in Pediatrics, I joined Prof Bright I. Nwaru’s group and started my doctoral studies at Krefting Research Centre in University of Gothenburg (Sweden). 
My doctoral project is to look at the effects of sex hormones on women’s health (especially asthma), utilizing epidemiological methods as well as evidence synthesis tools (e.g., systematic review, umbrella review).")]),e._v(" "),t("p",[e._v("In my first year of doctoral studies, I had the opportunity to participate in the course “Reproducibility in Medical Research” led by Prof Nwaru. It was my first time to hear about Open Science and research reproducibility. As a “fresh” full-time doctoral student full of passion for medical research, I felt overwhelmed by waves of frustration when I came to know the reproducibility crisis. After spending some time with my frustration, I came to realize that in fact I can do something. In my first project, my colleagues and I conducted an umbrella review on a highly controversial topic on the impact of menopausal hormone therapy on women’s health. We put extensive efforts into making the review process as transparent as possible: we developed beforehand protocols for data extraction and statistical analysis, documented key steps of the review process, verified data in the published literature, and made all datasets and R scripts publicly available."),t("br"),e._v("\nTo keep on reading about Guo-Qiang click "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-guo-qiang/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Victoria.jpeg",width:"140px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hi all! My name is Victoria")]),e._v(". I’m a physics graduate student and recovering engineer living in Berlin. 
I grew up mainly in my family’s native country of Singapore, but consider myself an American, and am still workshopping a straightforward answer to “where are you from.”")]),e._v(" "),t("p",[e._v("In my past life I worked in materials QA testing; currently, I’m at the German Aerospace Centre designing laser systems in the THz range - a type of non-visible light that hangs out on the electromagnetic spectrum between infrared and microwave.")]),e._v(" "),t("p",[e._v("My Open Science journey has just begun and I’m stoked! I started to get interested in topics around data transparency and accessibility after a series of escalating frustrations with information dynamics in medical technology, beginning in my own field of gas sensing, then discovering similar disparities in tangential fields."),t("br"),e._v("\nRead more about Victoria "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-victoria/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Zarena.jpeg",width:"140px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hello everybody! My name is Zarena")]),e._v(". I grew up in the Kyrgyz Republic, yet spent half of my life studying and working abroad. Currently, I am a Research Assistant for the project Creating Culturally Appropriate Research Ethics in Central Asia (CARE) at Nazarbayev University in Kazakhstan. I am also a Mad activist and an interdisciplinary human rights researcher. I like to consider my research activities going beyond academia to encompass and make an effect on broader socio-political structures.")]),e._v(" "),t("p",[e._v("Although I believe that life would not progress without frictions, when it comes to science and research, I feel, ‘frictions’ - manifested in a form of paywalls, bureaucratic and corporate management, or other structural barriers - should be deconstructed. 
So, I am joining the Frictionless Data Fellowship Programme with the purpose to learn more about open and FAIR research."),t("br"),e._v("\nLearn more about Zarena "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-zarena/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Melvin.jpeg",width:"120px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hi everyone, my name is Melvin Ochieng")]),e._v(", and I’m a pathologist and up-coming soil scientist. I was born in Kenya in a town called Eldoret that is famous for producing Kenyan marathon champions. I was raised in Kenya in my early childhood days and in Tanzania afterwards. I like to consider myself both Kenyan and Tanzanian at heart because the two countries took part in molding the person I am today. I am currently a masters student at University of Mohammed VI polytechnic in Morocco, studying fertilizer science and technology. Over the past two years, my research focused on potato cyst nematode (PCN) which is a quarantine pest that had been reported in Kenya in 2015.")]),e._v(" "),t("p",[e._v("I’m excited to start this journey as a Frictionless Data fellow with my fellows for this cohort. I just recently found out about open science and I couldn’t be more excited to learn more about this concept and how it will influence me as a researcher. Advancement in technology has opened up the world in so many ways and made possible extensive networks for collaborations globally. Notably, the problems the world is facing today require a global/collaborative approach to solve. 
Therefore, reproducible research is of key importance in promoting this collaboration."),t("br"),e._v("\nTo know more about Melvin click "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-melvin/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Kevin-Photo.jpeg",width:"140px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Hello! My name is Kevin Kidambasi")]),e._v("(KK). I was born and raised in Vihiga County of western Kenya. Currently, I live in Nairobi, the capital city of Kenya. I am a master’s student in Jomo Kenyatta University of Agriculture and Technology (JKUAT) registered at the department of Biochemistry. My MSc research at the International Centre of Insect Physiology and Ecology (icipe) focuses on the role of haematophagous camel-specific biting keds (Hippobosca camelina) in disease transmission in Laisamis, Marsabit County of northern Kenya. My broad research interest focuses on studying host-pathogen interactions to understand infection mechanisms of diseases in order to discover novel control and treatment targets.")]),e._v(" "),t("p",[e._v("I am interested in improving research reproducibility because it allows other researchers to confirm the accuracy of my data and correct any bias as well as validate the relevance of the conclusions drawn from the results. This also allows data to be analyzed in different ways and thus, give new insights and lead the research in new directions. In addition, improving research reproducibility would allow the scientific community to understand how the conclusions of a study were made and pinpoint out any mistakes in data analyses. In general, research reproducibility enhances openness, research collaboration, and data accessibility which in turn increase public trust in science and hence permits their participation and support for research. 
This enables public understanding of how research is conducted and its importance."),t("br"),e._v("\nRead more about Kevin "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-kevin/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("img",{staticStyle:{margin:"10px",border:"5px solid black"},attrs:{src:"/img/blog/Lindsay.jpeg",width:"120px",align:"left"}}),e._v(" "),t("p",[t("strong",[e._v("Greetings! My name is Lindsay Gypin")]),e._v(", she/her. I grew up in Denver, Colorado and began my career as a K-12 educator. I taught high school English and worked as a school librarian before becoming disillusioned with the politicization of public education and determining my skills were better suited for work in public libraries. Attending library school after having worked in libraries for so many years, I found myself drawn to courses in the research data management track of librarianship, and in qualitative research methods.I recently became a Data Services Librarian at the University of North Carolina Greensboro, where I hope to assist scholars in making their research data more open and accessible.")]),e._v(" "),t("p",[e._v("For some time, I have wanted to build a reproducible workflow to uncover systemic bias in library catalogs. 
I’m hoping the Fellows Programme will help me build the foundation to do so."),t("br"),e._v("\nTo learn more about Lindsay click "),t("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/hello-lindsay/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),t("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/149.c02b9d00.js b/assets/js/149.457cfdc1.js similarity index 98% rename from assets/js/149.c02b9d00.js rename to assets/js/149.457cfdc1.js index 3ba09cda7..8f2eeac92 100644 --- a/assets/js/149.c02b9d00.js +++ b/assets/js/149.457cfdc1.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[149],{683:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On the last Frictionless Data community call of the year, on December 16"),a("sup",[e._v("th")]),e._v(", we had Keith Hughitt from the National Cancer Institute (NCI) sharing (and demoing) his ideas around representing data processing flows as a DAG (Directed Acyclic Graph) inside of a datapackage.json, and tools for interacting with and visualizing such DAGs.")]),e._v(" "),a("p",[e._v("Keith started thinking about this when he realised that cleaning and processing data are not obvious processes, on the contrary, there is a lot of bias in them. The decisions made to clean the raw data are not generally included in the publications and are not made available in any transparent way. 
To allow collaboration and reproducibility, Keith thought of embedding an annotated data provenance DAG in a datapackage.json using the Frictionless specs.")]),e._v(" "),a("p",[e._v("The basic process Keith has in mind to solve this problem is:")]),e._v(" "),a("ul",[a("li",[e._v("The data provenance is encoded as a DAG in the metadata")]),e._v(" "),a("li",[e._v("For each step in processing the workflow, the previous DAG is copied and extended")]),e._v(" "),a("li",[e._v("Each node of the DAG represents a dataset at a particular stage of processing, and it can be associated with annotations, views")]),e._v(" "),a("li",[e._v("Datapackages would be generated and associated with each node")]),e._v(" "),a("li",[e._v("Have a web UI that reads the metadata and renders the DAG.")])]),e._v(" "),a("p",[e._v("If you would like to dive deeper and discover all about representing data processing flows as a DAG inside of a Data Package, you can watch Keith Hughitt’s presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/pDpAuyTCvF0",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("If you find this idea interesting, come and talk to Keith on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v("! 
He would love to hear what you think and if you have other ideas in mind.")]),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("p",[e._v("We are part of the organisation of the "),a("a",{attrs:{href:"https://fosdem.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("FOSDEM"),a("OutboundLink")],1),e._v(" Thematic Track "),a("em",[e._v("Open Research Tools & Technologies")]),e._v(" this year too. We would love to have someone from the Frictionless community giving a talk. The deadline has been extended and you have time until December 23"),a("sup",[e._v("rd")]),e._v(" to submit a talk proposal! More info at "),a("a",{attrs:{href:"https://fosdem.org/2022/news/2021-11-02-devroom-cfp/",target:"_blank",rel:"noopener noreferrer"}},[e._v("this page"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is next year, on January 21"),a("sup",[e._v("st")]),e._v(". Francisco Alves, from the DPCKAN team who won the Frictionless Data hackathon back in October, is going to present their prototype and how it evolved.")]),e._v(" "),a("p",[e._v("You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/FaWixB29SUA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[149],{679:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On the last Frictionless Data community call of the year, on December 16"),a("sup",[e._v("th")]),e._v(", we had Keith Hughitt from the National Cancer Institute (NCI) sharing (and demoing) his ideas around representing data processing flows as a DAG (Directed Acyclic Graph) inside of a datapackage.json, and tools for interacting with and visualizing such DAGs.")]),e._v(" "),a("p",[e._v("Keith started thinking about this when he realised that cleaning and processing data are not obvious processes, on the contrary, there is a lot of bias in them. The decisions made to clean the raw data are not generally included in the publications and are not made available in any transparent way. 
To allow collaboration and reproducibility, Keith thought of embedding an annotated data provenance DAG in a datapackage.json using the Frictionless specs.")]),e._v(" "),a("p",[e._v("The basic process Keith has in mind to solve this problem is:")]),e._v(" "),a("ul",[a("li",[e._v("The data provenance is encoded as a DAG in the metadata")]),e._v(" "),a("li",[e._v("For each step in processing the workflow, the previous DAG is copied and extended")]),e._v(" "),a("li",[e._v("Each node of the DAG represents a dataset at a particular stage of processing, and it can be associated with annotations, views")]),e._v(" "),a("li",[e._v("Datapackages would be generated and associated with each node")]),e._v(" "),a("li",[e._v("Have a web UI that reads the metadata and renders the DAG.")])]),e._v(" "),a("p",[e._v("If you would like to dive deeper and discover all about representing data processing flows as a DAG inside of a Data Package, you can watch Keith Hughitt’s presentation here:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/pDpAuyTCvF0",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("If you find this idea interesting, come and talk to Keith on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v("! 
He would love to hear what you think and if you have other ideas in mind.")]),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("p",[e._v("We are part of the organisation of the "),a("a",{attrs:{href:"https://fosdem.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("FOSDEM"),a("OutboundLink")],1),e._v(" Thematic Track "),a("em",[e._v("Open Research Tools & Technologies")]),e._v(" this year too. We would love to have someone from the Frictionless community giving a talk. The deadline has been extended and you have time until December 23"),a("sup",[e._v("rd")]),e._v(" to submit a talk proposal! More info at "),a("a",{attrs:{href:"https://fosdem.org/2022/news/2021-11-02-devroom-cfp/",target:"_blank",rel:"noopener noreferrer"}},[e._v("this page"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is next year, on January 21"),a("sup",[e._v("st")]),e._v(". Francisco Alves, from the DPCKAN team who won the Frictionless Data hackathon back in October, is going to present their prototype and how it evolved.")]),e._v(" "),a("p",[e._v("You can sign up "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/FaWixB29SUA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/150.fdc13174.js b/assets/js/150.39300daf.js similarity index 98% rename from assets/js/150.fdc13174.js rename to assets/js/150.39300daf.js index c6a7bc920..fbf5ddde3 100644 --- a/assets/js/150.fdc13174.js +++ b/assets/js/150.39300daf.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[150],{682:function(e,t,a){"use strict";a.r(t);var n=a(29),o=Object(n.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("Originally published: "),a("a",{attrs:{href:"https://blog.okfn.org/2022/01/10/frictionless-planet-save-the-date/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2022/01/10/frictionless-planet-save-the-date/"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("We believe that an ecosystem of organisations combining tools, techniques and strategies to transform datasets relevant to the climate crisis into 
applied knowledge and actionable campaigns can get us closer to the Paris agreement goals. Today, scientists, academics and activists are working against the clock to save us from the greatest catastrophe of our times. But they are doing so under-resourced, siloed and disconnected. Sometimes even facing physical threats or achieving very local, isolated impact. We want to reverse that by activating a cross-sectoral sharing process of tools, techniques and technologies to open the data and unleash the power of knowledge to fight against climate change. We already started with the Frictionless Data process – collaborating with researcher groups to "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/09/16/goodtables-bcodmo/",target:"_blank",rel:"noopener noreferrer"}},[e._v("better manage ocean research data"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/03/18/frictionless-data-pilot-study/",target:"_blank",rel:"noopener noreferrer"}},[e._v("openly publish cleaned, integrated energy data"),a("OutboundLink")],1),e._v(" – and we want to expand an action-oriented alliance leading to cross regional, cross sectoral, sustainable collaboration. We need to use the best tools and the best minds of our times to fight the problems of our times.")]),e._v(" "),a("p",[e._v("We consider you-your organisation- as leading thinkers-doers-communicators leveraging technology and creativity in a unique way, with the potential to lead to meaningful change and we would love to invite you to an initial brainstorming session as we think of common efforts, a sustainability path and a road of action to work the next three years and beyond.")]),e._v(" "),a("p",[e._v("What will we do together during this brainstorming session? Our overarching goal is to make open climate data more useful. 
To that end, during this initial session, we will conceptualise ways of cleaning and standardising open climate data, creating more reproducible and efficient methods of consuming and analysing that data, and focus on ways to put this data into the hands of those that can truly drive change.")]),e._v(" "),a("h1",{attrs:{id:"what-to-bring"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-to-bring"}},[e._v("#")]),e._v(" WHAT TO BRING?")]),e._v(" "),a("ul",[a("li",[e._v("An effort-idea that is effective and you feel proud of at the intersection of digital and climate change.")]),e._v(" "),a("li",[e._v("A data problem you are struggling with.")]),e._v(" "),a("li",[e._v("Your best post-holidays smile.")])]),e._v(" "),a("h1",{attrs:{id:"when"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#when"}},[e._v("#")]),e._v(" When?")]),e._v(" "),a("p",[e._v("13:30 GMT – 20 January – Registration open "),a("a",{attrs:{href:"https://www.eventbrite.co.uk/e/frictionless-planet-tickets-242708286017",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". 
"),a("strong",[e._v("SOLD OUT")])]),e._v(" "),a("p",[e._v("20:30 GMT – 21 January – Registration open "),a("a",{attrs:{href:"https://www.eventbrite.co.uk/e/frictionless-planet-tickets-242807803677",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Limited slots, 25 attendees per session.")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[150],{683:function(e,t,a){"use strict";a.r(t);var n=a(29),o=Object(n.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("Originally published: "),a("a",{attrs:{href:"https://blog.okfn.org/2022/01/10/frictionless-planet-save-the-date/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2022/01/10/frictionless-planet-save-the-date/"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("We believe that an ecosystem of organisations combining tools, techniques and strategies to transform datasets relevant to the climate crisis into applied knowledge and actionable campaigns can get us closer to the Paris agreement goals. Today, scientists, academics and activists are working against the clock to save us from the greatest catastrophe of our times. But they are doing so under-resourced, siloed and disconnected. Sometimes even facing physical threats or achieving very local, isolated impact. We want to reverse that by activating a cross-sectoral sharing process of tools, techniques and technologies to open the data and unleash the power of knowledge to fight against climate change. 
We already started with the Frictionless Data process – collaborating with researcher groups to "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/09/16/goodtables-bcodmo/",target:"_blank",rel:"noopener noreferrer"}},[e._v("better manage ocean research data"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/03/18/frictionless-data-pilot-study/",target:"_blank",rel:"noopener noreferrer"}},[e._v("openly publish cleaned, integrated energy data"),a("OutboundLink")],1),e._v(" – and we want to expand an action-oriented alliance leading to cross regional, cross sectoral, sustainable collaboration. We need to use the best tools and the best minds of our times to fight the problems of our times.")]),e._v(" "),a("p",[e._v("We consider you-your organisation- as leading thinkers-doers-communicators leveraging technology and creativity in a unique way, with the potential to lead to meaningful change and we would love to invite you to an initial brainstorming session as we think of common efforts, a sustainability path and a road of action to work the next three years and beyond.")]),e._v(" "),a("p",[e._v("What will we do together during this brainstorming session? Our overarching goal is to make open climate data more useful. 
To that end, during this initial session, we will conceptualise ways of cleaning and standardising open climate data, creating more reproducible and efficient methods of consuming and analysing that data, and focus on ways to put this data into the hands of those that can truly drive change.")]),e._v(" "),a("h1",{attrs:{id:"what-to-bring"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-to-bring"}},[e._v("#")]),e._v(" WHAT TO BRING?")]),e._v(" "),a("ul",[a("li",[e._v("An effort-idea that is effective and you feel proud of at the intersection of digital and climate change.")]),e._v(" "),a("li",[e._v("A data problem you are struggling with.")]),e._v(" "),a("li",[e._v("Your best post-holidays smile.")])]),e._v(" "),a("h1",{attrs:{id:"when"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#when"}},[e._v("#")]),e._v(" When?")]),e._v(" "),a("p",[e._v("13:30 GMT – 20 January – Registration open "),a("a",{attrs:{href:"https://www.eventbrite.co.uk/e/frictionless-planet-tickets-242708286017",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". 
"),a("strong",[e._v("SOLD OUT")])]),e._v(" "),a("p",[e._v("20:30 GMT – 21 January – Registration open "),a("a",{attrs:{href:"https://www.eventbrite.co.uk/e/frictionless-planet-tickets-242807803677",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Limited slots, 25 attendees per session.")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/151.7788560d.js b/assets/js/151.6dc937ef.js similarity index 98% rename from assets/js/151.7788560d.js rename to assets/js/151.6dc937ef.js index b8a06e798..59e2249da 100644 --- a/assets/js/151.7788560d.js +++ b/assets/js/151.6dc937ef.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[151],{686:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On January 27"),a("sup",[e._v("th")]),e._v(", for the first Frictionless Data community call of the year, we heard a presentation on the Data Package Manager for CKAN (DPCKAN) from Francisco Alves - leader of the proactive transparency policy in the Brazilian State of Minas Gerais.")]),e._v(" "),a("p",[e._v("You may remember Francisco and DPCKAN from the "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/10/13/hackathon-wrap/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Hackathon"),a("OutboundLink")],1),e._v(" back in October 2021, where his team won the hack with this very project.")]),e._v(" "),a("h2",{attrs:{id:"so-what-is-dpckan"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#so-what-is-dpckan"}},[e._v("#")]),e._v(" So what is DPCKAN?")]),e._v(" "),a("p",[e._v("It all started with the will to publish all the raw data on the Fiscal Transparency portal of the State of Minas Gereis, which is built on a 
"),a("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),a("OutboundLink")],1),e._v(" instance, as open data following the Frictionless standards.")]),e._v(" "),a("p",[e._v("Francisco and his team wanted to install a data package, and be able to work with it locally. They also wanted to have the ability to partially update a dataset already uploaded in CKAN without overwriting it (this particular feature was developed during the Hackathon). That’s how the Data Package Manager was born. It is now in active development.")]),e._v(" "),a("h2",{attrs:{id:"and-what-s-next"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#and-what-s-next"}},[e._v("#")]),e._v(" And what’s next?")]),e._v(" "),a("p",[e._v("Francisco and his team would like to:")]),e._v(" "),a("ul",[a("li",[e._v("Make it possible to read a data package directly from CKAN,")]),e._v(" "),a("li",[e._v("Make CKAN Datastore respect the Frictionless table schema types")]),e._v(" "),a("li",[e._v("Have human readable metadata visualisation")]),e._v(" "),a("li",[e._v("Contribute back upstream to Frictionless Data, CKAN, etc.")])]),e._v(" "),a("p",[e._v("Franscisco also gave a quick demo of what the DPCKAN looks like. You can watch the full presentation (including the demo):")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/1W786q76H98",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("If you are interested in DPCKAN, come and talk to Francisco on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v("! 
You can also check out the presentation slides in "),a("a",{attrs:{href:"https://github.com/dados-mg/frictionless-hangout-jan2022",target:"_blank",rel:"noopener noreferrer"}},[e._v("this GitHub repository"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("p",[e._v("This year as well, we are helping organise the "),a("a",{attrs:{href:"https://fosdem.org/2022/",target:"_blank",rel:"noopener noreferrer"}},[e._v("FOSDEM"),a("OutboundLink")],1),e._v(" Thematic Track "),a("em",[e._v("Open Research Tools & Technologies")]),e._v("."),a("br"),e._v("\nJoin us on February 5th! Among the many interesting talks, you will have the opportunity to catch senior developer Evgeny Karev presenting the newest Frictionless tool: "),a("a",{attrs:{href:"https://fosdem.org/2022/schedule/event/open_research_livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Livemark"),a("OutboundLink")],1),e._v("."),a("br"),e._v("\nHave a look at "),a("a",{attrs:{href:"https://fosdem.org/2022/schedule/track/open_research_tools_and_technologies/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the programme"),a("OutboundLink")],1),e._v(". The event is free of charge and there is no need to register. You can just join the talks that you like.")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("The next community call is on February 24"),a("sup",[e._v("th")]),e._v(". We don’t have a presentation scheduled yet, so if you have a project that you would like to present to the community, this could be your chance! 
Email us if you have something in mind: "),a("a",{attrs:{href:"mailto:sara.petti@okfn.org"}},[e._v("sara.petti@okfn.org")]),e._v(".")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/4YZD0jmMOaU",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[151],{682:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On January 27"),a("sup",[e._v("th")]),e._v(", for the first Frictionless Data community call of the year, we heard a presentation on the Data Package Manager for CKAN (DPCKAN) from Francisco Alves - leader of the proactive transparency policy in the Brazilian State of Minas Gerais.")]),e._v(" "),a("p",[e._v("You may remember Francisco and DPCKAN from the "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/10/13/hackathon-wrap/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Hackathon"),a("OutboundLink")],1),e._v(" back in October 2021, where his team won the hack with this very project.")]),e._v(" "),a("h2",{attrs:{id:"so-what-is-dpckan"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#so-what-is-dpckan"}},[e._v("#")]),e._v(" So what is DPCKAN?")]),e._v(" "),a("p",[e._v("It all started with the will to publish all the raw data on the Fiscal Transparency portal of the State of Minas Gerais, which is built on a "),a("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),a("OutboundLink")],1),e._v(" instance, as open data following the Frictionless standards.")]),e._v(" "),a("p",[e._v("Francisco and his team wanted to install a data package, and be able to work with it locally. They also wanted to have the ability to partially update a dataset already uploaded in CKAN without overwriting it (this particular feature was developed during the Hackathon). That’s how the Data Package Manager was born. 
It is now in active development.")]),e._v(" "),a("h2",{attrs:{id:"and-what-s-next"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#and-what-s-next"}},[e._v("#")]),e._v(" And what’s next?")]),e._v(" "),a("p",[e._v("Francisco and his team would like to:")]),e._v(" "),a("ul",[a("li",[e._v("Make it possible to read a data package directly from CKAN,")]),e._v(" "),a("li",[e._v("Make CKAN Datastore respect the Frictionless table schema types")]),e._v(" "),a("li",[e._v("Have human readable metadata visualisation")]),e._v(" "),a("li",[e._v("Contribute back upstream to Frictionless Data, CKAN, etc.")])]),e._v(" "),a("p",[e._v("Francisco also gave a quick demo of what the DPCKAN looks like. You can watch the full presentation (including the demo):")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/1W786q76H98",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("If you are interested in DPCKAN, come and talk to Francisco on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v("! 
You can also check out the presentation slides in "),a("a",{attrs:{href:"https://github.com/dados-mg/frictionless-hangout-jan2022",target:"_blank",rel:"noopener noreferrer"}},[e._v("this GitHub repository"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"other-agenda-items-from-our-hangout"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#other-agenda-items-from-our-hangout"}},[e._v("#")]),e._v(" Other agenda items from our hangout")]),e._v(" "),a("p",[e._v("This year as well, we are helping organise the "),a("a",{attrs:{href:"https://fosdem.org/2022/",target:"_blank",rel:"noopener noreferrer"}},[e._v("FOSDEM"),a("OutboundLink")],1),e._v(" Thematic Track "),a("em",[e._v("Open Research Tools & Technologies")]),e._v("."),a("br"),e._v("\nJoin us on February 5th! Among the many interesting talks, you will have the opportunity to catch senior developer Evgeny Karev presenting the newest Frictionless tool: "),a("a",{attrs:{href:"https://fosdem.org/2022/schedule/event/open_research_livemark/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Livemark"),a("OutboundLink")],1),e._v("."),a("br"),e._v("\nHave a look at "),a("a",{attrs:{href:"https://fosdem.org/2022/schedule/track/open_research_tools_and_technologies/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the programme"),a("OutboundLink")],1),e._v(". The event is free of charge and there is no need to register. You can just join the talks that you like.")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("The next community call is on February 24"),a("sup",[e._v("th")]),e._v(". We don’t have a presentation scheduled yet, so if you have a project that you would like to present to the community, this could be your chance! 
Email us if you have something in mind: "),a("a",{attrs:{href:"mailto:sara.petti@okfn.org"}},[e._v("sara.petti@okfn.org")]),e._v(".")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/4YZD0jmMOaU",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("As usual, you can join us on "),a("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/152.a7c792b3.js b/assets/js/152.ff415a3b.js similarity index 99% rename from assets/js/152.a7c792b3.js rename to assets/js/152.ff415a3b.js index 08cfc36cf..d2e9757a4 100644 --- a/assets/js/152.a7c792b3.js +++ b/assets/js/152.ff415a3b.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[152],{684:function(e,a,t){"use strict";t.r(a);var r=t(29),i=Object(r.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("I started the "),t("a",{attrs:{href:"https://www.librarieshacked.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Libraries Hacked"),t("OutboundLink")],1),e._v(" project in 2014. Inspired by ‘tech for good’ open data groups and hackathons, I wanted to explore how libraries could leverage data for innovation and service improvement. I had already been involved in the work of the group "),t("a",{attrs:{href:"https://www.bathhacked.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Bath Hacked"),t("OutboundLink")],1),e._v(", and worked at the local Council in Bath, releasing large amounts of open data that was well used by the community. That included data such as live car park occupancy, traffic surveys, and air quality monitoring.")]),e._v(" "),t("p",[e._v("Getting involved in civic data publishing led me to explore data software, tools, and standards. I’ve used the Frictionless standards of Table Schema and CSV Dialect, as well as the code libraries that can be utilised to implement these. Data standards are an essential tool for data publishers in order to make data easily usable and reproducible across different organisations.")]),e._v(" "),t("p",[e._v("Public library services in England are managed by 150 local government organisations. 
The central government department for Digital, Culture, Media, and Sport (DCMS) hold responsibility for superintending those services. In September 2019 they convened a meeting about public library data.")]),e._v(" "),t("p",[e._v("Library data, of many kinds, is not well utilised in England.")]),e._v(" "),t("ul",[t("li",[t("strong",[e._v("Lack of public data")]),e._v(". There are relatively few library services sharing data about themselves for public use.")]),e._v(" "),t("li",[t("strong",[e._v("Low expectations")]),e._v(". There is no guidance on what data to share. Some services will publish certain datasets, but these will likely be different to the ones other publish.")]),e._v(" "),t("li",[t("strong",[e._v("Few standards")]),e._v(". The structure of any published data will be unique to each library service. For example, there are published lists of library branches from "),t("a",{attrs:{href:"https://www.opendatanottingham.org.uk/dataset.aspx?id=1",target:"_blank",rel:"noopener noreferrer"}},[e._v("Nottinghamshire County Council"),t("OutboundLink")],1),e._v(" and "),t("a",{attrs:{href:"https://data.gov.uk/dataset/9342032d-ab88-462f-b31c-4fb07fd4da6f/libraries",target:"_blank",rel:"noopener noreferrer"}},[e._v("North Somerset Council"),t("OutboundLink")],1),e._v(". Both are out of date, and have different fields, field names, field types, and file formats.")])]),e._v(" "),t("p",[e._v("The meeting discussed these issues, amongst others. The problems are understood, but difficult to tackle, as no organisation has direct responsibility for library data. There are also difficult underlying causes - low skills and funding being two major ones.")]),e._v(" "),t("p",[e._v("Large scale culture change will take many years. But to begin some sector-led collaborative work, a group of the attendees agreed to define the fields for a core selection of library datasets. 
The project would involve data practitioners from across English library services.")]),e._v(" "),t("p",[e._v("The datasets would cover:")]),e._v(" "),t("ul",[t("li",[t("strong",[e._v("Events")]),e._v(": the events that happen in libraries, their attendance, and outcomes")]),e._v(" "),t("li",[t("strong",[e._v("Library branches")]),e._v(": physical building locations, opening hours, and contact details")]),e._v(" "),t("li",[t("strong",[e._v("Loans")]),e._v(": the items lent from libraries, with counts, time periods, and categories")]),e._v(" "),t("li",[t("strong",[e._v("Stock")]),e._v(": the number of items held in libraries, with categories")]),e._v(" "),t("li",[t("strong",[e._v("Mobile library stops")]),e._v(": locations of mobile library stops, and their timetabled frequency")]),e._v(" "),t("li",[t("strong",[e._v("Physical visits")]),e._v(": how many people visit library premises")]),e._v(" "),t("li",[t("strong",[e._v("Membership")]),e._v(": counts of people who are library members, at small-area geographies.")])]),e._v(" "),t("p",[e._v("These can be split into 3 categories:")]),e._v(" "),t("ul",[t("li",[t("strong",[e._v("Registers")]),e._v(". Data that should be updated when it changes. A list of library branches is a permanent register, to be updated when there are changes to those branches.")]),e._v(" "),t("li",[t("strong",[e._v("Snapshot")]),e._v(". Data that is released as a point in time representation. Library membership will be continually changing, but a snapshot of membership counts should be released at regular intervals.")]),e._v(" "),t("li",[t("strong",[e._v("Time-series")]),e._v(". Data that is new every time it is published. Loans data should be published at regular intervals, each published file being an addition to the existing set.")])]),e._v(" "),t("p",[e._v("To work on these, we held an in-person workshop at the DCMS offices. 
This featured an exciting interruption by a fire drill, and we had to relocate to a nearby café (difficult for a meeting with many people held in in London!). We also formed an online group using Slack to trial and discuss the data.")]),e._v(" "),t("h2",{attrs:{id:"schemas-and-frictionless-data"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#schemas-and-frictionless-data"}},[e._v("#")]),e._v(" Schemas and Frictionless Data")]),e._v(" "),t("p",[e._v("The majority of our discussions were practical rather than technical, such as what data would be most useful, whether or not it was currently used locally by services, and common problems.")]),e._v(" "),t("p",[e._v("However, to formalise how data should be structured, it became clear that it would be necessary to create technical 'data schemas’.")]),e._v(" "),t("p",[e._v("It can be easy to decide on the data you want, but fail to describe it properly. For example, we could provide people with a spreadsheet that included a column title such as ‘Closed date’. I’d expect people to enter a date in that column, but we’d end up with all kinds of formats.")]),e._v(" "),t("p",[e._v("The "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema"),t("OutboundLink")],1),e._v(" specification for defining data, from Frictionless Data, provided a good option for tackling this problem. Not only would it allow us to create a detailed description for the data fields, but we could use other frictionless tools such as "),t("a",{attrs:{href:"https://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Good Tables"),t("OutboundLink")],1),e._v(". This would allow library services to validate their data before publishing. Things like mismatching date formats would be picked up by the validator, and it would give instructions for how to fix the issue. 
We would additionally also provide ‘human-readable’ guidance on the datasets.")]),e._v(" "),t("p",[e._v("Frictionless Data is an "),t("a",{attrs:{href:"https://okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Foundation"),t("OutboundLink")],1),e._v(" project, and using tools from an internationally renowned body was also a good practice. The schemas are UK-centric but could be adapted and reused by international library services.")]),e._v(" "),t("p",[e._v("The schemas are all documented at "),t("a",{attrs:{href:"https://schema.librarydata.uk/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Public Library Open Data"),t("OutboundLink")],1),e._v(", including guidance, links to sample data, and the technical definition files.")]),e._v(" "),t("h2",{attrs:{id:"lessons-learned"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#lessons-learned"}},[e._v("#")]),e._v(" Lessons learned")]),e._v(" "),t("p",[e._v("The initial datasets are not comprehensive. They are designed to be a starting point, allowing more to be developed from service requirements.")]),e._v(" "),t("p",[e._v("They are overly focussed towards ‘physical’ library services. It wasn’t long after these meetings that public libraries adjusted to provide all-digital services due to lockdowns. There is nothing here to cover valuable usage datasets like the video views that library services receive on YouTube and Facebook.")]),e._v(" "),t("p",[e._v("There are some that have become even more important. The physical visits schema describes how to structure library footfall data, allowing for differences in collection methods and intervals. This kind of data is now in high demand, to analyse how library service visits recover.")]),e._v(" "),t("p",[e._v("Some of the discussions we had were fascinating. It was important to involve the people who work with this data on a daily basis. 
They will know how easy it is to extract and manipulate, and many of the pitfalls that come with interpreting it.")]),e._v(" "),t("h3",{attrs:{id:"complexity"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#complexity"}},[e._v("#")]),e._v(" Complexity")]),e._v(" "),t("p",[e._v("There was often a battle between complexity and simplicity. Complex data is good, it often means it is more robust, such as using external identifiers. But simplicity is also good, for data publishers and consumers.")]),e._v(" "),t("p",[e._v("Public library services will primarily employ data workers who are not formally trained in using data. Where there are complex concepts (e.g. Table Schema itself), they are used because they make data publishing easier and more consistent.")]),e._v(" "),t("p",[e._v("Public data should also be made as accessible as possible for the public, while being detailed enough to be useful. In this way the data schemas tend towards simplicity.")]),e._v(" "),t("h3",{attrs:{id:"standards-not-standardisation"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#standards-not-standardisation"}},[e._v("#")]),e._v(" Standards not standardisation")]),e._v(" "),t("p",[e._v("There is a difference between a standard format for data, and standardised data. The schemas are primarily aimed at getting data from multiple services into the same format, to share analysis techniques between library services, and to have usable data when merged with other services.")]),e._v(" "),t("p",[e._v("There were some cases where we decided against standardising the actual data within data fields. For example, there is a column in the loans and the stock datasets called ‘Item type’. This is a category description of the library item, such as ‘Adult fiction’. In some other previous examples of data collection this data is standardised into a uniform set of categories, in order to make it easily comparable.")]),e._v(" "),t("p",[e._v("That kind of exercise defies reality though. 
Library services may have their own set of categories, many of them interesting and unique. To use a standard set would mean that library services would have to convert their underlying data. As well as extra work, it would be a loss of data. It would also mean that library services would be unlikely to use the converted data themselves. Why use such data if it doesn’t reflect what you actually hold?")]),e._v(" "),t("p",[e._v("The downside is that anyone analysing combined data would have to decide themselves how to compare data in those fields. However, that would be at least a clear task for the data analyst - and would most likely be an easier exercise to do in bulk.")]),e._v(" "),t("h3",{attrs:{id:"detail"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#detail"}},[e._v("#")]),e._v(" Detail")]),e._v(" "),t("p",[e._v("In my ideal world, data would be as detailed as possible. Instead of knowing how many items a library lent every month, I want that data for every hour. In fact I want to have every lending record! But feasibly that would make the data unwieldy and difficult to work with, and wouldn’t be in-line with the statistics libraries are used to.")]),e._v(" "),t("p",[e._v("We primarily made decisions based upon what library services already do. In a lot of cases this was data aggregated into monthly counts, with fields such as library branch and item type used to break down that data.")]),e._v(" "),t("h2",{attrs:{id:"the-future"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#the-future"}},[e._v("#")]),e._v(" The future")]),e._v(" "),t("p",[e._v("The initial meetings were held over two years ago, and it seems longer than that! A lot has happened in the meantime. We are still in a global pandemic that from library perspectives has de-prioritised anything other than core services.")]),e._v(" "),t("p",[e._v("However, there are good examples of the data in action. 
Barnet libraries "),t("a",{attrs:{href:"https://open.barnet.gov.uk/dataset/e14dj/library-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("publish 5 out of the 7 data schemas"),t("OutboundLink")],1),e._v(" on a regular basis.")]),e._v(" "),t("p",[e._v("I have also been creating tools that highlight how the data can be used such as "),t("a",{attrs:{href:"https://www.librarymap.co.uk",target:"_blank",rel:"noopener noreferrer"}},[e._v("Library map"),t("OutboundLink")],1),e._v(" and "),t("a",{attrs:{href:"https://www.mobilelibraries.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Mobile libraries"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("There is national work underway that can make use of these schemas. The British Library is working on a "),t("a",{attrs:{href:"https://www.artscouncil.org.uk/blog/single-digital-presence-libraries",target:"_blank",rel:"noopener noreferrer"}},[e._v("Single Digital Presence"),t("OutboundLink")],1),e._v(" project that will require data from library services in a standard form.")]),e._v(" "),t("p",[e._v("Internationally there are calls for more public library open data. The International Federation of Library Associations and Institutions (IFLA) has "),t("a",{attrs:{href:"https://www.ifla.org/news/ifla-releases-statement-on-open-library-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("released a statement on Open Library Data"),t("OutboundLink")],1),e._v(" calling for “governments to ensure, either directly or through supporting others, the collection and open publication of data about libraries and their use”. It would be great to work with organisations like IFLA to promote schemas that could be reused Internationally as well as for local services. 
There could also be the opportunity to use other Frictionless Data tools to aid in publishing data, such as "),t("a",{attrs:{href:"https://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("Hopefully in the future there can be workshops, training events, and conferences that allow these data schemas to be discussed and further developed.")])])}),[],!1,null,null,null);a.default=i.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[152],{685:function(e,a,t){"use strict";t.r(a);var r=t(29),i=Object(r.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("I started the "),t("a",{attrs:{href:"https://www.librarieshacked.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Libraries Hacked"),t("OutboundLink")],1),e._v(" project in 2014. Inspired by ‘tech for good’ open data groups and hackathons, I wanted to explore how libraries could leverage data for innovation and service improvement. I had already been involved in the work of the group "),t("a",{attrs:{href:"https://www.bathhacked.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Bath Hacked"),t("OutboundLink")],1),e._v(", and worked at the local Council in Bath, releasing large amounts of open data that was well used by the community. That included data such as live car park occupancy, traffic surveys, and air quality monitoring.")]),e._v(" "),t("p",[e._v("Getting involved in civic data publishing led me to explore data software, tools, and standards. I’ve used the Frictionless standards of Table Schema and CSV Dialect, as well as the code libraries that can be utilised to implement these. 
Data standards are an essential tool for data publishers in order to make data easily usable and reproducible across different organisations.")]),e._v(" "),t("p",[e._v("Public library services in England are managed by 150 local government organisations. The central government department for Digital, Culture, Media, and Sport (DCMS) hold responsibility for superintending those services. In September 2019 they convened a meeting about public library data.")]),e._v(" "),t("p",[e._v("Library data, of many kinds, is not well utilised in England.")]),e._v(" "),t("ul",[t("li",[t("strong",[e._v("Lack of public data")]),e._v(". There are relatively few library services sharing data about themselves for public use.")]),e._v(" "),t("li",[t("strong",[e._v("Low expectations")]),e._v(". There is no guidance on what data to share. Some services will publish certain datasets, but these will likely be different to the ones other publish.")]),e._v(" "),t("li",[t("strong",[e._v("Few standards")]),e._v(". The structure of any published data will be unique to each library service. For example, there are published lists of library branches from "),t("a",{attrs:{href:"https://www.opendatanottingham.org.uk/dataset.aspx?id=1",target:"_blank",rel:"noopener noreferrer"}},[e._v("Nottinghamshire County Council"),t("OutboundLink")],1),e._v(" and "),t("a",{attrs:{href:"https://data.gov.uk/dataset/9342032d-ab88-462f-b31c-4fb07fd4da6f/libraries",target:"_blank",rel:"noopener noreferrer"}},[e._v("North Somerset Council"),t("OutboundLink")],1),e._v(". Both are out of date, and have different fields, field names, field types, and file formats.")])]),e._v(" "),t("p",[e._v("The meeting discussed these issues, amongst others. The problems are understood, but difficult to tackle, as no organisation has direct responsibility for library data. 
There are also difficult underlying causes - low skills and funding being two major ones.")]),e._v(" "),t("p",[e._v("Large scale culture change will take many years. But to begin some sector-led collaborative work, a group of the attendees agreed to define the fields for a core selection of library datasets. The project would involve data practitioners from across English library services.")]),e._v(" "),t("p",[e._v("The datasets would cover:")]),e._v(" "),t("ul",[t("li",[t("strong",[e._v("Events")]),e._v(": the events that happen in libraries, their attendance, and outcomes")]),e._v(" "),t("li",[t("strong",[e._v("Library branches")]),e._v(": physical building locations, opening hours, and contact details")]),e._v(" "),t("li",[t("strong",[e._v("Loans")]),e._v(": the items lent from libraries, with counts, time periods, and categories")]),e._v(" "),t("li",[t("strong",[e._v("Stock")]),e._v(": the number of items held in libraries, with categories")]),e._v(" "),t("li",[t("strong",[e._v("Mobile library stops")]),e._v(": locations of mobile library stops, and their timetabled frequency")]),e._v(" "),t("li",[t("strong",[e._v("Physical visits")]),e._v(": how many people visit library premises")]),e._v(" "),t("li",[t("strong",[e._v("Membership")]),e._v(": counts of people who are library members, at small-area geographies.")])]),e._v(" "),t("p",[e._v("These can be split into 3 categories:")]),e._v(" "),t("ul",[t("li",[t("strong",[e._v("Registers")]),e._v(". Data that should be updated when it changes. A list of library branches is a permanent register, to be updated when there are changes to those branches.")]),e._v(" "),t("li",[t("strong",[e._v("Snapshot")]),e._v(". Data that is released as a point in time representation. Library membership will be continually changing, but a snapshot of membership counts should be released at regular intervals.")]),e._v(" "),t("li",[t("strong",[e._v("Time-series")]),e._v(". Data that is new every time it is published. 
Loans data should be published at regular intervals, each published file being an addition to the existing set.")])]),e._v(" "),t("p",[e._v("To work on these, we held an in-person workshop at the DCMS offices. This featured an exciting interruption by a fire drill, and we had to relocate to a nearby café (difficult for a meeting with many people held in in London!). We also formed an online group using Slack to trial and discuss the data.")]),e._v(" "),t("h2",{attrs:{id:"schemas-and-frictionless-data"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#schemas-and-frictionless-data"}},[e._v("#")]),e._v(" Schemas and Frictionless Data")]),e._v(" "),t("p",[e._v("The majority of our discussions were practical rather than technical, such as what data would be most useful, whether or not it was currently used locally by services, and common problems.")]),e._v(" "),t("p",[e._v("However, to formalise how data should be structured, it became clear that it would be necessary to create technical 'data schemas’.")]),e._v(" "),t("p",[e._v("It can be easy to decide on the data you want, but fail to describe it properly. For example, we could provide people with a spreadsheet that included a column title such as ‘Closed date’. I’d expect people to enter a date in that column, but we’d end up with all kinds of formats.")]),e._v(" "),t("p",[e._v("The "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema"),t("OutboundLink")],1),e._v(" specification for defining data, from Frictionless Data, provided a good option for tackling this problem. Not only would it allow us to create a detailed description for the data fields, but we could use other frictionless tools such as "),t("a",{attrs:{href:"https://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Good Tables"),t("OutboundLink")],1),e._v(". This would allow library services to validate their data before publishing. 
Things like mismatching date formats would be picked up by the validator, and it would give instructions for how to fix the issue. We would additionally also provide ‘human-readable’ guidance on the datasets.")]),e._v(" "),t("p",[e._v("Frictionless Data is an "),t("a",{attrs:{href:"https://okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Foundation"),t("OutboundLink")],1),e._v(" project, and using tools from an internationally renowned body was also a good practice. The schemas are UK-centric but could be adapted and reused by international library services.")]),e._v(" "),t("p",[e._v("The schemas are all documented at "),t("a",{attrs:{href:"https://schema.librarydata.uk/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Public Library Open Data"),t("OutboundLink")],1),e._v(", including guidance, links to sample data, and the technical definition files.")]),e._v(" "),t("h2",{attrs:{id:"lessons-learned"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#lessons-learned"}},[e._v("#")]),e._v(" Lessons learned")]),e._v(" "),t("p",[e._v("The initial datasets are not comprehensive. They are designed to be a starting point, allowing more to be developed from service requirements.")]),e._v(" "),t("p",[e._v("They are overly focussed towards ‘physical’ library services. It wasn’t long after these meetings that public libraries adjusted to provide all-digital services due to lockdowns. There is nothing here to cover valuable usage datasets like the video views that library services receive on YouTube and Facebook.")]),e._v(" "),t("p",[e._v("There are some that have become even more important. The physical visits schema describes how to structure library footfall data, allowing for differences in collection methods and intervals. This kind of data is now in high demand, to analyse how library service visits recover.")]),e._v(" "),t("p",[e._v("Some of the discussions we had were fascinating. 
It was important to involve the people who work with this data on a daily basis. They will know how easy it is to extract and manipulate, and many of the pitfalls that come with interpreting it.")]),e._v(" "),t("h3",{attrs:{id:"complexity"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#complexity"}},[e._v("#")]),e._v(" Complexity")]),e._v(" "),t("p",[e._v("There was often a battle between complexity and simplicity. Complex data is good, it often means it is more robust, such as using external identifiers. But simplicity is also good, for data publishers and consumers.")]),e._v(" "),t("p",[e._v("Public library services will primarily employ data workers who are not formally trained in using data. Where there are complex concepts (e.g. Table Schema itself), they are used because they make data publishing easier and more consistent.")]),e._v(" "),t("p",[e._v("Public data should also be made as accessible as possible for the public, while being detailed enough to be useful. In this way the data schemas tend towards simplicity.")]),e._v(" "),t("h3",{attrs:{id:"standards-not-standardisation"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#standards-not-standardisation"}},[e._v("#")]),e._v(" Standards not standardisation")]),e._v(" "),t("p",[e._v("There is a difference between a standard format for data, and standardised data. The schemas are primarily aimed at getting data from multiple services into the same format, to share analysis techniques between library services, and to have usable data when merged with other services.")]),e._v(" "),t("p",[e._v("There were some cases where we decided against standardising the actual data within data fields. For example, there is a column in the loans and the stock datasets called ‘Item type’. This is a category description of the library item, such as ‘Adult fiction’. 
In some other previous examples of data collection this data is standardised into a uniform set of categories, in order to make it easily comparable.")]),e._v(" "),t("p",[e._v("That kind of exercise defies reality though. Library services may have their own set of categories, many of them interesting and unique. To use a standard set would mean that library services would have to convert their underlying data. As well as extra work, it would be a loss of data. It would also mean that library services would be unlikely to use the converted data themselves. Why use such data if it doesn’t reflect what you actually hold?")]),e._v(" "),t("p",[e._v("The downside is that anyone analysing combined data would have to decide themselves how to compare data in those fields. However, that would be at least a clear task for the data analyst - and would most likely be an easier exercise to do in bulk.")]),e._v(" "),t("h3",{attrs:{id:"detail"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#detail"}},[e._v("#")]),e._v(" Detail")]),e._v(" "),t("p",[e._v("In my ideal world, data would be as detailed as possible. Instead of knowing how many items a library lent every month, I want that data for every hour. In fact I want to have every lending record! But feasibly that would make the data unwieldy and difficult to work with, and wouldn’t be in-line with the statistics libraries are used to.")]),e._v(" "),t("p",[e._v("We primarily made decisions based upon what library services already do. In a lot of cases this was data aggregated into monthly counts, with fields such as library branch and item type used to break down that data.")]),e._v(" "),t("h2",{attrs:{id:"the-future"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#the-future"}},[e._v("#")]),e._v(" The future")]),e._v(" "),t("p",[e._v("The initial meetings were held over two years ago, and it seems longer than that! A lot has happened in the meantime. 
We are still in a global pandemic that from library perspectives has de-prioritised anything other than core services.")]),e._v(" "),t("p",[e._v("However, there are good examples of the data in action. Barnet libraries "),t("a",{attrs:{href:"https://open.barnet.gov.uk/dataset/e14dj/library-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("publish 5 out of the 7 data schemas"),t("OutboundLink")],1),e._v(" on a regular basis.")]),e._v(" "),t("p",[e._v("I have also been creating tools that highlight how the data can be used such as "),t("a",{attrs:{href:"https://www.librarymap.co.uk",target:"_blank",rel:"noopener noreferrer"}},[e._v("Library map"),t("OutboundLink")],1),e._v(" and "),t("a",{attrs:{href:"https://www.mobilelibraries.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Mobile libraries"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("There is national work underway that can make use of these schemas. The British Library is working on a "),t("a",{attrs:{href:"https://www.artscouncil.org.uk/blog/single-digital-presence-libraries",target:"_blank",rel:"noopener noreferrer"}},[e._v("Single Digital Presence"),t("OutboundLink")],1),e._v(" project that will require data from library services in a standard form.")]),e._v(" "),t("p",[e._v("Internationally there are calls for more public library open data. The International Federation of Library Associations and Institutions (IFLA) has "),t("a",{attrs:{href:"https://www.ifla.org/news/ifla-releases-statement-on-open-library-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("released a statement on Open Library Data"),t("OutboundLink")],1),e._v(" calling for “governments to ensure, either directly or through supporting others, the collection and open publication of data about libraries and their use”. It would be great to work with organisations like IFLA to promote schemas that could be reused Internationally as well as for local services. 
There could also be the opportunity to use other Frictionless Data tools to aid in publishing data, such as "),t("a",{attrs:{href:"https://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("Hopefully in the future there can be workshops, training events, and conferences that allow these data schemas to be discussed and further developed.")])])}),[],!1,null,null,null);a.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/153.0af4c05d.js b/assets/js/153.7798ed7e.js similarity index 98% rename from assets/js/153.0af4c05d.js rename to assets/js/153.7798ed7e.js index 43ca641d9..e3511dff8 100644 --- a/assets/js/153.0af4c05d.js +++ b/assets/js/153.7798ed7e.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[153],{685:function(e,t,a){"use strict";a.r(t);var r=a(29),s=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("We are very excited to announce that we responded to a "),a("a",{attrs:{href:"https://sam.gov/opp/869f4051df38475591fa48fce5b0868d/view",target:"_blank",rel:"noopener noreferrer"}},[e._v("request for information"),a("OutboundLink")],1),e._v(" that was recently published by NASA for its "),a("a",{attrs:{href:"https://science.nasa.gov/earth-science/earth-system-observatory",target:"_blank",rel:"noopener noreferrer"}},[e._v("Earth System Observatory (ESO)"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("What is ESO? 
It is a set of (mainly satellite) missions providing information on planet Earth, which can guide efforts related to climate change, natural hazard mitigation, fighting forest fires, and improving real-time agricultural processes.")]),e._v(" "),a("p",[e._v("With this request for information, ESO wants to gather expert advice on ways to find a more integrated approach to enhance data architecture efficiency and promote the open science principles.")]),e._v(" "),a("p",[a("strong",[e._v("We believe Frictionless Data would benefit the mission science data processing in several ways.")]),e._v(" Here’s how:")]),e._v(" "),a("p",[e._v("First, Frictionless automatically infers metadata and schemas from a data file, and allows users to edit that information. Creating good metadata is vital for downstream data users – if you can’t understand the data, you can’t use it (or can’t "),a("em",[e._v("easily")]),e._v(" use it). Similarly, having a data schema is useful for interoperability, promoting the usefulness of datasets.")]),e._v(" "),a("p",[e._v("The second Frictionless function we think will be helpful is data validation. Frictionless validates both the structure and content of a dataset, using built-in and custom checks. For instance, Frictionless will check for missing values, incorrect data types, or other constraints (e.g. temperature data points that exceed a certain threshold). If any errors are detected, Frictionless will generate a report for the user detailing the error so the user can fix the data during processing.")]),e._v(" "),a("p",[e._v("Finally, users can write reproducible data transformation pipelines with Frictionless. Writing declarative transform pipelines allows humans and machines to understand the data cleaning steps and repeat those processes if needed in the future. 
Collectively, these functions create well documented, high quality, clean data that can then be used in further downstream analysis.")]),e._v(" "),a("p",[e._v("We provided them with two examples of relevant collaboration:")]),e._v(" "),a("h3",{attrs:{id:"use-case-1"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#use-case-1"}},[e._v("#")]),e._v(" Use Case 1")]),e._v(" "),a("p",[e._v("The "),a("a",{attrs:{href:"https://www.bco-dmo.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Biological and Chemical Oceanography Data Management Office (BCO-DMO)"),a("OutboundLink")],1),e._v(" cleans and hosts a wide variety of open oceanography data sets for use by researchers. A main problem for them was data being submitted to them was messy and not standardized, and it was time consuming and difficult for their data managers to clean in a reproducible, documented way. They implemented Frictionless code to create a new data transformation pipeline that ingests the messy data, performs defined cleaning/transforming steps, documents those steps, and produces a cleaned, standardized dataset. It also produces a (human and machine-readable) document detailing all the transformation steps so that downstream users could understand what happened to the data and undo/repeat if necessary. 
This process not only helps data managers clean data faster and more efficiently, it also drives open science by making the hosted data more understandable and usable while preserving provenance.")]),e._v(" "),a("p",[e._v("More info on this use case "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/02/10/frictionless-data-pipelines-for-open-ocean/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h3",{attrs:{id:"use-case-2"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#use-case-2"}},[e._v("#")]),e._v(" Use Case 2")]),e._v(" "),a("p",[a("a",{attrs:{href:"https://datadryad.org/stash",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dryad"),a("OutboundLink")],1),e._v(" is a biological data repository with a large user base. In our collaboration, their main issue was that they do not have the people-power to curate all the submitted datasets, so they implemented Frictionless tooling to help data submitters curate their data as they submit it. When data is submitted on the Dryad platform, Frictionless performs validation checks, and generates a report if any errors are found. The data submitter can then fix that error (e.g. there are no headers in row 1) and resubmit. Creating easy-to-understand error reports helps submitters understand how to create more useable, standardized data, and also frees up valuable time for the Dryad data management team. Ultimately, now the Dryad data repository hosts higher quality open science data.")]),e._v(" "),a("p",[e._v("More info on this use case "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/08/09/dryad-pilot/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("hr"),e._v(" "),a("p",[e._v("Are there other ways you think Frictionless Data could help the ESO project? Let us know!")]),e._v(" "),a("p",[a("em",[e._v("Image used: Antarctica Eclipsed. NASA image courtesy of the DSCOVR EPIC team. 
NASA Earth Observatory images by Joshua Stevens, using Landsat data from the U.S. Geological Survey. Story by Sara E. Pratt.")])])])}),[],!1,null,null,null);t.default=s.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[153],{684:function(e,t,a){"use strict";a.r(t);var r=a(29),s=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("We are very excited to announce that we responded to a "),a("a",{attrs:{href:"https://sam.gov/opp/869f4051df38475591fa48fce5b0868d/view",target:"_blank",rel:"noopener noreferrer"}},[e._v("request for information"),a("OutboundLink")],1),e._v(" that was recently published by NASA for its "),a("a",{attrs:{href:"https://science.nasa.gov/earth-science/earth-system-observatory",target:"_blank",rel:"noopener noreferrer"}},[e._v("Earth System Observatory (ESO)"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("What is ESO? It is a set of (mainly satellite) missions providing information on planet Earth, which can guide efforts related to climate change, natural hazard mitigation, fighting forest fires, and improving real-time agricultural processes.")]),e._v(" "),a("p",[e._v("With this request for information, ESO wants to gather expert advice on ways to find a more integrated approach to enhance data architecture efficiency and promote the open science principles.")]),e._v(" "),a("p",[a("strong",[e._v("We believe Frictionless Data would benefit the mission science data processing in several ways.")]),e._v(" Here’s how:")]),e._v(" "),a("p",[e._v("First, Frictionless automatically infers metadata and schemas from a data file, and allows users to edit that information. Creating good metadata is vital for downstream data users – if you can’t understand the data, you can’t use it (or can’t "),a("em",[e._v("easily")]),e._v(" use it). 
Similarly, having a data schema is useful for interoperability, promoting the usefulness of datasets.")]),e._v(" "),a("p",[e._v("The second Frictionless function we think will be helpful is data validation. Frictionless validates both the structure and content of a dataset, using built-in and custom checks. For instance, Frictionless will check for missing values, incorrect data types, or other constraints (e.g. temperature data points that exceed a certain threshold). If any errors are detected, Frictionless will generate a report for the user detailing the error so the user can fix the data during processing.")]),e._v(" "),a("p",[e._v("Finally, users can write reproducible data transformation pipelines with Frictionless. Writing declarative transform pipelines allows humans and machines to understand the data cleaning steps and repeat those processes if needed in the future. Collectively, these functions create well documented, high quality, clean data that can then be used in further downstream analysis.")]),e._v(" "),a("p",[e._v("We provided them with two examples of relevant collaboration:")]),e._v(" "),a("h3",{attrs:{id:"use-case-1"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#use-case-1"}},[e._v("#")]),e._v(" Use Case 1")]),e._v(" "),a("p",[e._v("The "),a("a",{attrs:{href:"https://www.bco-dmo.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Biological and Chemical Oceanography Data Management Office (BCO-DMO)"),a("OutboundLink")],1),e._v(" cleans and hosts a wide variety of open oceanography data sets for use by researchers. A main problem for them was data being submitted to them was messy and not standardized, and it was time consuming and difficult for their data managers to clean in a reproducible, documented way. They implemented Frictionless code to create a new data transformation pipeline that ingests the messy data, performs defined cleaning/transforming steps, documents those steps, and produces a cleaned, standardized dataset. 
It also produces a (human and machine-readable) document detailing all the transformation steps so that downstream users could understand what happened to the data and undo/repeat if necessary. This process not only helps data managers clean data faster and more efficiently, it also drives open science by making the hosted data more understandable and usable while preserving provenance.")]),e._v(" "),a("p",[e._v("More info on this use case "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/02/10/frictionless-data-pipelines-for-open-ocean/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h3",{attrs:{id:"use-case-2"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#use-case-2"}},[e._v("#")]),e._v(" Use Case 2")]),e._v(" "),a("p",[a("a",{attrs:{href:"https://datadryad.org/stash",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dryad"),a("OutboundLink")],1),e._v(" is a biological data repository with a large user base. In our collaboration, their main issue was that they do not have the people-power to curate all the submitted datasets, so they implemented Frictionless tooling to help data submitters curate their data as they submit it. When data is submitted on the Dryad platform, Frictionless performs validation checks, and generates a report if any errors are found. The data submitter can then fix that error (e.g. there are no headers in row 1) and resubmit. Creating easy-to-understand error reports helps submitters understand how to create more useable, standardized data, and also frees up valuable time for the Dryad data management team. 
As a result, the Dryad data repository now hosts higher-quality open science data.")]),e._v(" "),a("p",[e._v("More info on this use case "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/08/09/dryad-pilot/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("hr"),e._v(" "),a("p",[e._v("Are there other ways you think Frictionless Data could help the ESO project? Let us know!")]),e._v(" "),a("p",[a("em",[e._v("Image used: Antarctica Eclipsed. NASA image courtesy of the DSCOVR EPIC team. NASA Earth Observatory images by Joshua Stevens, using Landsat data from the U.S. Geological Survey. Story by Sara E. Pratt.")])])])}),[],!1,null,null,null);t.default=s.exports}}]); \ No newline at end of file diff --git a/assets/js/154.f147514e.js b/assets/js/154.4f5652f9.js similarity index 98% rename from assets/js/154.f147514e.js rename to assets/js/154.4f5652f9.js index 21c864504..d548a6254 100644 --- a/assets/js/154.f147514e.js +++ b/assets/js/154.4f5652f9.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[154],{687:function(e,t,r){"use strict";r.r(t);var a=r(29),o=Object(a.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On our second community call of the year, on February 24"),r("sup",[e._v("th")]),e._v(", we had Ilya Kreymer and Ed Summers from "),r("a",{attrs:{href:"https://webrecorder.net/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Webrecorder"),r("OutboundLink")],1),e._v(" updating us on their effort to standardise the WACZ format (which they had already discussed with us when it was still at an early development stage, in the community call of December 2020; you can read the blog "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/12/17/december-virtual-hangout/#a-recap-from-our-december-community-call",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(").")]),e._v(" "),r("p",[e._v("Webrecorder is a suite of open source tools and packages to capture interactive websites and replay them at a later time as accurately as possible. They created the WACZ format to have a portable format for archived web content that can be distributed and contain additional useful metadata about the web archives, using the Frictionless Data Package standard.")]),e._v(" "),r("p",[e._v("Ed & Ilya also hoped to discuss with the community the possibility of signing these Data Packages, in order to provide an optional mechanism to make web archives bundled in WACZ more trusted, because a cryptographic proof of who the author of a Data Package is might be interesting for other projects as well. Unfortunately the call was rather empty. Maybe it was because of the change of time, but in case there are other reasons why you did not come, please let us know (dropping an email at "),r("a",{attrs:{href:"mailto:sara.petti@okfn.org"}},[e._v("sara.petti@okfn.org")]),e._v(" or with a direct message on Discord/Matrix).")]),e._v(" "),r("p",[e._v("We did record the call though, so in case anyone is interested in having that discussion, we could always try to have it asynchronously on "),r("a",{attrs:{href:"https://discord.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://matrix.to/#/#frictionless-data:matrix.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/TIyOTEyAu7k",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("Their current proposal to create signed WACZ packages is summarised in 
"),r("a",{attrs:{href:"https://github.com/webrecorder/wacz-auth-spec/blob/main/spec.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("on GitHub"),r("OutboundLink")],1),e._v(", so you can always reach out to them there as well.")]),e._v(" "),r("h1",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Next community call is on March 31"),r("sup",[e._v("st")]),e._v(". We are going to hear from Johan Richer from Multi, who is going to present the latest prototype of Etalab and his theory of "),r("a",{attrs:{href:"https://jailbreak.gitlab.io/investigation-catalogue/synthese.html#/3",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal vs catalogue"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("You can sign up for the call already "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),r("h1",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/ukxLQCdyndc",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(", "),r("a",{attrs:{href:"https://matrix.to/#/#frictionless-data:matrix.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[154],{686:function(e,t,r){"use strict";r.r(t);var a=r(29),o=Object(a.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On our second community call of the year, on February 24"),r("sup",[e._v("th")]),e._v(", we had Ilya Kreymer and Ed Summers from "),r("a",{attrs:{href:"https://webrecorder.net/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Webrecorder"),r("OutboundLink")],1),e._v(" updating us on their effort to standardise the WACZ format (which they had already discussed with us when it was still at an early development stage, in the community call of December 2020; you can read the blog "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/12/17/december-virtual-hangout/#a-recap-from-our-december-community-call",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(").")]),e._v(" "),r("p",[e._v("Webrecorder is a suite of open source tools and packages to capture interactive websites and replay them at a later time as accurately as possible. They created the WACZ format to have a portable format for archived web content that can be distributed and contain additional useful metadata about the web archives, using the Frictionless Data Package standard.")]),e._v(" "),r("p",[e._v("Ed and Ilya also hoped to discuss with the community the possibility of signing these Data Packages, in order to provide an optional mechanism to make web archives bundled in WACZ more trusted, because a cryptographic proof of who the author of a Data Package is might be interesting for other projects as well. Unfortunately, the call was rather empty. 
Maybe it was because of the change of time, but in case there are other reasons why you did not come, please let us know (by dropping an email to "),r("a",{attrs:{href:"mailto:sara.petti@okfn.org"}},[e._v("sara.petti@okfn.org")]),e._v(" or sending a direct message on Discord/Matrix).")]),e._v(" "),r("p",[e._v("We did record the call though, so in case anyone is interested in having that discussion, we could always try to have it asynchronously on "),r("a",{attrs:{href:"https://discord.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://matrix.to/#/#frictionless-data:matrix.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/TIyOTEyAu7k",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("Their current proposal to create signed WACZ packages is summarised "),r("a",{attrs:{href:"https://github.com/webrecorder/wacz-auth-spec/blob/main/spec.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("on GitHub"),r("OutboundLink")],1),e._v(", so you can always reach out to them there as well.")]),e._v(" "),r("h1",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("The next community call is on March 31"),r("sup",[e._v("st")]),e._v(". 
We are going to hear from Johan Richer from Multi, who is going to present the latest prototype of Etalab and his theory of "),r("a",{attrs:{href:"https://jailbreak.gitlab.io/investigation-catalogue/synthese.html#/3",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal vs catalogue"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("You can sign up for the call already "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("Do you want to share something with the community? Let us know when you sign up!")]),e._v(" "),r("h1",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/ukxLQCdyndc",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("As usual, you can join us on "),r("a",{attrs:{href:"https://discord.com/invite/j9DNFNw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord"),r("OutboundLink")],1),e._v(", "),r("a",{attrs:{href:"https://matrix.to/#/#frictionless-data:matrix.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/155.50f8aa63.js b/assets/js/155.bfba67f5.js similarity index 99% rename from assets/js/155.50f8aa63.js rename to assets/js/155.bfba67f5.js index b5477bb29..84c89cd7d 100644 --- a/assets/js/155.50f8aa63.js +++ b/assets/js/155.bfba67f5.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[155],{689:function(e,t,a){"use strict";a.r(t);var n=a(29),r=Object(n.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("During these past tumultuous years, it has been striking to witness the role that information has played in furthering suffering: misinformation, lack of data transparency, and closed technology have worsened the pandemic, increased political strife, and hurt climate policy. Building on these observations, the team at Open Knowledge Foundation are refocusing our energies on how we can come together to empower people, communities, and organisations to create and use open knowledge to solve the most urgent issues of our time, including climate change, inequality, and access to knowledge . Undaunted by these substantial challenges, we entered 2022 with enthusiasm for finding ways to work together, starting with climate data.")]),e._v(" "),a("p",[e._v("To start this year fresh and inspired, we convened two gatherings of climate researchers, activists, and organisations to brainstorm ways to collaborate to make open climate data more usable, accessible, and impactful. Over 30 experts attended the two sessions, from organisations around the world, and we identified and discussed many problems in the climate data space. 
We confirmed our initial theory that many of us are working siloed and that combining skills, knowledge and networks can result in a powerful alliance across tech communities, data experts and climate crisis activists.")]),e._v(" "),a("p",[e._v("Now, we want to share with you some common themes from these sessions and ask: how can we work together to solve these pressing climate issues?")]),e._v(" "),a("p",[e._v("A primary concern of attendees was "),a("strong",[e._v("the disconnect between how (and why) data is produced and how data can (and should) be used")]),e._v(". This disconnect shows up as frictions for data use: we know that much existing “open” data isn’t actually usable. During the call, many participants mentioned they frequently can’t find open data, and even when they can find it, they can’t easily access it. Even when they can access the data, they often can’t easily use it.")]),e._v(" "),a("p",[e._v("So why is it so hard to find, access, and use climate data? First, climate data is not particularly well standardised or curated, and data creators need better training in data management best practices. Another issue is that many climate data users don’t have technical training or knowledge required to clean messy data, greatly slowing down their research or policy work.")]),e._v(" "),a("h3",{attrs:{id:"how-will-the-open-knowledge-foundation-fix-the-identified-problems-skills-standards-and-community"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-will-the-open-knowledge-foundation-fix-the-identified-problems-skills-standards-and-community"}},[e._v("#")]),e._v(" How will the Open Knowledge Foundation fix the identified problems? Skills, standards and community.")]),e._v(" "),a("p",[e._v("An aim for this work will be to bridge the gaps between data creators and users. 
We plan to host several workshops in the future to work with both these groups, focusing on identifying both skills gaps and data gaps, then working towards capacity building.")]),e._v(" "),a("p",[e._v("Our goal with capacity building will be to give a data platform to those most affected by climate change. How do we make it easier for less technical or newer data users to effectively use climate data? Our future workshops will focus on training data creators and users with the "),a("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Frictionless Data tooling"),a("OutboundLink")],1),e._v(" to better manage data, create higher quality data, and share data in impactful ways that will empower trained researchers and activists alike. For instance, the Frictionless toolbox can help data creators generate clean data that is easy to understand, share, and use, and the new Frictionless tool Livemark can help data consumers easily share climate data with impactful visualisations and narratives.")]),e._v(" "),a("p",[e._v("Another theme that emerged from the brainstorm sessions was the role data plays in generating knowledge versus the role knowledge plays in generating data, and how this interplay can be maximised to create change. For instance, "),a("strong",[e._v("we need to take a hard look at how “open” replicates cycles of inequalities")]),e._v(". Several people brought up the great work citizen scientists are doing for climate research, but how these efforts are rarely recognised by governments or other official research channels. So much vital data on local impacts of climate change are being lost as they aren’t being incorporated into official datasets. How do we make data more equitable, ensuring that those being most affected by climate change can use data to tell their stories?")]),e._v(" "),a("p",[e._v("We call on data organisations, climate researchers, and activists to join us in these efforts. 
How can we best work together to solve pressing climate change issues? Would you like to partner with us for workshops, or do you have other ideas for collaborations? Let us know! We would like to give our utmost thanks to the organisations that joined our brainstorming sessions for paving the way in this important work. To continue planning this work, we are creating a space to talk in our Frictionless Data community chat, and we invite all interested parties to join us. We are currently migrating our community from Discord to Slack. We encourage you to join the Slack channel, which will soon be populated with all Frictionless community members: "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-14x9bxnkm-2y~uQcmmrqarSP2kV39_Kg",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://join.slack.com/t/frictionlessdata/shared_invite/zt-14x9bxnkm-2y~uQcmmrqarSP2kV39_Kg"),a("OutboundLink")],1),a("br"),e._v("\n(We also have a Matrix mirror if you prefer Matrix: "),a("a",{attrs:{href:"https://matrix.to/#/#frictionless-data:matrix.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://matrix.to/#/#frictionless-data:matrix.org"),a("OutboundLink")],1),e._v(")")]),e._v(" "),a("p",[e._v("Finally, we’d like to share this list of resources that attendees shared during the calls:")]),e._v(" "),a("ul",[a("li",[e._v("Patrick J McGovern Data for Climate 2022 Accelerator: "),a("a",{attrs:{href:"https://www.mcgovern.org/foundation-awards-4-5m-including-new-accelerator-grants-to-advance-data-driven-climate-solutions/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.mcgovern.org/foundation-awards-4-5m-including-new-accelerator-grants-to-advance-data-driven-climate-solutions/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Open Climate: "),a("a",{attrs:{href:"https://www.appropedia.org/OpenClimate",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.appropedia.org/OpenClimate"),a("OutboundLink")],1)]),e._v(" 
"),a("li",[e._v("Environmental Data and Governance Initiative: "),a("a",{attrs:{href:"https://envirodatagov.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://envirodatagov.org/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Earth Science Information Partners: "),a("a",{attrs:{href:"https://www.esipfed.org/about",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.esipfed.org/about"),a("OutboundLink")],1),a("br"),e._v("\nCourse on environmental data journalism by School of Data Brazil: "),a("a",{attrs:{href:"https://escoladedados.org/courses/jornalismo-de-dados-ambientais/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://escoladedados.org/courses/jornalismo-de-dados-ambientais/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Catalogue of environmental databases in Brazil by School of Data Brazil: "),a("a",{attrs:{href:"https://bit.ly/dados-ambientais",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://bit.ly/dados-ambientais"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("A monthly meetup for small companies to share best practices (and data): "),a("a",{attrs:{href:"https://climatiq.io/blog/climate-action-net-zero-ambition-best-practices-for-sme",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://climatiq.io/blog/climate-action-net-zero-ambition-best-practices-for-sme"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Reddit Datasets: "),a("a",{attrs:{href:"https://www.reddit.com/r/datasets/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.reddit.com/r/datasets/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Hardware information standard: "),a("a",{attrs:{href:"https://barbal.co/the-open-know-how-manifest-specification-version-1-0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://barbal.co/the-open-know-how-manifest-specification-version-1-0/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Catalyst Cooperative: 
"),a("a",{attrs:{href:"https://github.com/catalyst-cooperative/pudl",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/catalyst-cooperative/pudl"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://zenodo.org/communities/catalyst-cooperative/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://zenodo.org/communities/catalyst-cooperative/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Research Data Alliance Agriculture: "),a("a",{attrs:{href:"https://www.rd-alliance.org/rda-disciplines/rda-and-agriculture",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.rd-alliance.org/rda-disciplines/rda-and-agriculture"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Open Climate Now!: "),a("a",{attrs:{href:"https://branch.climateaction.tech/issues/issue-2/open-climate-now/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://branch.climateaction.tech/issues/issue-2/open-climate-now/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Metadata Game Changers: "),a("a",{attrs:{href:"https://metadatagamechangers.com",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://metadatagamechangers.com"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Excellent lecture by J McGlade bridging attitudes etc. 
to the data story and behaviour change effects: "),a("a",{attrs:{href:"https://www.youtube.com/watch?v=eIRlLlrnmBM&t=1561s",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.youtube.com/watch?v=eIRlLlrnmBM&t=1561s"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("The Integrated-Assessment Modeling Community (IAMC) is developing a Python package “pyam” for scenario analysis & data visualization: "),a("a",{attrs:{href:"https://pyam-iamc.readthedocs.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://pyam-iamc.readthedocs.io"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("IIASA is hosting numerous scenario ensemble databases, see "),a("a",{attrs:{href:"https://data.ece.iiasa.ac.at",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://data.ece.iiasa.ac.at"),a("OutboundLink")],1),e._v(", most importantly the scenario ensemble supporting the quantitative assessment in the IPCC 1.5°C Special Report (2018), and a similar database will be released in two months together with IPCC AR6 WG3")]),e._v(" "),a("li",[e._v("Letter to IEA by the openmod community, "),a("a",{attrs:{href:"https://forum.openmod.org/t/open-letter-to-iea-and-member-countries-requesting-open-data/2949",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://forum.openmod.org/t/open-letter-to-iea-and-member-countries-requesting-open-data/2949"),a("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[155],{687:function(e,t,a){"use strict";a.r(t);var n=a(29),r=Object(n.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("During these past tumultuous years, it has been striking to witness the role that information has played in furthering suffering: misinformation, lack of data transparency, and closed technology have worsened the pandemic, increased political strife, and hurt 
climate policy. Building on these observations, the team at Open Knowledge Foundation are refocusing our energies on how we can come together to empower people, communities, and organisations to create and use open knowledge to solve the most urgent issues of our time, including climate change, inequality, and access to knowledge. Undaunted by these substantial challenges, we entered 2022 with enthusiasm for finding ways to work together, starting with climate data.")]),e._v(" "),a("p",[e._v("To start this year fresh and inspired, we convened two gatherings of climate researchers, activists, and organisations to brainstorm ways to collaborate to make open climate data more usable, accessible, and impactful. Over 30 experts attended the two sessions, from organisations around the world, and we identified and discussed many problems in the climate data space. We confirmed our initial theory that many of us are working in silos and that combining skills, knowledge and networks can result in a powerful alliance across tech communities, data experts and climate crisis activists.")]),e._v(" "),a("p",[e._v("Now, we want to share with you some common themes from these sessions and ask: how can we work together to solve these pressing climate issues?")]),e._v(" "),a("p",[e._v("A primary concern of attendees was "),a("strong",[e._v("the disconnect between how (and why) data is produced and how data can (and should) be used")]),e._v(". This disconnect shows up as friction in data use: we know that much existing “open” data isn’t actually usable. During the calls, many participants mentioned they frequently can’t find open data, and even when they can find it, they can’t easily access it. Even when they can access the data, they often can’t easily use it.")]),e._v(" "),a("p",[e._v("So why is it so hard to find, access, and use climate data? 
First, climate data is not particularly well standardised or curated, and data creators need better training in data management best practices. Another issue is that many climate data users don’t have technical training or knowledge required to clean messy data, greatly slowing down their research or policy work.")]),e._v(" "),a("h3",{attrs:{id:"how-will-the-open-knowledge-foundation-fix-the-identified-problems-skills-standards-and-community"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-will-the-open-knowledge-foundation-fix-the-identified-problems-skills-standards-and-community"}},[e._v("#")]),e._v(" How will the Open Knowledge Foundation fix the identified problems? Skills, standards and community.")]),e._v(" "),a("p",[e._v("An aim for this work will be to bridge the gaps between data creators and users. We plan to host several workshops in the future to work with both these groups, focusing on identifying both skills gaps and data gaps, then working towards capacity building.")]),e._v(" "),a("p",[e._v("Our goal with capacity building will be to give a data platform to those most affected by climate change. How do we make it easier for less technical or newer data users to effectively use climate data? Our future workshops will focus on training data creators and users with the "),a("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Frictionless Data tooling"),a("OutboundLink")],1),e._v(" to better manage data, create higher quality data, and share data in impactful ways that will empower trained researchers and activists alike. 
For instance, the Frictionless toolbox can help data creators generate clean data that is easy to understand, share, and use, and the new Frictionless tool Livemark can help data consumers easily share climate data with impactful visualisations and narratives.")]),e._v(" "),a("p",[e._v("Another theme that emerged from the brainstorm sessions was the role data plays in generating knowledge versus the role knowledge plays in generating data, and how this interplay can be maximised to create change. For instance, "),a("strong",[e._v("we need to take a hard look at how “open” replicates cycles of inequalities")]),e._v(". Several people brought up the great work citizen scientists are doing for climate research, but how these efforts are rarely recognised by governments or other official research channels. So much vital data on local impacts of climate change are being lost as they aren’t being incorporated into official datasets. How do we make data more equitable, ensuring that those being most affected by climate change can use data to tell their stories?")]),e._v(" "),a("p",[e._v("We call on data organisations, climate researchers, and activists to join us in these efforts. How can we best work together to solve pressing climate change issues? Would you like to partner with us for workshops, or do you have other ideas for collaborations? Let us know! We would like to give our utmost thanks to the organisations that joined our brainstorming sessions for paving the way in this important work. To continue planning this work, we are creating a space to talk in our Frictionless Data community chat, and we invite all interested parties to join us. We are currently migrating our community from Discord to Slack. 
We encourage you to join the Slack channel, which will soon be populated with all Frictionless community members: "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-14x9bxnkm-2y~uQcmmrqarSP2kV39_Kg",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://join.slack.com/t/frictionlessdata/shared_invite/zt-14x9bxnkm-2y~uQcmmrqarSP2kV39_Kg"),a("OutboundLink")],1),a("br"),e._v("\n(We also have a Matrix mirror if you prefer Matrix: "),a("a",{attrs:{href:"https://matrix.to/#/#frictionless-data:matrix.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://matrix.to/#/#frictionless-data:matrix.org"),a("OutboundLink")],1),e._v(")")]),e._v(" "),a("p",[e._v("Finally, we’d like to share this list of resources that attendees shared during the calls:")]),e._v(" "),a("ul",[a("li",[e._v("Patrick J McGovern Data for Climate 2022 Accelerator: "),a("a",{attrs:{href:"https://www.mcgovern.org/foundation-awards-4-5m-including-new-accelerator-grants-to-advance-data-driven-climate-solutions/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.mcgovern.org/foundation-awards-4-5m-including-new-accelerator-grants-to-advance-data-driven-climate-solutions/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Open Climate: "),a("a",{attrs:{href:"https://www.appropedia.org/OpenClimate",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.appropedia.org/OpenClimate"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Environmental Data and Governance Initiative: "),a("a",{attrs:{href:"https://envirodatagov.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://envirodatagov.org/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Earth Science Information Partners: "),a("a",{attrs:{href:"https://www.esipfed.org/about",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.esipfed.org/about"),a("OutboundLink")],1),a("br"),e._v("\nCourse on environmental data journalism by School of Data Brazil: 
"),a("a",{attrs:{href:"https://escoladedados.org/courses/jornalismo-de-dados-ambientais/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://escoladedados.org/courses/jornalismo-de-dados-ambientais/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Catalogue of environmental databases in Brazil by School of Data Brazil: "),a("a",{attrs:{href:"https://bit.ly/dados-ambientais",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://bit.ly/dados-ambientais"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("A monthly meetup for small companies to share best practices (and data): "),a("a",{attrs:{href:"https://climatiq.io/blog/climate-action-net-zero-ambition-best-practices-for-sme",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://climatiq.io/blog/climate-action-net-zero-ambition-best-practices-for-sme"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Reddit Datasets: "),a("a",{attrs:{href:"https://www.reddit.com/r/datasets/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.reddit.com/r/datasets/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Hardware information standard: "),a("a",{attrs:{href:"https://barbal.co/the-open-know-how-manifest-specification-version-1-0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://barbal.co/the-open-know-how-manifest-specification-version-1-0/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Catalyst Cooperative: "),a("a",{attrs:{href:"https://github.com/catalyst-cooperative/pudl",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/catalyst-cooperative/pudl"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://zenodo.org/communities/catalyst-cooperative/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://zenodo.org/communities/catalyst-cooperative/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Research Data Alliance Agriculture: "),a("a",{attrs:{href:"https://www.rd-alliance.org/rda-disciplines/rda-and-agriculture",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("https://www.rd-alliance.org/rda-disciplines/rda-and-agriculture"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Open Climate Now!: "),a("a",{attrs:{href:"https://branch.climateaction.tech/issues/issue-2/open-climate-now/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://branch.climateaction.tech/issues/issue-2/open-climate-now/"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Metadata Game Changers: "),a("a",{attrs:{href:"https://metadatagamechangers.com",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://metadatagamechangers.com"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("Excellent lecture by J McGlade bridging attitudes etc. to the data story and behaviour change effects: "),a("a",{attrs:{href:"https://www.youtube.com/watch?v=eIRlLlrnmBM&t=1561s",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.youtube.com/watch?v=eIRlLlrnmBM&t=1561s"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("The Integrated-Assessment Modeling Community (IAMC) is developing a Python package “pyam” for scenario analysis & data visualization: "),a("a",{attrs:{href:"https://pyam-iamc.readthedocs.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://pyam-iamc.readthedocs.io"),a("OutboundLink")],1)]),e._v(" "),a("li",[e._v("IIASA is hosting numerous scenario ensemble databases, see "),a("a",{attrs:{href:"https://data.ece.iiasa.ac.at",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://data.ece.iiasa.ac.at"),a("OutboundLink")],1),e._v(", most importantly the scenario ensemble supporting the quantitative assessment in the IPCC 1.5°C Special Report (2018), and a similar database will be released in two months together with IPCC AR6 WG3")]),e._v(" "),a("li",[e._v("Letter to IEA by the openmod community, "),a("a",{attrs:{href:"https://forum.openmod.org/t/open-letter-to-iea-and-member-countries-requesting-open-data/2949",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("https://forum.openmod.org/t/open-letter-to-iea-and-member-countries-requesting-open-data/2949"),a("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/156.8748bb31.js b/assets/js/156.8334f792.js similarity index 98% rename from assets/js/156.8748bb31.js rename to assets/js/156.8334f792.js index 341c3c642..630701e79 100644 --- a/assets/js/156.8748bb31.js +++ b/assets/js/156.8334f792.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[156],{688:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last community call on March 31"),a("sup",[e._v("st")]),e._v(", we had a discussion with Johan Richer from "),a("a",{attrs:{href:"https://www.multi.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Multi"),a("OutboundLink")],1),e._v(" around his theory of portal vs catalogue.")]),e._v(" "),a("p",[e._v("The discussion started with a presentation of the latest catalogue prototype by "),a("a",{attrs:{href:"https://www.data.gouv.fr/fr/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Etalab"),a("OutboundLink")],1),e._v(" currently in development: "),a("a",{attrs:{href:"https://github.com/etalab/catalogage-donnees",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/etalab/catalogage-donnees"),a("OutboundLink")],1),e._v(". Data cataloguing has become a major component of open data policies in France, but there are issues related to the maintainability of the catalogue and the traceability of the data.")]),e._v(" "),a("p",[e._v("In the beginning the data producers were also the data publishers, and therefore the purpose of a portal was to catalogue, publish, and store the data. Recently the process became more complicated, and the cataloguing became a prerequisite to publication. 
Instead of publishing by default, data producers want to make sure that the data is clean before injecting it into the portal. This started a new workflow of internal data management, that the portals were not made for. So how can we restore the broken link between catalogue and portal? Johan thinks data lineage is key.")]),e._v(" "),a("p",[e._v("If you want to know more about it, you can go and have a look at Johan’s presentation "),a("a",{attrs:{href:"https://jailbreak.gitlab.io/investigation-catalogue/synthese.html#/3",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(" (in French, but "),a("a",{attrs:{href:"https://jailbreak-gitlab-io.translate.goog/investigation-catalogue/synthese.html?_x_tr_sl=fr&_x_tr_tl=en#/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here’s a shortcut to the Google translation"),a("OutboundLink")],1),e._v(" if you’d rather have it in English), or watch the recording:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/MvrMJhn4xMo",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h1",{attrs:{id:"news-from-the-community"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the community")]),e._v(" "),a("p",[e._v("Our community chat has moved from Discord to Slack! In the community survey we ran last year, many people suggested moving to Slack, and the terms of services are definitely better (ranking B vs E for Discord, according to "),a("a",{attrs:{href:"https://tosdr.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://tosdr.org/"),a("OutboundLink")],1),e._v(" ). 
We will also be able to organise the questions & answer better, and that will definitely be an added value for the community.")]),e._v(" "),a("p",[e._v("To join our community chat: "),a("a",{attrs:{href:"https://frictionlessdata.slack.com/messages/general",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.slack.com/messages/general"),a("OutboundLink")],1)]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on April 28"),a("sup",[e._v("th")]),e._v(". We are going to hear about open science practices at the Turing Way from former Frictionless Fellow Anne Lee Steele."),a("br"),e._v("\nYou can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/CCa0g-hYSUg",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("Join us on "),a("a",{attrs:{href:"https://frictionlessdata.slack.com/messages/general",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[156],{689:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last community call on March 31"),a("sup",[e._v("st")]),e._v(", we had a discussion with Johan Richer from "),a("a",{attrs:{href:"https://www.multi.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Multi"),a("OutboundLink")],1),e._v(" around his theory of portal vs catalogue.")]),e._v(" "),a("p",[e._v("The discussion started with a presentation of the latest catalogue prototype by "),a("a",{attrs:{href:"https://www.data.gouv.fr/fr/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Etalab"),a("OutboundLink")],1),e._v(" currently in development: "),a("a",{attrs:{href:"https://github.com/etalab/catalogage-donnees",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("https://github.com/etalab/catalogage-donnees"),a("OutboundLink")],1),e._v(". Data cataloguing has become a major component of open data policies in France, but there are issues related to the maintainability of the catalogue and the traceability of the data.")]),e._v(" "),a("p",[e._v("In the beginning the data producers were also the data publishers, and therefore the purpose of a portal was to catalogue, publish, and store the data. Recently the process became more complicated, and the cataloguing became a prerequisite to publication. Instead of publishing by default, data producers want to make sure that the data is clean before injecting it into the portal. This started a new workflow of internal data management, that the portals were not made for. So how can we restore the broken link between catalogue and portal? Johan thinks data lineage is key.")]),e._v(" "),a("p",[e._v("If you want to know more about it, you can go and have a look at Johan’s presentation "),a("a",{attrs:{href:"https://jailbreak.gitlab.io/investigation-catalogue/synthese.html#/3",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(" (in French, but "),a("a",{attrs:{href:"https://jailbreak-gitlab-io.translate.goog/investigation-catalogue/synthese.html?_x_tr_sl=fr&_x_tr_tl=en#/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here’s a shortcut to the Google translation"),a("OutboundLink")],1),e._v(" if you’d rather have it in English), or watch the recording:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/MvrMJhn4xMo",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("h1",{attrs:{id:"news-from-the-community"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the community")]),e._v(" "),a("p",[e._v("Our community 
chat has moved from Discord to Slack! In the community survey we ran last year, many people suggested moving to Slack, and the terms of services are definitely better (ranking B vs E for Discord, according to "),a("a",{attrs:{href:"https://tosdr.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://tosdr.org/"),a("OutboundLink")],1),e._v(" ). We will also be able to organise the questions & answer better, and that will definitely be an added value for the community.")]),e._v(" "),a("p",[e._v("To join our community chat: "),a("a",{attrs:{href:"https://frictionlessdata.slack.com/messages/general",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.slack.com/messages/general"),a("OutboundLink")],1)]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on April 28"),a("sup",[e._v("th")]),e._v(". We are going to hear about open science practices at the Turing Way from former Frictionless Fellow Anne Lee Steele."),a("br"),e._v("\nYou can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),a("OutboundLink")],1)]),e._v(" "),a("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call recording:")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/CCa0g-hYSUg",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("Join us on "),a("a",{attrs:{href:"https://frictionlessdata.slack.com/messages/general",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),a("OutboundLink")],1),e._v(" or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(" to say hi or ask any questions. See you there!")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/157.a603a66c.js b/assets/js/157.ea1df658.js similarity index 98% rename from assets/js/157.a603a66c.js rename to assets/js/157.ea1df658.js index 6c93c0bc9..3c68a69fb 100644 --- a/assets/js/157.a603a66c.js +++ b/assets/js/157.ea1df658.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[157],{693:function(e,t,o){"use strict";o.r(t);var r=o(29),a=Object(r.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("At our last community call on April 28"),o("sup",[e._v("th")]),e._v(", we had a discussion around open science best practices and the Turing Way with Anne Lee Steele, who - you might remember, was part of the "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/09/01/hello-fellows-cohort2/",target:"_blank",rel:"noopener noreferrer"}},[e._v("second cohort of Frictionless Fellows"),o("OutboundLink")],1),e._v(".")]),e._v(" 
"),o("p",[e._v("The Turing Way is an open source and community-led handbook for reproducible, ethical and collaborative research. It is composed of more than 240 pages created by ~300 researchers over the course of 3 years, written collaboratively via GitHub PRs - contrasting to the notion of single/small-authorship papers.")]),e._v(" "),o("p",[e._v("There is currently an effort to make the Turing way develop meta-practices that can be applied to other areas as well, one example is documentation.")]),e._v(" "),o("p",[e._v("A great outcome of the call was the proposal to have a closer cooperation between the Frictionless Data community and the Turing Way’s one, possibly developing a chapter for Open Infrastructures for research to contribute upstream. This chapter would set the context and provide a vision for how to evaluate tools and platforms with a Turing Way perspective on reproducibility, ethical alternatives and collaboration in practice. For more info about this proposal, check "),o("a",{attrs:{href:"https://github.com/alan-turing-institute/the-turing-way/issues/2337",target:"_blank",rel:"noopener noreferrer"}},[e._v("this issue"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("If you want to know more about the Turing Way, have a look at the "),o("a",{attrs:{href:"https://the-turing-way.netlify.app/welcome.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("project website"),o("OutboundLink")],1),e._v(". 
You can also check out the full recording of the call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/-RyRFcMAGCE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("h1",{attrs:{id:"news-from-the-community"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the community")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("You’re all invited to join the Frictionless Fellows for a free virtual workshop on Open Science best practices on May 25 at 2pm UTC!"),o("br"),e._v("\nIn this beginner-friendly workshop, Fellows will demonstrate how to use the Frictionless tools to make research data more understandable, usable, and open. You will learn how to use the Frictionless non-coding tools to manipulate metadata and schemas (and why that is important!) and how to validate data in a hands-on format. Learn more & sign up on the Fellows website: "),o("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://fellows.frictionlessdata.io/"),o("OutboundLink")],1),e._v(".")])]),e._v(" "),o("li",[o("p",[e._v("Reminder that our community chat has moved to Slack. Join us "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("there"),o("OutboundLink")],1),e._v(". 
We now also have a fully operating "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(", so if you prefer you can join us from there as well.")])])]),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Next community call is on May 26"),o("sup",[e._v("th")]),e._v(". We are going to hear Nick Kellett from Deploy Solutions explain to us how to build citizen science and climate change solutions, using Frictionless.")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),o("OutboundLink")],1)]),e._v(" "),o("p",[e._v("Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),o("p",[e._v("Join us on "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),o("OutboundLink")],1),e._v(" (also via "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),o("OutboundLink")],1),e._v(") or "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[157],{691:function(e,t,o){"use strict";o.r(t);var r=o(29),a=Object(r.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("At our last community call on April 28"),o("sup",[e._v("th")]),e._v(", we had a discussion around open science best practices and the Turing Way with Anne Lee Steele, who - you might remember, was part of the "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/09/01/hello-fellows-cohort2/",target:"_blank",rel:"noopener noreferrer"}},[e._v("second cohort of Frictionless Fellows"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("The Turing Way is an open source and community-led handbook for reproducible, ethical and collaborative research. It is composed of more than 240 pages created by ~300 researchers over the course of 3 years, written collaboratively via GitHub PRs - contrasting to the notion of single/small-authorship papers.")]),e._v(" "),o("p",[e._v("There is currently an effort to make the Turing way develop meta-practices that can be applied to other areas as well, one example is documentation.")]),e._v(" "),o("p",[e._v("A great outcome of the call was the proposal to have a closer cooperation between the Frictionless Data community and the Turing Way’s one, possibly developing a chapter for Open Infrastructures for research to contribute upstream. This chapter would set the context and provide a vision for how to evaluate tools and platforms with a Turing Way perspective on reproducibility, ethical alternatives and collaboration in practice. 
For more info about this proposal, check "),o("a",{attrs:{href:"https://github.com/alan-turing-institute/the-turing-way/issues/2337",target:"_blank",rel:"noopener noreferrer"}},[e._v("this issue"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("If you want to know more about the Turing Way, have a look at the "),o("a",{attrs:{href:"https://the-turing-way.netlify.app/welcome.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("project website"),o("OutboundLink")],1),e._v(". You can also check out the full recording of the call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/-RyRFcMAGCE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("h1",{attrs:{id:"news-from-the-community"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#news-from-the-community"}},[e._v("#")]),e._v(" News from the community")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("You’re all invited to join the Frictionless Fellows for a free virtual workshop on Open Science best practices on May 25 at 2pm UTC!"),o("br"),e._v("\nIn this beginner-friendly workshop, Fellows will demonstrate how to use the Frictionless tools to make research data more understandable, usable, and open. You will learn how to use the Frictionless non-coding tools to manipulate metadata and schemas (and why that is important!) and how to validate data in a hands-on format. Learn more & sign up on the Fellows website: "),o("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://fellows.frictionlessdata.io/"),o("OutboundLink")],1),e._v(".")])]),e._v(" "),o("li",[o("p",[e._v("Reminder that our community chat has moved to Slack. 
Join us "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("there"),o("OutboundLink")],1),e._v(". We now also have a fully operating "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(", so if you prefer you can join us from there as well.")])])]),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Next community call is on May 26"),o("sup",[e._v("th")]),e._v(". We are going to hear Nick Kellett from Deploy Solutions explain to us how to build citizen science and climate change solutions, using Frictionless.")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),o("OutboundLink")],1)]),e._v(" "),o("p",[e._v("Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),o("p",[e._v("Join us on "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),o("OutboundLink")],1),e._v(" (also via "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),o("OutboundLink")],1),e._v(") or "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(" to say hi or ask any questions. 
See you there!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/158.49452f10.js b/assets/js/158.c499164d.js similarity index 99% rename from assets/js/158.49452f10.js rename to assets/js/158.c499164d.js index 79fcd5122..bb3463bd3 100644 --- a/assets/js/158.49452f10.js +++ b/assets/js/158.c499164d.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[158],{692:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("Is reproducing someone else’s research data a Frictionless experience? As we have seen with all the previous cohorts of Frictionless Fellows (you can read the blog "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/03/10/fellows-reproducing/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v("), most often than not it is sadly not the case.")]),e._v(" "),a("p",[e._v("To prove that the “reproducibility crisis” is a real problem in scientific research at the moment, we challenged the Fellows to exchange their data to see if they could reproduce each other’s Data Packages. Read about their experience:")]),e._v(" "),a("h2",{attrs:{id:"melvin"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#melvin"}},[e._v("#")]),e._v(" Melvin")]),e._v(" "),a("p",[e._v("We had an interesting task for our frictionless fellow activity that involved exchanging our data sets with our fellow colleagues (pairwise) and trying to reproduce their work. 
My partner for this assignment was Lindsay, who is a librarian.")]),e._v(" "),a("p",[e._v("In data science, replicability and reproducibility are some of the keys to data "),a("a",{attrs:{href:"http://integrity.It",target:"_blank",rel:"noopener noreferrer"}},[e._v("integrity.It"),a("OutboundLink")],1),e._v(" creates more opportunities for new insights and reduces errors. In order to ensure reproducibility of data, one must first make sure that the raw data is available. In this regard, my partner Lindsay shared with me her data that was on her Github account to facilitate the process.")]),e._v(" "),a("p",[e._v("This process and activity were really useful and humbling. As we got to discuss our data sets with Lindsay, I realized key things such as Tidy data principles, which was the highlight for me in this whole process, besides the point that it’s not easy to understand someone else’s data without further metadata to accompany the data set. Imagine the frustration researchers go through trying to understand and reproduce other people’s data without more information on the data."),a("br"),e._v(" "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/melvin-trade-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Read Melvin’s blog"),a("OutboundLink")],1),e._v(" to see how she managed to reproduce her fellows’ data package.")]),e._v(" "),a("h2",{attrs:{id:"victoria"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#victoria"}},[e._v("#")]),e._v(" Victoria")]),e._v(" "),a("p",[e._v("My data package partner, Zarena is an awesome social scientist in the human rights sphere. She has a background in mental health research and interests ranging from epistemic injustice to intersectionality - two terms I had to double check my understanding of. 
In poking around Zarena’s profile, particularly interesting was her focus on "),a("a",{attrs:{href:"https://www.universityaffairs.ca/features/feature-article/mad-studies/",target:"_blank",rel:"noopener noreferrer"}},[e._v("mad studies"),a("OutboundLink")],1),e._v(", a young interdisciplinary field dealing with identity and the marginalisation of individuals with alternative mental states. This idea - broadly accepting a spectrum of human states instead of subjecting them to a black/white absolute interpretation - was completely new to me and fascinating! But being a social theory noob, I suspected to encounter a barrier to understanding her data.")]),e._v(" "),a("p",[e._v("Zarena’s data was publicly available in her GitHub fellows repository. I clocked a couple of things off the bat: the repo contained a csv called “data-dp.csv”, as well as a "),a("a",{attrs:{href:"http://README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("README.md"),a("OutboundLink")],1),e._v(" and several schema files. When in doubt of where to start, a good place to look is the README."),a("br"),e._v(" "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/victoria-trade-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Read Victoria’s blog"),a("OutboundLink")],1),e._v(" to see how she reproduced her fellow’s data packages.")]),e._v(" "),a("h2",{attrs:{id:"kevin"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#kevin"}},[e._v("#")]),e._v(" Kevin")]),e._v(" "),a("p",[e._v("Data reproducibility is where other researchers use same data to attain the same results by using same methods. Research reproducibility allows other scientist to gain new insights from your data as well as improve quality of research by checking the correctness of your findings. 
The aim of this assignment was to try and reproduce my colleague’s data package and validate the tabular data using frictionless browser tools, that is, data package creator and good tables, respectively.")]),e._v(" "),a("p",[e._v("First, Guo-Qiang shared the links to his datasets and the data package to me which I freely accessed from his GitHub repository. His data was a summary of clinical evidence of various health effects of menopausal hormone therapy in menopausal women."),a("br"),e._v(" "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/kk-data-trading-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Read Kevin’s blog"),a("OutboundLink")],1),e._v(" to see how he managed to reproduce Guo-Qiang’s datapackages.")]),e._v(" "),a("h2",{attrs:{id:"zarena"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#zarena"}},[e._v("#")]),e._v(" Zarena")]),e._v(" "),a("p",[e._v("Before joining the Frictionless Data Fellowship Programme, I did not realise the importance of research reproducibility. To tell the truth, I really did not have such a concept in my professional vocabulary despite having an MSc degree in Social Science Research Methods and working in different social research projects. But, maybe, that was the reason why I did not know this concept and never practised it in my research projects. Like many of my social science colleagues, especially the ones working with qualitative - and often sensitive - data, for me it was important to ensure that data I collect are safely stored in a password-protected platform and then - upon completion of a project - are deleted. 
But now working for the Frictionless Data Fellowship Programme and managing different sorts of data, including bibliometric metadata, I see that if we want social sciences and humanities to progress, it is vital to integrate such practices as reproducing, replicating, and reusing data into our research.")]),e._v(" "),a("p",[e._v("So, in "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/zarena-trade-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("this blog"),a("OutboundLink")],1),e._v(", I will try to explain my first attempt to reproduce my Frictionless fellow’s dataset, which is openly shared in "),a("a",{attrs:{href:"https://github.com/vyelnats/frictionless-v",target:"_blank",rel:"noopener noreferrer"}},[e._v("the GitHub repository"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"lindsay"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#lindsay"}},[e._v("#")]),e._v(" Lindsay")]),e._v(" "),a("p",[e._v("Our most recent Fricitonless Fellows project is to trade data and create a Data Package using another Fellow’s data. I traded data with the fabulous Melvin! Melvin is a pathologist and soil scientist.")]),e._v(" "),a("p",[e._v("While this seems like a fun project, I was frustrated at first. I had to find my partner’s data. After reading her "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/melvin-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Blog"),a("OutboundLink")],1),e._v(" and poking around "),a("a",{attrs:{href:"https://github.com/frictionlessdata/fellows",target:"_blank",rel:"noopener noreferrer"}},[e._v("on GitHub"),a("OutboundLink")],1),e._v(", I could not find her data. I eventually realized: we are mimicking the process of reusing reproducible research data. 
The first hurdle any researcher must overcome is finding the data."),a("br"),e._v(" "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lindsay-trade-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Read Lindsay’s blog"),a("OutboundLink")],1),e._v(" to understand what happened while reproducing Melvin’s Data Packages.")]),e._v(" "),a("h1",{attrs:{id:"about-the-frictionless-data-fellowship"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#about-the-frictionless-data-fellowship"}},[e._v("#")]),e._v(" About the Frictionless Data Fellowship")]),e._v(" "),a("p",[e._v("With the Frictionless Data Reproducible Research Fellows Programme, supported by the Sloan Foundation and Open Knowledge Foundation, we are recruiting and training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. To know more about the programme, visit "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the dedicated website"),a("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[158],{690:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("Is reproducing someone else’s research data a Frictionless experience? 
As we have seen with all the previous cohorts of Frictionless Fellows (you can read the blog "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/03/10/fellows-reproducing/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v("), most often than not it is sadly not the case.")]),e._v(" "),a("p",[e._v("To prove that the “reproducibility crisis” is a real problem in scientific research at the moment, we challenged the Fellows to exchange their data to see if they could reproduce each other’s Data Packages. Read about their experience:")]),e._v(" "),a("h2",{attrs:{id:"melvin"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#melvin"}},[e._v("#")]),e._v(" Melvin")]),e._v(" "),a("p",[e._v("We had an interesting task for our frictionless fellow activity that involved exchanging our data sets with our fellow colleagues (pairwise) and trying to reproduce their work. My partner for this assignment was Lindsay, who is a librarian.")]),e._v(" "),a("p",[e._v("In data science, replicability and reproducibility are some of the keys to data "),a("a",{attrs:{href:"http://integrity.It",target:"_blank",rel:"noopener noreferrer"}},[e._v("integrity.It"),a("OutboundLink")],1),e._v(" creates more opportunities for new insights and reduces errors. In order to ensure reproducibility of data, one must first make sure that the raw data is available. In this regard, my partner Lindsay shared with me her data that was on her Github account to facilitate the process.")]),e._v(" "),a("p",[e._v("This process and activity were really useful and humbling. As we got to discuss our data sets with Lindsay, I realized key things such as Tidy data principles, which was the highlight for me in this whole process, besides the point that it’s not easy to understand someone else’s data without further metadata to accompany the data set. 
Imagine the frustration researchers go through trying to understand and reproduce other people’s data without more information on the data."),a("br"),e._v(" "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/melvin-trade-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Read Melvin’s blog"),a("OutboundLink")],1),e._v(" to see how she managed to reproduce her fellows’ data package.")]),e._v(" "),a("h2",{attrs:{id:"victoria"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#victoria"}},[e._v("#")]),e._v(" Victoria")]),e._v(" "),a("p",[e._v("My data package partner, Zarena is an awesome social scientist in the human rights sphere. She has a background in mental health research and interests ranging from epistemic injustice to intersectionality - two terms I had to double check my understanding of. In poking around Zarena’s profile, particularly interesting was her focus on "),a("a",{attrs:{href:"https://www.universityaffairs.ca/features/feature-article/mad-studies/",target:"_blank",rel:"noopener noreferrer"}},[e._v("mad studies"),a("OutboundLink")],1),e._v(", a young interdisciplinary field dealing with identity and the marginalisation of individuals with alternative mental states. This idea - broadly accepting a spectrum of human states instead of subjecting them to a black/white absolute interpretation - was completely new to me and fascinating! But being a social theory noob, I suspected to encounter a barrier to understanding her data.")]),e._v(" "),a("p",[e._v("Zarena’s data was publicly available in her GitHub fellows repository. I clocked a couple of things off the bat: the repo contained a csv called “data-dp.csv”, as well as a "),a("a",{attrs:{href:"http://README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("README.md"),a("OutboundLink")],1),e._v(" and several schema files. 
When in doubt of where to start, a good place to look is the README."),a("br"),e._v(" "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/victoria-trade-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Read Victoria’s blog"),a("OutboundLink")],1),e._v(" to see how she reproduced her fellow’s data packages.")]),e._v(" "),a("h2",{attrs:{id:"kevin"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#kevin"}},[e._v("#")]),e._v(" Kevin")]),e._v(" "),a("p",[e._v("Data reproducibility is where other researchers use same data to attain the same results by using same methods. Research reproducibility allows other scientist to gain new insights from your data as well as improve quality of research by checking the correctness of your findings. The aim of this assignment was to try and reproduce my colleague’s data package and validate the tabular data using frictionless browser tools, that is, data package creator and good tables, respectively.")]),e._v(" "),a("p",[e._v("First, Guo-Qiang shared the links to his datasets and the data package to me which I freely accessed from his GitHub repository. His data was a summary of clinical evidence of various health effects of menopausal hormone therapy in menopausal women."),a("br"),e._v(" "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/kk-data-trading-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Read Kevin’s blog"),a("OutboundLink")],1),e._v(" to see how he managed to reproduce Guo-Qiang’s datapackages.")]),e._v(" "),a("h2",{attrs:{id:"zarena"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#zarena"}},[e._v("#")]),e._v(" Zarena")]),e._v(" "),a("p",[e._v("Before joining the Frictionless Data Fellowship Programme, I did not realise the importance of research reproducibility. To tell the truth, I really did not have such a concept in my professional vocabulary despite having an MSc degree in Social Science Research Methods and working in different social research projects. 
But, maybe, that was the reason why I did not know this concept and never practised it in my research projects. Like many of my social science colleagues, especially the ones working with qualitative - and often sensitive - data, for me it was important to ensure that data I collect are safely stored in a password-protected platform and then - upon completion of a project - are deleted. But now working for the Frictionless Data Fellowship Programme and managing different sorts of data, including bibliometric metadata, I see that if we want social sciences and humanities to progress, it is vital to integrate such practices as reproducing, replicating, and reusing data into our research.")]),e._v(" "),a("p",[e._v("So, in "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/zarena-trade-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("this blog"),a("OutboundLink")],1),e._v(", I will try to explain my first attempt to reproduce my Frictionless fellow’s dataset, which is openly shared in "),a("a",{attrs:{href:"https://github.com/vyelnats/frictionless-v",target:"_blank",rel:"noopener noreferrer"}},[e._v("the GitHub repository"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h2",{attrs:{id:"lindsay"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#lindsay"}},[e._v("#")]),e._v(" Lindsay")]),e._v(" "),a("p",[e._v("Our most recent Fricitonless Fellows project is to trade data and create a Data Package using another Fellow’s data. I traded data with the fabulous Melvin! Melvin is a pathologist and soil scientist.")]),e._v(" "),a("p",[e._v("While this seems like a fun project, I was frustrated at first. I had to find my partner’s data. 
After reading her "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/melvin-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Blog"),a("OutboundLink")],1),e._v(" and poking around "),a("a",{attrs:{href:"https://github.com/frictionlessdata/fellows",target:"_blank",rel:"noopener noreferrer"}},[e._v("on GitHub"),a("OutboundLink")],1),e._v(", I could not find her data. I eventually realized: we are mimicking the process of reusing reproducible research data. The first hurdle any researcher must overcome is finding the data."),a("br"),e._v(" "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lindsay-trade-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Read Lindsay’s blog"),a("OutboundLink")],1),e._v(" to understand what happened while reproducing Melvin’s Data Packages.")]),e._v(" "),a("h1",{attrs:{id:"about-the-frictionless-data-fellowship"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#about-the-frictionless-data-fellowship"}},[e._v("#")]),e._v(" About the Frictionless Data Fellowship")]),e._v(" "),a("p",[e._v("With the Frictionless Data Reproducible Research Fellows Programme, supported by the Sloan Foundation and Open Knowledge Foundation, we are recruiting and training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. 
To know more about the programme, visit "),a("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the dedicated website"),a("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/159.3f041c20.js b/assets/js/159.135211ae.js similarity index 98% rename from assets/js/159.3f041c20.js rename to assets/js/159.135211ae.js index 50b00f3bb..ee24bca3c 100644 --- a/assets/js/159.3f041c20.js +++ b/assets/js/159.135211ae.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[159],{695:function(e,t,o){"use strict";o.r(t);var a=o(29),i=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("At our last community call on May 28"),o("sup",[e._v("th")]),e._v(", we heard about citizen science and climate change solutions using Frictionless Data from Nick Kellett, Pan Khantidhara and Justin Mosbey from "),o("a",{attrs:{href:"https://www.deploy.solutions/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Deploy Solutions"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Deploy Solutions builds software that can help with climate change disruptions, and they are using Frictionless Data to help! 
They develop cloud-hosted solutions using big data from satellites, and, since 2019, they have adopted a citizen focus in climate change research."),o("br"),e._v("\nThey researched and identified the main problems that prevent people and communities from acting in case of climate change disasters:")]),e._v(" "),o("ul",[o("li",[e._v("Citizens feel overwhelmed by the volume of information received.")]),e._v(" "),o("li",[e._v("They feel the information they get is not personalised to their needs.")]),e._v(" "),o("li",[e._v("Authorities have difficulties directly collaborating and sharing information with citizens.")])]),e._v(" "),o("p",[e._v("The solution they propose is the creation of a complete map-centred web-application that can be built very quickly (~4 hours) with basic functionalities to provide basic and reliable information for disaster response, while allowing users to upload citizen science observations.")]),e._v(" "),o("p",[e._v("The app takes Earth observations imagery from satellites, and associates them with imagery that citizens are taking on the ground, to check that the machine learning algorithms applied are correctly predicting the disaster extent.")]),e._v(" "),o("p",[e._v("It also visualises the data coming in to look for trends, gathering historic data and comparing with what is predicted. The quantity of information needed for such an app is huge, and most often than not, it comes from different sources and does not follow any standards. It is therefore tricky to describe it and validate it. 
You might have guessed it by now, Frictionless Data is helping with that.")]),e._v(" "),o("p",[e._v("If you are interested in knowing more about Deploy Solutions and how they are using Frictionless Data, you can watch the full presentation (including Pan Khantidhara’s demo!):")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/CSvVbl8Egqk",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v("If you have questions or feedback, you can let us know in "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),o("OutboundLink")],1),e._v(", or you can reach out to Deploy Solutions directly.")]),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Next community call is on June 30"),o("sup",[e._v("th")]),e._v(". Join us to meet the 3"),o("sup",[e._v("rd")]),e._v(" cohort of Frictionless Fellows and hear about their reproducibility and open science journey!")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),o("OutboundLink")],1)]),e._v(" "),o("p",[e._v("Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),o("p",[e._v("Would you like to present at one of the next community calls? 
Please fill out "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Join our community on "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),o("OutboundLink")],1),e._v(" (also via "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),o("OutboundLink")],1),e._v(") or "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(". See you there!")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:"),o("br"),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/RqAA8YCy1AU",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[159],{692:function(e,t,o){"use strict";o.r(t);var a=o(29),i=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("At our last community call on May 28"),o("sup",[e._v("th")]),e._v(", we heard about citizen science and climate change solutions using Frictionless Data from Nick Kellett, Pan Khantidhara and Justin Mosbey from "),o("a",{attrs:{href:"https://www.deploy.solutions/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Deploy 
Solutions"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Deploy Solutions builds software that can help with climate change disruptions, and they are using Frictionless Data to help! They develop cloud-hosted solutions using big data from satellites, and, since 2019, they have adopted a citizen focus in climate change research."),o("br"),e._v("\nThey researched and identified the main problems that prevent people and communities from acting in case of climate change disasters:")]),e._v(" "),o("ul",[o("li",[e._v("Citizens feel overwhelmed by the volume of information received.")]),e._v(" "),o("li",[e._v("They feel the information they get is not personalised to their needs.")]),e._v(" "),o("li",[e._v("Authorities have difficulties directly collaborating and sharing information with citizens.")])]),e._v(" "),o("p",[e._v("The solution they propose is the creation of a complete map-centred web-application that can be built very quickly (~4 hours) with basic functionalities to provide basic and reliable information for disaster response, while allowing users to upload citizen science observations.")]),e._v(" "),o("p",[e._v("The app takes Earth observations imagery from satellites, and associates them with imagery that citizens are taking on the ground, to check that the machine learning algorithms applied are correctly predicting the disaster extent.")]),e._v(" "),o("p",[e._v("It also visualises the data coming in to look for trends, gathering historic data and comparing with what is predicted. The quantity of information needed for such an app is huge, and most often than not, it comes from different sources and does not follow any standards. It is therefore tricky to describe it and validate it. 
You might have guessed it by now, Frictionless Data is helping with that.")]),e._v(" "),o("p",[e._v("If you are interested in knowing more about Deploy Solutions and how they are using Frictionless Data, you can watch the full presentation (including Pan Khantidhara’s demo!):")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/CSvVbl8Egqk",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v("If you have questions or feedback, you can let us know in "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),o("OutboundLink")],1),e._v(", or you can reach out to Deploy Solutions directly.")]),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Next community call is on June 30"),o("sup",[e._v("th")]),e._v(". Join us to meet the 3"),o("sup",[e._v("rd")]),e._v(" cohort of Frictionless Fellows and hear about their reproducibility and open science journey!")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here:"),o("OutboundLink")],1)]),e._v(" "),o("p",[e._v("Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),o("p",[e._v("Would you like to present at one of the next community calls? 
Please fill out "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Join our community on "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),o("OutboundLink")],1),e._v(" (also via "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),o("OutboundLink")],1),e._v(") or "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(". See you there!")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:"),o("br"),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/RqAA8YCy1AU",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/161.0c948627.js b/assets/js/161.37b24ce2.js similarity index 98% rename from assets/js/161.0c948627.js rename to assets/js/161.37b24ce2.js index 1b45bedea..4719e0fb5 100644 --- a/assets/js/161.0c948627.js +++ b/assets/js/161.37b24ce2.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[161],{697:function(e,a,t){"use strict";t.r(a);var n=t(29),s=Object(n.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("Originally published on: 
"),t("a",{attrs:{href:"https://blog.okfn.org/2022/07/05/frictionless-planet-and-lacuna-fund-discuss-gaps-in-climate-datasets-for-machine-learning/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2022/07/05/frictionless-planet-and-lacuna-fund-discuss-gaps-in-climate-datasets-for-machine-learning/"),t("OutboundLink")],1)]),e._v(" "),t("p",[e._v("On 24 June we hosted a conversation with the Lacuna Fund about datasets for climate change where we heard all about the Lacuna Fund’s recently launched Request for Proposals around Datasets for Climate Applications. We were joined by climate data users and creators from around the globe. This conversation is a part of Open Knowledge Foundation’s recent work on building a Frictionless Planet by using open tools and design principles to tackle the world’s largest problems, including climate change.")]),e._v(" "),t("p",[e._v("A lacuna is a gap, a blank space or a missing part of an item. Today there are gaps in the datasets that are available to train and evaluate machine learning models. This is especially true when it comes to specific populations and geographies. The Lacuna Fund was created to support data scientists in closing those gaps in machine learning datasets needed to better understand and tackle urgent problems in their communities, like those linked to the climate crisis.")]),e._v(" "),t("p",[e._v("Lacuna Fund is currently accepting proposals for two climate tracks: "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Energy-RFP-Final.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("Climate & Energy"),t("OutboundLink")],1),e._v(" and "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Health-RFP-Final.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("Climate & Health"),t("OutboundLink")],1),e._v(". 
The first track is looking at the intersection between energy, climate, and green recovery, and the second focuses on health and strategies to mitigate the impact of the climate crisis. Proposals should focus on machine learning datasets, either collecting and annotating new data, annotating and releasing existing data, or expanding existing datasets and increasing usability. Lacuna Fund’s guiding principles include equity, ethics, and participatory approach, and those values are very important for this work. Accordingly, proposals should include a plan for data management and licencing, privacy, and how the data will be shared. The target audience for this call is data scientists, with a focus on under-represented communities in Africa, Asia, and Latin America.")]),e._v(" "),t("p",[e._v("During the call, we also discussed if participants have specific data gaps in their fields, like a lack of data on how extreme heat events affect human health. The response was a strong “Yes”! Participants described working in “data deserts” where there is often missing data, leading to less accurate machine learning algorithms. Another common issue is data quality and trust in data, especially from “official” sources. Tackling data transparency will be important for creating impactful climate policy. 
We’d like to ask you the same question: If your group could have access to one data set that would have a large impact on your work, what is that data set?")]),e._v(" "),t("ul",[t("li",[e._v("If you are interested in applying for the Lacuna Fund’s open requests for proposals (RFP), please check out these resources here:")]),e._v(" "),t("li",[e._v("Apply page: "),t("a",{attrs:{href:"https://lacunafund.org/apply/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://lacunafund.org/apply/"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("Q&A (questions from potential applicants): "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/06/QA-Climate-2022.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/06/QA-Climate-2022.pdf"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("RFP for Climate & Energy: "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Energy-RFP-Final.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Energy-RFP-Final.pdf"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("RFP for Climate & Health: "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Health-RFP-Final.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Health-RFP-Final.pdf"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("Applicant webinar recording: "),t("a",{attrs:{href:"https://vimeo.com/711365252",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://vimeo.com/711365252"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("Proposals are due 17th July")])])])}),[],!1,null,null,null);a.default=s.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[161],{695:function(e,a,t){"use strict";t.r(a);var 
n=t(29),s=Object(n.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("Originally published on: "),t("a",{attrs:{href:"https://blog.okfn.org/2022/07/05/frictionless-planet-and-lacuna-fund-discuss-gaps-in-climate-datasets-for-machine-learning/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2022/07/05/frictionless-planet-and-lacuna-fund-discuss-gaps-in-climate-datasets-for-machine-learning/"),t("OutboundLink")],1)]),e._v(" "),t("p",[e._v("On 24 June we hosted a conversation with the Lacuna Fund about datasets for climate change where we heard all about the Lacuna Fund’s recently launched Request for Proposals around Datasets for Climate Applications. We were joined by climate data users and creators from around the globe. This conversation is a part of Open Knowledge Foundation’s recent work on building a Frictionless Planet by using open tools and design principles to tackle the world’s largest problems, including climate change.")]),e._v(" "),t("p",[e._v("A lacuna is a gap, a blank space or a missing part of an item. Today there are gaps in the datasets that are available to train and evaluate machine learning models. This is especially true when it comes to specific populations and geographies. 
The Lacuna Fund was created to support data scientists in closing those gaps in machine learning datasets needed to better understand and tackle urgent problems in their communities, like those linked to the climate crisis.")]),e._v(" "),t("p",[e._v("Lacuna Fund is currently accepting proposals for two climate tracks: "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Energy-RFP-Final.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("Climate & Energy"),t("OutboundLink")],1),e._v(" and "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Health-RFP-Final.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("Climate & Health"),t("OutboundLink")],1),e._v(". The first track is looking at the intersection between energy, climate, and green recovery, and the second focuses on health and strategies to mitigate the impact of the climate crisis. Proposals should focus on machine learning datasets, either collecting and annotating new data, annotating and releasing existing data, or expanding existing datasets and increasing usability. Lacuna Fund’s guiding principles include equity, ethics, and participatory approach, and those values are very important for this work. Accordingly, proposals should include a plan for data management and licencing, privacy, and how the data will be shared. The target audience for this call is data scientists, with a focus on under-represented communities in Africa, Asia, and Latin America.")]),e._v(" "),t("p",[e._v("During the call, we also discussed if participants have specific data gaps in their fields, like a lack of data on how extreme heat events affect human health. The response was a strong “Yes”! Participants described working in “data deserts” where there is often missing data, leading to less accurate machine learning algorithms. Another common issue is data quality and trust in data, especially from “official” sources. 
Tackling data transparency will be important for creating impactful climate policy. We’d like to ask you the same question: If your group could have access to one data set that would have a large impact on your work, what is that data set?")]),e._v(" "),t("ul",[t("li",[e._v("If you are interested in applying for the Lacuna Fund’s open requests for proposals (RFP), please check out these resources here:")]),e._v(" "),t("li",[e._v("Apply page: "),t("a",{attrs:{href:"https://lacunafund.org/apply/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://lacunafund.org/apply/"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("Q&A (questions from potential applicants): "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/06/QA-Climate-2022.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/06/QA-Climate-2022.pdf"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("RFP for Climate & Energy: "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Energy-RFP-Final.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Energy-RFP-Final.pdf"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("RFP for Climate & Health: "),t("a",{attrs:{href:"https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Health-RFP-Final.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://s31207.pcdn.co/wp-content/uploads/sites/11/2022/04/Climate-and-Health-RFP-Final.pdf"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("Applicant webinar recording: "),t("a",{attrs:{href:"https://vimeo.com/711365252",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://vimeo.com/711365252"),t("OutboundLink")],1)]),e._v(" "),t("li",[e._v("Proposals are due 17th July")])])])}),[],!1,null,null,null);a.default=s.exports}}]); \ No newline at end of file diff --git a/assets/js/162.c239e347.js 
b/assets/js/162.9db603ef.js similarity index 99% rename from assets/js/162.c239e347.js rename to assets/js/162.9db603ef.js index a0c614020..19df76c9f 100644 --- a/assets/js/162.c239e347.js +++ b/assets/js/162.9db603ef.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[162],{696:function(e,t,a){"use strict";a.r(t);var r=a(29),n=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("em",[e._v("Originally posted on: "),a("a",{attrs:{href:"https://medium.com/opendatacoop/announcing-flatterer-converting-structured-data-into-tabular-data-c4652eae27c9",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://medium.com/opendatacoop/announcing-flatterer-converting-structured-data-into-tabular-data-c4652eae27c9"),a("OutboundLink")],1)])]),e._v(" "),a("p",[a("em",[e._v("In this blog post, we introduce Flatterer - a new tool that helps convert JSON into tabular data. To hear more about Flatterer, "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("sign up"),a("OutboundLink")],1),e._v(" to join David Raznick at the Frictionless Data community call on July 28th.")])]),e._v(" "),a("p",[e._v("Open data needs to be available in formats people want to work with. In our experience at "),a("a",{attrs:{href:"https://opendataservices.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Services"),a("OutboundLink")],1),e._v(", we’ve found that developers often want access to structured data (for example, JSON) while analysts are used to working with flat data (in CSV files or tables).")]),e._v(" "),a("p",[e._v("More and more data is being published as JSON, but for most analysts this isn’t particularly useful. 
For many, working with JSON means needing to spend time converting the structured data into tables before they can get started.")]),e._v(" "),a("p",[e._v("That’s where "),a("a",{attrs:{href:"https://github.com/kindly/flatterer",target:"_blank",rel:"noopener noreferrer"}},[e._v("Flatterer"),a("OutboundLink")],1),e._v(" comes in. Flatterer is an opinionated JSON to CSV/XLSX/SQLITE/PARQUET converter. It helps people to convert JSON into relational, tabular data that can be easily analysed. It’s fast and memory efficient, and can be run either in the "),a("a",{attrs:{href:"https://flatterer.opendata.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("command line"),a("OutboundLink")],1),e._v(" or as a "),a("a",{attrs:{href:"https://deepnote.com/@david-raznick/Flatterer-Demo-FWeGccp_QKCu1WAEGQ0mEQ",target:"_blank",rel:"noopener noreferrer"}},[e._v("Python library"),a("OutboundLink")],1),e._v(". The Python library supports creating data frames for all the flattened data, making it easy to analyse and visualise.")]),e._v(" "),a("h2",{attrs:{id:"what-does-it-do"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-does-it-do"}},[e._v("#")]),e._v(" What does it do?")]),e._v(" "),a("p",[e._v("With Flatterer you can:")]),e._v(" "),a("ul",[a("li",[e._v("easily convert JSON to flat relational data such as CSV, XLSX, Database Tables, Pandas Dataframes and Parquet;")]),e._v(" "),a("li",[e._v("convert JSON into data packages, so you can use Frictionless data to convert into any database format;")]),e._v(" "),a("li",[e._v("create a data dictionary that contains metadata about the conversion, including fields contained in the dataset, to help you understand the data you are looking at;")]),e._v(" "),a("li",[e._v("create a new table for each one-to-many relationship, alongside _link fields that help to join the data together.")])]),e._v(" "),a("h2",{attrs:{id:"why-we-built-it"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-we-built-it"}},[e._v("#")]),e._v(" Why we 
built it")]),e._v(" "),a("p",[e._v("When you receive a JSON file where the structure is deeply nested or not well specified, it’s hard to determine what the data contains. Even if you know the JSON structure, it can still be time consuming to work out how to flatten the JSON into a relational structure for data analysis, and to be part of a data pipeline."),a("br"),e._v("\nFlatterer aims to be the first tool to go to when faced with this problem. Although you may still need to handwrite code, Flatterer has a number of benefits over most handwritten approaches:")]),e._v(" "),a("ul",[a("li",[e._v("it’s fast – written in Rust but with Python bindings for ease of use. It can be 10x faster than hand written Python flattening;")]),e._v(" "),a("li",[e._v("it’s memory efficient – flatterer uses a custom streaming JSON parser which means that a long list of objects nested with the JSON will be streamed, so less data needs to be loaded into memory at once;")]),e._v(" "),a("li",[e._v("it gives you fast, memory efficient output to CSV/XLSX/SQLITE/PARQUET;")]),e._v(" "),a("li",[e._v("it uses best practice that has been learnt from our experience flattening JSON countless times, such as generating keys to link one-to-many tables to their parents.")])]),e._v(" "),a("h2",{attrs:{id:"using-flatterer-in-the-openownership-data-pipeline"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#using-flatterer-in-the-openownership-data-pipeline"}},[e._v("#")]),e._v(" Using Flatterer in the OpenOwnership data pipeline")]),e._v(" "),a("p",[e._v("As an example, we’ve used "),a("a",{attrs:{href:"https://github.com/kindly/flatterer",target:"_blank",rel:"noopener noreferrer"}},[e._v("Flatterer"),a("OutboundLink")],1),e._v(" to help "),a("a",{attrs:{href:"https://www.openownership.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenOwnership"),a("OutboundLink")],1),e._v(" create a data pipeline to make information about who owns and controls companies available in a 
"),a("a",{attrs:{href:"https://bods-data.openownership.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("variety of data formats"),a("OutboundLink")],1),e._v(". In the example below, we’ve used Flatterer to convert beneficial ownership data from the Register of Enterprises of the Republic of Latvia and the OpenOwnership Register from JSON into CSV, SQLite, Postgresql, Big Query and Datasette formats.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/179058338-08ce8ea1-9b1f-4c4c-b59c-64b04cd450f6.png",alt:"img-1-flatterer"}})]),e._v(" "),a("p",[e._v("Alongside converting the data into different formats, Flatterer has created a data dictionary that shows the fields contained in the dataset, alongside the field type and field counts. In the example below, we show how this dictionary interprets person_statement fields contained in the Beneficial Ownership Data Standard.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/179058526-19694210-514e-4215-bf9d-f6abc7ef5400.png",alt:"img-2-flatterer"}})]),e._v(" "),a("p",[e._v("Finally, you can see Flatterer has created special _link fields, to help with joining the tables together. The example below shows how the _link field helps join "),a("a",{attrs:{href:"https://medium.com/opendatacoop/why-do-open-organisational-identifiers-matter-46af05ab30a",target:"_blank",rel:"noopener noreferrer"}},[e._v("entity identifiers"),a("OutboundLink")],1),e._v(" to statements about beneficial ownership.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/179058657-ae4ab534-9fdb-4d6d-ad59-56521f0218e0.png",alt:"img-3-flatterer"}})]),e._v(" "),a("h2",{attrs:{id:"what-s-next"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-s-next"}},[e._v("#")]),e._v(" What’s next?")]),e._v(" "),a("p",[e._v("Next, we’ll be working to make Flatterer more user friendly. 
We’ll be exploring creating a desktop interface, improving type guessing for fields, and giving more summary statistics about the input data. We welcome feedback on the tool through "),a("a",{attrs:{href:"https://github.com/kindly/flatterer/issues",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),a("OutboundLink")],1),e._v(", and are really interested to find out what kind of improvements you’d like to see.")]),e._v(" "),a("p",[e._v("More information about using Flatterer is available on "),a("a",{attrs:{href:"https://deepnote.com/@david-raznick/Flatterer-Demo-FWeGccp_QKCu1WAEGQ0mEQ",target:"_blank",rel:"noopener noreferrer"}},[e._v("deepnote"),a("OutboundLink")],1),e._v(". To hear more about Flatterer, you can join David Raznick at Frictionless Data’s monthly community call on July 28th.")]),e._v(" "),a("h4",{attrs:{id:"at-open-data-services-cooperative-we-re-always-happy-to-discuss-how-developing-or-implementing-open-data-standards-could-support-your-goals-find-out-more-about-our-work-and-get-in-touch"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#at-open-data-services-cooperative-we-re-always-happy-to-discuss-how-developing-or-implementing-open-data-standards-could-support-your-goals-find-out-more-about-our-work-and-get-in-touch"}},[e._v("#")]),e._v(" At Open Data Services Cooperative we’re always happy to discuss how developing or implementing open data standards could support your goals. 
Find out more about "),a("a",{attrs:{href:"https://opendataservices.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("our work"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://opendataservices.coop/#contact",target:"_blank",rel:"noopener noreferrer"}},[e._v("get in touch"),a("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[162],{697:function(e,t,a){"use strict";a.r(t);var r=a(29),n=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("em",[e._v("Originally posted on: "),a("a",{attrs:{href:"https://medium.com/opendatacoop/announcing-flatterer-converting-structured-data-into-tabular-data-c4652eae27c9",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://medium.com/opendatacoop/announcing-flatterer-converting-structured-data-into-tabular-data-c4652eae27c9"),a("OutboundLink")],1)])]),e._v(" "),a("p",[a("em",[e._v("In this blog post, we introduce Flatterer - a new tool that helps convert JSON into tabular data. To hear more about Flatterer, "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform",target:"_blank",rel:"noopener noreferrer"}},[e._v("sign up"),a("OutboundLink")],1),e._v(" to join David Raznick at the Frictionless Data community call on July 28th.")])]),e._v(" "),a("p",[e._v("Open data needs to be available in formats people want to work with. 
In our experience at "),a("a",{attrs:{href:"https://opendataservices.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Services"),a("OutboundLink")],1),e._v(", we’ve found that developers often want access to structured data (for example, JSON) while analysts are used to working with flat data (in CSV files or tables).")]),e._v(" "),a("p",[e._v("More and more data is being published as JSON, but for most analysts this isn’t particularly useful. For many, working with JSON means needing to spend time converting the structured data into tables before they can get started.")]),e._v(" "),a("p",[e._v("That’s where "),a("a",{attrs:{href:"https://github.com/kindly/flatterer",target:"_blank",rel:"noopener noreferrer"}},[e._v("Flatterer"),a("OutboundLink")],1),e._v(" comes in. Flatterer is an opinionated JSON to CSV/XLSX/SQLITE/PARQUET converter. It helps people to convert JSON into relational, tabular data that can be easily analysed. It’s fast and memory efficient, and can be run either in the "),a("a",{attrs:{href:"https://flatterer.opendata.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("command line"),a("OutboundLink")],1),e._v(" or as a "),a("a",{attrs:{href:"https://deepnote.com/@david-raznick/Flatterer-Demo-FWeGccp_QKCu1WAEGQ0mEQ",target:"_blank",rel:"noopener noreferrer"}},[e._v("Python library"),a("OutboundLink")],1),e._v(". 
The Python library supports creating data frames for all the flattened data, making it easy to analyse and visualise.")]),e._v(" "),a("h2",{attrs:{id:"what-does-it-do"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-does-it-do"}},[e._v("#")]),e._v(" What does it do?")]),e._v(" "),a("p",[e._v("With Flatterer you can:")]),e._v(" "),a("ul",[a("li",[e._v("easily convert JSON to flat relational data such as CSV, XLSX, Database Tables, Pandas Dataframes and Parquet;")]),e._v(" "),a("li",[e._v("convert JSON into data packages, so you can use Frictionless data to convert into any database format;")]),e._v(" "),a("li",[e._v("create a data dictionary that contains metadata about the conversion, including fields contained in the dataset, to help you understand the data you are looking at;")]),e._v(" "),a("li",[e._v("create a new table for each one-to-many relationship, alongside _link fields that help to join the data together.")])]),e._v(" "),a("h2",{attrs:{id:"why-we-built-it"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-we-built-it"}},[e._v("#")]),e._v(" Why we built it")]),e._v(" "),a("p",[e._v("When you receive a JSON file where the structure is deeply nested or not well specified, it’s hard to determine what the data contains. Even if you know the JSON structure, it can still be time consuming to work out how to flatten the JSON into a relational structure for data analysis, and to be part of a data pipeline."),a("br"),e._v("\nFlatterer aims to be the first tool to go to when faced with this problem. Although you may still need to handwrite code, Flatterer has a number of benefits over most handwritten approaches:")]),e._v(" "),a("ul",[a("li",[e._v("it’s fast – written in Rust but with Python bindings for ease of use. 
It can be 10x faster than hand written Python flattening;")]),e._v(" "),a("li",[e._v("it’s memory efficient – flatterer uses a custom streaming JSON parser which means that a long list of objects nested with the JSON will be streamed, so less data needs to be loaded into memory at once;")]),e._v(" "),a("li",[e._v("it gives you fast, memory efficient output to CSV/XLSX/SQLITE/PARQUET;")]),e._v(" "),a("li",[e._v("it uses best practice that has been learnt from our experience flattening JSON countless times, such as generating keys to link one-to-many tables to their parents.")])]),e._v(" "),a("h2",{attrs:{id:"using-flatterer-in-the-openownership-data-pipeline"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#using-flatterer-in-the-openownership-data-pipeline"}},[e._v("#")]),e._v(" Using Flatterer in the OpenOwnership data pipeline")]),e._v(" "),a("p",[e._v("As an example, we’ve used "),a("a",{attrs:{href:"https://github.com/kindly/flatterer",target:"_blank",rel:"noopener noreferrer"}},[e._v("Flatterer"),a("OutboundLink")],1),e._v(" to help "),a("a",{attrs:{href:"https://www.openownership.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenOwnership"),a("OutboundLink")],1),e._v(" create a data pipeline to make information about who owns and controls companies available in a "),a("a",{attrs:{href:"https://bods-data.openownership.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("variety of data formats"),a("OutboundLink")],1),e._v(". 
In the example below, we’ve used Flatterer to convert beneficial ownership data from the Register of Enterprises of the Republic of Latvia and the OpenOwnership Register from JSON into CSV, SQLite, Postgresql, Big Query and Datasette formats.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/179058338-08ce8ea1-9b1f-4c4c-b59c-64b04cd450f6.png",alt:"img-1-flatterer"}})]),e._v(" "),a("p",[e._v("Alongside converting the data into different formats, Flatterer has created a data dictionary that shows the fields contained in the dataset, alongside the field type and field counts. In the example below, we show how this dictionary interprets person_statement fields contained in the Beneficial Ownership Data Standard.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/179058526-19694210-514e-4215-bf9d-f6abc7ef5400.png",alt:"img-2-flatterer"}})]),e._v(" "),a("p",[e._v("Finally, you can see Flatterer has created special _link fields, to help with joining the tables together. The example below shows how the _link field helps join "),a("a",{attrs:{href:"https://medium.com/opendatacoop/why-do-open-organisational-identifiers-matter-46af05ab30a",target:"_blank",rel:"noopener noreferrer"}},[e._v("entity identifiers"),a("OutboundLink")],1),e._v(" to statements about beneficial ownership.")]),e._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/179058657-ae4ab534-9fdb-4d6d-ad59-56521f0218e0.png",alt:"img-3-flatterer"}})]),e._v(" "),a("h2",{attrs:{id:"what-s-next"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-s-next"}},[e._v("#")]),e._v(" What’s next?")]),e._v(" "),a("p",[e._v("Next, we’ll be working to make Flatterer more user friendly. We’ll be exploring creating a desktop interface, improving type guessing for fields, and giving more summary statistics about the input data. 
We welcome feedback on the tool through "),a("a",{attrs:{href:"https://github.com/kindly/flatterer/issues",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),a("OutboundLink")],1),e._v(", and are really interested to find out what kind of improvements you’d like to see.")]),e._v(" "),a("p",[e._v("More information about using Flatterer is available on "),a("a",{attrs:{href:"https://deepnote.com/@david-raznick/Flatterer-Demo-FWeGccp_QKCu1WAEGQ0mEQ",target:"_blank",rel:"noopener noreferrer"}},[e._v("deepnote"),a("OutboundLink")],1),e._v(". To hear more about Flatterer, you can join David Raznick at Frictionless Data’s monthly community call on July 28th.")]),e._v(" "),a("h4",{attrs:{id:"at-open-data-services-cooperative-we-re-always-happy-to-discuss-how-developing-or-implementing-open-data-standards-could-support-your-goals-find-out-more-about-our-work-and-get-in-touch"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#at-open-data-services-cooperative-we-re-always-happy-to-discuss-how-developing-or-implementing-open-data-standards-could-support-your-goals-find-out-more-about-our-work-and-get-in-touch"}},[e._v("#")]),e._v(" At Open Data Services Cooperative we’re always happy to discuss how developing or implementing open data standards could support your goals. 
Find out more about "),a("a",{attrs:{href:"https://opendataservices.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("our work"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://opendataservices.coop/#contact",target:"_blank",rel:"noopener noreferrer"}},[e._v("get in touch"),a("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/163.9cdaa5d1.js b/assets/js/163.3a81f092.js similarity index 98% rename from assets/js/163.9cdaa5d1.js rename to assets/js/163.3a81f092.js index 90d81efb7..0c3e89fc1 100644 --- a/assets/js/163.9cdaa5d1.js +++ b/assets/js/163.3a81f092.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[163],{698:function(e,t,o){"use strict";o.r(t);var a=o(29),n=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Dear Frictionless community,")]),e._v(" "),o("p",[e._v("I’m writing to let you all know that this is my final week working on Frictionless Data with Open Knowledge Foundation. It has been a true pleasure to get to interact with you all over the last four years! Rest assured that Frictionless Data is in good hands with the team at Open Knowledge (Evgeny, Sara, Shashi, Edgar, and the rest of the OKF tech team).")]),e._v(" "),o("p",[e._v("What’s next for me? I’m still staying in the data space, moving to product at data.world (did you know they export data as datapackages?)! 
Maybe you’ll see me presenting a demo at an upcoming Frictionless community call 😉")]),e._v(" "),o("p",[e._v("If you’ll allow me to reminisce for a few minutes, here are some of my favourite Frictionless memories from my time working on this project:")]),e._v(" "),o("p",[o("strong",[e._v("The Frictionless Hackathon:")]),e._v(" In October 2021, we hosted the first-ever Frictionless Hackathon (virtually of course), and it was so cool to see all the projects and contributors from around the world! You can read all about it in "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/10/13/hackathon-wrap/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the summary blog here"),o("OutboundLink")],1),e._v(". Should we do another Hackathon? Let Sara know what you think! (Special shout-out to Oleg who set up the Hackathon software and inspired the entire event!)")]),e._v(" "),o("p",[o("strong",[e._v("Pilot collaborations")]),e._v(": We started our first Reproducible Research pilot collaboration with the Biological and Chemical Oceanographic Data Management Office (BCO-DMO) team in 2019, and learned so much from this implementation! This resulted in a new data processing pipeline for BCO-DMO data managers that used Frictionless to reproducibly clean and document data. This work ultimately led to the creation of the Frictionless Framework. You can check out all the other "),o("a",{attrs:{href:"https://frictionlessdata.io/adoption/#pilot-collaborations",target:"_blank",rel:"noopener noreferrer"}},[e._v("Pilots on the Adoption page"),o("OutboundLink")],1),e._v(" too.")]),e._v(" "),o("p",[o("strong",[e._v("Fellows")]),e._v(": Getting to mentor and teach 17 Fellows was truly a spectacular experience. These current (and future) leaders in open science and open scholarship are people to keep an eye on – they are brilliant! 
You can read all about their experience as Fellows on "),o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("their blog"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("The Frictionless Team at OKF")]),e._v(": I’ve been very lucky to get to work with the best team while being at OKF! Many of you already know how helpful and smart my colleagues are, but in case you don’t know, I will tell you! Evgeny has been carefully leading the technical development of Frictionless with a clear vision, making my job easy and fun. Sara has transformed how the community feels and works, which is no small feat! Shashi and Edgar have only been working on the project for less than a year, but their contributions to the code base and to help answer questions have already made a big impact! I will miss working with these excellent humans, and all of you in the community that have made Frictionless a special place!")]),e._v(" "),o("p",[e._v("Thank you all for being a part of the Frictionless community and for working with me in the past! 
I wish you all the best, and maybe I will see some of you in Buenos Aires in April for "),o("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v7"),o("OutboundLink")],1),e._v("?")]),e._v(" "),o("p",[e._v("Cheers!")]),e._v(" "),o("p",[e._v("– "),o("a",{attrs:{href:"https://lwinfree.github.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Lilly"),o("OutboundLink")],1)])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[163],{696:function(e,t,o){"use strict";o.r(t);var a=o(29),n=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Dear Frictionless community,")]),e._v(" "),o("p",[e._v("I’m writing to let you all know that this is my final week working on Frictionless Data with Open Knowledge Foundation. It has been a true pleasure to get to interact with you all over the last four years! Rest assured that Frictionless Data is in good hands with the team at Open Knowledge (Evgeny, Sara, Shashi, Edgar, and the rest of the OKF tech team).")]),e._v(" "),o("p",[e._v("What’s next for me? I’m still staying in the data space, moving to product at data.world (did you know they export data as datapackages?)! Maybe you’ll see me presenting a demo at an upcoming Frictionless community call 😉")]),e._v(" "),o("p",[e._v("If you’ll allow me to reminisce for a few minutes, here are some of my favourite Frictionless memories from my time working on this project:")]),e._v(" "),o("p",[o("strong",[e._v("The Frictionless Hackathon:")]),e._v(" In October 2021, we hosted the first-ever Frictionless Hackathon (virtually of course), and it was so cool to see all the projects and contributors from around the world! 
You can read all about it in "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/10/13/hackathon-wrap/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the summary blog here"),o("OutboundLink")],1),e._v(". Should we do another Hackathon? Let Sara know what you think! (Special shout-out to Oleg who set up the Hackathon software and inspired the entire event!)")]),e._v(" "),o("p",[o("strong",[e._v("Pilot collaborations")]),e._v(": We started our first Reproducible Research pilot collaboration with the Biological and Chemical Oceanographic Data Management Office (BCO-DMO) team in 2019, and learned so much from this implementation! This resulted in a new data processing pipeline for BCO-DMO data managers that used Frictionless to reproducibly clean and document data. This work ultimately led to the creation of the Frictionless Framework. You can check out all the other "),o("a",{attrs:{href:"https://frictionlessdata.io/adoption/#pilot-collaborations",target:"_blank",rel:"noopener noreferrer"}},[e._v("Pilots on the Adoption page"),o("OutboundLink")],1),e._v(" too.")]),e._v(" "),o("p",[o("strong",[e._v("Fellows")]),e._v(": Getting to mentor and teach 17 Fellows was truly a spectacular experience. These current (and future) leaders in open science and open scholarship are people to keep an eye on – they are brilliant! You can read all about their experience as Fellows on "),o("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("their blog"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("The Frictionless Team at OKF")]),e._v(": I’ve been very lucky to get to work with the best team while being at OKF! Many of you already know how helpful and smart my colleagues are, but in case you don’t know, I will tell you! Evgeny has been carefully leading the technical development of Frictionless with a clear vision, making my job easy and fun. 
Sara has transformed how the community feels and works, which is no small feat! Shashi and Edgar have only been working on the project for less than a year, but their contributions to the code base and to help answer questions have already made a big impact! I will miss working with these excellent humans, and all of you in the community that have made Frictionless a special place!")]),e._v(" "),o("p",[e._v("Thank you all for being a part of the Frictionless community and for working with me in the past! I wish you all the best, and maybe I will see some of you in Buenos Aires in April for "),o("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf,v7"),o("OutboundLink")],1),e._v("?")]),e._v(" "),o("p",[e._v("Cheers!")]),e._v(" "),o("p",[e._v("– "),o("a",{attrs:{href:"https://lwinfree.github.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Lilly"),o("OutboundLink")],1)])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/164.a5e79699.js b/assets/js/164.23b20340.js similarity index 98% rename from assets/js/164.a5e79699.js rename to assets/js/164.23b20340.js index 6861c8f48..1c7319470 100644 --- a/assets/js/164.a5e79699.js +++ b/assets/js/164.23b20340.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[164],{700:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On the last community call on July 28"),a("sup",[e._v("th")]),e._v(", we heard David Raznick (an ex OKFer, now working at "),a("a",{attrs:{href:"https://opendataservices.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Services"),a("OutboundLink")],1),e._v(") presenting Flatterer, a tool he developed to convert structured JSON data into tabular data, using Frictionless Data specifications.")]),e._v(" 
"),a("p",[e._v("David has been working with many different open data standards functioning with deeply nested JSON. To make the data in standard formats more human readable, users often flatten JSON files with flattening tools, but the result they get are very large spreadsheets, which can be difficult to work with.")]),e._v(" "),a("p",[e._v("Flattening tools are also often used to unflatten tabular data in JSON. That way, the data, initially written in a more human readable format, can then be used according to the standards. Unfortunately the result is not optimal, the output of flattening tools is often not user-friendly and the user would probably still need to tweak it by hand, for example modifying headers’ names and/or the way tables are joined together.")]),e._v(" "),a("p",[e._v("Flatterer aims at making these processes easier and faster. It can convert in the blink of an eye your JSON file in the tabular format of your choice: csv, xlsx, parquet, postgres and sqlite. Flatterer will convert your JSON file into a main table, with keys to link one-to-many tables to their parents. 
That way the data is tidy and easier to work with.")]),e._v(" "),a("p",[e._v("If you are interested in knowing more about Flatterer, have a look at David’s presentation and demo:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Hi9tDGfteoA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("You can also read more about the project here: "),a("a",{attrs:{href:"https://flatterer.opendata.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://flatterer.opendata.coop/"),a("OutboundLink")],1),e._v(", or have a look at "),a("a",{attrs:{href:"https://deepnote.com/@david-raznick/Flatterer-Demo-15678671-ca7f-40a0-aed5-6004190d2611",target:"_blank",rel:"noopener noreferrer"}},[e._v("the project documentation"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on August 25"),a("sup",[e._v("th")]),e._v(". Frictionless Data developer Shashi Gharti will discuss with the community a tool she would like to add to the Frictionless Framework. Stay tuned to know more!")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),a("p",[e._v("Would you like to present at one of the next community calls? 
Please fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Join our community on "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),a("OutboundLink")],1),e._v(" (also via "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),a("OutboundLink")],1),e._v(") or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(". See you there!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/hfGT6vAjjwU",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[164],{698:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On the last community call on July 28"),a("sup",[e._v("th")]),e._v(", we heard David Raznick (an ex OKFer, now working at "),a("a",{attrs:{href:"https://opendataservices.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Services"),a("OutboundLink")],1),e._v(") presenting Flatterer, a tool he developed to convert structured JSON data into tabular 
data, using Frictionless Data specifications.")]),e._v(" "),a("p",[e._v("David has been working with many different open data standards functioning with deeply nested JSON. To make the data in standard formats more human readable, users often flatten JSON files with flattening tools, but the result they get are very large spreadsheets, which can be difficult to work with.")]),e._v(" "),a("p",[e._v("Flattening tools are also often used to unflatten tabular data in JSON. That way, the data, initially written in a more human readable format, can then be used according to the standards. Unfortunately the result is not optimal, the output of flattening tools is often not user-friendly and the user would probably still need to tweak it by hand, for example modifying headers’ names and/or the way tables are joined together.")]),e._v(" "),a("p",[e._v("Flatterer aims at making these processes easier and faster. It can convert in the blink of an eye your JSON file in the tabular format of your choice: csv, xlsx, parquet, postgres and sqlite. Flatterer will convert your JSON file into a main table, with keys to link one-to-many tables to their parents. 
That way the data is tidy and easier to work with.")]),e._v(" "),a("p",[e._v("If you are interested in knowing more about Flatterer, have a look at David’s presentation and demo:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Hi9tDGfteoA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),a("p",[e._v("You can also read more about the project here: "),a("a",{attrs:{href:"https://flatterer.opendata.coop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://flatterer.opendata.coop/"),a("OutboundLink")],1),e._v(", or have a look at "),a("a",{attrs:{href:"https://deepnote.com/@david-raznick/Flatterer-Demo-15678671-ca7f-40a0-aed5-6004190d2611",target:"_blank",rel:"noopener noreferrer"}},[e._v("the project documentation"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on August 25"),a("sup",[e._v("th")]),e._v(". Frictionless Data developer Shashi Gharti will discuss with the community a tool she would like to add to the Frictionless Framework. Stay tuned to know more!")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),a("p",[e._v("Would you like to present at one of the next community calls? 
Please fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Join our community on "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),a("OutboundLink")],1),e._v(" (also via "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),a("OutboundLink")],1),e._v(") or "),a("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),a("OutboundLink")],1),e._v(". See you there!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/hfGT6vAjjwU",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/165.94d8c026.js b/assets/js/165.8327b1ad.js similarity index 99% rename from assets/js/165.94d8c026.js rename to assets/js/165.8327b1ad.js index 33c54504d..4988420df 100644 --- a/assets/js/165.94d8c026.js +++ b/assets/js/165.8327b1ad.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[165],{704:function(t,e,a){"use strict";a.r(e);var r=a(29),s=Object(r.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("p",[t._v("We’re releasing a first beta of Firctionless Framework (v5)!"),a("br"),t._v("\nSince the initial 
Frictionless Framework release we’d been collecting feedback and analyzing both high-level users’ needs and bug reports to identify shortcomings and areas that can be improved in the next version for the framework. Once that process had been done we started working on a new v5 with a goal to make the framework more bullet-proof, easy to maintain and simplify user interface. Today, this version is almost stable and ready to be published. Let’s go through the main improvements we have made:")]),t._v(" "),a("h1",{attrs:{id:"improved-metadata"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#improved-metadata"}},[t._v("#")]),t._v(" Improved Metadata")]),t._v(" "),a("p",[t._v("This year we started working on the Frictionless Application, at the same time, we were thinking about next steps for the "),a("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Standards"),a("OutboundLink")],1),t._v(". For both we need well-defined and an easy-to-understand metadata model. Partially it’s already published as standards like Table Schema and partially it’s going to be published as standards like File Dialect and possibly validation/transform metadata.")]),t._v(" "),a("h2",{attrs:{id:"dialect"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dialect"}},[t._v("#")]),t._v(" Dialect")]),t._v(" "),a("p",[t._v("In v4 of the framework we had Control/Dialect/Layout concepts to describe resource details related to different formats and schemes, as well as tabular details like header rows. In v5 it’s merged into the only one concept called Dialect which is going to be standardised as a File Dialect spec. 
Here is an example:")]),t._v(" "),a("h4",{attrs:{id:"yaml"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("header"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" true\nheaderRows"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\ncommentChar"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'#'")]),t._v("\ncsv"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n delimiter"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("';'")]),t._v("\n")])])]),a("p",[t._v("A dialect descriptor can be saved and reused within a resource. Technically, it’s possible to provide different schemes and formats settings within one Dialect (e.g. for CSV and Excel) so it’s possible to create e.g. one re-usable dialect for a data package. 
A legacy CSV Dialect spec is supported and will be supported forever so it’s possible to provide CSV properties on the root level:")]),t._v(" "),a("h4",{attrs:{id:"yaml-2"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-2"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("header"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" true\ndelimiter"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("';'")]),t._v("\n")])])]),a("p",[t._v("For performance and codebase maintainability reasons some marginal Layout features have been removed completely such as "),a("code",[t._v("skip/pick/limit/offsetFields/etc")]),t._v(". It’s possible to achieve the same results using the Pipeline concept as a part of the transformation workflow.")]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/dialect.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Dialect Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"checklist"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#checklist"}},[t._v("#")]),t._v(" Checklist")]),t._v(" "),a("p",[t._v("Checklist is a new concept introduced in v5. It’s basically a collection of validation steps and a few other settings to make “validation rules” sharable. 
For example:")]),t._v(" "),a("h4",{attrs:{id:"yaml-3"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-3"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("checks"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" ascii"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("value\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" row_constraint\n formula"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" id "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(">")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v("\nskipErrors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" duplicate"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("label\n")])])]),a("p",[t._v("Having and sharing this checklist it’s possible to tune data quality requirements for some data file or set of data files. This concept will provide an ability for creating data quality “libraries” within projects or domains. 
We can use a checklist for validation:")]),t._v(" "),a("h4",{attrs:{id:"cli"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#cli"}},[t._v("#")]),t._v(" CLI")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("frictionless validate table1.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("checklist checklist.yaml\nfrictionless validate table2.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("checklist checklist.yaml\n")])])]),a("p",[t._v("Here is a list of another changes:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("Check(descriptor)")]),t._v(" "),a("td",[t._v("Check.from_descriptor(descriptor)")])]),t._v(" "),a("tr",[a("td",[t._v("check.code")]),t._v(" "),a("td",[t._v("check.type")])])])]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/checklist.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Checklist Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"pipeline"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#pipeline"}},[t._v("#")]),t._v(" Pipeline")]),t._v(" "),a("p",[t._v("In v4 Pipeline was a complex concept similar to validation Inquiry. We reworked it for v5 to be a lightweight set of validation steps that can be applied to a data resource or a data package. 
For example:")]),t._v(" "),a("h4",{attrs:{id:"yaml-4"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-4"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("steps"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("normalize\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" cell"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("set\n fieldName"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" version\n value"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" v5\n")])])]),a("p",[t._v("Similar to the Checklist concept, Pipeline is a reusable (data-abstract) object that can be saved to a descriptor and used in some complex data workflow:")]),t._v(" "),a("h4",{attrs:{id:"cli-2"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#cli-2"}},[t._v("#")]),t._v(" CLI")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("frictionless transform table1.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("pipeline pipeline.yaml\nfrictionless transform table2.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("pipeline pipeline.yaml\n")])])]),a("p",[t._v("Here is another list of changes:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" 
"),a("tbody",[a("tr",[a("td",[t._v("Step(descriptor)")]),t._v(" "),a("td",[t._v("Step.from_descriptor(descriptor)")])]),t._v(" "),a("tr",[a("td",[t._v("step.code")]),t._v(" "),a("td",[t._v("step.type")])])])]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/pipeline.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Pipeline Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"resource"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#resource"}},[t._v("#")]),t._v(" Resource")]),t._v(" "),a("p",[t._v("There are no changes in the Resource related to the standards although currently by default instead of "),a("code",[t._v("profile")]),t._v(" the "),a("code",[t._v("type")]),t._v(" property will be used to mark a resource as a table. It can be changed using the "),a("code",[t._v("--standards v1")]),t._v(" flag.")]),t._v(" "),a("p",[t._v("It’s now possible to set Checklist and Pipeline as a Resource property similar to Dialect and Schema:")]),t._v(" "),a("h4",{attrs:{id:"yaml-5"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-5"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("path"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table.csv\n"),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# ...")]),t._v("\nchecklist"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n checks"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" ascii"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("value\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token 
operator"}},[t._v(":")]),t._v(" row_constraint\n formula"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" id "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(">")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v("\npipeline"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" pipeline.yaml\n steps"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("normalize\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" cell"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("set\n fieldName"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" version\n value"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" v5\n")])])]),a("p",[t._v("Or using dereference:")]),t._v(" "),a("h4",{attrs:{id:"yaml-6"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-6"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("path"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table.csv\n"),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# ...")]),t._v("\nchecklist"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" checklist.yaml\npipeline"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" pipeline.yaml\n")])])]),a("p",[t._v("In this case the validation/transformation will use it by default providing an ability to ship validation rules and transformation pipelines within resources and packages. 
This is an important development for data publishers who want to define what they consider to be valid for their datasets as well as sharing raw data with a cleaning pipeline steps:")]),t._v(" "),a("h4",{attrs:{id:"cli-3"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#cli-3"}},[t._v("#")]),t._v(" CLI")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("frictionless validate resource.yaml "),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# will use the checklist above")]),t._v("\nfrictionless transform resource.yaml "),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# will use the pipeline above")]),t._v("\n")])])]),a("p",[t._v("There are minor changes in the "),a("code",[t._v("stats")]),t._v(" property. Now it uses named keys to simplify hash distinction (md5/sha256 are calculated by default and it’s not possible to change for performance reasons as it was in v4):")]),t._v(" "),a("h4",{attrs:{id:"python"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#python"}},[t._v("#")]),t._v(" Python")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("from frictionless import describe\n\nresource "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" describe"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'table.csv'")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" stats"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v("True"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nprint"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("resource.stats"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),a("div",{staticClass:"language-r 
extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'md5'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'6c2c61dd9b0e9c6876139a449ed87933'")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'sha256'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'a1fd6c5ff3494f697874deeb07f69f8667e903dd94a7bc062dd57550cea26da8'")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'bytes'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("30")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'fields'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'rows'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),a("p",[t._v("Here is a list of another changes:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("for row in resource:")]),t._v(" "),a("td",[t._v("for row in resource.row_stream")])])])]),t._v(" "),a("p",[t._v("Read an article about 
"),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/resource.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Resource Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"package"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#package"}},[t._v("#")]),t._v(" Package")]),t._v(" "),a("p",[t._v("There are no changes in the Package related to the standards although it’s now possible to use resource dereference:")]),t._v(" "),a("h4",{attrs:{id:"yaml-7"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-7"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" package\nresources"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" resource1.yaml\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" resource2.yaml\n")])])]),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/package.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Package Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"catalog"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#catalog"}},[t._v("#")]),t._v(" Catalog")]),t._v(" "),a("p",[t._v("Catalog is a new concept that is a collection of data packages that can be written inline or using dereference:")]),t._v(" "),a("h4",{attrs:{id:"yaml-8"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-8"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" catalog\npackages"),a("span",{pre:!0,attrs:{class:"token 
operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" package1.yaml\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" package2.yaml\n")])])]),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/catalog.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Catalog Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"detector"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#detector"}},[t._v("#")]),t._v(" Detector")]),t._v(" "),a("p",[t._v("Detector is now a metadata class (it wasn’t in v4) so it can be saved and shared as other metadata classes:")]),t._v(" "),a("h4",{attrs:{id:"python-2"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#python-2"}},[t._v("#")]),t._v(" Python")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("from frictionless import Detector\n\ndetector "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Detector"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("sample_size"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1000")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nprint"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("detector"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'sampleSize'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1000")]),a("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("}")]),t._v("\n")])])]),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/detector.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Detector Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"inquiry"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#inquiry"}},[t._v("#")]),t._v(" Inquiry")]),t._v(" "),a("p",[t._v("There are a few changes in the Inquiry concept, which is known for its use in the "),a("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Repository"),a("OutboundLink")],1),t._v(" project:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("inquiryTask.source")]),t._v(" "),a("td",[t._v("inquiryTask.path")])]),t._v(" "),a("tr",[a("td",[t._v("inquiryTask.source")]),t._v(" "),a("td",[t._v("inquiryTask.resource")])]),t._v(" "),a("tr",[a("td",[t._v("inquiryTask.source")]),t._v(" "),a("td",[t._v("inquiryTask.package")])])])]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/inquiry.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Inquiry Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"report"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#report"}},[t._v("#")]),t._v(" Report")]),t._v(" "),a("p",[t._v("The Report concept has been significantly simplified by removing the "),a("code",[t._v("resource")]),t._v(" property from "),a("code",[t._v("reportTask")]),t._v(". It’s been replaced by "),a("code",[t._v("name/type/place/labels")]),t._v(" properties. Also "),a("code",[t._v("report.time")]),t._v(" is now "),a("code",[t._v("report.stats.seconds")]),t._v(". 
The "),a("code",[t._v("report/reportTask.warnings: List[str]")]),t._v(" have been added to provide non-error information like reached limits:")]),t._v(" "),a("h4",{attrs:{id:"cli-4"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#cli-4"}},[t._v("#")]),t._v(" CLI")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("frictionless validate table.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("yaml\n")])])]),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("valid"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" true\nstats"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n tasks"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v("\n warnings"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n errors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n seconds"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0.091")]),t._v("\nwarnings"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nerrors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\ntasks"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n 
"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" valid"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" true\n name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table\n type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table\n place"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table.csv\n labels"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" id\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" name\n stats"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n md5"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("6")]),t._v("c2c61dd9b0e9c6876139a449ed87933\n sha256"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" a1fd6c5ff3494f697874deeb07f69f8667e903dd94a7bc062dd57550cea26da8\n bytes"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("30")]),t._v("\n fields"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v("\n rows"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v("\n warnings"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n errors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n seconds"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0.091")]),t._v("\n 
warnings"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n errors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n")])])]),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("report.time")]),t._v(" "),a("td",[t._v("report.stats.seconds")])]),t._v(" "),a("tr",[a("td",[t._v("reportTask.time")]),t._v(" "),a("td",[t._v("reportTask.stats.seconds")])]),t._v(" "),a("tr",[a("td",[a("a",{attrs:{href:"http://reportTask.resource.name",target:"_blank",rel:"noopener noreferrer"}},[t._v("reportTask.resource.name"),a("OutboundLink")],1)]),t._v(" "),a("td",[a("a",{attrs:{href:"http://reportTask.name",target:"_blank",rel:"noopener noreferrer"}},[t._v("reportTask.name"),a("OutboundLink")],1)])]),t._v(" "),a("tr",[a("td",[t._v("reportTask.resource.profile")]),t._v(" "),a("td",[t._v("reportTask.type")])]),t._v(" "),a("tr",[a("td",[t._v("reportTask.resource.path")]),t._v(" "),a("td",[t._v("reportTask.place")])]),t._v(" "),a("tr",[a("td",[t._v("reportTask.resource.schema")]),t._v(" "),a("td",[t._v("reportTask.labels")])])])]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/report.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Report Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"schema"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#schema"}},[t._v("#")]),t._v(" Schema")]),t._v(" "),a("p",[t._v("Changes in the Schema class:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" 
"),a("tbody",[a("tr",[a("td",[t._v("Schema(descriptor)")]),t._v(" "),a("td",[t._v("Schema.from_descriptor(descriptor)")])])])]),t._v(" "),a("h2",{attrs:{id:"error"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#error"}},[t._v("#")]),t._v(" Error")]),t._v(" "),a("p",[t._v("There are a few changes in the Error data structure:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("error.code")]),t._v(" "),a("td",[t._v("error.type")])]),t._v(" "),a("tr",[a("td",[a("a",{attrs:{href:"http://error.name",target:"_blank",rel:"noopener noreferrer"}},[t._v("error.name"),a("OutboundLink")],1)]),t._v(" "),a("td",[t._v("error.title")])]),t._v(" "),a("tr",[a("td",[t._v("error.rowPosition")]),t._v(" "),a("td",[t._v("error.rowNumber")])]),t._v(" "),a("tr",[a("td",[t._v("error.fieldPosition")]),t._v(" "),a("td",[t._v("error.fieldNumber")])])])]),t._v(" "),a("h2",{attrs:{id:"types"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#types"}},[t._v("#")]),t._v(" Types")]),t._v(" "),a("p",[t._v("Note that all the metadata entities that have multiple implementations in v5 are based on a unified "),a("code",[t._v("type")]),t._v(" model. 
It means that they use the type property to provide type information:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("resource.profile")]),t._v(" "),a("td",[t._v("resource.type")])]),t._v(" "),a("tr",[a("td",[t._v("check.code")]),t._v(" "),a("td",[t._v("check.type")])]),t._v(" "),a("tr",[a("td",[t._v("control.code")]),t._v(" "),a("td",[t._v("control.type")])]),t._v(" "),a("tr",[a("td",[t._v("error.code")]),t._v(" "),a("td",[t._v("error.type")])]),t._v(" "),a("tr",[a("td",[t._v("field.type")]),t._v(" "),a("td",[t._v("field.type")])]),t._v(" "),a("tr",[a("td",[t._v("step.type")]),t._v(" "),a("td",[t._v("step.type")])])])]),t._v(" "),a("p",[t._v("The new v5 version still supports old notation in descriptors for backward-compatibility.")]),t._v(" "),a("h1",{attrs:{id:"improved-model"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#improved-model"}},[t._v("#")]),t._v(" Improved Model")]),t._v(" "),a("p",[t._v("It’s been many years that Frictionless were mixing declarative metadata and object model for historical reasons. Since the first implementation of "),a("code",[t._v("datapackage")]),t._v(" library we used different approaches to sync internal state to provide both interfaces descriptor and object model. In Frictionless Framework v4 this technique had been taken to a really sophisticated level with special observables dictionary classes. 
It was quite smart and nice to use for quick prototyping in a REPL, but it was really hard to maintain and error-prone.")]),t._v(" "),a("p",[t._v("In Framework v5 we finally decided to follow the “right way” for handling this problem and split descriptors and object model completely.")]),t._v(" "),a("h2",{attrs:{id:"descriptors"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#descriptors"}},[t._v("#")]),t._v(" Descriptors")]),t._v(" "),a("p",[t._v("In the Frictionless World we deal with a lot of declarative metadata descriptors such as packages, schemas, pipelines, etc. Nothing changes in v5 regarding this. So for example here is a Table Schema:")]),t._v(" "),a("h4",{attrs:{id:"yaml-9"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-9"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("fields"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" id\n type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" integer\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" name\n type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" string\n")])])]),a("h2",{attrs:{id:"object-model"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#object-model"}},[t._v("#")]),t._v(" Object Model")]),t._v(" "),a("p",[t._v("The difference comes when we create a metadata instance based on this descriptor. In v4 all the metadata classes were subclasses of the dict class, providing a mix between a descriptor and object model for state management. In v5 there is a clear boundary between descriptor and object model. 
All the state is managed as it should be in a normal Python class using class attributes:")]),t._v(" "),a("h4",{attrs:{id:"python-3"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#python-3"}},[t._v("#")]),t._v(" Python")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("from frictionless import Schema\n\nschema "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Schema.from_descriptor"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'schema.yaml'")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Here we deal with a proper object model")]),t._v("\ndescriptor "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" schema.to_descriptor"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Here we export it back to be a descriptor")]),t._v("\n")])])]),a("p",[t._v("There are a few important traits of the new model:")]),t._v(" "),a("p",[t._v("it’s not possible to create a metadata instance from an invalid descriptor"),a("br"),t._v("\nit’s almost always guaranteed that a metadata instance is valid"),a("br"),t._v("\nit’s not possible to mix dicts and classes in methods like "),a("code",[t._v("package.add_resource")]),a("br"),t._v("\nit’s not possible to export an invalid descriptor"),a("br"),t._v("\nThis separation might require a few additional lines of code, but it gives us much less fragile programs in the end. It’s especially important for software integrators who want to be sure that they write working code. 
At the same time, for quick prototyping and discovery Frictionless still provides high-level actions like the "),a("code",[t._v("validate")]),t._v(" function that are more forgiving regarding user input.")]),t._v(" "),a("h2",{attrs:{id:"static-typing"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#static-typing"}},[t._v("#")]),t._v(" Static Typing")]),t._v(" "),a("p",[t._v("One of the most important consequences of “fixing” state management in Frictionless is our new ability to provide static typing for the framework codebase. This work is in progress but we have already added a lot of types and the codebase successfully passes "),a("code",[t._v("pyright")]),t._v(" validation. We highly recommend enabling "),a("code",[t._v("pyright")]),t._v(" in your IDE to see all the type problems in advance:")]),t._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/187296542-9ee89ed3-999e-44b3-b3e4-32f1df125f4e.png",alt:"type-error"}})]),t._v(" "),a("h1",{attrs:{id:"livemark-docs"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#livemark-docs"}},[t._v("#")]),t._v(" Livemark Docs")]),t._v(" "),a("p",[t._v("We’re happy to announce that we’re finally ready to drop a JavaScript dependency for the docs generation as we migrated it to Livemark. 
Moreover, Livemark’s ability to execute scripts inside the documentation and other nifty features like simple Tabs or a reference generator will save us hours and hours for writing better docs.")]),t._v(" "),a("h2",{attrs:{id:"script-execution"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#script-execution"}},[t._v("#")]),t._v(" Script Execution")]),t._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/187296761-09eb95c9-7245-4d75-8753-8b1bee635f62.png",alt:"livemark-1"}})]),t._v(" "),a("h2",{attrs:{id:"reference-generation"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#reference-generation"}},[t._v("#")]),t._v(" Reference Generation")]),t._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/187296860-cb2cc587-c518-47c1-9534-0c1d3f57e552.png",alt:"livemark-2"}})]),t._v(" "),a("h2",{attrs:{id:"happy-contributors"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#happy-contributors"}},[t._v("#")]),t._v(" Happy Contributors")]),t._v(" "),a("p",[t._v("We hope that Livemark docs writing experience will make our contributors happier and allow to grow our community of Frictionless Authors and Users. 
Let’s chat in our "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack"),a("OutboundLink")],1),t._v(" if you have questions or just want to say hi.")]),t._v(" "),a("p",[t._v("Read "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/blog/2022/08-22-frictionless-framework-v5.html#:~:text=Read-,Livemark%20Docs,-for%20more%20information",target:"_blank",rel:"noopener noreferrer"}},[t._v("Livemark Docs"),a("OutboundLink")],1),t._v(" for more information.")])])}),[],!1,null,null,null);e.default=s.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[165],{699:function(t,e,a){"use strict";a.r(e);var r=a(29),s=Object(r.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("p",[t._v("We’re releasing a first beta of Frictionless Framework (v5)!"),a("br"),t._v("\nSince the initial Frictionless Framework release we have been collecting feedback and analyzing both high-level users’ needs and bug reports to identify shortcomings and areas that can be improved in the next version of the framework. Once that process was done, we started working on a new v5 with the goal of making the framework more bullet-proof and easier to maintain, and of simplifying its user interface. Today, this version is almost stable and ready to be published. 
Let’s go through the main improvements we have made:")]),t._v(" "),a("h1",{attrs:{id:"improved-metadata"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#improved-metadata"}},[t._v("#")]),t._v(" Improved Metadata")]),t._v(" "),a("p",[t._v("This year we started working on the Frictionless Application; at the same time, we were thinking about the next steps for the "),a("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Standards"),a("OutboundLink")],1),t._v(". For both we need a well-defined and easy-to-understand metadata model. Partially it’s already published as standards like Table Schema and partially it’s going to be published as standards like File Dialect and possibly validation/transform metadata.")]),t._v(" "),a("h2",{attrs:{id:"dialect"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dialect"}},[t._v("#")]),t._v(" Dialect")]),t._v(" "),a("p",[t._v("In v4 of the framework we had Control/Dialect/Layout concepts to describe resource details related to different formats and schemes, as well as tabular details like header rows. In v5 these are merged into a single concept called Dialect, which is going to be standardised as a File Dialect spec. 
Here is an example:")]),t._v(" "),a("h4",{attrs:{id:"yaml"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("header"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" true\nheaderRows"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\ncommentChar"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'#'")]),t._v("\ncsv"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n delimiter"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("';'")]),t._v("\n")])])]),a("p",[t._v("A dialect descriptor can be saved and reused within a resource. Technically, it’s possible to provide different schemes and formats settings within one Dialect (e.g. for CSV and Excel) so it’s possible to create e.g. one re-usable dialect for a data package. 
A legacy CSV Dialect spec is supported and will be supported forever so it’s possible to provide CSV properties on the root level:")]),t._v(" "),a("h4",{attrs:{id:"yaml-2"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-2"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("header"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" true\ndelimiter"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("';'")]),t._v("\n")])])]),a("p",[t._v("For performance and codebase maintainability reasons some marginal Layout features have been removed completely such as "),a("code",[t._v("skip/pick/limit/offsetFields/etc")]),t._v(". It’s possible to achieve the same results using the Pipeline concept as a part of the transformation workflow.")]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/dialect.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Dialect Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"checklist"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#checklist"}},[t._v("#")]),t._v(" Checklist")]),t._v(" "),a("p",[t._v("Checklist is a new concept introduced in v5. It’s basically a collection of validation steps and a few other settings to make “validation rules” sharable. 
For example:")]),t._v(" "),a("h4",{attrs:{id:"yaml-3"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-3"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("checks"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" ascii"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("value\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" row_constraint\n formula"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" id "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(">")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v("\nskipErrors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" duplicate"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("label\n")])])]),a("p",[t._v("Having and sharing this checklist it’s possible to tune data quality requirements for some data file or set of data files. This concept will provide an ability for creating data quality “libraries” within projects or domains. 
We can use a checklist for validation:")]),t._v(" "),a("h4",{attrs:{id:"cli"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#cli"}},[t._v("#")]),t._v(" CLI")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("frictionless validate table1.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("checklist checklist.yaml\nfrictionless validate table2.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("checklist checklist.yaml\n")])])]),a("p",[t._v("Here is a list of other changes:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("Check(descriptor)")]),t._v(" "),a("td",[t._v("Check.from_descriptor(descriptor)")])]),t._v(" "),a("tr",[a("td",[t._v("check.code")]),t._v(" "),a("td",[t._v("check.type")])])])]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/checklist.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Checklist Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"pipeline"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#pipeline"}},[t._v("#")]),t._v(" Pipeline")]),t._v(" "),a("p",[t._v("In v4 Pipeline was a complex concept similar to validation Inquiry. We reworked it for v5 to be a lightweight set of transformation steps that can be applied to a data resource or a data package. 
For example:")]),t._v(" "),a("h4",{attrs:{id:"yaml-4"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-4"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("steps"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("normalize\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" cell"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("set\n fieldName"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" version\n value"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" v5\n")])])]),a("p",[t._v("Similar to the Checklist concept, Pipeline is a reusable (data-abstract) object that can be saved to a descriptor and used in some complex data workflow:")]),t._v(" "),a("h4",{attrs:{id:"cli-2"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#cli-2"}},[t._v("#")]),t._v(" CLI")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("frictionless transform table1.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("pipeline pipeline.yaml\nfrictionless transform table2.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("pipeline pipeline.yaml\n")])])]),a("p",[t._v("Here is another list of changes:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" 
"),a("tbody",[a("tr",[a("td",[t._v("Step(descriptor)")]),t._v(" "),a("td",[t._v("Step.from_descriptor(descriptor)")])]),t._v(" "),a("tr",[a("td",[t._v("step.code")]),t._v(" "),a("td",[t._v("step.type")])])])]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/pipeline.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Pipeline Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"resource"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#resource"}},[t._v("#")]),t._v(" Resource")]),t._v(" "),a("p",[t._v("There are no changes in the Resource related to the standards although currently by default instead of "),a("code",[t._v("profile")]),t._v(" the "),a("code",[t._v("type")]),t._v(" property will be used to mark a resource as a table. It can be changed using the "),a("code",[t._v("--standards v1")]),t._v(" flag.")]),t._v(" "),a("p",[t._v("It’s now possible to set Checklist and Pipeline as a Resource property similar to Dialect and Schema:")]),t._v(" "),a("h4",{attrs:{id:"yaml-5"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-5"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("path"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table.csv\n"),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# ...")]),t._v("\nchecklist"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n checks"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" ascii"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("value\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token 
operator"}},[t._v(":")]),t._v(" row_constraint\n formula"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" id "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(">")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v("\npipeline"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n steps"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("normalize\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" cell"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("set\n fieldName"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" version\n value"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" v5\n")])])]),a("p",[t._v("Or using dereference:")]),t._v(" "),a("h4",{attrs:{id:"yaml-6"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-6"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("path"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table.csv\n"),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# ...")]),t._v("\nchecklist"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" checklist.yaml\npipeline"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" pipeline.yaml\n")])])]),a("p",[t._v("In this case validation and transformation will use them by default, making it possible to ship validation rules and transformation pipelines within resources and packages. 
This is an important development for data publishers who want to define what they consider to be valid for their datasets as well as sharing raw data with a cleaning pipeline steps:")]),t._v(" "),a("h4",{attrs:{id:"cli-3"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#cli-3"}},[t._v("#")]),t._v(" CLI")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("frictionless validate resource.yaml "),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# will use the checklist above")]),t._v("\nfrictionless transform resource.yaml "),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# will use the pipeline above")]),t._v("\n")])])]),a("p",[t._v("There are minor changes in the "),a("code",[t._v("stats")]),t._v(" property. Now it uses named keys to simplify hash distinction (md5/sha256 are calculated by default and it’s not possible to change for performance reasons as it was in v4):")]),t._v(" "),a("h4",{attrs:{id:"python"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#python"}},[t._v("#")]),t._v(" Python")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("from frictionless import describe\n\nresource "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" describe"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'table.csv'")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" stats"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v("True"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nprint"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("resource.stats"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),a("div",{staticClass:"language-r 
extra-class"}},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'md5'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'6c2c61dd9b0e9c6876139a449ed87933'")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'sha256'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'a1fd6c5ff3494f697874deeb07f69f8667e903dd94a7bc062dd57550cea26da8'")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'bytes'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("30")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'fields'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'rows'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),a("p",[t._v("Here is a list of other changes:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("for row in resource:")]),t._v(" "),a("td",[t._v("for row in resource.row_stream")])])])]),t._v(" "),a("p",[t._v("Read an article about 
"),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/resource.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Resource Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"package"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#package"}},[t._v("#")]),t._v(" Package")]),t._v(" "),a("p",[t._v("There are no changes in the Package related to the standards although it’s now possible to use resource dereference:")]),t._v(" "),a("h4",{attrs:{id:"yaml-7"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-7"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" package\nresources"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" resource1.yaml\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" resource2.yaml\n")])])]),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/package.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Package Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"catalog"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#catalog"}},[t._v("#")]),t._v(" Catalog")]),t._v(" "),a("p",[t._v("Catalog is a new concept that is a collection of data packages that can be written inline or using dereference:")]),t._v(" "),a("h4",{attrs:{id:"yaml-8"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-8"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" catalog\npackages"),a("span",{pre:!0,attrs:{class:"token 
operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" package1.yaml\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" package2.yaml\n")])])]),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/catalog.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Catalog Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"detector"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#detector"}},[t._v("#")]),t._v(" Detector")]),t._v(" "),a("p",[t._v("Detector is now a metadata class (it wasn’t in v4) so it can be saved and shared as other metadata classes:")]),t._v(" "),a("h4",{attrs:{id:"python-2"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#python-2"}},[t._v("#")]),t._v(" Python")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("from frictionless import Detector\n\ndetector "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Detector"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("sample_size"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1000")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nprint"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("detector"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'sampleSize'")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1000")]),a("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("}")]),t._v("\n")])])]),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/detector.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Detector Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"inquiry"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#inquiry"}},[t._v("#")]),t._v(" Inquiry")]),t._v(" "),a("p",[t._v("There are a few changes in the Inquiry concept, which is known for its use in the "),a("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Repository"),a("OutboundLink")],1),t._v(" project:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("inquiryTask.source")]),t._v(" "),a("td",[t._v("inquiryTask.path")])]),t._v(" "),a("tr",[a("td",[t._v("inquiryTask.source")]),t._v(" "),a("td",[t._v("inquiryTask.resource")])]),t._v(" "),a("tr",[a("td",[t._v("inquiryTask.source")]),t._v(" "),a("td",[t._v("inquiryTask.package")])])])]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/inquiry.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Inquiry Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"report"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#report"}},[t._v("#")]),t._v(" Report")]),t._v(" "),a("p",[t._v("The Report concept has been significantly simplified by removing the "),a("code",[t._v("resource")]),t._v(" property from "),a("code",[t._v("reportTask")]),t._v(". It’s been replaced by the "),a("code",[t._v("name/type/place/labels")]),t._v(" properties. Also "),a("code",[t._v("report.time")]),t._v(" is now "),a("code",[t._v("report.stats.seconds")]),t._v(". 
The "),a("code",[t._v("report/reportTask.warnings: List[str]")]),t._v(" have been added to provide non-error information like reached limits:")]),t._v(" "),a("h4",{attrs:{id:"cli-4"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#cli-4"}},[t._v("#")]),t._v(" CLI")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("frictionless validate table.csv "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v("yaml\n")])])]),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("valid"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" true\nstats"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n tasks"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v("\n warnings"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n errors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n seconds"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0.091")]),t._v("\nwarnings"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nerrors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\ntasks"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n 
"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" valid"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" true\n name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table\n type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table\n place"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" table.csv\n labels"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" id\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" name\n stats"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n md5"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("6")]),t._v("c2c61dd9b0e9c6876139a449ed87933\n sha256"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" a1fd6c5ff3494f697874deeb07f69f8667e903dd94a7bc062dd57550cea26da8\n bytes"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("30")]),t._v("\n fields"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v("\n rows"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v("\n warnings"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n errors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n seconds"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token number"}},[t._v("0.091")]),t._v("\n 
warnings"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n errors"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n")])])]),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("report.time")]),t._v(" "),a("td",[t._v("report.stats.seconds")])]),t._v(" "),a("tr",[a("td",[t._v("reportTask.time")]),t._v(" "),a("td",[t._v("reportTask.stats.seconds")])]),t._v(" "),a("tr",[a("td",[a("a",{attrs:{href:"http://reportTask.resource.name",target:"_blank",rel:"noopener noreferrer"}},[t._v("reportTask.resource.name"),a("OutboundLink")],1)]),t._v(" "),a("td",[a("a",{attrs:{href:"http://reportTask.name",target:"_blank",rel:"noopener noreferrer"}},[t._v("reportTask.name"),a("OutboundLink")],1)])]),t._v(" "),a("tr",[a("td",[t._v("reportTask.resource.profile")]),t._v(" "),a("td",[t._v("reportTask.type")])]),t._v(" "),a("tr",[a("td",[t._v("reportTask.resource.path")]),t._v(" "),a("td",[t._v("reportTask.place")])]),t._v(" "),a("tr",[a("td",[t._v("reportTask.resource.schema")]),t._v(" "),a("td",[t._v("reportTask.labels")])])])]),t._v(" "),a("p",[t._v("Read an article about "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/report.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("Report Class"),a("OutboundLink")],1),t._v(" for more information.")]),t._v(" "),a("h2",{attrs:{id:"schema"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#schema"}},[t._v("#")]),t._v(" Schema")]),t._v(" "),a("p",[t._v("Changes in the Schema class:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" 
"),a("tbody",[a("tr",[a("td",[t._v("Schema(descriptor)")]),t._v(" "),a("td",[t._v("Schema.from_descriptor(descriptor)")])])])]),t._v(" "),a("h2",{attrs:{id:"error"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#error"}},[t._v("#")]),t._v(" Error")]),t._v(" "),a("p",[t._v("There are a few changes in the Error data structure:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("error.code")]),t._v(" "),a("td",[t._v("error.type")])]),t._v(" "),a("tr",[a("td",[a("a",{attrs:{href:"http://error.name",target:"_blank",rel:"noopener noreferrer"}},[t._v("error.name"),a("OutboundLink")],1)]),t._v(" "),a("td",[t._v("error.title")])]),t._v(" "),a("tr",[a("td",[t._v("error.rowPosition")]),t._v(" "),a("td",[t._v("error.rowNumber")])]),t._v(" "),a("tr",[a("td",[t._v("error.fieldPosition")]),t._v(" "),a("td",[t._v("error.fieldNumber")])])])]),t._v(" "),a("h2",{attrs:{id:"types"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#types"}},[t._v("#")]),t._v(" Types")]),t._v(" "),a("p",[t._v("Note that all the metadata entities that have multiple implementations in v5 are based on a unified "),a("code",[t._v("type")]),t._v(" model. 
It means that they use the type property to provide type information:")]),t._v(" "),a("table",[a("thead",[a("tr",[a("th",[t._v("From (v4)")]),t._v(" "),a("th",[t._v("To (v5)")])])]),t._v(" "),a("tbody",[a("tr",[a("td",[t._v("resource.profile")]),t._v(" "),a("td",[t._v("resource.type")])]),t._v(" "),a("tr",[a("td",[t._v("check.code")]),t._v(" "),a("td",[t._v("check.type")])]),t._v(" "),a("tr",[a("td",[t._v("control.code")]),t._v(" "),a("td",[t._v("control.type")])]),t._v(" "),a("tr",[a("td",[t._v("error.code")]),t._v(" "),a("td",[t._v("error.type")])]),t._v(" "),a("tr",[a("td",[t._v("field.type")]),t._v(" "),a("td",[t._v("field.type")])]),t._v(" "),a("tr",[a("td",[t._v("step.code")]),t._v(" "),a("td",[t._v("step.type")])])])]),t._v(" "),a("p",[t._v("Version 5 still supports the old notation in descriptors for backward compatibility.")]),t._v(" "),a("h1",{attrs:{id:"improved-model"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#improved-model"}},[t._v("#")]),t._v(" Improved Model")]),t._v(" "),a("p",[t._v("For many years, Frictionless mixed declarative metadata and an object model for historical reasons. Since the first implementation of the "),a("code",[t._v("datapackage")]),t._v(" library, we have used different approaches to sync internal state so that both interfaces, descriptor and object model, were available. In Frictionless Framework v4 this technique was taken to a really sophisticated level with special observable dictionary classes. 
It was quite smart and convenient for quick prototyping in a REPL, but it was really hard to maintain and error-prone.")]),t._v(" "),a("p",[t._v("In Framework v5 we finally decided to follow the “right way” of handling this problem and split descriptors and the object model completely.")]),t._v(" "),a("h2",{attrs:{id:"descriptors"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#descriptors"}},[t._v("#")]),t._v(" Descriptors")]),t._v(" "),a("p",[t._v("In the Frictionless World we deal with a lot of declarative metadata descriptors such as packages, schemas, pipelines, etc. Nothing changes in v5 regarding this. So for example here is a Table Schema:")]),t._v(" "),a("h4",{attrs:{id:"yaml-9"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#yaml-9"}},[t._v("#")]),t._v(" YAML")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("fields"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v("\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" id\n type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" integer\n "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("-")]),t._v(" name"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" name\n type"),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" string\n")])])]),a("h2",{attrs:{id:"object-model"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#object-model"}},[t._v("#")]),t._v(" Object Model")]),t._v(" "),a("p",[t._v("The difference appears when we create a metadata instance based on this descriptor. In v4 all the metadata classes were subclasses of the dict class, providing a mix between a descriptor and an object model for state management. In v5 there is a clear boundary between descriptor and object model. 
All the state is managed as it should be in a normal Python class, using class attributes:")]),t._v(" "),a("h4",{attrs:{id:"python-3"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#python-3"}},[t._v("#")]),t._v(" Python")]),t._v(" "),a("div",{staticClass:"language-r extra-class"},[a("pre",{pre:!0,attrs:{class:"language-r"}},[a("code",[t._v("from frictionless import Schema\n\nschema "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Schema.from_descriptor"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),a("span",{pre:!0,attrs:{class:"token string"}},[t._v("'schema.yaml'")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Here we deal with a proper object model")]),t._v("\ndescriptor "),a("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" schema.to_descriptor"),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),a("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),a("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Here we export it back to a descriptor")]),t._v("\n")])])]),a("p",[t._v("There are a few important traits of the new model:")]),t._v(" "),a("p",[t._v("it’s not possible to create a metadata instance from an invalid descriptor"),a("br"),t._v("\nit’s almost always guaranteed that a metadata instance is valid"),a("br"),t._v("\nit’s not possible to mix dicts and classes in methods like "),a("code",[t._v("package.add_resource")]),a("br"),t._v("\nit’s not possible to export an invalid descriptor"),a("br"),t._v("\nThis separation might require a few additional lines of code, but it gives us much less fragile programs in the end. It’s especially important for software integrators who want to be sure that they write working code. 
At the same time, for quick prototyping and discovery Frictionless still provides high-level actions like the "),a("code",[t._v("validate")]),t._v(" function that are more forgiving regarding user input.")]),t._v(" "),a("h2",{attrs:{id:"static-typing"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#static-typing"}},[t._v("#")]),t._v(" Static Typing")]),t._v(" "),a("p",[t._v("One of the most important consequences of “fixing” state management in Frictionless is our new ability to provide static typing for the framework codebase. This work is in progress, but we have already added a lot of types and the codebase successfully passes "),a("code",[t._v("pyright")]),t._v(" validation. We highly recommend enabling "),a("code",[t._v("pyright")]),t._v(" in your IDE to see all the type problems in advance:")]),t._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/187296542-9ee89ed3-999e-44b3-b3e4-32f1df125f4e.png",alt:"type-error"}})]),t._v(" "),a("h1",{attrs:{id:"livemark-docs"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#livemark-docs"}},[t._v("#")]),t._v(" Livemark Docs")]),t._v(" "),a("p",[t._v("We’re happy to announce that we’re finally ready to drop a JavaScript dependency for docs generation, as we have migrated it to Livemark. 
Moreover, Livemark’s ability to execute scripts inside the documentation and other nifty features like simple Tabs or a reference generator will save us hours and hours when writing better docs.")]),t._v(" "),a("h2",{attrs:{id:"script-execution"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#script-execution"}},[t._v("#")]),t._v(" Script Execution")]),t._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/187296761-09eb95c9-7245-4d75-8753-8b1bee635f62.png",alt:"livemark-1"}})]),t._v(" "),a("h2",{attrs:{id:"reference-generation"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#reference-generation"}},[t._v("#")]),t._v(" Reference Generation")]),t._v(" "),a("p",[a("img",{attrs:{src:"https://user-images.githubusercontent.com/74717970/187296860-cb2cc587-c518-47c1-9534-0c1d3f57e552.png",alt:"livemark-2"}})]),t._v(" "),a("h2",{attrs:{id:"happy-contributors"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#happy-contributors"}},[t._v("#")]),t._v(" Happy Contributors")]),t._v(" "),a("p",[t._v("We hope that the Livemark docs writing experience will make our contributors happier and help grow our community of Frictionless Authors and Users. 
Let’s chat in our "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack"),a("OutboundLink")],1),t._v(" if you have questions or just want to say hi.")]),t._v(" "),a("p",[t._v("Read "),a("a",{attrs:{href:"https://framework.frictionlessdata.io/blog/2022/08-22-frictionless-framework-v5.html#:~:text=Read-,Livemark%20Docs,-for%20more%20information",target:"_blank",rel:"noopener noreferrer"}},[t._v("Livemark Docs"),a("OutboundLink")],1),t._v(" for more information.")])])}),[],!1,null,null,null);e.default=s.exports}}]); \ No newline at end of file diff --git a/assets/js/166.7552b535.js b/assets/js/166.97b54b92.js similarity index 98% rename from assets/js/166.7552b535.js rename to assets/js/166.97b54b92.js index 254f671d1..8f7572c04 100644 --- a/assets/js/166.7552b535.js +++ b/assets/js/166.97b54b92.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[166],{699:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On the last community call on August 25"),r("sup",[e._v("th")]),e._v(", we had our very own Frictionless Data developer Shashi Gharti presenting to the community the new Frictionless GitHub integration, to read and write data packages from/to GitHub repositories.")]),e._v(" "),r("p",[e._v("Besides reading and writing packages, the integration also allows the creation of containers for data packages: the "),r("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/catalog.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("catalog"),r("OutboundLink")],1),e._v(", a list of packages from multiple repositories in GitHub. 
To select which repository you want to be in the catalog, you can use any GitHub qualifier.")]),e._v(" "),r("p",[e._v("The Frictionless GitHub integration is part of the beta release of "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2022/08/29/frictionless-framework-release/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Framework version 5"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("If you are interested in knowing more about the Frictionless GitHub integration, have a look at Shashi’s presentation and demo:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/gURZK9WDpp0",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("You can also check out "),r("a",{attrs:{href:"https://docs.google.com/presentation/d/1hhHEgEqzIkIpzCZ_FW-DjJtImxPI8jdi7Ck5OXiiDsM/edit?usp=sharing",target:"_blank",rel:"noopener noreferrer"}},[e._v("Shashi’s slides"),r("OutboundLink")],1),e._v(" or have a look at "),r("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/portals/github.html#reference-portals.githubcontrol",target:"_blank",rel:"noopener noreferrer"}},[e._v("the project documentation"),r("OutboundLink")],1),e._v(". If you use the Frictionless Framework v5 and its GitHub integration, please let us know! And if you have any feedback, feel free to open an issue in the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/framework",target:"_blank",rel:"noopener noreferrer"}},[e._v("repository"),r("OutboundLink")],1)]),e._v(" "),r("h1",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Next community call is on September 29"),r("sup",[e._v("th")]),e._v(". 
Frictionless Data lead developer Evgeny Karev will be presenting the Frictionless Framework version 5, so make sure not to miss it!")]),e._v(" "),r("p",[e._v("You can sign up for the call already "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),r("p",[e._v("Would you like to present at one of the next community calls? Please fill out "),r("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Join our community on "),r("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),r("OutboundLink")],1),e._v(" (also via "),r("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),r("OutboundLink")],1),e._v(") or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(". 
See you there!")]),e._v(" "),r("h1",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/9_VwniN4JKE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[166],{700:function(e,t,r){"use strict";r.r(t);var o=r(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("On the last community call on August 25"),r("sup",[e._v("th")]),e._v(", we had our very own Frictionless Data developer Shashi Gharti presenting to the community the new Frictionless GitHub integration, to read and write data packages from/to GitHub repositories.")]),e._v(" "),r("p",[e._v("Besides reading and writing packages, the integration also allows the creation of containers for data packages: the "),r("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/framework/catalog.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("catalog"),r("OutboundLink")],1),e._v(", a list of packages from multiple repositories in GitHub. 
To select which repository you want to be in the catalog, you can use any GitHub qualifier.")]),e._v(" "),r("p",[e._v("The Frictionless GitHub integration is part of the beta release of "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2022/08/29/frictionless-framework-release/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Framework version 5"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("If you are interested in knowing more about the Frictionless GitHub integration, have a look at Shashi’s presentation and demo:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/gURZK9WDpp0",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),r("p",[e._v("You can also check out "),r("a",{attrs:{href:"https://docs.google.com/presentation/d/1hhHEgEqzIkIpzCZ_FW-DjJtImxPI8jdi7Ck5OXiiDsM/edit?usp=sharing",target:"_blank",rel:"noopener noreferrer"}},[e._v("Shashi’s slides"),r("OutboundLink")],1),e._v(" or have a look at "),r("a",{attrs:{href:"https://framework.frictionlessdata.io/docs/portals/github.html#reference-portals.githubcontrol",target:"_blank",rel:"noopener noreferrer"}},[e._v("the project documentation"),r("OutboundLink")],1),e._v(". If you use the Frictionless Framework v5 and its GitHub integration, please let us know! And if you have any feedback, feel free to open an issue in the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/framework",target:"_blank",rel:"noopener noreferrer"}},[e._v("repository"),r("OutboundLink")],1)]),e._v(" "),r("h1",{attrs:{id:"join-us-next-month"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),r("p",[e._v("Next community call is on September 29"),r("sup",[e._v("th")]),e._v(". 
Frictionless Data lead developer Evgeny Karev will be presenting the Frictionless Framework version 5, so make sure not to miss it!")]),e._v(" "),r("p",[e._v("You can sign up for the call already "),r("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),r("p",[e._v("Would you like to present at one of the next community calls? Please fill out "),r("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Join our community on "),r("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),r("OutboundLink")],1),e._v(" (also via "),r("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),r("OutboundLink")],1),e._v(") or "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(". 
See you there!")]),e._v(" "),r("h1",{attrs:{id:"call-recording"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),r("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),r("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/9_VwniN4JKE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/167.88542d46.js b/assets/js/167.bbe687fa.js similarity index 99% rename from assets/js/167.88542d46.js rename to assets/js/167.bbe687fa.js index 839905a55..87537ccce 100644 --- a/assets/js/167.88542d46.js +++ b/assets/js/167.bbe687fa.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[167],{703:function(e,t,o){"use strict";o.r(t);var r=o(29),a=Object(r.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("On our last community call on October 27"),o("sup",[e._v("th")]),e._v(", we had our very own Frictionless Data developer Shashi Gharti presenting to the community the Frictionless Zenodo integration, to read and write data packages from and to Zenodo.")]),e._v(" "),o("p",[e._v("The integration is currently in development, but we decided to present this feature already in order to gather feedback from the community. It was a great idea because we got a lot of very useful inputs from all of you. Also how wonderful to see the community! We had really missed you all in the last two months, since we had to cancel the September call.")]),e._v(" "),o("p",[e._v("Back to Shashi’s presentation: what is Zenodo? 
For those of you who don’t know it, Zenodo is an open repository, allowing researchers to deposit papers, datasets, software, reports, etc.")]),e._v(" "),o("p",[e._v("Many members of our community are active users of Zenodo, and have asked for a plugin which would make it easier to use Frictionless Data and Zenodo together. Since our aim with Frictionless Data is to make data more easily shareable, transportable and interoperable, this feature made a lot of sense.")]),e._v(" "),o("p",[e._v("Similarly to "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2022/08/30/community-call-github-integration/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the GitHub integration Shashi presented in August"),o("OutboundLink")],1),e._v(", the Zenodo integration will work with Frictionless-py v5, and has 3 different features to write data, read data and create a catalog from multiple Zenodo entries, searchable")]),e._v(" "),o("p",[e._v("If you are interested in knowing more about the feature, have a look at Shashi’s presentation and demo:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/KdblvfqIX7o",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v("You can also check out "),o("a",{attrs:{href:"https://docs.google.com/presentation/d/1dMvHCR9yE4BewzpQBaW4osKg--aQR7JX6fex0T17YYA/edit?usp=sharing",target:"_blank",rel:"noopener noreferrer"}},[e._v("Shashi’s slides"),o("OutboundLink")],1),e._v(". If you use the Frictionless Framework v5 and its Zenodo integration, please let us know! We would love to hear what you think. 
And if you have any feedback, feel free to open an issue in the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/framework",target:"_blank",rel:"noopener noreferrer"}},[e._v("repository"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h2",{attrs:{id:"other-news-from-the-community"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#other-news-from-the-community"}},[e._v("#")]),e._v(" Other news from the community")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("We are co-organizing "),o("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf v7"),o("OutboundLink")],1),e._v(" in Buenos Aires in April next year. csv,conf is a community event for datamakers from all around the world. Calls for proposals are open until November 25th. More info: "),o("a",{attrs:{href:"https://csvconf.com/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://csvconf.com/about/"),o("OutboundLink")],1)])]),e._v(" "),o("li",[o("p",[e._v("Peter Desmet is co-organising a conference on biodiversity data on November 9th, with a focus on camera trap data, for which he uses Frictionless Data standards as you may remember from his presentation at "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/11/23/november-community-call/",target:"_blank",rel:"noopener noreferrer"}},[e._v("last year’s community call"),o("OutboundLink")],1),e._v(". 
More info: "),o("a",{attrs:{href:"https://www.gbif.org/event/f68927-b5c1-4ac8-a4ac-7d47645/exploring-camera-trap-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.gbif.org/event/f68927-b5c1-4ac8-a4ac-7d47645/exploring-camera-trap-data"),o("OutboundLink")],1)])]),e._v(" "),o("li",[o("p",[e._v("We published an article on the Frictionless Data standards, and we talked about some community projects too, like Libraries Hacked, "),o("a",{attrs:{href:"http://data.gov.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gov.fr"),o("OutboundLink")],1),e._v(", "),o("a",{attrs:{href:"http://dados.gov.br",target:"_blank",rel:"noopener noreferrer"}},[e._v("dados.gov.br"),o("OutboundLink")],1),e._v(", and BCO-DMO. The article is part of Common Place, a space to discuss the digital infrastructures, cultures, and actions needed to distribute, constellate, and amplify knowledge for the public good. ​ Check it out: "),o("a",{attrs:{href:"https://commonplace.knowledgefutures.org/pub/8x7oeawa/release/1?readingCollection=10ba8b01",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://commonplace.knowledgefutures.org/pub/8x7oeawa/release/1?readingCollection=10ba8b01"),o("OutboundLink")],1)])])]),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Next community call is on December 1st (we are pushing it back one week because of the US Thanksgiving).")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up.")]),e._v(" "),o("p",[e._v("Would you like to present at one of the next community calls? Please fill out "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Join our community on "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),o("OutboundLink")],1),e._v(" (also via "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),o("OutboundLink")],1),e._v(") or "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(". See you there!")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/woEiTllLp7A",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[167],{702:function(e,t,o){"use strict";o.r(t);var r=o(29),a=Object(r.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("On our last community call on October 27"),o("sup",[e._v("th")]),e._v(", we had our very own Frictionless Data developer Shashi Gharti presenting to the community the Frictionless Zenodo integration, to read and write data packages from and to 
Zenodo.")]),e._v(" "),o("p",[e._v("The integration is currently in development, but we decided to present this feature already in order to gather feedback from the community. It was a great idea because we got a lot of very useful inputs from all of you. Also how wonderful to see the community! We had really missed you all in the last two months, since we had to cancel the September call.")]),e._v(" "),o("p",[e._v("Back to Shashi’s presentation: what is Zenodo? For those of you who don’t know it, Zenodo is an open repository, allowing researchers to deposit papers, datasets, software, reports, etc.")]),e._v(" "),o("p",[e._v("Many members of our community are active users of Zenodo, and have asked for a plugin which would make it easier to use Frictionless Data and Zenodo together. Since our aim with Frictionless Data is to make data more easily shareable, transportable and interoperable, this feature made a lot of sense.")]),e._v(" "),o("p",[e._v("Similarly to "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2022/08/30/community-call-github-integration/",target:"_blank",rel:"noopener noreferrer"}},[e._v("the GitHub integration Shashi presented in August"),o("OutboundLink")],1),e._v(", the Zenodo integration will work with Frictionless-py v5, and has 3 different features to write data, read data and create a catalog from multiple Zenodo entries, searchable")]),e._v(" "),o("p",[e._v("If you are interested in knowing more about the feature, have a look at Shashi’s presentation and demo:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/KdblvfqIX7o",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),e._v(" "),o("p",[e._v("You can also check out "),o("a",{attrs:{href:"https://docs.google.com/presentation/d/1dMvHCR9yE4BewzpQBaW4osKg--aQR7JX6fex0T17YYA/edit?usp=sharing",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("Shashi’s slides"),o("OutboundLink")],1),e._v(". If you use the Frictionless Framework v5 and its Zenodo integration, please let us know! We would love to hear what you think. And if you have any feedback, feel free to open an issue in the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/framework",target:"_blank",rel:"noopener noreferrer"}},[e._v("repository"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h2",{attrs:{id:"other-news-from-the-community"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#other-news-from-the-community"}},[e._v("#")]),e._v(" Other news from the community")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("We are co-organizing "),o("a",{attrs:{href:"https://csvconf.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf v7"),o("OutboundLink")],1),e._v(" in Buenos Aires in April next year. csv,conf is a community event for datamakers from all around the world. Calls for proposals are open until November 25th. More info: "),o("a",{attrs:{href:"https://csvconf.com/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://csvconf.com/about/"),o("OutboundLink")],1)])]),e._v(" "),o("li",[o("p",[e._v("Peter Desmet is co-organising a conference on biodiversity data on November 9th, with a focus on camera trap data, for which he uses Frictionless Data standards as you may remember from his presentation at "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/11/23/november-community-call/",target:"_blank",rel:"noopener noreferrer"}},[e._v("last year’s community call"),o("OutboundLink")],1),e._v(". 
More info: "),o("a",{attrs:{href:"https://www.gbif.org/event/f68927-b5c1-4ac8-a4ac-7d47645/exploring-camera-trap-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.gbif.org/event/f68927-b5c1-4ac8-a4ac-7d47645/exploring-camera-trap-data"),o("OutboundLink")],1)])]),e._v(" "),o("li",[o("p",[e._v("We published an article on the Frictionless Data standards, and we talked about some community projects too, like Libraries Hacked, "),o("a",{attrs:{href:"http://data.gov.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gov.fr"),o("OutboundLink")],1),e._v(", "),o("a",{attrs:{href:"http://dados.gov.br",target:"_blank",rel:"noopener noreferrer"}},[e._v("dados.gov.br"),o("OutboundLink")],1),e._v(", and BCO-DMO. The article is part of Common Place, a space to discuss the digital infrastructures, cultures, and actions needed to distribute, constellate, and amplify knowledge for the public good. ​ Check it out: "),o("a",{attrs:{href:"https://commonplace.knowledgefutures.org/pub/8x7oeawa/release/1?readingCollection=10ba8b01",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://commonplace.knowledgefutures.org/pub/8x7oeawa/release/1?readingCollection=10ba8b01"),o("OutboundLink")],1)])])]),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Next community call is on December 1st (we are pushing it back one week because of the US Thanksgiving).")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Do you want to share something with the community? 
Let us know when you sign up.")]),e._v(" "),o("p",[e._v("Would you like to present at one of the next community calls? Please fill out "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Join our community on "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),o("OutboundLink")],1),e._v(" (also via "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),o("OutboundLink")],1),e._v(") or "),o("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),o("OutboundLink")],1),e._v(". See you there!")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/woEiTllLp7A",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/168.25ec96e6.js b/assets/js/168.38350f41.js similarity index 98% rename from assets/js/168.25ec96e6.js rename to assets/js/168.38350f41.js index aeb54d7c2..31b492cbc 100644 --- a/assets/js/168.25ec96e6.js +++ b/assets/js/168.38350f41.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[168],{705:function(t,e,a){"use strict";a.r(e);var r=a(29),o=Object(r.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return 
a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("p",[t._v("On our last community call on December 1"),a("sup",[t._v("st")]),t._v(", we heard about the new Frictionless Data - CKAN integration from senior developer Edgar Zanella.")]),t._v(" "),a("p",[t._v("Being a much awaited and longterm requested integration from the community, there are several projects aiming at integrating Frictionless Data with CKAN:")]),t._v(" "),a("ol",[a("li",[a("strong",[t._v("Datapackager CKAN Extension")]),t._v(" - allowing the import of Data Packages directly to CKAN, and the export of any dataset in your portal as a Data Package")]),t._v(" "),a("li",[a("strong",[t._v("CKAN Validation Extension")]),t._v(" - providing all the Frictionless Framework validation functionalities to your CKAN portal")]),t._v(" "),a("li",[a("strong",[t._v("CKAN Data Portal")]),t._v(" supported by Frictionless Framework - providing an easy way to load Data Packages to and from your CKAN portal, using CKAN control")]),t._v(" "),a("li",[a("strong",[t._v("Frictionless CKAN Mapper")]),t._v(" - a small Python library working behind the scenes to convert datasets formats from CKAN to Frictionless Packages, and vice versa.")])]),t._v(" "),a("p",[t._v("Check out Edgar’s presentation to know more about these projects and to see them demoed:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/ZvPTFYsIT9w",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),t._v(" "),a("p",[t._v("If you use any Frictionless Data - CKAN integration, please let us know! 
We would love to hear what you think.")]),t._v(" "),a("p",[t._v("Here are all the repos:")]),t._v(" "),a("ul",[a("li",[t._v("CKAN Datapackager Extension: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-datapackager",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/ckanext-datapackager"),a("OutboundLink")],1)]),t._v(" "),a("li",[t._v("CKAN Validation Extension: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/ckanext-validation"),a("OutboundLink")],1)]),t._v(" "),a("li",[t._v("CKAN Data Portal (part of Frictionless Framework): "),a("a",{attrs:{href:"https://github.com/frictionlessdata/framework",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/framework"),a("OutboundLink")],1)]),t._v(" "),a("li",[t._v("Frictionless/CKAN Mapper: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-ckan-mapper",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/frictionless-ckan-mapper"),a("OutboundLink")],1)])]),t._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[t._v("#")]),t._v(" Join us next month!")]),t._v(" "),a("p",[t._v("Next community call is on December 22nd, we don’t have any presentation scheduled yet, so if you have a cool project that you would like to show to the community, just let us know! 
You can just fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[t._v("this form"),a("OutboundLink")],1),t._v(", or come and tell us on our community chat on "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack"),a("OutboundLink")],1),t._v(" (also via "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Matrix"),a("OutboundLink")],1),t._v("). See you there!")]),t._v(" "),a("p",[t._v("Also, you can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("Do you want to share something with the community? Let us know when you sign up.")]),t._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[t._v("#")]),t._v(" Call Recording")]),t._v(" "),a("p",[t._v("On a final note, here is the recording of the full call:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/aBSTRfoQhIU",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);e.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[168],{704:function(t,e,a){"use strict";a.r(e);var r=a(29),o=Object(r.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("p",[t._v("On our last community call on December 1"),a("sup",[t._v("st")]),t._v(", we heard about the new Frictionless Data - 
CKAN integration from senior developer Edgar Zanella.")]),t._v(" "),a("p",[t._v("Being a much awaited and longterm requested integration from the community, there are several projects aiming at integrating Frictionless Data with CKAN:")]),t._v(" "),a("ol",[a("li",[a("strong",[t._v("Datapackager CKAN Extension")]),t._v(" - allowing the import of Data Packages directly to CKAN, and the export of any dataset in your portal as a Data Package")]),t._v(" "),a("li",[a("strong",[t._v("CKAN Validation Extension")]),t._v(" - providing all the Frictionless Framework validation functionalities to your CKAN portal")]),t._v(" "),a("li",[a("strong",[t._v("CKAN Data Portal")]),t._v(" supported by Frictionless Framework - providing an easy way to load Data Packages to and from your CKAN portal, using CKAN control")]),t._v(" "),a("li",[a("strong",[t._v("Frictionless CKAN Mapper")]),t._v(" - a small Python library working behind the scenes to convert datasets formats from CKAN to Frictionless Packages, and vice versa.")])]),t._v(" "),a("p",[t._v("Check out Edgar’s presentation to know more about these projects and to see them demoed:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/ZvPTFYsIT9w",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),t._v(" "),a("p",[t._v("If you use any Frictionless Data - CKAN integration, please let us know! 
We would love to hear what you think.")]),t._v(" "),a("p",[t._v("Here are all the repos:")]),t._v(" "),a("ul",[a("li",[t._v("CKAN Datapackager Extension: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-datapackager",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/ckanext-datapackager"),a("OutboundLink")],1)]),t._v(" "),a("li",[t._v("CKAN Validation Extension: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/ckanext-validation"),a("OutboundLink")],1)]),t._v(" "),a("li",[t._v("CKAN Data Portal (part of Frictionless Framework): "),a("a",{attrs:{href:"https://github.com/frictionlessdata/framework",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/framework"),a("OutboundLink")],1)]),t._v(" "),a("li",[t._v("Frictionless/CKAN Mapper: "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-ckan-mapper",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/frictionless-ckan-mapper"),a("OutboundLink")],1)])]),t._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[t._v("#")]),t._v(" Join us next month!")]),t._v(" "),a("p",[t._v("Next community call is on December 22nd, we don’t have any presentation scheduled yet, so if you have a cool project that you would like to show to the community, just let us know! 
You can just fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[t._v("this form"),a("OutboundLink")],1),t._v(", or come and tell us on our community chat on "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack"),a("OutboundLink")],1),t._v(" (also via "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Matrix"),a("OutboundLink")],1),t._v("). See you there!")]),t._v(" "),a("p",[t._v("Also, you can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("Do you want to share something with the community? Let us know when you sign up.")]),t._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[t._v("#")]),t._v(" Call Recording")]),t._v(" "),a("p",[t._v("On a final note, here is the recording of the full call:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/aBSTRfoQhIU",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);e.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/169.4cd6dd92.js b/assets/js/169.e06ca116.js similarity index 99% rename from assets/js/169.4cd6dd92.js rename to assets/js/169.e06ca116.js index 361852afa..f4e7bf502 100644 --- a/assets/js/169.4cd6dd92.js +++ b/assets/js/169.e06ca116.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[169],{707:function(e,t,a){"use strict";a.r(t);var 
r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On December 22"),a("sup",[e._v("nd")]),e._v(", for our last community call of the year, we had a nice discussion with Oleg Lavrovsky, an old friend of Open Knowledge Foundation, board member of the Swiss chapter, and valued member of the Frictionless Data community, about Data Package as a Service.")]),e._v(" "),a("p",[e._v("Oleg together with Thorben Westerhuys (remember his "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/03/29/february-virtual-hangout/#a-recap-from-our-march-community-call",target:"_blank",rel:"noopener noreferrer"}},[e._v("spatiotemporal covid 19 vaccination tracker he presented in March 2021"),a("OutboundLink")],1),e._v("?) already made a first attempt at this in 2019, as you can see in this "),a("a",{attrs:{href:"https://github.com/datalets/daats",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repo here"),a("OutboundLink")],1),e._v(". The repository works as a template to create a quick API around your Frictionless Data Package. 
This solution is based on the "),a("a",{attrs:{href:"http://falconframework.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Falcon micro framework"),a("OutboundLink")],1),e._v(" and the "),a("a",{attrs:{href:"https://github.com/rgieseke/pandas-datapackage-reader",target:"_blank",rel:"noopener noreferrer"}},[e._v("Pandas Data Package Reader"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("More recently Edgar Zanella from the Frictionless Data core team also worked on an "),a("a",{attrs:{href:"https://github.com/aivuk/datapackage-api",target:"_blank",rel:"noopener noreferrer"}},[e._v("experimental solution"),a("OutboundLink")],1),e._v(", converting a Data Package to SQLite database and using "),a("a",{attrs:{href:"https://datasette.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datasette"),a("OutboundLink")],1),e._v(" to have a "),a("a",{attrs:{href:"https://github.com/aivuk/datapackage-api/",target:"_blank",rel:"noopener noreferrer"}},[e._v("JSON API"),a("OutboundLink")],1),e._v(" over the data. The advantage of this solution is that the way of querying the data is going to be familiar for those that knows "),a("a",{attrs:{href:"https://docs.datasette.io/en/stable/sql_queries.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("SQL"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Then in November 2022, during the GLAMhack 2022 in Mendrisio, an API for Frictionless Data Packages was needed again to be able to sort data and view it on a map. The end result was a "),a("a",{attrs:{href:"https://hack.glam.opendata.ch/project/177",target:"_blank",rel:"noopener noreferrer"}},[e._v("Living Herbarium app"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("So Oleg decided to pitch the idea of Frictionless Data Packages as services, as a challenge at the "),a("a",{attrs:{href:"https://hacknight.dinacon.ch/project/60",target:"_blank",rel:"noopener noreferrer"}},[e._v("DINAcon hacknights"),a("OutboundLink")],1),e._v(" in Bern. 
The challenge was not picked by anyone at the hackathon itself, but it sparked a conversation "),a("a",{attrs:{href:"https://frictionlessdata.slack.com/archives/C0369JLDJ1Z/p1668597797541189",target:"_blank",rel:"noopener noreferrer"}},[e._v("in our community chat"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("If you are also interested in joining the conversation, just get on the thread in the community chat. If you need a bit of context, you can of course rewatch Oleg’s presentation:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/n_neCrY02jg",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),a("p",[e._v("It was also noted during the call that 2 other excellent ways to get a quick API for Frictionless Data Packages are:")]),e._v(" "),a("ul",[a("li",[a("p",[e._v("The "),a("a",{attrs:{href:"https://githubnext.com/projects/flat-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("Flat Data project"),a("OutboundLink")],1),e._v(", developed on top of an idea by Simon Willison, allows (among other things) to have a quick API for your Data Package.")])]),e._v(" "),a("li",[a("p",[e._v("CKAN, since CKAN provides APIs. 
For example via "),a("a",{attrs:{href:"https://github.com/datalets/ckan-embed",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN-embed"),a("OutboundLink")],1),e._v(", a widget for embedding live data searches from CKAN data portals into external websites.")])])]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on January 26"),a("sup",[e._v("th")]),e._v(" and we are going to hear about Frictionless Data and DCAT from Matteo Fortini.")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),a("p",[e._v("And if you have a cool project that you would like to show to the community, please let us know! You can just fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(", or come and tell us on our community chat on "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),a("OutboundLink")],1),e._v(" (also via "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),a("OutboundLink")],1),e._v("). 
See you there!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/hmr18OhY578",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[169],{705:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("On December 22"),a("sup",[e._v("nd")]),e._v(", for our last community call of the year, we had a nice discussion with Oleg Lavrovsky, an old friend of Open Knowledge Foundation, board member of the Swiss chapter, and valued member of the Frictionless Data community, about Data Package as a Service.")]),e._v(" "),a("p",[e._v("Oleg together with Thorben Westerhuys (remember his "),a("a",{attrs:{href:"https://frictionlessdata.io/blog/2021/03/29/february-virtual-hangout/#a-recap-from-our-march-community-call",target:"_blank",rel:"noopener noreferrer"}},[e._v("spatiotemporal covid 19 vaccination tracker he presented in March 2021"),a("OutboundLink")],1),e._v("?) already made a first attempt at this in 2019, as you can see in this "),a("a",{attrs:{href:"https://github.com/datalets/daats",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repo here"),a("OutboundLink")],1),e._v(". The repository works as a template to create a quick API around your Frictionless Data Package. 
This solution is based on the "),a("a",{attrs:{href:"http://falconframework.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Falcon micro framework"),a("OutboundLink")],1),e._v(" and the "),a("a",{attrs:{href:"https://github.com/rgieseke/pandas-datapackage-reader",target:"_blank",rel:"noopener noreferrer"}},[e._v("Pandas Data Package Reader"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("More recently Edgar Zanella from the Frictionless Data core team also worked on an "),a("a",{attrs:{href:"https://github.com/aivuk/datapackage-api",target:"_blank",rel:"noopener noreferrer"}},[e._v("experimental solution"),a("OutboundLink")],1),e._v(", converting a Data Package to SQLite database and using "),a("a",{attrs:{href:"https://datasette.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datasette"),a("OutboundLink")],1),e._v(" to have a "),a("a",{attrs:{href:"https://github.com/aivuk/datapackage-api/",target:"_blank",rel:"noopener noreferrer"}},[e._v("JSON API"),a("OutboundLink")],1),e._v(" over the data. The advantage of this solution is that the way of querying the data is going to be familiar for those that knows "),a("a",{attrs:{href:"https://docs.datasette.io/en/stable/sql_queries.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("SQL"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Then in November 2022, during the GLAMhack 2022 in Mendrisio, an API for Frictionless Data Packages was needed again to be able to sort data and view it on a map. The end result was a "),a("a",{attrs:{href:"https://hack.glam.opendata.ch/project/177",target:"_blank",rel:"noopener noreferrer"}},[e._v("Living Herbarium app"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("So Oleg decided to pitch the idea of Frictionless Data Packages as services, as a challenge at the "),a("a",{attrs:{href:"https://hacknight.dinacon.ch/project/60",target:"_blank",rel:"noopener noreferrer"}},[e._v("DINAcon hacknights"),a("OutboundLink")],1),e._v(" in Bern. 
The challenge was not picked by anyone at the hackathon itself, but it sparked a conversation "),a("a",{attrs:{href:"https://frictionlessdata.slack.com/archives/C0369JLDJ1Z/p1668597797541189",target:"_blank",rel:"noopener noreferrer"}},[e._v("in our community chat"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("If you are also interested in joining the conversation, just get on the thread in the community chat. If you need a bit of context, you can of course rewatch Oleg’s presentation:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/n_neCrY02jg",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),a("p",[e._v("It was also noted during the call that 2 other excellent ways to get a quick API for Frictionless Data Packages are:")]),e._v(" "),a("ul",[a("li",[a("p",[e._v("The "),a("a",{attrs:{href:"https://githubnext.com/projects/flat-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("Flat Data project"),a("OutboundLink")],1),e._v(", developed on top of an idea by Simon Willison, allows (among other things) to have a quick API for your Data Package.")])]),e._v(" "),a("li",[a("p",[e._v("CKAN, since CKAN provides APIs. 
For example via "),a("a",{attrs:{href:"https://github.com/datalets/ckan-embed",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN-embed"),a("OutboundLink")],1),e._v(", a widget for embedding live data searches from CKAN data portals into external websites.")])])]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on January 26"),a("sup",[e._v("th")]),e._v(" and we are going to hear about Frictionless Data and DCAT from Matteo Fortini.")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),a("p",[e._v("And if you have a cool project that you would like to show to the community, please let us know! You can just fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(", or come and tell us on our community chat on "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("Slack"),a("OutboundLink")],1),e._v(" (also via "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix"),a("OutboundLink")],1),e._v("). 
See you there!")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/hmr18OhY578",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/171.462451b4.js b/assets/js/171.7ebd8a93.js similarity index 99% rename from assets/js/171.462451b4.js rename to assets/js/171.7ebd8a93.js index 46698379c..c65afe73d 100644 --- a/assets/js/171.462451b4.js +++ b/assets/js/171.7ebd8a93.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[171],{709:function(t,e,a){"use strict";a.r(e);var o=a(29),r=Object(o.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("p",[t._v("At our last community call on January 26"),a("sup",[t._v("th")]),t._v(", we had Matteo Fortini from the Italian National Department of Digital Transformation, who led a discussion about DCAT and Frictionless Data.")]),t._v(" "),a("p",[t._v("Open data is key to ensure transparency and accountability, understand the world, and have an economy of data. The open data publishing chain in Europe starts with distribution of datasets that go into a national catalogue, which is then harvested by an EU catalogue - all this enabled by metadata.")]),t._v(" "),a("p",[t._v("In practice, Matteo and his colleagues would publish the data (e.g. 
on the Next Generation EU funds, or on the National Population Registry) as Frictionless Data with DCAT metadata, a format that is mandatory to get into the EU catalogue.")]),t._v(" "),a("p",[t._v("The data is gathered on GitHub (a CKAN instance is sadly not available yet) through scripts that are run everyday. The data is published in both CSV and JSON format, with foreign keys to other tabular data (e.g. geographical data for municipalities) and Frictionless metadata to have a standard way to document all the different attributes of the data, to enforce constraints, and ensure data quality in general. On top of that there is the Italian DCAT_AP, and the mandatory attributes for metadata.")]),t._v(" "),a("p",[t._v("While DCAT is very useful to understand the content, the themes, and the licences, Frictionless Data goes down to attribute descriptions, data types and constraints. So what Matteo would like to have in the future is one type of metadata that would cover both the data description and attributes, and the catalogue information.")]),t._v(" "),a("p",[t._v("Some efforts were already made in the past by community members Augusto Herrman and Ayrton Bourne to map data packages to DCAT (as documented in this "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionlessdata.io/issues/551",target:"_blank",rel:"noopener noreferrer"}},[t._v("issue"),a("OutboundLink")],1),t._v("). 
Now Matteo and his colleagues are actively looking for other people who would be interested in creating a working group about this, to try to get to some kind of shared standard.")]),t._v(" "),a("p",[t._v("Other community members present at the call shared their own experience with Frictionless and DCAT:.")]),t._v(" "),a("p",[t._v("The German State of Schleswig - Holstein shared "),a("a",{attrs:{href:"https://opendata.schleswig-holstein.de/dataset/marktplatz-autos-2023",target:"_blank",rel:"noopener noreferrer"}},[t._v("a very interesting example"),a("OutboundLink")],1),t._v(" from their portal. As they did not find a good way to attach the Frictionless Specification to the DCAT Distribution, they created a separate distribution for the Frictionless Tabular Data Resource. Switzerland took the same approach, linking the Frictionless Specification as a separate distribution, as you can see "),a("a",{attrs:{href:"https://opendata.swiss/de/dataset/vollzugsresultate-der-co2-emissionsvorschriften-fur-lieferwagen-und-leichte-sattelschlepper",target:"_blank",rel:"noopener noreferrer"}},[t._v("in this example"),a("OutboundLink")],1),t._v(". They are unsure about this approach though, as it seems to be a misuse of the DCAT Class.")]),t._v(" "),a("p",[t._v("To make Frictionless Data more interoperable with other semantic web standards, Dan Feder pointed out the idea to create RDF or JSON-LD Specification, something that had already been discussed in the past, as documented in "),a("a",{attrs:{href:"https://github.com/frictionlessdata/specs/issues/218",target:"_blank",rel:"noopener noreferrer"}},[t._v("this issue"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("Do you have anything to add to this? Are you interested in joining the open discussion? 
Let us know in our community chat on "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack"),a("OutboundLink")],1),t._v(" or "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Matrix"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("If you want to know more about Matteo’s presentation, here’s the recording:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/sHHRT5ptqbg",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),t._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[t._v("#")]),t._v(" Join us next month!")]),t._v(" "),a("p",[t._v("Next community call is on February 23"),a("sup",[t._v("rd")]),t._v(" and we are going to hear about the database curation software for the World Glacier Monitoring Service (WGMS) from Ethan Welty.")]),t._v(" "),a("p",[t._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),a("OutboundLink")],1),t._v(". Do you want to share something with the community? Let us know when you sign up.")]),t._v(" "),a("p",[t._v("And if you have a cool project that you would like to show to the community, please let us know! 
You can just fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[t._v("this form"),a("OutboundLink")],1),t._v(", or come and tell us on our community chat.")]),t._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[t._v("#")]),t._v(" Call Recording")]),t._v(" "),a("p",[t._v("On a final note, here is the recording of the full call:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/DTykNylDdsA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[171],{707:function(t,e,a){"use strict";a.r(e);var o=a(29),r=Object(o.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("p",[t._v("At our last community call on January 26"),a("sup",[t._v("th")]),t._v(", we had Matteo Fortini from the Italian National Department of Digital Transformation, who led a discussion about DCAT and Frictionless Data.")]),t._v(" "),a("p",[t._v("Open data is key to ensure transparency and accountability, understand the world, and have an economy of data. The open data publishing chain in Europe starts with distribution of datasets that go into a national catalogue, which is then harvested by an EU catalogue - all this enabled by metadata.")]),t._v(" "),a("p",[t._v("In practice, Matteo and his colleagues would publish the data (e.g. 
on the Next Generation EU funds, or on the National Population Registry) as Frictionless Data with DCAT metadata, a format that is mandatory to get into the EU catalogue.")]),t._v(" "),a("p",[t._v("The data is gathered on GitHub (a CKAN instance is sadly not available yet) through scripts that are run everyday. The data is published in both CSV and JSON format, with foreign keys to other tabular data (e.g. geographical data for municipalities) and Frictionless metadata to have a standard way to document all the different attributes of the data, to enforce constraints, and ensure data quality in general. On top of that there is the Italian DCAT_AP, and the mandatory attributes for metadata.")]),t._v(" "),a("p",[t._v("While DCAT is very useful to understand the content, the themes, and the licences, Frictionless Data goes down to attribute descriptions, data types and constraints. So what Matteo would like to have in the future is one type of metadata that would cover both the data description and attributes, and the catalogue information.")]),t._v(" "),a("p",[t._v("Some efforts were already made in the past by community members Augusto Herrman and Ayrton Bourne to map data packages to DCAT (as documented in this "),a("a",{attrs:{href:"https://github.com/frictionlessdata/frictionlessdata.io/issues/551",target:"_blank",rel:"noopener noreferrer"}},[t._v("issue"),a("OutboundLink")],1),t._v("). 
Now Matteo and his colleagues are actively looking for other people who would be interested in creating a working group about this, to try to get to some kind of shared standard.")]),t._v(" "),a("p",[t._v("Other community members present at the call shared their own experience with Frictionless and DCAT:.")]),t._v(" "),a("p",[t._v("The German State of Schleswig - Holstein shared "),a("a",{attrs:{href:"https://opendata.schleswig-holstein.de/dataset/marktplatz-autos-2023",target:"_blank",rel:"noopener noreferrer"}},[t._v("a very interesting example"),a("OutboundLink")],1),t._v(" from their portal. As they did not find a good way to attach the Frictionless Specification to the DCAT Distribution, they created a separate distribution for the Frictionless Tabular Data Resource. Switzerland took the same approach, linking the Frictionless Specification as a separate distribution, as you can see "),a("a",{attrs:{href:"https://opendata.swiss/de/dataset/vollzugsresultate-der-co2-emissionsvorschriften-fur-lieferwagen-und-leichte-sattelschlepper",target:"_blank",rel:"noopener noreferrer"}},[t._v("in this example"),a("OutboundLink")],1),t._v(". They are unsure about this approach though, as it seems to be a misuse of the DCAT Class.")]),t._v(" "),a("p",[t._v("To make Frictionless Data more interoperable with other semantic web standards, Dan Feder pointed out the idea to create RDF or JSON-LD Specification, something that had already been discussed in the past, as documented in "),a("a",{attrs:{href:"https://github.com/frictionlessdata/specs/issues/218",target:"_blank",rel:"noopener noreferrer"}},[t._v("this issue"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("Do you have anything to add to this? Are you interested in joining the open discussion? 
Let us know in our community chat on "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack"),a("OutboundLink")],1),t._v(" or "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Matrix"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("p",[t._v("If you want to know more about Matteo’s presentation, here’s the recording:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/sHHRT5ptqbg",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),t._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[t._v("#")]),t._v(" Join us next month!")]),t._v(" "),a("p",[t._v("Next community call is on February 23"),a("sup",[t._v("rd")]),t._v(" and we are going to hear about the database curation software for the World Glacier Monitoring Service (WGMS) from Ethan Welty.")]),t._v(" "),a("p",[t._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),a("OutboundLink")],1),t._v(". Do you want to share something with the community? Let us know when you sign up.")]),t._v(" "),a("p",[t._v("And if you have a cool project that you would like to show to the community, please let us know! 
You can just fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[t._v("this form"),a("OutboundLink")],1),t._v(", or come and tell us on our community chat.")]),t._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[t._v("#")]),t._v(" Call Recording")]),t._v(" "),a("p",[t._v("On a final note, here is the recording of the full call:")]),t._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/DTykNylDdsA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/172.5070cd58.js b/assets/js/172.edf8b83c.js similarity index 98% rename from assets/js/172.5070cd58.js rename to assets/js/172.edf8b83c.js index 6b633a3f3..600edc3b6 100644 --- a/assets/js/172.5070cd58.js +++ b/assets/js/172.edf8b83c.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[172],{710:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last community call on February 28th, we heard about generating spreadsheet templates from Tabular Data Package metadata from Ethan Welty.")]),e._v(" "),a("p",[e._v("Ethan works for the "),a("a",{attrs:{href:"https://wgms.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("World Glacier Monitoring Service (WGMS)"),a("OutboundLink")],1),e._v(", which maintains and curates a single longrunning dataset (with entries dating back to 1894!) 
combining both satellite measurements, and manual submissions of scientists from around the world who go out to glaciers and measure the mass changes on the ground.")]),e._v(" "),a("p",[e._v("One of their biggest challenges is that parts of the data are not machine-generated, but inserted by humans. It is therefore important to review the data submissions to try and catch any possible error. To do that, Ethan adopted the Frictionless Tabular Data Package approach, getting as much of the organisation logic and data management into a centralised metadata.")]),e._v(" "),a("p",[e._v("Plus, to help people doing their data entry, they have spreadsheet templates automatically generated. The file is built in markup language, and is generated from the validation pipeline (which works in a slightly different way than in Frictionless Data, as it scales to a much longer pipeline). The template generator, called "),a("em",[e._v("Tablecloth")]),e._v(", currently supports Excel - as it is what most people who work with the WGMS are comfortable using, and it is soon going to support Google Sheets too.")]),e._v(" "),a("p",[e._v("If you want to know more about "),a("em",[e._v("Tablecloth")]),e._v(" and are interested in having a look at the demo Ethan did on the call, go ahead and have a look at the recording of the presentation:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/79CrD5O96vk",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),a("p",[e._v("You can also check out "),a("em",[e._v("Tablecloth")]),e._v(" on "),a("a",{attrs:{href:"https://github.com/ezwelty/tablecloth",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://gitlab.com/wgms/",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("GitLab"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on March 30"),a("sup",[e._v("th")]),e._v(" and guess what? We do not have any presentations scheduled yet! So this could be your moment to come and tell us about your project! If you are interested in doing so just fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(", or come and tell us on our community chat.")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". Do you want to share something with the community? 
Let us know when you sign up.")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/_k7NlWztGlc",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[172],{708:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last community call on February 28th, we heard about generating spreadsheet templates from Tabular Data Package metadata from Ethan Welty.")]),e._v(" "),a("p",[e._v("Ethan works for the "),a("a",{attrs:{href:"https://wgms.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("World Glacier Monitoring Service (WGMS)"),a("OutboundLink")],1),e._v(", which maintains and curates a single longrunning dataset (with entries dating back to 1894!) combining both satellite measurements, and manual submissions of scientists from around the world who go out to glaciers and measure the mass changes on the ground.")]),e._v(" "),a("p",[e._v("One of their biggest challenges is that parts of the data are not machine-generated, but inserted by humans. It is therefore important to review the data submissions to try and catch any possible error. 
To do that, Ethan adopted the Frictionless Tabular Data Package approach, getting as much of the organisation logic and data management into a centralised metadata.")]),e._v(" "),a("p",[e._v("Plus, to help people doing their data entry, they have spreadsheet templates automatically generated. The file is built in markup language, and is generated from the validation pipeline (which works in a slightly different way than in Frictionless Data, as it scales to a much longer pipeline). The template generator, called "),a("em",[e._v("Tablecloth")]),e._v(", currently supports Excel - as it is what most people who work with the WGMS are comfortable using, and it is soon going to support Google Sheets too.")]),e._v(" "),a("p",[e._v("If you want to know more about "),a("em",[e._v("Tablecloth")]),e._v(" and are interested in having a look at the demo Ethan did on the call, go ahead and have a look at the recording of the presentation:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/79CrD5O96vk",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),a("p",[e._v("You can also check out "),a("em",[e._v("Tablecloth")]),e._v(" on "),a("a",{attrs:{href:"https://github.com/ezwelty/tablecloth",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://gitlab.com/wgms/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitLab"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on March 30"),a("sup",[e._v("th")]),e._v(" and guess what? We do not have any presentations scheduled yet! So this could be your moment to come and tell us about your project! 
If you are interested in doing so just fill out "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(", or come and tell us on our community chat.")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/_k7NlWztGlc",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/173.738038e0.js b/assets/js/173.560957ed.js similarity index 98% rename from assets/js/173.738038e0.js rename to assets/js/173.560957ed.js index d668ad00d..3bdf83724 100644 --- a/assets/js/173.738038e0.js +++ b/assets/js/173.560957ed.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[173],{708:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last community call on March 30"),a("sup",[e._v("th")]),e._v(", our very own Evgeny Karev - tech lead of the Frictionless Data project at "),a("a",{attrs:{href:"http://okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open 
Knowledge Foundation"),a("OutboundLink")],1),e._v(", presented the new Frictionless command line features.")]),e._v(" "),a("p",[e._v("The new commands have been developed as part of the effort of building recommended data workflows for different needs, and might be particularly useful for data wrangling and data exploration. Here they are:")]),e._v(" "),a("ul",[a("li",[a("strong",[e._v("List")]),e._v(" function is a new command to quickly see lists of resources in a dataset.")]),e._v(" "),a("li",[a("strong",[e._v("Describe")]),e._v(", an old command actually, but that can be part of the exploration workflow as it infers Table Schemas for all tabular resources.")]),e._v(" "),a("li",[a("strong",[e._v("Extract")]),e._v(", also an old command, can be used to understand what kind of data is in the table, and get a preview of it.")]),e._v(" "),a("li",[a("strong",[e._v("Explore")]),e._v(", to use in combination with "),a("a",{attrs:{href:"https://www.visidata.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Visidata"),a("OutboundLink")],1),e._v(" to edit tables directly in the command line.")]),e._v(" "),a("li",[a("strong",[e._v("Query")]),e._v(" which will put a dataset into a SQLite database, with everything indexed, adding nice functionalities, like the possibility of saving queries as CSV files.")]),e._v(" "),a("li",[a("strong",[e._v("Script")]),e._v(" is a feature that allows dataset indexing and will create Pandas dataframes for you.")]),e._v(" "),a("li",[a("strong",[e._v("Convert")]),e._v(", a work-in-progress command that can be used to convert from one format to the other, something that was historically done with the Extract function in the Framework.")]),e._v(" "),a("li",[a("strong",[e._v("Publish")]),e._v(" is also a work-in-progress command, and you can use it to upload your dataset to a data portal (e.g. 
a CKAN instance) just providing an API key.")])]),e._v(" "),a("p",[e._v("To better understand how you can use all these new commands, have a look at Evgeny’s presentation and demo:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/yNYAGMcAGl4",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on April 27"),a("sup",[e._v("th")]),e._v(". Keith Hughitt will share with us his ideas on how to improve support for non-tabular data, a proposed abstract data model, and a specification for describing the relationship between datasets.")]),e._v(" "),a("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? Let us know via "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(", or come and tell us on our "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),a("OutboundLink")],1),e._v("(also accessible via a "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),a("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". 
Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call, including the short presentation and community discussion on the project governance:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/vgeXcDd5KEE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[173],{710:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last community call on March 30"),a("sup",[e._v("th")]),e._v(", our very own Evgeny Karev - tech lead of the Frictionless Data project at "),a("a",{attrs:{href:"http://okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Foundation"),a("OutboundLink")],1),e._v(", presented the new Frictionless command line features.")]),e._v(" "),a("p",[e._v("The new commands have been developed as part of the effort of building recommended data workflows for different needs, and might be particularly useful for data wrangling and data exploration. 
Here they are:")]),e._v(" "),a("ul",[a("li",[a("strong",[e._v("List")]),e._v(" function is a new command to quickly see lists of resources in a dataset.")]),e._v(" "),a("li",[a("strong",[e._v("Describe")]),e._v(", an old command actually, but that can be part of the exploration workflow as it infers Table Schemas for all tabular resources.")]),e._v(" "),a("li",[a("strong",[e._v("Extract")]),e._v(", also an old command, can be used to understand what kind of data is in the table, and get a preview of it.")]),e._v(" "),a("li",[a("strong",[e._v("Explore")]),e._v(", to use in combination with "),a("a",{attrs:{href:"https://www.visidata.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Visidata"),a("OutboundLink")],1),e._v(" to edit tables directly in the command line.")]),e._v(" "),a("li",[a("strong",[e._v("Query")]),e._v(" which will put a dataset into a SQLite database, with everything indexed, adding nice functionalities, like the possibility of saving queries as CSV files.")]),e._v(" "),a("li",[a("strong",[e._v("Script")]),e._v(" is a feature that allows dataset indexing and will create Pandas dataframes for you.")]),e._v(" "),a("li",[a("strong",[e._v("Convert")]),e._v(", a work-in-progress command that can be used to convert from one format to the other, something that was historically done with the Extract function in the Framework.")]),e._v(" "),a("li",[a("strong",[e._v("Publish")]),e._v(" is also a work-in-progress command, and you can use it to upload your dataset to a data portal (e.g. 
a CKAN instance) just providing an API key.")])]),e._v(" "),a("p",[e._v("To better understand how you can use all these new commands, have a look at Evgeny’s presentation and demo:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/yNYAGMcAGl4",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on April 27"),a("sup",[e._v("th")]),e._v(". Keith Hughitt will share with us his ideas on how to improve support for non-tabular data, a proposed abstract data model, and a specification for describing the relationship between datasets.")]),e._v(" "),a("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? Let us know via "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(", or come and tell us on our "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),a("OutboundLink")],1),e._v("(also accessible via a "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),a("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". 
Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call, including the short presentation and community discussion on the project governance:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/vgeXcDd5KEE",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/174.3966497a.js b/assets/js/174.a383a588.js similarity index 98% rename from assets/js/174.3966497a.js rename to assets/js/174.a383a588.js index e2e310ec0..0584a0dab 100644 --- a/assets/js/174.3966497a.js +++ b/assets/js/174.a383a588.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[174],{712:function(e,t,o){"use strict";o.r(t);var a=o(29),n=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("At our last community call on April 27"),o("sup",[e._v("th")]),e._v(" Keith Hughitt presented his ideas on how we can improve support for non-tabular data, and on how we could build a specification for describing the relationship between datasets. It took me some time to write this recap blog, because some of the reflections that Keith shared with us resonated very much with some of the thinking we have been doing at Open Knowledge Foundation around governance. 
I had explained during the "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2023/04/06/march-community-call/",target:"_blank",rel:"noopener noreferrer"}},[e._v("March community call"),o("OutboundLink")],1),e._v(" that the governance of the specs has been recently unblocked, and we are starting to think about how to get to v2. It was actually Keith who urged me to do that presentation to clarify the project governance (and I am so glad he did!).")]),e._v(" "),o("p",[e._v("Keith’s main goals are pretty clear: 1. He wants datasets to be soft contained and well defined enough to be combinable with minimal effort. Datasets should function like lego blocks, which is the way Frictionless Data works too. 2. He wants transparency on how the data is processed and communicated, as this is key to reproducibility.")]),e._v(" "),o("p",[e._v("At the moment the Frictionless Data specs have a strong focus on tabular data, and Keith would like to extend that same kind of support to other types of data as well. Having some kind of common spec would be very useful for all those who work with more than one type of data, and he feels something can be done to make that work easier.")]),e._v(" "),o("h3",{attrs:{id:"so-what-does-keith-have-in-mind"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#so-what-does-keith-have-in-mind"}},[e._v("#")]),e._v(" So what does Keith have in mind?")]),e._v(" "),o("p",[e._v("He argues that we should separate the description of structure (data types) and domain (fields that are included in one discipline). 
This is easy to achieve because Frictionless is modular by design.")]),e._v(" "),o("p",[e._v("We should take some intentional action to design a high-level model, so that even if we leave it to community members to build domain-specific specs, the core Frictionless team at Open Knowledge Foundation would oversee that they all still have a common core data model which allows all the different extensions to interact easily.")]),e._v(" "),o("p",[e._v("Keith suggests using a mix-in approach, where the domain-specific schema would be made by combining specs (data type/structure + data domain). This would make sense to avoid redundancy in the code structure.")]),e._v(" "),o("p",[e._v("It would be important to have a working group with representatives from different disciplines, and working in different capacities, to build together this common data model in a way that really fits the needs of everyone (or at least find some minimal common ground). This is exactly the direction we would like the project to move forward. We are working on it, so stay tuned!")]),e._v(" "),o("p",[e._v("Meanwhile, if you want to know more about Keith’s ideas, you can watch the recording of his presentation:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/UhRYtkYDHsM",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Next community call is on May 25"),o("sup",[e._v("th")]),e._v(".")]),e._v(" "),o("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? 
Let us know via "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(", or come and tell us on our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),o("OutboundLink")],1),e._v("(also accessible via a "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/qL3uBfer1sA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[174],{709:function(e,t,o){"use strict";o.r(t);var a=o(29),n=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("At our last community call on April 27"),o("sup",[e._v("th")]),e._v(" Keith Hughitt presented his 
ideas on how we can improve support for non-tabular data, and on how we could build a specification for describing the relationship between datasets. It took me some time to write this recap blog, because some of the reflections that Keith shared with us resonated very much with some of the thinking we have been doing at Open Knowledge Foundation around governance. I had explained during the "),o("a",{attrs:{href:"https://frictionlessdata.io/blog/2023/04/06/march-community-call/",target:"_blank",rel:"noopener noreferrer"}},[e._v("March community call"),o("OutboundLink")],1),e._v(" that the governance of the specs has been recently unblocked, and we are starting to think about how to get to v2. It was actually Keith who urged me to do that presentation to clarify the project governance (and I am so glad he did!).")]),e._v(" "),o("p",[e._v("Keith’s main goals are pretty clear: 1. He wants datasets to be soft contained and well defined enough to be combinable with minimal effort. Datasets should function like lego blocks, which is the way Frictionless Data works too. 2. He wants transparency on how the data is processed and communicated, as this is key to reproducibility.")]),e._v(" "),o("p",[e._v("At the moment the Frictionless Data specs have a strong focus on tabular data, and Keith would like to extend that same kind of support to other types of data as well. Having some kind of common spec would be very useful for all those who work with more than one type of data, and he feels something can be done to make that work easier.")]),e._v(" "),o("h3",{attrs:{id:"so-what-does-keith-have-in-mind"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#so-what-does-keith-have-in-mind"}},[e._v("#")]),e._v(" So what does Keith have in mind?")]),e._v(" "),o("p",[e._v("He argues that we should separate the description of structure (data types) and domain (fields that are included in one discipline). 
This is easy to achieve because Frictionless is modular by design.")]),e._v(" "),o("p",[e._v("We should take some intentional action to design a high-level model, so that even if we leave it to community members to build domain-specific specs, the core Frictionless team at Open Knowledge Foundation would oversee that they all still have a common core data model which allows all the different extensions to interact easily.")]),e._v(" "),o("p",[e._v("Keith suggests using a mix-in approach, where the domain-specific schema would be made by combining specs (data type/structure + data domain). This would make sense to avoid redundancy in the code structure.")]),e._v(" "),o("p",[e._v("It would be important to have a working group with representatives from different disciplines, and working in different capacities, to build together this common data model in a way that really fits the needs of everyone (or at least find some minimal common ground). This is exactly the direction we would like the project to move forward. We are working on it, so stay tuned!")]),e._v(" "),o("p",[e._v("Meanwhile, if you want to know more about Keith’s ideas, you can watch the recording of his presentation:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/UhRYtkYDHsM",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),o("h1",{attrs:{id:"join-us-next-month"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),o("p",[e._v("Next community call is on May 25"),o("sup",[e._v("th")]),e._v(".")]),e._v(" "),o("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? 
Let us know via "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(", or come and tell us on our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),o("OutboundLink")],1),e._v("(also accessible via a "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/qL3uBfer1sA",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/175.33e4d10e.js b/assets/js/175.b7a38070.js similarity index 98% rename from assets/js/175.33e4d10e.js rename to assets/js/175.b7a38070.js index 00618cde1..28f4df966 100644 --- a/assets/js/175.33e4d10e.js +++ b/assets/js/175.b7a38070.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[175],{713:function(e,t,a){"use strict";a.r(t);var 
o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last community call on May 25"),a("sup",[e._v("th")]),e._v(" Augusto Herrmann presented FastETL, a free and open source software library for Apache Airflow that makes it easier to integrate heterogeneous data sources and to publish open data (e.g. to CKAN data portals) using Apache Airflow.")]),e._v(" "),a("p",[e._v("Augusto told us how the data engineering team at the Secretariat for Management and Innovation in the Brazilian federal government has been using FastETL in combination with the Frictionless Framework, and Tabular Data Packages for processing data pipelines and to publish open data.")]),e._v(" "),a("p",[e._v("Augusto and his team have developed FastETL, among other things, to be able to periodically synchronise data sources in the data lake, publish open data on open data portals, and be notified about publications in the official gazette.")]),e._v(" "),a("p",[e._v("Some of the things that you can do with FastETL are:")]),e._v(" "),a("ul",[a("li",[e._v("Full or incremental replication of tables in SQL Server, and Postgres databases (and MySQL sources).")]),e._v(" "),a("li",[e._v("Load data from GSheets and from spreadsheets on Samba/Windows networks.")]),e._v(" "),a("li",[e._v("Extract CSVs from SQL.")]),e._v(" "),a("li",[e._v("Query the Brazilian National Official Gazette’s API, and get a notification when there is a new publication in the Official Gazette.")]),e._v(" "),a("li",[e._v("Use CKAN or "),a("a",{attrs:{href:"http://dados.gov.br",target:"_blank",rel:"noopener noreferrer"}},[e._v("dados.gov.br"),a("OutboundLink")],1),e._v("’s API to update dataset metadata.")]),e._v(" "),a("li",[e._v("Use Frictionless Tabular Data Packages to write data dictionaries in OpenDocument Text format.")])]),e._v(" "),a("p",[e._v("Would you like to know more? 
You can have a look at Augusto’s slides on "),a("a",{attrs:{href:"https://herrmann.tech/slide-decks/2023/05/integrating-data-sources-and-publishing-open-data-with-fastetl-airflow-and-frictionless",target:"_blank",rel:"noopener noreferrer"}},[e._v("his website here"),a("OutboundLink")],1),e._v(", or check out the "),a("a",{attrs:{href:"https://github.com/gestaogovbr/FastETL",target:"_blank",rel:"noopener noreferrer"}},[e._v("FastETL GitHub Repository"),a("OutboundLink")],1),e._v("."),a("br"),e._v("\nAnd if you want to better understand how to use FastETL, have a look at Augusto’s presentation, with some great data pipeline examples:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Z8bo6cyd-gw",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on June 29"),a("sup",[e._v("th")]),e._v(", and it will be a hands-on session on strange datasets and how to describe them! Jesper Zedlitz from the German federal state of Schleswig-Holstein will be bringing one. Let us know if you would also like to bring a dataset to this call, by emailing Sara Petti sara.petti[at]"),a("a",{attrs:{href:"http://okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("okfn.org"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? 
Let us know via "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(", or come and tell us on our "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),a("OutboundLink")],1),e._v(" (also accessible via a "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),a("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Z8bo6cyd-gw",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[175],{712:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("At our last community call on May 25"),a("sup",[e._v("th")]),e._v(" Augusto Herrmann presented 
FastETL, a free and open source software library for Apache Airflow that makes it easier to integrate heterogeneous data sources and to publish open data (e.g. to CKAN data portals) using Apache Airflow.")]),e._v(" "),a("p",[e._v("Augusto told us how the data engineering team at the Secretariat for Management and Innovation in the Brazilian federal government has been using FastETL in combination with the Frictionless Framework, and Tabular Data Packages for processing data pipelines and to publish open data.")]),e._v(" "),a("p",[e._v("Augusto and his team have developed FastETL, among other things, to be able to periodically synchronise data sources in the data lake, publish open data on open data portals, and be notified about publications in the official gazette.")]),e._v(" "),a("p",[e._v("Some of the things that you can do with FastETL are:")]),e._v(" "),a("ul",[a("li",[e._v("Full or incremental replication of tables in SQL Server, and Postgres databases (and MySQL sources).")]),e._v(" "),a("li",[e._v("Load data from GSheets and from spreadsheets on Samba/Windows networks.")]),e._v(" "),a("li",[e._v("Extract CSVs from SQL.")]),e._v(" "),a("li",[e._v("Query the Brazilian National Official Gazette’s API, and get a notification when there is a new publication in the Official Gazette.")]),e._v(" "),a("li",[e._v("Use CKAN or "),a("a",{attrs:{href:"http://dados.gov.br",target:"_blank",rel:"noopener noreferrer"}},[e._v("dados.gov.br"),a("OutboundLink")],1),e._v("’s API to update dataset metadata.")]),e._v(" "),a("li",[e._v("Use Frictionless Tabular Data Packages to write data dictionaries in OpenDocument Text format.")])]),e._v(" "),a("p",[e._v("Would you like to know more? 
You can have a look at Augusto’s slides on "),a("a",{attrs:{href:"https://herrmann.tech/slide-decks/2023/05/integrating-data-sources-and-publishing-open-data-with-fastetl-airflow-and-frictionless",target:"_blank",rel:"noopener noreferrer"}},[e._v("his website here"),a("OutboundLink")],1),e._v(", or check out the "),a("a",{attrs:{href:"https://github.com/gestaogovbr/FastETL",target:"_blank",rel:"noopener noreferrer"}},[e._v("FastETL GitHub Repository"),a("OutboundLink")],1),e._v("."),a("br"),e._v("\nAnd if you want to better understand how to use FastETL, have a look at Augusto’s presentation, with some great data pipeline examples:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Z8bo6cyd-gw",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),a("h1",{attrs:{id:"join-us-next-month"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-us-next-month"}},[e._v("#")]),e._v(" Join us next month!")]),e._v(" "),a("p",[e._v("Next community call is on June 29"),a("sup",[e._v("th")]),e._v(", and it will be a hands-on session on strange datasets and how to describe them! Jesper Zedlitz from the German federal state of Schleswig-Holstein will be bringing one. Let us know if you would also like to bring a dataset to this call, by emailing Sara Petti sara.petti[at]"),a("a",{attrs:{href:"http://okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("okfn.org"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? 
Let us know via "),a("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),a("OutboundLink")],1),e._v(", or come and tell us on our "),a("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),a("OutboundLink")],1),e._v(" (also accessible via a "),a("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),a("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),a("p",[e._v("You can sign up for the call already "),a("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),a("OutboundLink")],1),e._v(". Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),a("h1",{attrs:{id:"call-recording"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),a("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),a("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/Z8bo6cyd-gw",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/176.2a94c2ee.js b/assets/js/176.ad6847db.js similarity index 98% rename from assets/js/176.2a94c2ee.js rename to assets/js/176.ad6847db.js index d9822ad44..660abb6a5 100644 --- a/assets/js/176.2a94c2ee.js +++ b/assets/js/176.ad6847db.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[176],{716:function(e,t,o){"use strict";o.r(t);var 
a=o(29),n=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("On June 29"),o("sup",[e._v("th")]),e._v(" we had our last monthly call, and it was kind of a special one! Instead of the usual project presentation, we had a hands-on session on strange datasets and how to describe them.")]),e._v(" "),o("p",[e._v("Our community member Jesper Zedlitz comes regularly across very weird datasets in his day-to-day work, and had asked in the May community call, whether it was possible to bring some of them to the call and check them out together with the community to try to make sense of them all together. This turned out to be an excellent idea for a fun call!")]),e._v(" "),o("p",[e._v("So what kind of problems is Jesper encountering?")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("Sometimes we have extra information on the dataset, the licence, etc. at the beginning and comments at the end of the csv, so some rows need to be ignored. This is easy to do for the top part of the dataset, but it’s harder for the bottom part. Something we will definitely need to think about for the next iteration of the Frictionless specs, for example by giving the possibility to have a “headline row”, or something like that. This was a common problem for other community members too.")])]),e._v(" "),o("li",[o("p",[e._v("Sometimes we don’t have any information at all: Jesper showed us some CSVs without any headerlines, where it’s up to you to figure out what kind of data is in there.")])]),e._v(" "),o("li",[o("p",[e._v("The dialect (e.g. weird delimiters) and character encoding are sometimes tricky too, but that’s already easy to manage with the Frictionless specs.")])])]),e._v(" "),o("p",[e._v("Do you want to know more about the strange datasets that Jesper has shown us during the call? 
Then you should watch the full recording of the call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/WekfG2AZ-Dc",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),o("h1",{attrs:{id:"join-us-in-august"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-in-august"}},[e._v("#")]),e._v(" Join us in August!")]),e._v(" "),o("p",[e._v("Exceptionally we won’t have any community call in July, so see you all on August 31"),o("sup",[e._v("st")]),e._v("!")]),e._v(" "),o("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? Let us know via "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(", or come and tell us on our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),o("OutboundLink")],1),e._v("(also accessible via a "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". Do you want to share something with the community? 
Let us know when you sign up.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[176],{714:function(e,t,o){"use strict";o.r(t);var a=o(29),n=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("On June 29"),o("sup",[e._v("th")]),e._v(" we had our last monthly call, and it was kind of a special one! Instead of the usual project presentation, we had a hands-on session on strange datasets and how to describe them.")]),e._v(" "),o("p",[e._v("Our community member Jesper Zedlitz comes regularly across very weird datasets in his day-to-day work, and had asked in the May community call, whether it was possible to bring some of them to the call and check them out together with the community to try to make sense of them all together. This turned out to be an excellent idea for a fun call!")]),e._v(" "),o("p",[e._v("So what kind of problems is Jesper encountering?")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("Sometimes we have extra information on the dataset, the licence, etc. at the beginning and comments at the end of the csv, so some rows need to be ignored. This is easy to do for the top part of the dataset, but it’s harder for the bottom part. Something we will definitely need to think about for the next iteration of the Frictionless specs, for example by giving the possibility to have a “headline row”, or something like that. This was a common problem for other community members too.")])]),e._v(" "),o("li",[o("p",[e._v("Sometimes we don’t have any information at all: Jesper showed us some CSVs without any headerlines, where it’s up to you to figure out what kind of data is in there.")])]),e._v(" "),o("li",[o("p",[e._v("The dialect (e.g. 
weird delimiters) and character encoding are sometimes tricky too, but that’s already easy to manage with the Frictionless specs.")])])]),e._v(" "),o("p",[e._v("Do you want to know more about the strange datasets that Jesper has shown us during the call? Then you should watch the full recording of the call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/WekfG2AZ-Dc",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),o("h1",{attrs:{id:"join-us-in-august"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-in-august"}},[e._v("#")]),e._v(" Join us in August!")]),e._v(" "),o("p",[e._v("Exceptionally we won’t have any community call in July, so see you all on August 31"),o("sup",[e._v("st")]),e._v("!")]),e._v(" "),o("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? Let us know via "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(", or come and tell us on our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),o("OutboundLink")],1),e._v("(also accessible via a "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". 
Do you want to share something with the community? Let us know when you sign up.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/177.6c30d8b5.js b/assets/js/177.76c81c00.js similarity index 98% rename from assets/js/177.6c30d8b5.js rename to assets/js/177.76c81c00.js index 1cbade261..aec5864c3 100644 --- a/assets/js/177.6c30d8b5.js +++ b/assets/js/177.76c81c00.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[177],{715:function(e,t,o){"use strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("After 2 long months of absence, our monthly community call was finally back on September 28"),o("sup",[e._v("th")]),e._v(", with some very exciting news! Our Tech Lead Evgeny Karev presented the work that has absorbed so much of his and Shashi Gharti’s time in the last months: the Frictionless no-code application "),o("a",{attrs:{href:"https://opendataeditor.okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Editor"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("The problem that inspired this new tool is that still today there is no easy tool to manage and publish data for those who don’t have technical skills. The new Open Data Editor offers the possibility to access all Frictionless functionalities without having to write one single line of code, nor open your shell. Like most Frictionless products, Open Data Editor focuses on tabular data, and it can easily open big files because it uses the "),o("code",[e._v("database")]),e._v(" under the hood (similarly to CKAN). 
You can use it to edit metadata, and declare some rules for opening that you can share with your collaborators, making your data more reproducible.")]),e._v(" "),o("p",[e._v("You can use it to create data visualisation with VegaLite, but Open Data Editor has also an AI support, which you can use to create charts for you, in case you don’t know how to use the VegaLite specifications. You can also publish data stories, and much much more! Check out Evgeny’s presentation to see all the great features of the Open Data Editor:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/a0IyJPSmJyY?si=xOfI7YmS4EsVcKlp",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),o("p",[e._v("The application is still a work in progress, but if you would like to try it out that’s of course absolutely possible, and we would love it if you could give us feedback, so please let us know if you spot anything weird. To make the experience smoother, we have a detailed "),o("a",{attrs:{href:"https://opendataeditor.okfn.org/documentation/getting-started/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation website"),o("OutboundLink")],1),e._v(" you can consult.")]),e._v(" "),o("h1",{attrs:{id:"join-us-in-october"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-in-october"}},[e._v("#")]),e._v(" Join us in October!")]),e._v(" "),o("p",[e._v("Next community call is on October 26"),o("sup",[e._v("th")]),e._v(", join us to hear exciting news about the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless specs"),o("OutboundLink")],1),e._v(" update!")]),e._v(" "),o("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? 
Let us know via "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(", or come and tell us on our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),o("OutboundLink")],1),e._v("(also accessible via a "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/kxB7NZiXF4A?si=5tb2LrJFJaChP-dR",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[177],{713:function(e,t,o){"use strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("After 2 long months of absence, our monthly community call was finally back on 
September 28"),o("sup",[e._v("th")]),e._v(", with some very exciting news! Our Tech Lead Evgeny Karev presented the work that has absorbed so much of his and Shashi Gharti’s time in the last months: the Frictionless no-code application "),o("a",{attrs:{href:"https://opendataeditor.okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Data Editor"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("The problem that inspired this new tool is that still today there is no easy tool to manage and publish data for those who don’t have technical skills. The new Open Data Editor offers the possibility to access all Frictionless functionalities without having to write one single line of code, nor open your shell. Like most Frictionless products, Open Data Editor focuses on tabular data, and it can easily open big files because it uses the "),o("code",[e._v("database")]),e._v(" under the hood (similarly to CKAN). You can use it to edit metadata, and declare some rules for opening that you can share with your collaborators, making your data more reproducible.")]),e._v(" "),o("p",[e._v("You can use it to create data visualisation with VegaLite, but Open Data Editor has also an AI support, which you can use to create charts for you, in case you don’t know how to use the VegaLite specifications. You can also publish data stories, and much much more! 
Check out Evgeny’s presentation to see all the great features of the Open Data Editor:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/a0IyJPSmJyY?si=xOfI7YmS4EsVcKlp",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),o("p",[e._v("The application is still a work in progress, but if you would like to try it out that’s of course absolutely possible, and we would love it if you could give us feedback, so please let us know if you spot anything weird. To make the experience smoother, we have a detailed "),o("a",{attrs:{href:"https://opendataeditor.okfn.org/documentation/getting-started/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation website"),o("OutboundLink")],1),e._v(" you can consult.")]),e._v(" "),o("h1",{attrs:{id:"join-us-in-october"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-in-october"}},[e._v("#")]),e._v(" Join us in October!")]),e._v(" "),o("p",[e._v("Next community call is on October 26"),o("sup",[e._v("th")]),e._v(", join us to hear exciting news about the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless specs"),o("OutboundLink")],1),e._v(" update!")]),e._v(" "),o("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? 
Let us know via "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(", or come and tell us on our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),o("OutboundLink")],1),e._v("(also accessible via a "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(" if you prefer to use an open protocol) .")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". Do you want to share something with the community? Let us know when you sign up.")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("On a final note, here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/kxB7NZiXF4A?si=5tb2LrJFJaChP-dR",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}})])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/178.cfa13136.js b/assets/js/178.84bba6c4.js similarity index 98% rename from assets/js/178.cfa13136.js rename to assets/js/178.84bba6c4.js index 09d3790ed..a9c726e64 100644 --- a/assets/js/178.cfa13136.js +++ b/assets/js/178.84bba6c4.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[178],{714:function(e,t,o){"use 
strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("On our last community call on October 25"),o("sup",[e._v("th")]),e._v(", we started discussing with the community the Frictionless specs update. Thanks to the generous support of "),o("a",{attrs:{href:"https://nlnet.nl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("NLnet"),o("OutboundLink")],1),e._v(", the Frictionless core team, together with a working group composed of members of the community, will be focusing on this for the coming months.")]),e._v(" "),o("h2",{attrs:{id:"what-is-this-update-all-about"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-is-this-update-all-about"}},[e._v("#")]),e._v(" What is this update all about?")]),e._v(" "),o("p",[e._v("Our main goal is really to make the current Frictionless specs a finished product. We have a list of GitHub issues that we will use as a starting point for this iteration, but we would like to follow decisions made by the working group.")]),e._v(" "),o("p",[e._v("Please note, there will be no breaking changes (we can hear your sigh of relief!).")]),e._v(" "),o("p",[e._v("As a next step, "),o("strong",[e._v("we will write a separate blog that will serve as a reference for the overarching goals and roadmap of the project.")])]),e._v(" "),o("p",[e._v("The core Frictionless team at Open Knowledge Foundation will also draft a governance model to apply to the review process and how things get merged. Ideally we would like to test and build a new governance model that would delegate more decisions to the community, and that would then stay in place beyond the v2 release, to improve the project sustainability.")]),e._v(" "),o("p",[e._v("Another key goal is to increase diversity to get better representation when we think about things. 
We have a couple of ideas in mind, but we welcome any suggestion you may have.")]),e._v(" "),o("h1",{attrs:{id:"join-us-in-november"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-in-november"}},[e._v("#")]),e._v(" Join us in November!")]),e._v(" "),o("p",[e._v("Next community call is on November 30"),o("sup",[e._v("th")]),e._v(", join us to hear all the exciting news about the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless specs"),o("OutboundLink")],1),e._v(" update!")]),e._v(" "),o("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? Let us know via "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(", or come and tell us on our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),o("OutboundLink")],1),e._v("(also accessible via a "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(" if you prefer to use an open protocol).")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". Do you want to share something with the community? 
Let us know when you sign up.")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("Here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/NVMZT19hlw0?si=wrFfKStzBFWNE4mI",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),o("h1",{attrs:{id:"thank-you"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#thank-you"}},[e._v("#")]),e._v(" Thank you")]),e._v(" "),o("p",[e._v("On a final note, we would like to thank all community members who joined the call and who keep all these discussions alive, and those who manifested their interest in joining the specs working group. Without you, all of this would not be possible.")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[178],{715:function(e,t,o){"use strict";o.r(t);var a=o(29),r=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("On our last community call on October 25"),o("sup",[e._v("th")]),e._v(", we started discussing with the community the Frictionless specs update. 
Thanks to the generous support of "),o("a",{attrs:{href:"https://nlnet.nl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("NLnet"),o("OutboundLink")],1),e._v(", the Frictionless core team, together with a working group composed of members of the community, will be focusing on this for the coming months.")]),e._v(" "),o("h2",{attrs:{id:"what-is-this-update-all-about"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-is-this-update-all-about"}},[e._v("#")]),e._v(" What is this update all about?")]),e._v(" "),o("p",[e._v("Our main goal is really to make the current Frictionless specs a finished product. We have a list of GitHub issues that we will use as a starting point for this iteration, but we would like to follow decisions made by the working group.")]),e._v(" "),o("p",[e._v("Please note, there will be no breaking changes (we can hear your sigh of relief!).")]),e._v(" "),o("p",[e._v("As a next step, "),o("strong",[e._v("we will write a separate blog that will serve as a reference for the overarching goals and roadmap of the project.")])]),e._v(" "),o("p",[e._v("The core Frictionless team at Open Knowledge Foundation will also draft a governance model to apply to the review process and how things get merged. Ideally we would like to test and build a new governance model that would delegate more decisions to the community, and that would then stay in place beyond the v2 release, to improve the project sustainability.")]),e._v(" "),o("p",[e._v("Another key goal is to increase diversity to get better representation when we think about things. 
We have a couple of ideas in mind, but we welcome any suggestion you may have.")]),e._v(" "),o("h1",{attrs:{id:"join-us-in-november"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#join-us-in-november"}},[e._v("#")]),e._v(" Join us in November!")]),e._v(" "),o("p",[e._v("Next community call is on November 30"),o("sup",[e._v("th")]),e._v(", join us to hear all the exciting news about the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless specs"),o("OutboundLink")],1),e._v(" update!")]),e._v(" "),o("p",[e._v("Do you have something you would like to present to the community at one of the upcoming calls? Let us know via "),o("a",{attrs:{href:"https://forms.gle/AWpbxyiGESNSUFK2A",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(", or come and tell us on our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[e._v("community chat on Slack"),o("OutboundLink")],1),e._v("(also accessible via a "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Matrix bridge"),o("OutboundLink")],1),e._v(" if you prefer to use an open protocol).")]),e._v(" "),o("p",[e._v("You can sign up for the call already "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSeuNCopxXauMkrWvF6VHqOyHMcy54SfNDOseVXfWRQZWkvqjQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(". Do you want to share something with the community? 
Let us know when you sign up.")]),e._v(" "),o("h1",{attrs:{id:"call-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#call-recording"}},[e._v("#")]),e._v(" Call Recording")]),e._v(" "),o("p",[e._v("Here is the recording of the full call:")]),e._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/NVMZT19hlw0?si=wrFfKStzBFWNE4mI",title:"YouTube video player",frameborder:"0",allow:"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share",allowfullscreen:""}}),e._v(" "),o("h1",{attrs:{id:"thank-you"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#thank-you"}},[e._v("#")]),e._v(" Thank you")]),e._v(" "),o("p",[e._v("On a final note, we would like to thank all community members who joined the call and who keep all these discussions alive, and those who manifested their interest in joining the specs working group. Without you, all of this would not be possible.")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/179.a21511ce.js b/assets/js/179.77c4feba.js similarity index 88% rename from assets/js/179.a21511ce.js rename to assets/js/179.77c4feba.js index 16c3215f5..bf46609e1 100644 --- a/assets/js/179.a21511ce.js +++ b/assets/js/179.77c4feba.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[179],{730:function(t,s,e){"use strict";e.r(s);var r=e(29),i=Object(r.a)({},(function(){var t=this.$createElement,s=this._self._c||t;return s("ContentSlotsDistributor",{attrs:{"slot-key":this.$parent.slotKey}},[s("h1",{attrs:{id:"frictionless-architecture"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-architecture"}},[this._v("#")]),this._v(" Frictionless Architecture")]),this._v(" "),s("p",[s("img",{attrs:{src:"/img/structure.png",alt:"Design"}})])])}),[],!1,null,null,null);s.default=i.exports}}]); \ No newline at end of file 
+(window.webpackJsonp=window.webpackJsonp||[]).push([[179],{717:function(t,s,e){"use strict";e.r(s);var r=e(29),i=Object(r.a)({},(function(){var t=this.$createElement,s=this._self._c||t;return s("ContentSlotsDistributor",{attrs:{"slot-key":this.$parent.slotKey}},[s("h1",{attrs:{id:"frictionless-architecture"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-architecture"}},[this._v("#")]),this._v(" Frictionless Architecture")]),this._v(" "),s("p",[s("img",{attrs:{src:"/img/structure.png",alt:"Design"}})])])}),[],!1,null,null,null);s.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/180.4ffbea5e.js b/assets/js/180.fc91eaf8.js similarity index 99% rename from assets/js/180.4ffbea5e.js rename to assets/js/180.fc91eaf8.js index 757347826..2e961b376 100644 --- a/assets/js/180.4ffbea5e.js +++ b/assets/js/180.fc91eaf8.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[180],{718:function(e,s,t){"use strict";t.r(s);var r=t(29),o=Object(r.a)({},(function(){var e=this,s=e.$createElement,t=e._self._c||s;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("h1",{attrs:{id:"frictionless-process"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-process"}},[e._v("#")]),e._v(" Frictionless Process")]),e._v(" "),t("p",[e._v("This document proposes a process to work on the technical side of the Frictionless Data project. The goal - have things manageable for a minimal price.")]),e._v(" "),t("h2",{attrs:{id:"project"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#project"}},[e._v("#")]),e._v(" Project")]),e._v(" "),t("p",[e._v("The specific of the project is a huge amount of components and actors (repositories, issues, contributors etc). 
The process should be effective in handling this specific.")]),e._v(" "),t("h2",{attrs:{id:"process"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#process"}},[e._v("#")]),e._v(" Process")]),e._v(" "),t("p",[e._v("The main idea to focus on getting things done and reduce the price of maintaining the process instead of trying to fully mimic some popular methodologies. We use different ideas from different methodologies.")]),e._v(" "),t("h2",{attrs:{id:"roles"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#roles"}},[e._v("#")]),e._v(" Roles")]),e._v(" "),t("ul",[t("li",[e._v("Product Owner (PO)")]),e._v(" "),t("li",[e._v("Product Manager (PM)")]),e._v(" "),t("li",[e._v("Developer Advocate (DA)")]),e._v(" "),t("li",[e._v("Technical Lead (TL)")]),e._v(" "),t("li",[e._v("Senior Developer (SD)")]),e._v(" "),t("li",[e._v("Junior Developer (JD)")])]),e._v(" "),t("h2",{attrs:{id:"board"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#board"}},[e._v("#")]),e._v(" Board")]),e._v(" "),t("p",[e._v("We use a kanban board located at "),t("a",{attrs:{href:"https://github.com/orgs/frictionlessdata/projects/2?fullscreen=true",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/orgs/frictionlessdata/projects/2?fullscreen=true"),t("OutboundLink")],1),e._v(" to work on the project. 
The board has following columns (ordered by issue stage):")]),e._v(" "),t("ul",[t("li",[e._v("Backlog - unprocessed issues without labels and processed issues with labels")]),e._v(" "),t("li",[e._v("Priority - prioritized issues planned for the next iterations (estimated and assigned)")]),e._v(" "),t("li",[e._v("Current - current iteration issues promoted on iteration planning (estimated and assigned)")]),e._v(" "),t("li",[e._v("Review - issues under review process")]),e._v(" "),t("li",[e._v("Done - completed issues")])]),e._v(" "),t("h2",{attrs:{id:"workflow"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#workflow"}},[e._v("#")]),e._v(" Workflow")]),e._v(" "),t("p",[e._v("The work on the project is a live process splitted into 2 weeks iterations between iteration plannings (including retrospection):")]),e._v(" "),t("ul",[t("li",[e._v("Inside an iteration assigned persons work on their current issues and subset of roles do issues processing and prioritizing")]),e._v(" "),t("li",[e._v("During the iteration planning the team moves issues from the Priority column to the Current column and assign persons. Instead of issue estimations assigned person approves amount of work for the current iteration as a high-level estimation.")])]),e._v(" "),t("h2",{attrs:{id:"milestones"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#milestones"}},[e._v("#")]),e._v(" Milestones")]),e._v(" "),t("p",[e._v("As milestones we use concrete achievements e.g. from our roadmap. It could be tools or spec versions like “spec-v1”. We don’t use the workflow related milestones like “current” of “backlog” managing it via the board labeling system.")]),e._v(" "),t("h2",{attrs:{id:"labels"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#labels"}},[e._v("#")]),e._v(" Labels")]),e._v(" "),t("p",[e._v("Aside internal waffle labels and helpers labels like “question” etc we use core color-coded labels based on SemVer. 
The main point of processing issues from Inbox to Backlog is to add one of this labels because we need to plan releases, breaking announces etc:")]),e._v(" "),t("p",[t("img",{attrs:{src:"https://cloud.githubusercontent.com/assets/557395/17673693/f6391676-632a-11e6-9971-945623b68e16.png",alt:"labels"}})]),e._v(" "),t("h2",{attrs:{id:"assignments"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#assignments"}},[e._v("#")]),e._v(" Assignments")]),e._v(" "),t("p",[e._v("Every issue in the Current column should be assigned to some person with meaning “this person should do some work on this issue to unblock it”. Assigned person should re-assign an issue for a current blocker. It provides a good real-time overview of the project.")]),e._v(" "),t("h2",{attrs:{id:"analysis"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#analysis"}},[e._v("#")]),e._v(" Analysis")]),e._v(" "),t("p",[e._v("After planning it’s highly recommended for an assigned person to write a short plan of how to solve the issue (could be a list of steps) and ask someone to check. This work could be done on some previous stages by subset of roles.")]),e._v(" "),t("h2",{attrs:{id:"branching"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#branching"}},[e._v("#")]),e._v(" Branching")]),e._v(" "),t("p",[e._v("We use Git Flow with some simplifications (see OKI coding standards). Master branch should always be “green” on tests and new features/fixes should go from pull requests. Direct committing to master could be allowed by subset of roles in some cases.")]),e._v(" "),t("h2",{attrs:{id:"pull-requests"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#pull-requests"}},[e._v("#")]),e._v(" Pull Requests")]),e._v(" "),t("p",[e._v("A pull request should be visually merged on the board to the corresponding issue using “It fixes #issue-number” sentence in the pull request description (initial comment). 
If there is no corresponding issue for the pull request it should be handled as an issue with labeling etc.")]),e._v(" "),t("h2",{attrs:{id:"reviews"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reviews"}},[e._v("#")]),e._v(" Reviews")]),e._v(" "),t("p",[e._v("After sending a pull request the author should assign the pull request to another person “asking” for a code review. After the review code should be merged to the codebase by the pull request author (or person having enough rights).")]),e._v(" "),t("h2",{attrs:{id:"documentation"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#documentation"}},[e._v("#")]),e._v(" Documentation")]),e._v(" "),t("p",[e._v("By default documentation for a tool should be written in "),t("a",{attrs:{href:"http://README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("README.md"),t("OutboundLink")],1),e._v(" not using additional files and folders. It should be clean and well-structured. API should be documented in the code as docstrings. We compile project level docs automatically.")]),e._v(" "),t("h2",{attrs:{id:"testings"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#testings"}},[e._v("#")]),e._v(" Testings")]),e._v(" "),t("p",[e._v("Tests should be written using OKI coding standards. Start write tests from top (match high-level requirements) to bottom (if needed). The most high-level tests are implemented as testsuites on project level (integration tests between different tools).")]),e._v(" "),t("h2",{attrs:{id:"releasing"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#releasing"}},[e._v("#")]),e._v(" Releasing")]),e._v(" "),t("p",[e._v("We use SemVer for versioning and Github Actions for testing and releasing/deployments. We prefer short release cycle (features and fixes could be released immediately). 
Releases should be configured using tags based on package examples workflow provided by OKI.")]),e._v(" "),t("p",[e._v("The release process:")]),e._v(" "),t("ul",[t("li",[e._v("merge changes to the main branch on GitHub\n"),t("ul",[t("li",[e._v("use “Squash and Merge”")]),e._v(" "),t("li",[e._v("use clean commit message")])])]),e._v(" "),t("li",[e._v("pull the changes locally")]),e._v(" "),t("li",[e._v("update the software version according to SemVer rules\n"),t("ul",[t("li",[e._v("in Python projets we use "),t("code",[e._v("/assets/VERSION")])]),e._v(" "),t("li",[e._v("in JavaScript projects we use standard "),t("code",[e._v("package.json")])])])]),e._v(" "),t("li",[e._v("update a CHANGELOG file adding info about new feature or important changes")]),e._v(" "),t("li",[e._v("run "),t("code",[e._v("main release")]),e._v(" (it will release automatically)")])]),e._v(" "),t("h2",{attrs:{id:"references"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#references"}},[e._v("#")]),e._v(" References")]),e._v(" "),t("ul",[t("li",[t("a",{attrs:{href:"https://github.com/okfn/coding-standards",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International Coding Standards"),t("OutboundLink")],1)]),e._v(" "),t("li",[t("a",{attrs:{href:"https://mui.com/versions/#versioning-strategy",target:"_blank",rel:"noopener noreferrer"}},[e._v("MUI Versioning Strategy"),t("OutboundLink")],1)])])])}),[],!1,null,null,null);s.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[180],{720:function(e,s,t){"use strict";t.r(s);var r=t(29),o=Object(r.a)({},(function(){var e=this,s=e.$createElement,t=e._self._c||s;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("h1",{attrs:{id:"frictionless-process"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-process"}},[e._v("#")]),e._v(" Frictionless Process")]),e._v(" "),t("p",[e._v("This document proposes a process to work on the technical side 
of the Frictionless Data project. The goal - have things manageable for a minimal price.")]),e._v(" "),t("h2",{attrs:{id:"project"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#project"}},[e._v("#")]),e._v(" Project")]),e._v(" "),t("p",[e._v("The specific of the project is a huge amount of components and actors (repositories, issues, contributors etc). The process should be effective in handling this specific.")]),e._v(" "),t("h2",{attrs:{id:"process"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#process"}},[e._v("#")]),e._v(" Process")]),e._v(" "),t("p",[e._v("The main idea to focus on getting things done and reduce the price of maintaining the process instead of trying to fully mimic some popular methodologies. We use different ideas from different methodologies.")]),e._v(" "),t("h2",{attrs:{id:"roles"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#roles"}},[e._v("#")]),e._v(" Roles")]),e._v(" "),t("ul",[t("li",[e._v("Product Owner (PO)")]),e._v(" "),t("li",[e._v("Product Manager (PM)")]),e._v(" "),t("li",[e._v("Developer Advocate (DA)")]),e._v(" "),t("li",[e._v("Technical Lead (TL)")]),e._v(" "),t("li",[e._v("Senior Developer (SD)")]),e._v(" "),t("li",[e._v("Junior Developer (JD)")])]),e._v(" "),t("h2",{attrs:{id:"board"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#board"}},[e._v("#")]),e._v(" Board")]),e._v(" "),t("p",[e._v("We use a kanban board located at "),t("a",{attrs:{href:"https://github.com/orgs/frictionlessdata/projects/2?fullscreen=true",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/orgs/frictionlessdata/projects/2?fullscreen=true"),t("OutboundLink")],1),e._v(" to work on the project. 
The board has following columns (ordered by issue stage):")]),e._v(" "),t("ul",[t("li",[e._v("Backlog - unprocessed issues without labels and processed issues with labels")]),e._v(" "),t("li",[e._v("Priority - prioritized issues planned for the next iterations (estimated and assigned)")]),e._v(" "),t("li",[e._v("Current - current iteration issues promoted on iteration planning (estimated and assigned)")]),e._v(" "),t("li",[e._v("Review - issues under review process")]),e._v(" "),t("li",[e._v("Done - completed issues")])]),e._v(" "),t("h2",{attrs:{id:"workflow"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#workflow"}},[e._v("#")]),e._v(" Workflow")]),e._v(" "),t("p",[e._v("The work on the project is a live process splitted into 2 weeks iterations between iteration plannings (including retrospection):")]),e._v(" "),t("ul",[t("li",[e._v("Inside an iteration assigned persons work on their current issues and subset of roles do issues processing and prioritizing")]),e._v(" "),t("li",[e._v("During the iteration planning the team moves issues from the Priority column to the Current column and assign persons. Instead of issue estimations assigned person approves amount of work for the current iteration as a high-level estimation.")])]),e._v(" "),t("h2",{attrs:{id:"milestones"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#milestones"}},[e._v("#")]),e._v(" Milestones")]),e._v(" "),t("p",[e._v("As milestones we use concrete achievements e.g. from our roadmap. It could be tools or spec versions like “spec-v1”. We don’t use the workflow related milestones like “current” of “backlog” managing it via the board labeling system.")]),e._v(" "),t("h2",{attrs:{id:"labels"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#labels"}},[e._v("#")]),e._v(" Labels")]),e._v(" "),t("p",[e._v("Aside internal waffle labels and helpers labels like “question” etc we use core color-coded labels based on SemVer. 
The main point of processing issues from Inbox to Backlog is to add one of this labels because we need to plan releases, breaking announces etc:")]),e._v(" "),t("p",[t("img",{attrs:{src:"https://cloud.githubusercontent.com/assets/557395/17673693/f6391676-632a-11e6-9971-945623b68e16.png",alt:"labels"}})]),e._v(" "),t("h2",{attrs:{id:"assignments"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#assignments"}},[e._v("#")]),e._v(" Assignments")]),e._v(" "),t("p",[e._v("Every issue in the Current column should be assigned to some person with meaning “this person should do some work on this issue to unblock it”. Assigned person should re-assign an issue for a current blocker. It provides a good real-time overview of the project.")]),e._v(" "),t("h2",{attrs:{id:"analysis"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#analysis"}},[e._v("#")]),e._v(" Analysis")]),e._v(" "),t("p",[e._v("After planning it’s highly recommended for an assigned person to write a short plan of how to solve the issue (could be a list of steps) and ask someone to check. This work could be done on some previous stages by subset of roles.")]),e._v(" "),t("h2",{attrs:{id:"branching"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#branching"}},[e._v("#")]),e._v(" Branching")]),e._v(" "),t("p",[e._v("We use Git Flow with some simplifications (see OKI coding standards). Master branch should always be “green” on tests and new features/fixes should go from pull requests. Direct committing to master could be allowed by subset of roles in some cases.")]),e._v(" "),t("h2",{attrs:{id:"pull-requests"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#pull-requests"}},[e._v("#")]),e._v(" Pull Requests")]),e._v(" "),t("p",[e._v("A pull request should be visually merged on the board to the corresponding issue using “It fixes #issue-number” sentence in the pull request description (initial comment). 
If there is no corresponding issue for the pull request it should be handled as an issue with labeling etc.")]),e._v(" "),t("h2",{attrs:{id:"reviews"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#reviews"}},[e._v("#")]),e._v(" Reviews")]),e._v(" "),t("p",[e._v("After sending a pull request the author should assign the pull request to another person “asking” for a code review. After the review code should be merged to the codebase by the pull request author (or person having enough rights).")]),e._v(" "),t("h2",{attrs:{id:"documentation"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#documentation"}},[e._v("#")]),e._v(" Documentation")]),e._v(" "),t("p",[e._v("By default documentation for a tool should be written in "),t("a",{attrs:{href:"http://README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("README.md"),t("OutboundLink")],1),e._v(" not using additional files and folders. It should be clean and well-structured. API should be documented in the code as docstrings. We compile project level docs automatically.")]),e._v(" "),t("h2",{attrs:{id:"testings"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#testings"}},[e._v("#")]),e._v(" Testings")]),e._v(" "),t("p",[e._v("Tests should be written using OKI coding standards. Start write tests from top (match high-level requirements) to bottom (if needed). The most high-level tests are implemented as testsuites on project level (integration tests between different tools).")]),e._v(" "),t("h2",{attrs:{id:"releasing"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#releasing"}},[e._v("#")]),e._v(" Releasing")]),e._v(" "),t("p",[e._v("We use SemVer for versioning and Github Actions for testing and releasing/deployments. We prefer short release cycle (features and fixes could be released immediately). 
Releases should be configured using tags based on package examples workflow provided by OKI.")]),e._v(" "),t("p",[e._v("The release process:")]),e._v(" "),t("ul",[t("li",[e._v("merge changes to the main branch on GitHub\n"),t("ul",[t("li",[e._v("use “Squash and Merge”")]),e._v(" "),t("li",[e._v("use clean commit message")])])]),e._v(" "),t("li",[e._v("pull the changes locally")]),e._v(" "),t("li",[e._v("update the software version according to SemVer rules\n"),t("ul",[t("li",[e._v("in Python projets we use "),t("code",[e._v("/assets/VERSION")])]),e._v(" "),t("li",[e._v("in JavaScript projects we use standard "),t("code",[e._v("package.json")])])])]),e._v(" "),t("li",[e._v("update a CHANGELOG file adding info about new feature or important changes")]),e._v(" "),t("li",[e._v("run "),t("code",[e._v("main release")]),e._v(" (it will release automatically)")])]),e._v(" "),t("h2",{attrs:{id:"references"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#references"}},[e._v("#")]),e._v(" References")]),e._v(" "),t("ul",[t("li",[t("a",{attrs:{href:"https://github.com/okfn/coding-standards",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International Coding Standards"),t("OutboundLink")],1)]),e._v(" "),t("li",[t("a",{attrs:{href:"https://mui.com/versions/#versioning-strategy",target:"_blank",rel:"noopener noreferrer"}},[e._v("MUI Versioning Strategy"),t("OutboundLink")],1)])])])}),[],!1,null,null,null);s.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/182.bc731f01.js b/assets/js/182.95e258c9.js similarity index 98% rename from assets/js/182.bc731f01.js rename to assets/js/182.95e258c9.js index 4b6a75ae5..81015a69c 100644 --- a/assets/js/182.bc731f01.js +++ b/assets/js/182.95e258c9.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[182],{722:function(t,e,a){"use strict";a.r(e);var o=a(29),r=Object(o.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return 
a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("h1",{attrs:{id:"join-the-frictionless-data-community-for-a-two-day-virtual-hackathon-on-7-8-october"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-the-frictionless-data-community-for-a-two-day-virtual-hackathon-on-7-8-october"}},[t._v("#")]),t._v(" Join the Frictionless Data community for a two-day virtual Hackathon on 7-8 October!")]),t._v(" "),a("blockquote",[a("p",[t._v("Registration is now open using this form: "),a("a",{attrs:{href:"https://forms.gle/ZhrVfSBrNy2UPRZc9",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://forms.gle/ZhrVfSBrNy2UPRZc9"),a("OutboundLink")],1)])]),t._v(" "),a("blockquote",[a("p",[t._v("See the Participation Guide at the bottom for more info!")])]),t._v(" "),a("h2",{attrs:{id:"what-s-a-hackathon"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-s-a-hackathon"}},[t._v("#")]),t._v(" What’s a hackathon?")]),t._v(" "),a("p",[t._v("You’ll work within a group of other Frictionless users to create new project prototypes based on existing Frictionless open source code. For example, use the new "),a("a",{attrs:{href:"https://livemark.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Livemark"),a("OutboundLink")],1),t._v(" tool to create websites that display data-driven storytelling, or use Frictionless React "),a("a",{attrs:{href:"https://components.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Components"),a("OutboundLink")],1),t._v(" to add data validation to your application.")]),t._v(" "),a("h2",{attrs:{id:"who-should-participate-in-this-hackathon"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#who-should-participate-in-this-hackathon"}},[t._v("#")]),t._v(" Who should participate in this hackathon?")]),t._v(" "),a("p",[t._v("We’re looking for contributions of all sizes and skill levels! 
Some skills that you would bring include: coding in Python (other languages supported too!), writing documentation, project management, having ideas, design skills, and general enthusiasm! You’ll be in a team, so you can learn from each other and help each other. You don’t have to be familiar with Frictionless yet - you can learn that during the event.")]),t._v(" "),a("h2",{attrs:{id:"why-should-i-participate"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-should-i-participate"}},[t._v("#")]),t._v(" Why should I participate?")]),t._v(" "),a("p",[t._v("First of all, it will be fun! You’ll meet other Frictionless users and learn something new. This is also an opportunity where you’ll have the uninterrupted support of the Frictionless core team to help you realize your prototype. Also, there will be prizes (details to be announced later).")]),t._v(" "),a("h2",{attrs:{id:"when-will-the-hackathon-occur"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#when-will-the-hackathon-occur"}},[t._v("#")]),t._v(" When will the hackathon occur?")]),t._v(" "),a("p",[t._v("The hackathon will be virtual and occur on 7-8 October. The event will start at 9am CEST on 7 October, and will end at 6pm CEST on 8 October. This will allow people from around the world to participate during a time that works for them. We will be using Github and Zoom to coordinate and work virtually. Teams will be able to form before the event occurs so you can start coordinating early and hit the ground running.")]),t._v(" "),a("h2",{attrs:{id:"sign-me-up"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#sign-me-up"}},[t._v("#")]),t._v(" Sign me up!")]),t._v(" "),a("p",[t._v("Use "),a("a",{attrs:{href:"https://forms.gle/ZhrVfSBrNy2UPRZc9",target:"_blank",rel:"noopener noreferrer"}},[t._v("this form"),a("OutboundLink")],1),t._v(" to register. The event will be free, and we will also have some scholarships for attendees that would otherwise be unable to attend. 
Apply for a $300 scholarship using this "),a("a",{attrs:{href:"https://forms.gle/jwxVYjDYs31t1YmKA",target:"_blank",rel:"noopener noreferrer"}},[t._v("scholarship form"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("h2",{attrs:{id:"what-projects-will-be-at-the-hackathon"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-projects-will-be-at-the-hackathon"}},[t._v("#")]),t._v(" What projects will be at the Hackathon?")]),t._v(" "),a("p",[t._v("Projects will range from a GeoJSON Plugin for frictionless-py, to Python code to work with Datapackages in CKAN, to creating a static site to list all the Frictionless datasets on GitHub, to creating new tutorials for Frictionless code."),a("br"),t._v("\nAll of the projects will be added to the event dashboard at "),a("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/event/1#top",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://frictionless-hackathon.herokuapp.com/event/1#top"),a("OutboundLink")],1),t._v(", powered by "),a("a",{attrs:{href:"https://dribdat.cc/",target:"_blank",rel:"noopener noreferrer"}},[t._v("DribDat"),a("OutboundLink")],1),t._v("."),a("br"),t._v("\nInterested in working on your own project? 
Email us!")]),t._v(" "),a("h2",{attrs:{id:"i-have-questions"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#i-have-questions"}},[t._v("#")]),t._v(" I have questions…")]),t._v(" "),a("p",[t._v("Please email us at "),a("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[t._v("frictionlessdata@okfn.org")]),t._v(" if you have questions or would like to support the Hackathon.")]),t._v(" "),a("h1",{attrs:{id:"participation-guide"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#participation-guide"}},[t._v("#")]),t._v(" Participation Guide")]),t._v(" "),a("p",[t._v("("),a("a",{attrs:{href:"https://docs.google.com/document/d/e/2PACX-1vReWY9N26SbveoCM7Ra4wEry8k7a5rCa3UzpBijfU_mmyME58DRDKmu0QUmx75mif4367IZdtLijFzO/pub",target:"_blank",rel:"noopener noreferrer"}},[t._v("Here is a link to the Guide"),a("OutboundLink")],1),t._v(")")]),t._v(" "),a("iframe",{attrs:{width:"730",height:"500",src:"https://docs.google.com/document/d/e/2PACX-1vReWY9N26SbveoCM7Ra4wEry8k7a5rCa3UzpBijfU_mmyME58DRDKmu0QUmx75mif4367IZdtLijFzO/pub?embedded=true"}})])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[182],{721:function(t,e,a){"use strict";a.r(e);var o=a(29),r=Object(o.a)({},(function(){var t=this,e=t.$createElement,a=t._self._c||e;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("h1",{attrs:{id:"join-the-frictionless-data-community-for-a-two-day-virtual-hackathon-on-7-8-october"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#join-the-frictionless-data-community-for-a-two-day-virtual-hackathon-on-7-8-october"}},[t._v("#")]),t._v(" Join the Frictionless Data community for a two-day virtual Hackathon on 7-8 October!")]),t._v(" "),a("blockquote",[a("p",[t._v("Registration is now open using this form: "),a("a",{attrs:{href:"https://forms.gle/ZhrVfSBrNy2UPRZc9",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("https://forms.gle/ZhrVfSBrNy2UPRZc9"),a("OutboundLink")],1)])]),t._v(" "),a("blockquote",[a("p",[t._v("See the Participation Guide at the bottom for more info!")])]),t._v(" "),a("h2",{attrs:{id:"what-s-a-hackathon"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-s-a-hackathon"}},[t._v("#")]),t._v(" What’s a hackathon?")]),t._v(" "),a("p",[t._v("You’ll work within a group of other Frictionless users to create new project prototypes based on existing Frictionless open source code. For example, use the new "),a("a",{attrs:{href:"https://livemark.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Livemark"),a("OutboundLink")],1),t._v(" tool to create websites that display data-driven storytelling, or use Frictionless React "),a("a",{attrs:{href:"https://components.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Components"),a("OutboundLink")],1),t._v(" to add data validation to your application.")]),t._v(" "),a("h2",{attrs:{id:"who-should-participate-in-this-hackathon"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#who-should-participate-in-this-hackathon"}},[t._v("#")]),t._v(" Who should participate in this hackathon?")]),t._v(" "),a("p",[t._v("We’re looking for contributions of all sizes and skill levels! Some skills that you would bring include: coding in Python (other languages supported too!), writing documentation, project management, having ideas, design skills, and general enthusiasm! You’ll be in a team, so you can learn from each other and help each other. You don’t have to be familiar with Frictionless yet - you can learn that during the event.")]),t._v(" "),a("h2",{attrs:{id:"why-should-i-participate"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-should-i-participate"}},[t._v("#")]),t._v(" Why should I participate?")]),t._v(" "),a("p",[t._v("First of all, it will be fun! You’ll meet other Frictionless users and learn something new. 
This is also an opportunity where you’ll have the uninterrupted support of the Frictionless core team to help you realize your prototype. Also, there will be prizes (details to be announced later).")]),t._v(" "),a("h2",{attrs:{id:"when-will-the-hackathon-occur"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#when-will-the-hackathon-occur"}},[t._v("#")]),t._v(" When will the hackathon occur?")]),t._v(" "),a("p",[t._v("The hackathon will be virtual and occur on 7-8 October. The event will start at 9am CEST on 7 October, and will end at 6pm CEST on 8 October. This will allow people from around the world to participate during a time that works for them. We will be using Github and Zoom to coordinate and work virtually. Teams will be able to form before the event occurs so you can start coordinating early and hit the ground running.")]),t._v(" "),a("h2",{attrs:{id:"sign-me-up"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#sign-me-up"}},[t._v("#")]),t._v(" Sign me up!")]),t._v(" "),a("p",[t._v("Use "),a("a",{attrs:{href:"https://forms.gle/ZhrVfSBrNy2UPRZc9",target:"_blank",rel:"noopener noreferrer"}},[t._v("this form"),a("OutboundLink")],1),t._v(" to register. The event will be free, and we will also have some scholarships for attendees that would otherwise be unable to attend. 
Apply for a $300 scholarship using this "),a("a",{attrs:{href:"https://forms.gle/jwxVYjDYs31t1YmKA",target:"_blank",rel:"noopener noreferrer"}},[t._v("scholarship form"),a("OutboundLink")],1),t._v(".")]),t._v(" "),a("h2",{attrs:{id:"what-projects-will-be-at-the-hackathon"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-projects-will-be-at-the-hackathon"}},[t._v("#")]),t._v(" What projects will be at the Hackathon?")]),t._v(" "),a("p",[t._v("Projects will range from a GeoJSON Plugin for frictionless-py, to Python code to work with Datapackages in CKAN, to creating a static site to list all the Frictionless datasets on GitHub, to creating new tutorials for Frictionless code."),a("br"),t._v("\nAll of the projects will be added to the event dashboard at "),a("a",{attrs:{href:"https://frictionless-hackathon.herokuapp.com/event/1#top",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://frictionless-hackathon.herokuapp.com/event/1#top"),a("OutboundLink")],1),t._v(", powered by "),a("a",{attrs:{href:"https://dribdat.cc/",target:"_blank",rel:"noopener noreferrer"}},[t._v("DribDat"),a("OutboundLink")],1),t._v("."),a("br"),t._v("\nInterested in working on your own project? 
Email us!")]),t._v(" "),a("h2",{attrs:{id:"i-have-questions"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#i-have-questions"}},[t._v("#")]),t._v(" I have questions…")]),t._v(" "),a("p",[t._v("Please email us at "),a("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[t._v("frictionlessdata@okfn.org")]),t._v(" if you have questions or would like to support the Hackathon.")]),t._v(" "),a("h1",{attrs:{id:"participation-guide"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#participation-guide"}},[t._v("#")]),t._v(" Participation Guide")]),t._v(" "),a("p",[t._v("("),a("a",{attrs:{href:"https://docs.google.com/document/d/e/2PACX-1vReWY9N26SbveoCM7Ra4wEry8k7a5rCa3UzpBijfU_mmyME58DRDKmu0QUmx75mif4367IZdtLijFzO/pub",target:"_blank",rel:"noopener noreferrer"}},[t._v("Here is a link to the Guide"),a("OutboundLink")],1),t._v(")")]),t._v(" "),a("iframe",{attrs:{width:"730",height:"500",src:"https://docs.google.com/document/d/e/2PACX-1vReWY9N26SbveoCM7Ra4wEry8k7a5rCa3UzpBijfU_mmyME58DRDKmu0QUmx75mif4367IZdtLijFzO/pub?embedded=true"}})])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/183.69928528.js b/assets/js/183.4f0bb929.js similarity index 99% rename from assets/js/183.69928528.js rename to assets/js/183.4f0bb929.js index 7e8c2fce5..21704e075 100644 --- a/assets/js/183.69928528.js +++ b/assets/js/183.4f0bb929.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[183],{723:function(t,a,e){"use strict";e.r(a);var r=e(29),s=Object(r.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("h1",{attrs:{id:"frictionless-data"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-data"}},[t._v("#")]),t._v(" Frictionless Data")]),t._v(" "),e("p",[e("big",[e("strong",[t._v("Get a quick introduction to Frictionless in “5 minutes”.")])])],1),t._v(" "),e("p",[t._v("Frictionless Data is a progressive open-source 
framework for building data infrastructure – data management, data integration, data flows, etc. It includes various data standards and provides software to work with data.")]),t._v(" "),e("div",{staticClass:"custom-block tip"},[e("p",{staticClass:"custom-block-title"},[t._v("TIP")]),t._v(" "),e("p",[t._v("This introduction assumes some basic knowledge about data. If you are new to working with data we recommend starting with the first module, “What is Data?”, at "),e("a",{attrs:{href:"https://schoolofdata.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("School of Data"),e("OutboundLink")],1),t._v(".")])]),t._v(" "),e("h2",{attrs:{id:"why-frictionless"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#why-frictionless"}},[t._v("#")]),t._v(" Why Frictionless?")]),t._v(" "),e("p",[t._v("The Frictionless Data project aims to make it easier to work with data - by reducing common data workflow issues (what we call "),e("em",[t._v("friction")]),t._v("). Frictionless Data consists of two main parts, software and standards.")]),t._v(" "),e("p",[e("img",{attrs:{src:"/img/introduction/structure.png",alt:"Structure"}})]),t._v(" "),e("h3",{attrs:{id:"frictionless-software"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-software"}},[t._v("#")]),t._v(" Frictionless Software")]),t._v(" "),e("p",[t._v("The software is based on a suite of data standards that have been designed to make it easy to describe data structure and content so that data is more interoperable, easier to understand, and quicker to use. There are several aspects to the Frictionless software, including two high-level data frameworks (for Python and JavaScript), 10 low-level libraries for other languages, like R, and also visual interfaces and applications. 
You can read more about how to use the software (and find documentation) on the "),e("a",{attrs:{href:"/projects"}},[t._v("projects")]),t._v(" page.")]),t._v(" "),e("p",[t._v("For example, here is a validation report created by the "),e("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Repository"),e("OutboundLink")],1),t._v(" software. Data validation is one of the main focuses of Frictionless Data and this is a good visual representation of how the project might help to reveal common problems working with data.")]),t._v(" "),e("p",[e("img",{attrs:{src:"/img/introduction/report.png",alt:"Report"}})]),t._v(" "),e("h3",{attrs:{id:"frictionless-standards"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-standards"}},[t._v("#")]),t._v(" Frictionless Standards")]),t._v(" "),e("p",[t._v("The Standards (aka Specifications) help to describe data. The core specification is called a "),e("strong",[t._v("Data Package")]),t._v(", which is a simple container format used to describe and package a collection of data files. The format provides a contract for data interoperability that supports frictionless delivery, installation and management of data.")]),t._v(" "),e("p",[t._v("A Data Package can contain any kind of data. 
At the same time, Data Packages can be specialized and enriched for specific types of data so there are, for example, Tabular Data Packages for tabular data, Geo Data Packages for geo data, etc.")]),t._v(" "),e("p",[t._v("To learn more about Data Packages and the other specifications, check out the "),e("a",{attrs:{href:"/projects"}},[t._v("projects")]),t._v(" page or watch this video to learn more about the motivation behind packaging data.")]),t._v(" "),e("iframe",{attrs:{width:"730",height:"400",src:"https://www.youtube.com/embed/lWHKVXxuci0",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),t._v(" "),e("h2",{attrs:{id:"how-can-i-use-frictionless"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#how-can-i-use-frictionless"}},[t._v("#")]),t._v(" How can I use Frictionless?")]),t._v(" "),e("p",[t._v("You can use Frictionless to describe your data (add metadata and schemas), validate your data, and transform your data. You can also write custom data standards based on the Frictionless specifications. For example, you can use Frictionless to:")]),t._v(" "),e("ul",[e("li",[t._v("easily add metadata to your data before you publish it.")]),t._v(" "),e("li",[t._v("quickly validate your data to check the data quality before you share it.")]),t._v(" "),e("li",[t._v("build a declarative pipeline to clean and process data before analyzing it.")])]),t._v(" "),e("p",[t._v("Usually, new users start by trying out the software. The software gives you an ability to work with Frictionless using visual interfaces or programming languages.")]),t._v(" "),e("p",[t._v("As a new user you might not need to dive too deeply into the standards as our software incapsulates its concepts. 
On the other hand, once you feel comfortable with Frictionless Software you might start reading Frictionless Standards to get a better understanding of the things happening under the hood or to start creating your metadata descriptors more proficiently.")]),t._v(" "),e("h2",{attrs:{id:"who-uses-frictionless"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#who-uses-frictionless"}},[t._v("#")]),t._v(" Who uses Frictionless?")]),t._v(" "),e("p",[t._v("The Frictionless Data project has a very diverse audience, ranging from climate scientists, to humanities researchers, to government data centers.")]),t._v(" "),e("p",[e("img",{attrs:{src:"/img/introduction/audience.png",alt:"Audience"}})]),t._v(" "),e("p",[t._v("During our project development we have had various collaborations with institutions and individuals. We keep track of our "),e("a",{attrs:{href:"/tag/pilot"}},[t._v("Pilots")]),t._v(" and "),e("a",{attrs:{href:"/tag/case-studies"}},[t._v("Case Studies")]),t._v(" with blog posts, and we welcome our community to share their experiences using our standards and software. Generally speaking, you can apply Frictionless in almost every field where you work with data. Your Frictionless use case could range from a simple data table validation to writing complex data pipelines.")]),t._v(" "),e("h2",{attrs:{id:"ready-for-more"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#ready-for-more"}},[t._v("#")]),t._v(" Ready for more?")]),t._v(" "),e("p",[t._v("As a next step, we recommend you start using one of our "),e("a",{attrs:{href:"/projects"}},[t._v("Software")]),t._v(" projects, get to know our "),e("a",{attrs:{href:"/projects"}},[t._v("Standards")]),t._v(" or read about other users' experiences in the "),e("a",{attrs:{href:"/tag/pilot"}},[t._v("Pilots")]),t._v(" and "),e("a",{attrs:{href:"/tag/case-studies"}},[t._v("Case Studies")]),t._v(" sections. 
Also, we welcome you to reach out on "),e("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack"),e("OutboundLink")],1),t._v(" or "),e("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Matrix"),e("OutboundLink")],1),t._v(" to say hi or ask questions!")])])}),[],!1,null,null,null);a.default=s.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[183],{722:function(t,a,e){"use strict";e.r(a);var r=e(29),s=Object(r.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("h1",{attrs:{id:"frictionless-data"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-data"}},[t._v("#")]),t._v(" Frictionless Data")]),t._v(" "),e("p",[e("big",[e("strong",[t._v("Get a quick introduction to Frictionless in “5 minutes”.")])])],1),t._v(" "),e("p",[t._v("Frictionless Data is a progressive open-source framework for building data infrastructure – data management, data integration, data flows, etc. It includes various data standards and provides software to work with data.")]),t._v(" "),e("div",{staticClass:"custom-block tip"},[e("p",{staticClass:"custom-block-title"},[t._v("TIP")]),t._v(" "),e("p",[t._v("This introduction assumes some basic knowledge about data. 
If you are new to working with data we recommend starting with the first module, “What is Data?”, at "),e("a",{attrs:{href:"https://schoolofdata.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("School of Data"),e("OutboundLink")],1),t._v(".")])]),t._v(" "),e("h2",{attrs:{id:"why-frictionless"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#why-frictionless"}},[t._v("#")]),t._v(" Why Frictionless?")]),t._v(" "),e("p",[t._v("The Frictionless Data project aims to make it easier to work with data - by reducing common data workflow issues (what we call "),e("em",[t._v("friction")]),t._v("). Frictionless Data consists of two main parts, software and standards.")]),t._v(" "),e("p",[e("img",{attrs:{src:"/img/introduction/structure.png",alt:"Structure"}})]),t._v(" "),e("h3",{attrs:{id:"frictionless-software"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-software"}},[t._v("#")]),t._v(" Frictionless Software")]),t._v(" "),e("p",[t._v("The software is based on a suite of data standards that have been designed to make it easy to describe data structure and content so that data is more interoperable, easier to understand, and quicker to use. There are several aspects to the Frictionless software, including two high-level data frameworks (for Python and JavaScript), 10 low-level libraries for other languages, like R, and also visual interfaces and applications. You can read more about how to use the software (and find documentation) on the "),e("a",{attrs:{href:"/projects"}},[t._v("projects")]),t._v(" page.")]),t._v(" "),e("p",[t._v("For example, here is a validation report created by the "),e("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Repository"),e("OutboundLink")],1),t._v(" software. 
Data validation is one of the main focuses of Frictionless Data and this is a good visual representation of how the project might help to reveal common problems working with data.")]),t._v(" "),e("p",[e("img",{attrs:{src:"/img/introduction/report.png",alt:"Report"}})]),t._v(" "),e("h3",{attrs:{id:"frictionless-standards"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-standards"}},[t._v("#")]),t._v(" Frictionless Standards")]),t._v(" "),e("p",[t._v("The Standards (aka Specifications) help to describe data. The core specification is called a "),e("strong",[t._v("Data Package")]),t._v(", which is a simple container format used to describe and package a collection of data files. The format provides a contract for data interoperability that supports frictionless delivery, installation and management of data.")]),t._v(" "),e("p",[t._v("A Data Package can contain any kind of data. At the same time, Data Packages can be specialized and enriched for specific types of data so there are, for example, Tabular Data Packages for tabular data, Geo Data Packages for geo data, etc.")]),t._v(" "),e("p",[t._v("To learn more about Data Packages and the other specifications, check out the "),e("a",{attrs:{href:"/projects"}},[t._v("projects")]),t._v(" page or watch this video to learn more about the motivation behind packaging data.")]),t._v(" "),e("iframe",{attrs:{width:"730",height:"400",src:"https://www.youtube.com/embed/lWHKVXxuci0",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}}),t._v(" "),e("h2",{attrs:{id:"how-can-i-use-frictionless"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#how-can-i-use-frictionless"}},[t._v("#")]),t._v(" How can I use Frictionless?")]),t._v(" "),e("p",[t._v("You can use Frictionless to describe your data (add metadata and schemas), validate your data, and transform your data. You can also write custom data standards based on the Frictionless specifications. 
For example, you can use Frictionless to:")]),t._v(" "),e("ul",[e("li",[t._v("easily add metadata to your data before you publish it.")]),t._v(" "),e("li",[t._v("quickly validate your data to check the data quality before you share it.")]),t._v(" "),e("li",[t._v("build a declarative pipeline to clean and process data before analyzing it.")])]),t._v(" "),e("p",[t._v("Usually, new users start by trying out the software. The software lets you work with Frictionless using visual interfaces or programming languages.")]),t._v(" "),e("p",[t._v("As a new user you might not need to dive too deeply into the standards as our software encapsulates their concepts. On the other hand, once you feel comfortable with Frictionless Software you might start reading Frictionless Standards to get a better understanding of the things happening under the hood or to start creating your metadata descriptors more proficiently.")]),t._v(" "),e("h2",{attrs:{id:"who-uses-frictionless"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#who-uses-frictionless"}},[t._v("#")]),t._v(" Who uses Frictionless?")]),t._v(" "),e("p",[t._v("The Frictionless Data project has a very diverse audience, ranging from climate scientists, to humanities researchers, to government data centers.")]),t._v(" "),e("p",[e("img",{attrs:{src:"/img/introduction/audience.png",alt:"Audience"}})]),t._v(" "),e("p",[t._v("During our project development we have had various collaborations with institutions and individuals. We keep track of our "),e("a",{attrs:{href:"/tag/pilot"}},[t._v("Pilots")]),t._v(" and "),e("a",{attrs:{href:"/tag/case-studies"}},[t._v("Case Studies")]),t._v(" with blog posts, and we welcome our community to share their experiences using our standards and software. Generally speaking, you can apply Frictionless in almost every field where you work with data. 
Your Frictionless use case could range from a simple data table validation to writing complex data pipelines.")]),t._v(" "),e("h2",{attrs:{id:"ready-for-more"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#ready-for-more"}},[t._v("#")]),t._v(" Ready for more?")]),t._v(" "),e("p",[t._v("As a next step, we recommend you start using one of our "),e("a",{attrs:{href:"/projects"}},[t._v("Software")]),t._v(" projects, get to know our "),e("a",{attrs:{href:"/projects"}},[t._v("Standards")]),t._v(" or read about other users' experiences in the "),e("a",{attrs:{href:"/tag/pilot"}},[t._v("Pilots")]),t._v(" and "),e("a",{attrs:{href:"/tag/case-studies"}},[t._v("Case Studies")]),t._v(" sections. Also, we welcome you to reach out on "),e("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack"),e("OutboundLink")],1),t._v(" or "),e("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Matrix"),e("OutboundLink")],1),t._v(" to say hi or ask questions!")])])}),[],!1,null,null,null);a.default=s.exports}}]); \ No newline at end of file diff --git a/assets/js/184.ef41442f.js b/assets/js/184.2db18145.js similarity index 99% rename from assets/js/184.ef41442f.js rename to assets/js/184.2db18145.js index a17730130..52ccb3612 100644 --- a/assets/js/184.ef41442f.js +++ b/assets/js/184.2db18145.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[184],{724:function(a,t,e){"use strict";e.r(t);var r=e(29),i=Object(r.a)({},(function(){var a=this,t=a.$createElement,e=a._self._c||t;return e("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[e("h1",{attrs:{id:"frictionless-projects"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-projects"}},[a._v("#")]),a._v(" Frictionless Projects")]),a._v(" "),e("p",[e("big",[e("strong",[a._v("Open source projects for working with 
data.")])])],1),a._v(" "),e("p",[a._v("The Frictionless Data project provides a rich set of open source projects for working with data. There are tools, a visual application, and software for many programming platforms.")]),a._v(" "),e("div",{staticClass:"custom-block tip"},[e("p",{staticClass:"custom-block-title"},[a._v("TIP")]),a._v(" "),e("p",[a._v("This document is an overview of the Frictionless Projects - for more in-depth information, please click on one of the projects below and you will be redirected to a corresponding documentation portal.")])]),a._v(" "),e("h2",{attrs:{id:"software-and-standards"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#software-and-standards"}},[a._v("#")]),a._v(" Software and Standards")]),a._v(" "),e("p",[a._v("It’s a list of core Frictionless Projects developed by the core Frictionless Team:")]),a._v(" "),e("div",{staticClass:"main-section black-text"},[e("div",{staticClass:"features flex flex-row flex-wrap py-4"},[e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://application.frictionlessdata.io/",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/components.png"}}),a._v(" "),e("h3",[a._v("Frictionless Application")])]),a._v(" "),e("p",[a._v("Data management application for Browser and Desktop for working with tabular data.")])])]),a._v(" "),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://framework.frictionlessdata.io",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/framework.png"}}),a._v(" "),e("h3",[a._v("Frictionless Framework")])]),a._v(" "),e("p",[a._v("Python framework to describe, extract, validate, and transform tabular data.")])])]),a._v(" 
"),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://livemark.frictionlessdata.io",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/livemark.png"}}),a._v(" "),e("h3",[a._v("Livemark")])]),a._v(" "),e("p",[a._v("Static site generator that extends Markdown with charts, tables, scripts, and more.")])])]),a._v(" "),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://repository.frictionlessdata.io",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/repository.png"}}),a._v(" "),e("h3",[a._v("Frictionless Repository")])]),a._v(" "),e("p",[a._v("Github Action allowing you to validate tabular data on every commit to your repository.")])])]),a._v(" "),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://specs.frictionlessdata.io",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/libraries.png"}}),a._v(" "),e("h3",[a._v("Frictionless Standards")])]),a._v(" "),e("p",[a._v("Lightweight yet comprehensive data standards as Data Package and Table Schema.")])])]),a._v(" "),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://datahub.io/",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/datahub.png"}}),a._v(" "),e("h3",[a._v("Datahub")])]),a._v(" "),e("p",[a._v("A web platform built on Frictionless Data that allows discovering, publishing, and sharing 
data.")])])])])]),a._v(" "),e("h2",{attrs:{id:"which-software-is-right-for-me"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#which-software-is-right-for-me"}},[a._v("#")]),a._v(" Which software is right for me?")]),a._v(" "),e("p",[a._v("Choosing the right tool for the job can be challenging. Here are our recommendations:")]),a._v(" "),e("h3",{attrs:{id:"visual-interfaces"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#visual-interfaces"}},[a._v("#")]),a._v(" Visual Interfaces")]),a._v(" "),e("p",[a._v("If you prefer to use a visual interface:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Frictionless Application (coming soon):")]),a._v(" We’re working on our brand-new Frictionless Application that will be released in 2021. Until then, you can use "),e("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Creator"),e("OutboundLink")],1),a._v(" to create and edit data packages and "),e("a",{attrs:{href:"http://try.goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Goodtables On-Demand"),e("OutboundLink")],1),a._v(" for data validation.")]),a._v(" "),e("li",[e("strong",[a._v("Frictionless Repository:")]),a._v(" For ensuring the quality of your data on Github, Frictionless provides "),e("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Frictionless Repository"),e("OutboundLink")],1),a._v(". This creates visual quality reports and validation statuses on Github everytime you commit your data.")]),a._v(" "),e("li",[e("strong",[a._v("Datahub:")]),a._v(" For discovering, publishing, and sharing data we have "),e("a",{attrs:{href:"https://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Datahub"),e("OutboundLink")],1),a._v(" which is built on Frictionless software. 
Using this software as a service, you can sign-in and find, share, and publish quality data.")])]),a._v(" "),e("h3",{attrs:{id:"command-line-interfaces"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#command-line-interfaces"}},[a._v("#")]),a._v(" Command-line Interfaces")]),a._v(" "),e("p",[a._v("If you like to write commands in the command-line interface:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Frictionless Framework:")]),a._v(" For describing, extracting, validating, and transforming data, Frictionless provides the "),e("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Frictionless Framework’s"),e("OutboundLink")],1),a._v(" command-line interface. Using the “frictionless” command you can achieve many goals without needing to write Python code.")]),a._v(" "),e("li",[e("strong",[a._v("Livemark:")]),a._v(" For data journalists and technical writers we have a project called "),e("a",{attrs:{href:"https://livemark.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Livemark"),e("OutboundLink")],1),a._v(". Using the “livemark” command in the CLI you can publish a website that incorporates Frictionless functions and is powered by markdown articles.")]),a._v(" "),e("li",[e("strong",[a._v("Datahub:")]),a._v(" Frictionless provides a command-line tool called "),e("a",{attrs:{href:"https://datahub.io/docs/features/data-cli",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data"),e("OutboundLink")],1),a._v(" which is an important part of the Datahub project. 
The “data” command is available for a JavaScript environment and it helps you to interact with data stored on Datahub.")])]),a._v(" "),e("h3",{attrs:{id:"programming-languages"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#programming-languages"}},[a._v("#")]),a._v(" Programming Languages")]),a._v(" "),e("p",[a._v("If you want to use or write your own Frictionless code:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Frictionless Framework:")]),a._v(" For general data programming in Python, the "),e("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Frictionless Framework"),e("OutboundLink")],1),a._v(" is the way to go. You can describe, extract, validate, and transform your data. It’s also possible to extend the framework by adding new validation checks, transformation steps, etc. In addition, there is a lightweight version of the framework written in "),e("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-js",target:"_blank",rel:"noopener noreferrer"}},[a._v("JavaScript"),e("OutboundLink")],1),a._v(".")]),a._v(" "),e("li",[e("strong",[a._v("Frictionless Universe:")]),a._v(" For Frictionless implementations in other languages like "),e("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-r",target:"_blank",rel:"noopener noreferrer"}},[a._v("R"),e("OutboundLink")],1),a._v(" or Java and visual components, we have "),e("RouterLink",{attrs:{to:"/universe/"}},[a._v("Frictionless Universe")]),a._v(". 
Each library provides metadata validation and editing along with other low-level data operations like reading or writing tabular files.")],1)]),a._v(" "),e("h2",{attrs:{id:"which-standard-is-right-for-me"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#which-standard-is-right-for-me"}},[a._v("#")]),a._v(" Which standard is right for me?")]),a._v(" "),e("p",[a._v("To help you pick a standard to use, we’ve categorized them according to how many files you are working with.")]),a._v(" "),e("h3",{attrs:{id:"collection-of-files"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#collection-of-files"}},[a._v("#")]),a._v(" Collection of Files")]),a._v(" "),e("p",[a._v("If you have more than one file:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Data Package")]),a._v(": Use a "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package"),e("OutboundLink")],1),a._v(" for describing datasets of any file format. Data Package is a basic container format for describing a collection of data in a single “package”. It provides a basis for convenient delivery, installation and management of datasets.")]),a._v(" "),e("li",[e("strong",[a._v("Fiscal Data Package")]),a._v(": For fiscal data, use a "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/fiscal-data-package/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Fiscal Data Package"),e("OutboundLink")],1),a._v(". This lightweight and user-oriented format is for publishing and consuming fiscal data. 
It is concerned with how fiscal data should be packaged and with providing means for publishers to best convey the meaning of the data, so that it can be optimally used by consumers.")])]),a._v(" "),e("h3",{attrs:{id:"individual-file"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#individual-file"}},[a._v("#")]),a._v(" Individual File")]),a._v(" "),e("p",[a._v("If you need to describe an individual file:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Data Resource")]),a._v(": Use "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Resource"),e("OutboundLink")],1),a._v(" for describing individual files. Data Resource is a format to describe and package a single data resource of any file format, such as an individual table or file. It can also be extended for specific use cases.")]),a._v(" "),e("li",[e("strong",[a._v("Tabular Data Resource")]),a._v(": For tabular data, use the Data Resource extension called "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-resource/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Tabular Data Resource"),e("OutboundLink")],1),a._v(". Tabular Data Resource describes a single "),e("em",[a._v("tabular")]),a._v(" data resource such as a CSV file. It includes support for metadata and schemas to describe the data content and structure.")]),a._v(" "),e("li",[e("strong",[a._v("Table Schema")]),a._v(": To describe only the schema of a tabular data file, use "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Table Schema"),e("OutboundLink")],1),a._v(". Table Schema is a format to declare a schema for tabular data. The schema is designed to be expressible in JSON. 
You can have a schema as independent metadata or use it with a Tabular Data Resource.")]),a._v(" "),e("li",[e("strong",[a._v("CSV Dialect")]),a._v(": To specify the CSV dialect within a schema, use "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/csv-dialect/",target:"_blank",rel:"noopener noreferrer"}},[a._v("CSV Dialect"),e("OutboundLink")],1),a._v(". This defines a format to describe the various dialects of CSV files in a language agnostic manner. This is important because CSV files might be published in different forms, making it harder to read the data without errors. CSV Dialect can be used with a Tabular Data Resource to provide additional information.")])])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[184],{723:function(a,t,e){"use strict";e.r(t);var r=e(29),i=Object(r.a)({},(function(){var a=this,t=a.$createElement,e=a._self._c||t;return e("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[e("h1",{attrs:{id:"frictionless-projects"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-projects"}},[a._v("#")]),a._v(" Frictionless Projects")]),a._v(" "),e("p",[e("big",[e("strong",[a._v("Open source projects for working with data.")])])],1),a._v(" "),e("p",[a._v("The Frictionless Data project provides a rich set of open source projects for working with data. 
There are tools, a visual application, and software for many programming platforms.")]),a._v(" "),e("div",{staticClass:"custom-block tip"},[e("p",{staticClass:"custom-block-title"},[a._v("TIP")]),a._v(" "),e("p",[a._v("This document is an overview of the Frictionless Projects - for more in-depth information, please click on one of the projects below and you will be redirected to a corresponding documentation portal.")])]),a._v(" "),e("h2",{attrs:{id:"software-and-standards"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#software-and-standards"}},[a._v("#")]),a._v(" Software and Standards")]),a._v(" "),e("p",[a._v("It’s a list of core Frictionless Projects developed by the core Frictionless Team:")]),a._v(" "),e("div",{staticClass:"main-section black-text"},[e("div",{staticClass:"features flex flex-row flex-wrap py-4"},[e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://application.frictionlessdata.io/",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/components.png"}}),a._v(" "),e("h3",[a._v("Frictionless Application")])]),a._v(" "),e("p",[a._v("Data management application for Browser and Desktop for working with tabular data.")])])]),a._v(" "),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://framework.frictionlessdata.io",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/framework.png"}}),a._v(" "),e("h3",[a._v("Frictionless Framework")])]),a._v(" "),e("p",[a._v("Python framework to describe, extract, validate, and transform tabular data.")])])]),a._v(" "),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 
text-center"},[e("a",{attrs:{href:"https://livemark.frictionlessdata.io",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/livemark.png"}}),a._v(" "),e("h3",[a._v("Livemark")])]),a._v(" "),e("p",[a._v("Static site generator that extends Markdown with charts, tables, scripts, and more.")])])]),a._v(" "),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://repository.frictionlessdata.io",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/repository.png"}}),a._v(" "),e("h3",[a._v("Frictionless Repository")])]),a._v(" "),e("p",[a._v("Github Action allowing you to validate tabular data on every commit to your repository.")])])]),a._v(" "),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://specs.frictionlessdata.io",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/libraries.png"}}),a._v(" "),e("h3",[a._v("Frictionless Standards")])]),a._v(" "),e("p",[a._v("Lightweight yet comprehensive data standards as Data Package and Table Schema.")])])]),a._v(" "),e("div",{staticClass:"w-full md:w-1/3 feature flex justify-center"},[e("div",{staticClass:"px-8 text-center"},[e("a",{attrs:{href:"https://datahub.io/",target:"_blank"}},[e("img",{staticStyle:{width:"200px",border:"dashed 1px #555",padding:"10px","border-radius":"10px"},attrs:{src:"/img/software/datahub.png"}}),a._v(" "),e("h3",[a._v("Datahub")])]),a._v(" "),e("p",[a._v("A web platform built on Frictionless Data that allows discovering, publishing, and sharing data.")])])])])]),a._v(" 
"),e("h2",{attrs:{id:"which-software-is-right-for-me"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#which-software-is-right-for-me"}},[a._v("#")]),a._v(" Which software is right for me?")]),a._v(" "),e("p",[a._v("Choosing the right tool for the job can be challenging. Here are our recommendations:")]),a._v(" "),e("h3",{attrs:{id:"visual-interfaces"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#visual-interfaces"}},[a._v("#")]),a._v(" Visual Interfaces")]),a._v(" "),e("p",[a._v("If you prefer to use a visual interface:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Frictionless Application (coming soon):")]),a._v(" We’re working on our brand-new Frictionless Application that will be released in 2021. Until then, you can use "),e("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Creator"),e("OutboundLink")],1),a._v(" to create and edit data packages and "),e("a",{attrs:{href:"http://try.goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Goodtables On-Demand"),e("OutboundLink")],1),a._v(" for data validation.")]),a._v(" "),e("li",[e("strong",[a._v("Frictionless Repository:")]),a._v(" For ensuring the quality of your data on Github, Frictionless provides "),e("a",{attrs:{href:"https://repository.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Frictionless Repository"),e("OutboundLink")],1),a._v(". This creates visual quality reports and validation statuses on Github everytime you commit your data.")]),a._v(" "),e("li",[e("strong",[a._v("Datahub:")]),a._v(" For discovering, publishing, and sharing data we have "),e("a",{attrs:{href:"https://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Datahub"),e("OutboundLink")],1),a._v(" which is built on Frictionless software. 
Using this software as a service, you can sign-in and find, share, and publish quality data.")])]),a._v(" "),e("h3",{attrs:{id:"command-line-interfaces"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#command-line-interfaces"}},[a._v("#")]),a._v(" Command-line Interfaces")]),a._v(" "),e("p",[a._v("If you like to write commands in the command-line interface:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Frictionless Framework:")]),a._v(" For describing, extracting, validating, and transforming data, Frictionless provides the "),e("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Frictionless Framework’s"),e("OutboundLink")],1),a._v(" command-line interface. Using the “frictionless” command you can achieve many goals without needing to write Python code.")]),a._v(" "),e("li",[e("strong",[a._v("Livemark:")]),a._v(" For data journalists and technical writers we have a project called "),e("a",{attrs:{href:"https://livemark.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Livemark"),e("OutboundLink")],1),a._v(". Using the “livemark” command in the CLI you can publish a website that incorporates Frictionless functions and is powered by markdown articles.")]),a._v(" "),e("li",[e("strong",[a._v("Datahub:")]),a._v(" Frictionless provides a command-line tool called "),e("a",{attrs:{href:"https://datahub.io/docs/features/data-cli",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data"),e("OutboundLink")],1),a._v(" which is an important part of the Datahub project. 
The “data” command is available for a JavaScript environment and it helps you to interact with data stored on Datahub.")])]),a._v(" "),e("h3",{attrs:{id:"programming-languages"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#programming-languages"}},[a._v("#")]),a._v(" Programming Languages")]),a._v(" "),e("p",[a._v("If you want to use or write your own Frictionless code:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Frictionless Framework:")]),a._v(" For general data programming in Python, the "),e("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Frictionless Framework"),e("OutboundLink")],1),a._v(" is the way to go. You can describe, extract, validate, and transform your data. It’s also possible to extend the framework by adding new validation checks, transformation steps, etc. In addition, there is a lightweight version of the framework written in "),e("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-js",target:"_blank",rel:"noopener noreferrer"}},[a._v("JavaScript"),e("OutboundLink")],1),a._v(".")]),a._v(" "),e("li",[e("strong",[a._v("Frictionless Universe:")]),a._v(" For Frictionless implementations in other languages like "),e("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-r",target:"_blank",rel:"noopener noreferrer"}},[a._v("R"),e("OutboundLink")],1),a._v(" or Java and visual components, we have "),e("RouterLink",{attrs:{to:"/universe/"}},[a._v("Frictionless Universe")]),a._v(". 
Each library provides metadata validation and editing along with other low-level data operations like reading or writing tabular files.")],1)]),a._v(" "),e("h2",{attrs:{id:"which-standard-is-right-for-me"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#which-standard-is-right-for-me"}},[a._v("#")]),a._v(" Which standard is right for me?")]),a._v(" "),e("p",[a._v("To help you pick a standard to use, we’ve categorized them according to how many files you are working with.")]),a._v(" "),e("h3",{attrs:{id:"collection-of-files"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#collection-of-files"}},[a._v("#")]),a._v(" Collection of Files")]),a._v(" "),e("p",[a._v("If you have more than one file:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Data Package")]),a._v(": Use a "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package"),e("OutboundLink")],1),a._v(" for describing datasets of any file format. Data Package is a basic container format for describing a collection of data in a single “package”. It provides a basis for convenient delivery, installation and management of datasets.")]),a._v(" "),e("li",[e("strong",[a._v("Fiscal Data Package")]),a._v(": For fiscal data, use a "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/fiscal-data-package/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Fiscal Data Package"),e("OutboundLink")],1),a._v(". This lightweight and user-oriented format is for publishing and consuming fiscal data. 
It is concerned with how fiscal data should be packaged and with providing means for publishers to best convey the meaning of the data, so that it can be optimally used by consumers.")])]),a._v(" "),e("h3",{attrs:{id:"individual-file"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#individual-file"}},[a._v("#")]),a._v(" Individual File")]),a._v(" "),e("p",[a._v("If you need to describe an individual file:")]),a._v(" "),e("ul",[e("li",[e("strong",[a._v("Data Resource")]),a._v(": Use "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Resource"),e("OutboundLink")],1),a._v(" for describing individual files. Data Resource is a format to describe and package a single data resource of any file format, such as an individual table or file. It can also be extended for specific use cases.")]),a._v(" "),e("li",[e("strong",[a._v("Tabular Data Resource")]),a._v(": For tabular data, use the Data Resource extension called "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-resource/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Tabular Data Resource"),e("OutboundLink")],1),a._v(". Tabular Data Resource describes a single "),e("em",[a._v("tabular")]),a._v(" data resource such as a CSV file. It includes support for metadata and schemas to describe the data content and structure.")]),a._v(" "),e("li",[e("strong",[a._v("Table Schema")]),a._v(": To describe only the schema of a tabular data file, use "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Table Schema"),e("OutboundLink")],1),a._v(". Table Schema is a format to declare a schema for tabular data. The schema is designed to be expressible in JSON. 
You can have a schema as independent metadata or use it with a Tabular Data Resource.")]),a._v(" "),e("li",[e("strong",[a._v("CSV Dialect")]),a._v(": To specify the CSV dialect within a schema, use "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/csv-dialect/",target:"_blank",rel:"noopener noreferrer"}},[a._v("CSV Dialect"),e("OutboundLink")],1),a._v(". This defines a format to describe the various dialects of CSV files in a language agnostic manner. This is important because CSV files might be published in different forms, making it harder to read the data without errors. CSV Dialect can be used with a Tabular Data Resource to provide additional information.")])])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/186.7b3af427.js b/assets/js/186.a43d5ac2.js similarity index 99% rename from assets/js/186.7b3af427.js rename to assets/js/186.a43d5ac2.js index c06b56d03..b101effa3 100644 --- a/assets/js/186.7b3af427.js +++ b/assets/js/186.a43d5ac2.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[186],{726:function(e,t,o){"use strict";o.r(t);var a=o(29),i=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("h1",{attrs:{id:"code-of-conduct"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#code-of-conduct"}},[e._v("#")]),e._v(" Code of Conduct")]),e._v(" "),o("h2",{attrs:{id:"introduction"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[e._v("#")]),e._v(" Introduction")]),e._v(" "),o("p",[e._v("The goal of this Code of Conduct is to make explicit the type of participation that is expected, and the behaviour that is unacceptable. 
These guidelines are to be adhered to by all Frictionless Data team members, all partners on a given project, and all other participants.")]),e._v(" "),o("p",[e._v("This Code of Conduct applies to all the projects that Frictionless Data hosts/organises and describes the standards of behaviour that we expect all our partners to observe when taking part in our projects. We expect all voices to be welcomed at our events and strive to empower everyone to feel able to participate fully.")]),e._v(" "),o("h2",{attrs:{id:"this-code-is-applicable-to"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#this-code-is-applicable-to"}},[e._v("#")]),e._v(" This Code is applicable to")]),e._v(" "),o("ul",[o("li",[e._v("All public areas of participation, including but not limited to discussion forums, mailing lists, issue trackers, social media, and in-person venues such as conferences and workshops.")]),e._v(" "),o("li",[e._v("All private areas of participation, including but not limited to email and closed platforms such as Slack or Matrix.")]),e._v(" "),o("li",[e._v("Any project that Frictionless Data leads on or partners in.")])]),e._v(" "),o("h2",{attrs:{id:"what-we-expect"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-we-expect"}},[e._v("#")]),e._v(" What we expect")]),e._v(" "),o("p",[e._v("The following behaviours are expected from all project participants, including Frictionless Data core team members, project partners, and all other participants.")]),e._v(" "),o("ul",[o("li",[e._v("Lead by example by being considerate in your actions and decisions.")]),e._v(" "),o("li",[e._v("Be respectful in speech and action, especially in disagreement.")]),e._v(" "),o("li",[e._v("Refrain from demeaning, discriminatory, or harassing behaviour and speech.")]),e._v(" "),o("li",[e._v("We all make mistakes, and when we do, we take responsibility for them.")]),e._v(" "),o("li",[e._v("Be mindful of your fellow participants. 
If someone is in distress, or if someone is in violation of these guidelines, reach out.")])]),e._v(" "),o("h2",{attrs:{id:"what-we-find-unacceptable"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-we-find-unacceptable"}},[e._v("#")]),e._v(" What we find unacceptable")]),e._v(" "),o("p",[e._v("We do not tolerate harassment of participants at our events in any form. Harassment includes offensive verbal comments, deliberate intimidation, harassing photography or recording, inappropriate physical contact and unwanted sexual attention. Anything that makes someone feel uncomfortable could be deemed harassment. For more information and examples about what constitutes harassment, please refer to "),o("a",{attrs:{href:"https://www.opencon2018.org/code_of_conduct",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenCon’s Code of Conduct in Brief"),o("OutboundLink")],1),e._v(" and the "),o("a",{attrs:{href:"http://openhardware.science/gosh-2017/gosh-code-of-conduct/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gathering for Open Source Hardware’s examples of behaviour"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("This non-exhaustive list shows examples of behaviours that are unacceptable from all participants:")]),e._v(" "),o("ul",[o("li",[e._v("Violence and threats of violence.")]),e._v(" "),o("li",[e._v("Derogatory comments of any form, including related to gender and expression, sexual orientation, disability, mental illness, neuro(a)typicality, physical appearance, body size, race, religion, age, or socio-economic status.")]),e._v(" "),o("li",[e._v("Sexual images or behaviour.")]),e._v(" "),o("li",[e._v("Posting or threatening to post other people’s personally identifying information (“doxing”).")]),e._v(" "),o("li",[e._v("Deliberate misgendering or use of former names, or improper titles.")]),e._v(" "),o("li",[e._v("Inappropriate photography or recording.")]),e._v(" "),o("li",[e._v("Physical contact without affirmative 
consent.")]),e._v(" "),o("li",[e._v("Unwelcome sexual attention. This includes sexualised comments or jokes, inappropriate touching, groping, and unwelcome sexual advances.")]),e._v(" "),o("li",[e._v("Deliberate intimidation, stalking or following (online or in person).")]),e._v(" "),o("li",[e._v("Sustained disruption of conference events, including talks and presentations.")]),e._v(" "),o("li",[e._v("Advocating for, or encouraging, any of the above behaviour.")])]),e._v(" "),o("h2",{attrs:{id:"consequences-of-unacceptable-behaviour"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#consequences-of-unacceptable-behaviour"}},[e._v("#")]),e._v(" Consequences of unacceptable behaviour")]),e._v(" "),o("p",[e._v("Unacceptable behaviour from any participant in any public or private forum around projects we are involved in, including those with decision-making authority, will not be tolerated.")]),e._v(" "),o("p",[e._v("Anyone asked to stop unacceptable behaviour is expected to comply immediately.")]),e._v(" "),o("p",[e._v("If a participant engages in unacceptable behaviour, any action deemed appropriate will be taken, up to and including a temporary ban, permanent expulsion from participatory forums, or reporting to local law enforcement for criminal offences.")]),e._v(" "),o("h2",{attrs:{id:"reporting"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#reporting"}},[e._v("#")]),e._v(" Reporting")]),e._v(" "),o("p",[e._v("If you are subject to, or witness, unacceptable behaviour, or have any other concerns, please email "),o("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[e._v("frictionlessdata@okfn.org")]),e._v(". 
We will handle all reports with discretion, and you can report anonymously if you wish using "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSfoly-CZT9ZONcns4uG7BsoxGObRqgTlI6NdfvlYSCRVyy_QQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("In your report, please do your best to include:")]),e._v(" "),o("p",[e._v("Your contact information (unless you wish to report anonymously)")]),e._v(" "),o("ul",[o("li",[e._v("Identifying information (e.g. names, nicknames, pseudonyms) of the participant who has violated the Code of Conduct")]),e._v(" "),o("li",[e._v("The behaviour that was in violation")]),e._v(" "),o("li",[e._v("The approximate time of the behaviour")]),e._v(" "),o("li",[e._v("If possible, where the Code of Conduct violation happened")]),e._v(" "),o("li",[e._v("The circumstances surrounding the incident")]),e._v(" "),o("li",[e._v("Other people involved in the incident")]),e._v(" "),o("li",[e._v("If you believe the incident is ongoing, please let us know")]),e._v(" "),o("li",[e._v("If there is a publicly available record (e.g. mailing list record), please include a link")]),e._v(" "),o("li",[e._v("Any additional helpful information")])]),e._v(" "),o("p",[e._v("We will fully investigate any reports, follow up with the reportee (unless it is an anonymous report), and we will work with the reportee (unless anonymous) to decide what action to take. If the complaint is about someone on the response team, that person will recuse themselves from handling the response.")]),e._v(" "),o("h2",{attrs:{id:"confidentiality"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#confidentiality"}},[e._v("#")]),e._v(" Confidentiality")]),e._v(" "),o("p",[e._v("All reports will be kept confidential. When we discuss incidents with people who are reported, we will anonymize details as much as we can to protect reporter privacy. 
In some cases we may determine that a public statement will need to be made. If that’s the case, the identities of all victims and reporters will remain confidential unless those individuals instruct us otherwise.")]),e._v(" "),o("h2",{attrs:{id:"license-and-attribution"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#license-and-attribution"}},[e._v("#")]),e._v(" License and attribution")]),e._v(" "),o("p",[e._v("This Code of Conduct is distributed under a "),o("a",{attrs:{href:"https://creativecommons.org/licenses/by-sa/4.0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Creative Commons Attribution-ShareAlike license"),o("OutboundLink")],1),e._v(". It draws heavily on the "),o("a",{attrs:{href:"https://okfn.org/about/code-of-conduct/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Foundation Code of Conduct"),o("OutboundLink")],1),e._v(", which is based on this "),o("a",{attrs:{href:"https://wiki.mozilla.org/Participation/Community_Gatherings/Brazil_2016/Code_of_Conduct",target:"_blank",rel:"noopener noreferrer"}},[e._v("Mozilla Code of Conduct"),o("OutboundLink")],1),e._v(", the School of Data Code of Conduct, and the "),o("a",{attrs:{href:"https://csvconf.com/coc/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf Code of Conduct"),o("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[186],{724:function(e,t,o){"use strict";o.r(t);var a=o(29),i=Object(a.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("h1",{attrs:{id:"code-of-conduct"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#code-of-conduct"}},[e._v("#")]),e._v(" Code of Conduct")]),e._v(" "),o("h2",{attrs:{id:"introduction"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[e._v("#")]),e._v(" Introduction")]),e._v(" "),o("p",[e._v("The goal of 
this Code of Conduct is to make explicit the type of participation that is expected, and the behaviour that is unacceptable. These guidelines are to be adhered to by all Frictionless Data team members, all partners on a given project, and all other participants.")]),e._v(" "),o("p",[e._v("This Code of Conduct applies to all the projects that Frictionless Data hosts/organises and describes the standards of behaviour that we expect all our partners to observe when taking part in our projects. We expect all voices to be welcomed at our events and strive to empower everyone to feel able to participate fully.")]),e._v(" "),o("h2",{attrs:{id:"this-code-is-applicable-to"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#this-code-is-applicable-to"}},[e._v("#")]),e._v(" This Code is applicable to")]),e._v(" "),o("ul",[o("li",[e._v("All public areas of participation, including but not limited to discussion forums, mailing lists, issue trackers, social media, and in-person venues such as conferences and workshops.")]),e._v(" "),o("li",[e._v("All private areas of participation, including but not limited to email and closed platforms such as Slack or Matrix.")]),e._v(" "),o("li",[e._v("Any project that Frictionless Data leads on or partners in.")])]),e._v(" "),o("h2",{attrs:{id:"what-we-expect"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-we-expect"}},[e._v("#")]),e._v(" What we expect")]),e._v(" "),o("p",[e._v("The following behaviours are expected from all project participants, including Frictionless Data core team members, project partners, and all other participants.")]),e._v(" "),o("ul",[o("li",[e._v("Lead by example by being considerate in your actions and decisions.")]),e._v(" "),o("li",[e._v("Be respectful in speech and action, especially in disagreement.")]),e._v(" "),o("li",[e._v("Refrain from demeaning, discriminatory, or harassing behaviour and speech.")]),e._v(" "),o("li",[e._v("We all make mistakes, and when we do, we take responsibility for 
them.")]),e._v(" "),o("li",[e._v("Be mindful of your fellow participants. If someone is in distress, or if someone is in violation of these guidelines, reach out.")])]),e._v(" "),o("h2",{attrs:{id:"what-we-find-unacceptable"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-we-find-unacceptable"}},[e._v("#")]),e._v(" What we find unacceptable")]),e._v(" "),o("p",[e._v("We do not tolerate harassment of participants at our events in any form. Harassment includes offensive verbal comments, deliberate intimidation, harassing photography or recording, inappropriate physical contact and unwanted sexual attention. Anything that makes someone feel uncomfortable could be deemed harassment. For more information and examples about what constitutes harassment, please refer to "),o("a",{attrs:{href:"https://www.opencon2018.org/code_of_conduct",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenCon’s Code of Conduct in Brief"),o("OutboundLink")],1),e._v(" and the "),o("a",{attrs:{href:"http://openhardware.science/gosh-2017/gosh-code-of-conduct/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gathering for Open Source Hardware’s examples of behaviour"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("This non-exhaustive list shows examples of behaviours that are unacceptable from all participants:")]),e._v(" "),o("ul",[o("li",[e._v("Violence and threats of violence.")]),e._v(" "),o("li",[e._v("Derogatory comments of any form, including related to gender and expression, sexual orientation, disability, mental illness, neuro(a)typicality, physical appearance, body size, race, religion, age, or socio-economic status.")]),e._v(" "),o("li",[e._v("Sexual images or behaviour.")]),e._v(" "),o("li",[e._v("Posting or threatening to post other people’s personally identifying information (“doxing”).")]),e._v(" "),o("li",[e._v("Deliberate misgendering or use of former names, or improper titles.")]),e._v(" "),o("li",[e._v("Inappropriate photography or 
recording.")]),e._v(" "),o("li",[e._v("Physical contact without affirmative consent.")]),e._v(" "),o("li",[e._v("Unwelcome sexual attention. This includes sexualised comments or jokes, inappropriate touching, groping, and unwelcome sexual advances.")]),e._v(" "),o("li",[e._v("Deliberate intimidation, stalking or following (online or in person).")]),e._v(" "),o("li",[e._v("Sustained disruption of conference events, including talks and presentations.")]),e._v(" "),o("li",[e._v("Advocating for, or encouraging, any of the above behaviour.")])]),e._v(" "),o("h2",{attrs:{id:"consequences-of-unacceptable-behaviour"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#consequences-of-unacceptable-behaviour"}},[e._v("#")]),e._v(" Consequences of unacceptable behaviour")]),e._v(" "),o("p",[e._v("Unacceptable behaviour from any participant in any public or private forum around projects we are involved in, including those with decision-making authority, will not be tolerated.")]),e._v(" "),o("p",[e._v("Anyone asked to stop unacceptable behaviour is expected to comply immediately.")]),e._v(" "),o("p",[e._v("If a participant engages in unacceptable behaviour, any action deemed appropriate will be taken, up to and including a temporary ban, permanent expulsion from participatory forums, or reporting to local law enforcement for criminal offences.")]),e._v(" "),o("h2",{attrs:{id:"reporting"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#reporting"}},[e._v("#")]),e._v(" Reporting")]),e._v(" "),o("p",[e._v("If you are subject to, or witness, unacceptable behaviour, or have any other concerns, please email "),o("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[e._v("frictionlessdata@okfn.org")]),e._v(". 
We will handle all reports with discretion, and you can report anonymously if you wish using "),o("a",{attrs:{href:"https://docs.google.com/forms/d/e/1FAIpQLSfoly-CZT9ZONcns4uG7BsoxGObRqgTlI6NdfvlYSCRVyy_QQ/viewform?usp=sf_link",target:"_blank",rel:"noopener noreferrer"}},[e._v("this form"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("In your report, please do your best to include:")]),e._v(" "),o("p",[e._v("Your contact information (unless you wish to report anonymously)")]),e._v(" "),o("ul",[o("li",[e._v("Identifying information (e.g. names, nicknames, pseudonyms) of the participant who has violated the Code of Conduct")]),e._v(" "),o("li",[e._v("The behaviour that was in violation")]),e._v(" "),o("li",[e._v("The approximate time of the behaviour")]),e._v(" "),o("li",[e._v("If possible, where the Code of Conduct violation happened")]),e._v(" "),o("li",[e._v("The circumstances surrounding the incident")]),e._v(" "),o("li",[e._v("Other people involved in the incident")]),e._v(" "),o("li",[e._v("If you believe the incident is ongoing, please let us know")]),e._v(" "),o("li",[e._v("If there is a publicly available record (e.g. mailing list record), please include a link")]),e._v(" "),o("li",[e._v("Any additional helpful information")])]),e._v(" "),o("p",[e._v("We will fully investigate any reports, follow up with the reportee (unless it is an anonymous report), and we will work with the reportee (unless anonymous) to decide what action to take. If the complaint is about someone on the response team, that person will recuse themselves from handling the response.")]),e._v(" "),o("h2",{attrs:{id:"confidentiality"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#confidentiality"}},[e._v("#")]),e._v(" Confidentiality")]),e._v(" "),o("p",[e._v("All reports will be kept confidential. When we discuss incidents with people who are reported, we will anonymize details as much as we can to protect reporter privacy. 
In some cases we may determine that a public statement will need to be made. If that’s the case, the identities of all victims and reporters will remain confidential unless those individuals instruct us otherwise.")]),e._v(" "),o("h2",{attrs:{id:"license-and-attribution"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#license-and-attribution"}},[e._v("#")]),e._v(" License and attribution")]),e._v(" "),o("p",[e._v("This Code of Conduct is distributed under a "),o("a",{attrs:{href:"https://creativecommons.org/licenses/by-sa/4.0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Creative Commons Attribution-ShareAlike license"),o("OutboundLink")],1),e._v(". It draws heavily on the "),o("a",{attrs:{href:"https://okfn.org/about/code-of-conduct/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Foundation Code of Conduct"),o("OutboundLink")],1),e._v(", which is based on this "),o("a",{attrs:{href:"https://wiki.mozilla.org/Participation/Community_Gatherings/Brazil_2016/Code_of_Conduct",target:"_blank",rel:"noopener noreferrer"}},[e._v("Mozilla Code of Conduct"),o("OutboundLink")],1),e._v(", the School of Data Code of Conduct, and the "),o("a",{attrs:{href:"https://csvconf.com/coc/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csv,conf Code of Conduct"),o("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/187.e3b7e0c5.js b/assets/js/187.c303e15d.js similarity index 98% rename from assets/js/187.e3b7e0c5.js rename to assets/js/187.c303e15d.js index a99ddd439..e5bedc692 100644 --- a/assets/js/187.e3b7e0c5.js +++ b/assets/js/187.c303e15d.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[187],{727:function(e,t,s){"use strict";s.r(t);var o=s(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,s=e._self._c||t;return 
s("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[s("h1",{attrs:{id:"contribute"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#contribute"}},[e._v("#")]),e._v(" Contribute")]),e._v(" "),s("h2",{attrs:{id:"introduction"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[e._v("#")]),e._v(" Introduction")]),e._v(" "),s("p",[e._v("We welcome contributions – and you don’t have to be a software developer to get involved! The first step to becoming a Frictionless Data contributor is to become a Frictionless Data user. Please read the following guidelines, and feel free to reach out to us if you have any questions. Thanks for your interest in helping make Frictionless awesome!")]),e._v(" "),s("h2",{attrs:{id:"general-guidelines"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#general-guidelines"}},[e._v("#")]),e._v(" General Guidelines")]),e._v(" "),s("h3",{attrs:{id:"reporting-a-bug-or-issue"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#reporting-a-bug-or-issue"}},[e._v("#")]),e._v(" Reporting a bug or issue:")]),e._v(" "),s("p",[e._v("We use "),s("a",{attrs:{href:"https://github.com/frictionlessdata/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Github"),s("OutboundLink")],1),e._v(" as a code and issues hosting platform. To report a bug or propose a new feature, please open an issue. For issues with a specific code repository, please open an issue in that specific repository’s tracker on GitHub. 
For example: "),s("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues"),s("OutboundLink")],1)]),e._v(" "),s("h3",{attrs:{id:"give-us-feedback-suggestions-propose-a-new-idea"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#give-us-feedback-suggestions-propose-a-new-idea"}},[e._v("#")]),e._v(" Give us feedback/suggestions/propose a new idea:")]),e._v(" "),s("p",[e._v("What if the issue is not a bug but a question? Please head to the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/project/discussions",target:"_blank",rel:"noopener noreferrer"}},[e._v("discussion forum"),s("OutboundLink")],1),e._v(". This is an excellent place to give us thorough feedback about your experience as a whole. In the same way, you may participate in existing discussions and make your voice heard.")]),e._v(" "),s("h3",{attrs:{id:"pull-requests"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#pull-requests"}},[e._v("#")]),e._v(" Pull requests:")]),e._v(" "),s("p",[e._v("For pull requests, we ask that you first create an issue and then create a pull request linked to this issue. Look for issues with “help wanted” or “first-time contributor.” We welcome pull requests from anyone!")]),e._v(" "),s("h3",{attrs:{id:"specific-guidelines"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#specific-guidelines"}},[e._v("#")]),e._v(" Specific guidelines:")]),e._v(" "),s("p",[e._v("Each individual software project has more specific contribution guidelines that you can find in the README in the project’s repository. 
For example: "),s("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-js#developers",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-js#developers"),s("OutboundLink")],1)]),e._v(" "),s("h2",{attrs:{id:"documentation"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#documentation"}},[e._v("#")]),e._v(" Documentation")]),e._v(" "),s("p",[e._v("Are you seeking to advocate and educate people in the data space? We always welcome contributions to our documentation! You can help improve our documentation by opening pull requests if you find typos, have ideas to improve the clarity of the document, or want to translate the text to a non-English language. You can also write tutorials (like this one: "),s("a",{attrs:{href:"https://colab.research.google.com/drive/12RmGajHamGP5wOoAhy8N7Gchn9TmVnG-",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Describe and Extract Tutorial"),s("OutboundLink")],1),e._v("). Let us know if you would like to contribute or if you are interested but need some help!")]),e._v(" "),s("h2",{attrs:{id:"share-your-work-with-us"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#share-your-work-with-us"}},[e._v("#")]),e._v(" Share your work with us!")]),e._v(" "),s("p",[e._v("Are you using Frictionless with your data? Have you spoken at a conference about using Frictionless? We would love to hear about it! 
We also have opportunities for blog writing and presenting at our monthly community calls - "),s("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[e._v("contact us")]),e._v(" to learn more!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[187],{728:function(e,t,s){"use strict";s.r(t);var o=s(29),a=Object(o.a)({},(function(){var e=this,t=e.$createElement,s=e._self._c||t;return s("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[s("h1",{attrs:{id:"contribute"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#contribute"}},[e._v("#")]),e._v(" Contribute")]),e._v(" "),s("h2",{attrs:{id:"introduction"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[e._v("#")]),e._v(" Introduction")]),e._v(" "),s("p",[e._v("We welcome contributions – and you don’t have to be a software developer to get involved! The first step to becoming a Frictionless Data contributor is to become a Frictionless Data user. Please read the following guidelines, and feel free to reach out to us if you have any questions. Thanks for your interest in helping make Frictionless awesome!")]),e._v(" "),s("h2",{attrs:{id:"general-guidelines"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#general-guidelines"}},[e._v("#")]),e._v(" General Guidelines")]),e._v(" "),s("h3",{attrs:{id:"reporting-a-bug-or-issue"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#reporting-a-bug-or-issue"}},[e._v("#")]),e._v(" Reporting a bug or issue:")]),e._v(" "),s("p",[e._v("We use "),s("a",{attrs:{href:"https://github.com/frictionlessdata/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Github"),s("OutboundLink")],1),e._v(" as a code and issues hosting platform. To report a bug or propose a new feature, please open an issue. For issues with a specific code repository, please open an issue in that specific repository’s tracker on GitHub. 
For example: "),s("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-py/issues",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-py/issues"),s("OutboundLink")],1)]),e._v(" "),s("h3",{attrs:{id:"give-us-feedback-suggestions-propose-a-new-idea"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#give-us-feedback-suggestions-propose-a-new-idea"}},[e._v("#")]),e._v(" Give us feedback/suggestions/propose a new idea:")]),e._v(" "),s("p",[e._v("What if the issue is not a bug but a question? Please head to the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/project/discussions",target:"_blank",rel:"noopener noreferrer"}},[e._v("discussion forum"),s("OutboundLink")],1),e._v(". This is an excellent place to give us thorough feedback about your experience as a whole. In the same way, you may participate in existing discussions and make your voice heard.")]),e._v(" "),s("h3",{attrs:{id:"pull-requests"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#pull-requests"}},[e._v("#")]),e._v(" Pull requests:")]),e._v(" "),s("p",[e._v("For pull requests, we ask that you first create an issue and then create a pull request linked to this issue. Look for issues with “help wanted” or “first-time contributor.” We welcome pull requests from anyone!")]),e._v(" "),s("h3",{attrs:{id:"specific-guidelines"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#specific-guidelines"}},[e._v("#")]),e._v(" Specific guidelines:")]),e._v(" "),s("p",[e._v("Each individual software project has more specific contribution guidelines that you can find in the README in the project’s repository. 
For example: "),s("a",{attrs:{href:"https://github.com/frictionlessdata/frictionless-js#developers",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/frictionless-js#developers"),s("OutboundLink")],1)]),e._v(" "),s("h2",{attrs:{id:"documentation"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#documentation"}},[e._v("#")]),e._v(" Documentation")]),e._v(" "),s("p",[e._v("Are you seeking to advocate and educate people in the data space? We always welcome contributions to our documentation! You can help improve our documentation by opening pull requests if you find typos, have ideas to improve the clarity of the document, or want to translate the text to a non-English language. You can also write tutorials (like this one: "),s("a",{attrs:{href:"https://colab.research.google.com/drive/12RmGajHamGP5wOoAhy8N7Gchn9TmVnG-",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Describe and Extract Tutorial"),s("OutboundLink")],1),e._v("). Let us know if you would like to contribute or if you are interested but need some help!")]),e._v(" "),s("h2",{attrs:{id:"share-your-work-with-us"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#share-your-work-with-us"}},[e._v("#")]),e._v(" Share your work with us!")]),e._v(" "),s("p",[e._v("Are you using Frictionless with your data? Have you spoken at a conference about using Frictionless? We would love to hear about it! 
We also have opportunities for blog writing and presenting at our monthly community calls - "),s("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[e._v("contact us")]),e._v(" to learn more!")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/188.6507abdd.js b/assets/js/188.bed32f20.js similarity index 96% rename from assets/js/188.6507abdd.js rename to assets/js/188.bed32f20.js index 4a1b70f43..3fab90a8a 100644 --- a/assets/js/188.6507abdd.js +++ b/assets/js/188.bed32f20.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[188],{729:function(t,a,e){"use strict";e.r(a);var r=e(29),n=Object(r.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("h1",{attrs:{id:"events-calendar"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#events-calendar"}},[t._v("#")]),t._v(" Events Calendar")]),t._v(" "),e("h2",{attrs:{id:"introduction"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[t._v("#")]),t._v(" Introduction")]),t._v(" "),e("p",[t._v("Frictionless Data calendar with a listing of our upcoming "),e("RouterLink",{attrs:{to:"/tag/events/"}},[t._v("events")]),t._v(" including webinars, virtual hangouts, etc.")],1),t._v(" "),e("h2",{attrs:{id:"frictionless-data-monthly-community-call"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-data-monthly-community-call"}},[t._v("#")]),t._v(" Frictionless Data Monthly Community Call")]),t._v(" "),e("p",[t._v("Join the vibrant Frictionless Data community every last Thursday of the month on a call to hear about recent project developments! 
You can sign up here: "),e("a",{attrs:{href:"https://forms.gle/rtK7xZw5vrwouTE98",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://forms.gle/rtK7xZw5vrwouTE98"),e("OutboundLink")],1)]),t._v(" "),e("h2",{attrs:{id:"calendar"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#calendar"}},[t._v("#")]),t._v(" Calendar")]),t._v(" "),e("div",{staticClass:"custom-block tip"},[e("p",{staticClass:"custom-block-title"},[t._v("TIP")]),t._v(" "),e("p",[t._v("You can add any upcoming event to your calendar by clicking on a specific event and "),e("strong",[t._v("selecting copy to my calendar.")])])]),t._v(" "),e("iframe",{staticStyle:{border:"solid 1px #777"},attrs:{src:"https://calendar.google.com/calendar/embed?height=700&wkst=1&bgcolor=%23EF6C00&ctz=Europe%2FRome&src=b2tmbi5vcmdfaDk3bm05ZDhxcG50cXExc2ZzcWZnbTNwdTBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ&color=%23EF6C00",width:"740",height:"700",frameborder:"0",scrolling:"no"}})])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[188],{727:function(t,a,e){"use strict";e.r(a);var r=e(29),n=Object(r.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("h1",{attrs:{id:"events-calendar"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#events-calendar"}},[t._v("#")]),t._v(" Events Calendar")]),t._v(" "),e("h2",{attrs:{id:"introduction"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[t._v("#")]),t._v(" Introduction")]),t._v(" "),e("p",[t._v("Frictionless Data calendar with a listing of our upcoming "),e("RouterLink",{attrs:{to:"/tag/events/"}},[t._v("events")]),t._v(" including webinars, virtual hangouts, etc.")],1),t._v(" "),e("h2",{attrs:{id:"frictionless-data-monthly-community-call"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-data-monthly-community-call"}},[t._v("#")]),t._v(" Frictionless Data Monthly Community 
Call")]),t._v(" "),e("p",[t._v("Join the vibrant Frictionless Data community every last Thursday of the month on a call to hear about recent project developments! You can sign up here: "),e("a",{attrs:{href:"https://forms.gle/rtK7xZw5vrwouTE98",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://forms.gle/rtK7xZw5vrwouTE98"),e("OutboundLink")],1)]),t._v(" "),e("h2",{attrs:{id:"calendar"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#calendar"}},[t._v("#")]),t._v(" Calendar")]),t._v(" "),e("div",{staticClass:"custom-block tip"},[e("p",{staticClass:"custom-block-title"},[t._v("TIP")]),t._v(" "),e("p",[t._v("You can add any upcoming event to your calendar by clicking on a specific event and "),e("strong",[t._v("selecting copy to my calendar.")])])]),t._v(" "),e("iframe",{staticStyle:{border:"solid 1px #777"},attrs:{src:"https://calendar.google.com/calendar/embed?height=700&wkst=1&bgcolor=%23EF6C00&ctz=Europe%2FRome&src=b2tmbi5vcmdfaDk3bm05ZDhxcG50cXExc2ZzcWZnbTNwdTBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ&color=%23EF6C00",width:"740",height:"700",frameborder:"0",scrolling:"no"}})])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/189.0b2e84a4.js b/assets/js/189.397daceb.js similarity index 97% rename from assets/js/189.0b2e84a4.js rename to assets/js/189.397daceb.js index 9f4f8e5a6..f9072a655 100644 --- a/assets/js/189.0b2e84a4.js +++ b/assets/js/189.397daceb.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[189],{728:function(t,a,o){"use strict";o.r(a);var e=o(29),r=Object(e.a)({},(function(){var t=this,a=t.$createElement,o=t._self._c||a;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("h1",{attrs:{id:"need-help"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#need-help"}},[t._v("#")]),t._v(" Need Help?")]),t._v(" "),o("p",{staticClass:"font-light text-xl"},[t._v(" We're happy to provide support! 
Please reach out to us by using one of the following methods:")]),t._v(" "),o("h2",{attrs:{id:"community-support"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#community-support"}},[t._v("#")]),t._v(" Community Support")]),t._v(" "),o("p",[t._v("You can ask any questions in our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack Community Chat room"),o("OutboundLink")],1),t._v(" (the Chat room is also accessible via "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Matrix"),o("OutboundLink")],1),t._v("). You can also start a thread in "),o("a",{attrs:{href:"https://github.com/frictionlessdata/project/discussions",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub Discussions"),o("OutboundLink")],1),t._v(". Frictionless is a big community that consists of people having different expertise in different domains. Feel free to ask us any questions!")]),t._v(" "),o("h2",{attrs:{id:"school-of-data"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#school-of-data"}},[t._v("#")]),t._v(" School of Data")]),t._v(" "),o("p",[t._v("School of Data is a project overseen by the Open Knowledge Foundation consisting of a network of individuals and organizations working on empowering civil society organizations, journalists and citizens with skills they need to use data effectively. 
School of Data provides data literacy trainings and resources for learning how to work with data.")]),t._v(" "),o("p",[o("a",{attrs:{href:"https://schoolofdata.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("School of Data"),o("OutboundLink")],1)]),t._v(" "),o("h2",{attrs:{id:"paid-support"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#paid-support"}},[t._v("#")]),t._v(" Paid Support")]),t._v(" "),o("p",[t._v("Professional, timely support is available on a paid basis from the creators of Frictionless Data at Datopian and Open Knowledge Foundation. Please get in touch via:")]),t._v(" "),o("p",[o("a",{attrs:{href:"http://datopian.com/contact",target:"_blank",rel:"noopener noreferrer"}},[t._v("Datopian"),o("OutboundLink")],1),o("br"),t._v("\nOpen Knowledge Foundation: "),o("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[t._v("frictionlessdata@okfn.org")])])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[189],{726:function(t,a,o){"use strict";o.r(a);var e=o(29),r=Object(e.a)({},(function(){var t=this,a=t.$createElement,o=t._self._c||a;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("h1",{attrs:{id:"need-help"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#need-help"}},[t._v("#")]),t._v(" Need Help?")]),t._v(" "),o("p",{staticClass:"font-light text-xl"},[t._v(" We're happy to provide support! 
Please reach out to us by using one of the following methods:")]),t._v(" "),o("h2",{attrs:{id:"community-support"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#community-support"}},[t._v("#")]),t._v(" Community Support")]),t._v(" "),o("p",[t._v("You can ask any questions in our "),o("a",{attrs:{href:"https://join.slack.com/t/frictionlessdata/shared_invite/zt-17kpbffnm-tRfDW_wJgOw8tJVLvZTrBg",target:"_blank",rel:"noopener noreferrer"}},[t._v("Slack Community Chat room"),o("OutboundLink")],1),t._v(" (the Chat room is also accessible via "),o("a",{attrs:{href:"https://matrix.to/#/#frictionlessdata:matrix.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Matrix"),o("OutboundLink")],1),t._v("). You can also start a thread in "),o("a",{attrs:{href:"https://github.com/frictionlessdata/project/discussions",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub Discussions"),o("OutboundLink")],1),t._v(". Frictionless is a big community that consists of people having different expertise in different domains. Feel free to ask us any questions!")]),t._v(" "),o("h2",{attrs:{id:"school-of-data"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#school-of-data"}},[t._v("#")]),t._v(" School of Data")]),t._v(" "),o("p",[t._v("School of Data is a project overseen by the Open Knowledge Foundation consisting of a network of individuals and organizations working on empowering civil society organizations, journalists and citizens with skills they need to use data effectively. 
School of Data provides data literacy trainings and resources for learning how to work with data.")]),t._v(" "),o("p",[o("a",{attrs:{href:"https://schoolofdata.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("School of Data"),o("OutboundLink")],1)]),t._v(" "),o("h2",{attrs:{id:"paid-support"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#paid-support"}},[t._v("#")]),t._v(" Paid Support")]),t._v(" "),o("p",[t._v("Professional, timely support is available on a paid basis from the creators of Frictionless Data at Datopian and Open Knowledge Foundation. Please get in touch via:")]),t._v(" "),o("p",[o("a",{attrs:{href:"http://datopian.com/contact",target:"_blank",rel:"noopener noreferrer"}},[t._v("Datopian"),o("OutboundLink")],1),o("br"),t._v("\nOpen Knowledge Foundation: "),o("a",{attrs:{href:"mailto:frictionlessdata@okfn.org"}},[t._v("frictionlessdata@okfn.org")])])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/19.7e84e393.js b/assets/js/19.28d80010.js similarity index 98% rename from assets/js/19.7e84e393.js rename to assets/js/19.28d80010.js index 62fc737f0..f38ae69b9 100644 --- a/assets/js/19.7e84e393.js +++ b/assets/js/19.28d80010.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[19],{435:function(t,e,a){t.exports=a.p+"assets/img/ukds-pipeline-flow.8cf26465.png"},436:function(t,e,a){t.exports=a.p+"assets/img/ukds-au-pairing-datasheet.6d159749.png"},437:function(t,e,a){t.exports=a.p+"assets/img/ukds-govt-petitions-datasheet.d9d1cb8a.png"},583:function(t,e,a){"use strict";a.r(e);var s=a(29),n=Object(s.a)({},(function(){var t=this,e=t.$createElement,s=t._self._c||e;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("p",[t._v("The UK Data Service, like many other research repository services, employs a range of closed source software solutions for the publication and consumption of research data. 
The data itself is often published in closed and proprietary data formats, and the data is not always, or purposefully, published in a way that enables data reuse.")]),t._v(" "),s("p",[t._v("Based on an initial exploration of user need, we identified, together with the UK Data Service, the following areas for a Frictionless Data pilot:")]),t._v(" "),s("ul",[s("li",[t._v("Conversion of data and metadata to open formats using open source tools.")]),t._v(" "),s("li",[t._v("Use the Frictionless Data toolchain to assess and report on data quality (as a proxy for reusability).")]),t._v(" "),s("li",[t._v("Demonstrate the possibility of generating visualizations from source data and metadata, described with Frictionless Data specifications.")]),t._v(" "),s("li",[t._v("Host the data with all these attributes (open formats, reusable quality, visualized) on an open source platform for data.")])]),t._v(" "),s("p",[t._v("We worked with data that was publicly accessible, and therefore in its post-publication phase. This also informed the way we designed the work, as a set of connected processing and transport steps, very much outside of the publication process itself. While this was acceptable for the scope of the pilot, the real power of the approach we demonstrate here is in integrating it with pre-publication phases of data, via a combined automated and manually curated data process. 
Indeed, we can see via this pilot the potential to streamline the workflow demonstrated into a complete research data publication process, and would welcome the opportunity to conduct one or more pilots that build on this approach, deeply integrated into pre-publication data workflows.")]),t._v(" "),s("h2",{attrs:{id:"context"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[t._v("#")]),t._v(" Context")]),t._v(" "),s("p",[t._v("The UK Data Service offers an online repository where researchers can archive, publish and share research data, called "),s("a",{attrs:{href:"http://reshare.ukdataservice.ac.uk/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Reshare"),s("OutboundLink")],1),t._v(". Reshare exposes an "),s("a",{attrs:{href:"https://www.openarchives.org/pmh/",target:"_blank",rel:"noopener noreferrer"}},[t._v("OAI-PMH"),s("OutboundLink")],1),t._v(" endpoint to facilitate metadata harvesting.")]),t._v(" "),s("p",[s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" is a data workflows web application built around the modular Frictionless Data toolchain, designed to find, share and publish high quality data online. Each entry has a ‘Showcase’ to display data package properties, and preview data with tables and simple visualisations. As well as the Showcase, "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" provides straightforward direct access to import data into a variety of tools used by researchers: R, Pandas, Python, JavaScript, and SQL. 
"),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Data Packages"),s("OutboundLink")],1),t._v(" can be pushed to "),s("a",{attrs:{href:"http://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" to create dataset entries.")]),t._v(" "),s("h3",{attrs:{id:"problem-we-were-trying-to-solve"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#problem-we-were-trying-to-solve"}},[t._v("#")]),t._v(" Problem We Were Trying To Solve")]),t._v(" "),s("p",[t._v("We want to investigate the use of the Data Package concept, and Frictionless Data software to facilitate the reuse of data archived in Reshare.")]),t._v(" "),s("p",[t._v("We are especially interested in trialling pipelines to automate data harvesting from UKDS into "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" using Frictionless Data software such as "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-pipelines"),s("OutboundLink")],1),t._v(", and creating appropriate processors to translate widely used statistics file formats, such as "),s("a",{attrs:{href:"https://www.ibm.com/analytics/us/en/technology/spss/",target:"_blank",rel:"noopener noreferrer"}},[t._v("SPSS"),s("OutboundLink")],1),t._v(", to text-based tabular data formats such as CSV.")]),t._v(" "),s("p",[t._v("We chose the Data Package Pipelines library because it provides us with a well tested and mature framework of established processors to work with tabular data from a variety of sources and formats. Custom processors can easily be added to extend pipeline functionality. Pipelines can be configured using a simple declarative specification. 
Other tools supporting the underlying Frictionless Data specifications, such as "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-py/",target:"_blank",rel:"noopener noreferrer"}},[t._v("tableschema"),s("OutboundLink")],1),t._v(" and "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-py/",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables"),s("OutboundLink")],1),t._v(" can be easily integrated as appropriate.")]),t._v(" "),s("p",[t._v("In this pilot we are trialling tools to:")]),t._v(" "),s("ul",[s("li",[t._v("automate data harvesting from UKDS, to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(", through a data package pipeline.")]),t._v(" "),s("li",[t._v("translate binary data formats (SPSS) to text-based tabular formats.")]),t._v(" "),s("li",[t._v("validate tabular data harvested from UKDS with goodtables.")]),t._v(" "),s("li",[t._v("fix or workaround common data issues identified from validation report, in the source-spec")]),t._v(" "),s("li",[t._v("correct file encoding")]),t._v(" "),s("li",[t._v("skip non-data rows")]),t._v(" "),s("li",[t._v("skip specified validation checks (duplicate-rows)")]),t._v(" "),s("li",[t._v("specify header rows in csv files")]),t._v(" "),s("li",[t._v("explicitly defining tabular headers")]),t._v(" "),s("li",[t._v("trial the "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" API with real-world data")]),t._v(" "),s("li",[t._v("use the Showcase features of "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" to provide instance data previews and visualisations.")])]),t._v(" "),s("h3",{attrs:{id:"the-work"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[t._v("#")]),t._v(" The Work")]),t._v(" 
"),s("h4",{attrs:{id:"what-did-we-do"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#what-did-we-do"}},[t._v("#")]),t._v(" What Did We Do")]),t._v(" "),s("p",[t._v("During the pilot, we focussed on creating a reusable pipeline of processors to harvest data and dataset metadata from the UKDS Reshare service, and output valid Data Packages with tabular resources. Each pipeline processor step was created as a separate module to facilitate testing and reuse in other similar pipelines.")]),t._v(" "),s("p",[t._v("UKDS datasets were selected from "),s("a",{attrs:{href:"http://reshare.ukdataservice.ac.uk/cgi/stats/report/most_popular_eprints",target:"_blank",rel:"noopener noreferrer"}},[t._v("the UKDS list"),s("OutboundLink")],1),t._v(". Entries were selected based on the data format we intended to write processors for (.csv, .tsv, xls, or .sav), how the dataset might help demonstrate various aspects of the pipeline, and how well they might lend themselves to visualisation on "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("Below is an outline of the pipeline flow from UKDS Reshare Archive to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" entry:")]),t._v(" "),s("p",[s("img",{attrs:{src:a(435),alt:"pipeline flow from UKDS Reshare Archive to datahub.io"}}),s("br"),t._v(" "),s("em",[t._v("pipeline flow from UKDS Reshare Archive to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1)])]),t._v(" "),s("h5",{attrs:{id:"specifying-an-entry"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#specifying-an-entry"}},[t._v("#")]),t._v(" Specifying an Entry")]),t._v(" "),s("p",[t._v("We wanted to ensure that each UKDS dataset to be maintained on 
"),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" could be easily configured to specify where to harvest its resource data and dataset metadata. We also wanted to add other configuration details to help customise the pipeline to work with tricky resources, and view specifications for subsequent visualisation on "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("The source-spec for each Reshare entry defines a list of URLs for each resource in the dataset that we’re interested in harvesting, and the resource format (csv, tsv, xls, or spss).")]),t._v(" "),s("p",[t._v("If an OAI ID is provided, it will be used to harvest dataset metadata from the Reshare OAI endpoint.")]),t._v(" "),s("p",[t._v("As well as defining source locations, we also want to provide a way to customise downstream processor behaviour, to help work around potential resource issues.")]),t._v(" "),s("p",[t._v("Below is an example yaml source-spec for two entries, demonstrating various configuration options.")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("entries")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("civil-servant-survey")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# entry name")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# a list of sources")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851401/10/Coded_SurveyData.csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" csv\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851401/2/key%20%283%29.csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("goodtables")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# custom processor config for goodtables")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("skip_checks")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" duplicate"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("row\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851401")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# OAI ID to harvest dataset metadata")]),t._v("\n\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("uk-gov-petitions")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851614/1/gov_pet_metadata.tab\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" tsv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# custom processor config for tabulator")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("encoding")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" utf"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("8")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# explicitly define source file encoding")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("headers")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# explicitly define missing column headers")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" id\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" title\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" department\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" starting\n "),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("-")]),t._v(" closing\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851614")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("views")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" views/petitions"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("view.json\n")])])]),s("p",[t._v("Sources in the first entry, "),s("em",[t._v("civil-servant-survey")]),t._v(", contain duplicate rows, which would normally fail goodtables validation. Here we will allow the "),s("code",[t._v("duplicate-row")]),t._v(" check to be skipped.")]),t._v(" "),s("p",[t._v("The source in the second entry has the wrong character encoding and no headers declared. We can fix these issues to allow the pipeline to process the resource by explicitly specifying the file encoding, and declaring the column headers.")]),t._v(" "),s("h4",{attrs:{id:"data-set-and-resource-harvest"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#data-set-and-resource-harvest"}},[t._v("#")]),t._v(" Data Set and Resource Harvest")]),t._v(" "),s("p",[t._v("We identified that Reshare has an OAI-PMH2 compatible endpoint to harvest information about each Reshare data set. So we created an "),s("code",[t._v("ukds.add_oai_metadata")]),t._v(" pipeline processor. OAI metadata is compatible with Dublin Core Elements, and the processor translates this into Data Package compatible properties.")]),t._v(" "),s("p",[t._v("Resources are added to the newly created Data Package and downloaded from the URLs defined in the yaml configuration. 
We support adding SPSS (.sav), CSV, TSV and XLS file formats.")]),t._v(" "),s("p",[t._v("To support the widely used SPSS format, we created an "),s("code",[t._v("spss.add_spss processor")]),t._v(" that makes use of the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-spss-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("tableschema-spss"),s("OutboundLink")],1),t._v(" plugin to read SPSS files and create tableschema descriptors from them.")]),t._v(" "),s("h4",{attrs:{id:"validation-reports-and-common-issues"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#validation-reports-and-common-issues"}},[t._v("#")]),t._v(" Validation Reports and Common Issues")]),t._v(" "),s("p",[t._v("To help ensure data quality, we want to validate the harvested tabular data before continuing the pipeline. We created a "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines-goodtables",target:"_blank",rel:"noopener noreferrer"}},[s("code",[t._v("goodtables.validate")]),t._v(" processor"),s("OutboundLink")],1),t._v(", which will write a validation report for each resource. If a resource fails to validate against its schema, or has other data issues, the pipeline will fail. 
Errors can be identified from validation reports, fixed, and the pipeline re-run.")]),t._v(" "),s("p",[t._v("Below are examples of issues revealed by validation that can occur when working with real-world data.")]),t._v(" "),s("h5",{attrs:{id:"au-pairing-after-the-au-pair-scheme-specifying-a-xls-sheet-and-working-around-non-data-rows"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#au-pairing-after-the-au-pair-scheme-specifying-a-xls-sheet-and-working-around-non-data-rows"}},[t._v("#")]),t._v(" “Au pairing after the au pair scheme”: specifying a xls sheet, and working around non-data rows")]),t._v(" "),s("p",[t._v("The "),s("a",{attrs:{href:"http://reshare.ukdataservice.ac.uk/851656/",target:"_blank",rel:"noopener noreferrer"}},[t._v("“Au Pairing” dataset"),s("OutboundLink")],1),t._v(" has a single .xls resource we’re interested in harvesting. The file contains four sheets, and we’re interested in the second one, which contains the data. So we specify our entry:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("au-pairing")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851656/6/GumtreeAds_AuPairsAnalysis1.xls\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" xls\n "),s("span",{pre:!0,attrs:{class:"token key 
atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("sheet")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# use sheet 2 in the file")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851656")]),t._v("\n")])])]),s("p",[t._v("Notice we have indicated which sheet in the file to use.")]),t._v(" "),s("p",[t._v("The data sheet has a single header row, but it also has this header row repeated at intervals throughout the data, presumably to aid the human reader when reviewing the data manually.")]),t._v(" "),s("p",[s("img",{attrs:{src:a(436),alt:""}}),s("br"),t._v(" "),s("em",[t._v("screengrab of the UKDS “Au Pairing” datasheet")])]),t._v(" "),s("p",[t._v("For machine processing, this isn’t ideal. 
In fact, it will fail our goodtables validation processor with the following (truncated) report:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"time"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("0.466")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"valid"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean important"}},[t._v("false")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"error-count"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("13")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"table-count"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"tables"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("...")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"errors"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" 
"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"code"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"duplicate-row"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"message"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Row 347 is duplicated to row(s) 236"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"row-number"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("347")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"column-number"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token null important"}},[t._v("null")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"row"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"happy/energetic/caring/loving outlook required"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"CV requested"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"gender 
specified"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"cooking"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("...")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("...")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("...")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("You can find the full report "),s("a",{attrs:{href:"https://gist.github.com/brew/8401e2875ec6d829baf95b79cd677e28",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("The report tells us there are 13 errors, and lists where they are. In this case they indicate that duplicate rows are present (the repeated header). 
This can either be fixed within Reshare, or we can add a parameter to our entry specification to skip each row that contains the duplicate header:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("au-pairing")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851656/6/GumtreeAds_AuPairsAnalysis1.xls\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" xls\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("sheet")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("skip_rows")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("237")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("292")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token 
number"}},[t._v("348")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("402")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("458")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("511")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("564")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("618")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("673")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("726")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("779")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("832")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("886")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("937")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("990")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("goodtables")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key 
atrule"}},[t._v("skip_checks")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" duplicate"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("row\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851656")]),t._v("\n")])])]),s("p",[t._v("Above we’ve added a "),s("code",[t._v("skip_rows")]),t._v(" parameter with a list of row numbers to skip when generating the data package. We also instruct goodtables to skip the "),s("code",[t._v("duplicate-row")]),t._v(" check."),s("br"),t._v("\nThe outputted csv resource file will no longer contain rows with the duplicate header.")]),t._v(" "),s("h5",{attrs:{id:"uk-government-petitions-wrong-file-encoding-and-specifying-missing-headers"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#uk-government-petitions-wrong-file-encoding-and-specifying-missing-headers"}},[t._v("#")]),t._v(" “UK government petitions”: wrong file encoding, and specifying missing headers")]),t._v(" "),s("p",[t._v("The “"),s("a",{attrs:{href:"http://gov.uk/",target:"_blank",rel:"noopener noreferrer"}},[t._v("gov.uk"),s("OutboundLink")],1),t._v(" petitions” dataset has a TSV data file we’re interested in. 
However, it has been saved with the wrong character encoding and attempting to open may return an error, or display some characters incorrectly.")]),t._v(" "),s("p",[t._v("Additionally, there is no header row specified at the top of the file, so the resulting data package won’t have the correct header information in the resource’s schema.")]),t._v(" "),s("p",[t._v("We can fix both of these issues in our entry specification:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("uk-gov-petitions")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851614/1/gov_pet_metadata.tab\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" tsv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("encoding")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" utf"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("8")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# specify file encoding")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("headers")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# define missing headers")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" id\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" title\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" department\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" starting\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" closing\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851614")]),t._v("\n")])])]),s("p",[t._v("Above, we have defined the character encoding we want to use when opening the file, and we’ve explicitly defined the headers to use. These headers will be added to the first row of the outputted csv resource file in the data package.")]),t._v(" "),s("p",[t._v("We can also use the "),s("code",[t._v("headers")]),t._v(" parameter to define which row contains header information. By default this is the first row. However, sometimes a data file will have the headers on a different row:")]),t._v(" "),s("p",[s("img",{attrs:{src:a(437),alt:""}}),s("br"),t._v(" "),s("em",[t._v("screengrab of the UKDS “Government Petitions” datasheet")])]),t._v(" "),s("p",[t._v("This example file has its headers defined in row three, with other information, and an empty row in the first two rows. 
We can tell our pipeline which row contains headers by specifying it in the entry configuration:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("example-entry")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//www.newcastle.gov.uk/sites/drupalncc.newcastle.gov.uk/files/wwwfileroot/your"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("council/local_transparency/january_2012.csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("headers")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# specifying which row contains headers")]),t._v("\n")])])]),s("h4",{attrs:{id:"add-data-package-views"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#add-data-package-views"}},[t._v("#")]),t._v(" Add Data Package Views")]),t._v(" "),s("p",[t._v("View specs can be added to the data package to enable "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" to create visualisations from resource data in the data package. The "),s("code",[t._v("views")]),t._v(" property is a list of file paths to json files containing "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/views/",target:"_blank",rel:"noopener noreferrer"}},[t._v("view-spec"),s("OutboundLink")],1),t._v(" compatible views.")]),t._v(" "),s("p",[t._v("Currently, "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" supports views written either with a ‘simple’ views-spec, or using Vega (v 2.6.5). See "),s("a",{attrs:{href:"https://datahub.io/docs/features/views",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io docs"),s("OutboundLink")],1),t._v(" for more details about the supported views-spec.")]),t._v(" "),s("h4",{attrs:{id:"push-to-datahub-io"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#push-to-datahub-io"}},[t._v("#")]),t._v(" Push to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1)]),t._v(" "),s("p",[t._v("Once the harvesting pipeline has been run the resulting data packages are pushed to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" using the "),s("a",{attrs:{href:"https://github.com/datahq/datahub-cli",target:"_blank",rel:"noopener noreferrer"}},[s("code",[t._v("datahub.dump.to_datahub")]),s("OutboundLink")],1),t._v(" processor.")]),t._v(" "),s("p",[t._v("This creates or updates an entry for the package on datahub. 
If a view has been defined in the entry configuration, this will be created on the "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" entry Showcase page.")]),t._v(" "),s("h3",{attrs:{id:"review"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#review"}},[t._v("#")]),t._v(" Review")]),t._v(" "),s("p",[t._v("We were able to demonstrate that a data processing pipeline using Frictionless Data tools can facilitate the automated harvesting, validation, transformation, and upload to a data package-compatible third-party service, based on a simple configuration.")]),t._v(" "),s("h3",{attrs:{id:"next-steps"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#next-steps"}},[t._v("#")]),t._v(" Next Steps")]),t._v(" "),s("p",[t._v("The pilot data package pipeline runs locally in a development environment, but given each processor has been written as a separate module, these could be used within any pipeline. "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" uses datapackage-pipelines within its infrastructure, and the processors developed for this project could be used within "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" itself to facilitate the automatic harvesting of datasets from OAI-PMH enabled data sources.")]),t._v(" "),s("p",[t._v("Once a pipeline is in place, it can be scheduled to run each day (or week, month, etc.). 
This would ensure "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" is up-to-date with data on UKDS Reshare.")]),t._v(" "),s("p",[t._v("Working with ‘real-world’ data from UKDS Reshare has helped to identify and prioritise improvements and future features for "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("h3",{attrs:{id:"additional-resources"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#additional-resources"}},[t._v("#")]),t._v(" Additional Resources")]),t._v(" "),s("ul",[s("li",[s("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-ukds",target:"_blank",rel:"noopener noreferrer"}},[t._v("The main code repository for this pilot"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("li",[s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[t._v("A framework for processing data packages in pipelines of modular components"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("li",[s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines-spss",target:"_blank",rel:"noopener noreferrer"}},[t._v("A Data Package Pipelines processor for SPSS file formats"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("li",[s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines-goodtables",target:"_blank",rel:"noopener noreferrer"}},[t._v("A Data Package Pipelines processor for validating tabular data using goodtables-py"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("li",[s("a",{attrs:{href:"https://github.com/datahq/datapackage-pipelines-datahub",target:"_blank",rel:"noopener noreferrer"}},[t._v("A Data Package Pipelines processor to push data packages to datahub.io"),s("OutboundLink")],1),t._v(".")])])])}),[],!1,null,null,null);e.default=n.exports}}]); \ No newline at end of file 
+(window.webpackJsonp=window.webpackJsonp||[]).push([[19],{439:function(t,e,a){t.exports=a.p+"assets/img/ukds-pipeline-flow.8cf26465.png"},440:function(t,e,a){t.exports=a.p+"assets/img/ukds-au-pairing-datasheet.6d159749.png"},441:function(t,e,a){t.exports=a.p+"assets/img/ukds-govt-petitions-datasheet.d9d1cb8a.png"},587:function(t,e,a){"use strict";a.r(e);var s=a(29),n=Object(s.a)({},(function(){var t=this,e=t.$createElement,s=t._self._c||e;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("p",[t._v("The UK Data Service, like many other research repository services, employs a range of closed source software solutions for the publication and consumption of research data. The data itself is often published in closed and proprietary data formats, and the data is not always, or purposefully, published in a way that enables data reuse.")]),t._v(" "),s("p",[t._v("Based on an initial exploration of user need, we identified, together with the UK Data Service, the following areas for a Frictionless Data pilot:")]),t._v(" "),s("ul",[s("li",[t._v("Conversion of data and metadata to open formats using open source tools.")]),t._v(" "),s("li",[t._v("Use the Frictionless Data toolchain to assess and report on data quality (as a proxy for reusability).")]),t._v(" "),s("li",[t._v("Demonstrate the possibility of generating visualizations from source data and metadata, described with Frictionless Data specifications.")]),t._v(" "),s("li",[t._v("Host the data with all these attributes (open formats, reusable quality, visualized) on an open source platform for data.")])]),t._v(" "),s("p",[t._v("We worked with data that was publicly accessible, and therefore in its post-publication phase. This also informed the way we designed the work, as a set of connected processing and transport steps, very much outside of the publication process itself. 
While this was acceptable for the scope of the pilot, the real power of the approach we demonstrate here is in integrating it with pre-publication phases of data, via a combined automated and manually curated data process. Indeed, we can see via this pilot the potential to streamline the workflow demonstrated into a complete research data publication process, and would welcome the opportunity to conduct one or more pilots that build on this approach, deeply integrated into pre-publication data workflows.")]),t._v(" "),s("h2",{attrs:{id:"context"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[t._v("#")]),t._v(" Context")]),t._v(" "),s("p",[t._v("The UK Data Service offers an online repository where researchers can archive, publish and share research data, called "),s("a",{attrs:{href:"http://reshare.ukdataservice.ac.uk/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Reshare"),s("OutboundLink")],1),t._v(". Reshare exposes an "),s("a",{attrs:{href:"https://www.openarchives.org/pmh/",target:"_blank",rel:"noopener noreferrer"}},[t._v("OAI-PMH"),s("OutboundLink")],1),t._v(" endpoint to facilitate metadata harvesting.")]),t._v(" "),s("p",[s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" is a data workflows web application built around the modular Frictionless Data toolchain, designed to find, share and publish high-quality data online. Each entry has a ‘Showcase’ to display data package properties, and preview data with tables and simple visualisations. As well as the Showcase, "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" provides straightforward, direct access to import data into a variety of tools used by researchers: R, Pandas, Python, JavaScript, and SQL. 
"),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Data Packages"),s("OutboundLink")],1),t._v(" can be pushed to "),s("a",{attrs:{href:"http://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" to create dataset entries.")]),t._v(" "),s("h3",{attrs:{id:"problem-we-were-trying-to-solve"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#problem-we-were-trying-to-solve"}},[t._v("#")]),t._v(" Problem We Were Trying To Solve")]),t._v(" "),s("p",[t._v("We want to investigate the use of the Data Package concept, and Frictionless Data software to facilitate the reuse of data archived in Reshare.")]),t._v(" "),s("p",[t._v("We are especially interested in trialling pipelines to automate data harvesting from UKDS into "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" using Frictionless Data software such as "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-pipelines"),s("OutboundLink")],1),t._v(", and creating appropriate processors to translate widely used statistics file formats, such as "),s("a",{attrs:{href:"https://www.ibm.com/analytics/us/en/technology/spss/",target:"_blank",rel:"noopener noreferrer"}},[t._v("SPSS"),s("OutboundLink")],1),t._v(", to text-based tabular data formats such as CSV.")]),t._v(" "),s("p",[t._v("We chose the Data Package Pipelines library because it provides us with a well tested and mature framework of established processors to work with tabular data from a variety of sources and formats. Custom processors can easily be added to extend pipeline functionality. Pipelines can be configured using a simple declarative specification. 
Other tools supporting the underlying Frictionless Data specifications, such as "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-py/",target:"_blank",rel:"noopener noreferrer"}},[t._v("tableschema"),s("OutboundLink")],1),t._v(" and "),s("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py/",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables"),s("OutboundLink")],1),t._v(", can be easily integrated as appropriate.")]),t._v(" "),s("p",[t._v("In this pilot we are trialling tools to:")]),t._v(" "),s("ul",[s("li",[t._v("automate data harvesting from UKDS, to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(", through a data package pipeline.")]),t._v(" "),s("li",[t._v("translate binary data formats (SPSS) to text-based tabular formats.")]),t._v(" "),s("li",[t._v("validate tabular data harvested from UKDS with goodtables.")]),t._v(" "),s("li",[t._v("fix or work around common data issues identified in the validation report, using the source-spec")]),t._v(" "),s("li",[t._v("correct file encoding")]),t._v(" "),s("li",[t._v("skip non-data rows")]),t._v(" "),s("li",[t._v("skip specified validation checks (duplicate-row)")]),t._v(" "),s("li",[t._v("specify header rows in csv files")]),t._v(" "),s("li",[t._v("explicitly define tabular headers")]),t._v(" "),s("li",[t._v("trial the "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" API with real-world data")]),t._v(" "),s("li",[t._v("use the Showcase features of "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" to provide instance data previews and visualisations.")])]),t._v(" "),s("h3",{attrs:{id:"the-work"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[t._v("#")]),t._v(" The Work")]),t._v(" 
"),s("h4",{attrs:{id:"what-did-we-do"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#what-did-we-do"}},[t._v("#")]),t._v(" What Did We Do")]),t._v(" "),s("p",[t._v("During the pilot, we focussed on creating a reusable pipeline of processors to harvest data and dataset metadata from the UKDS Reshare service, and output valid Data Packages with tabular resources. Each pipeline processor step was created as a separate module to facilitate testing and reuse in other similar pipelines.")]),t._v(" "),s("p",[t._v("UKDS datasets were selected from "),s("a",{attrs:{href:"http://reshare.ukdataservice.ac.uk/cgi/stats/report/most_popular_eprints",target:"_blank",rel:"noopener noreferrer"}},[t._v("the UKDS list"),s("OutboundLink")],1),t._v(". Entries were selected based on the data format we intended to write processors for (.csv, .tsv, xls, or .sav), how the dataset might help demonstrate various aspects of the pipeline, and how well they might lend themselves to visualisation on "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("Below is an outline of the pipeline flow from UKDS Reshare Archive to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" entry:")]),t._v(" "),s("p",[s("img",{attrs:{src:a(439),alt:"pipeline flow from UKDS Reshare Archive to datahub.io"}}),s("br"),t._v(" "),s("em",[t._v("pipeline flow from UKDS Reshare Archive to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1)])]),t._v(" "),s("h5",{attrs:{id:"specifying-an-entry"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#specifying-an-entry"}},[t._v("#")]),t._v(" Specifying an Entry")]),t._v(" "),s("p",[t._v("We wanted to ensure that each UKDS dataset to be maintained on 
"),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" could be easily configured to specify where to harvest its resource data and dataset metadata. We also wanted to add other configuration details to help customise the pipeline to work with tricky resources, and view specifications for subsequent visualisation on "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("The source-spec for each Reshare entry defines a list of URLs for each resource in the dataset that we’re interested in harvesting, and the resource format (csv, tsv, xls, or spss).")]),t._v(" "),s("p",[t._v("If an OAI ID is provided, it will be used to harvest dataset metadata from the Reshare OAI endpoint.")]),t._v(" "),s("p",[t._v("As well as defining source locations, we also want to provide a way to customise downstream processor behaviour, to help work around potential resource issues.")]),t._v(" "),s("p",[t._v("Below is an example yaml source-spec for two entries, demonstrating various configuration options.")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("entries")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("civil-servant-survey")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# entry name")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# a list of sources")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851401/10/Coded_SurveyData.csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" csv\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851401/2/key%20%283%29.csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("goodtables")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# custom processor config for goodtables")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("skip_checks")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" duplicate"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("row\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851401")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# OAI ID to harvest dataset metadata")]),t._v("\n\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("uk-gov-petitions")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851614/1/gov_pet_metadata.tab\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" tsv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# custom processor config for tabulator")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("encoding")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" utf"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("8")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# explicitly define source file encoding")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("headers")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# explicitly define missing column headers")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" id\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" title\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" department\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" starting\n "),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("-")]),t._v(" closing\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851614")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("views")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" views/petitions"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("view.json\n")])])]),s("p",[t._v("Sources in the first entry, "),s("em",[t._v("civil-servant-survey")]),t._v(", contain duplicate rows, which would normally fail goodtables validation. Here we will allow the "),s("code",[t._v("duplicate-row")]),t._v(" check to be skipped.")]),t._v(" "),s("p",[t._v("The source in the second entry has the wrong character encoding and no headers declared. We can fix these issues to allow the pipeline to process the resource by explicitly specifying the file encoding, and declaring the column headers.")]),t._v(" "),s("h4",{attrs:{id:"data-set-and-resource-harvest"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#data-set-and-resource-harvest"}},[t._v("#")]),t._v(" Data Set and Resource Harvest")]),t._v(" "),s("p",[t._v("We identified that Reshare has an OAI-PMH2 compatible endpoint to harvest information about each Reshare data set. So we created an "),s("code",[t._v("ukds.add_oai_metadata")]),t._v(" pipeline processor. OAI metadata is compatible with Dublin Core Elements, and the processor translates this into Data Package compatible properties.")]),t._v(" "),s("p",[t._v("Resources are added to the newly created Data Package and downloaded from the URLs defined in the yaml configuration. 
We support adding SPSS (.sav), CSV, TSV and XLS file formats.")]),t._v(" "),s("p",[t._v("To support the widely used SPSS format, we created an "),s("code",[t._v("spss.add_spss")]),t._v(" processor that makes use of the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-spss-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("tableschema-spss"),s("OutboundLink")],1),t._v(" plugin to read SPSS files and create tableschema descriptors from them.")]),t._v(" "),s("h4",{attrs:{id:"validation-reports-and-common-issues"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#validation-reports-and-common-issues"}},[t._v("#")]),t._v(" Validation Reports and Common Issues")]),t._v(" "),s("p",[t._v("To help ensure data quality, we want to validate the harvested tabular data before continuing the pipeline. We created a "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines-goodtables",target:"_blank",rel:"noopener noreferrer"}},[s("code",[t._v("goodtables.validate")]),t._v(" processor"),s("OutboundLink")],1),t._v(", which will write a validation report for each resource. If a resource fails to validate against its schema, or has other data issues, the pipeline will fail. 
Errors can be identified from validation reports, fixed, and the pipeline re-run.")]),t._v(" "),s("p",[t._v("Below are examples of issues revealed by validation that can occur when working with real-world data.")]),t._v(" "),s("h5",{attrs:{id:"au-pairing-after-the-au-pair-scheme-specifying-a-xls-sheet-and-working-around-non-data-rows"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#au-pairing-after-the-au-pair-scheme-specifying-a-xls-sheet-and-working-around-non-data-rows"}},[t._v("#")]),t._v(" “Au pairing after the au pair scheme”: specifying a xls sheet, and working around non-data rows")]),t._v(" "),s("p",[t._v("The "),s("a",{attrs:{href:"http://reshare.ukdataservice.ac.uk/851656/",target:"_blank",rel:"noopener noreferrer"}},[t._v("“Au Pairing” dataset"),s("OutboundLink")],1),t._v(" has a single .xls resource we’re interested in harvesting. The file contains four sheets, and we’re interested in the second one, which contains the data. So we specify our entry:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("au-pairing")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851656/6/GumtreeAds_AuPairsAnalysis1.xls\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" xls\n "),s("span",{pre:!0,attrs:{class:"token key 
atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("sheet")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# use sheet 2 in the file")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851656")]),t._v("\n")])])]),s("p",[t._v("Notice we have indicated which sheet in the file to use.")]),t._v(" "),s("p",[t._v("The data sheet has a single header row, but it also has this header row repeated at intervals throughout the data, presumably to aid the human reader when reviewing the data manually.")]),t._v(" "),s("p",[s("img",{attrs:{src:a(440),alt:""}}),s("br"),t._v(" "),s("em",[t._v("screengrab of the UKDS “Au Pairing” datasheet")])]),t._v(" "),s("p",[t._v("For machine processing, this isn’t ideal. 
In fact, it will fail our goodtables validation processor with the following (truncated) report:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"time"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("0.466")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"valid"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean important"}},[t._v("false")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"error-count"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("13")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"table-count"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"tables"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("...")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"errors"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" 
"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"code"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"duplicate-row"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"message"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Row 347 is duplicated to row(s) 236"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"row-number"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("347")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"column-number"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token null important"}},[t._v("null")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"row"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"happy/energetic/caring/loving outlook required"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"CV requested"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"gender 
specified"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"cooking"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("...")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("...")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("...")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("You can find the full report "),s("a",{attrs:{href:"https://gist.github.com/brew/8401e2875ec6d829baf95b79cd677e28",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("The report tells us there are 13 errors, and lists where they are. In this case they indicate that duplicate rows are present (the repeated header). 
This can either be fixed within Reshare, or we can add a parameter to our entry specification to skip each row that contains the duplicate header:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("au-pairing")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851656/6/GumtreeAds_AuPairsAnalysis1.xls\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" xls\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("sheet")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("skip_rows")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("237")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("292")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token 
number"}},[t._v("348")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("402")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("458")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("511")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("564")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("618")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("673")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("726")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("779")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("832")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("886")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("937")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("990")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("goodtables")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key 
atrule"}},[t._v("skip_checks")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" duplicate"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("row\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851656")]),t._v("\n")])])]),s("p",[t._v("Above we’ve added a "),s("code",[t._v("skip_rows")]),t._v(" parameter with a list of row numbers to skip when generating the data package. We also instruct goodtables to skip the "),s("code",[t._v("duplicate-row")]),t._v(" check."),s("br"),t._v("\nThe outputted csv resource file will no longer contain rows with the duplicate header.")]),t._v(" "),s("h5",{attrs:{id:"uk-government-petitions-wrong-file-encoding-and-specifying-missing-headers"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#uk-government-petitions-wrong-file-encoding-and-specifying-missing-headers"}},[t._v("#")]),t._v(" “UK government petitions”: wrong file encoding, and specifying missing headers")]),t._v(" "),s("p",[t._v("The “"),s("a",{attrs:{href:"http://gov.uk/",target:"_blank",rel:"noopener noreferrer"}},[t._v("gov.uk"),s("OutboundLink")],1),t._v(" petitions” dataset has a TSV data file we’re interested in. 
However, it has been saved with the wrong character encoding and attempting to open may return an error, or display some characters incorrectly.")]),t._v(" "),s("p",[t._v("Additionally, there is no header row specified at the top of the file, so the resulting data package won’t have the correct header information in the resource’s schema.")]),t._v(" "),s("p",[t._v("We can fix both of these issues in our entry specification:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("uk-gov-petitions")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//reshare.ukdataservice.ac.uk/851614/1/gov_pet_metadata.tab\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" tsv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("encoding")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" utf"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("8")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# specify file encoding")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("headers")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# define missing headers")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" id\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" title\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" department\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" starting\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" closing\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("oai-id")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("851614")]),t._v("\n")])])]),s("p",[t._v("Above, we have defined the character encoding we want to use when opening the file, and we’ve explicitly defined the headers to use. These headers will be added to the first row of the outputted csv resource file in the data package.")]),t._v(" "),s("p",[t._v("We can also use the "),s("code",[t._v("headers")]),t._v(" parameter to define which row contains header information. By default this is the first row. However, sometimes a data file will have the headers on a different row:")]),t._v(" "),s("p",[s("img",{attrs:{src:a(441),alt:""}}),s("br"),t._v(" "),s("em",[t._v("screengrab of the UKDS “Government Petitions” datasheet")])]),t._v(" "),s("p",[t._v("This example file has its headers defined in row three; the first two rows hold other information and an empty row. 
We can tell our pipeline which row contains headers by specifying it in the entry configuration:")]),t._v(" "),s("div",{staticClass:"language-yaml extra-class"},[s("pre",{pre:!0,attrs:{class:"language-yaml"}},[s("code",[s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("example-entry")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("source")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" http"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("//www.newcastle.gov.uk/sites/drupalncc.newcastle.gov.uk/files/wwwfileroot/your"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("council/local_transparency/january_2012.csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" csv\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("tabulator")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("headers")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# specifying which row contains headers")]),t._v("\n")])])]),s("h4",{attrs:{id:"add-data-package-views"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#add-data-package-views"}},[t._v("#")]),t._v(" Add Data Package Views")]),t._v(" "),s("p",[t._v("View specs can be added to the data package to enable "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" to create visualisations from resource data in the data package. The "),s("code",[t._v("views")]),t._v(" property is a list of file paths to json files containing "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/views/",target:"_blank",rel:"noopener noreferrer"}},[t._v("view-spec"),s("OutboundLink")],1),t._v(" compatible views.")]),t._v(" "),s("p",[t._v("Currently, "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" supports views written either with a ‘simple’ views-spec, or using Vega (v 2.6.5). See "),s("a",{attrs:{href:"https://datahub.io/docs/features/views",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io docs"),s("OutboundLink")],1),t._v(" for more details about the supported views-spec.")]),t._v(" "),s("h4",{attrs:{id:"push-to-datahub-io"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#push-to-datahub-io"}},[t._v("#")]),t._v(" Push to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1)]),t._v(" "),s("p",[t._v("Once the harvesting pipeline has been run the resulting data packages are pushed to "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" using the "),s("a",{attrs:{href:"https://github.com/datahq/datahub-cli",target:"_blank",rel:"noopener noreferrer"}},[s("code",[t._v("datahub.dump.to_datahub")]),s("OutboundLink")],1),t._v(" processor.")]),t._v(" "),s("p",[t._v("This creates or updates an entry for the package on datahub. 
If a view has been defined in the entry configuration, this will be created on the "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" entry Showcase page.")]),t._v(" "),s("h3",{attrs:{id:"review"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#review"}},[t._v("#")]),t._v(" Review")]),t._v(" "),s("p",[t._v("We were able to demonstrate that a data processing pipeline using Frictionless Data tools can facilitate the automated harvesting, validation, transformation, and upload to a data package-compatible third-party service, based on a simple configuration.")]),t._v(" "),s("h3",{attrs:{id:"next-steps"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#next-steps"}},[t._v("#")]),t._v(" Next Steps")]),t._v(" "),s("p",[t._v("The pilot data package pipeline runs locally in a development environment, but given each processor has been written as a separate module, these could be used within any pipeline. "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" uses datapackage-pipelines within its infrastructure, and the processors developed for this project could be used within "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" itself to facilitate the automatic harvesting of datasets from OAI-PMH enabled data sources.")]),t._v(" "),s("p",[t._v("Once a pipeline is in place, it can be scheduled to run each day (or week, month, etc.). 
This would ensure "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(" is up-to-date with data on UKDS Reshare.")]),t._v(" "),s("p",[t._v("Working with ‘real-world’ data from UKDS Reshare has helped to identify and prioritise improvements and future features for "),s("a",{attrs:{href:"http://datahub.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("h3",{attrs:{id:"additional-resources"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#additional-resources"}},[t._v("#")]),t._v(" Additional Resources")]),t._v(" "),s("ul",[s("li",[s("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-ukds",target:"_blank",rel:"noopener noreferrer"}},[t._v("The main code repository for this pilot"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("li",[s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[t._v("A framework for processing data packages in pipelines of modular components"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("li",[s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines-spss",target:"_blank",rel:"noopener noreferrer"}},[t._v("A Data Package Pipelines processor for SPSS file formats"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("li",[s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines-goodtables",target:"_blank",rel:"noopener noreferrer"}},[t._v("A Data Package Pipelines processor for validating tabular data using goodtables-py"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("li",[s("a",{attrs:{href:"https://github.com/datahq/datapackage-pipelines-datahub",target:"_blank",rel:"noopener noreferrer"}},[t._v("A Data Package Pipelines processor to push data packages to datahub.io"),s("OutboundLink")],1),t._v(".")])])])}),[],!1,null,null,null);e.default=n.exports}}]); \ No newline at end of file 
diff --git a/assets/js/20.0b4e11bf.js b/assets/js/20.a5fbf886.js similarity index 97% rename from assets/js/20.0b4e11bf.js rename to assets/js/20.a5fbf886.js index ec292edd8..461fa31aa 100644 --- a/assets/js/20.0b4e11bf.js +++ b/assets/js/20.a5fbf886.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[20],{438:function(e,t,a){e.exports=a.p+"assets/img/ckanext-validation.9e351f1c.png"},439:function(e,t,a){e.exports=a.p+"assets/img/data-validity-badges.41769a55.png"},440:function(e,t,a){e.exports=a.p+"assets/img/data-validation-on-upload.d1c30a10.png"},584:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h2",{attrs:{id:"context"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[e._v("#")]),e._v(" Context")]),e._v(" "),r("p",[e._v("One of the main goals of the Frictionless Data project is to help improve data quality by providing easy to integrate libraries and services for data validation. We have integrated data validation seamlessly with different backends like GitHub and Amazon S3 via the online service "),r("a",{attrs:{href:"https://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),r("OutboundLink")],1),e._v(", but we also wanted to explore closer integrations with other platforms.")]),e._v(" "),r("p",[e._v("An obvious choice for that are Open Data portals. They are still one of the main forms of dissemination of Open Data, especially for governments and other organizations. They provide a single entry point to data relating to a particular region or thematic area and provide users with tools to discover and access different datasets. 
On the backend, publishers also have tools available for the validation and publication of datasets.")]),e._v(" "),r("p",[e._v("Data Quality varies widely across different portals, reflecting the publication processes and requirements of the hosting organizations. In general, it is difficult for users to assess the quality of the data and there is a lack of descriptors for the actual data fields. At the publisher level, while strong emphasis has been put in metadata standards and interoperability, publishers don’t generally have the same help or guidance when dealing with data quality or description.")]),e._v(" "),r("p",[e._v("We believe that data quality in Open Data portals can have a central place on both these fronts, user-centric and publisher-centric, and we started this pilot to showcase a possible implementation.")]),e._v(" "),r("p",[e._v("To field test our implementation we chose the "),r("a",{attrs:{href:"https://www.wprdc.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Western Pennsylvania Regional Data Center"),r("OutboundLink")],1),e._v(" (WPRDC), managed by the "),r("a",{attrs:{href:"http://ucsur.pitt.edu/",target:"_blank",rel:"noopener noreferrer"}},[e._v("University of Pittsburgh Center for Urban and Social Research"),r("OutboundLink")],1),e._v(". The Regional Data Center made for a good pilot as the project team takes an agile approach to managing their own CKAN instance along with support from OpenGov, members of the CKAN association. As the open data repository is used by a diverse array of data publishers (including project partners Allegheny County and the City of Pittsburgh), the Regional Data Center provides a good test case for testing the implementation across a variety of data types and publishing processes. WPRDC is a great example of a well managed Open Data portal, where datasets are actively maintained and the portal itself is just one component of a wider Open Data strategy. 
It also provides a good variety of publishers, including public sector agencies, academic institutions, and nonprofit organizations. The project’s partnership with the Digital Scholarship Services team at the University Library System also provides data management expertise not typically available in many open data implementations.")]),e._v(" "),r("h2",{attrs:{id:"the-work"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[e._v("#")]),e._v(" The Work")]),e._v(" "),r("h3",{attrs:{id:"what-did-we-do"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-did-we-do"}},[e._v("#")]),e._v(" What Did We Do")]),e._v(" "),r("p",[e._v("The portal software that we chose for this pilot is "),r("a",{attrs:{href:"https://ckan.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),r("OutboundLink")],1),e._v(", the world’s leading open source software for Open Data portals ("),r("a",{attrs:{href:"https://github.com/jalbertbowden/open-library/blob/master/lib/d2.1-state-of-the-art-report-and-evaluation-of-existing-open-data-platforms-2015-01-06-route-to-pa.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("source"),r("OutboundLink")],1),e._v("). Open Knowledge International initially fostered the CKAN project and is now a member of the "),r("a",{attrs:{href:"https://ckan.org/about/association/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN Association"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("We created "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation"),r("OutboundLink")],1),e._v(", a CKAN extension that provides a low level API and readily available features for data validation and reporting that can be added to any CKAN instance. 
This is powered by "),r("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables"),r("OutboundLink")],1),e._v(", a library developed by Open Knowledge International to support the validation of tabular datasets.")]),e._v(" "),r("p",[e._v("The extension allows users to perform data validation against any tabular resource, such as CSV or Excel files. This generates a report that is stored against a particular resource, describing issues found with the data, both at the structural level (missing headers, blank rows, etc) and at the data schema level (wrong data types, values out of range etc).")]),e._v(" "),r("p",[r("img",{attrs:{src:a(438),alt:""}}),r("br"),e._v(" "),r("em",[e._v("data validation on CKAN made possible by ckanext-validation extension")])]),e._v(" "),r("p",[e._v("This provides a good overview of the quality of the data to users but also to publishers so they can improve the quality of the data file by addressing these issues. The reports can be easily accessed via badges that provide a quick visual indication of the quality of the data file.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(439),alt:""}}),r("br"),e._v(" "),r("em",[e._v("badges indicating quality of data files on CKAN")])]),e._v(" "),r("p",[e._v("There are two default modes for performing the data validation when creating or updating resources. Data validation can be automatically performed in the background asynchronously or as part of the dataset creation in the user interface. 
In this case the validation will be performed immediately after uploading or linking to a new tabular file, giving quick feedback to publishers.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(440),alt:""}}),r("br"),e._v(" "),r("em",[e._v("data validation on upload or linking to a new tabular file on CKAN")])]),e._v(" "),r("p",[e._v("The extension adds functionality to provide a "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema"),r("OutboundLink")],1),e._v(" for the data that describes the expected fields and types as well as other constraints, allowing to perform validation on the actual contents of the data. Additionally the schema is also stored with the resource metadata, so it can be displayed in the UI or accessed via the API.")]),e._v(" "),r("p",[e._v("The extension also provides some utility commands for CKAN maintainers, including the generation of "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation#data-validation-reports",target:"_blank",rel:"noopener noreferrer"}},[e._v("reports"),r("OutboundLink")],1),e._v(" showing the number of valid and invalid tabular files, a breakdown of the error types and links to the individual resources. This gives maintainers a snapshot of the general quality of the data hosted in their CKAN instance at any given moment in time.")]),e._v(" "),r("p",[e._v("As mentioned before, we field tested the validation extension on the Western Pennsylvania Regional Data Center (WPRDC). At the moment of the import the portal hosted 258 datasets. Out of these, 221 datasets had tabular resources, totalling 626 files (mainly CSV and XLSX files). 
Taking into account that we only performed the default validation that only includes structural checks (ie not schema-based ones) these are the results:")]),e._v(" "),r("blockquote",[r("p",[e._v("466 resources - validation success")])]),e._v(" "),r("blockquote",[r("p",[e._v("156 resources - validation failure")])]),e._v(" "),r("blockquote",[r("p",[e._v("4 resources - validation error")])]),e._v(" "),r("p",[e._v("The errors found are due to current limitations in the validation extension with large files.")]),e._v(" "),r("p",[e._v("Here’s a breakdown of the formats:")]),e._v(" "),r("table",[r("thead",[r("tr",[r("th",{staticStyle:{"text-align":"center"}}),e._v(" "),r("th",{staticStyle:{"text-align":"center"}},[e._v("Valid resources")]),e._v(" "),r("th",{staticStyle:{"text-align":"center"}},[e._v("Invalid / Errored resources")])])]),e._v(" "),r("tbody",[r("tr",[r("td",{staticStyle:{"text-align":"center"}},[e._v("CSV")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("443")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("64")])]),e._v(" "),r("tr",[r("td",{staticStyle:{"text-align":"center"}},[e._v("XLSX")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("21")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("57")])]),e._v(" "),r("tr",[r("td",{staticStyle:{"text-align":"center"}},[e._v("XLS")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("2")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("39")])])])]),e._v(" "),r("p",[e._v("And of the error types (more information about each error type can be found in the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/data-quality-spec/blob/master/spec.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Quality Specification"),r("OutboundLink")],1),e._v("):")]),e._v(" "),r("table",[r("thead",[r("tr",[r("th",[e._v("Type of Error")]),e._v(" "),r("th",[e._v("Error Count")])])]),e._v(" "),r("tbody",[r("tr",[r("td",[e._v("Blank 
row")]),e._v(" "),r("td",[e._v("19654")])]),e._v(" "),r("tr",[r("td",[e._v("Duplicate row")]),e._v(" "),r("td",[e._v("810")])]),e._v(" "),r("tr",[r("td",[e._v("Blank header")]),e._v(" "),r("td",[e._v("299")])]),e._v(" "),r("tr",[r("td",[e._v("Duplicate header")]),e._v(" "),r("td",[e._v("270")])]),e._v(" "),r("tr",[r("td",[e._v("Source error")]),e._v(" "),r("td",[e._v("30")])]),e._v(" "),r("tr",[r("td",[e._v("Extra value")]),e._v(" "),r("td",[e._v("11")])]),e._v(" "),r("tr",[r("td",[e._v("Format error")]),e._v(" "),r("td",[e._v("9")])]),e._v(" "),r("tr",[r("td",[e._v("HTTP error")]),e._v(" "),r("td",[e._v("2")])]),e._v(" "),r("tr",[r("td",[e._v("Missing value")]),e._v(" "),r("td",[e._v("1")])])])]),e._v(" "),r("p",[e._v("The highest number of errors are obviously caused by blank and duplicate rows. These are generally caused by Excel adding extra rows at the end of the file or by publishers formatting the files for human rather than machine consumption. Examples of this include adding a title in the first cell (like in this case: "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/046e5b6a-0f90-4f8e-8c16-14057fd8872e/resource/b4aa617d-1cb8-42d0-8eb6-b650097cf2bf",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/046e5b6a-0f90-4f8e-8c16-14057fd8872e/resource/b4aa617d-1cb8-42d0-8eb6-b650097cf2bf/download/30-day-blotter-data-dictionary.xlsx",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v(") or even more complex layouts ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/9c4eab3b-e05d-4af8-ad18-76e4c1a71a74/resource/21a032e9-6345-42b3-b61e-10de29280946",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | 
"),r("a",{attrs:{href:"https://data.wprdc.org/dataset/9c4eab3b-e05d-4af8-ad18-76e4c1a71a74/resource/21a032e9-6345-42b3-b61e-10de29280946/download/permitsummaryissuedmarch2015.xlsx",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v("), with logos and links. Blank and duplicate header errors like on this case ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/543ae03d-3ef4-45c7-b766-2ed49338120f/resource/f587d617-7afa-4e79-8010-c0d2bdff4c04",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/543ae03d-3ef4-45c7-b766-2ed49338120f/resource/f587d617-7afa-4e79-8010-c0d2bdff4c04/download/opendata-citiparks---summer-meal-sites-2015.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v(") are also normally caused by Excel storing extra empty columns (and something that can not be noticed directly from Excel).")]),e._v(" "),r("p",[e._v("These errors are easy to spot and fix manually once the file has been opened for inspection but this is still an extra step that data consumers need to perform before using the data on their own processes. It is also true that they are errors that could be easily fixed automatically as part of a pre-process of data cleanup before publication. 
Perhaps this is something that could be developed in the validation extension in the future.")]),e._v(" "),r("p",[e._v("Other less common errors include Source errors, which include errors that prevented the file from being read by goodtables, like encoding issues or HTTP responses or HTML files incorrectly being marked as Excel files (like in this case: "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/9c4eab3b-e05d-4af8-ad18-76e4c1a71a74/resource/9ea45609-e3b0-445a-8ace-0addb973fdf5",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/9c4eab3b-e05d-4af8-ad18-76e4c1a71a74/resource/9ea45609-e3b0-445a-8ace-0addb973fdf5/download/plipublicwebsitemonthlysummaryaugust2017.xls",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v("). Extra value errors are generally caused by not properly quoting fields that contain commas, thus breaking the parser (example: "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/3130f583-9499-472b-bb5a-f63a6ff6059a/resource/12d9e6e1-3657-4cad-a430-119d34b1a5b2",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/3130f583-9499-472b-bb5a-f63a6ff6059a/resource/12d9e6e1-3657-4cad-a430-119d34b1a5b2/download/crashdatadictionary.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v(").")]),e._v(" "),r("p",[e._v("Format errors are caused by labelling incorrectly the format of the hosted file, for instance CSV when it links to an Excel file ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/669b2409-bb4b-46e5-9d91-c36876b58a17/resource/e919ecd3-bb11-4883-a041-bded25dc651c",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | 
"),r("a",{attrs:{href:"https://data.wprdc.org/dataset/669b2409-bb4b-46e5-9d91-c36876b58a17/resource/e919ecd3-bb11-4883-a041-bded25dc651c/download/2016-cveu-inspections.xlsx",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v("), CSV linking to HTML ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/libraries/resource/14babf3f-4932-4828-8b49-3c9a03bae6d0",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://wprdc-maps.carto.com/u/wprdc/builder/1142950f-f054-4b3f-8c52-2f020e23cf78/embed",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v(") or XLS linking to XLSX ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/40188e1c-6d2e-4f20-9391-607bd3054949/resource/cf0617a1-b950-4aa7-a36d-dc9da412ddf7",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/40188e1c-6d2e-4f20-9391-607bd3054949/resource/cf0617a1-b950-4aa7-a36d-dc9da412ddf7/download/transportation.xls",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v("). These are all easily fixed at the metadata level.")]),e._v(" "),r("p",[e._v("Finally HTTP errors just show that the linked file hosted elsewhere does not exist or has been moved.")]),e._v(" "),r("p",[e._v("Again, it is important to stress that the checks performed are just "),r("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py#validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("basic and structural checks"),r("OutboundLink")],1),e._v(" that affect the general availability of the file and its general structure. 
The addition of standardized schemas would allow for a more thorough and precise validation, checking the data contents and ensuring that this is what was expected.")]),e._v(" "),r("p",[e._v("Also it is interesting to note that WPRDC has the excellent good practice of publishing data dictionaries describing the contents of the data files. These are generally published in CSV format and they themselves can present validation errors as well. As we saw before, using the validation extension we can assign a schema defined in the Table Schema spec to a resource. This will be used during the validation, but the information could also be used to render it nicely on the UI or export it consistently as a CSV or PDF file.")]),e._v(" "),r("p",[e._v("All the generated reports can be further analyzed using the output files stored "),r("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-wprdc",target:"_blank",rel:"noopener noreferrer"}},[e._v("in this repository"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Additionally, to help browse the validation reports created from the WPRDC site we have set up a demo site that mirrors the datasets, organizations and groups hosted there (at the time we did the import).")]),e._v(" "),r("p",[e._v("All tabular resources have the validation report attached, that can be accessed clicking on the data valid / invalid badges.")]),e._v(" "),r("h2",{attrs:{id:"next-steps"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#next-steps"}},[e._v("#")]),e._v(" Next Steps")]),e._v(" "),r("h3",{attrs:{id:"areas-for-further-work"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#areas-for-further-work"}},[e._v("#")]),e._v(" Areas for further work")]),e._v(" "),r("p",[e._v("The validation extension for CKAN currently provides a very basic workflow for validation at creation and update time: basically if the validation fails in any way you are not allowed to create or edit the dataset. 
Maintainers can define a set of default validation options to make it more permissive but even so some publishers probably wouldn’t want to enforce all validation checks before allowing the creation of a dataset, or just apply validation to datasets from a particular organization or type. Of course the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation#action-functions",target:"_blank",rel:"noopener noreferrer"}},[e._v("underlying API"),r("OutboundLink")],1),e._v(" is available for extension developers to implement these workflows, but the validation extension itself could provide some of them.")]),e._v(" "),r("p",[e._v("The user interface for defining the validation options can definitely be improved, and we are planning to integrate a "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation/issues/10",target:"_blank",rel:"noopener noreferrer"}},[e._v("Schema Creator"),r("OutboundLink")],1),e._v(" to make easier for publishers to describe their data with a schema based on the actual fields on the file. 
If the resource has a schema assigned, this information can be presented nicely on the UI to the users and exported in different formats.")]),e._v(" "),r("p",[e._v("The validation extension is a first iteration to demonstrate the capabilities of integrating data validation directly into CKAN, but we are keen to know about different ways in which this could be expanded or integrated in other workflows, so any feedback or thoughts is appreciated.")]),e._v(" "),r("h3",{attrs:{id:"additional-resources"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#additional-resources"}},[e._v("#")]),e._v(" Additional Resources")]),e._v(" "),r("ul",[r("li",[r("p",[e._v("Check the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation/blob/master/README.md#how-it-works",target:"_blank",rel:"noopener noreferrer"}},[e._v("full documentation"),r("OutboundLink")],1),e._v(" for ckanext-validation, covering all details on how to install it and configure it, features and available API")])]),e._v(" "),r("li",[r("p",[e._v("Source material:")])]),e._v(" "),r("li",[r("p",[r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation codebase"),r("OutboundLink")],1)])]),e._v(" "),r("li",[r("p",[r("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-wprdc",target:"_blank",rel:"noopener noreferrer"}},[e._v("Western Pennsylvania Regional Data Center Github repository"),r("OutboundLink")],1)])])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[20],{435:function(e,t,a){e.exports=a.p+"assets/img/ckanext-validation.9e351f1c.png"},436:function(e,t,a){e.exports=a.p+"assets/img/data-validity-badges.41769a55.png"},437:function(e,t,a){e.exports=a.p+"assets/img/data-validation-on-upload.d1c30a10.png"},583:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var 
e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h2",{attrs:{id:"context"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[e._v("#")]),e._v(" Context")]),e._v(" "),r("p",[e._v("One of the main goals of the Frictionless Data project is to help improve data quality by providing easy to integrate libraries and services for data validation. We have integrated data validation seamlessly with different backends like GitHub and Amazon S3 via the online service "),r("a",{attrs:{href:"https://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),r("OutboundLink")],1),e._v(", but we also wanted to explore closer integrations with other platforms.")]),e._v(" "),r("p",[e._v("An obvious choice for that are Open Data portals. They are still one of the main forms of dissemination of Open Data, especially for governments and other organizations. They provide a single entry point to data relating to a particular region or thematic area and provide users with tools to discover and access different datasets. On the backend, publishers also have tools available for the validation and publication of datasets.")]),e._v(" "),r("p",[e._v("Data Quality varies widely across different portals, reflecting the publication processes and requirements of the hosting organizations. In general, it is difficult for users to assess the quality of the data and there is a lack of descriptors for the actual data fields. 
At the publisher level, while strong emphasis has been put in metadata standards and interoperability, publishers don’t generally have the same help or guidance when dealing with data quality or description.")]),e._v(" "),r("p",[e._v("We believe that data quality in Open Data portals can have a central place on both these fronts, user-centric and publisher-centric, and we started this pilot to showcase a possible implementation.")]),e._v(" "),r("p",[e._v("To field test our implementation we chose the "),r("a",{attrs:{href:"https://www.wprdc.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Western Pennsylvania Regional Data Center"),r("OutboundLink")],1),e._v(" (WPRDC), managed by the "),r("a",{attrs:{href:"http://ucsur.pitt.edu/",target:"_blank",rel:"noopener noreferrer"}},[e._v("University of Pittsburgh Center for Urban and Social Research"),r("OutboundLink")],1),e._v(". The Regional Data Center made for a good pilot as the project team takes an agile approach to managing their own CKAN instance along with support from OpenGov, members of the CKAN association. As the open data repository is used by a diverse array of data publishers (including project partners Allegheny County and the City of Pittsburgh), the Regional Data Center provides a good test case for testing the implementation across a variety of data types and publishing processes. WPRDC is a great example of a well managed Open Data portal, where datasets are actively maintained and the portal itself is just one component of a wider Open Data strategy. It also provides a good variety of publishers, including public sector agencies, academic institutions, and nonprofit organizations. 
The project’s partnership with the Digital Scholarship Services team at the University Library System also provides data management expertise not typically available in many open data implementations.")]),e._v(" "),r("h2",{attrs:{id:"the-work"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[e._v("#")]),e._v(" The Work")]),e._v(" "),r("h3",{attrs:{id:"what-did-we-do"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-did-we-do"}},[e._v("#")]),e._v(" What Did We Do")]),e._v(" "),r("p",[e._v("The portal software that we chose for this pilot is "),r("a",{attrs:{href:"https://ckan.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),r("OutboundLink")],1),e._v(", the world’s leading open source software for Open Data portals ("),r("a",{attrs:{href:"https://github.com/jalbertbowden/open-library/blob/master/lib/d2.1-state-of-the-art-report-and-evaluation-of-existing-open-data-platforms-2015-01-06-route-to-pa.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("source"),r("OutboundLink")],1),e._v("). Open Knowledge International initially fostered the CKAN project and is now a member of the "),r("a",{attrs:{href:"https://ckan.org/about/association/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN Association"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("We created "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation"),r("OutboundLink")],1),e._v(", a CKAN extension that provides a low level API and readily available features for data validation and reporting that can be added to any CKAN instance. 
This is powered by "),r("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables"),r("OutboundLink")],1),e._v(", a library developed by Open Knowledge International to support the validation of tabular datasets.")]),e._v(" "),r("p",[e._v("The extension allows users to perform data validation against any tabular resource, such as CSV or Excel files. This generates a report that is stored against a particular resource, describing issues found with the data, both at the structural level (missing headers, blank rows, etc) and at the data schema level (wrong data types, values out of range etc).")]),e._v(" "),r("p",[r("img",{attrs:{src:a(435),alt:""}}),r("br"),e._v(" "),r("em",[e._v("data validation on CKAN made possible by ckanext-validation extension")])]),e._v(" "),r("p",[e._v("This provides a good overview of the quality of the data to users but also to publishers so they can improve the quality of the data file by addressing these issues. The reports can be easily accessed via badges that provide a quick visual indication of the quality of the data file.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(436),alt:""}}),r("br"),e._v(" "),r("em",[e._v("badges indicating quality of data files on CKAN")])]),e._v(" "),r("p",[e._v("There are two default modes for performing the data validation when creating or updating resources. Data validation can be automatically performed in the background asynchronously or as part of the dataset creation in the user interface. 
In this case the validation will be performed immediately after uploading or linking to a new tabular file, giving quick feedback to publishers.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(437),alt:""}}),r("br"),e._v(" "),r("em",[e._v("data validation on upload or linking to a new tabular file on CKAN")])]),e._v(" "),r("p",[e._v("The extension adds functionality to provide a "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema"),r("OutboundLink")],1),e._v(" for the data that describes the expected fields and types as well as other constraints, allowing to perform validation on the actual contents of the data. Additionally the schema is also stored with the resource metadata, so it can be displayed in the UI or accessed via the API.")]),e._v(" "),r("p",[e._v("The extension also provides some utility commands for CKAN maintainers, including the generation of "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation#data-validation-reports",target:"_blank",rel:"noopener noreferrer"}},[e._v("reports"),r("OutboundLink")],1),e._v(" showing the number of valid and invalid tabular files, a breakdown of the error types and links to the individual resources. This gives maintainers a snapshot of the general quality of the data hosted in their CKAN instance at any given moment in time.")]),e._v(" "),r("p",[e._v("As mentioned before, we field tested the validation extension on the Western Pennsylvania Regional Data Center (WPRDC). At the moment of the import the portal hosted 258 datasets. Out of these, 221 datasets had tabular resources, totalling 626 files (mainly CSV and XLSX files). 
Taking into account that we only performed the default validation that only includes structural checks (ie not schema-based ones) these are the results:")]),e._v(" "),r("blockquote",[r("p",[e._v("466 resources - validation success")])]),e._v(" "),r("blockquote",[r("p",[e._v("156 resources - validation failure")])]),e._v(" "),r("blockquote",[r("p",[e._v("4 resources - validation error")])]),e._v(" "),r("p",[e._v("The errors found are due to current limitations in the validation extension with large files.")]),e._v(" "),r("p",[e._v("Here’s a breakdown of the formats:")]),e._v(" "),r("table",[r("thead",[r("tr",[r("th",{staticStyle:{"text-align":"center"}}),e._v(" "),r("th",{staticStyle:{"text-align":"center"}},[e._v("Valid resources")]),e._v(" "),r("th",{staticStyle:{"text-align":"center"}},[e._v("Invalid / Errored resources")])])]),e._v(" "),r("tbody",[r("tr",[r("td",{staticStyle:{"text-align":"center"}},[e._v("CSV")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("443")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("64")])]),e._v(" "),r("tr",[r("td",{staticStyle:{"text-align":"center"}},[e._v("XLSX")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("21")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("57")])]),e._v(" "),r("tr",[r("td",{staticStyle:{"text-align":"center"}},[e._v("XLS")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("2")]),e._v(" "),r("td",{staticStyle:{"text-align":"center"}},[e._v("39")])])])]),e._v(" "),r("p",[e._v("And of the error types (more information about each error type can be found in the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/data-quality-spec/blob/master/spec.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Quality Specification"),r("OutboundLink")],1),e._v("):")]),e._v(" "),r("table",[r("thead",[r("tr",[r("th",[e._v("Type of Error")]),e._v(" "),r("th",[e._v("Error Count")])])]),e._v(" "),r("tbody",[r("tr",[r("td",[e._v("Blank 
row")]),e._v(" "),r("td",[e._v("19654")])]),e._v(" "),r("tr",[r("td",[e._v("Duplicate row")]),e._v(" "),r("td",[e._v("810")])]),e._v(" "),r("tr",[r("td",[e._v("Blank header")]),e._v(" "),r("td",[e._v("299")])]),e._v(" "),r("tr",[r("td",[e._v("Duplicate header")]),e._v(" "),r("td",[e._v("270")])]),e._v(" "),r("tr",[r("td",[e._v("Source error")]),e._v(" "),r("td",[e._v("30")])]),e._v(" "),r("tr",[r("td",[e._v("Extra value")]),e._v(" "),r("td",[e._v("11")])]),e._v(" "),r("tr",[r("td",[e._v("Format error")]),e._v(" "),r("td",[e._v("9")])]),e._v(" "),r("tr",[r("td",[e._v("HTTP error")]),e._v(" "),r("td",[e._v("2")])]),e._v(" "),r("tr",[r("td",[e._v("Missing value")]),e._v(" "),r("td",[e._v("1")])])])]),e._v(" "),r("p",[e._v("By far the largest numbers of errors are caused by blank and duplicate rows. These generally result from Excel adding extra rows at the end of the file, or from publishers formatting the files for human rather than machine consumption. Examples include adding a title in the first cell (as in this case: "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/046e5b6a-0f90-4f8e-8c16-14057fd8872e/resource/b4aa617d-1cb8-42d0-8eb6-b650097cf2bf",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/046e5b6a-0f90-4f8e-8c16-14057fd8872e/resource/b4aa617d-1cb8-42d0-8eb6-b650097cf2bf/download/30-day-blotter-data-dictionary.xlsx",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v(") or even more complex layouts ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/9c4eab3b-e05d-4af8-ad18-76e4c1a71a74/resource/21a032e9-6345-42b3-b61e-10de29280946",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | 
"),r("a",{attrs:{href:"https://data.wprdc.org/dataset/9c4eab3b-e05d-4af8-ad18-76e4c1a71a74/resource/21a032e9-6345-42b3-b61e-10de29280946/download/permitsummaryissuedmarch2015.xlsx",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v("), with logos and links. Blank and duplicate header errors like on this case ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/543ae03d-3ef4-45c7-b766-2ed49338120f/resource/f587d617-7afa-4e79-8010-c0d2bdff4c04",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/543ae03d-3ef4-45c7-b766-2ed49338120f/resource/f587d617-7afa-4e79-8010-c0d2bdff4c04/download/opendata-citiparks---summer-meal-sites-2015.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v(") are also normally caused by Excel storing extra empty columns (and something that can not be noticed directly from Excel).")]),e._v(" "),r("p",[e._v("These errors are easy to spot and fix manually once the file has been opened for inspection but this is still an extra step that data consumers need to perform before using the data on their own processes. It is also true that they are errors that could be easily fixed automatically as part of a pre-process of data cleanup before publication. 
Perhaps this is something that could be developed in the validation extension in the future.")]),e._v(" "),r("p",[e._v("Other, less common errors include Source errors, which cover anything that prevented the file from being read by goodtables, such as encoding issues, or HTTP responses or HTML files incorrectly marked as Excel files (as in this case: "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/9c4eab3b-e05d-4af8-ad18-76e4c1a71a74/resource/9ea45609-e3b0-445a-8ace-0addb973fdf5",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/9c4eab3b-e05d-4af8-ad18-76e4c1a71a74/resource/9ea45609-e3b0-445a-8ace-0addb973fdf5/download/plipublicwebsitemonthlysummaryaugust2017.xls",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v("). Extra value errors are generally caused by not properly quoting fields that contain commas, thus breaking the parser (example: "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/3130f583-9499-472b-bb5a-f63a6ff6059a/resource/12d9e6e1-3657-4cad-a430-119d34b1a5b2",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/3130f583-9499-472b-bb5a-f63a6ff6059a/resource/12d9e6e1-3657-4cad-a430-119d34b1a5b2/download/crashdatadictionary.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v(").")]),e._v(" "),r("p",[e._v("Format errors are caused by incorrectly labelling the format of the hosted file, for instance CSV when it links to an Excel file ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/669b2409-bb4b-46e5-9d91-c36876b58a17/resource/e919ecd3-bb11-4883-a041-bded25dc651c",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | 
"),r("a",{attrs:{href:"https://data.wprdc.org/dataset/669b2409-bb4b-46e5-9d91-c36876b58a17/resource/e919ecd3-bb11-4883-a041-bded25dc651c/download/2016-cveu-inspections.xlsx",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v("), CSV linking to HTML ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/libraries/resource/14babf3f-4932-4828-8b49-3c9a03bae6d0",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://wprdc-maps.carto.com/u/wprdc/builder/1142950f-f054-4b3f-8c52-2f020e23cf78/embed",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v(") or XLS linking to XLSX ("),r("a",{attrs:{href:"https://data.wprdc.org/dataset/40188e1c-6d2e-4f20-9391-607bd3054949/resource/cf0617a1-b950-4aa7-a36d-dc9da412ddf7",target:"_blank",rel:"noopener noreferrer"}},[e._v("portal page"),r("OutboundLink")],1),e._v(" | "),r("a",{attrs:{href:"https://data.wprdc.org/dataset/40188e1c-6d2e-4f20-9391-607bd3054949/resource/cf0617a1-b950-4aa7-a36d-dc9da412ddf7/download/transportation.xls",target:"_blank",rel:"noopener noreferrer"}},[e._v("file"),r("OutboundLink")],1),e._v("). These are all easily fixed at the metadata level.")]),e._v(" "),r("p",[e._v("Finally HTTP errors just show that the linked file hosted elsewhere does not exist or has been moved.")]),e._v(" "),r("p",[e._v("Again, it is important to stress that the checks performed are just "),r("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py#validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("basic and structural checks"),r("OutboundLink")],1),e._v(" that affect the general availability of the file and its general structure. 
The addition of standardized schemas would allow for a more thorough and precise validation, checking the data contents and ensuring that they are what was expected.")]),e._v(" "),r("p",[e._v("It is also interesting to note that WPRDC follows the excellent practice of publishing data dictionaries describing the contents of the data files. These are generally published in CSV format and they themselves can present validation errors as well. As we saw before, using the validation extension we can assign a schema defined in the Table Schema spec to a resource. The schema is used during validation, but the same information could also be rendered nicely in the UI or exported consistently as a CSV or PDF file.")]),e._v(" "),r("p",[e._v("All the generated reports can be further analyzed using the output files stored "),r("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-wprdc",target:"_blank",rel:"noopener noreferrer"}},[e._v("in this repository"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Additionally, to help browse the validation reports created from the WPRDC site we have set up a demo site that mirrors the datasets, organizations and groups hosted there (at the time we did the import).")]),e._v(" "),r("p",[e._v("All tabular resources have a validation report attached, which can be accessed by clicking on the data valid / invalid badges.")]),e._v(" "),r("h2",{attrs:{id:"next-steps"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#next-steps"}},[e._v("#")]),e._v(" Next Steps")]),e._v(" "),r("h3",{attrs:{id:"areas-for-further-work"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#areas-for-further-work"}},[e._v("#")]),e._v(" Areas for further work")]),e._v(" "),r("p",[e._v("The validation extension for CKAN currently provides a very basic workflow for validation at creation and update time: if the validation fails in any way, you are not allowed to create or edit the dataset. 
Maintainers can define a set of default validation options to make it more permissive, but even so some publishers probably wouldn’t want to enforce all validation checks before allowing the creation of a dataset, or might want to apply validation only to datasets from a particular organization or type. Of course the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation#action-functions",target:"_blank",rel:"noopener noreferrer"}},[e._v("underlying API"),r("OutboundLink")],1),e._v(" is available for extension developers to implement these workflows, but the validation extension itself could provide some of them.")]),e._v(" "),r("p",[e._v("The user interface for defining the validation options can definitely be improved, and we are planning to integrate a "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation/issues/10",target:"_blank",rel:"noopener noreferrer"}},[e._v("Schema Creator"),r("OutboundLink")],1),e._v(" to make it easier for publishers to describe their data with a schema based on the actual fields in the file. 
If the resource has a schema assigned, this information can be presented nicely in the UI to the users and exported in different formats.")]),e._v(" "),r("p",[e._v("The validation extension is a first iteration to demonstrate the capabilities of integrating data validation directly into CKAN, but we are keen to know about different ways in which this could be expanded or integrated in other workflows, so any feedback or thoughts are appreciated.")]),e._v(" "),r("h3",{attrs:{id:"additional-resources"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#additional-resources"}},[e._v("#")]),e._v(" Additional Resources")]),e._v(" "),r("ul",[r("li",[r("p",[e._v("Check the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation/blob/master/README.md#how-it-works",target:"_blank",rel:"noopener noreferrer"}},[e._v("full documentation"),r("OutboundLink")],1),e._v(" for ckanext-validation, covering how to install and configure it, its features and the available API")])]),e._v(" "),r("li",[r("p",[e._v("Source material:")])]),e._v(" "),r("li",[r("p",[r("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation codebase"),r("OutboundLink")],1)])]),e._v(" "),r("li",[r("p",[r("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-wprdc",target:"_blank",rel:"noopener noreferrer"}},[e._v("Western Pennsylvania Regional Data Center GitHub repository"),r("OutboundLink")],1)])])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/22.2c8e874b.js b/assets/js/22.7697693e.js similarity index 99% rename from assets/js/22.2c8e874b.js rename to assets/js/22.7697693e.js index 458605c8c..5e42e4804 100644 --- a/assets/js/22.2c8e874b.js +++ b/assets/js/22.7697693e.js @@ -1 +1 @@ 
-(window.webpackJsonp=window.webpackJsonp||[]).push([[22],{475:function(e,a,t){e.exports=t.p+"assets/img/data-curator.1e9d2493.png"},476:function(e,a,t){e.exports=t.p+"assets/img/data-curator-2.947fbd26.png"},477:function(e,a,t){e.exports=t.p+"assets/img/data-curator-3.4f1191bc.png"},609:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,r=e._self._c||a;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h1",{attrs:{id:"data-curator-share-usable-open-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#data-curator-share-usable-open-data"}},[e._v("#")]),e._v(" Data Curator - share usable open data")]),e._v(" "),r("p",[e._v("Open data producers are increasingly focusing on improving open data so it can be easily used to create insight and drive positive change.")]),e._v(" "),r("p",[e._v("Open data is more likely to be used if data consumers can:")]),e._v(" "),r("ul",[r("li",[e._v("understand the structure of the data")]),e._v(" "),r("li",[e._v("understand the quality of the data")]),e._v(" "),r("li",[e._v("understand why and how the data was collected")]),e._v(" "),r("li",[e._v("look up the meaning of codes used in the data")]),e._v(" "),r("li",[e._v("access the data in an open machine-readable format")]),e._v(" "),r("li",[e._v("know how the data is licensed and how it can be reused")])]),e._v(" "),r("p",[e._v("Data Curator enables open data producers to define all this information using their desktop computer, prior to publishing it on the Internet.")]),e._v(" "),r("p",[e._v("Data Curator uses the "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data specification"),r("OutboundLink")],1),e._v(" and software to package the data and supporting information in a "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",title:"Tabular Data Package 
specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("Tabular Data Package"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[r("img",{attrs:{src:t(475),alt:"Data Curator screenshot"}})]),e._v(" "),r("h2",{attrs:{id:"using-data-curator"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#using-data-curator"}},[e._v("#")]),e._v(" Using Data Curator")]),e._v(" "),r("p",[e._v("Here’s how to use Data Curator to share usable open data in a data package:")]),e._v(" "),r("ol",[r("li",[e._v("Download "),r("a",{attrs:{href:"https://github.com/ODIQueensland/data-curator/releases/latest",title:"Download Data Curator for Windows or macOS",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Curator"),r("OutboundLink")],1),e._v(" for Windows or macOS")]),e._v(" "),r("li",[e._v("In Data Curator, either:\n"),r("ul",[r("li",[e._v("create some data")]),e._v(" "),r("li",[e._v("open an Excel sheet")]),e._v(" "),r("li",[e._v("open a separated value file (e.g. CSV, TSV)")])])]),e._v(" "),r("li",[e._v("Follow the steps below…")])]),e._v(" "),r("h3",{attrs:{id:"describe-the-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#describe-the-data"}},[e._v("#")]),e._v(" Describe the data")]),e._v(" "),r("p",[e._v("The Frictionless Data specification allows you to describe tabular data using a "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",title:"Table Schema specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema"),r("OutboundLink")],1),e._v(". A Table Schema allows each field in the data to be given:")]),e._v(" "),r("ul",[r("li",[e._v("a "),r("code",[e._v("name")]),e._v(", "),r("code",[e._v("title")]),e._v(" and "),r("code",[e._v("description")])]),e._v(" "),r("li",[e._v("a data "),r("code",[e._v("type")]),e._v(" (e.g. "),r("code",[e._v("string")]),e._v(", "),r("code",[e._v("integer")]),e._v(") and "),r("code",[e._v("format")]),e._v(" (e.g. 
"),r("code",[e._v("uri")]),e._v(", "),r("code",[e._v("email")]),e._v(")")]),e._v(" "),r("li",[e._v("one or more "),r("code",[e._v("constraints")]),e._v(" (e.g. "),r("code",[e._v("required")]),e._v(", "),r("code",[e._v("unique")]),e._v(") to limit data values and improve data validation")])]),e._v(" "),r("p",[e._v("The Table Schema also allows you to describe the characters used to represent missing values (e.g. "),r("code",[e._v("n/a")]),e._v(", "),r("code",[e._v("tba")]),e._v("), primary keys, and foreign key relationships.")]),e._v(" "),r("p",[e._v("After adding data in Data Curator, to create a Table Schema:")]),e._v(" "),r("ul",[r("li",[e._v("Give your data a header row, if it doesn’t have one")]),e._v(" "),r("li",[e._v("Set the header row to give each field a "),r("code",[e._v("name")])]),e._v(" "),r("li",[e._v("Guess column properties to give each field a "),r("code",[e._v("type")]),e._v(" and "),r("code",[e._v("format")])]),e._v(" "),r("li",[e._v("Set column properties to improve the data "),r("code",[e._v("type")]),e._v(" and "),r("code",[e._v("format")]),e._v(" guesses, and add a "),r("code",[e._v("title")]),e._v(", "),r("code",[e._v("description")]),e._v(" and "),r("code",[e._v("constraints")])]),e._v(" "),r("li",[e._v("Set table properties to give the table a "),r("code",[e._v("name")]),e._v(", define missing values, a primary key, and foreign keys.")])]),e._v(" "),r("h3",{attrs:{id:"validate-the-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#validate-the-data"}},[e._v("#")]),e._v(" Validate the data")]),e._v(" "),r("p",[e._v("Using Data Curator, you can validate if the data complies with the field’s "),r("code",[e._v("type")]),e._v(", "),r("code",[e._v("format")]),e._v(" and "),r("code",[e._v("contraints")]),e._v(". 
Errors found can be filtered in different ways so you can correct errors by row, by column or by error type.")]),e._v(" "),r("p",[e._v("In some cases data errors cannot be corrected, as they should be corrected in the source system and not as part of the data packaging process. If you’re happy to publish the data with errors, the error messages can be appended to the provenance information.")]),e._v(" "),r("h3",{attrs:{id:"provide-context"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#provide-context"}},[e._v("#")]),e._v(" Provide context")]),e._v(" "),r("p",[e._v("Data Curator lets you add provenance information to help people understand why and how the data was collected and determine if it is fit for their purpose.")]),e._v(" "),r("p",[e._v("Provenance information can be entered using "),r("a",{attrs:{href:"http://commonmark.org",title:"Markdown specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("Markdown"),r("OutboundLink")],1),e._v(". You can preview the Markdown formatting in Data Curator.")]),e._v(" "),r("p",[r("img",{attrs:{src:t(476),alt:"Add provenance information screenshot"}})]),e._v(" "),r("p",[e._v("You should follow the "),r("RouterLink",{attrs:{to:"/blog/2016/04/20/publish-faq/",title:"Publishing Data Packages - FAQ"}},[e._v("Readme FAQ")]),e._v(" when writing provenance information or, even easier, cut and paste from this "),r("a",{attrs:{href:"https://github.com/ODIQueensland/data-curator/blob/develop/test/features/tools/sample-provenance-information.md",title:"Sample Provenance Information Markdown file on GitHub",target:"_blank",rel:"noopener noreferrer"}},[e._v("sample"),r("OutboundLink")],1),e._v(".")],1),e._v(" "),r("h3",{attrs:{id:"explain-the-meaning-of-codes"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#explain-the-meaning-of-codes"}},[e._v("#")]),e._v(" Explain the meaning of codes")]),e._v(" "),r("p",[e._v("Data Curator supports foreign key relationships between data. 
Often a set of codes is used in a column of data and the list of valid codes and their description is in another table. The Frictionless Data specification enables linking this data within a table or across two tables in the same data package.")]),e._v(" "),r("p",[e._v("We’ve implemented the "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/patterns/#table-schema:-foreign-keys-to-data-packages",title:"The Foreign Keys to Data Packages pattern",target:"_blank",rel:"noopener noreferrer"}},[e._v("Foreign Keys to Data Packages pattern"),r("OutboundLink")],1),e._v(" so you can have foreign key relationships across two data packages. This is really useful if you want to share code-lists across organisations.")]),e._v(" "),r("p",[e._v("You can define foreign key relationships in Data Curator in the table properties and the relationships are checked when you validate the data.")]),e._v(" "),r("h3",{attrs:{id:"save-the-data-in-an-open-format"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#save-the-data-in-an-open-format"}},[e._v("#")]),e._v(" Save the data in an open format")]),e._v(" "),r("p",[e._v("Data Curator lets you save data as a comma, semicolon, or tab separated value file. 
A matching "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/csv-dialect/",title:"The CSV Dialect specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV Dialect"),r("OutboundLink")],1),e._v(" is added to the data package.")]),e._v(" "),r("h3",{attrs:{id:"apply-an-open-license"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#apply-an-open-license"}},[e._v("#")]),e._v(" Apply an open license")]),e._v(" "),r("p",[e._v("Applying a license, waiver, or public domain mark to a "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#licenses",title:"The licenses property in the Data Package specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("data package"),r("OutboundLink")],1),e._v(" and its "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/#optional-properties",title:"The licenses property in the Data Resource specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("resources"),r("OutboundLink")],1),e._v(" helps people understand how they can use, modify, and share the contents of the data package.")]),e._v(" "),r("p",[r("img",{attrs:{src:t(477),alt:"Apply open license to data package screenshot"}})]),e._v(" "),r("p",[e._v("Although there are many ways to "),r("RouterLink",{attrs:{to:"/blog/2018/03/27/applying-licenses/",title:"Guide to applying licenses, waivers or public domain marks to data packages"}},[e._v("apply a licence, waiver or public domain mark")]),e._v(" to a data package, Data Curator only allows you to use open licences - after all, its purpose is to share usable open data.")],1),e._v(" "),r("h3",{attrs:{id:"export-the-data-package"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#export-the-data-package"}},[e._v("#")]),e._v(" Export the data package")]),e._v(" "),r("p",[e._v("To ensure only usable open data is shared, Data Curator applies some checks before allowing a data package to be exported. 
These go beyond the mandatory requirements* in the Frictionless Data specification.")]),e._v(" "),r("p",[e._v("To export a tabular data package, it must have:")]),e._v(" "),r("ul",[r("li",[e._v("a header row")]),e._v(" "),r("li",[e._v("a table schema*")]),e._v(" "),r("li",[e._v("a table (resource) "),r("code",[e._v("name")]),e._v("*")]),e._v(" "),r("li",[e._v("a data package "),r("code",[e._v("name")]),e._v("*")]),e._v(" "),r("li",[e._v("provenance information")]),e._v(" "),r("li",[e._v("an open licence applied to the data package")])]),e._v(" "),r("p",[e._v("If a data package "),r("code",[e._v("version")]),e._v(" is used, it must follow the "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/patterns/#data-package-version",title:"Data Package Version pattern",target:"_blank",rel:"noopener noreferrer"}},[e._v("data package version pattern"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Before exporting a data package you should:")]),e._v(" "),r("ul",[r("li",[e._v("add a "),r("code",[e._v("title")]),e._v(" and "),r("code",[e._v("description")]),e._v(" to each field, table and data package")]),e._v(" "),r("li",[e._v("acknowledge any data sources and contributors")]),e._v(" "),r("li",[e._v("validate the data and add any known errors to the provenance information")])]),e._v(" "),r("p",[e._v("The data package is exported as a "),r("code",[e._v("datapackage.zip")]),e._v(" file that contains the:")]),e._v(" "),r("ul",[r("li",[e._v("data files in a "),r("code",[e._v("/data")]),e._v(" directory")]),e._v(" "),r("li",[e._v("data package, table (resource), table schema, and csv dialect properties in a"),r("code",[e._v("datapackage.json")]),e._v(" file")]),e._v(" "),r("li",[e._v("provenance information in a "),r("code",[e._v("README.md")]),e._v(" file")])]),e._v(" "),r("h3",{attrs:{id:"share-the-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#share-the-data"}},[e._v("#")]),e._v(" Share the data")]),e._v(" "),r("p",[e._v("Share the 
"),r("code",[e._v("datapackage.zip")]),e._v(" with open data consumers by publishing it on the Internet or on an open data platform. Some platforms support uploading, displaying, and downloading data packages.")]),e._v(" "),r("p",[e._v("Open data consumers will be able to read the data package with one of the many applications and software libraries that work with data packages, including Data Curator.")]),e._v(" "),r("h2",{attrs:{id:"get-started"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#get-started"}},[e._v("#")]),e._v(" Get Started")]),e._v(" "),r("p",[r("strong",[r("a",{attrs:{href:"https://github.com/ODIQueensland/data-curator/releases/latest",title:"Download Data Curator for Windows or macOS",target:"_blank",rel:"noopener noreferrer"}},[e._v("Download Data Curator"),r("OutboundLink")],1)]),e._v(" for Windows or macOS and start sharing usable open data.")]),e._v(" "),r("h2",{attrs:{id:"who-made-data-curator"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#who-made-data-curator"}},[e._v("#")]),e._v(" Who made Data Curator?")]),e._v(" "),r("p",[e._v("Data Curator was made possible with funding from the "),r("a",{attrs:{href:"https://www.qld.gov.au",target:"_blank",rel:"noopener noreferrer"}},[e._v("Queensland Government"),r("OutboundLink")],1),e._v(" and the guidance of the Open Data Policy team within the Department of Housing and Public Works. 
We’re grateful for the ideas and testing provided by open data champions in the Department of Environment and Science, and the Department of Transport and Main Roads.")]),e._v(" "),r("p",[e._v("The project was led by "),r("a",{attrs:{href:"https://theodi.org/article/open-data-pathway-introducing-country-level-statistics/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Stephen Gates"),r("OutboundLink")],1),e._v(" from the "),r("a",{attrs:{href:"https://www.linkedin.com/company/odiaustraliannetwork/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("ODI Australian Network"),r("OutboundLink")],1),e._v(". Software development was coordinated by Gavin Kennedy and performed by Matt Mulholland from the "),r("a",{attrs:{href:"https://www.qcif.edu.au",target:"_blank",rel:"noopener noreferrer"}},[e._v("Queensland Cyber Infrastructure Foundation"),r("OutboundLink")],1),e._v(" (QCIF).")]),e._v(" "),r("p",[e._v("Data Curator uses the Frictionless Data software libraries maintained by "),r("a",{attrs:{href:"https://okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International"),r("OutboundLink")],1),e._v(" and we’re extremely grateful for the support provided by "),r("a",{attrs:{href:"https://github.com/orgs/frictionlessdata/teams/core/members",target:"_blank",rel:"noopener noreferrer"}},[e._v("the team"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Data Curator started life as "),r("a",{attrs:{href:"http://comma-chameleon.io",title:"Comma Chameleon - A desktop CSV editor for data publishers\n",target:"_blank",rel:"noopener noreferrer"}},[e._v("Comma Chameleon"),r("OutboundLink")],1),e._v(", an "),r("a",{attrs:{href:"https://youtu.be/wIIw0cTeUG0",title:"Stuart Harrison explains Comma Chameleon at CSVConf",target:"_blank",rel:"noopener noreferrer"}},[e._v("experiment"),r("OutboundLink")],1),e._v(" by "),r("a",{attrs:{href:"https://theodi.org",title:"The Open Data Institute",target:"_blank",rel:"noopener noreferrer"}},[e._v("the 
ODI"),r("OutboundLink")],1),e._v(". The ODI and the ODI Australian Network agreed to take the software in "),r("a",{attrs:{href:"https://theodi.org/article/odi-toolbox-application-experiments-from-comma-chameleon-to-data-curator/",title:"Stephen Fortune explains why Data Curator is a fork of Comma Chameleon",target:"_blank",rel:"noopener noreferrer"}},[e._v("different directions"),r("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[22],{475:function(e,a,t){e.exports=t.p+"assets/img/data-curator.1e9d2493.png"},476:function(e,a,t){e.exports=t.p+"assets/img/data-curator-2.947fbd26.png"},477:function(e,a,t){e.exports=t.p+"assets/img/data-curator-3.4f1191bc.png"},608:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,r=e._self._c||a;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("h1",{attrs:{id:"data-curator-share-usable-open-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#data-curator-share-usable-open-data"}},[e._v("#")]),e._v(" Data Curator - share usable open data")]),e._v(" "),r("p",[e._v("Open data producers are increasingly focusing on improving open data so it can be easily used to create insight and drive positive change.")]),e._v(" "),r("p",[e._v("Open data is more likely to be used if data consumers can:")]),e._v(" "),r("ul",[r("li",[e._v("understand the structure of the data")]),e._v(" "),r("li",[e._v("understand the quality of the data")]),e._v(" "),r("li",[e._v("understand why and how the data was collected")]),e._v(" "),r("li",[e._v("look up the meaning of codes used in the data")]),e._v(" "),r("li",[e._v("access the data in an open machine-readable format")]),e._v(" "),r("li",[e._v("know how the data is licensed and how it can be reused")])]),e._v(" "),r("p",[e._v("Data Curator enables open data producers to define all this information using their 
desktop computer, prior to publishing it on the Internet.")]),e._v(" "),r("p",[e._v("Data Curator uses the "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data specification"),r("OutboundLink")],1),e._v(" and software to package the data and supporting information in a "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",title:"Tabular Data Package specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("Tabular Data Package"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[r("img",{attrs:{src:t(475),alt:"Data Curator screenshot"}})]),e._v(" "),r("h2",{attrs:{id:"using-data-curator"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#using-data-curator"}},[e._v("#")]),e._v(" Using Data Curator")]),e._v(" "),r("p",[e._v("Here’s how to use Data Curator to share usable open data in a data package:")]),e._v(" "),r("ol",[r("li",[e._v("Download "),r("a",{attrs:{href:"https://github.com/ODIQueensland/data-curator/releases/latest",title:"Download Data Curator for Windows or macOS",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Curator"),r("OutboundLink")],1),e._v(" for Windows or macOS")]),e._v(" "),r("li",[e._v("In Data Curator, either:\n"),r("ul",[r("li",[e._v("create some data")]),e._v(" "),r("li",[e._v("open an Excel sheet")]),e._v(" "),r("li",[e._v("open a separated value file (e.g. CSV, TSV)")])])]),e._v(" "),r("li",[e._v("Follow the steps below…")])]),e._v(" "),r("h3",{attrs:{id:"describe-the-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#describe-the-data"}},[e._v("#")]),e._v(" Describe the data")]),e._v(" "),r("p",[e._v("The Frictionless Data specification allows you to describe tabular data using a "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",title:"Table Schema specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema"),r("OutboundLink")],1),e._v(". 
A Table Schema allows each field in the data to be given:")]),e._v(" "),r("ul",[r("li",[e._v("a "),r("code",[e._v("name")]),e._v(", "),r("code",[e._v("title")]),e._v(" and "),r("code",[e._v("description")])]),e._v(" "),r("li",[e._v("a data "),r("code",[e._v("type")]),e._v(" (e.g. "),r("code",[e._v("string")]),e._v(", "),r("code",[e._v("integer")]),e._v(") and "),r("code",[e._v("format")]),e._v(" (e.g. "),r("code",[e._v("uri")]),e._v(", "),r("code",[e._v("email")]),e._v(")")]),e._v(" "),r("li",[e._v("one or more "),r("code",[e._v("constraints")]),e._v(" (e.g. "),r("code",[e._v("required")]),e._v(", "),r("code",[e._v("unique")]),e._v(") to limit data values and improve data validation")])]),e._v(" "),r("p",[e._v("The Table Schema also allows you to describe the characters used to represent missing values (e.g. "),r("code",[e._v("n/a")]),e._v(", "),r("code",[e._v("tba")]),e._v("), primary keys, and foreign key relationships.")]),e._v(" "),r("p",[e._v("After adding data in Data Curator, to create a Table Schema:")]),e._v(" "),r("ul",[r("li",[e._v("Give your data a header row, if it doesn’t have one")]),e._v(" "),r("li",[e._v("Set the header row to give each field a "),r("code",[e._v("name")])]),e._v(" "),r("li",[e._v("Guess column properties to give each field a "),r("code",[e._v("type")]),e._v(" and "),r("code",[e._v("format")])]),e._v(" "),r("li",[e._v("Set column properties to improve the data "),r("code",[e._v("type")]),e._v(" and "),r("code",[e._v("format")]),e._v(" guesses, and add a "),r("code",[e._v("title")]),e._v(", "),r("code",[e._v("description")]),e._v(" and "),r("code",[e._v("constraints")])]),e._v(" "),r("li",[e._v("Set table properties to give the table a "),r("code",[e._v("name")]),e._v(", define missing values, a primary key, and foreign keys.")])]),e._v(" "),r("h3",{attrs:{id:"validate-the-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#validate-the-data"}},[e._v("#")]),e._v(" Validate the data")]),e._v(" "),r("p",[e._v("Using Data Curator, 
you can validate whether the data complies with the field’s "),r("code",[e._v("type")]),e._v(", "),r("code",[e._v("format")]),e._v(" and "),r("code",[e._v("constraints")]),e._v(". Errors found can be filtered in different ways so you can correct errors by row, by column or by error type.")]),e._v(" "),r("p",[e._v("In some cases data errors cannot be corrected, as they should be corrected in the source system and not as part of the data packaging process. If you’re happy to publish the data with errors, the error messages can be appended to the provenance information.")]),e._v(" "),r("h3",{attrs:{id:"provide-context"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#provide-context"}},[e._v("#")]),e._v(" Provide context")]),e._v(" "),r("p",[e._v("Data Curator lets you add provenance information to help people understand why and how the data was collected and determine if it is fit for their purpose.")]),e._v(" "),r("p",[e._v("Provenance information can be entered using "),r("a",{attrs:{href:"http://commonmark.org",title:"Markdown specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("Markdown"),r("OutboundLink")],1),e._v(". 
You can preview the Markdown formatting in Data Curator.")]),e._v(" "),r("p",[r("img",{attrs:{src:t(476),alt:"Add provenance information screenshot"}})]),e._v(" "),r("p",[e._v("You should follow the "),r("RouterLink",{attrs:{to:"/blog/2016/04/20/publish-faq/",title:"Publishing Data Packages - FAQ"}},[e._v("Readme FAQ")]),e._v(" when writing provenance information or, even easier, cut and paste from this "),r("a",{attrs:{href:"https://github.com/ODIQueensland/data-curator/blob/develop/test/features/tools/sample-provenance-information.md",title:"Sample Provenance Information Markdown file on GitHub",target:"_blank",rel:"noopener noreferrer"}},[e._v("sample"),r("OutboundLink")],1),e._v(".")],1),e._v(" "),r("h3",{attrs:{id:"explain-the-meaning-of-codes"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#explain-the-meaning-of-codes"}},[e._v("#")]),e._v(" Explain the meaning of codes")]),e._v(" "),r("p",[e._v("Data Curator supports foreign key relationships between data. Often a set of codes is used in a column of data and the list of valid codes and their description is in another table. The Frictionless Data specification enables linking this data within a table or across two tables in the same data package.")]),e._v(" "),r("p",[e._v("We’ve implemented the "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/patterns/#table-schema:-foreign-keys-to-data-packages",title:"The Foreign Keys to Data Packages pattern",target:"_blank",rel:"noopener noreferrer"}},[e._v("Foreign Keys to Data Packages pattern"),r("OutboundLink")],1),e._v(" so you can have foreign key relationships across two data packages. 
This is really useful if you want to share code-lists across organisations.")]),e._v(" "),r("p",[e._v("You can define foreign key relationships in Data Curator in the table properties and the relationships are checked when you validate the data.")]),e._v(" "),r("h3",{attrs:{id:"save-the-data-in-an-open-format"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#save-the-data-in-an-open-format"}},[e._v("#")]),e._v(" Save the data in an open format")]),e._v(" "),r("p",[e._v("Data Curator lets you save data as a comma, semicolon, or tab separated value file. A matching "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/csv-dialect/",title:"The CSV Dialect specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV Dialect"),r("OutboundLink")],1),e._v(" is added to the data package.")]),e._v(" "),r("h3",{attrs:{id:"apply-an-open-license"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#apply-an-open-license"}},[e._v("#")]),e._v(" Apply an open license")]),e._v(" "),r("p",[e._v("Applying a license, waiver, or public domain mark to a "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#licenses",title:"The licenses property in the Data Package specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("data package"),r("OutboundLink")],1),e._v(" and its "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/#optional-properties",title:"The licenses property in the Data Resource specification",target:"_blank",rel:"noopener noreferrer"}},[e._v("resources"),r("OutboundLink")],1),e._v(" helps people understand how they can use, modify, and share the contents of the data package.")]),e._v(" "),r("p",[r("img",{attrs:{src:t(477),alt:"Apply open license to data package screenshot"}})]),e._v(" "),r("p",[e._v("Although there are many ways to "),r("RouterLink",{attrs:{to:"/blog/2018/03/27/applying-licenses/",title:"Guide to applying licenses, waivers or public domain marks to data packages"}},[e._v("apply a licence, waiver 
or public domain mark")]),e._v(" to a data package, Data Curator only allows you to use open licences - after all, its purpose is to share usable open data.")],1),e._v(" "),r("h3",{attrs:{id:"export-the-data-package"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#export-the-data-package"}},[e._v("#")]),e._v(" Export the data package")]),e._v(" "),r("p",[e._v("To ensure only usable open data is shared, Data Curator applies some checks before allowing a data package to be exported. These go beyond the mandatory requirements* in the Frictionless Data specification.")]),e._v(" "),r("p",[e._v("To export a tabular data package, it must have:")]),e._v(" "),r("ul",[r("li",[e._v("a header row")]),e._v(" "),r("li",[e._v("a table schema*")]),e._v(" "),r("li",[e._v("a table (resource) "),r("code",[e._v("name")]),e._v("*")]),e._v(" "),r("li",[e._v("a data package "),r("code",[e._v("name")]),e._v("*")]),e._v(" "),r("li",[e._v("provenance information")]),e._v(" "),r("li",[e._v("an open licence applied to the data package")])]),e._v(" "),r("p",[e._v("If a data package "),r("code",[e._v("version")]),e._v(" is used, it must follow the "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/patterns/#data-package-version",title:"Data Package Version pattern",target:"_blank",rel:"noopener noreferrer"}},[e._v("data package version pattern"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Before exporting a data package you should:")]),e._v(" "),r("ul",[r("li",[e._v("add a "),r("code",[e._v("title")]),e._v(" and "),r("code",[e._v("description")]),e._v(" to each field, table and data package")]),e._v(" "),r("li",[e._v("acknowledge any data sources and contributors")]),e._v(" "),r("li",[e._v("validate the data and add any known errors to the provenance information")])]),e._v(" "),r("p",[e._v("The data package is exported as a "),r("code",[e._v("datapackage.zip")]),e._v(" file that contains the:")]),e._v(" "),r("ul",[r("li",[e._v("data files in a 
"),r("code",[e._v("/data")]),e._v(" directory")]),e._v(" "),r("li",[e._v("data package, table (resource), table schema, and csv dialect properties in a"),r("code",[e._v("datapackage.json")]),e._v(" file")]),e._v(" "),r("li",[e._v("provenance information in a "),r("code",[e._v("README.md")]),e._v(" file")])]),e._v(" "),r("h3",{attrs:{id:"share-the-data"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#share-the-data"}},[e._v("#")]),e._v(" Share the data")]),e._v(" "),r("p",[e._v("Share the "),r("code",[e._v("datapackage.zip")]),e._v(" with open data consumers by publishing it on the Internet or on an open data platform. Some platforms support uploading, displaying, and downloading data packages.")]),e._v(" "),r("p",[e._v("Open data consumers will be able to read the data package with one of the many applications and software libraries that work with data packages, including Data Curator.")]),e._v(" "),r("h2",{attrs:{id:"get-started"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#get-started"}},[e._v("#")]),e._v(" Get Started")]),e._v(" "),r("p",[r("strong",[r("a",{attrs:{href:"https://github.com/ODIQueensland/data-curator/releases/latest",title:"Download Data Curator for Windows or macOS",target:"_blank",rel:"noopener noreferrer"}},[e._v("Download Data Curator"),r("OutboundLink")],1)]),e._v(" for Windows or macOS and start sharing usable open data.")]),e._v(" "),r("h2",{attrs:{id:"who-made-data-curator"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#who-made-data-curator"}},[e._v("#")]),e._v(" Who made Data Curator?")]),e._v(" "),r("p",[e._v("Data Curator was made possible with funding from the "),r("a",{attrs:{href:"https://www.qld.gov.au",target:"_blank",rel:"noopener noreferrer"}},[e._v("Queensland Government"),r("OutboundLink")],1),e._v(" and the guidance of the Open Data Policy team within the Department of Housing and Public Works. 
We’re grateful for the ideas and testing provided by open data champions in the Department of Environment and Science, and the Department of Transport and Main Roads.")]),e._v(" "),r("p",[e._v("The project was led by "),r("a",{attrs:{href:"https://theodi.org/article/open-data-pathway-introducing-country-level-statistics/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Stephen Gates"),r("OutboundLink")],1),e._v(" from the "),r("a",{attrs:{href:"https://www.linkedin.com/company/odiaustraliannetwork/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("ODI Australian Network"),r("OutboundLink")],1),e._v(". Software development was coordinated by Gavin Kennedy and performed by Matt Mulholland from the "),r("a",{attrs:{href:"https://www.qcif.edu.au",target:"_blank",rel:"noopener noreferrer"}},[e._v("Queensland Cyber Infrastructure Foundation"),r("OutboundLink")],1),e._v(" (QCIF).")]),e._v(" "),r("p",[e._v("Data Curator uses the Frictionless Data software libraries maintained by "),r("a",{attrs:{href:"https://okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International"),r("OutboundLink")],1),e._v(" and we’re extremely grateful for the support provided by "),r("a",{attrs:{href:"https://github.com/orgs/frictionlessdata/teams/core/members",target:"_blank",rel:"noopener noreferrer"}},[e._v("the team"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Data Curator started life as "),r("a",{attrs:{href:"http://comma-chameleon.io",title:"Comma Chameleon - A desktop CSV editor for data publishers\n",target:"_blank",rel:"noopener noreferrer"}},[e._v("Comma Chameleon"),r("OutboundLink")],1),e._v(", an "),r("a",{attrs:{href:"https://youtu.be/wIIw0cTeUG0",title:"Stuart Harrison explains Comma Chameleon at CSVConf",target:"_blank",rel:"noopener noreferrer"}},[e._v("experiment"),r("OutboundLink")],1),e._v(" by "),r("a",{attrs:{href:"https://theodi.org",title:"The Open Data Institute",target:"_blank",rel:"noopener noreferrer"}},[e._v("the 
ODI"),r("OutboundLink")],1),e._v(". The ODI and the ODI Australian Network agreed to take the software in "),r("a",{attrs:{href:"https://theodi.org/article/odi-toolbox-application-experiments-from-comma-chameleon-to-data-curator/",title:"Stephen Fortune explains why Data Curator is a fork of Comma Chameleon",target:"_blank",rel:"noopener noreferrer"}},[e._v("different directions"),r("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/24.289b646c.js b/assets/js/24.c46f7101.js similarity index 86% rename from assets/js/24.289b646c.js rename to assets/js/24.c46f7101.js index 8b237dcf6..76b0d5f91 100644 --- a/assets/js/24.289b646c.js +++ b/assets/js/24.c46f7101.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[24],{506:function(e,t,o){e.exports=o.p+"assets/img/home.82d676b3.png"},507:function(e,t,o){e.exports=o.p+"assets/img/brand.5332d005.png"},508:function(e,t,o){e.exports=o.p+"assets/img/team.d3b5ad31.png"},632:function(e,t,o){"use strict";o.r(t);var r=o(29),a=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("We’re excited to announce the launch of our newly designed Frictionless Data website. The goal of the rebranding was to better communicate our brand values and improve the user experience. 
We want Frictionless Data to be wildly successful – we want people to not only know about us, but also use our tools by default.")]),e._v(" "),r("figure",[r("img",{attrs:{src:o(506),alt:"Frictionless Data Homepage"}}),e._v(" "),r("figcaption",{staticStyle:{"text-align":"center"}},[e._v("Screenshot of Frictionless Data Homepage")])]),e._v(" "),r("p",[e._v("We’ve improved the layout of our content, done some general changes on our brand logo, design, as well as on the whole site structure - the navigation is now more accessible with a sidebar option integrated so you can access key items easily and you get more from a quick read.")]),e._v(" "),r("figure",[r("img",{attrs:{src:o(507),alt:"Revamped Frictionless Brand Logo"}}),e._v(" "),r("figcaption",{staticStyle:{"text-align":"center"}},[e._v("Revamped Frictionless Brand Logo")])]),e._v(" "),r("p",[e._v("We have a new "),r("a",{attrs:{href:"https://frictionlessdata.io/team/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Team page"),r("OutboundLink")],1),e._v(" with a list of Core Team Members, Tool Fund Partners, and Reproducible Research Fellows contributing effort to the project. There are also many other smaller, but impactful changes, all aiming to make the experience of the Frictionless Data website much better for you.")]),e._v(" "),r("figure",[r("img",{attrs:{src:o(508),alt:" Team Page"}}),e._v(" "),r("figcaption",{staticStyle:{"text-align":"center"}},[e._v("Frictionless Data Team Page")])]),e._v(" "),r("p",[e._v("In our bid to increase the adoption of our tooling and specifications, we are also working on rewriting our documentation. The current effort involved will birth a new subpage called the "),r("a",{attrs:{href:"https://frictionlessdata.io/guide/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Guide"),r("OutboundLink")],1),e._v(" - its first section is already published on the website. 
Furthermore, we’ll be releasing different How-to’s sections that’ll walk our users through the steps required to solve a real-world data problem.")]),e._v(" "),r("p",[e._v("We hope you find our new website fresher, cleaner and clearer. If you have any feedback and/or improvement suggestions, please let us know on our "),r("a",{attrs:{href:"https://discordapp.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord Channel"),r("OutboundLink")],1),e._v(" or on "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[24],{503:function(e,t,o){e.exports=o.p+"assets/img/home.82d676b3.png"},504:function(e,t,o){e.exports=o.p+"assets/img/brand.5332d005.png"},505:function(e,t,o){e.exports=o.p+"assets/img/team.d3b5ad31.png"},628:function(e,t,o){"use strict";o.r(t);var r=o(29),a=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("We’re excited to announce the launch of our newly designed Frictionless Data website. The goal of the rebranding was to better communicate our brand values and improve the user experience. 
We want Frictionless Data to be wildly successful – we want people to not only know about us, but also use our tools by default.")]),e._v(" "),r("figure",[r("img",{attrs:{src:o(503),alt:"Frictionless Data Homepage"}}),e._v(" "),r("figcaption",{staticStyle:{"text-align":"center"}},[e._v("Screenshot of Frictionless Data Homepage")])]),e._v(" "),r("p",[e._v("We’ve improved the layout of our content, done some general changes on our brand logo, design, as well as on the whole site structure - the navigation is now more accessible with a sidebar option integrated so you can access key items easily and you get more from a quick read.")]),e._v(" "),r("figure",[r("img",{attrs:{src:o(504),alt:"Revamped Frictionless Brand Logo"}}),e._v(" "),r("figcaption",{staticStyle:{"text-align":"center"}},[e._v("Revamped Frictionless Brand Logo")])]),e._v(" "),r("p",[e._v("We have a new "),r("a",{attrs:{href:"https://frictionlessdata.io/team/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Team page"),r("OutboundLink")],1),e._v(" with a list of Core Team Members, Tool Fund Partners, and Reproducible Research Fellows contributing effort to the project. There are also many other smaller, but impactful changes, all aiming to make the experience of the Frictionless Data website much better for you.")]),e._v(" "),r("figure",[r("img",{attrs:{src:o(505),alt:" Team Page"}}),e._v(" "),r("figcaption",{staticStyle:{"text-align":"center"}},[e._v("Frictionless Data Team Page")])]),e._v(" "),r("p",[e._v("In our bid to increase the adoption of our tooling and specifications, we are also working on rewriting our documentation. The current effort involved will birth a new subpage called the "),r("a",{attrs:{href:"https://frictionlessdata.io/guide/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Guide"),r("OutboundLink")],1),e._v(" - its first section is already published on the website. 
Furthermore, we’ll be releasing different How-to’s sections that’ll walk our users through the steps required to solve a real-world data problem.")]),e._v(" "),r("p",[e._v("We hope you find our new website fresher, cleaner and clearer. If you have any feedback and/or improvement suggestions, please let us know on our "),r("a",{attrs:{href:"https://discordapp.com/invite/Sewv6av",target:"_blank",rel:"noopener noreferrer"}},[e._v("Discord Channel"),r("OutboundLink")],1),e._v(" or on "),r("a",{attrs:{href:"https://twitter.com/frictionlessd8a",target:"_blank",rel:"noopener noreferrer"}},[e._v("Twitter"),r("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=a.exports}}]); \ No newline at end of file diff --git a/assets/js/25.42ace4bf.js b/assets/js/25.f083e122.js similarity index 99% rename from assets/js/25.42ace4bf.js rename to assets/js/25.f083e122.js index 31ec58ee7..f021d5e7a 100644 --- a/assets/js/25.42ace4bf.js +++ b/assets/js/25.f083e122.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[25],{511:function(e,t,a){e.exports=a.p+"assets/img/figure1.8ca2ebc2.png"},512:function(e,t,a){e.exports=a.p+"assets/img/figure2.581442ee.png"},513:function(e,t,a){e.exports=a.p+"assets/img/figure3.985e7aff.png"},681:function(e,t,a){"use strict";a.r(t);var o=a(29),i=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Scientific work produces a wealth of data every year - ranging from electrical signals in neurons to maze-running in mice to hospital readmission counts in patients. Taken as a whole, this data could be queried to discover new connections that could lead to new breakthroughs – how does that increased neuronal activity lead to better memory performance in a mouse, and does that relate to improved Alzheimer’s outcomes in humans? 
The data is there, but it is often difficult to find and mobilize.")]),e._v(" "),o("p",[e._v("A main reason that this data is under-utilized is because datasets are often created in fragmented, domain-specific, or proprietary formats that aren’t easily used by others. The Frictionless Data team has been working with Dr. Philippe Rocca-Serra on some of these key challenges – increasing data set discoverability and highlighting how disparate data can be combined. Establishing a dataset catalogue, or index, represents a solution for helping scientists discover data. But, this requires some level of data standardization from different sources. To accomplish this, Dr. Rocca-Serra with the NIH Common Fund Data Ecosystem (NIH CFDE) opted for the Frictionless Data for Reproducible Research Project at the Open Knowledge Foundation (OKF).")]),e._v(" "),o("p",[e._v("The "),o("a",{attrs:{href:"https://www.nih-cfde.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("NIH Common Fund Data Ecosystem"),o("OutboundLink")],1),e._v(" project launched in 2019 with the aim of providing a data discovery portal in the form of a single venue where all data coordinating centers (DCC) funded by the NIH would index their experimental metadata. Therefore, the "),o("a",{attrs:{href:"https://www.nih-cfde.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("NIH-CFDE"),o("OutboundLink")],1),e._v(" is meant to be a data catalogue (Figure 1), allowing users to search the entire set of NIH funded programs from one single data aggregating site. Achieving this goal is no mean feat, requiring striking a balance between functional simplicity and useful detail. 
Data extraction from individual coordinating centers (for example LINCS DCC) into the selected format should be as straightforward as possible, yet the underlying object model needs to be rich enough to allow meaningful structuring of the information.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(511),alt:"Figure 1"}}),o("br")]),e._v(" "),o("blockquote",[o("p",[o("strong",[e._v("Figure 1")]),e._v(" shows the landing page of the NIH-CFDE data portal, which by default greets visitors with a histogram of the dataset distribution by data type and file count. These settings may be changed to show sample counts, species or anatomical location, for instance."),o("br"),e._v("\nurl: "),o("a",{attrs:{href:"https://www.nih-cfde.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.nih-cfde.org/"),o("OutboundLink")],1),o("br")])]),e._v(" "),o("p",[e._v("Furthermore, it is highly desirable to ensure that structural and content validation is performed prior to upload, so only valid submissions are sent to the Deriva-based NIH CFDE catalogue. How could the team achieve these goals while keeping the agility and flexibility required to allow for iterations to occur, adjustments to be made, and user feedback to be integrated without major overhauls?")]),e._v(" "),o("p",[e._v("Owing to the nature of the defined backend, the Deriva System, and the overall consistency of data stored by most DCCs, an object model was built around key objects, connected together via linked tables, very much following the "),o("a",{attrs:{href:"https://en.wikipedia.org/wiki/OLAP_cube",target:"_blank",rel:"noopener noreferrer"}},[e._v("RDBMS / OLAP cubes paradigm"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("With this as a background, the choice of using "),o("a",{attrs:{href:"https://frictionlessdata.io/standards/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OKF Frictionless data packages framework"),o("OutboundLink")],1),e._v(" came to the fore. 
The Frictionless specifications are straightforward to understand, supported by libraries available in different languages, allowing creation, I/O operations and validation of object models as well as instance data.")]),e._v(" "),o("p",[e._v("Frictionless specifications offer several features which assist several aspects of data interoperation and reuse. The tabular data is always shipped with a JSON-formatted definition of the field headers. Each field is typed to a data type but can also be marked up with an RDF type. Terminology harmonization relies on four resources: NCBI Taxonomy for species descriptions, UBERON for anatomical terms, OBI for experimental methods, and EDAM for data types and file formats. Regular expressions can be specified by the data model for input validation, and last but not least, the declaration of missing information can be made explicit and specific. The CFDE CrossCut Metadata Model (C2M2) relies on Frictionless specifications to define the objects and their relations (Figure 2).")]),e._v(" "),o("p",[o("img",{attrs:{src:a(512),alt:"Figure 2"}}),o("br")]),e._v(" "),o("blockquote",[o("p",[o("strong",[e._v("Figure 2")]),e._v(" shows the latest version of the NIH CFDE data model, which identifies the central objects that enable data discovery: study, biomaterial, biosample, and file, each with a tight, essential set of attributes, some of which are associated with controlled vocabularies. url: "),o("a",{attrs:{href:"https://docs.nih-cfde.org/en/latest/c2m2/draft-C2M2_specification/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://docs.nih-cfde.org/en/latest/c2m2/draft-C2M2_specification/"),o("OutboundLink")],1),o("br")])]),e._v(" "),o("p",[e._v("Researchers can submit their metadata to the portal via the "),o("a",{attrs:{href:"https://docs.nih-cfde.org/en/latest/cfde-submit/docs/index.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datapackage Submission System"),o("OutboundLink")],1),e._v(" (Figure 3). 
By incorporating Frictionless specifications to produce a common metadata model and applying a thin layer of semantic harmonization on core biological objects, we are closer to the goal of making available an aggregated data index that increases visibility, reusability and clarity of access to a wealth of experimental data. The NIH CFDE data portal currently indexes over 2 million data files, mainly from RNA-Seq and imaging experiments from 9 major NIH programs: a treasure trove for data miners.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(513),alt:"Figure 3"}}),o("br")]),e._v(" "),o("blockquote",[o("p",[o("strong",[e._v("Figure 3")]),e._v(" shows the architecture of the software components supporting the overall operation, from ETL from the individual DCC into the NIH CFDE data model to the validation and upload component."),o("br"),e._v("\nurl: "),o("a",{attrs:{href:"https://docs.nih-cfde.org/en/latest/cfde-submit/docs/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://docs.nih-cfde.org/en/latest/cfde-submit/docs/"),o("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[25],{511:function(e,t,a){e.exports=a.p+"assets/img/figure1.8ca2ebc2.png"},512:function(e,t,a){e.exports=a.p+"assets/img/figure2.581442ee.png"},513:function(e,t,a){e.exports=a.p+"assets/img/figure3.985e7aff.png"},680:function(e,t,a){"use strict";a.r(t);var o=a(29),i=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("Scientific work produces a wealth of data every year - ranging from electrical signals in neurons to maze-running in mice to hospital readmission counts in patients. 
Taken as a whole, this data could be queried to discover new connections that could lead to new breakthroughs – how does that increased neuronal activity lead to better memory performance in a mouse, and does that relate to improved Alzheimer’s outcomes in humans? The data is there, but it is often difficult to find and mobilize.")]),e._v(" "),o("p",[e._v("A main reason that this data is under-utilized is because datasets are often created in fragmented, domain-specific, or proprietary formats that aren’t easily used by others. The Frictionless Data team has been working with Dr. Philippe Rocca-Serra on some of these key challenges – increasing data set discoverability and highlighting how disparate data can be combined. Establishing a dataset catalogue, or index, represents a solution for helping scientists discover data. But, this requires some level of data standardization from different sources. To accomplish this, Dr. Rocca-Serra with the NIH Common Fund Data Ecosystem (NIH CFDE) opted for the Frictionless Data for Reproducible Research Project at the Open Knowledge Foundation (OKF).")]),e._v(" "),o("p",[e._v("The "),o("a",{attrs:{href:"https://www.nih-cfde.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("NIH Common Fund Data Ecosystem"),o("OutboundLink")],1),e._v(" project launched in 2019 with the aim of providing a data discovery portal in the form of a single venue where all data coordinating centers (DCC) funded by the NIH would index their experimental metadata. Therefore, the "),o("a",{attrs:{href:"https://www.nih-cfde.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("NIH-CFDE"),o("OutboundLink")],1),e._v(" is meant to be a data catalogue (Figure 1), allowing users to search the entire set of NIH funded programs from one single data aggregating site. Achieving this goal is no mean feat, requiring striking a balance between functional simplicity and useful detail. 
Data extraction from individual coordinating centers (for example LINCS DCC) into the selected format should be as straightforward as possible, yet the underlying object model needs to be rich enough to allow meaningful structuring of the information.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(511),alt:"Figure 1"}}),o("br")]),e._v(" "),o("blockquote",[o("p",[o("strong",[e._v("Figure 1")]),e._v(" shows the landing page of the NIH-CFDE data portal, which by default greets visitors with a histogram of the dataset distribution by data type and file count. These settings may be changed to show sample counts, species or anatomical location, for instance."),o("br"),e._v("\nurl: "),o("a",{attrs:{href:"https://www.nih-cfde.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.nih-cfde.org/"),o("OutboundLink")],1),o("br")])]),e._v(" "),o("p",[e._v("Furthermore, it is highly desirable to ensure that structural and content validation is performed prior to upload, so only valid submissions are sent to the Deriva-based NIH CFDE catalogue. How could the team achieve these goals while keeping the agility and flexibility required to allow for iterations to occur, adjustments to be made, and user feedback to be integrated without major overhauls?")]),e._v(" "),o("p",[e._v("Owing to the nature of the defined backend, the Deriva System, and the overall consistency of data stored by most DCCs, an object model was built around key objects, connected together via linked tables, very much following the "),o("a",{attrs:{href:"https://en.wikipedia.org/wiki/OLAP_cube",target:"_blank",rel:"noopener noreferrer"}},[e._v("RDBMS / OLAP cubes paradigm"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("With this as a background, the choice of using "),o("a",{attrs:{href:"https://frictionlessdata.io/standards/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OKF Frictionless data packages framework"),o("OutboundLink")],1),e._v(" came to the fore. 
The Frictionless specifications are straightforward to understand, supported by libraries available in different languages, allowing creation, I/O operations and validation of object models as well as instance data.")]),e._v(" "),o("p",[e._v("Frictionless specifications offer several features which assist several aspects of data interoperation and reuse. The tabular data is always shipped with a JSON-formatted definition of the field headers. Each field is typed to a data type but can also be marked up with an RDF type. Terminology harmonization relies on four resources: NCBI Taxonomy for species descriptions, UBERON for anatomical terms, OBI for experimental methods, and EDAM for data types and file formats. Regular expressions can be specified by the data model for input validation, and last but not least, the declaration of missing information can be made explicit and specific. The CFDE CrossCut Metadata Model (C2M2) relies on Frictionless specifications to define the objects and their relations (Figure 2).")]),e._v(" "),o("p",[o("img",{attrs:{src:a(512),alt:"Figure 2"}}),o("br")]),e._v(" "),o("blockquote",[o("p",[o("strong",[e._v("Figure 2")]),e._v(" shows the latest version of the NIH CFDE data model, which identifies the central objects that enable data discovery: study, biomaterial, biosample, and file, each with a tight, essential set of attributes, some of which are associated with controlled vocabularies. url: "),o("a",{attrs:{href:"https://docs.nih-cfde.org/en/latest/c2m2/draft-C2M2_specification/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://docs.nih-cfde.org/en/latest/c2m2/draft-C2M2_specification/"),o("OutboundLink")],1),o("br")])]),e._v(" "),o("p",[e._v("Researchers can submit their metadata to the portal via the "),o("a",{attrs:{href:"https://docs.nih-cfde.org/en/latest/cfde-submit/docs/index.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datapackage Submission System"),o("OutboundLink")],1),e._v(" (Figure 3). 
By incorporating Frictionless specifications to produce a common metadata model and applying a thin layer of semantic harmonization on core biological objects, we are closer to the goal of making available an aggregated data index that increases visibility, reusability and clarity of access to a wealth of experimental data. The NIH CFDE data portal currently indexes over 2 million data files, mainly from RNA-Seq and imaging experiments from 9 major NIH programs: a treasure trove for data miners.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(513),alt:"Figure 3"}}),o("br")]),e._v(" "),o("blockquote",[o("p",[o("strong",[e._v("Figure 3")]),e._v(" shows the architecture of the software components supporting the overall operation, from ETL from the individual DCC into the NIH CFDE data model to the validation and upload component."),o("br"),e._v("\nurl: "),o("a",{attrs:{href:"https://docs.nih-cfde.org/en/latest/cfde-submit/docs/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://docs.nih-cfde.org/en/latest/cfde-submit/docs/"),o("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/31.5a70e269.js b/assets/js/31.1461b66d.js similarity index 98% rename from assets/js/31.5a70e269.js rename to assets/js/31.1461b66d.js index c33eebc9a..4451dfa3a 100644 --- a/assets/js/31.5a70e269.js +++ b/assets/js/31.1461b66d.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[31],{400:function(e,t,a){e.exports=a.p+"assets/img/opsd-1.1c092f24.png"},401:function(e,t){e.exports=""},562:function(e,t,a){"use strict";a.r(t);var o=a(29),s=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("a",{attrs:{href:"http://open-power-system-data.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Power System Data"),o("OutboundLink")],1),e._v(" aims at providing a 
"),o("strong",[e._v("free-of-charge")]),e._v(" and "),o("strong",[e._v("open")]),e._v(" platform"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(" that provides the data needed for power system analysis and modeling.")]),e._v(" "),o("p",[e._v("All of our project members are energy researchers. We struggled collecting this kind of data in what is typically a very burdensome and lengthy process. In doing my PhD, I spent the first year collecting data and realized that "),o("em",[e._v("not only had many others done that before, but that many others coming later would have to do it again")]),e._v(". This is arguably a huge waste of time and resources, so we thought we (Open Power System Data) should align ourselves and join forces to do this properly, once and for all, and in a free and open manner to be used by everyone. We are funded for two years by the German government. After starting work in 2015, we have about one more year to go.")]),e._v(" "),o("p",[e._v("On one hand, people who are interested in European power systems are lucky because a lot of data needed for that research is available. If you work on, say, Chinese systems, and you are not employed at the Chinese power company, you probably won’t find anything. On the other hand, if you search long enough (and you know where to look), you can find stuff online (and usually free of charge) on European power systems—not everything you want, but a"),o("br"),e._v("\nbig chunk, so in that respect, we are all lucky. 
However, this data is quite problematic for many reasons.")]),e._v(" "),o("p",[o("a",{attrs:{href:"http://data.open-power-system-data.org/",target:"_blank",rel:"noopener noreferrer"}},[o("img",{attrs:{src:a(400),alt:"Available Data"}}),o("OutboundLink")],1)]),e._v(" "),o("p",[o("em",[e._v("Data availability overview on the platform")])]),e._v(" "),o("p",[e._v("Some of the problems we face in working with data include:")]),e._v(" "),o("ul",[o("li",[e._v("varied data sources and formats")]),e._v(" "),o("li",[e._v("licensing issues")]),e._v(" "),o("li",[e._v("‘dirty’ data")])]),e._v(" "),o("h3",{attrs:{id:"inconsistent-data-sources-and-formats"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#inconsistent-data-sources-and-formats"}},[e._v("#")]),e._v(" Inconsistent Data Sources and formats")]),e._v(" "),o("p",[e._v("First, it is scattered throughout the Internet and very hard to Google. For example, the Spanish government will only publish their data in the Spanish language, while the German government will publish only in German, so you need to speak 20 languages if we are talking about Europe. Second, it is often of low quality. For instance, we work a lot with time series data—that is, hourly data for electricity generation and consumption. Twice a year, during the shift between summer and winter, there is sort of an “extra” or “missing” hour to account for daylight saving time. Every single data source has a different approach for how to handle that. While some datasets just ignore it, some double the hours, while others call the third hour something like “3a” and “3b”. To align these data sources, you have to handle all these different approaches. In addition, some data providers, for example, provide data in one format for the years 2010 and 2011, and then for 2012 and 2013 in a different format, and 2014 and 2015 in yet another format. 
A lot of that data comes in little chunks, so some datasets have one file for everything (which is great) but then others provide files split by the year, the month, or even the day. "),o("strong",[e._v("If you are not familiar with programming, you can’t write scripts to download that, and you have to manually download three years of daily data files: thousands of files")]),e._v(". Worse, these files come in different formats: some companies and agencies provide CSV files, others Excel files, and still others provide formats which are not very broadly used (e.g. XML and NetCDF).")]),e._v(" "),o("h3",{attrs:{id:"licensing-questions"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#licensing-questions"}},[e._v("#")]),e._v(" Licensing Questions")]),e._v(" "),o("p",[e._v("And maybe least known, but really tricky for us is the fact that all those data are subject to copyright. These data are open in the sense that they are on the Internet to be accessed freely, but they are not open in the legal sense; you are not allowed to use them or republish them or share them with others. If you look at the terms of use that you agree to in order to download, it will usually say that all those data are subject to copyright and you are not allowed to do anything with them, essentially.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(401),alt:"Available Data"}})]),e._v(" "),o("p",[e._v("This last fact is somewhat surprising. Mostly, the belief is that if something is free online then it’s “Open” but legally that, of course, doesn’t say anything; "),o("strong",[e._v("just because something is on YouTube and you can access that for free, that doesn’t mean you can copy, resample, and sell it to someone. 
And the same is true for data.")]),e._v(" So, in the project, we are trying to convince these owners and suppliers of data to change their terms of use, provide good licenses, publish data under an open license, preferably, something like Creative Commons"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(" or the ODbL"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(", or something else that people from the open world use. That’s a very burdensome process; we just talked to four German transmission system operators and it took us a full year of meetings and emails to convince them. They finally signed on to open licensing last month.")]),e._v(" "),o("h3",{attrs:{id:"dirty-data-aka-the-devil-in-the-details"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#dirty-data-aka-the-devil-in-the-details"}},[e._v("#")]),e._v(" ‘Dirty’ data aka the devil in the details")]),e._v(" "),o("p",[e._v("Some of the most annoying problems are not the major problems, but all these surprising minor problems. As I mentioned earlier, I work a lot with time series data and there are so many weird mistakes, errors, or random facts in the data. For example, we have one source where every day, the 24th hour of the day is "),o("em",[e._v("simply missing")]),e._v(" so the days only have 23 hours. Another weird phenomenon is that another data source, a huge data source that publishes a lot, only starts the year aligned on weeks, so if the first Monday falls on January 4th, they might miss the first four days of the year. If you want to model energy consumption for a year, you can’t use the data at all because the first four days are missing. So, nitty-gritty nasty stuff like this that makes work really burdensome if you look at this scale of numbers of information: you have to find these errors while looking at hundreds of thousands of data entry points. 
There’s of course, nothing you can easily do manually.")]),e._v(" "),o("p",[e._v("Our target users are researchers, economists, or engineers interested in energy; they are mostly familiar with Excel, or some statistical software like R, SPSS, or STATA but they are not programmers or data scientists. As a result, they are not experts in data handling and not trained in detecting errors, missing data, and correct interpolation. If you know where to look to find gaps in your data, this is quickly done. However, if you are doing this kind of data wrangling for the first time (and you don’t really want to do it, but rather you want to learn something about solar power in Switzerland) then this is, of course, a long detour for a lot of our users.")]),e._v(" "),o("p",[e._v("We collect time series data for renewable and thermal power plants, each of which we compile into a dataset that follows the specification for a Tabular Data Package"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(", consisting of a "),o("code",[e._v("datapackage.json")]),e._v(" file for metadata and a CSV file containing the actual data. On top of this we include the same data in Excel format and also some differently structured CSV files to suit the needs of different user types. We also implemented a framework that parses the content of the "),o("code",[e._v("datapackage.json")]),e._v(" and renders it into a more human-readable form for our website.")]),e._v(" "),o("p",[e._v("Where the data in each column is homogeneous in terms of the original source, as is the case with time series data, the "),o("code",[e._v("datapackage.json")]),e._v(" file is used to document the sources per column.")]),e._v(" "),o("p",[e._v("We started this project only knowing what we wanted to do in vague terms, but very little understanding of how to go about it, so we weren’t clear at all about how to publish this data. 
The first idea that we had was to build a database without any of us knowing what a database actually was.")]),e._v(" "),o("p",[o("strong",[e._v("Step-by-step, we realized we would like to offer a full “package” of all data that users can download in one click and have everything they need on their hard drive.")]),e._v(" Sort of a full model input package of everything a researcher would like with the option to just delete (or simply ignore) the data that is not useful.")]),e._v(" "),o("p",[e._v("We had a first workshop"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn5",id:"fnref5"}},[e._v("[5]")])]),e._v(" with potential users, and I think one of us, maybe it was Ingmar, Googled you and found out about the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package specification"),o("OutboundLink")],1),e._v(". That it perfectly fit our needs was pretty evident within a few minutes, and we decided to go along with this.")]),e._v(" "),o("p",[e._v("A lot of our clients are practitioners that use Microsoft Excel as a standard tool. If I look at a data source, and I open a well structured Excel sheet with colors and (visually) well structured tables, it makes it a lot easier for me to get a first glimpse of the data and an insight as to what’s in there, what’s not in there, its quality, how well it is documented, and so on. 
So the one difficulty I see from a user perspective with the Data Package specification (at least, in the way we use it) is that CSV and JSON files take more than one click in a browser to get a human-readable, easily understandable, picture of the data.")]),e._v(" "),o("p",[e._v("The stuff that is convenient for humans to structure content—colors, headlines, bolding, the right number of decimals, different types of data sorted by blocks, with visual spaces in between; this stuff makes a table aesthetically convenient to read, but is totally unnecessary for being machine-readable. The number one priority for us is to have the data in a format that’s machine-readable and my view is that Frictionless Data/Data Packages are perfect for this. But from the "),o("em",[e._v("have-a-first-glimpse-at-the-data-as-a-human perspective")]),e._v(", having a nice colored Excel table, from my personal point of view, is still preferable. We have decided in the end just to provide both. We publish everything as a Data Package and on top of that we also publish the data in an Excel file for those who prefer it. On top of that we publish everything in an SQLite database for our clients and users who would like it in an SQL database.")]),e._v(" "),o("p",[e._v("We also think there is potential to expand on the "),o("a",{attrs:{href:"http://data.okfn.org/tools/view",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Viewer"),o("OutboundLink")],1),e._v(" tool provided by Open Knowledge International. In its current state, we cannot really use it, because it hangs on the big datasets we’re working with. So mainly, I would imagine that for large datasets, the Data Package Viewer should not try to show and visualize all data but just, for example, show a summary. Furthermore, it would be nice if it also offered possibilities to filter the datasets for downloading of subsets. 
The filter criteria could be specified as part of the "),o("code",[e._v("datapackage.json")]),e._v(".")]),e._v(" "),o("p",[e._v("The old data package viewer, referenced above, is now deprecated. The new data package viewer, available on "),o("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("create.frictionlessdata.io"),o("OutboundLink")],1),e._v(", addresses the issues raised above.")]),e._v(" "),o("p",[e._v("Generally I think such an online Data Package viewer could be made more and more feature-rich as you go. It could, for example, also offer possibilities to download the data in alternative formats such as Excel or SQLite, which would be generated by the Data Package viewer automatically on the server-side (of course, the data would then need to be cached on the server side).")]),e._v(" "),o("p",[e._v("Advantages I see from those things are:")]),e._v(" "),o("ul",[o("li",[e._v("Ease of use for data providers: Just provide the CSV with a proper description of all fields in the "),o("code",[e._v("datapackage.json")]),e._v(", and everything else is taken care of by the online Data Package viewer.")]),e._v(" "),o("li",[e._v("Ease of use for data consumers: They get what they want (filtered) in the format they prefer.")]),e._v(" "),o("li",[e._v("Implicitly that would also do a proper validation of the "),o("code",[e._v("datapackage.json")]),e._v(": Because if you have an error there, then things will also be messed up in the automatically generated files. So that also ensures good "),o("code",[e._v("datapackage.json")]),e._v(" metadata quality in general which is important for all sorts of things you can do with Data Packages.")])]),e._v(" "),o("p",[e._v("Regarding the data processing workflow we created, I would refer you to our processing scripts"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn6",id:"fnref6"}},[e._v("[6]")])]),e._v(" on GitHub. 
I talked a lot about time series data – this should give you an "),o("a",{attrs:{href:"https://github.com/Open-Power-System-Data/time_series/blob/master/main.ipynb",target:"_blank",rel:"noopener noreferrer"}},[e._v("overview"),o("OutboundLink")],1),e._v("; here are the "),o("a",{attrs:{href:"https://github.com/Open-Power-System-Data/time_series/blob/master/processing.ipynb",target:"_blank",rel:"noopener noreferrer"}},[e._v("processing details"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("In the coming days, we are going to extend the geographic scope and other various details—user friendliness, interpolation, data quality issues—so no big changes, just further work in the same direction.")]),e._v(" "),o("hr",{staticClass:"footnotes-sep"}),e._v(" "),o("section",{staticClass:"footnotes"},[o("ol",{staticClass:"footnotes-list"},[o("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[o("p",[e._v("Data Platform: "),o("a",{attrs:{href:"http://data.open-power-system-data.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://data.open-power-system-data.org/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[o("p",[o("a",{attrs:{href:"https://creativecommons.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://creativecommons.org/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[o("p",[o("a",{attrs:{href:"http://opendatacommons.org/licenses/odbl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://opendatacommons.org/licenses/odbl/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[o("p",[e._v("Tabular Data Package specifications: 
"),o("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/tabular-data-package/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[o("p",[e._v("First Workshop of Open Power System Data: "),o("a",{attrs:{href:"http://open-power-system-data.org/workshop-1/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://open-power-system-data.org/workshop-1/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn6"}},[o("p",[e._v("GitHub repository: "),o("a",{attrs:{href:"https://github.com/Open-Power-System-Data",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/Open-Power-System-Data"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref6"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=s.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[31],{401:function(e,t,a){e.exports=a.p+"assets/img/opsd-1.1c092f24.png"},402:function(e,t){e.exports=""},563:function(e,t,a){"use strict";a.r(t);var o=a(29),s=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("a",{attrs:{href:"http://open-power-system-data.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Power System Data"),o("OutboundLink")],1),e._v(" aims at providing a "),o("strong",[e._v("free-of-charge")]),e._v(" and "),o("strong",[e._v("open")]),e._v(" platform"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(" that provides the data needed for power system analysis and modeling.")]),e._v(" 
"),o("p",[e._v("All of our project members are energy researchers. We struggled collecting this kind of data in what is typically a very burdensome and lengthy process. In doing my PhD, I spent the first year collecting data and realized that "),o("em",[e._v("not only had many others done that before, but that many others coming later would have to do it again")]),e._v(". This is arguably a huge waste of time and resources, so we thought we (Open Power System Data) should align ourselves and join forces to do this properly, once and for all, and in a free and open manner to be used by everyone. We are funded for two years by the German government. After starting work in 2015, we have about one more year to go.")]),e._v(" "),o("p",[e._v("On one hand, people who are interested in European power systems are lucky because a lot of data needed for that research is available. If you work on, say, Chinese systems, and you are not employed at the Chinese power company, you probably won’t find anything. On the other hand, if you search long enough (and you know where to look), you can find stuff online (and usually free of charge) on European power systems—not everything you want, but a"),o("br"),e._v("\nbig chunk, so in that respect, we are all lucky. 
However, this data is quite problematic for many reasons.")]),e._v(" "),o("p",[o("a",{attrs:{href:"http://data.open-power-system-data.org/",target:"_blank",rel:"noopener noreferrer"}},[o("img",{attrs:{src:a(401),alt:"Available Data"}}),o("OutboundLink")],1)]),e._v(" "),o("p",[o("em",[e._v("Data availability overview on the platform")])]),e._v(" "),o("p",[e._v("Some of the problems we face in working with data include:")]),e._v(" "),o("ul",[o("li",[e._v("varied data sources and formats")]),e._v(" "),o("li",[e._v("licensing issues")]),e._v(" "),o("li",[e._v("‘dirty’ data")])]),e._v(" "),o("h3",{attrs:{id:"inconsistent-data-sources-and-formats"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#inconsistent-data-sources-and-formats"}},[e._v("#")]),e._v(" Inconsistent Data Sources and formats")]),e._v(" "),o("p",[e._v("First, it is scattered throughout the Internet and very hard to Google. For example, the Spanish government will only publish their data in the Spanish language, while the German government will publish only in German, so you need to speak 20 languages if we are talking about Europe. Second, it is often of low quality. For instance, we work a lot with time series data—that is, hourly data for electricity generation and consumption. Twice a year, during the shift between summer and winter, there is sort of an “extra” or “missing” hour to account for daylight saving time. Every single data source has a different approach for how to handle that. While some datasets just ignore it, some double the hours, while others call the third hour something like “3a” and “3b”. To align these data sources, you have to handle all these different approaches. In addition, some data providers, for example, provide data in one format for the years 2010 and 2011, and then for 2012 and 2013 in a different format, and 2014 and 2015 in yet another format. 
A lot of that data comes in little chunks, so some datasets have one file for everything (which is great) but then others provide files split by the year, the month, or even the day. "),o("strong",[e._v("If you are not familiar with programming, you can’t write scripts to download that, and you have to manually download three years of daily data files: thousands of files")]),e._v(". Worse, these files come in different formats: some companies and agencies provide CSV files, others Excel files, and still others provide formats which are not very broadly used (e.g. XML and NetCDF).")]),e._v(" "),o("h3",{attrs:{id:"licensing-questions"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#licensing-questions"}},[e._v("#")]),e._v(" Licensing Questions")]),e._v(" "),o("p",[e._v("And maybe least known, but really tricky for us is the fact that all those data are subject to copyright. These data are open in the sense that they are on the Internet to be accessed freely, but they are not open in the legal sense; you are not allowed to use them or republish them or share them with others. If you look at the terms of use that you agree to in order to download, it will usually say that all those data are subject to copyright and you are not allowed to do anything with them, essentially.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(402),alt:"Available Data"}})]),e._v(" "),o("p",[e._v("This last fact is somewhat surprising. Mostly, the belief is that if something is free online then it’s “Open” but legally that, of course, doesn’t say anything; "),o("strong",[e._v("just because something is on YouTube and you can access that for free, that doesn’t mean you can copy, resample, and sell it to someone. 
And the same is true for data.")]),e._v(" So, in the project, we are trying to convince these owners and suppliers of data to change their terms of use, provide good licenses, publish data under an open license, preferably, something like Creative Commons"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(" or the ODbL"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(", or something else that people from the open world use. That’s a very burdensome process; we just talked to four German transmission system operators and it took us a full year of meetings and emails to convince them. They finally signed on to open licensing last month.")]),e._v(" "),o("h3",{attrs:{id:"dirty-data-aka-the-devil-in-the-details"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#dirty-data-aka-the-devil-in-the-details"}},[e._v("#")]),e._v(" ‘Dirty’ data aka the devil in the details")]),e._v(" "),o("p",[e._v("Some of the most annoying problems are not the major problems, but all these surprising minor problems. As I mentioned earlier, I work a lot with time series data and there are so many weird mistakes, errors, or random facts in the data. For example, we have one source where every day, the 24th hour of the day is "),o("em",[e._v("simply missing")]),e._v(" so the days only have 23 hours. Another weird phenomenon is that another data source, a huge data source that publishes a lot, only starts the year aligned on weeks, so if the first Monday falls on January 4th, they might miss the first four days of the year. If you want to model energy consumption for a year, you can’t use the data at all because the first four days are missing. So, nitty-gritty nasty stuff like this that makes work really burdensome if you look at this scale of numbers of information: you have to find these errors while looking at hundreds of thousands of data entry points. 
There’s of course, nothing you can easily do manually.")]),e._v(" "),o("p",[e._v("Our target users are researchers, economists, or engineers interested in energy; they are mostly familiar with Excel, or some statistical software like R, SPSS, or STATA but they are not programmers or data scientists. As a result, they are not experts in data handling and not trained in detecting errors, missing data, and correct interpolation. If you know where to look to find gaps in your data, this is quickly done. However, if you are doing this kind of data wrangling for the first time (and you don’t really want to do it, but rather you want to learn something about solar power in Switzerland) then this is, of course, a long detour for a lot of our users.")]),e._v(" "),o("p",[e._v("We collect time series data for renewable and thermal power plants, each of which we compile into a dataset that follows the specification for a Tabular Data Package"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(", consisting of a "),o("code",[e._v("datapackage.json")]),e._v(" file for metadata and a CSV file containing the actual data. On top of this we include the same data in Excel format and also some differently structured CSV files to suit the needs of different user types. We also implemented a framework that parses the content of the "),o("code",[e._v("datapackage.json")]),e._v(" and renders it into a more human-readable form for our website.")]),e._v(" "),o("p",[e._v("Where the data in each column is homogeneous in terms of the original source, as is the case with time series data, the "),o("code",[e._v("datapackage.json")]),e._v(" file is used to document the sources per column.")]),e._v(" "),o("p",[e._v("We started this project only knowing what we wanted to do in vague terms, but very little understanding of how to go about it, so we weren’t clear at all about how to publish this data. 
The first idea that we had was to build a database without any of us knowing what a database actually was.")]),e._v(" "),o("p",[o("strong",[e._v("Step-by-step, we realized we would like to offer a full “package” of all data that users can download in one click and have everything they need on their hard drive.")]),e._v(" Sort of a full model input package of everything a researcher would like with the option to just delete (or simply ignore) the data that is not useful.")]),e._v(" "),o("p",[e._v("We had a first workshop"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn5",id:"fnref5"}},[e._v("[5]")])]),e._v(" with potential users, and I think one of us, maybe it was Ingmar, Googled you and found out about the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package specification"),o("OutboundLink")],1),e._v(". That it perfectly fit our needs was pretty evident within a few minutes, and we decided to go along with this.")]),e._v(" "),o("p",[e._v("A lot of our clients are practitioners that use Microsoft Excel as a standard tool. If I look at a data source, and I open a well structured Excel sheet with colors and (visually) well structured tables, it makes it a lot easier for me to get a first glimpse of the data and an insight as to what’s in there, what’s not in there, its quality, how well it is documented, and so on. 
So the one difficulty I see from a user perspective with the Data Package specification (at least, in the way we use it) is that CSV and JSON files take more than one click in a browser to get a human-readable, easily understandable, picture of the data.")]),e._v(" "),o("p",[e._v("The stuff that is convenient for humans to structure content—colors, headlines, bolding, the right number of decimals, different types of data sorted by blocks, with visual spaces in between; this stuff makes a table aesthetically convenient to read, but is totally unnecessary for being machine-readable. The number one priority for us is to have the data in a format that’s machine-readable and my view is that Frictionless Data/Data Packages are perfect for this. But from the "),o("em",[e._v("have-a-first-glimpse-at-the-data-as-a-human perspective")]),e._v(", having a nice colored Excel table, from my personal point of view, is still preferable. We have decided in the end just to provide both. We publish everything as a Data Package and on top of that we also publish the data in an Excel file for those who prefer it. On top of that we publish everything in an SQLite database for our clients and users who would like it in an SQL database.")]),e._v(" "),o("p",[e._v("We also think there is potential to expand on the "),o("a",{attrs:{href:"http://data.okfn.org/tools/view",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Viewer"),o("OutboundLink")],1),e._v(" tool provided by Open Knowledge International. In its current state, we cannot really use it, because it hangs on the big datasets we’re working with. So mainly, I would imagine that for large datasets, the Data Package Viewer should not try to show and visualize all data but just, for example, show a summary. Furthermore, it would be nice if it also offered possibilities to filter the datasets for downloading of subsets. 
The filter criteria could be specified as part of the "),o("code",[e._v("datapackage.json")]),e._v(".")]),e._v(" "),o("p",[e._v("The old data package viewer, referenced above, is now deprecated. The new data package viewer, available on "),o("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("create.frictionlessdata.io"),o("OutboundLink")],1),e._v(", addresses the issues raised above.")]),e._v(" "),o("p",[e._v("Generally I think such an online Data Package viewer could be made more and more feature-rich as you go. It could, for example, also offer possibilities to download the data in alternative formats such as Excel or SQLite, which would be generated by the Data Package viewer automatically on the server side (of course, the data would then need to be cached on the server side).")]),e._v(" "),o("p",[e._v("Advantages I see from those things are:")]),e._v(" "),o("ul",[o("li",[e._v("Ease of use for data providers: Just provide the CSV with a proper description of all fields in the "),o("code",[e._v("datapackage.json")]),e._v(", and everything else is taken care of by the online Data Package viewer.")]),e._v(" "),o("li",[e._v("Ease of use for data consumers: They get what they want (filtered) in the format they prefer.")]),e._v(" "),o("li",[e._v("Implicitly that would also do a proper validation of the "),o("code",[e._v("datapackage.json")]),e._v(": Because if you have an error there, then things will also be messed up in the automatically generated files. So that also ensures good "),o("code",[e._v("datapackage.json")]),e._v(" metadata quality in general which is important for all sorts of things you can do with Data Packages.")])]),e._v(" "),o("p",[e._v("Regarding the data processing workflow we created, I would refer you to our processing scripts"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn6",id:"fnref6"}},[e._v("[6]")])]),e._v(" on GitHub. 
I talked a lot about time series data – this should give you an "),o("a",{attrs:{href:"https://github.com/Open-Power-System-Data/time_series/blob/master/main.ipynb",target:"_blank",rel:"noopener noreferrer"}},[e._v("overview"),o("OutboundLink")],1),e._v("; here are the "),o("a",{attrs:{href:"https://github.com/Open-Power-System-Data/time_series/blob/master/processing.ipynb",target:"_blank",rel:"noopener noreferrer"}},[e._v("processing details"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("In the coming days, we are going to extend the geographic scope and other various details—user friendliness, interpolation, data quality issues—so no big changes, just further work in the same direction.")]),e._v(" "),o("hr",{staticClass:"footnotes-sep"}),e._v(" "),o("section",{staticClass:"footnotes"},[o("ol",{staticClass:"footnotes-list"},[o("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[o("p",[e._v("Data Platform: "),o("a",{attrs:{href:"http://data.open-power-system-data.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://data.open-power-system-data.org/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[o("p",[o("a",{attrs:{href:"https://creativecommons.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://creativecommons.org/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[o("p",[o("a",{attrs:{href:"http://opendatacommons.org/licenses/odbl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://opendatacommons.org/licenses/odbl/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[o("p",[e._v("Tabular Data Package specifications: 
"),o("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/tabular-data-package/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[o("p",[e._v("First Workshop of Open Power System Data: "),o("a",{attrs:{href:"http://open-power-system-data.org/workshop-1/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://open-power-system-data.org/workshop-1/"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[e._v("↩︎")])])]),e._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn6"}},[o("p",[e._v("GitHub repository: "),o("a",{attrs:{href:"https://github.com/Open-Power-System-Data",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/Open-Power-System-Data"),o("OutboundLink")],1),e._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref6"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=s.exports}}]); \ No newline at end of file diff --git a/assets/js/34.979bfc4d.js b/assets/js/34.7fe9cce9.js similarity index 96% rename from assets/js/34.979bfc4d.js rename to assets/js/34.7fe9cce9.js index 3b6a4ac68..79b3d7a89 100644 --- a/assets/js/34.979bfc4d.js +++ b/assets/js/34.7fe9cce9.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[34],{442:function(t,a,e){t.exports=e.p+"assets/img/goodtablesio-screenshot.132bcda0.png"},443:function(t,a,e){t.exports=e.p+"assets/img/ckan-validation.c22eb702.png"},588:function(t,a,e){"use strict";e.r(a);var o=e(29),r=Object(o.a)({},(function(){var t=this,a=t.$createElement,o=t._self._c||a;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[t._v("One-off validation of your tabular datasets can be hectic, especially where plenty of published data is maintained and updated fairly 
regularly.")]),t._v(" "),o("p",[t._v("Running continuous checks on data provides regular feedback and contributes to better data quality as errors can be flagged and fixed early on. This section introduces you to tools that continually check your data for errors and flag content and structural issues as they arise. By eliminating the need to run manual checks on tabular datasets every time they are updated, they make your data workflow more efficient.")]),t._v(" "),o("p",[t._v("In this section, you will learn how to set up automatic tabular data validation using goodtables, so your data is validated every time it’s updated. Although not strictly necessary, it’s useful to "),o("RouterLink",{attrs:{to:"/blog/2018/03/07/well-packaged-datasets/"}},[t._v("know about Data Packages and Table Schema")]),t._v(" before proceeding, as they allow you to describe your data in more detail, allowing more advanced validations.")],1),t._v(" "),o("p",[t._v("We will show how to set up automated tabular data validations for data published on:")]),t._v(" "),o("ul",[o("li",[o("a",{attrs:{href:"https://ckan.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("CKAN"),o("OutboundLink")],1),t._v(", an open source data publishing platform;")]),t._v(" "),o("li",[o("a",{attrs:{href:"https://github.com/",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub"),o("OutboundLink")],1),t._v(", a hosting service;")]),t._v(" "),o("li",[o("a",{attrs:{href:"https://aws.amazon.com/s3/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Amazon S3"),o("OutboundLink")],1),t._v(", a data storage service.")])]),t._v(" "),o("p",[t._v("If you don’t use any of these platforms, you can still set up the validation using "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables-py"),o("OutboundLink")],1),t._v(", it will just require some technical knowledge")]),t._v(" "),o("p",[t._v("If you do use some of these platforms, the data 
validation report look like:")]),t._v(" "),o("p",[o("a",{attrs:{href:"https://goodtables.io/github/vitorbaptista/birmingham_schools/jobs/3",target:"_blank",rel:"noopener noreferrer"}},[o("img",{attrs:{src:e(442),alt:"Figure 1: Goodtables.io tabular data validation report"}}),o("OutboundLink")],1),o("br"),t._v(" "),o("em",[t._v("Figure 1: "),o("a",{attrs:{href:"http://Goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Goodtables.io"),o("OutboundLink")],1),t._v(" tabular data validation report.")])]),t._v(" "),o("h2",{attrs:{id:"validate-tabular-data-automatically-on-ckan"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-ckan"}},[t._v("#")]),t._v(" Validate tabular data automatically on CKAN")]),t._v(" "),o("p",[o("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("CKAN"),o("OutboundLink")],1),t._v(" is an open source platform for publishing data online. It is widely used across the planet, including by the federal governments of the USA, United Kingdom, Brazil, and others.")]),t._v(" "),o("p",[t._v("To automatically validate tabular data on CKAN, enable the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("ckanext-validation"),o("OutboundLink")],1),t._v(" extension, which uses goodtables to run continuous checks on your data. 
The "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("ckanext-validation"),o("OutboundLink")],1),t._v(" extension:")]),t._v(" "),o("ul",[o("li",[t._v("Adds a badge next to each dataset showing the status of its validation (valid or invalid), and")]),t._v(" "),o("li",[t._v("Allows users to access the validation report, making it possible for errors to be identified and fixed.")])]),t._v(" "),o("p",[o("img",{attrs:{src:e(443),alt:"Figure 2: Annotated in red, automated validation checks on datasets in CKAN"}}),o("br"),t._v(" "),o("em",[t._v("Figure 2: Annotated in red, automated validation checks on datasets in CKAN.")])]),t._v(" "),o("p",[t._v("The installation and usage instructions for the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("ckanext-validation"),o("OutboundLink")],1),t._v(" extension are available on "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("h2",{attrs:{id:"validate-tabular-data-automatically-on-github"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-github"}},[t._v("#")]),t._v(" Validate tabular data automatically on GitHub")]),t._v(" "),o("p",[t._v("If your data is hosted on GitHub, you can use the goodtables web service to automatically validate it on every change.")]),t._v(" "),o("p",[t._v("For this section, you will first need to create a "),o("a",{attrs:{href:"https://help.github.com/articles/create-a-repo/",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub repository"),o("OutboundLink")],1),t._v(" and add tabular data to it.")]),t._v(" "),o("p",[t._v("Once you have tabular data in your GitHub repository:")]),t._v(" "),o("ol",[o("li",[t._v("Log in to 
"),o("a",{attrs:{href:"https://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables.io"),o("OutboundLink")],1),t._v(" using your GitHub account and accept the permissions confirmation.")]),t._v(" "),o("li",[t._v("Once we’ve synchronized your repository list, go to the "),o("a",{attrs:{href:"https://goodtables.io/settings",target:"_blank",rel:"noopener noreferrer"}},[t._v("Manage Sources"),o("OutboundLink")],1),t._v(" page and enable the repository with the data you want to validate.\n"),o("ul",[o("li",[t._v("If you can’t find the repository, try clicking on the Refresh button on the Manage Sources page")])])])]),t._v(" "),o("p",[t._v("Goodtables will then validate all tabular data files (CSV, XLS, XLSX, ODS) and "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("data packages"),o("OutboundLink")],1),t._v(" in the repository. These validations will be executed on every change, including pull requests.")]),t._v(" "),o("h2",{attrs:{id:"validate-tabular-data-automatically-on-amazon-s3"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-amazon-s3"}},[t._v("#")]),t._v(" Validate tabular data automatically on Amazon S3")]),t._v(" "),o("p",[t._v("If your data is hosted on Amazon S3, you can use "),o("a",{attrs:{href:"https://goodtables.io/",title:"Goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables.io"),o("OutboundLink")],1),t._v(" to automatically validate it on every change.")]),t._v(" "),o("p",[t._v("It is a technical process to set up, as you need to know how to configure your Amazon S3 bucket. However, once it’s configured, the validations happen automatically on any tabular data created or updated. 
Find the detailed instructions "),o("a",{attrs:{href:"https://docs.goodtables.io/getting_started/s3.html",title:"Goodtables.io Amazon S3 instructions",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("h2",{attrs:{id:"custom-setup-of-automatic-tabular-data-validation"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#custom-setup-of-automatic-tabular-data-validation"}},[t._v("#")]),t._v(" Custom setup of automatic tabular data validation")]),t._v(" "),o("p",[t._v("If you don’t use any of the officially supported data publishing platforms, you can use "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables-py"),o("OutboundLink")],1),t._v(" directly to validate your data. This is the most flexible option, as you can configure exactly when and how your tabular data is validated. For example, if your data comes from an external source, you could validate it once before you process it (so you catch errors in the source data), and once after cleaning, just before you publish it, so you catch errors introduced by your data processing.")]),t._v(" "),o("p",[t._v("The instructions on how to do this are technical and can be found at "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/goodtables-py"),o("OutboundLink")],1),t._v(".")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[34],{450:function(t,a,e){t.exports=e.p+"assets/img/goodtablesio-screenshot.132bcda0.png"},451:function(t,a,e){t.exports=e.p+"assets/img/ckan-validation.c22eb702.png"},590:function(t,a,e){"use strict";e.r(a);var o=e(29),r=Object(o.a)({},(function(){var t=this,a=t.$createElement,o=t._self._c||a;return 
o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[t._v("One-off validation of your tabular datasets can be hectic, especially where plenty of published data is maintained and updated fairly regularly.")]),t._v(" "),o("p",[t._v("Running continuous checks on data provides regular feedback and contributes to better data quality as errors can be flagged and fixed early on. This section introduces you to tools that continually check your data for errors and flag content and structural issues as they arise. By eliminating the need to run manual checks on tabular datasets every time they are updated, they make your data workflow more efficient.")]),t._v(" "),o("p",[t._v("In this section, you will learn how to set up automatic tabular data validation using goodtables, so your data is validated every time it’s updated. Although not strictly necessary, it’s useful to "),o("RouterLink",{attrs:{to:"/blog/2018/03/07/well-packaged-datasets/"}},[t._v("know about Data Packages and Table Schema")]),t._v(" before proceeding, as they allow you to describe your data in more detail, allowing more advanced validations.")],1),t._v(" "),o("p",[t._v("We will show how to set up automated tabular data validations for data published on:")]),t._v(" "),o("ul",[o("li",[o("a",{attrs:{href:"https://ckan.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("CKAN"),o("OutboundLink")],1),t._v(", an open source data publishing platform;")]),t._v(" "),o("li",[o("a",{attrs:{href:"https://github.com/",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub"),o("OutboundLink")],1),t._v(", a hosting service;")]),t._v(" "),o("li",[o("a",{attrs:{href:"https://aws.amazon.com/s3/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Amazon S3"),o("OutboundLink")],1),t._v(", a data storage service.")])]),t._v(" "),o("p",[t._v("If you don’t use any of these platforms, you can still set up the validation using 
"),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables-py"),o("OutboundLink")],1),t._v(", it will just require some technical knowledge")]),t._v(" "),o("p",[t._v("If you do use some of these platforms, the data validation report look like:")]),t._v(" "),o("p",[o("a",{attrs:{href:"https://goodtables.io/github/vitorbaptista/birmingham_schools/jobs/3",target:"_blank",rel:"noopener noreferrer"}},[o("img",{attrs:{src:e(450),alt:"Figure 1: Goodtables.io tabular data validation report"}}),o("OutboundLink")],1),o("br"),t._v(" "),o("em",[t._v("Figure 1: "),o("a",{attrs:{href:"http://Goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Goodtables.io"),o("OutboundLink")],1),t._v(" tabular data validation report.")])]),t._v(" "),o("h2",{attrs:{id:"validate-tabular-data-automatically-on-ckan"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-ckan"}},[t._v("#")]),t._v(" Validate tabular data automatically on CKAN")]),t._v(" "),o("p",[o("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("CKAN"),o("OutboundLink")],1),t._v(" is an open source platform for publishing data online. It is widely used across the planet, including by the federal governments of the USA, United Kingdom, Brazil, and others.")]),t._v(" "),o("p",[t._v("To automatically validate tabular data on CKAN, enable the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("ckanext-validation"),o("OutboundLink")],1),t._v(" extension, which uses goodtables to run continuous checks on your data. 
The "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("ckanext-validation"),o("OutboundLink")],1),t._v(" extension:")]),t._v(" "),o("ul",[o("li",[t._v("Adds a badge next to each dataset showing the status of its validation (valid or invalid), and")]),t._v(" "),o("li",[t._v("Allows users to access the validation report, making it possible for errors to be identified and fixed.")])]),t._v(" "),o("p",[o("img",{attrs:{src:e(451),alt:"Figure 2: Annotated in red, automated validation checks on datasets in CKAN"}}),o("br"),t._v(" "),o("em",[t._v("Figure 2: Annotated in red, automated validation checks on datasets in CKAN.")])]),t._v(" "),o("p",[t._v("The installation and usage instructions for the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("ckanext-validation"),o("OutboundLink")],1),t._v(" extension are available on "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("h2",{attrs:{id:"validate-tabular-data-automatically-on-github"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-github"}},[t._v("#")]),t._v(" Validate tabular data automatically on GitHub")]),t._v(" "),o("p",[t._v("If your data is hosted on GitHub, you can use the goodtables web service to automatically validate it on every change.")]),t._v(" "),o("p",[t._v("For this section, you will first need to create a "),o("a",{attrs:{href:"https://help.github.com/articles/create-a-repo/",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub repository"),o("OutboundLink")],1),t._v(" and add tabular data to it.")]),t._v(" "),o("p",[t._v("Once you have tabular data in your GitHub repository:")]),t._v(" "),o("ol",[o("li",[t._v("Log in to 
"),o("a",{attrs:{href:"https://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables.io"),o("OutboundLink")],1),t._v(" using your GitHub account and accept the permissions confirmation.")]),t._v(" "),o("li",[t._v("Once we’ve synchronized your repository list, go to the "),o("a",{attrs:{href:"https://goodtables.io/settings",target:"_blank",rel:"noopener noreferrer"}},[t._v("Manage Sources"),o("OutboundLink")],1),t._v(" page and enable the repository with the data you want to validate.\n"),o("ul",[o("li",[t._v("If you can’t find the repository, try clicking on the Refresh button on the Manage Sources page")])])])]),t._v(" "),o("p",[t._v("Goodtables will then validate all tabular data files (CSV, XLS, XLSX, ODS) and "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("data packages"),o("OutboundLink")],1),t._v(" in the repository. These validations will be executed on every change, including pull requests.")]),t._v(" "),o("h2",{attrs:{id:"validate-tabular-data-automatically-on-amazon-s3"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-amazon-s3"}},[t._v("#")]),t._v(" Validate tabular data automatically on Amazon S3")]),t._v(" "),o("p",[t._v("If your data is hosted on Amazon S3, you can use "),o("a",{attrs:{href:"https://goodtables.io/",title:"Goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables.io"),o("OutboundLink")],1),t._v(" to automatically validate it on every change.")]),t._v(" "),o("p",[t._v("It is a technical process to set up, as you need to know how to configure your Amazon S3 bucket. However, once it’s configured, the validations happen automatically on any tabular data created or updated. 
Find the detailed instructions "),o("a",{attrs:{href:"https://docs.goodtables.io/getting_started/s3.html",title:"Goodtables.io Amazon S3 instructions",target:"_blank",rel:"noopener noreferrer"}},[t._v("here"),o("OutboundLink")],1),t._v(".")]),t._v(" "),o("h2",{attrs:{id:"custom-setup-of-automatic-tabular-data-validation"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#custom-setup-of-automatic-tabular-data-validation"}},[t._v("#")]),t._v(" Custom setup of automatic tabular data validation")]),t._v(" "),o("p",[t._v("If you don’t use any of the officially supported data publishing platforms, you can use "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables-py"),o("OutboundLink")],1),t._v(" directly to validate your data. This is the most flexible option, as you can configure exactly when and how your tabular data is validated. For example, if your data comes from an external source, you could validate it once before you process it (so you catch errors in the source data), and once after cleaning, just before you publish it, so you catch errors introduced by your data processing.")]),t._v(" "),o("p",[t._v("The instructions on how to do this are technical and can be found at "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/goodtables-py"),o("OutboundLink")],1),t._v(".")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/38.ab911e22.js b/assets/js/38.60d749aa.js similarity index 97% rename from assets/js/38.ab911e22.js rename to assets/js/38.60d749aa.js index de07fc464..a8d256e21 100644 --- a/assets/js/38.ab911e22.js +++ b/assets/js/38.60d749aa.js @@ -1 +1 @@ 
-(window.webpackJsonp=window.webpackJsonp||[]).push([[38],{504:function(e,a,t){e.exports=t.p+"assets/img/schema-1.d455d7fb.png"},505:function(e,a,t){e.exports=t.p+"assets/img/schema-2.97aad0c8.png"},630:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,r=e._self._c||a;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("In June 2019, "),r("a",{attrs:{href:"https://etalab.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("Etalab"),r("OutboundLink")],1),e._v(", a department of the French interministerial digital service (DINUM), launched "),r("a",{attrs:{href:"https://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(", a platform listing schemas for France. It could be described as what Johan Richer recently called a "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/04/23/table-schema-catalog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema catalog"),r("OutboundLink")],1),e._v(". This project is an initiative of "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(", the French open data platform, which is developed and maintained by Etalab.")]),e._v(" "),r("p",[r("img",{attrs:{src:"/img/blog/schema.gouv.fr.png",alt:"schema.gouv.fr homepage"}})]),e._v(" "),r("h2",{attrs:{id:"what-s-a-schema"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-s-a-schema"}},[e._v("#")]),e._v(" What’s a schema?")]),e._v(" "),r("p",[e._v("A schema declares a data model in a clear and precise manner, the various fields and types in a structured and consistent manner, according to a specification. 
For example, "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema"),r("OutboundLink")],1),e._v(" is a simple language to declare a schema for tabular data.")]),e._v(" "),r("p",[e._v("Schemas are well suited for a wide range of applications: validating data against a schema, documenting a data model, consolidating data from multiple sources, generating example datasets, or proposing tailored input forms. This wide range of applications makes schemas an important tool for both producers and reusers.")]),e._v(" "),r("h2",{attrs:{id:"advancing-open-data-quality"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#advancing-open-data-quality"}},[e._v("#")]),e._v(" Advancing open data quality")]),e._v(" "),r("p",[e._v("A common complaint of open data reusers has been the lack of quality of the data and data structure changes over time, without notice. The OKFN spoke about this issue in mid-2017 in a blog post, "),r("a",{attrs:{href:"https://blog.okfn.org/2017/05/31/open-data-quality-the-next-shift-in-open-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open data quality – the next shift in open data?"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("With "),r("a",{attrs:{href:"schema.data.gouv.fr"}},[e._v("schema.data.gouv.fr")]),e._v(", Etalab promotes high-quality open data: producers are encouraged to discuss and come up with an appropriate schema for the data they want to publish, and to document it with a recognised specification. Producers will then be able to make sure that the data they publish conforms to the schema over time. 
Reusers benefit from high-quality documentation, a stable data structure, and increased quality of the data.")]),e._v(" "),r("h2",{attrs:{id:"impacts"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#impacts"}},[e._v("#")]),e._v(" Impacts")]),e._v(" "),r("p",[e._v("The first impact of the launch of "),r("a",{attrs:{href:"https://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(" is that it has "),r("strong",[e._v("put the challenge of open data quality at the forefront")]),e._v(". It acknowledges that this is not a solved problem and that producers should embrace schemas, validators, documentation, and automated testing to raise the quality of the data they publish. It’s also a recognition of the efforts already made by the community, for example the “Socle commun des données locales” (Common Ground of Local Data) by "),r("a",{attrs:{href:"http://www.opendatafrance.net",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenDataFrance"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("To help producers discover schemas and how they can be helpful, we published in March 2020 a "),r("a",{attrs:{href:"https://guides.etalab.gouv.fr/producteurs-schemas/",target:"_blank",rel:"noopener noreferrer"}},[e._v("long guide"),r("OutboundLink")],1),e._v(" going over the steps producers are encouraged to follow when creating a schema: discovery, discussions, implementation, publication and finally referencing the schema on "),r("a",{attrs:{href:"http://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Since the launch, producers have worked with their reusers and published various schemas: "),r("a",{attrs:{href:"https://schema.data.gouv.fr/etalab/schema-lieux-covoiturage/latest.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("carpooling places"),r("OutboundLink")],1),e._v(" or 
"),r("a",{attrs:{href:"https://schema.data.gouv.fr/arsante/schema-dae/latest.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("defibrillators"),r("OutboundLink")],1),e._v(" to name a few. People had in-depth discussions about their data model, encouraged by the thoroughness of the Table Schema specification. Producers worked hard to clean their data and finally reached a point where their dataset is 100% aligned with the schema, without any errors.")]),e._v(" "),r("h2",{attrs:{id:"what-s-next"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-s-next"}},[e._v("#")]),e._v(" What’s next")]),e._v(" "),r("p",[e._v("Here are a few things we are working on and hope to be able to finish in the coming years.")]),e._v(" "),r("h3",{attrs:{id:"improved-data-models-defined-in-the-law"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#improved-data-models-defined-in-the-law"}},[e._v("#")]),e._v(" Improved data models defined in the law")]),e._v(" "),r("p",[e._v("Right now, when data models are introduced by law, the data model is often described by a table. We’d like to offer a schema when these laws are published, to ease adoption by the community and improve discoverability.")]),e._v(" "),r("h3",{attrs:{id:"integration-with-data-gouv-fr"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#integration-with-data-gouv-fr"}},[e._v("#")]),e._v(" Integration with "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("The "),r("a",{attrs:{href:"http://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(" initiative is mainly based on published datasets on the French open data platform "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(". However, these tools are still quite separated today. 
In the coming months, we would like to strengthen the link between "),r("a",{attrs:{href:"http://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(" and "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(" by promoting existing schemas directly on the open data platform.")]),e._v(" "),r("p",[e._v("First, we would like to inform users of the existence of a consolidated dataset based on an existing schema and provide them with its quality report. Such a feature is newly available on "),r("a",{attrs:{href:"http://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(". The same feature will arrive soon on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[r("img",{attrs:{src:t(504),alt:"Screenshot à prévoir"}})]),e._v(" "),r("p",[e._v("Second, we’re looking into integrating schemas into the data publishing process on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(". We could help users by letting them know that a schema corresponding to their dataset already exists. We could suggest what changes they should make to get their data validated directly. 
We already started doing this with a simple implementation: we post comments on datasets which are supposed to follow a schema, letting producers know if the data is valid and if not, enabling them to access a report to troubleshoot.")]),e._v(" "),r("p",[e._v("Another possibility would be to offer a new service on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(" such as the generation of data from an automatically generated form. This is the goal of the ongoing development of "),r("a",{attrs:{href:"https://csv-gg.etalab.studio/?schema=etalab%2Fschema-lieux-covoiturage",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV-GG"),r("OutboundLink")],1),e._v(" allowing to generate a form from an existing Table Schema. This could help users to directly produce validated data.")]),e._v(" "),r("p",[r("img",{attrs:{src:t(505),alt:"screenshot à prévoir"}})]),e._v(" "),r("h3",{attrs:{id:"automation"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#automation"}},[e._v("#")]),e._v(" Automation")]),e._v(" "),r("p",[e._v("In the longer term, we also plan to automate data consolidation based on a schema as much as possible. For that, we need to better know and understand available resources on the platform. 
This could be done by systematically analyzing the content of a new resource and try to fetch metadata such as headers or type of data for each column.")]),e._v(" "),r("p",[e._v("These metadata could then be used to identify datasets with similar structures and link them to an existing schema or propose to create a new one if it does not already exist.")]),e._v(" "),r("p",[e._v("We could also take advantage of the tool "),r("a",{attrs:{href:"https://github.com/etalab/csvapi",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSVAPI"),r("OutboundLink")],1),e._v(" which is actually in use on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(" to preview data of a specific dataset. CSVAPI could evolve to offer new features such as highlighting quality problems directly in the dataset or navigating through different datasets with same - or partial - structures. The schema associated with a dataset could also help having a better preview by associating a type to each field. For example, a postal code could be recognized as such and the leading zero would not be cropped.")]),e._v(" "),r("h2",{attrs:{id:"conclusion"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#conclusion"}},[e._v("#")]),e._v(" Conclusion")]),e._v(" "),r("p",[e._v("All of the features mentioned in this article are intended to promote the usefulness and the value of schemas and lead to the creation of new ones. 
We hope this will result in an increase of the overall quality of the data hosted on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Furthermore, we strongly believe that these features will help to link different users and producers with similar interests and therefore be in line with the community-based nature of "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[38],{507:function(e,a,t){e.exports=t.p+"assets/img/schema-1.d455d7fb.png"},508:function(e,a,t){e.exports=t.p+"assets/img/schema-2.97aad0c8.png"},631:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,r=e._self._c||a;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("In June 2019, "),r("a",{attrs:{href:"https://etalab.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("Etalab"),r("OutboundLink")],1),e._v(", a department of the French interministerial digital service (DINUM), launched "),r("a",{attrs:{href:"https://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(", a platform listing schemas for France. It could be described as what Johan Richer recently called a "),r("a",{attrs:{href:"https://frictionlessdata.io/blog/2020/04/23/table-schema-catalog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema catalog"),r("OutboundLink")],1),e._v(". 
This project is an initiative of "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(", the French open data platform, which is developed and maintained by Etalab.")]),e._v(" "),r("p",[r("img",{attrs:{src:"/img/blog/schema.gouv.fr.png",alt:"schema.gouv.fr homepage"}})]),e._v(" "),r("h2",{attrs:{id:"what-s-a-schema"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-s-a-schema"}},[e._v("#")]),e._v(" What’s a schema?")]),e._v(" "),r("p",[e._v("A schema declares a data model in a clear and precise manner, the various fields and types in a structured and consistent manner, according to a specification. For example, "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema"),r("OutboundLink")],1),e._v(" is a simple language to declare a schema for tabular data.")]),e._v(" "),r("p",[e._v("Schemas are well suited for a wide range of applications: validating data against a schema, documenting a data model, consolidating data from multiple sources, generating example datasets, or proposing tailored input forms. This wide range of applications makes schemas an important tool for both producers and reusers.")]),e._v(" "),r("h2",{attrs:{id:"advancing-open-data-quality"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#advancing-open-data-quality"}},[e._v("#")]),e._v(" Advancing open data quality")]),e._v(" "),r("p",[e._v("A common complaint of open data reusers has been the lack of quality of the data and data structure changes over time, without notice. 
The OKFN spoke about this issue in mid-2017 in a blog post, "),r("a",{attrs:{href:"https://blog.okfn.org/2017/05/31/open-data-quality-the-next-shift-in-open-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open data quality – the next shift in open data?"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("With "),r("a",{attrs:{href:"schema.data.gouv.fr"}},[e._v("schema.data.gouv.fr")]),e._v(", Etalab promotes high-quality open data: producers are encouraged to discuss and come up with an appropriate schema for the data they want to publish, and to document it with a recognised specification. Producers will then be able to make sure that the data they publish conforms to the schema over time. Reusers benefit from high-quality documentation, a stable data structure, and increased quality of the data.")]),e._v(" "),r("h2",{attrs:{id:"impacts"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#impacts"}},[e._v("#")]),e._v(" Impacts")]),e._v(" "),r("p",[e._v("The first impact of the launch of "),r("a",{attrs:{href:"https://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(" has "),r("strong",[e._v("put at the forefront the challenge of open data quality")]),e._v(". It acknowledges that this is not a solved problem and that producers should embrace schemas, validators, documentation, automated testing to raise the quality of the data they publish. 
It’s also a recognition of the efforts already made by the community, for example the “Socle commun des données locales” (Common Ground of Local Data) by "),r("a",{attrs:{href:"http://www.opendatafrance.net",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenDataFrance"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("To help producers discover schemas and how it can be helpful for them, we published in March 2020 a "),r("a",{attrs:{href:"https://guides.etalab.gouv.fr/producteurs-schemas/",target:"_blank",rel:"noopener noreferrer"}},[e._v("long guide"),r("OutboundLink")],1),e._v(" going over steps producers are encouraged to follow when creating a schema: discovery, discussions, implementation, publication and finally referencing the schema on "),r("a",{attrs:{href:"http://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Since the launch, producers worked with their reusers and published various schemas: "),r("a",{attrs:{href:"https://schema.data.gouv.fr/etalab/schema-lieux-covoiturage/latest.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("carpooling places"),r("OutboundLink")],1),e._v(" or "),r("a",{attrs:{href:"https://schema.data.gouv.fr/arsante/schema-dae/latest.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("defibrillators"),r("OutboundLink")],1),e._v(" to name a few. People had in-depth discussions about their data model, encouraged by the thoroughness of the Table Schema specification. 
Producers worked hard to clean their data and finally reached a point where their dataset is 100% aligned with the schema, without any errors.")]),e._v(" "),r("h2",{attrs:{id:"what-s-next"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-s-next"}},[e._v("#")]),e._v(" What’s next")]),e._v(" "),r("p",[e._v("Here are a few things we are working on and hope to be able to finish in the coming years.")]),e._v(" "),r("h3",{attrs:{id:"improved-data-models-defined-in-the-law"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#improved-data-models-defined-in-the-law"}},[e._v("#")]),e._v(" Improved data models defined in the law")]),e._v(" "),r("p",[e._v("Right now, when data models are introduced by law, the data model is often described by a table. We’d like to offer a schema when these laws are published, to ease adoption by the community and improve discoverability.")]),e._v(" "),r("h3",{attrs:{id:"integration-with-data-gouv-fr"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#integration-with-data-gouv-fr"}},[e._v("#")]),e._v(" Integration with "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("The "),r("a",{attrs:{href:"http://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(" initiative is mainly based on published datasets on the French open data platform "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(". However, these tools are still quite separated today. 
In the coming months, we would like to strengthen the link between "),r("a",{attrs:{href:"http://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(" and "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(" by promoting existing schemas directly on the open data platform.")]),e._v(" "),r("p",[e._v("First, we would like to inform users of the existence of a consolidated dataset based on an existing schema and provide them with its quality report. Such a feature is newly available on "),r("a",{attrs:{href:"http://schema.data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema.data.gouv.fr"),r("OutboundLink")],1),e._v(". The same feature will arrive soon on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[r("img",{attrs:{src:t(507),alt:"Screenshot à prévoir"}})]),e._v(" "),r("p",[e._v("Second, we’re looking into integrating schemas into the data publishing process on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(". We could help users by letting them know that a schema corresponding to their dataset already exists. We could suggest them what changes to make to get their data directly validated. 
We already started doing this with a simple implementation: we post comments on datasets which are supposed to follow a schema, letting producers know if the data is valid and if not, enabling them to access a report to troubleshoot.")]),e._v(" "),r("p",[e._v("Another possibility would be to offer a new service on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(" such as the generation of data from an automatically generated form. This is the goal of the ongoing development of "),r("a",{attrs:{href:"https://csv-gg.etalab.studio/?schema=etalab%2Fschema-lieux-covoiturage",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV-GG"),r("OutboundLink")],1),e._v(" allowing to generate a form from an existing Table Schema. This could help users to directly produce validated data.")]),e._v(" "),r("p",[r("img",{attrs:{src:t(508),alt:"screenshot à prévoir"}})]),e._v(" "),r("h3",{attrs:{id:"automation"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#automation"}},[e._v("#")]),e._v(" Automation")]),e._v(" "),r("p",[e._v("In the longer term, we also plan to automate data consolidation based on a schema as much as possible. For that, we need to better know and understand available resources on the platform. 
This could be done by systematically analyzing the content of a new resource and try to fetch metadata such as headers or type of data for each column.")]),e._v(" "),r("p",[e._v("These metadata could then be used to identify datasets with similar structures and link them to an existing schema or propose to create a new one if it does not already exist.")]),e._v(" "),r("p",[e._v("We could also take advantage of the tool "),r("a",{attrs:{href:"https://github.com/etalab/csvapi",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSVAPI"),r("OutboundLink")],1),e._v(" which is actually in use on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(" to preview data of a specific dataset. CSVAPI could evolve to offer new features such as highlighting quality problems directly in the dataset or navigating through different datasets with same - or partial - structures. The schema associated with a dataset could also help having a better preview by associating a type to each field. For example, a postal code could be recognized as such and the leading zero would not be cropped.")]),e._v(" "),r("h2",{attrs:{id:"conclusion"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#conclusion"}},[e._v("#")]),e._v(" Conclusion")]),e._v(" "),r("p",[e._v("All of the features mentioned in this article are intended to promote the usefulness and the value of schemas and lead to the creation of new ones. 
We hope this will result in an increase of the overall quality of the data hosted on "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Furthermore, we strongly believe that these features will help to link different users and producers with similar interests and therefore be in line with the community-based nature of "),r("a",{attrs:{href:"http://data.gouv.fr",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.gouv.fr"),r("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/4.40dcf021.js b/assets/js/4.86253916.js similarity index 99% rename from assets/js/4.40dcf021.js rename to assets/js/4.86253916.js index 7b6795fd9..e7ea43560 100644 --- a/assets/js/4.40dcf021.js +++ b/assets/js/4.86253916.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[4],{403:function(e,t,a){e.exports=a.p+"assets/img/mrathris.dcada7fe.png"},404:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-8.076febc6.png"},405:function(e,t,a){e.exports=a.p+"assets/img/fgrow-report-committed.65b32666.png"},406:function(e,t,a){e.exports=a.p+"assets/img/fgrow-import-violations.3e3b4fd5.png"},407:function(e,t,a){e.exports=a.p+"assets/img/fgrow-staging-violations.74e14b9b.png"},408:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-2.064ea569.png"},409:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-6.be27035f.png"},410:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-1.8bb982f4.png"},411:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-10.6fde368d.png"},412:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-3.a93a07b9.png"},413:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-4.89b52caf.png"},565:function(e,t,a){"use strict";a.r(t);var r=a(29),n=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return 
r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("Tesera is an employee-owned company, founded in 1997. Our focus is helping our clients create data-driven applications in the cloud. We also maintain two core product lines in addition to our consulting practice. "),r("a",{attrs:{href:"https://www.linkedin.com/showcase/municipal-risk-assessment-tool/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("MRAT.ca"),r("OutboundLink")],1),e._v(" helps municipalities identify risk of basement flooding, while "),r("a",{attrs:{href:"https://cran.r-project.org/web/packages/forestinventory/index.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("forestinventory.ca"),r("OutboundLink")],1),e._v(" (High Resolution Inventory Services) enables forest and natural resource companies to access a new level of accuracy and precision in resource inventories and carbon measurement.")]),e._v(" "),r("p",[r("a",{attrs:{href:"http://tesera.com/",target:"_blank",rel:"noopener noreferrer"}},[r("img",{attrs:{src:a(403),alt:"MRAT + HRIS"}}),r("OutboundLink")],1),e._v(" "),r("br"),e._v(" "),r("em",[r("a",{attrs:{href:"https://www.linkedin.com/showcase/municipal-risk-assessment-tool/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("MRAT.ca"),r("OutboundLink")],1),e._v(" and "),r("a",{attrs:{href:"http://forestinventory.ca",target:"_blank",rel:"noopener noreferrer"}},[e._v("forestinventory.ca"),r("OutboundLink")],1)])]),e._v(" "),r("p",[e._v("We deal with data from a variety of sources ranging from sample plots to in situ sensors. We grab samples and measurements to remotely sensed information from LiDAR, colour infrared and others. Many proprietary specifications exist across those data sources, and to work around this, we’ve adopted CSV as our universal format. 
We use Data Packages"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(", CSV files, and Table Schema"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(" to create database tables, validate data schemas and domains, import data from S3"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(" to PostgreSQL, DynamoDB"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(", and Elastic"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn5",id:"fnref5"}},[e._v("[5]")])]),e._v(". In some cases we also use these Frictionless Data specs to move between application components, in particular where multiple technologies (Python, R, Javascript, and other) are utilized in a workflow.")]),e._v(" "),r("p",[e._v("We have adopted the Data Package standard as a simple, elegant way to describe and package our CSV data for interoperability between systems and components. We use this in conjunction with the Table Schema which enables us to define rules and constraints"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn6",id:"fnref6"}},[e._v("[6]")])]),e._v(" for each field in the CSV file. With this in mind we have set up our workflows to essentially connect S3 buckets with analytical processes. We have written some simple open-source AWS Lambda"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn7",id:"fnref7"}},[e._v("[7]")])]),e._v(" functions that let us easily invoke validation and sanitization at the end of each process on the backend. We also expose this to the frontend of some of our applications so users can work through an import/contribution process where they are shown issues with their data that must be fixed before they can contribute. 
"),r("strong",[e._v("This helps us ensure good interoperable data at a foundational level, thereby making it easier to use for analysis, visualization, or modeling without extensive ad-hoc quality control.")])]),e._v(" "),r("p",[r("img",{attrs:{src:a(404),alt:'Example of validation error ("not a number") on import driven by Table Schema metadata'}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Example of validation error (“not a number”) on import driven by Table Schema metadata")])]),e._v(" "),r("p",[e._v("We discovered Frictionless Data through GitHub by following Max Ogden and some of the interesting work he is doing with "),r("a",{attrs:{href:"http://datproject.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dat"),r("OutboundLink")],1),e._v(". We were looking for simpler, more usable alternatives to the “standards” web-services craze of the 2000s. We had implemented a large interoperability hub for observation data called the [Water and Environmental hub (WEHUB)]"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn8",id:"fnref8"}},[e._v("[8]")])]),e._v(" which supported various "),r("a",{attrs:{href:"http://www.opengeospatial.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OGC"),r("OutboundLink")],1),e._v(" standards ("),r("a",{attrs:{href:"http://www.opengeospatial.org/standards/waterml",target:"_blank",rel:"noopener noreferrer"}},[e._v("WaterML"),r("OutboundLink")],1),e._v(", "),r("a",{attrs:{href:"http://www.opengeospatial.org/standards/sos",target:"_blank",rel:"noopener noreferrer"}},[e._v("SOS"),r("OutboundLink")],1),e._v(") which was supposed to make important information accessible to many stakeholders, but in reality, nobody was using it. We were looking for a simpler way to enable data access and use for developers and downloaders alike.")]),e._v(" "),r("p",[e._v("We are especially keen on software that enables faster interoperability, especially within an AWS environment. 
We envision a framework of loaders, validators, sanitizers, analyzers, and exporters, fundamentally based around Amazon S3, various databases, and Lambda or Elastic Container Service"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn9",id:"fnref9"}},[e._v("[9]")])]),e._v(" (for larger processes). "),r("strong",[e._v("Having supported a lot of clients with a lot of projects, our goal has been to remove the common grunt work associated with data workflows to enable effort to be prioritized towards the use and application of the data.")])]),e._v(" "),r("p",[e._v("For instance, every data portal needs a way to import data into the system and likely a way to export data from the system. Depending on the complexity of the application and the size of the imports and exports, various approaches were utilized which directly leveraged the database or relied on various libraries. "),r("em",[e._v("The friction required to load and begin to make use of the data often consumed a large portion of project budgets.")]),e._v(" By moving towards common methods of import and export (as enabled by Data Package and Table Schema and deployed to Elastic Container Service and/or Lambda), we’ve been able to standardize that aspect of our data applications and not have to revisit it.")]),e._v(" "),r("p",[e._v("As the “Internet of Things” threatens to release yet another round of standards for essentially observation data, we hope to keep things simple and use what we have for these use cases as well. 
Smaller imports and exports can readily be executed by Lambda; when they are more complex or resource-intensive, Lambda can trigger an ECS task to complete the work.")]),e._v(" "),r("p",[e._v("We developed some basic CSV to DynamoDB and ElasticSearch loaders in support of a Common Operating Picture toolset for the "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/2016_Fort_McMurray_wildfire",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fort McMurray Wildfires"),r("OutboundLink")],1),e._v(". In the coming days, we would like to clean those up, along with our existing RDS loaders and Lambda functions and start moving towards the framework described. We are cleaning up and open sourcing a number of utilities to facilitate these workflows with the goal of being able to describe data types in CSV files, then automatically map them or input them into a model. There may be an opportunity to explicitly identify how spatial feature information is carried within a Data Package or Table Schema.")]),e._v(" "),r("p",[e._v("We are kind of excited about the method and framework itself to have almost "),r("a",{attrs:{href:"https://zapier.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Zapier"),r("OutboundLink")],1),e._v("- or"),r("br"),e._v(" "),r("a",{attrs:{href:"https://ifttt.com",target:"_blank",rel:"noopener noreferrer"}},[e._v("IFTTT"),r("OutboundLink")],1),e._v("-like capabilities for CSV data where we can rapidly accomplish many common use cases enabling resources to be prioritized to the business value. On the application side, we have been getting pretty excited about ElasticSearch and Kibana"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn10",id:"fnref10"}},[e._v("[10]")])]),e._v(" and perhaps extending them to bring together more seamless exploration of large dynamic geospatial datasets, especially where the data is continuous/temporal in nature and existing GIS technology falls pretty flat. 
This will be important as smart cities and “Internet of Things” use cases advance.")]),e._v(" "),r("h2",{attrs:{id:"projects"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#projects"}},[e._v("#")]),e._v(" Projects")]),e._v(" "),r("p",[r("em",[e._v("This next section will explore two Tesera-developed projects that employ the Frictionless Data specifications: the Provincial Growth and Yield Initiative Plot Sharing App (PGYI) and Mackenzie DataStream.")])]),e._v(" "),r("h3",{attrs:{id:"_1-provincial-growth-and-yield-initiative-plot-sharing-app"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#_1-provincial-growth-and-yield-initiative-plot-sharing-app"}},[e._v("#")]),e._v(" 1. Provincial Growth and Yield Initiative Plot Sharing App")]),e._v(" "),r("p",[r("img",{attrs:{src:a(405),alt:"The Provincial Growth and Yield Initiative Plot Sharing App"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("The Provincial Growth and Yield Initiative Plot Sharing App")])]),e._v(" "),r("p",[e._v("With this app, we are enabling the 16 government and industrial members of "),r("a",{attrs:{href:"https://fgrow.friresearch.ca/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Forest Growth Organization of Western Canada (FGrOW)"),r("OutboundLink")],1),e._v(" to seamlessly share forest plot measurement data with each other and know that the data will be interoperable and meet their specifications. Specifications were designed primarily with the data manager in mind and were formatted as a contribution guidelines document. From this document, the "),r("a",{attrs:{href:"https://github.com/tesera/datatheme-afgo-pgyi",target:"_blank",rel:"noopener noreferrer"}},[e._v("afgo-pgyi"),r("OutboundLink")],1),e._v(" “Data Theme” was created which contains the Data Package details as well as the several Table Schemas required to assemble a dataset. 
Having access to this large and interoperable dataset will enable their members to improve their growth and yield models and respond to bioclimatic changes as they occur.")]),e._v(" "),r("p",[e._v("We supported FGrOW in creating a set of data standards and then created the Table Schemas to enable a validation workflow. The members upload a set of relational CSV files which are packaged up as Data Packages, uploaded to S3, and then validated by the Lambda Data Package Validator. The results of this initial validation are returned to the user as errors (cannot proceed) or warnings (something is wrong but it can be accepted).")]),e._v(" "),r("p",[r("img",{attrs:{src:a(406),alt:"PGYI import violations"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("PGYI import violations")])]),e._v(" "),r("p",[e._v("At this stage the data is considered imported. If there are no errors the user is able to stage their dataset which uses the Lambda RDS Loader to import the Data Package into an RDS PostGreSQL instance. This triggers a number of more sophisticated validation functions relating to tree growth rates, measurement impossibilities, and sanity checks at the database level.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(407),alt:"PGYI staging violations"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("PGYI staging violations")])]),e._v(" "),r("p",[e._v("Having previously ensured the data meets the Table Schema and was loaded successfully, we have confidence in executing custom database functions without having to handle endless data anomalies and exceptions. 
A simple example check to see if species changes between measurements can be illustrated below:")]),e._v(" "),r("div",{staticClass:"language- extra-class"},[r("pre",{pre:!0,attrs:{class:"language-text"}},[r("code",[e._v("CREATE OR REPLACE FUNCTION staging.get_upload_trees_species_violations(in_upload_id text)\nRETURNS SETOF staging.violation AS $$\n\nBEGIN\n -- RULE 1: tree species should not change over time\n RETURN QUERY\n\n SELECT\n '0'::text,\n staged_tree.upload_id,\n\n staged_tree.source_row_index,\n 'trees'::text,\n array_to_string(ARRAY[staged_tree.company, staged_tree.company_plot_number, staged_tree.tree_number::text], '-'),\n\n 'trees.species.change'::text,\n 'warning'::text,\n format('Tree species changed from %s to %s', committed_tree.species, staged_tree.species)\n\n FROM staging.staged_trees staged_tree\n INNER JOIN staging.committed_trees committed_tree\n USING (company, company_plot_number, tree_number)\n\n WHERE staged_tree.upload_id = in_upload_id\n AND (staged_tree.species NOTNULL AND staged_tree.species <>'No')\n AND staged_tree.species != committed_tree.species;\n\nEND;\n$$ LANGUAGE plpgsql;\n")])])]),r("p",[e._v("Again the user is presented with violations as errors or warnings and can they can choose to commit the plots without errors into the shared database. Essentially this three step workflow from imported, to staged, to committed allows FGroW to ensure quality data that will be useful for their modeling and analysis purposes.")]),e._v(" "),r("p",[e._v("FGroW has built a database that currently has 2400 permanent sample plots each containing many trees and all together 10s of millions of measurements across a wide variety of strata including various natural regions and natural sub-regions. 
This database provides the numeric power to produce and refine better growth models and enable companies to adopt their planning and management to real conditions.")]),e._v(" "),r("p",[e._v("There are many cases where industries might wish to bring together measurement data in a consistent way to maximize their productivity. "),r("strong",[e._v("One of the more obvious examples is in agriculture where precision information is increasingly collected at the local or individual farm level, but bringing this information together in aggregate would produce new and greater insight with regard to productivity, broad scale change, and perhaps adaption to climate change strategies.")])]),e._v(" "),r("h3",{attrs:{id:"_2-mackenzie-datastream"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#_2-mackenzie-datastream"}},[e._v("#")]),e._v(" 2. Mackenzie DataStream")]),e._v(" "),r("p",[r("a",{attrs:{href:"http://www.mackenziedatastream.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://www.mackenziedatastream.org/"),r("OutboundLink")],1)]),e._v(" "),r("p",[r("img",{attrs:{src:a(408),alt:"Mackenzie DataStream App"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Mackenzie DataStream App")])]),e._v(" "),r("p",[r("a",{attrs:{href:"http://www.mackenziedatastream.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Mackenzie DataStream"),r("OutboundLink")],1),e._v(" is an open access platform for exploring and sharing water data in the Mackenzie River Basin. DataStream’s mission is to promote knowledge sharing and advance collaborative and evidence-based decision making throughout the Basin. The Mackenzie River Basin is extremely large, measuring 1.8 million square kilometers and as such monitoring is a large challenge. To overcome this challenge, water quality monitoring is carried out by a variety of partners which include communities and Aboriginal, territorial, and federal governments. 
With multiple parties collecting and sharing information, Mackenzie DataStream had to overcome challenges of trust and interoperability.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(409),alt:"The Mackenzie River Basin"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("The Mackenzie River Basin")])]),e._v(" "),r("p",[e._v("Tesera leveraged the Data Package standard as an easy way for Government and community partners alike to import data into the system. We used Table Schema to define the structure and constraints of the Data Themes which we represented in a simple visible way.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(410),alt:"Table fields and validation rules derived from Table Schema"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Table fields and validation rules derived from Table Schema")])]),e._v(" "),r("p",[e._v("The backend on this system also relies on the Data Package Validator and the Relational Database Loader. The observation data is then exposed to the client via a simple "),r("a",{attrs:{href:"http://expressjs.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Express.js"),r("OutboundLink")],1),e._v(" API as JSON. The Frictionless Data specifications help us ensure clean consistent data and make visualization a breeze. We push the data to "),r("a",{attrs:{href:"https://plot.ly/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Plotly"),r("OutboundLink")],1),e._v(" to build the charts as it provides lots of options for scientific plotting, as well as a good api, at a minimal cost.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(411),alt:"Mackenzie DataStream visualization example"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Mackenzie DataStream visualization example")])]),e._v(" "),r("p",[e._v("The Mackenzie DataStream is gaining momentum and partners. 
The "),r("a",{attrs:{href:"http://www.fortnelsonfirstnation.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fort Nelson First Nation"),r("OutboundLink")],1),e._v(" has joined as a contributing partner and the "),r("a",{attrs:{href:"http://www.gov.nt.ca/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Government of Northwest Territories"),r("OutboundLink")],1),e._v(" is looking to apply DataStream to a few other data types and bring on some additional partners in water permitting and cumulative effects monitoring. We think of this as a simple and effective way to make environmental monitoring data more accessible.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(412),alt:"Mackenzie DataStream environmental observation data"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Mackenzie DataStream environmental observation data")])]),e._v(" "),r("p",[e._v("There are many ways to monitor the environment, but bringing the data together according to standards, ensuring that it is loaded correctly, and making it accessible via a simple API seems pretty universal. 
We are working through a UX/UI overhaul and then hope to open source the entire DataStream application for other organizations that are collecting environmental observation data and looking to increase its utility to citizens, scientists, and consultants alike.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(413),alt:"Mackenzie DataStream summary statistics"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Mackenzie DataStream summary statistics")])]),e._v(" "),r("hr",{staticClass:"footnotes-sep"}),e._v(" "),r("section",{staticClass:"footnotes"},[r("ol",{staticClass:"footnotes-list"},[r("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[r("p",[e._v("Data Packages: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/data-package/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[r("p",[e._v("Table Schema: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/table-schema/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[r("p",[e._v("Amazon Simple Storage Service (Amazon S3): "),r("a",{attrs:{href:"https://aws.amazon.com/s3/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://aws.amazon.com/s3/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[r("p",[e._v("Amazon DynamoDB: "),r("a",{attrs:{href:"https://aws.amazon.com/dynamodb/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://aws.amazon.com/dynamodb/"),r("OutboundLink")],1),e._v(" 
"),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[r("p",[e._v("Elastic Search: "),r("a",{attrs:{href:"https://www.elastic.co/products/elasticsearch",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.elastic.co/products/elasticsearch"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn6"}},[r("p",[e._v("Table Schema Field Constraints: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#constraints",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/table-schema/#constraints"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref6"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn7"}},[r("p",[e._v("Amazon AWS Lambda: "),r("a",{attrs:{href:"https://aws.amazon.com/lambda/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://aws.amazon.com/lambda/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref7"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn8"}},[r("p",[e._v("Water and Environmental Hub: "),r("a",{attrs:{href:"http://watercanada.net/2013/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://watercanada.net/2013/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref8"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn9"}},[r("p",[e._v("Amazon EC2: Virtual Server Hosting: "),r("a",{attrs:{href:"https://aws.amazon.com/ec2/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://aws.amazon.com/ec2/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref9"}},[e._v("↩︎")])])]),e._v(" 
"),r("li",{staticClass:"footnote-item",attrs:{id:"fn10"}},[r("p",[e._v("Kibana: "),r("a",{attrs:{href:"https://www.elastic.co/products/kibana",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.elastic.co/products/kibana"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref10"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[4],{403:function(e,t,a){e.exports=a.p+"assets/img/mrathris.dcada7fe.png"},404:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-8.076febc6.png"},405:function(e,t,a){e.exports=a.p+"assets/img/fgrow-report-committed.65b32666.png"},406:function(e,t,a){e.exports=a.p+"assets/img/fgrow-import-violations.3e3b4fd5.png"},407:function(e,t,a){e.exports=a.p+"assets/img/fgrow-staging-violations.74e14b9b.png"},408:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-2.064ea569.png"},409:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-6.be27035f.png"},410:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-1.8bb982f4.png"},411:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-10.6fde368d.png"},412:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-3.a93a07b9.png"},413:function(e,t,a){e.exports=a.p+"assets/img/mackenzie-4.89b52caf.png"},564:function(e,t,a){"use strict";a.r(t);var r=a(29),n=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("Tesera is an employee-owned company, founded in 1997. Our focus is helping our clients create data-driven applications in the cloud. We also maintain two core product lines in addition to our consulting practice. 
"),r("a",{attrs:{href:"https://www.linkedin.com/showcase/municipal-risk-assessment-tool/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("MRAT.ca"),r("OutboundLink")],1),e._v(" helps municipalities identify risk of basement flooding, while "),r("a",{attrs:{href:"https://cran.r-project.org/web/packages/forestinventory/index.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("forestinventory.ca"),r("OutboundLink")],1),e._v(" (High Resolution Inventory Services) enables forest and natural resource companies to access a new level of accuracy and precision in resource inventories and carbon measurement.")]),e._v(" "),r("p",[r("a",{attrs:{href:"http://tesera.com/",target:"_blank",rel:"noopener noreferrer"}},[r("img",{attrs:{src:a(403),alt:"MRAT + HRIS"}}),r("OutboundLink")],1),e._v(" "),r("br"),e._v(" "),r("em",[r("a",{attrs:{href:"https://www.linkedin.com/showcase/municipal-risk-assessment-tool/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("MRAT.ca"),r("OutboundLink")],1),e._v(" and "),r("a",{attrs:{href:"http://forestinventory.ca",target:"_blank",rel:"noopener noreferrer"}},[e._v("forestinventory.ca"),r("OutboundLink")],1)])]),e._v(" "),r("p",[e._v("We deal with data from a variety of sources ranging from sample plots to in situ sensors. We grab samples and measurements to remotely sensed information from LiDAR, colour infrared and others. Many proprietary specifications exist across those data sources, and to work around this, we’ve adopted CSV as our universal format. 
We use Data Packages"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(", CSV files, and Table Schema"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(" to create database tables, validate data schemas and domains, import data from S3"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(" to PostgreSQL, DynamoDB"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(", and Elastic"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn5",id:"fnref5"}},[e._v("[5]")])]),e._v(". In some cases we also use these Frictionless Data specs to move between application components, in particular where multiple technologies (Python, R, Javascript, and other) are utilized in a workflow.")]),e._v(" "),r("p",[e._v("We have adopted the Data Package standard as a simple, elegant way to describe and package our CSV data for interoperability between systems and components. We use this in conjunction with the Table Schema which enables us to define rules and constraints"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn6",id:"fnref6"}},[e._v("[6]")])]),e._v(" for each field in the CSV file. With this in mind we have set up our workflows to essentially connect S3 buckets with analytical processes. We have written some simple open-source AWS Lambda"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn7",id:"fnref7"}},[e._v("[7]")])]),e._v(" functions that let us easily invoke validation and sanitization at the end of each process on the backend. We also expose this to the frontend of some of our applications so users can work through an import/contribution process where they are shown issues with their data that must be fixed before they can contribute. 
"),r("strong",[e._v("This helps us ensure good interoperable data at a foundational level, thereby making it easier to use for analysis, visualization, or modeling without extensive ad-hoc quality control.")])]),e._v(" "),r("p",[r("img",{attrs:{src:a(404),alt:'Example of validation error ("not a number") on import driven by Table Schema metadata'}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Example of validation error (“not a number”) on import driven by Table Schema metadata")])]),e._v(" "),r("p",[e._v("We discovered Frictionless Data through GitHub by following Max Ogden and some of the interesting work he is doing with "),r("a",{attrs:{href:"http://datproject.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dat"),r("OutboundLink")],1),e._v(". We were looking for simpler, more usable alternatives to the “standards” web-services craze of the 2000s. We had implemented a large interoperability hub for observation data called the [Water and Environmental hub (WEHUB)]"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn8",id:"fnref8"}},[e._v("[8]")])]),e._v(" which supported various "),r("a",{attrs:{href:"http://www.opengeospatial.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OGC"),r("OutboundLink")],1),e._v(" standards ("),r("a",{attrs:{href:"http://www.opengeospatial.org/standards/waterml",target:"_blank",rel:"noopener noreferrer"}},[e._v("WaterML"),r("OutboundLink")],1),e._v(", "),r("a",{attrs:{href:"http://www.opengeospatial.org/standards/sos",target:"_blank",rel:"noopener noreferrer"}},[e._v("SOS"),r("OutboundLink")],1),e._v(") which was supposed to make important information accessible to many stakeholders, but in reality, nobody was using it. We were looking for a simpler way to enable data access and use for developers and downloaders alike.")]),e._v(" "),r("p",[e._v("We are especially keen on software that enables faster interoperability, especially within an AWS environment. 
We envision a framework of loaders, validators, sanitizers, analyzers, and exporters, fundamentally based around Amazon S3, various databases, and Lambda or Elastic Container Service"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn9",id:"fnref9"}},[e._v("[9]")])]),e._v(" (for larger processes). "),r("strong",[e._v("Having supported a lot of clients with a lot of projects, our goal has been to remove the common grunt work associated with data workflows to enable effort to be prioritized towards the use and application of the data.")])]),e._v(" "),r("p",[e._v("For instance, every data portal needs a way to import data into the system and likely a way to export data from the system. Depending on the complexity of the application and the size of the imports and exports, various approaches were utilized which directly leveraged the database or relied on various libraries. "),r("em",[e._v("The friction required to load and begin to make use of the data often consumed a large portion of project budgets.")]),e._v(" By moving towards common methods of import and export (as enabled by Data Package and Table Schema and deployed to Elastic Container Service and/or Lambda), we’ve been able to standardize that aspect of our data applications and not have to revisit it.")]),e._v(" "),r("p",[e._v("As the “Internet of Things” threatens to release yet another round of standards for essentially observation data, we hope to keep things simple and use what we have for these use cases as well. 
Smaller imports and exports can readily be executed by Lambda; when they are more complex or resource-intensive, Lambda can trigger an ECS task to complete the work.")]),e._v(" "),r("p",[e._v("We developed some basic CSV to DynamoDB and ElasticSearch loaders in support of a Common Operating Picture toolset for the "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/2016_Fort_McMurray_wildfire",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fort McMurray Wildfires"),r("OutboundLink")],1),e._v(". In the coming days, we would like to clean those up, along with our existing RDS loaders and Lambda functions and start moving towards the framework described. We are cleaning up and open sourcing a number of utilities to facilitate these workflows with the goal of being able to describe data types in CSV files, then automatically map them or input them into a model. There may be an opportunity to explicitly identify how spatial feature information is carried within a Data Package or Table Schema.")]),e._v(" "),r("p",[e._v("We are kind of excited about the method and framework itself to have almost "),r("a",{attrs:{href:"https://zapier.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Zapier"),r("OutboundLink")],1),e._v("- or"),r("br"),e._v(" "),r("a",{attrs:{href:"https://ifttt.com",target:"_blank",rel:"noopener noreferrer"}},[e._v("IFTTT"),r("OutboundLink")],1),e._v("-like capabilities for CSV data where we can rapidly accomplish many common use cases enabling resources to be prioritized to the business value. On the application side, we have been getting pretty excited about ElasticSearch and Kibana"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn10",id:"fnref10"}},[e._v("[10]")])]),e._v(" and perhaps extending them to bring together more seamless exploration of large dynamic geospatial datasets, especially where the data is continuous/temporal in nature and existing GIS technology falls pretty flat. 
This will be important as smart cities and “Internet of Things” use cases advance.")]),e._v(" "),r("h2",{attrs:{id:"projects"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#projects"}},[e._v("#")]),e._v(" Projects")]),e._v(" "),r("p",[r("em",[e._v("This next section will explore two Tesera-developed projects that employ the Frictionless Data specifications: the Provincial Growth and Yield Initiative Plot Sharing App (PGYI) and Mackenzie DataStream.")])]),e._v(" "),r("h3",{attrs:{id:"_1-provincial-growth-and-yield-initiative-plot-sharing-app"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#_1-provincial-growth-and-yield-initiative-plot-sharing-app"}},[e._v("#")]),e._v(" 1. Provincial Growth and Yield Initiative Plot Sharing App")]),e._v(" "),r("p",[r("img",{attrs:{src:a(405),alt:"The Provincial Growth and Yield Initiative Plot Sharing App"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("The Provincial Growth and Yield Initiative Plot Sharing App")])]),e._v(" "),r("p",[e._v("With this app, we are enabling the 16 government and industrial members of "),r("a",{attrs:{href:"https://fgrow.friresearch.ca/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Forest Growth Organization of Western Canada (FGrOW)"),r("OutboundLink")],1),e._v(" to seamlessly share forest plot measurement data with each other and know that the data will be interoperable and meet their specifications. Specifications were designed primarily with the data manager in mind and were formatted as a contribution guidelines document. From this document, the "),r("a",{attrs:{href:"https://github.com/tesera/datatheme-afgo-pgyi",target:"_blank",rel:"noopener noreferrer"}},[e._v("afgo-pgyi"),r("OutboundLink")],1),e._v(" “Data Theme” was created which contains the Data Package details as well as the several Table Schemas required to assemble a dataset. 
Having access to this large and interoperable dataset will enable their members to improve their growth and yield models and respond to bioclimatic changes as they occur.")]),e._v(" "),r("p",[e._v("We supported FGrOW in creating a set of data standards and then created the Table Schemas to enable a validation workflow. The members upload a set of relational CSV files which are packaged up as Data Packages, uploaded to S3, and then validated by the Lambda Data Package Validator. The results of this initial validation are returned to the user as errors (cannot proceed) or warnings (something is wrong but it can be accepted).")]),e._v(" "),r("p",[r("img",{attrs:{src:a(406),alt:"PGYI import violations"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("PGYI import violations")])]),e._v(" "),r("p",[e._v("At this stage the data is considered imported. If there are no errors the user is able to stage their dataset which uses the Lambda RDS Loader to import the Data Package into an RDS PostGreSQL instance. This triggers a number of more sophisticated validation functions relating to tree growth rates, measurement impossibilities, and sanity checks at the database level.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(407),alt:"PGYI staging violations"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("PGYI staging violations")])]),e._v(" "),r("p",[e._v("Having previously ensured the data meets the Table Schema and was loaded successfully, we have confidence in executing custom database functions without having to handle endless data anomalies and exceptions. 
A simple example check to see if species changes between measurements can be illustrated below:")]),e._v(" "),r("div",{staticClass:"language- extra-class"},[r("pre",{pre:!0,attrs:{class:"language-text"}},[r("code",[e._v("CREATE OR REPLACE FUNCTION staging.get_upload_trees_species_violations(in_upload_id text)\nRETURNS SETOF staging.violation AS $$\n\nBEGIN\n -- RULE 1: tree species should not change over time\n RETURN QUERY\n\n SELECT\n '0'::text,\n staged_tree.upload_id,\n\n staged_tree.source_row_index,\n 'trees'::text,\n array_to_string(ARRAY[staged_tree.company, staged_tree.company_plot_number, staged_tree.tree_number::text], '-'),\n\n 'trees.species.change'::text,\n 'warning'::text,\n format('Tree species changed from %s to %s', committed_tree.species, staged_tree.species)\n\n FROM staging.staged_trees staged_tree\n INNER JOIN staging.committed_trees committed_tree\n USING (company, company_plot_number, tree_number)\n\n WHERE staged_tree.upload_id = in_upload_id\n AND (staged_tree.species NOTNULL AND staged_tree.species <>'No')\n AND staged_tree.species != committed_tree.species;\n\nEND;\n$$ LANGUAGE plpgsql;\n")])])]),r("p",[e._v("Again the user is presented with violations as errors or warnings, and they can choose to commit the plots without errors into the shared database. Essentially this three-step workflow, from imported to staged to committed, allows FGrOW to ensure quality data that will be useful for their modeling and analysis purposes.")]),e._v(" "),r("p",[e._v("FGrOW has built a database that currently has 2400 permanent sample plots, each containing many trees, and altogether tens of millions of measurements across a wide variety of strata including various natural regions and natural sub-regions. 
This database provides the numeric power to produce and refine better growth models and enable companies to adapt their planning and management to real conditions.")]),e._v(" "),r("p",[e._v("There are many cases where industries might wish to bring together measurement data in a consistent way to maximize their productivity. "),r("strong",[e._v("One of the more obvious examples is in agriculture where precision information is increasingly collected at the local or individual farm level, but bringing this information together in aggregate would produce new and greater insight with regard to productivity, broad scale change, and perhaps climate change adaptation strategies.")])]),e._v(" "),r("h3",{attrs:{id:"_2-mackenzie-datastream"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#_2-mackenzie-datastream"}},[e._v("#")]),e._v(" 2. Mackenzie DataStream")]),e._v(" "),r("p",[r("a",{attrs:{href:"http://www.mackenziedatastream.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://www.mackenziedatastream.org/"),r("OutboundLink")],1)]),e._v(" "),r("p",[r("img",{attrs:{src:a(408),alt:"Mackenzie DataStream App"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Mackenzie DataStream App")])]),e._v(" "),r("p",[r("a",{attrs:{href:"http://www.mackenziedatastream.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Mackenzie DataStream"),r("OutboundLink")],1),e._v(" is an open access platform for exploring and sharing water data in the Mackenzie River Basin. DataStream’s mission is to promote knowledge sharing and advance collaborative and evidence-based decision making throughout the Basin. The Mackenzie River Basin is extremely large, measuring 1.8 million square kilometers, so monitoring it is a major challenge. To overcome this challenge, water quality monitoring is carried out by a variety of partners which include communities and Aboriginal, territorial, and federal governments. 
With multiple parties collecting and sharing information, Mackenzie DataStream had to overcome challenges of trust and interoperability.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(409),alt:"The Mackenzie River Basin"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("The Mackenzie River Basin")])]),e._v(" "),r("p",[e._v("Tesera leveraged the Data Package standard as an easy way for Government and community partners alike to import data into the system. We used Table Schema to define the structure and constraints of the Data Themes which we represented in a simple visible way.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(410),alt:"Table fields and validation rules derived from Table Schema"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Table fields and validation rules derived from Table Schema")])]),e._v(" "),r("p",[e._v("The backend on this system also relies on the Data Package Validator and the Relational Database Loader. The observation data is then exposed to the client via a simple "),r("a",{attrs:{href:"http://expressjs.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Express.js"),r("OutboundLink")],1),e._v(" API as JSON. The Frictionless Data specifications help us ensure clean consistent data and make visualization a breeze. We push the data to "),r("a",{attrs:{href:"https://plot.ly/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Plotly"),r("OutboundLink")],1),e._v(" to build the charts as it provides lots of options for scientific plotting, as well as a good api, at a minimal cost.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(411),alt:"Mackenzie DataStream visualization example"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Mackenzie DataStream visualization example")])]),e._v(" "),r("p",[e._v("The Mackenzie DataStream is gaining momentum and partners. 
The "),r("a",{attrs:{href:"http://www.fortnelsonfirstnation.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fort Nelson First Nation"),r("OutboundLink")],1),e._v(" has joined as a contributing partner and the "),r("a",{attrs:{href:"http://www.gov.nt.ca/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Government of Northwest Territories"),r("OutboundLink")],1),e._v(" is looking to apply DataStream to a few other data types and bring on some additional partners in water permitting and cumulative effects monitoring. We think of this as a simple and effective way to make environmental monitoring data more accessible.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(412),alt:"Mackenzie DataStream environmental observation data"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Mackenzie DataStream environmental observation data")])]),e._v(" "),r("p",[e._v("There are many ways to monitor the environment, but bringing the data together according to standards, ensuring that it is loaded correctly, and making it accessible via a simple API seems pretty universal. 
We are working through a UX/UI overhaul and then hope to open source the entire DataStream application for other organizations that are collecting environmental observation data and looking to increase its utility to citizens, scientists, and consultants alike.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(413),alt:"Mackenzie DataStream summary statistics"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("Mackenzie DataStream summary statistics")])]),e._v(" "),r("hr",{staticClass:"footnotes-sep"}),e._v(" "),r("section",{staticClass:"footnotes"},[r("ol",{staticClass:"footnotes-list"},[r("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[r("p",[e._v("Data Packages: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/data-package/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[r("p",[e._v("Table Schema: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/table-schema/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[r("p",[e._v("Amazon Simple Storage Service (Amazon S3): "),r("a",{attrs:{href:"https://aws.amazon.com/s3/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://aws.amazon.com/s3/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[r("p",[e._v("Amazon DynamoDB: "),r("a",{attrs:{href:"https://aws.amazon.com/dynamodb/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://aws.amazon.com/dynamodb/"),r("OutboundLink")],1),e._v(" 
"),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[r("p",[e._v("Elastic Search: "),r("a",{attrs:{href:"https://www.elastic.co/products/elasticsearch",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.elastic.co/products/elasticsearch"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn6"}},[r("p",[e._v("Table Schema Field Constraints: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#constraints",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/table-schema/#constraints"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref6"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn7"}},[r("p",[e._v("Amazon AWS Lambda: "),r("a",{attrs:{href:"https://aws.amazon.com/lambda/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://aws.amazon.com/lambda/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref7"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn8"}},[r("p",[e._v("Water and Environmental Hub: "),r("a",{attrs:{href:"http://watercanada.net/2013/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://watercanada.net/2013/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref8"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn9"}},[r("p",[e._v("Amazon EC2: Virtual Server Hosting: "),r("a",{attrs:{href:"https://aws.amazon.com/ec2/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://aws.amazon.com/ec2/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref9"}},[e._v("↩︎")])])]),e._v(" 
"),r("li",{staticClass:"footnote-item",attrs:{id:"fn10"}},[r("p",[e._v("Kibana: "),r("a",{attrs:{href:"https://www.elastic.co/products/kibana",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.elastic.co/products/kibana"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref10"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/40.d1793c56.js b/assets/js/40.3b6084c5.js similarity index 96% rename from assets/js/40.d1793c56.js rename to assets/js/40.3b6084c5.js index 7001c19f5..74ef8015a 100644 --- a/assets/js/40.d1793c56.js +++ b/assets/js/40.3b6084c5.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[40],{514:function(e,t,a){e.exports=a.p+"assets/img/TUDelft-training.7a056b52.png"},515:function(e,t,a){e.exports=a.p+"assets/img/TU-Delft-feedback.1c752ebd.png"},690:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[r("em",[e._v("Originally published on: "),r("a",{attrs:{href:"https://community.data.4tu.nl/2022/05/19/workshop-on-fair-and-frictionless-workflows-for-tabular-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://community.data.4tu.nl/2022/05/19/workshop-on-fair-and-frictionless-workflows-for-tabular-data/"),r("OutboundLink")],1)])]),e._v(" "),r("p",[e._v("4TU.ResearchData and Frictionless Data joined forces to organize the workshop "),r("a",{attrs:{href:"https://community.data.4tu.nl/2022/03/22/workshop-fair-and-frictionless-workflows-for-tabular-data-online/",target:"_blank",rel:"noopener noreferrer"}},[e._v("“FAIR and frictionless workflows for tabular data”"),r("OutboundLink")],1),e._v(". 
The workshop took place on 28 and 29 April 2022 in an online format")]),e._v(" "),r("p",[e._v("On 28 and 29 April we ran the workshop “FAIR and frictionless workflows for tabular data” in collaboration with members of the "),r("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data project team"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("This workshop was envisioned as a pilot to create training on reproducible and FAIR tools that researchers can use when working with tabular data, from creation to publication. The programme was a mixture of presentations, exercises and hands-on live coding sessions. We got a lot of inspiration from "),r("a",{attrs:{href:"https://carpentries.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("The Carpentries"),r("OutboundLink")],1),e._v(" style of workshops and tried to create a safe, inclusive and interactive learning experience for the participants.")]),e._v(" "),r("p",[e._v("The workshop started with an introduction to Reproducible and FAIR research given by "),r("a",{attrs:{href:"https://www.tudelft.nl/library/research-data-management/r/support/data-stewardship/contact/eirini-zormpa",target:"_blank",rel:"noopener noreferrer"}},[e._v("Eirini Zormpa"),r("OutboundLink")],1),e._v(" (Trainer at 4TU.ResearchData), who also introduced learners to best practices for data organization of tabular data based on the "),r("a",{attrs:{href:"https://datacarpentry.org/spreadsheet-ecology-lesson/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Carpentry for Ecologists lesson"),r("OutboundLink")],1),e._v(". 
You can have a look at "),r("a",{attrs:{href:"https://4turesearchdata-carpentries.github.io/frictionless-data-workshop/data-organisation.html#1",target:"_blank",rel:"noopener noreferrer"}},[e._v("Eirini’s slides here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("The introduction was followed by a hands-on session exploring the "),r("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data framework"),r("OutboundLink")],1),e._v(". The Frictionless Data project has developed a full data management framework for Python to describe, extract, validate, and transform tabular data following the FAIR principles. "),r("a",{attrs:{href:"https://www.linkedin.com/in/lilly-winfree-phd/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Lilly Winfree"),r("OutboundLink")],1),e._v(" used Jupyter Notebook to introduce learners to the different tools, as it helps visualizing the steps of the workflow. You can access the presentation and the notebook (and all the materials of the workshop) used by Lilly in "),r("a",{attrs:{href:"https://github.com/4TUResearchData-Carpentries/FAIR-and-Frictionless-workflows-for-tabular-data-",target:"_blank",rel:"noopener noreferrer"}},[e._v("this GitHub repository"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("During the hands-on coding session, the learners practiced what they were learning on an example dataset from ecology (source of the dataset: "),r("a",{attrs:{href:"https://datacarpentry.org/ecology-workshop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Carpentry for Ecologists"),r("OutboundLink")],1),e._v("). 
Later in the workshop, Katerina Drakoulaki, Frictionless Data fellow and helper, also gave an example of how to apply the framework tools to a "),r("a",{attrs:{href:"https://github.com/4TUResearchData-Carpentries/FAIR-and-Frictionless-workflows-for-tabular-data-/blob/main/03_Frictionless%20Data-MBn%20presentation_28-4-2022.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("dataset coming from the computational musicology field"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("We concluded the workshop with a presentation about "),r("a",{attrs:{href:"https://github.com/4TUResearchData-Carpentries/FAIR-and-Frictionless-workflows-for-tabular-data-/blob/main/04_FAIRandFRictionless%20workflows_Data_Publication.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Publication"),r("OutboundLink")],1),e._v(" by "),r("a",{attrs:{href:"https://www.tudelft.nl/staff/p.m.martinezlavanchy/?cHash=38d458b8cd0f7bc5562cd130725220c6",target:"_blank",rel:"noopener noreferrer"}},[e._v("Paula Martinez Lavanchy"),r("OutboundLink")],1),e._v(", Research Data Officer at 4TU.ResearchData. The presentation focused on why researchers should publish their data, how to select the data to publish and how to choose a good data repository that helps implement the FAIR principles to the researchers’ data. Paula also briefly demoed the features of 4TU.ResearchData using the "),r("a",{attrs:{href:"https://sandbox.data.4tu.nl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("repository sandbox"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Besides the instructors, we also had a great team of helpers that were there in case the learners encountered any technical problems or had questions during the live coding session. 
We would like to give a big thank you to: Nicolas Dintzner – TU Delft Data Steward of the Faculty of Technology, Policy & Management, Katerina Drakoulaki – Postdoctoral researcher, at NKUA & Frictionless Data Fellow, Aleksandra Wilczynska – Data Manager at TU Delft Library & the Digital Competence Center and Sara Petti – Project Manager at Open Knowledge Foundation.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(514),alt:"image"}}),r("br")]),e._v(" "),r("blockquote",[r("p",[r("strong",[e._v("Image:")]),e._v(" Top-left: Eirini Zormpa -Trainer of RDM and Open Science at TU Delft Library & 4TU.ResearchData, Top-right: Lilly Winfree – Product Manager of Frictionless Data at the Open Knowledge Foundation, Bottom: Katerina Drakoulaki – Postdoctoral researcher at NKUA & Frictionless Data fellow."),r("br")])]),e._v(" "),r("p",[e._v("Nineteen learners joined the workshop. The audience had a broad range of backgrounds with both researchers and support staff (e.g. data curator, research data manager, research software engineer, data librarian, etc.) represented. The workshop received quite positive feedback. Most of the learner’s expectations were fulfilled (79%) and they would recommend the workshop to other researchers (93%). It was also nice to know that most of the learners felt that they can apply what they learned immediately and they felt comfortable learning in the workshop.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(515),alt:"image"}}),r("br")]),e._v(" "),r("blockquote",[r("p",[r("strong",[e._v("Images:")]),e._v(" Feedback training event"),r("br")])]),e._v(" "),r("p",[e._v("This feedback from the learners has helped us to start thinking about how to improve future runs of the workshop. For example, we used less time than we had planned, which creates the opportunity to provide instruction on more features of the framework or to add more exercises or practice time. The learners also indicated they would have liked to have a common document (e.g. 
Google doc or HackMD) to share reference material and to document the code that the instructor was typing in case they got lost.")]),e._v(" "),r("p",[e._v("Even though there is room for improvement, the learners appreciated the highly practical approach of the workshop, the space they had to practice what they learned and the overall quality of the Frictionless Data framework tools. Here are some of the strengths that learners mentioned:")]),e._v(" "),r("p",[r("em",[e._v("‘Hands-on, can start using what I learned immediately’")])]),e._v(" "),r("p",[r("em",[e._v("‘Practical experience with the framework and working on shared examples.’")])]),e._v(" "),r("p",[r("em",[e._v("‘Machine readable data and packaging for interoperability through frictionless’")])]),e._v(" "),r("p",[r("em",[e._v("‘Very clear content. Assured assistance in case of technical problems. Adherence to timelines with breaks. Provided many in-depth links. Friendly atmosphere.’")])]),e._v(" "),r("p",[e._v("We at the 4TU.ResearchData team greatly enjoyed this collaboration that allowed us to help build the skills that researchers and other users of the repository need to make research data findable, accessible, interoperable and reproducible (FAIR).")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[40],{515:function(e,t,a){e.exports=a.p+"assets/img/TUDelft-training.7a056b52.png"},516:function(e,t,a){e.exports=a.p+"assets/img/TU-Delft-feedback.1c752ebd.png"},693:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[r("em",[e._v("Originally published on: "),r("a",{attrs:{href:"https://community.data.4tu.nl/2022/05/19/workshop-on-fair-and-frictionless-workflows-for-tabular-data/",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("https://community.data.4tu.nl/2022/05/19/workshop-on-fair-and-frictionless-workflows-for-tabular-data/"),r("OutboundLink")],1)])]),e._v(" "),r("p",[e._v("4TU.ResearchData and Frictionless Data joined forces to organize the workshop "),r("a",{attrs:{href:"https://community.data.4tu.nl/2022/03/22/workshop-fair-and-frictionless-workflows-for-tabular-data-online/",target:"_blank",rel:"noopener noreferrer"}},[e._v("“FAIR and frictionless workflows for tabular data”"),r("OutboundLink")],1),e._v(". The workshop took place on 28 and 29 April 2022 in an online format")]),e._v(" "),r("p",[e._v("On 28 and 29 April we ran the workshop “FAIR and frictionless workflows for tabular data” in collaboration with members of the "),r("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data project team"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("This workshop was envisioned as a pilot to create training on reproducible and FAIR tools that researchers can use when working with tabular data, from creation to publication. The programme was a mixture of presentations, exercises and hands-on live coding sessions. 
We got a lot of inspiration from "),r("a",{attrs:{href:"https://carpentries.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("The Carpentries"),r("OutboundLink")],1),e._v(" style of workshops and tried to create a safe, inclusive and interactive learning experience for the participants.")]),e._v(" "),r("p",[e._v("The workshop started with an introduction to Reproducible and FAIR research given by "),r("a",{attrs:{href:"https://www.tudelft.nl/library/research-data-management/r/support/data-stewardship/contact/eirini-zormpa",target:"_blank",rel:"noopener noreferrer"}},[e._v("Eirini Zormpa"),r("OutboundLink")],1),e._v(" (Trainer at 4TU.ResearchData), who also introduced learners to best practices for data organization of tabular data based on the "),r("a",{attrs:{href:"https://datacarpentry.org/spreadsheet-ecology-lesson/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Carpentry for Ecologists lesson"),r("OutboundLink")],1),e._v(". You can have a look at "),r("a",{attrs:{href:"https://4turesearchdata-carpentries.github.io/frictionless-data-workshop/data-organisation.html#1",target:"_blank",rel:"noopener noreferrer"}},[e._v("Eirini’s slides here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("The introduction was followed by a hands-on session exploring the "),r("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data framework"),r("OutboundLink")],1),e._v(". The Frictionless Data project has developed a full data management framework for Python to describe, extract, validate, and transform tabular data following the FAIR principles. "),r("a",{attrs:{href:"https://www.linkedin.com/in/lilly-winfree-phd/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Lilly Winfree"),r("OutboundLink")],1),e._v(" used Jupyter Notebook to introduce learners to the different tools, as it helps visualizing the steps of the workflow. 
You can access the presentation and the notebook (and all the materials of the workshop) used by Lilly in "),r("a",{attrs:{href:"https://github.com/4TUResearchData-Carpentries/FAIR-and-Frictionless-workflows-for-tabular-data-",target:"_blank",rel:"noopener noreferrer"}},[e._v("this GitHub repository"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("During the hands-on coding session, the learners practiced what they were learning on an example dataset from ecology (source of the dataset: "),r("a",{attrs:{href:"https://datacarpentry.org/ecology-workshop/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Carpentry for Ecologists"),r("OutboundLink")],1),e._v("). Later in the workshop, Katerina Drakoulaki, Frictionless Data fellow and helper, also gave an example of how to apply the framework tools to a "),r("a",{attrs:{href:"https://github.com/4TUResearchData-Carpentries/FAIR-and-Frictionless-workflows-for-tabular-data-/blob/main/03_Frictionless%20Data-MBn%20presentation_28-4-2022.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("dataset coming from the computational musicology field"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("We concluded the workshop with a presentation about "),r("a",{attrs:{href:"https://github.com/4TUResearchData-Carpentries/FAIR-and-Frictionless-workflows-for-tabular-data-/blob/main/04_FAIRandFRictionless%20workflows_Data_Publication.pdf",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Publication"),r("OutboundLink")],1),e._v(" by "),r("a",{attrs:{href:"https://www.tudelft.nl/staff/p.m.martinezlavanchy/?cHash=38d458b8cd0f7bc5562cd130725220c6",target:"_blank",rel:"noopener noreferrer"}},[e._v("Paula Martinez Lavanchy"),r("OutboundLink")],1),e._v(", Research Data Officer at 4TU.ResearchData. The presentation focused on why researchers should publish their data, how to select the data to publish and how to choose a good data repository that helps implement the FAIR principles to the researchers’ data. 
Paula also briefly demoed the features of 4TU.ResearchData using the "),r("a",{attrs:{href:"https://sandbox.data.4tu.nl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("repository sandbox"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("Besides the instructors, we also had a great team of helpers that were there in case the learners encountered any technical problems or had questions during the live coding session. We would like to give a big thank you to: Nicolas Dintzner – TU Delft Data Steward of the Faculty of Technology, Policy & Management, Katerina Drakoulaki – Postdoctoral researcher, at NKUA & Frictionless Data Fellow, Aleksandra Wilczynska – Data Manager at TU Delft Library & the Digital Competence Center and Sara Petti – Project Manager at Open Knowledge Foundation.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(515),alt:"image"}}),r("br")]),e._v(" "),r("blockquote",[r("p",[r("strong",[e._v("Image:")]),e._v(" Top-left: Eirini Zormpa -Trainer of RDM and Open Science at TU Delft Library & 4TU.ResearchData, Top-right: Lilly Winfree – Product Manager of Frictionless Data at the Open Knowledge Foundation, Bottom: Katerina Drakoulaki – Postdoctoral researcher at NKUA & Frictionless Data fellow."),r("br")])]),e._v(" "),r("p",[e._v("Nineteen learners joined the workshop. The audience had a broad range of backgrounds with both researchers and support staff (e.g. data curator, research data manager, research software engineer, data librarian, etc.) represented. The workshop received quite positive feedback. Most of the learner’s expectations were fulfilled (79%) and they would recommend the workshop to other researchers (93%). 
It was also nice to know that most of the learners felt that they can apply what they learned immediately and they felt comfortable learning in the workshop.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(516),alt:"image"}}),r("br")]),e._v(" "),r("blockquote",[r("p",[r("strong",[e._v("Images:")]),e._v(" Feedback training event"),r("br")])]),e._v(" "),r("p",[e._v("This feedback from the learners has helped us to start thinking about how to improve future runs of the workshop. For example, we used less time than we had planned, which creates the opportunity to provide instruction on more features of the framework or to add more exercises or practice time. The learners also indicated they would have liked to have a common document (e.g. Google doc or HackMD) to share reference material and to document the code that the instructor was typing in case they got lost.")]),e._v(" "),r("p",[e._v("Even though there is room for improvement, the learners appreciated the highly practical approach of the workshop, the space they had to practice what they learned and the overall quality of the Frictionless Data framework tools. Here are some of the strengths that learners mentioned:")]),e._v(" "),r("p",[r("em",[e._v("‘Hands-on, can start using what I learned immediately’")])]),e._v(" "),r("p",[r("em",[e._v("‘Practical experience with the framework and working on shared examples.’")])]),e._v(" "),r("p",[r("em",[e._v("‘Machine readable data and packaging for interoperability through frictionless’")])]),e._v(" "),r("p",[r("em",[e._v("‘Very clear content. Assured assistance in case of technical problems. Adherence to timelines with breaks. Provided many in-depth links. 
Friendly atmosphere.’")])]),e._v(" "),r("p",[e._v("We at the 4TU.ResearchData team greatly enjoyed this collaboration that allowed us to help build the skills that researchers and other users of the repository need to make research data findable, accessible, interoperable and reproducible (FAIR).")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/41.258e3c8e.js b/assets/js/41.00d298c5.js similarity index 98% rename from assets/js/41.258e3c8e.js rename to assets/js/41.00d298c5.js index 93d249b89..89a802f3e 100644 --- a/assets/js/41.258e3c8e.js +++ b/assets/js/41.00d298c5.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[41],{353:function(t,s,a){},531:function(t,s,a){"use strict";a(353)},717:function(t,s,a){"use strict";a.r(s);a(531);var l=a(29),i=Object(l.a)({},(function(){var t=this,s=t.$createElement,a=t._self._c||s;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("h1",{attrs:{id:"page-frontmatter-title"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#page-frontmatter-title"}},[t._v("#")]),t._v(" "+t._s(t.$page.frontmatter.title))]),t._v(" "),a("p",[t._v("Our mission is to bring simplicity and gracefulness to the messy world of data. We build products for developers and data engineers. 
And those who aspire to become one.")]),t._v(" "),a("div",{staticClass:"container flex flex-row"},[a("h2",{attrs:{id:"light-logo"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#light-logo"}},[t._v("#")]),t._v(" Light Logo")]),t._v(" "),a("div",{staticClass:"containerx w-full self-end text-right"},[a("a",{staticClass:"inline-block",attrs:{href:"/img/frictionless-color-logo.svg",download:""}},[t._v("\n .svg\n ")])])]),t._v(" "),a("hr"),t._v(" "),a("div",{staticClass:"w-full pt-4 pb-5 mx-auto"},[a("div",{staticClass:"w-full shadow subpixel-antialiased rounded h-64 bg-white border-gray-100 mx-auto"},[a("div",{staticClass:"pl-6 h-auto font-mono text-xs bg-white"},[a("img",{staticClass:"w-1/5 pt-12 pl-64",attrs:{src:"/img/frictionless-color-logo.svg"}})])])]),t._v(" "),a("div",{staticClass:"container flex flex-row"},[a("h2",{attrs:{id:"dark-logo"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dark-logo"}},[t._v("#")]),t._v(" Dark Logo")]),t._v(" "),a("div",{staticClass:"containerx w-full self-end text-right"},[a("a",{staticClass:"inline-block",attrs:{href:"/img/frictionless-black-logo.svg",download:""}},[t._v("\n .svg\n ")])])]),t._v(" "),a("hr"),t._v(" "),a("div",{staticClass:"w-full pt-4 pb-5 mx-auto"},[a("div",{staticClass:"w-full shadow subpixel-antialiased rounded h-64 bg-white border-gray-100 mx-auto"},[a("div",{staticClass:"pl-6 pt-1 h-auto font-mono text-xs bg-white"},[a("img",{staticClass:"w-1/5 pt-12 pl-64",attrs:{src:"/img/frictionless-black-logo.svg"}})])])]),t._v(" "),a("div",{staticClass:"container flex flex-row"},[a("h2",{attrs:{id:"light-logotype"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#light-logotype"}},[t._v("#")]),t._v(" Light Logotype")]),t._v(" "),a("div",{staticClass:"containerx w-full self-end text-right"},[a("a",{staticClass:"inline-block",attrs:{href:"/img/frictionless-color-full-logo.svg",download:""}},[t._v("\n .svg\n ")])])]),t._v(" "),a("hr"),t._v(" "),a("div",{staticClass:"w-full pt-4 pb-5 
mx-auto"},[a("div",{staticClass:"w-full shadow subpixel-antialiased rounded h-64 bg-white border-gray-100 mx-auto"},[a("div",{staticClass:"pl-1 pt-1 h-auto font-mono text-xs bg-white"},[a("img",{staticClass:"w-1/2 pt-16 pl-40",attrs:{src:"/img/frictionless-color-full-logo.svg"}})])])]),t._v(" "),a("div",{staticClass:"container flex flex-row"},[a("h2",{attrs:{id:"dark-logotype"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dark-logotype"}},[t._v("#")]),t._v(" Dark Logotype")]),t._v(" "),a("div",{staticClass:"containerx w-full self-end text-right"},[a("a",{staticClass:"inline-block",attrs:{href:"/img/frictionless-black-full-logo-blackfont.svg",download:""}},[t._v("\n .svg\n ")])])]),t._v(" "),a("hr"),t._v(" "),a("div",{staticClass:"w-full pt-4 pb-5 mx-auto"},[a("div",{staticClass:"w-full shadow subpixel-antialiased rounded h-64 bg-white border-gray-100 mx-auto"},[a("div",{staticClass:"pl-1 pt-1 h-auto font-mono text-xs bg-white"},[a("img",{staticClass:"w-1/2 pt-16 pl-40",attrs:{src:"/img/frictionless-black-full-logo-blackfont.svg"}})])])])])}),[],!1,null,"c0ada04e",null);s.default=i.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[41],{353:function(t,s,a){},531:function(t,s,a){"use strict";a(353)},716:function(t,s,a){"use strict";a.r(s);a(531);var l=a(29),i=Object(l.a)({},(function(){var t=this,s=t.$createElement,a=t._self._c||s;return a("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[a("h1",{attrs:{id:"page-frontmatter-title"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#page-frontmatter-title"}},[t._v("#")]),t._v(" "+t._s(t.$page.frontmatter.title))]),t._v(" "),a("p",[t._v("Our mission is to bring simplicity and gracefulness to the messy world of data. We build products for developers and data engineers. 
And those who aspire to become one.")]),t._v(" "),a("div",{staticClass:"container flex flex-row"},[a("h2",{attrs:{id:"light-logo"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#light-logo"}},[t._v("#")]),t._v(" Light Logo")]),t._v(" "),a("div",{staticClass:"containerx w-full self-end text-right"},[a("a",{staticClass:"inline-block",attrs:{href:"/img/frictionless-color-logo.svg",download:""}},[t._v("\n .svg\n ")])])]),t._v(" "),a("hr"),t._v(" "),a("div",{staticClass:"w-full pt-4 pb-5 mx-auto"},[a("div",{staticClass:"w-full shadow subpixel-antialiased rounded h-64 bg-white border-gray-100 mx-auto"},[a("div",{staticClass:"pl-6 h-auto font-mono text-xs bg-white"},[a("img",{staticClass:"w-1/5 pt-12 pl-64",attrs:{src:"/img/frictionless-color-logo.svg"}})])])]),t._v(" "),a("div",{staticClass:"container flex flex-row"},[a("h2",{attrs:{id:"dark-logo"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dark-logo"}},[t._v("#")]),t._v(" Dark Logo")]),t._v(" "),a("div",{staticClass:"containerx w-full self-end text-right"},[a("a",{staticClass:"inline-block",attrs:{href:"/img/frictionless-black-logo.svg",download:""}},[t._v("\n .svg\n ")])])]),t._v(" "),a("hr"),t._v(" "),a("div",{staticClass:"w-full pt-4 pb-5 mx-auto"},[a("div",{staticClass:"w-full shadow subpixel-antialiased rounded h-64 bg-white border-gray-100 mx-auto"},[a("div",{staticClass:"pl-6 pt-1 h-auto font-mono text-xs bg-white"},[a("img",{staticClass:"w-1/5 pt-12 pl-64",attrs:{src:"/img/frictionless-black-logo.svg"}})])])]),t._v(" "),a("div",{staticClass:"container flex flex-row"},[a("h2",{attrs:{id:"light-logotype"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#light-logotype"}},[t._v("#")]),t._v(" Light Logotype")]),t._v(" "),a("div",{staticClass:"containerx w-full self-end text-right"},[a("a",{staticClass:"inline-block",attrs:{href:"/img/frictionless-color-full-logo.svg",download:""}},[t._v("\n .svg\n ")])])]),t._v(" "),a("hr"),t._v(" "),a("div",{staticClass:"w-full pt-4 pb-5 
mx-auto"},[a("div",{staticClass:"w-full shadow subpixel-antialiased rounded h-64 bg-white border-gray-100 mx-auto"},[a("div",{staticClass:"pl-1 pt-1 h-auto font-mono text-xs bg-white"},[a("img",{staticClass:"w-1/2 pt-16 pl-40",attrs:{src:"/img/frictionless-color-full-logo.svg"}})])])]),t._v(" "),a("div",{staticClass:"container flex flex-row"},[a("h2",{attrs:{id:"dark-logotype"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#dark-logotype"}},[t._v("#")]),t._v(" Dark Logotype")]),t._v(" "),a("div",{staticClass:"containerx w-full self-end text-right"},[a("a",{staticClass:"inline-block",attrs:{href:"/img/frictionless-black-full-logo-blackfont.svg",download:""}},[t._v("\n .svg\n ")])])]),t._v(" "),a("hr"),t._v(" "),a("div",{staticClass:"w-full pt-4 pb-5 mx-auto"},[a("div",{staticClass:"w-full shadow subpixel-antialiased rounded h-64 bg-white border-gray-100 mx-auto"},[a("div",{staticClass:"pl-1 pt-1 h-auto font-mono text-xs bg-white"},[a("img",{staticClass:"w-1/2 pt-16 pl-40",attrs:{src:"/img/frictionless-black-full-logo-blackfont.svg"}})])])])])}),[],!1,null,"c0ada04e",null);s.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/42.e817079f.js b/assets/js/42.a92b5803.js similarity index 97% rename from assets/js/42.e817079f.js rename to assets/js/42.a92b5803.js index 94e4e4161..de8d991e1 100644 --- a/assets/js/42.e817079f.js +++ b/assets/js/42.a92b5803.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[42],{354:function(t,a,s){},532:function(t,a,s){"use strict";s(354)},720:function(t,a,s){"use strict";s.r(a);s(532);var e=s(29),r=Object(e.a)({},(function(){var t=this,a=t.$createElement,s=t._self._c||a;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("h1",{attrs:{id:"frictionless-roadmap"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-roadmap"}},[t._v("#")]),t._v(" Frictionless Roadmap")]),t._v(" 
"),s("ul",{staticClass:"timeline"},[s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://framework.frictionlessdata.io/"}},[t._v("Frictionless Framework (v4)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Dec 2020")]),t._v(" "),s("p",[t._v("A new Frictionless Framework is a full rework of the previous generation software stack composed by tabulator/tableschema/datapackage/etc libraries")])]),t._v(" "),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://repository.frictionlessdata.io/"}},[t._v("Frictionless Repository (v1)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Jun 2021")]),t._v(" "),s("p",[t._v("Data management service that brings continuous data validation to tabular data in your repository via Github Action")])]),t._v(" "),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://framework.frictionlessdata.io/"}},[t._v("Livemark (beta)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Dec 2021")]),t._v(" "),s("p",[t._v("Data presentation framework for Python that generates static sites from extended Markdown with interactive charts, tables, scripts, and other features")])]),t._v(" "),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://repository.frictionlessdata.io/"}},[t._v("Frictionless Repository (v2)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Sep 2022")]),t._v(" "),s("p",[t._v("Frictionless Repository is going to be updated to Frictionless Framework v5")])]),t._v(" "),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://framework.frictionlessdata.io/"}},[t._v("Frictionless Framework (v5)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Dec 2022")]),t._v(" "),s("p",[t._v("A year since the first framework release we're going to publish a new version with some low-level breaking changes.")])]),t._v(" 
"),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:""}},[t._v("Frictionless Application (beta)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Mar 2023")]),t._v(" "),s("p",[t._v("We're looking forward to finish our application work and release it to a broad audience.")])]),t._v(" "),s("li",{staticClass:"current"},[s("a",{attrs:{target:"_blank",href:""}},[t._v("Frictionless Application (v1)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Jun 2023")]),t._v(" "),s("p",[t._v("We're going to publish a stable release version of Frictionless Application")])])])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[42],{354:function(t,a,s){},532:function(t,a,s){"use strict";s(354)},718:function(t,a,s){"use strict";s.r(a);s(532);var e=s(29),r=Object(e.a)({},(function(){var t=this,a=t.$createElement,s=t._self._c||a;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("h1",{attrs:{id:"frictionless-roadmap"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#frictionless-roadmap"}},[t._v("#")]),t._v(" Frictionless Roadmap")]),t._v(" "),s("ul",{staticClass:"timeline"},[s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://framework.frictionlessdata.io/"}},[t._v("Frictionless Framework (v4)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Dec 2020")]),t._v(" "),s("p",[t._v("A new Frictionless Framework is a full rework of the previous generation software stack composed by tabulator/tableschema/datapackage/etc libraries")])]),t._v(" "),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://repository.frictionlessdata.io/"}},[t._v("Frictionless Repository (v1)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Jun 2021")]),t._v(" "),s("p",[t._v("Data management service that brings continuous data validation to tabular data in your 
repository via Github Action")])]),t._v(" "),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://framework.frictionlessdata.io/"}},[t._v("Livemark (beta)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Dec 2021")]),t._v(" "),s("p",[t._v("Data presentation framework for Python that generates static sites from extended Markdown with interactive charts, tables, scripts, and other features")])]),t._v(" "),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://repository.frictionlessdata.io/"}},[t._v("Frictionless Repository (v2)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Sep 2022")]),t._v(" "),s("p",[t._v("Frictionless Repository is going to be updated to Frictionless Framework v5")])]),t._v(" "),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:"https://framework.frictionlessdata.io/"}},[t._v("Frictionless Framework (v5)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Dec 2022")]),t._v(" "),s("p",[t._v("A year since the first framework release we're going to publish a new version with some low-level breaking changes.")])]),t._v(" "),s("li",{staticClass:"done"},[s("a",{attrs:{target:"_blank",href:""}},[t._v("Frictionless Application (beta)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Mar 2023")]),t._v(" "),s("p",[t._v("We're looking forward to finish our application work and release it to a broad audience.")])]),t._v(" "),s("li",{staticClass:"current"},[s("a",{attrs:{target:"_blank",href:""}},[t._v("Frictionless Application (v1)")]),t._v(" "),s("a",{staticClass:"float-right",attrs:{href:"#"}},[t._v("Jun 2023")]),t._v(" "),s("p",[t._v("We're going to publish a stable release version of Frictionless Application")])])])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/44.6f8fce99.js b/assets/js/44.7e6af297.js similarity index 97% rename from assets/js/44.6f8fce99.js 
rename to assets/js/44.7e6af297.js index c82a5485b..f7395b22a 100644 --- a/assets/js/44.6f8fce99.js +++ b/assets/js/44.7e6af297.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[44],{402:function(t,e,a){t.exports=a.p+"assets/img/dataship.91c1ed64.gif"},563:function(t,e,a){"use strict";a.r(e);var o=a(29),s=Object(o.a)({},(function(){var t=this,e=t.$createElement,o=t._self._c||e;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[o("a",{attrs:{href:"https://dataship.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Dataship"),o("OutboundLink")],1),t._v(" is a way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free. It allows you to create notebooks that hold and deliver your data, as well as text, images and inline scripts for doing analysis and visualization. The people you share it with can read, execute and even edit a copy of your notebook and publish the remixed version as a fork.")]),t._v(" "),o("p",[t._v("One of the main challenges we face with data is that it’s hard to share it with others. Tools like Jupyter (iPython notebook)"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn1",id:"fnref1"}},[t._v("[1]")])]),t._v(" make it much easier and more affordable to do analysis (with the help of open source projects like numpy"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn2",id:"fnref2"}},[t._v("[2]")])]),t._v(" and pandas"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn3",id:"fnref3"}},[t._v("[3]")])]),t._v("). What they don’t do is allow you to "),o("em",[t._v("cheaply and easily share that with the world")]),t._v(". 
"),o("strong",[t._v("If it were as easy to share data and analysis as it is to share pictures of your breakfast, the world would be a more enlightened place.")]),t._v(" Dataship is helping to build that world.")]),t._v(" "),o("p",[t._v("Every notebook on Dataship is also a Data Package"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn4",id:"fnref4"}},[t._v("[4]")])]),t._v(". Like other Data Packages it can be downloaded, along with its data, just by giving its URL to software like data-cli"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn5",id:"fnref5"}},[t._v("[5]")])]),t._v(". Additionally, working with existing Data Packages is easy. Just as you can fork other notebooks, you can also fork existing Data Packages, even when they’re located somewhere else, like GitHub.")]),t._v(" "),o("p",[o("img",{attrs:{src:a(402),alt:"Dataship GIF"}}),t._v(" "),o("br"),t._v(" "),o("em",[t._v("Dataship in action")])]),t._v(" "),o("p",[t._v("Every cell in a notebook is represented by a resource entry"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn6",id:"fnref6"}},[t._v("[6]")])]),t._v(" in an underlying Data Package. This also allows for interesting possibilities. One of these is executable Data Packages. 
Since the code is included inline and its dependencies are explicit and bounded, very simple software could be written to execute a Data Package-based notebook from the command line, printing the results to the console and writing images to the current directory.")]),t._v(" "),o("p",[t._v("It would be useful to have a JavaScript version of some of the functionality in goodtables"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn7",id:"fnref7"}},[t._v("[7]")])]),t._v(" available for use, specifically header detection in parsed csv contents (output of PapaParse), as well as an option in dpm to not put things in a ‘datapackages’ folder, as I rarely need this when downloading a dataset.")]),t._v(" "),o("p",[t._v("dpm, mentioned above, is now deprecated. Check out DataHub’s "),o("a",{attrs:{href:"https://github.com/datahq/data-cli",target:"_blank",rel:"noopener noreferrer"}},[t._v("data-cli"),o("OutboundLink")],1)]),t._v(" "),o("p",[t._v("My next task will be building and integrating the machine learning and neural network components into Dataship. After that I’ll be focusing on features that allow organizations to store private encrypted data, in addition to the default public storage. The focus of the platform will always be open data, but hosting closed data sources will allow us to nudge people towards sharing, when it makes sense.")]),t._v(" "),o("p",[t._v("As for additional use cases, the volume of personal data is growing exponentially- from medical data to internet activity and media consumption. These are just a few existing examples. The rise of the Internet of Things will only accelerate this. People are also beginning to see the value in controlling their data themselves. 
Providing mechanisms for doing this will likely become important over the next ten years.")]),t._v(" "),o("hr",{staticClass:"footnotes-sep"}),t._v(" "),o("section",{staticClass:"footnotes"},[o("ol",{staticClass:"footnotes-list"},[o("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[o("p",[t._v("Jupyter Notebook: "),o("a",{attrs:{href:"http://jupyter.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://jupyter.org/"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[o("p",[t._v("NumPy: Python package for scientific computing: "),o("a",{attrs:{href:"http://www.numpy.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://www.numpy.org"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[o("p",[t._v("Pandas: Python package for data analysis: "),o("a",{attrs:{href:"http://pandas.pydata.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://pandas.pydata.org/"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[o("p",[t._v("Data Packages: "),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[o("p",[t._v("DataHub’s data commandline tool: "),o("a",{attrs:{href:"https://github.com/datahq/data-cli",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/datahq/data-cli"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn6"}},[o("p",[t._v("Data Package Resource: 
"),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#resource-information",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://specs.frictionlessdata.io/data-package/#resource-information"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref6"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn7"}},[o("p",[t._v("goodtables: "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://try.goodtables.io"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref7"}},[t._v("↩︎")])])])])])])}),[],!1,null,null,null);e.default=s.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[44],{400:function(t,e,a){t.exports=a.p+"assets/img/dataship.91c1ed64.gif"},562:function(t,e,a){"use strict";a.r(e);var o=a(29),s=Object(o.a)({},(function(){var t=this,e=t.$createElement,o=t._self._c||e;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[o("a",{attrs:{href:"https://dataship.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Dataship"),o("OutboundLink")],1),t._v(" is a way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free. It allows you to create notebooks that hold and deliver your data, as well as text, images and inline scripts for doing analysis and visualization. The people you share it with can read, execute and even edit a copy of your notebook and publish the remixed version as a fork.")]),t._v(" "),o("p",[t._v("One of the main challenges we face with data is that it’s hard to share it with others. 
Tools like Jupyter (iPython notebook)"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn1",id:"fnref1"}},[t._v("[1]")])]),t._v(" make it much easier and more affordable to do analysis (with the help of open source projects like numpy"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn2",id:"fnref2"}},[t._v("[2]")])]),t._v(" and pandas"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn3",id:"fnref3"}},[t._v("[3]")])]),t._v("). What they don’t do is allow you to "),o("em",[t._v("cheaply and easily share that with the world")]),t._v(". "),o("strong",[t._v("If it were as easy to share data and analysis as it is to share pictures of your breakfast, the world would be a more enlightened place.")]),t._v(" Dataship is helping to build that world.")]),t._v(" "),o("p",[t._v("Every notebook on Dataship is also a Data Package"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn4",id:"fnref4"}},[t._v("[4]")])]),t._v(". Like other Data Packages it can be downloaded, along with its data, just by giving its URL to software like data-cli"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn5",id:"fnref5"}},[t._v("[5]")])]),t._v(". Additionally, working with existing Data Packages is easy. Just as you can fork other notebooks, you can also fork existing Data Packages, even when they’re located somewhere else, like GitHub.")]),t._v(" "),o("p",[o("img",{attrs:{src:a(400),alt:"Dataship GIF"}}),t._v(" "),o("br"),t._v(" "),o("em",[t._v("Dataship in action")])]),t._v(" "),o("p",[t._v("Every cell in a notebook is represented by a resource entry"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn6",id:"fnref6"}},[t._v("[6]")])]),t._v(" in an underlying Data Package. This also allows for interesting possibilities. One of these is executable Data Packages. 
Since the code is included inline and its dependencies are explicit and bounded, very simple software could be written to execute a Data Package-based notebook from the command line, printing the results to the console and writing images to the current directory.")]),t._v(" "),o("p",[t._v("It would be useful to have a JavaScript version of some of the functionality in goodtables"),o("sup",{staticClass:"footnote-ref"},[o("a",{attrs:{href:"#fn7",id:"fnref7"}},[t._v("[7]")])]),t._v(" available for use, specifically header detection in parsed csv contents (output of PapaParse), as well as an option in dpm to not put things in a ‘datapackages’ folder, as I rarely need this when downloading a dataset.")]),t._v(" "),o("p",[t._v("dpm, mentioned above, is now deprecated. Check out DataHub’s "),o("a",{attrs:{href:"https://github.com/datahq/data-cli",target:"_blank",rel:"noopener noreferrer"}},[t._v("data-cli"),o("OutboundLink")],1)]),t._v(" "),o("p",[t._v("My next task will be building and integrating the machine learning and neural network components into Dataship. After that I’ll be focusing on features that allow organizations to store private encrypted data, in addition to the default public storage. The focus of the platform will always be open data, but hosting closed data sources will allow us to nudge people towards sharing, when it makes sense.")]),t._v(" "),o("p",[t._v("As for additional use cases, the volume of personal data is growing exponentially- from medical data to internet activity and media consumption. These are just a few existing examples. The rise of the Internet of Things will only accelerate this. People are also beginning to see the value in controlling their data themselves. 
Providing mechanisms for doing this will likely become important over the next ten years.")]),t._v(" "),o("hr",{staticClass:"footnotes-sep"}),t._v(" "),o("section",{staticClass:"footnotes"},[o("ol",{staticClass:"footnotes-list"},[o("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[o("p",[t._v("Jupyter Notebook: "),o("a",{attrs:{href:"http://jupyter.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://jupyter.org/"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[o("p",[t._v("NumPy: Python package for scientific computing: "),o("a",{attrs:{href:"http://www.numpy.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://www.numpy.org"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[o("p",[t._v("Pandas: Python package for data analysis: "),o("a",{attrs:{href:"http://pandas.pydata.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://pandas.pydata.org/"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[o("p",[t._v("Data Packages: "),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[o("p",[t._v("DataHub’s data commandline tool: "),o("a",{attrs:{href:"https://github.com/datahq/data-cli",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/datahq/data-cli"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn6"}},[o("p",[t._v("Data Package Resource: 
"),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#resource-information",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://specs.frictionlessdata.io/data-package/#resource-information"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref6"}},[t._v("↩︎")])])]),t._v(" "),o("li",{staticClass:"footnote-item",attrs:{id:"fn7"}},[o("p",[t._v("goodtables: "),o("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://try.goodtables.io"),o("OutboundLink")],1),t._v(" "),o("a",{staticClass:"footnote-backref",attrs:{href:"#fnref7"}},[t._v("↩︎")])])])])])])}),[],!1,null,null,null);e.default=s.exports}}]); \ No newline at end of file diff --git a/assets/js/45.98ef683b.js b/assets/js/45.18c8dc64.js similarity index 98% rename from assets/js/45.98ef683b.js rename to assets/js/45.18c8dc64.js index 8ba71fadf..7e737c05d 100644 --- a/assets/js/45.98ef683b.js +++ b/assets/js/45.18c8dc64.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[45],{417:function(e,t,a){e.exports=a.p+"assets/img/cmso-1.25f92f63.png"},569:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("Researchers worldwide try to understand how cells move, a process extremely important for many physiological and pathological conditions. "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/Cell_migration",target:"_blank",rel:"noopener noreferrer"}},[e._v("Cell migration"),r("OutboundLink")],1),e._v(" is in fact involved in many processes, like wound healing,neuronal development and cancer invasion. 
The "),r("a",{attrs:{href:"https://cmso.science/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Cell Migration Standardization Organization"),r("OutboundLink")],1),e._v(" (CMSO) is a community building standards for cell migration data, in order to enable data sharing in the field. The organization has three main working groups:")]),e._v(" "),r("ul",[r("li",[e._v("Minimal reporting requirement (developing "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/MIACME",target:"_blank",rel:"noopener noreferrer"}},[e._v("MIACME"),r("OutboundLink")],1),e._v(", i.e. the Minimum Information About a Cell Migration Experiment)")]),e._v(" "),r("li",[e._v("Controlled Vocabularies")]),e._v(" "),r("li",[e._v("Data Formats and APIs")])]),e._v(" "),r("p",[e._v("In our last working group, we discussed where the Data Package specifications"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(" could be used or expanded for the definition of a standard format and the corresponding libraries to interact with these standards. In particular, we have started to address the standardization of cell tracking data. 
This is data produced using tracking software that reconstructs cell movement in time based on images from a microscope.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(417),alt:"Diagram"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("In pink, the "),r("a",{attrs:{href:"http://isa-tools.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("ISA"),r("OutboundLink")],1),e._v(" (Investigation Study Assay) model to annotate the experimental metadata; in blue, the "),r("a",{attrs:{href:"http://www.openmicroscopy.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OME"),r("OutboundLink")],1),e._v(" (Open Microscopy Environment) model for the imaging data; in green, our biotracks format based on the Data Package specification for the analytics data (cell tracking, positions, features etc.);in purple, CV: Controlled Vocabulary; and in turquoise, "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/MIACME",target:"_blank",rel:"noopener noreferrer"}},[e._v("MIACME"),r("OutboundLink")],1),e._v(": Minimum Information About a Cell Migration Experiment. "),r("a",{attrs:{href:"https://creativecommons.org/licenses/by-sa/4.0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CC BY-SA 4.0"),r("OutboundLink")],1),e._v(" Credit: Paola Masuzzo (text) and CMSO (diagram).")])]),e._v(" "),r("p",[e._v("CMSO deals specifically with cell migration data (a subject of cell biology). Our main challenge lies in the heterogeneity of the data. 
This diversity has its origin in two factors:")]),e._v(" "),r("ul",[r("li",[r("strong",[e._v("Experimentally")]),e._v(": Cell migration data can be produced using many diverse techniques (imaging, non-imaging, dynamic, static, high-throughput/screening, etc.)")]),e._v(" "),r("li",[r("strong",[e._v("Analytically")]),e._v(": These data are produced using many diverse software packages, each of these writing data to specific (sometimes proprietary) file formats.")])]),e._v(" "),r("p",[e._v("This diversity hampers (or at least makes very difficult) procedures like meta-analysis, data integration, data mining, and last but not least, data "),r("em",[e._v("reproducibility")]),e._v(".")]),e._v(" "),r("p",[e._v("CMSO has developed and is about to release the first specification of a "),r("a",{attrs:{href:"https://cellmigstandorg.github.io/Tracks/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Cell Tracking format"),r("OutboundLink")],1),e._v(". This specification is built on a tabular representation, i.e. data are stored in tables. Current v0.1 of this specification can be seen at "),r("a",{attrs:{href:"https://cellmigstandorg.github.io/Tracks/v0.1/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("CMSO is using the "),r("em",[e._v("Tabular")]),e._v(" Data Package"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(" specification to represent cell migration-derived tracking data, as illustrated"),r("br"),e._v(" "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/biotracks/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". The specification is used for two goals:")]),e._v(" "),r("ol",[r("li",[r("strong",[e._v("Create a Data Package representation")]),e._v(" where the data—in our case objects (e.g. 
cells detected in microscopy images), links and optionally tracks—are stored in CSV files, while metadata and schema"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(" information are stored in a JSON file.")]),e._v(" "),r("li",[r("strong",[e._v("Write")]),e._v(" this Data Package to a pandas"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(" dataframe, to aid quick inspection and visualization.")])]),e._v(" "),r("p",[e._v("You can see some examples "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/biotracks/tree/master/examples",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("I am an Open Science fan and advocate, so I try to keep up to date with the initiatives of the"),r("br"),e._v(" "),r("a",{attrs:{href:"https://okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International"),r("OutboundLink")],1),e._v(" teams. I think I first became aware of Frictionless Data when I saw a tweet and I checked the specs out. Also, CMSO really wanted to keep a possible specification and file format light and simple. So different people of the team must have googled for ‘CSV and JSON formats’ or something like that, and Frictionless Data popped out 😃.")]),e._v(" "),r("p",[e._v("I have opened a couple of issues on the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/specs",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub page of the spec"),r("OutboundLink")],1),e._v(", detailing what I would like to see developed in the Frictionless Data project. 
The CMSO is not sure yet if the Data Package representation will be the one we’ll go for in the very end, because we would first like to know how sustainable/sustained this spec will be in the future.")]),e._v(" "),r("p",[e._v("CMSO is looking into expanding the "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/biotracks/tree/master/examples",target:"_blank",rel:"noopener noreferrer"}},[e._v("list of examples"),r("OutboundLink")],1),e._v(" we have so far in terms of tracking software. Personally, I would like to choose a reference data set (a live-cell, time-lapse microscopy data set), and run different cell tracking algorithms/software packages on it. Then I want to put the results into a common, light and easy-to-interpret CSV+JSON format (the biotracks format), and show people how data containerization"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn5",id:"fnref5"}},[e._v("[5]")])]),e._v(" can be the way to go to enable research data exchange and knowledge discovery at large.")]),e._v(" "),r("p",[e._v("With most other specifications, cell tracking data are stored in tabular format, but metadata are never kept together with the data, which makes data interpretation and sharing very difficult. The Frictionless Data specifications take good care of this aspect. Some other formats are based on XML"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn6",id:"fnref6"}},[e._v("[6]")])]),e._v(" annotation, which certainly does the job, but are perhaps heavier (even though perhaps more sustainable in the long term). I hate Excel formats, and unfortunately I need to parse those too. 
I love the integration with Python"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn7",id:"fnref7"}},[e._v("[7]")])]),e._v(" and the pandas"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn4",id:"fnref4:1"}},[e._v("[4:1]")])]),e._v(" system, this is a big plus when doing data science.")]),e._v(" "),r("p",[e._v("As a researcher, I mostly deal with research data. I am pretty sure if this could work for cell migration data, it could work for many cell biology disciplines as well. I recommend speaking to more researchers and data producers to determine additional use cases!")]),e._v(" "),r("hr",{staticClass:"footnotes-sep"}),e._v(" "),r("section",{staticClass:"footnotes"},[r("ol",{staticClass:"footnotes-list"},[r("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[r("p",[e._v("Data Package: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/data-package"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[r("p",[e._v("Tabular Data Package: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/tabular-data-package"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[r("p",[e._v("Table Schema: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/table-schema"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[r("p",[e._v("Pandas: Python package for data 
analysis: "),r("a",{attrs:{href:"http://pandas.pydata.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://pandas.pydata.org/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")]),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4:1"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[r("p",[e._v("Design Philosophy: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("specs"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn6"}},[r("p",[e._v("Extensible Markup Language: "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/XML",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://en.wikipedia.org/wiki/XML"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref6"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn7"}},[r("p",[e._v("Data Package-aware libraries in Python: "),r("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/datapackage-py"),r("OutboundLink")],1),e._v(", "),r("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/tableschema-py"),r("OutboundLink")],1),e._v(", "),r("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables-py"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref7"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file 
+(window.webpackJsonp=window.webpackJsonp||[]).push([[45],{416:function(e,t,a){e.exports=a.p+"assets/img/cmso-1.25f92f63.png"},568:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("Researchers worldwide try to understand how cells move, a process extremely important for many physiological and pathological conditions. "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/Cell_migration",target:"_blank",rel:"noopener noreferrer"}},[e._v("Cell migration"),r("OutboundLink")],1),e._v(" is in fact involved in many processes, like wound healing,neuronal development and cancer invasion. The "),r("a",{attrs:{href:"https://cmso.science/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Cell Migration Standardization Organization"),r("OutboundLink")],1),e._v(" (CMSO) is a community building standards for cell migration data, in order to enable data sharing in the field. The organization has three main working groups:")]),e._v(" "),r("ul",[r("li",[e._v("Minimal reporting requirement (developing "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/MIACME",target:"_blank",rel:"noopener noreferrer"}},[e._v("MIACME"),r("OutboundLink")],1),e._v(", i.e. the Minimum Information About a Cell Migration Experiment)")]),e._v(" "),r("li",[e._v("Controlled Vocabularies")]),e._v(" "),r("li",[e._v("Data Formats and APIs")])]),e._v(" "),r("p",[e._v("In our last working group, we discussed where the Data Package specifications"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(" could be used or expanded for the definition of a standard format and the corresponding libraries to interact with these standards. In particular, we have started to address the standardization of cell tracking data. 
This is data produced using tracking software that reconstructs cell movement in time based on images from a microscope.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(416),alt:"Diagram"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("In pink, the "),r("a",{attrs:{href:"http://isa-tools.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("ISA"),r("OutboundLink")],1),e._v(" (Investigation Study Assay) model to annotate the experimental metadata; in blue, the "),r("a",{attrs:{href:"http://www.openmicroscopy.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OME"),r("OutboundLink")],1),e._v(" (Open Microscopy Environment) model for the imaging data; in green, our biotracks format based on the Data Package specification for the analytics data (cell tracking, positions, features etc.);in purple, CV: Controlled Vocabulary; and in turquoise, "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/MIACME",target:"_blank",rel:"noopener noreferrer"}},[e._v("MIACME"),r("OutboundLink")],1),e._v(": Minimum Information About a Cell Migration Experiment. "),r("a",{attrs:{href:"https://creativecommons.org/licenses/by-sa/4.0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CC BY-SA 4.0"),r("OutboundLink")],1),e._v(" Credit: Paola Masuzzo (text) and CMSO (diagram).")])]),e._v(" "),r("p",[e._v("CMSO deals specifically with cell migration data (a subject of cell biology). Our main challenge lies in the heterogeneity of the data. 
This diversity has its origin in two factors:")]),e._v(" "),r("ul",[r("li",[r("strong",[e._v("Experimentally")]),e._v(": Cell migration data can be produced using many diverse techniques (imaging, non-imaging, dynamic, static, high-throughput/screening, etc.)")]),e._v(" "),r("li",[r("strong",[e._v("Analytically")]),e._v(": These data are produced using many diverse software packages, each of these writing data to specific (sometimes proprietary) file formats.")])]),e._v(" "),r("p",[e._v("This diversity hampers (or at least makes very difficult) procedures like meta-analysis, data integration, data mining, and last but not least, data "),r("em",[e._v("reproducibility")]),e._v(".")]),e._v(" "),r("p",[e._v("CMSO has developed and is about to release the first specification of a "),r("a",{attrs:{href:"https://cellmigstandorg.github.io/Tracks/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Cell Tracking format"),r("OutboundLink")],1),e._v(". This specification is built on a tabular representation, i.e. data are stored in tables. Current v0.1 of this specification can be seen at "),r("a",{attrs:{href:"https://cellmigstandorg.github.io/Tracks/v0.1/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("CMSO is using the "),r("em",[e._v("Tabular")]),e._v(" Data Package"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(" specification to represent cell migration-derived tracking data, as illustrated"),r("br"),e._v(" "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/biotracks/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(". The specification is used for two goals:")]),e._v(" "),r("ol",[r("li",[r("strong",[e._v("Create a Data Package representation")]),e._v(" where the data—in our case objects (e.g. 
cells detected in microscopy images), links and optionally tracks—are stored in CSV files, while metadata and schema"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(" information are stored in a JSON file.")]),e._v(" "),r("li",[r("strong",[e._v("Write")]),e._v(" this Data Package to a pandas"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(" dataframe, to aid quick inspection and visualization.")])]),e._v(" "),r("p",[e._v("You can see some examples "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/biotracks/tree/master/examples",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("I am an Open Science fan and advocate, so I try to keep up to date with the initiatives of the"),r("br"),e._v(" "),r("a",{attrs:{href:"https://okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International"),r("OutboundLink")],1),e._v(" teams. I think I first became aware of Frictionless Data when I saw a tweet and I checked the specs out. Also, CMSO really wanted to keep a possible specification and file format light and simple. So different people of the team must have googled for ‘CSV and JSON formats’ or something like that, and Frictionless Data popped out 😃.")]),e._v(" "),r("p",[e._v("I have opened a couple of issues on the "),r("a",{attrs:{href:"https://github.com/frictionlessdata/specs",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub page of the spec"),r("OutboundLink")],1),e._v(", detailing what I would like to see developed in the Frictionless Data project. 
The CMSO is not sure yet if the Data Package representation will be the one we’ll go for in the very end, because we would first like to know how sustainable/sustained this spec will be in the future.")]),e._v(" "),r("p",[e._v("CMSO is looking into expanding the "),r("a",{attrs:{href:"https://github.com/CellMigStandOrg/biotracks/tree/master/examples",target:"_blank",rel:"noopener noreferrer"}},[e._v("list of examples"),r("OutboundLink")],1),e._v(" we have so far in terms of tracking software. Personally, I would like to choose a reference data set (a live-cell, time-lapse microscopy data set), and run different cell tracking algorithms/software packages on it. Then I want to put the results into a common, light and easy-to-interpret CSV+JSON format (the biotracks format), and show people how data containerization"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn5",id:"fnref5"}},[e._v("[5]")])]),e._v(" can be the way to go to enable research data exchange and knowledge discovery at large.")]),e._v(" "),r("p",[e._v("With most other specifications, cell tracking data are stored in tabular format, but metadata are never kept together with the data, which makes data interpretation and sharing very difficult. The Frictionless Data specifications take good care of this aspect. Some other formats are based on XML"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn6",id:"fnref6"}},[e._v("[6]")])]),e._v(" annotation, which certainly does the job, but are perhaps heavier (even though perhaps more sustainable in the long term). I hate Excel formats, and unfortunately I need to parse those too. 
I love the integration with Python"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn7",id:"fnref7"}},[e._v("[7]")])]),e._v(" and the pandas"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn4",id:"fnref4:1"}},[e._v("[4:1]")])]),e._v(" system, this is a big plus when doing data science.")]),e._v(" "),r("p",[e._v("As a researcher, I mostly deal with research data. I am pretty sure if this could work for cell migration data, it could work for many cell biology disciplines as well. I recommend speaking to more researchers and data producers to determine additional use cases!")]),e._v(" "),r("hr",{staticClass:"footnotes-sep"}),e._v(" "),r("section",{staticClass:"footnotes"},[r("ol",{staticClass:"footnotes-list"},[r("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[r("p",[e._v("Data Package: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/data-package"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[r("p",[e._v("Tabular Data Package: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/tabular-data-package"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[r("p",[e._v("Table Schema: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/table-schema"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[r("p",[e._v("Pandas: Python package for data 
analysis: "),r("a",{attrs:{href:"http://pandas.pydata.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://pandas.pydata.org/"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")]),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4:1"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[r("p",[e._v("Design Philosophy: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("specs"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn6"}},[r("p",[e._v("Extensible Markup Language: "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/XML",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://en.wikipedia.org/wiki/XML"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref6"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn7"}},[r("p",[e._v("Data Package-aware libraries in Python: "),r("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/datapackage-py"),r("OutboundLink")],1),e._v(", "),r("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/tableschema-py"),r("OutboundLink")],1),e._v(", "),r("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables-py"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref7"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/46.61d51c30.js 
b/assets/js/46.b7a06152.js similarity index 98% rename from assets/js/46.61d51c30.js rename to assets/js/46.b7a06152.js index 95de50f76..c9ec40e6a 100644 --- a/assets/js/46.61d51c30.js +++ b/assets/js/46.b7a06152.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[46],{416:function(e,t,a){e.exports=a.p+"assets/img/data-retriever-install.1fdce2e3.gif"},568:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[r("a",{attrs:{href:"http://www.data-retriever.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("The Data Retriever"),r("OutboundLink")],1),e._v(" automates the tasks of finding, downloading, and cleaning up publicly available data, and then stores them in a variety of databases and file formats. This lets data analysts spend less time cleaning up and managing data, and more time analyzing it.")]),e._v(" "),r("p",[e._v("We originally built the Data Retriever starting in 2010 with a focus on ecological data. Over time, we realized that the common challenges with finding downloading, and cleaning up ecological data applied to data in most other fields, so we rebranded and starting integrating data from other fields as well.")]),e._v(" "),r("p",[e._v("The Data Retriever is primarily focused on "),r("em",[e._v("tabular")]),e._v(" data, but we’re starting work on supporting spatial data as well.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(416),alt:"Diagram"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("The Data Retriever automatically installing the "),r("a",{attrs:{href:"https://www.pwrc.usgs.gov/bbs/",target:"_blank",rel:"noopener noreferrer"}},[e._v("BBS (USGS North American Breeding Bird Survey)"),r("OutboundLink")],1),e._v(" dataset")])]),e._v(" "),r("p",[e._v("Data is often messy and needs cleaning and restructuring before it can be effectively used. 
It is often not feasible to modify and redistribute the data due to licensing and other limitations (Editor’s note: see our "),r("RouterLink",{attrs:{to:"/blog/2016/11/15/open-power-system-data/"}},[e._v("Open Power System Data case study")]),e._v(" for more on this).")],1),e._v(" "),r("p",[e._v("We need to make it as easy as possible for contributors to "),r("a",{attrs:{href:"https://retriever.readthedocs.io/en/latest/retriever.lib.html#retriever-lib-package",target:"_blank",rel:"noopener noreferrer"}},[e._v("add new datasets"),r("OutboundLink")],1),e._v(". For relatively clean datasets this means having a simple, easy-to-work-with metadata standard to describe existing data. The description for each dataset is written in a single file which gets read by our plugin infrastructure.")]),e._v(" "),r("p",[e._v("To describe the structure of simple data, we originally created a YAML-like"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(" metadata structure. When the Data Package"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(" specs were created by "),r("a",{attrs:{href:"https://okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International"),r("OutboundLink")],1),e._v(", we decided to switch over to using this standard so that others could benefit from the metadata we were creating and so that we could benefit from th standards-based infrastructure[^software] being created around the specs.")]),e._v(" "),r("p",[e._v("The transition to the Data Package specification was fairly smooth as most of the fields we needed were already included in the specs. The only thing that we needed to add were fields for restructuring poorly formatted data since the spec assumes the data is well structured to begin with. 
For example, we use custom fields for describing how to convert "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/Wide_and_narrow_data",target:"_blank",rel:"noopener noreferrer"}},[r("strong",[e._v("wide")]),e._v(" data to "),r("strong",[e._v("long")]),e._v(" data"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("We first learned about Frictionless Data through the "),r("a",{attrs:{href:"https://blog.okfn.org/2016/02/29/sloan-foundation-funds-frictionless-data-tooling-and-engagement-at-open-knowledge/",target:"_blank",rel:"noopener noreferrer"}},[e._v("announcement"),r("OutboundLink")],1),e._v(" of their funding by the Sloan Foundation. Going forward, we would love to see the Data Package spec expanded to include information about “imperfections” in data. It currently assumes that the person creating the metadata can modify the raw data files to comply with the standard rules of data structure. However this doesn’t work if someone else is distributing the data, which is a very common use"),r("br"),e._v("\ncase.")]),e._v(" "),r("p",[e._v("The expansion of the standard would include things like a way to indicate wide versus long data with enough information to uniquely describe how to translate from one to the other as well as information on single tables that are composed from data in many separate files. We have already been adding new fields to the JSON to accomplish some of these things and would be happy to be part of a larger dialog about implementing them more widely. 
For the wide-data-to-long-data example mentioned above, we use "),r("code",[e._v("ct_column")]),e._v(" and "),r("code",[e._v("ct_names")]),e._v(" fields and a "),r("code",[e._v("ct-type")]),e._v(" type to indicate how to transform the data into a properly normalized form.")]),e._v(" "),r("p",[e._v("The other thing we’ve come across is the need to develop a clear specification for "),r("a",{attrs:{href:"http://semver.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("semantic versioning"),r("OutboundLink")],1),e._v(" of Data Packages. The specification includes an optional "),r("code",[e._v("version")]),e._v(" field"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(" for keeping track to changes to the package. This version has a standard structure from semantic versioning in software that includes major, minor, and patch level changes. Unlike in software there is no clearly established standard for what changes in different version numbers indicate. Since we work with a lot of different datasets, we’ve been changing a lot of version numbers over the last year; this has lead us to "),r("a",{attrs:{href:"https://github.com/frictionlessdata/specs/issues/421",target:"_blank",rel:"noopener noreferrer"}},[e._v("open a discussion with the OKFN team"),r("OutboundLink")],1),e._v(" about developing a standard to apply to these changes.")]),e._v(" "),r("p",[e._v("Our next big step is working on the challenge of "),r("strong",[e._v("simple data integration")]),e._v(". One of the major challenges data analysts have after they have cleaned up and prepared individual data sources is combining them. General solutions to the data integration problem (e.g. 
linked data approaches) have proven to difficult but we are approaching the problem by tackling a small number of common use cases and involving humans in the metadata development describing the linkages between datasets.")]),e._v(" "),r("p",[e._v("The major specification that is available for ecological data is the "),r("a",{attrs:{href:"https://knb.ecoinformatics.org/#external//emlparser/docs/index.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Ecological Metadata Language (EML)"),r("OutboundLink")],1),e._v(". It is an XML"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(" based spec that includes a lot of information specific to ecological datasets. The nice thing about EML—which is also its challenge—is that it is very comprehensive. This gives it a lot of strength in a linked data context, but also means that it is difficult to drive adoption by users.")]),e._v(" "),r("p",[e._v("The Frictionless Data specifications line up better with our approach to data"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn5",id:"fnref5"}},[e._v("[5]")])]),e._v(", which is to complement lightweight computational methods with human contributions to make data easier to work with quickly.")]),e._v(" "),r("p",[e._v("Community contributions to our work are welcome. We work hard to make all of our development efforts open and inclusive (see our "),r("a",{attrs:{href:"https://github.com/weecology/retriever/blob/master/docs/code_of_conduct.rst",target:"_blank",rel:"noopener noreferrer"}},[e._v("Code of Conduct"),r("OutboundLink")],1),e._v(") and love it when new developers, data scientists, and domain specialists "),r("a",{attrs:{href:"http://www.data-retriever.org/#contribute",target:"_blank",rel:"noopener noreferrer"}},[e._v("contribute"),r("OutboundLink")],1),e._v(". 
A contribution can be as easy as adding a new dataset by following "),r("a",{attrs:{href:"https://retriever.readthedocs.io/en/latest/retriever.lib.html#retriever-lib-package",target:"_blank",rel:"noopener noreferrer"}},[e._v("a set of prompts"),r("OutboundLink")],1),e._v(" to create a new JSON file and submitting a "),r("a",{attrs:{href:"https://help.github.com/articles/about-pull-requests/",target:"_blank",rel:"noopener noreferrer"}},[e._v("PR"),r("OutboundLink")],1),e._v(" on GitHub, or even just opening an issue to tell us about a dataset that would be useful to you. So, "),r("a",{attrs:{href:"http://github.com/weecology/retriever/issues/new",target:"_blank",rel:"noopener noreferrer"}},[e._v("open an issue"),r("OutboundLink")],1),e._v(", submit a PR, or stop by our "),r("a",{attrs:{href:"https://gitter.im/weecology/retriever",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gitter chat channel"),r("OutboundLink")],1),e._v(" and say “Hi”. We also participate in "),r("a",{attrs:{href:"https://developers.google.com/open-source/gsoc/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Google Summer of Code"),r("OutboundLink")],1),e._v(", which is a great opportunity for students interested in being directly supported to work on the project.")]),e._v(" "),r("hr",{staticClass:"footnotes-sep"}),e._v(" "),r("section",{staticClass:"footnotes"},[r("ol",{staticClass:"footnotes-list"},[r("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[r("p",[e._v("YAML Ain’t Markup Language: "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/YAML",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://en.wikipedia.org/wiki/YAML"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[r("p",[e._v("Data Package: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package",target:"_blank",rel:"noopener 
noreferrer"}},[e._v("https://specs.frictionlessdata.io/data-package"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[r("p",[e._v("Data Package version field: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/patterns/#data-package-version",target:"_blank",rel:"noopener noreferrer"}},[e._v("/specs/#version"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[r("p",[e._v("Extensible Markup Language: "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/XML",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://en.wikipedia.org/wiki/XML"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[r("p",[e._v("Design Philosophy: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/#design-philosophy",target:"_blank",rel:"noopener noreferrer"}},[e._v("/specs/#design-philosophy"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[46],{417:function(e,t,a){e.exports=a.p+"assets/img/data-retriever-install.1fdce2e3.gif"},569:function(e,t,a){"use strict";a.r(t);var r=a(29),o=Object(r.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[r("a",{attrs:{href:"http://www.data-retriever.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("The Data Retriever"),r("OutboundLink")],1),e._v(" automates the tasks of finding, downloading, and cleaning up publicly available data, and then stores them in a variety of 
databases and file formats. This lets data analysts spend less time cleaning up and managing data, and more time analyzing it.")]),e._v(" "),r("p",[e._v("We originally built the Data Retriever starting in 2010 with a focus on ecological data. Over time, we realized that the common challenges with finding downloading, and cleaning up ecological data applied to data in most other fields, so we rebranded and starting integrating data from other fields as well.")]),e._v(" "),r("p",[e._v("The Data Retriever is primarily focused on "),r("em",[e._v("tabular")]),e._v(" data, but we’re starting work on supporting spatial data as well.")]),e._v(" "),r("p",[r("img",{attrs:{src:a(417),alt:"Diagram"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("The Data Retriever automatically installing the "),r("a",{attrs:{href:"https://www.pwrc.usgs.gov/bbs/",target:"_blank",rel:"noopener noreferrer"}},[e._v("BBS (USGS North American Breeding Bird Survey)"),r("OutboundLink")],1),e._v(" dataset")])]),e._v(" "),r("p",[e._v("Data is often messy and needs cleaning and restructuring before it can be effectively used. It is often not feasible to modify and redistribute the data due to licensing and other limitations (Editor’s note: see our "),r("RouterLink",{attrs:{to:"/blog/2016/11/15/open-power-system-data/"}},[e._v("Open Power System Data case study")]),e._v(" for more on this).")],1),e._v(" "),r("p",[e._v("We need to make it as easy as possible for contributors to "),r("a",{attrs:{href:"https://retriever.readthedocs.io/en/latest/retriever.lib.html#retriever-lib-package",target:"_blank",rel:"noopener noreferrer"}},[e._v("add new datasets"),r("OutboundLink")],1),e._v(". For relatively clean datasets this means having a simple, easy-to-work-with metadata standard to describe existing data. 
The description for each dataset is written in a single file which gets read by our plugin infrastructure.")]),e._v(" "),r("p",[e._v("To describe the structure of simple data, we originally created a YAML-like"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(" metadata structure. When the Data Package"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(" specs were created by "),r("a",{attrs:{href:"https://okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International"),r("OutboundLink")],1),e._v(", we decided to switch over to using this standard so that others could benefit from the metadata we were creating and so that we could benefit from th standards-based infrastructure[^software] being created around the specs.")]),e._v(" "),r("p",[e._v("The transition to the Data Package specification was fairly smooth as most of the fields we needed were already included in the specs. The only thing that we needed to add were fields for restructuring poorly formatted data since the spec assumes the data is well structured to begin with. For example, we use custom fields for describing how to convert "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/Wide_and_narrow_data",target:"_blank",rel:"noopener noreferrer"}},[r("strong",[e._v("wide")]),e._v(" data to "),r("strong",[e._v("long")]),e._v(" data"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("We first learned about Frictionless Data through the "),r("a",{attrs:{href:"https://blog.okfn.org/2016/02/29/sloan-foundation-funds-frictionless-data-tooling-and-engagement-at-open-knowledge/",target:"_blank",rel:"noopener noreferrer"}},[e._v("announcement"),r("OutboundLink")],1),e._v(" of their funding by the Sloan Foundation. Going forward, we would love to see the Data Package spec expanded to include information about “imperfections” in data. 
It currently assumes that the person creating the metadata can modify the raw data files to comply with the standard rules of data structure. However this doesn’t work if someone else is distributing the data, which is a very common use"),r("br"),e._v("\ncase.")]),e._v(" "),r("p",[e._v("The expansion of the standard would include things like a way to indicate wide versus long data with enough information to uniquely describe how to translate from one to the other as well as information on single tables that are composed from data in many separate files. We have already been adding new fields to the JSON to accomplish some of these things and would be happy to be part of a larger dialog about implementing them more widely. For the wide-data-to-long-data example mentioned above, we use "),r("code",[e._v("ct_column")]),e._v(" and "),r("code",[e._v("ct_names")]),e._v(" fields and a "),r("code",[e._v("ct-type")]),e._v(" type to indicate how to transform the data into a properly normalized form.")]),e._v(" "),r("p",[e._v("The other thing we’ve come across is the need to develop a clear specification for "),r("a",{attrs:{href:"http://semver.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("semantic versioning"),r("OutboundLink")],1),e._v(" of Data Packages. The specification includes an optional "),r("code",[e._v("version")]),e._v(" field"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(" for keeping track to changes to the package. This version has a standard structure from semantic versioning in software that includes major, minor, and patch level changes. Unlike in software there is no clearly established standard for what changes in different version numbers indicate. 
Since we work with a lot of different datasets, we’ve been changing a lot of version numbers over the last year; this has lead us to "),r("a",{attrs:{href:"https://github.com/frictionlessdata/specs/issues/421",target:"_blank",rel:"noopener noreferrer"}},[e._v("open a discussion with the OKFN team"),r("OutboundLink")],1),e._v(" about developing a standard to apply to these changes.")]),e._v(" "),r("p",[e._v("Our next big step is working on the challenge of "),r("strong",[e._v("simple data integration")]),e._v(". One of the major challenges data analysts have after they have cleaned up and prepared individual data sources is combining them. General solutions to the data integration problem (e.g. linked data approaches) have proven to difficult but we are approaching the problem by tackling a small number of common use cases and involving humans in the metadata development describing the linkages between datasets.")]),e._v(" "),r("p",[e._v("The major specification that is available for ecological data is the "),r("a",{attrs:{href:"https://knb.ecoinformatics.org/#external//emlparser/docs/index.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Ecological Metadata Language (EML)"),r("OutboundLink")],1),e._v(". It is an XML"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(" based spec that includes a lot of information specific to ecological datasets. The nice thing about EML—which is also its challenge—is that it is very comprehensive. 
This gives it a lot of strength in a linked data context, but also means that it is difficult to drive adoption by users.")]),e._v(" "),r("p",[e._v("The Frictionless Data specifications line up better with our approach to data"),r("sup",{staticClass:"footnote-ref"},[r("a",{attrs:{href:"#fn5",id:"fnref5"}},[e._v("[5]")])]),e._v(", which is to complement lightweight computational methods with human contributions to make data easier to work with quickly.")]),e._v(" "),r("p",[e._v("Community contributions to our work are welcome. We work hard to make all of our development efforts open and inclusive (see our "),r("a",{attrs:{href:"https://github.com/weecology/retriever/blob/master/docs/code_of_conduct.rst",target:"_blank",rel:"noopener noreferrer"}},[e._v("Code of Conduct"),r("OutboundLink")],1),e._v(") and love it when new developers, data scientists, and domain specialists "),r("a",{attrs:{href:"http://www.data-retriever.org/#contribute",target:"_blank",rel:"noopener noreferrer"}},[e._v("contribute"),r("OutboundLink")],1),e._v(". A contribution can be as easy as adding a new dataset by following "),r("a",{attrs:{href:"https://retriever.readthedocs.io/en/latest/retriever.lib.html#retriever-lib-package",target:"_blank",rel:"noopener noreferrer"}},[e._v("a set of prompts"),r("OutboundLink")],1),e._v(" to create a new JSON file and submitting a "),r("a",{attrs:{href:"https://help.github.com/articles/about-pull-requests/",target:"_blank",rel:"noopener noreferrer"}},[e._v("PR"),r("OutboundLink")],1),e._v(" on GitHub, or even just opening an issue to tell us about a dataset that would be useful to you. 
So, "),r("a",{attrs:{href:"http://github.com/weecology/retriever/issues/new",target:"_blank",rel:"noopener noreferrer"}},[e._v("open an issue"),r("OutboundLink")],1),e._v(", submit a PR, or stop by our "),r("a",{attrs:{href:"https://gitter.im/weecology/retriever",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gitter chat channel"),r("OutboundLink")],1),e._v(" and say “Hi”. We also participate in "),r("a",{attrs:{href:"https://developers.google.com/open-source/gsoc/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Google Summer of Code"),r("OutboundLink")],1),e._v(", which is a great opportunity for students interested in being directly supported to work on the project.")]),e._v(" "),r("hr",{staticClass:"footnotes-sep"}),e._v(" "),r("section",{staticClass:"footnotes"},[r("ol",{staticClass:"footnotes-list"},[r("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[r("p",[e._v("YAML Ain’t Markup Language: "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/YAML",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://en.wikipedia.org/wiki/YAML"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[r("p",[e._v("Data Package: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/data-package"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[r("p",[e._v("Data Package version field: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/patterns/#data-package-version",target:"_blank",rel:"noopener noreferrer"}},[e._v("/specs/#version"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" 
"),r("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[r("p",[e._v("Extensible Markup Language: "),r("a",{attrs:{href:"https://en.wikipedia.org/wiki/XML",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://en.wikipedia.org/wiki/XML"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")])])]),e._v(" "),r("li",{staticClass:"footnote-item",attrs:{id:"fn5"}},[r("p",[e._v("Design Philosophy: "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/#design-philosophy",target:"_blank",rel:"noopener noreferrer"}},[e._v("/specs/#design-philosophy"),r("OutboundLink")],1),e._v(" "),r("a",{staticClass:"footnote-backref",attrs:{href:"#fnref5"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/47.7ce0d603.js b/assets/js/47.852889fe.js similarity index 99% rename from assets/js/47.7ce0d603.js rename to assets/js/47.852889fe.js index 5d341426e..c4d37875f 100644 --- a/assets/js/47.7ce0d603.js +++ b/assets/js/47.852889fe.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[47],{418:function(e,t,a){e.exports=a.p+"assets/img/adbio.ea0206c3.png"},570:function(e,t,a){"use strict";a.r(t);var o=a(29),i=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("h2",{attrs:{id:"context"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[e._v("#")]),e._v(" Context")]),e._v(" "),o("h3",{attrs:{id:"problem-we-were-trying-to-solve"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#problem-we-were-trying-to-solve"}},[e._v("#")]),e._v(" Problem We Were Trying To Solve")]),e._v(" "),o("p",[e._v("Sam Payne and his team at the Pacific Northwest National Laboratory (PNNL) have designed an application called "),o("a",{attrs:{href:"https://adbio.pnnl.gov/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Active Data 
Biology"),o("OutboundLink")],1),e._v(" (ADBio) which is an interactive web-based suite of tools for analyzing high-throughput omics (a set of related fields of study in biology). The goal is to visualize and analyze datasets while still enabling seamless collaboration between computational and non-computational domain experts. The tool provides several views on the same data facilitating different avenues of investigation.")]),e._v(" "),o("p",[e._v("One of the high level goals of ADBio was to make collaborative data analysis work in a similar manner to collaborative software development (versioned, asynchronous, flexible, sharable, global). You can read more of the motivation in the Open Knowledge International blog post "),o("a",{attrs:{href:"https://blog.okfn.org/2016/11/29/git-for-data-analysis-why-version-control-is-essential-collaboration-public-trust/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Git for Data Analysis – why version control is essential for collaboration and for gaining public trust"),o("OutboundLink")],1),e._v(" written by Sam Payne as part of the pilot. To facilitate this goal, Sam and his team used version-controlled repositories as the storage mechanism for all required resources. Data, software (for conducting analyses), and insights (gained from these analyses) for the project all get checked into the same repository. ADBio pulls data and software directly from the repository and serves up an interactive visualization for data exploration. Any insight you choose to record gets checked back into the repository.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(418),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("When we were first approached by Sam and his team, they outlined several use cases for which it might be valuable to have formal Data Package support (with the benefit of the associated tooling) within their framework. 
In the end, we decided to work on the first: "),o("em",[e._v("validating metadata associated with ADBio repositories")]),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"use-case-validating-metadata"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#use-case-validating-metadata"}},[e._v("#")]),e._v(" Use Case: Validating Metadata")]),e._v(" "),o("p",[e._v("To initiate a project in Active Data Biology, users start with a dataset of quantitative molecular measurements across multiple samples combined with metadata for each sample. Each repository on ADBio contains these two types of files. For clinical experiments, the metadata may include information about a participant’s age, gender, disease stage, etc. For an environmental experiment, this may be geographical location, temperature, time of day, etc. One "),o("a",{attrs:{href:"https://github.com/ActiveDataBio/ADB-User-Study/blob/master/metadata.tsv",target:"_blank",rel:"noopener noreferrer"}},[e._v("example"),o("OutboundLink")],1),e._v(" of a metadata file can be found in the ADB-User-Study project repository under the "),o("a",{attrs:{href:"https://github.com/ActiveDataBio/",target:"_blank",rel:"noopener noreferrer"}},[e._v("ActiveDataBio organization on GitHub"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("The metadata file can be updated or expanded during the course of analysis. This is currently not easily done within ADBio. Moreover, the researchers lacked any formal schema describing the metadata file and its contents. It was suggested that having a Data Package formalizing the metadata file would be a benefit. This would also enable validation of the contents, according to the schema stored as part of the Data Package. Finally, the researchers also requested the development of a web UI to edit the metadata file that would be an application within the ADBio suite. Users could then update the schema online, and it would be versioned through GitHub like everything else. 
Scenario")]),e._v(" "),o("p",[e._v("A user gets updated survival information for patients in a clinical study and wants to update the metadata associated with this experiment. Within ADBio, the user opens the “Metadata” app and enters new information into the user interface. When finished, the user clicks a ‘save’ button and the data is validated against the schema. If it fails, the specific cells are highlighted and annotated with failure codes. If it passes, the new metadata file is checked into the repository with a user-specified comment for the commit message.")]),e._v(" "),o("h2",{attrs:{id:"the-work"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[e._v("#")]),e._v(" The Work")]),e._v(" "),o("h3",{attrs:{id:"what-did-we-do"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-did-we-do"}},[e._v("#")]),e._v(" What Did We Do")]),e._v(" "),o("p",[e._v("This was a valuable pilot for several reasons. For one, the researchers’ interests in openness and the value of public, versioned infrastructure like GitHub for tabular, flat file datasets aligned well with the overall interests of the project. OKI’s first step was to start a new repository to track progress "),o("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-pnnl",target:"_blank",rel:"noopener noreferrer"}},[e._v("in the open"),o("OutboundLink")],1),e._v(". OKI also created their own "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ADB-User-Study",target:"_blank",rel:"noopener noreferrer"}},[e._v("“fork” (i.e. 
versioned copy) of the repository"),o("OutboundLink")],1),e._v(" in which PNNL stored their exemplar metadata file.")]),e._v(" "),o("h3",{attrs:{id:"data"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#data"}},[e._v("#")]),e._v(" Data")]),e._v(" "),o("p",[e._v("The "),o("code",[e._v("metadata.tsv")]),e._v(" file is specially formatted compared to other TSV (tab-separated values) files in that it contains two extra rows below the header for describing a column’s "),o("em",[e._v("methods")]),e._v(" and "),o("em",[e._v("descriptions")]),e._v(". While this is a neat way of storing metadata for each column, it is not particularly standard as ordinarily, we would expect all rows below the header contain actual data. Nevertheless, it provided a great start to the development of a custom schema. We used the information stored in these rows to generate a "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema"),o("OutboundLink")],1),e._v(" for the data compatible with our software ("),o("a",{attrs:{href:"https://github.com/frictionlessdata/ADB-User-Study/blob/master/metadata-schema.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("the schema"),o("OutboundLink")],1),e._v(").")]),e._v(" "),o("p",[e._v("For instance, if a column in the original metadata.tsv file had the text "),o("code",[e._v("categorical")]),e._v(" in its "),o("code",[e._v("#methods")]),e._v(" row, we knew that this translated very well to our "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#constraints",target:"_blank",rel:"noopener noreferrer"}},[e._v("enum (short for enumerated list) constraint"),o("OutboundLink")],1),e._v(". However, this was not enough. We had to infer from the values below in the dataset which values were actually valid categorical values for that column. 
So, for example, the "),o("code",[e._v("PlatinumStatus")]),e._v(" column could only be one of "),o("code",[e._v("Resistant")]),e._v(", "),o("code",[e._v("Sensitive")]),e._v(", or "),o("code",[e._v("Tooearly")]),e._v(" leading to the following constraint definition in Table Schema:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v('"constraints": {\n "enum": [\n "Resistant",\n "Sensitive",\n "Tooearly"\n ]\n}\n')])])]),o("p",[e._v("More straightforward was the translation of the "),o("code",[e._v("#descriptions")]),e._v(" row; each description was translated directly into a "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#description",target:"_blank",rel:"noopener noreferrer"}},[e._v("description attribute"),o("OutboundLink")],1),e._v(" on the column:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v('"description": "It describes whether the patient was resistant to platinum (chemotherapy) treatment",\n')])])]),o("p",[e._v("What the "),o("code",[e._v("metadata.tsv")]),e._v(" file did not record at all was any information about the “type” of value expected for each column. For instance, the "),o("code",[e._v("days_to_death")]),e._v(" column would never contain a value that was of a “geopoint” type, but rather always a number (and a whole number at that). Likewise, the "),o("code",[e._v("additional_immuno_therapy")]),e._v(" column would always be a True/False (i.e. boolean) value. 
With PNNL’s domain expertise, OKI added these expectations to the schema so that "),o("code",[e._v("days_to_death")]),e._v(" could be relied upon to always be an integer and "),o("code",[e._v("additional_immuno_therapy")]),e._v(" a boolean (True/False) value.")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v('{\n "name": "additional_immuno_therapy",\n "type": "boolean"\n}\n')])])]),o("p",[e._v("Up to this point, the dataset provided by PNNL was adequately described by our specifications. One challenge was how to deal, though, with the many missing values in the dataset. While we had discussion on the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/specs/issues/97",target:"_blank",rel:"noopener noreferrer"}},[e._v("topic"),o("OutboundLink")],1),e._v(", we had not yet established a formal way of specifying. In part due to observed usage and the needs of the pilot, we formalized an approach to recording information about which values signal missing data in "),o("a",{attrs:{href:"https://twitter.com/OKFNLabs/status/765568650699018241",target:"_blank",rel:"noopener noreferrer"}},[e._v("mid-August 2016"),o("OutboundLink")],1),e._v(". 
We added this information to the Table Schema:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v('"missingValues": [\n "[Not Applicable]",\n "[Not Available]",\n "[Pending]"\n]\n')])])]),o("h3",{attrs:{id:"software"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#software"}},[e._v("#")]),e._v(" Software")]),e._v(" "),o("p",[e._v("Goodtables had "),o("a",{attrs:{href:"http://okfnlabs.org/blog/2015/02/20/introducing-goodtables.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("existed"),o("OutboundLink")],1),e._v(" as a Python library and web application developed by Open Knowledge International to support the validation of tabular datasets both in terms of structure and also with respect to a published schema as described above. This software was put to good use in a local government context.")]),e._v(" "),o("p",[e._v("For this pilot, and in coordination with other work in the project, we took the opportunity to drastically improve the software to support the online, automated validation referenced in the above use case. We took as inspiration the workflow in use in software development environments around the world—continuous automated testing—and applied to data. This involved not only updating the Python library to reflect the specification development to date, but the design of a new data publishing workflow that is applicable beyond PNNL’s needs. It is designed to be extensible, so that custom checks and custom backends (e.g. other places where one might publish a dataset) can take advantage of this workflow. For example, in addition to datasets stored on GitHub, the new goodtables supports the automated validation of datasets stored on S3 and we are currently working on validation of datasets stored on CKAN.")]),e._v(" "),o("p",[e._v("Goodtables supports validation of tabular data in GitHub repositories to solve the use case for Active Data Biology. 
On every update to the dataset, a validation task is run on the stored data.")]),e._v(" "),o("h2",{attrs:{id:"review"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#review"}},[e._v("#")]),e._v(" Review")]),e._v(" "),o("h3",{attrs:{id:"how-effective-was-it"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#how-effective-was-it"}},[e._v("#")]),e._v(" How Effective Was It")]),e._v(" "),o("p",[e._v("The omics team at PNNL are still investigating the use of "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" for their use case, but early reports are positive:")]),e._v(" "),o("blockquote",[o("p",[e._v("We created a schema and started testing it. So far so good! I think this is going to work for a lot of projects which want to store data in a repo.")])]),e._v(" "),o("p",[e._v("As a real test of the generality of goodtables, we also tried to apply it to another project. This second project is a public repository describing measurements of metabolites in ion mobility mass spectrometry. Here, we are again using flat files for structured data. The data is actually a library of information describing metabolites, and we know that the library will be growing. So it was very similar to the ADBio project, in that the curated data would be continually updated. 
(see "),o("a",{attrs:{href:"https://github.com/PNNL-Comp-Mass-Spec/MetabolomicsCCS",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/PNNL-Comp-Mass-Spec/MetabolomicsCCS"),o("OutboundLink")],1),e._v(" for the project itself, and "),o("a",{attrs:{href:"https://github.com/PNNL-Comp-Mass-Spec/metaboliteValidation",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/PNNL-Comp-Mass-Spec/metaboliteValidation"),o("OutboundLink")],1),e._v(" for a validation script that leverages goodtables).")]),e._v(" "),o("p",[e._v("Of course, technical issues that they have encountered have been translated in GitHub issues and are being addressed:")]),e._v(" "),o("ul",[o("li",[o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables.io/issues/233",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables.io/issues/233"),o("OutboundLink")],1)]),e._v(" "),o("li",[o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables.io/pull/235",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables.io/pull/235"),o("OutboundLink")],1)]),e._v(" "),o("li",[o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables.io/issues/232",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables.io/issues/232"),o("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[47],{418:function(e,t,a){e.exports=a.p+"assets/img/adbio.ea0206c3.png"},571:function(e,t,a){"use strict";a.r(t);var o=a(29),i=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("h2",{attrs:{id:"context"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[e._v("#")]),e._v(" Context")]),e._v(" 
"),o("h3",{attrs:{id:"problem-we-were-trying-to-solve"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#problem-we-were-trying-to-solve"}},[e._v("#")]),e._v(" Problem We Were Trying To Solve")]),e._v(" "),o("p",[e._v("Sam Payne and his team at the Pacific Northwest National Laboratory (PNNL) have designed an application called "),o("a",{attrs:{href:"https://adbio.pnnl.gov/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Active Data Biology"),o("OutboundLink")],1),e._v(" (ADBio) which is an interactive web-based suite of tools for analyzing high-throughput omics (a set of related fields of study in biology). The goal is to visualize and analyze datasets while still enabling seamless collaboration between computational and non-computational domain experts. The tool provides several views on the same data facilitating different avenues of investigation.")]),e._v(" "),o("p",[e._v("One of the high level goals of ADBio was to make collaborative data analysis work in a similar manner to collaborative software development (versioned, asynchronous, flexible, sharable, global). You can read more of the motivation in the Open Knowledge International blog post "),o("a",{attrs:{href:"https://blog.okfn.org/2016/11/29/git-for-data-analysis-why-version-control-is-essential-collaboration-public-trust/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Git for Data Analysis – why version control is essential for collaboration and for gaining public trust"),o("OutboundLink")],1),e._v(" written by Sam Payne as part of the pilot. To facilitate this goal, Sam and his team used version-controlled repositories as the storage mechanism for all required resources. Data, software (for conducting analyses), and insights (gained from these analyses) for the project all get checked into the same repository. ADBio pulls data and software directly from the repository and serves up an interactive visualization for data exploration. 
Any insight you choose to record gets checked back into the repository.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(418),alt:"ADBio"}})]),e._v(" "),o("p",[e._v("When we were first approached by Sam and his team, they outlined several use cases for which it might be valuable to have formal Data Package support (with the benefit of the associated tooling) within their framework. In the end, we decided to work on the first: "),o("em",[e._v("validating metadata associated with ADBio repositories")]),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"use-case-validating-metadata"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#use-case-validating-metadata"}},[e._v("#")]),e._v(" Use Case: Validating Metadata")]),e._v(" "),o("p",[e._v("To initiate a project in Active Data Biology, users start with a dataset of quantitative molecular measurements across multiple samples combined with metadata for each sample. Each repository on ADBio contains these two types of files. For clinical experiments, the metadata may include information about a participant’s age, gender, disease stage, etc. For an environmental experiment, this may be geographical location, temperature, time of day, etc. One "),o("a",{attrs:{href:"https://github.com/ActiveDataBio/ADB-User-Study/blob/master/metadata.tsv",target:"_blank",rel:"noopener noreferrer"}},[e._v("example"),o("OutboundLink")],1),e._v(" of a metadata file can be found in the ADB-User-Study project repository under the "),o("a",{attrs:{href:"https://github.com/ActiveDataBio/",target:"_blank",rel:"noopener noreferrer"}},[e._v("ActiveDataBio organization on GitHub"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("The metadata file can be updated or expanded during the course of analysis. This is currently not easily done within ADBio. Moreover, the researchers lacked any formal schema describing the metadata file and its contents. It was suggested that having a Data Package formalizing the metadata file would be a benefit. 
This would also enable validation of the contents, according to the schema stored as part of the Data Package. Finally, the researchers also requested the development of a web UI to edit the metadata file that would be an application within the ADBio suite. Users could then update the schema online, and it would be versioned through GitHub like everything else. Scenario")]),e._v(" "),o("p",[e._v("A user gets updated survival information for patients in a clinical study and wants to update the metadata associated with this experiment. Within ADBio, the user opens the “Metadata” app and enters new information into the user interface. When finished, the user clicks a ‘save’ button and the data is validated against the schema. If it fails, the specific cells are highlighted and annotated with failure codes. If it passes, the new metadata file is checked into the repository with a user-specified comment for the commit message.")]),e._v(" "),o("h2",{attrs:{id:"the-work"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[e._v("#")]),e._v(" The Work")]),e._v(" "),o("h3",{attrs:{id:"what-did-we-do"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-did-we-do"}},[e._v("#")]),e._v(" What Did We Do")]),e._v(" "),o("p",[e._v("This was a valuable pilot for several reasons. For one, the researchers’ interests in openness and the value of public, versioned infrastructure like GitHub for tabular, flat file datasets aligned well with the overall interests of the project. OKI’s first step was to start a new repository to track progress "),o("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-pnnl",target:"_blank",rel:"noopener noreferrer"}},[e._v("in the open"),o("OutboundLink")],1),e._v(". OKI also created their own "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ADB-User-Study",target:"_blank",rel:"noopener noreferrer"}},[e._v("“fork” (i.e. 
versioned copy) of the repository"),o("OutboundLink")],1),e._v(" in which PNNL stored their exemplar metadata file.")]),e._v(" "),o("h3",{attrs:{id:"data"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#data"}},[e._v("#")]),e._v(" Data")]),e._v(" "),o("p",[e._v("The "),o("code",[e._v("metadata.tsv")]),e._v(" file is specially formatted compared to other TSV (tab-separated values) files in that it contains two extra rows below the header for describing a column’s "),o("em",[e._v("methods")]),e._v(" and "),o("em",[e._v("descriptions")]),e._v(". While this is a neat way of storing metadata for each column, it is not particularly standard as ordinarily, we would expect all rows below the header contain actual data. Nevertheless, it provided a great start to the development of a custom schema. We used the information stored in these rows to generate a "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema"),o("OutboundLink")],1),e._v(" for the data compatible with our software ("),o("a",{attrs:{href:"https://github.com/frictionlessdata/ADB-User-Study/blob/master/metadata-schema.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("the schema"),o("OutboundLink")],1),e._v(").")]),e._v(" "),o("p",[e._v("For instance, if a column in the original metadata.tsv file had the text "),o("code",[e._v("categorical")]),e._v(" in its "),o("code",[e._v("#methods")]),e._v(" row, we knew that this translated very well to our "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#constraints",target:"_blank",rel:"noopener noreferrer"}},[e._v("enum (short for enumerated list) constraint"),o("OutboundLink")],1),e._v(". However, this was not enough. We had to infer from the values below in the dataset which values were actually valid categorical values for that column. 
So, for example, the "),o("code",[e._v("PlatinumStatus")]),e._v(" column could only be one of "),o("code",[e._v("Resistant")]),e._v(", "),o("code",[e._v("Sensitive")]),e._v(", or "),o("code",[e._v("Tooearly")]),e._v(" leading to the following constraint definition in Table Schema:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v('"constraints": {\n "enum": [\n "Resistant",\n "Sensitive",\n "Tooearly"\n ]\n}\n')])])]),o("p",[e._v("More straightforward was the translation of the "),o("code",[e._v("#descriptions")]),e._v(" row; each description was translated directly into a "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#description",target:"_blank",rel:"noopener noreferrer"}},[e._v("description attribute"),o("OutboundLink")],1),e._v(" on the column:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v('"description": "It describes whether the patient was resistant to platinum (chemotherapy) treatment",\n')])])]),o("p",[e._v("What the "),o("code",[e._v("metadata.tsv")]),e._v(" file did not record at all was any information about the “type” of value expected for each column. For instance, the "),o("code",[e._v("days_to_death")]),e._v(" column would never contain a value that was of a “geopoint” type, but rather always a number (and a whole number at that). Likewise, the "),o("code",[e._v("additional_immuno_therapy")]),e._v(" column would always be a True/False (i.e. boolean) value. 
With PNNL’s domain expertise, OKI added these expectations to the schema so that "),o("code",[e._v("days_to_death")]),e._v(" could be relied upon to always be an integer and "),o("code",[e._v("additional_immuno_therapy")]),e._v(" a boolean (True/False) value.")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v('{\n "name": "additional_immuno_therapy",\n "type": "boolean"\n}\n')])])]),o("p",[e._v("Up to this point, the dataset provided by PNNL was adequately described by our specifications. One challenge was how to deal, though, with the many missing values in the dataset. While we had discussion on the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/specs/issues/97",target:"_blank",rel:"noopener noreferrer"}},[e._v("topic"),o("OutboundLink")],1),e._v(", we had not yet established a formal way of specifying. In part due to observed usage and the needs of the pilot, we formalized an approach to recording information about which values signal missing data in "),o("a",{attrs:{href:"https://twitter.com/OKFNLabs/status/765568650699018241",target:"_blank",rel:"noopener noreferrer"}},[e._v("mid-August 2016"),o("OutboundLink")],1),e._v(". 
We added this information to the Table Schema:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v('"missingValues": [\n "[Not Applicable]",\n "[Not Available]",\n "[Pending]"\n]\n')])])]),o("h3",{attrs:{id:"software"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#software"}},[e._v("#")]),e._v(" Software")]),e._v(" "),o("p",[e._v("Goodtables had "),o("a",{attrs:{href:"http://okfnlabs.org/blog/2015/02/20/introducing-goodtables.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("existed"),o("OutboundLink")],1),e._v(" as a Python library and web application developed by Open Knowledge International to support the validation of tabular datasets both in terms of structure and also with respect to a published schema as described above. This software was put to good use in a local government context.")]),e._v(" "),o("p",[e._v("For this pilot, and in coordination with other work in the project, we took the opportunity to drastically improve the software to support the online, automated validation referenced in the above use case. We took as inspiration the workflow in use in software development environments around the world—continuous automated testing—and applied to data. This involved not only updating the Python library to reflect the specification development to date, but the design of a new data publishing workflow that is applicable beyond PNNL’s needs. It is designed to be extensible, so that custom checks and custom backends (e.g. other places where one might publish a dataset) can take advantage of this workflow. For example, in addition to datasets stored on GitHub, the new goodtables supports the automated validation of datasets stored on S3 and we are currently working on validation of datasets stored on CKAN.")]),e._v(" "),o("p",[e._v("Goodtables supports validation of tabular data in GitHub repositories to solve the use case for Active Data Biology. 
On every update to the dataset, a validation task is run on the stored data.")]),e._v(" "),o("h2",{attrs:{id:"review"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#review"}},[e._v("#")]),e._v(" Review")]),e._v(" "),o("h3",{attrs:{id:"how-effective-was-it"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#how-effective-was-it"}},[e._v("#")]),e._v(" How Effective Was It")]),e._v(" "),o("p",[e._v("The omics team at PNNL are still investigating the use of "),o("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" for their use case, but early reports are positive:")]),e._v(" "),o("blockquote",[o("p",[e._v("We created a schema and started testing it. So far so good! I think this is going to work for a lot of projects which want to store data in a repo.")])]),e._v(" "),o("p",[e._v("As a real test of the generality of goodtables, we also tried to apply it to another project. This second project is a public repository describing measurements of metabolites in ion mobility mass spectrometry. Here, we are again using flat files for structured data. The data is actually a library of information describing metabolites, and we know that the library will be growing. So it was very similar to the ADBio project, in that the curated data would be continually updated. 
(see "),o("a",{attrs:{href:"https://github.com/PNNL-Comp-Mass-Spec/MetabolomicsCCS",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/PNNL-Comp-Mass-Spec/MetabolomicsCCS"),o("OutboundLink")],1),e._v(" for the project itself, and "),o("a",{attrs:{href:"https://github.com/PNNL-Comp-Mass-Spec/metaboliteValidation",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/PNNL-Comp-Mass-Spec/metaboliteValidation"),o("OutboundLink")],1),e._v(" for a validation script that leverages goodtables).")]),e._v(" "),o("p",[e._v("Of course, technical issues that they have encountered have been translated in GitHub issues and are being addressed:")]),e._v(" "),o("ul",[o("li",[o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables.io/issues/233",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables.io/issues/233"),o("OutboundLink")],1)]),e._v(" "),o("li",[o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables.io/pull/235",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables.io/pull/235"),o("OutboundLink")],1)]),e._v(" "),o("li",[o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables.io/issues/232",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables.io/issues/232"),o("OutboundLink")],1)])])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/49.63bda42d.js b/assets/js/49.39f15b94.js similarity index 99% rename from assets/js/49.63bda42d.js rename to assets/js/49.39f15b94.js index 12b7f0393..61b375964 100644 --- a/assets/js/49.63bda42d.js +++ b/assets/js/49.39f15b94.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[49],{441:function(t,a,s){t.exports=s.p+"assets/img/packagist.835e6f2c.png"},586:function(t,a,s){"use strict";s.r(a);var e=s(29),n=Object(e.a)({},(function(){var 
t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("h1",{attrs:{id:"dm4t-pilot"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#dm4t-pilot"}},[t._v("#")]),t._v(" DM4T Pilot")]),t._v(" "),e("h2",{attrs:{id:"pilot-name"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#pilot-name"}},[t._v("#")]),t._v(" Pilot Name")]),t._v(" "),e("p",[t._v("Data Management for TEDDINET (DM4T)")]),t._v(" "),e("h2",{attrs:{id:"authors"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#authors"}},[t._v("#")]),t._v(" Authors")]),t._v(" "),e("p",[t._v("Julian Padget (DM4T), Dan Fowler (OKI), Evgeny Karev (OKI)")]),t._v(" "),e("h2",{attrs:{id:"field"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#field"}},[t._v("#")]),t._v(" Field")]),t._v(" "),e("p",[t._v("Energy Data")]),t._v(" "),e("h2",{attrs:{id:"fd-tech-involved"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#fd-tech-involved"}},[t._v("#")]),t._v(" FD Tech Involved")]),t._v(" "),e("ul",[e("li",[t._v("Frictionless Data specs: "),e("a",{attrs:{href:"http://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://specs.frictionlessdata.io/"),e("OutboundLink")],1)]),t._v(" "),e("li",[t._v("Data Package Pipelines: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/datapackage-pipelines"),e("OutboundLink")],1)]),t._v(" "),e("li",[t._v("Goodtables: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/goodtables-py"),e("OutboundLink")],1)])]),t._v(" "),e("p",[e("code",[t._v("packagist")]),t._v(" has now moved to "),e("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("create.frictionlessdata.io"),e("OutboundLink")],1)]),t._v(" 
"),e("h2",{attrs:{id:"context"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[t._v("#")]),t._v(" Context")]),t._v(" "),e("h3",{attrs:{id:"problem-we-were-trying-to-solve"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#problem-we-were-trying-to-solve"}},[t._v("#")]),t._v(" Problem We Were Trying To Solve")]),t._v(" "),e("p",[t._v("Open Knowledge International and the Data Management for TEDDINET project (DM4T) agreed to work together on a proof-of-concept pilot to attempt to use Frictionless Data approaches to address some of the data legacy issues facing the TEDDINET project, a research network addressing the challenges of transforming energy demand in our buildings, as a key component of the transition to an affordable, low carbon energy system. The problem as described on the DM4T Website:")]),t._v(" "),e("blockquote",[e("p",[t._v("The Engineering and Physical Sciences Research Council (EPSRC), the UK’s main agency for funding research in engineering and the physical sciences, funded 22 projects over two calls in 2010 and 2012 to investigate Transforming Energy Demand through Digital Innovation’ (TEDDI) as a means to find out how people use energy in homes and what can be done reduce energy consumption. A lot of data is being collected at different levels of detail in a variety of housing throughout the UK, but the level of detail are largely defined by the needs of each individual project. At the same time, the Research Councils UK (RCUK) are defining guidelines for what happens to data generated by projects they fund which require researchers to take concrete actions to store, preserve, and document their data for future reference.")])]),t._v(" "),e("blockquote",[e("p",[t._v("The problem, however, is that there is relatively little awareness, limited experience and only emerging practice of how to incorporate data management into much of physical science research. 
This is in contrast to established procedures for data formats and sharing in the biosciences, stemming from international collaboration on the Human Genome Project, and in the social sciences, where data from national surveys, including census data, have been centrally archived for many years. Consequently, current solutions may be able to meet a minimal interpretation of the requirements, but not effectively deliver the desired data legacy.")])]),t._v(" "),e("p",[t._v("The DM4T group selected three suitable datasets on which to base this work and provided domain knowledge to ensure the pilot is applicable to real use cases.")]),t._v(" "),e("p",[t._v("Output was tracked here: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-dm4t/issues",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-dm4t/issues"),e("OutboundLink")],1)]),t._v(" "),e("h2",{attrs:{id:"the-work"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[t._v("#")]),t._v(" The work")]),t._v(" "),e("p",[t._v("We will use the "),e("code",[t._v("refit-cleaned")]),t._v(" dataset to show the Frictionless Data specs and software capabilities. For this work, we limited the size of this dataset to keep the showcase running time reasonable. However, the Frictionless Data software is designed to scale well, and this process could be reproduced for the whole dataset. 
But for now it is worth noting that processing speed for such a big dataset could be a bottleneck for research work.")]),t._v(" "),e("h3",{attrs:{id:"refit-electrical-load-measurements-cleaned"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#refit-electrical-load-measurements-cleaned"}},[t._v("#")]),t._v(" REFIT: Electrical Load Measurements (Cleaned)")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the dataset: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-dm4t/tree/delivery/datasets/refit-cleaned",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-dm4t/tree/delivery/datasets/refit-cleaned"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("For each house in the study, this dataset consists of granular readings of electrical load. There were 20 houses in total, and each house had a different mix of devices plugged into the electrical load sensor. The dataset was distributed as a zipped file (~500MB) containing 20 CSVs with a combined ~120 million rows.")]),t._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[t._v("Time,Unix,Aggregate,Appliance1,Appliance2,Appliance3,Appliance4,Appliance5,Appliance6,Appliance7,Appliance8,Appliance9\n2013-10-09 13:06:17,1381323977,523,74,0,69,0,0,0,0,0,1\n2013-10-09 13:06:31,1381323991,526,75,0,69,0,0,0,0,0,1\n2013-10-09 13:06:46,1381324006,540,74,0,68,0,0,0,0,0,1\n2013-10-09 13:07:01,1381324021,532,74,0,68,0,0,0,0,0,1\n2013-10-09 13:07:15,1381324035,540,74,0,69,0,0,0,0,0,1\n2013-10-09 13:07:18,1381324038,539,74,0,69,0,0,0,0,0,1\n2013-10-09 13:07:30,1381324050,537,74,0,69,0,0,0,0,0,1\n2013-10-09 13:07:32,1381324052,537,74,0,69,0,0,0,0,0,1\n2013-10-09 13:07:44,1381324064,548,74,0,69,0,0,0,0,0,1\n")])])]),e("p",[t._v("Given that these datasets were already provided in well-structured CSV files, it was straightforward to translate the data dictionary found in the dataset’s README into the relevant fields in the 
datapackage.json. We did not need to alter the CSVs that comprise the dataset.")]),t._v(" "),e("h3",{attrs:{id:"creating-a-data-package-using-datapackage-pipelines"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#creating-a-data-package-using-datapackage-pipelines"}},[t._v("#")]),t._v(" Creating a data package using Datapackage Pipelines")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the Datapackage Pipelines project: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/datapackage-pipelines"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("Datapackage Pipelines is a framework for declarative stream-processing of tabular data. It is built upon the concepts and tooling of the Frictionless Data project. The basic concept in this framework is the pipeline. A pipeline has a list of processing steps, and it generates a single data package as its output. Pipelines are defined in a declarative way, not in code. One or more pipelines can be defined in a "),e("code",[t._v("pipeline-spec.yaml")]),t._v(" file. This file specifies the list of processors (referenced by name) and their execution parameters.")]),t._v(" "),e("p",[t._v("One of the main purposes of the Frictionless Data project is data containerization. This means that instead of keeping two separate sources of knowledge about the data (the data files and a text README), we put both of them into a container based on the "),e("code",[t._v("Data Package")]),t._v(" specification. 
This allows us to:")]),t._v(" "),e("ul",[e("li",[t._v("Ensure that the dataset description is shipped with the data files")]),t._v(" "),e("li",[t._v("Provide column data type information to allow type validation")]),t._v(" "),e("li",[t._v("Use the Frictionless Data tooling for reading and validating datasets")]),t._v(" "),e("li",[t._v("Enable usage of other software which supports Frictionless Data specifications")])]),t._v(" "),e("p",[t._v("First, we used the "),e("code",[t._v("datapackage-pipelines")]),t._v(" library to create a data package from the raw dataset. We need a declarative file called "),e("code",[t._v("datapackage-pipelines.yaml")]),t._v(" to describe the data transformation steps:")]),t._v(" "),e("blockquote",[e("p",[t._v("datapackage-pipelines.yaml")])]),t._v(" "),e("div",{staticClass:"language-yaml extra-class"},[e("pre",{pre:!0,attrs:{class:"language-yaml"}},[e("code",[e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("refit-cleaned")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("pipeline")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" add_metadata\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("name")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" refit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("electrical"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("load"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("measurements\n 
"),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("title")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'REFIT: Electrical Load Measurements'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("license")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" CC"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("BY"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("4.0")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Collection of this dataset was supported by the Engineering and Physical Sciences Research Council (EPSRC) via the project entitled Personalised Retrofit Decision Support Tools for UK Homes using Smart Home Technology (REFIT)"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" which is a collaboration among the Universities of Strathclyde"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" Loughborough and East Anglia. The dataset includes data from 20 households from the Loughborough area over the period 2013 "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" 2015. 
Additional information about REFIT is available from www.refitsmarthomes.org.\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("sources")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("title")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'REFIT: Electrical Load Measurements (Cleaned)'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("web")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'https://pure.strath.ac.uk/portal/en/datasets/refit-electrical-load-measurements-cleaned(9ab14b0e-19ac-4279-938f-27f643078cec).html'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("email")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" researchdataproject@strath.ac.uk\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" add_resource\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("name")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'house-1'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'datasets/refit-cleaned/House_1.csv'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key 
atrule"}},[t._v("format")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" csv\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Other resources are omitted")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" stream_remote_resources\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" set_types\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("resources")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"house-[0-9]{1,2}"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("types")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"Time"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("type")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" datetime\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"fmt:%Y-%m-%d %H:%M:%S"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Unix")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("type")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(":")]),t._v(" integer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Aggregate")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("type")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" integer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"Appliance[1-9]"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("type")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" integer\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" processors.modify_descriptions\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("resources")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" house"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("descriptions")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance1")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Fridge\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance2")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key 
atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Chest Freezer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance3")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Upright Freezer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance4")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Tumble Dryer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance5")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("descripion")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Washing Machine\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance6")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Dishwasher\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance7")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Computer Site\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance8")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" 
Television Site\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance9")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Electric Heater\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Other resources are omitted")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" dump.to_path\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("out-path")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" packages/refit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("cleaned\n")])])]),e("p",[t._v("The process consists of these steps:")]),t._v(" "),e("ul",[e("li",[t._v("Create the data package metadata")]),t._v(" "),e("li",[t._v("Add all data files from disk")]),t._v(" "),e("li",[t._v("Start streaming the resources into the data package")]),t._v(" "),e("li",[t._v("Set the column types")]),t._v(" "),e("li",[t._v("Update the resource descriptions using a custom processor")]),t._v(" "),e("li",[t._v("Save the data package to disk")])]),t._v(" "),e("p",[t._v("Now we’re ready to run this pipeline:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ dpp run ./refit-cleaned\n")])])]),e("p",[t._v("After this step we have a data package containing a descriptor:")]),t._v(" "),e("blockquote",[e("p",[t._v("packages/refit-cleaned/datapackage.json")])]),t._v(" "),e("div",{staticClass:"language-json 
extra-class"},[e("pre",{pre:!0,attrs:{class:"language-json"}},[e("code",[e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"bytes"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("1121187")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"count_of_rows"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("19980")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Collection of this dataset was supported by the Engineering and Physical Sciences Research Council (EPSRC) via the project entitled Personalised Retrofit Decision Support Tools for UK Homes using Smart Home Technology (REFIT), which is a collaboration among the Universities of Strathclyde, Loughborough and East Anglia. The dataset includes data from 20 households from the Loughborough area over the period 2013 - 2015. 
Additional information about REFIT is available from www.refitsmarthomes.org."')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"hash"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"433ff35135e0a43af6f00f04cb8e666d"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"license"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"CC-BY-4.0"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"refit-electrical-load-measurements"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"resources"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"bytes"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("55251")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"count_of_rows"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("999")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
property"}},[t._v('"dialect"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"delimiter"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('","')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"doubleQuote"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("true")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"lineTerminator"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"\\r\\n"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"quoteChar"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"\\""')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"skipInitialSpace"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("false")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"dpp:streamedFrom"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"datasets/refit-cleaned/House_1.csv"')]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"dpp:streaming"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("true")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"encoding"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"utf-8"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"format"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"csv"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"hash"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"ad42fbf1302cabe30e217ff105d5a7fd"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"house-1"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"path"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data/house-1.csv"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"schema"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n 
"),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"fields"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"format"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"%Y-%m-%d %H:%M:%S"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Time"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"datetime"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Unix"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Aggregate"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Fridge"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance1"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Chest 
Freezer"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance2"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Upright Freezer"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance3"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" 
"),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Tumble Dryer"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance4"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"descripion"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Washing Machine"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance5"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Dishwasher"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance6"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Computer Site"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance7"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Television Site"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance8"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Electric Heater"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance9"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("]")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n\n # Other resources are omitted\n\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),e("p",[t._v("And here is the list of data files linked in the descriptor:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ "),e("span",{pre:!0,attrs:{class:"token function"}},[t._v("ls")]),t._v(" packages/refit-cleaned/data\nhouse-10.csv house-13.csv house-17.csv house-1.csv house-2.csv house-5.csv house-8.csv\nhouse-11.csv house-15.csv house-18.csv house-20.csv house-3.csv house-6.csv house-9.csv\nhouse-12.csv house-16.csv house-19.csv house-21.csv house-4.csv house-7.csv\n")])])]),e("h3",{attrs:{id:"validating-a-data-package-using-goodtables"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#validating-a-data-package-using-goodtables"}},[t._v("#")]),t._v(" Validating a data package using Goodtables")]),t._v(" "),e("p",[t._v("Goodtables is a software family for tabular data validation. 
It’s available as a Python library, a command line tool, "),e("a",{attrs:{href:"https://try.goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("web application"),e("OutboundLink")],1),t._v(" and "),e("a",{attrs:{href:"https://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("continuous validation service"),e("OutboundLink")],1),t._v(".")]),t._v(" "),e("p",[t._v("The main features of Goodtables are:")]),t._v(" "),e("ul",[e("li",[t._v("Structural checks: Ensure that there are no empty rows, no blank headers, etc.")]),t._v(" "),e("li",[t._v("Content checks: Ensure that the values have the correct types (“string”, “number”, “date”, etc.), that their format is valid (“string must be an e-mail”), and that they respect the constraints (“age must be a number greater than 18”).")]),t._v(" "),e("li",[t._v("Support for multiple tabular formats: CSV, Excel files, LibreOffice, Data Package, etc.")]),t._v(" "),e("li",[t._v("Parallelized validations for multi-table datasets")])]),t._v(" "),e("p",[t._v("Because we have provided data types for the columns at the wrapping stage, here we validate both the data structure and compliance to the data types using the Goodtables command line interface:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ goodtables packages/refit-cleaned/datapackage.json\nDATASET\n"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("==")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("==")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("==")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'error-count'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v(",\n 
"),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'preset'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'datapackage'")]),t._v(",\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'table-count'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("20")]),t._v(",\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'time'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("4.694")]),t._v(",\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'valid'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" True"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),e("h3",{attrs:{id:"modifying-a-data-package-using-packagist"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#modifying-a-data-package-using-packagist"}},[t._v("#")]),t._v(" Modifying a data package using Packagist")]),t._v(" "),e("p",[t._v("If we need to modify our data package, we could use the "),e("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Packagist"),e("OutboundLink")],1),t._v(". It incorporates a straightforward UI to modify and validate data package descriptors. 
With its easy-to-use interface we are able to:")]),t._v(" "),e("ul",[e("li",[t._v("Load/validate/save a data package")]),t._v(" "),e("li",[t._v("Update data package metadata")]),t._v(" "),e("li",[t._v("Add/remove/modify data package resources")]),t._v(" "),e("li",[t._v("Add/remove/modify data resource fields")]),t._v(" "),e("li",[t._v("Set type/format for data values")])]),t._v(" "),e("p",[e("img",{attrs:{src:s(441),alt:"ADBio"}})]),t._v(" "),e("p",[t._v("In the figure above we have loaded the "),e("code",[t._v("refit-cleaned")]),t._v(" data package into the Packagist UI to make changes to the data package as needed.")]),t._v(" "),e("h3",{attrs:{id:"publishing-a-data-package-to-amazon-s3"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#publishing-a-data-package-to-amazon-s3"}},[t._v("#")]),t._v(" Publishing a data package to Amazon S3")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the published package: "),e("a",{attrs:{href:"https://s3.eu-central-1.amazonaws.com/pilot-dm4t/pilot-dm4t/packages/refit-cleaned/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://s3.eu-central-1.amazonaws.com/pilot-dm4t/pilot-dm4t/packages/refit-cleaned/datapackage.json"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("In this section we will show how data packages can be moved from one data storage system to another. This is possible because the data has been containerised as a data package.")]),t._v(" "),e("p",[t._v("One important feature of the "),e("code",[t._v("datapackage-pipelines")]),t._v(" project is that it works as a conveyor. We could push our data package not only to the local disk but also to other destinations. 
For example, to Amazon S3:")]),t._v(" "),e("blockquote",[e("p",[t._v("pipelines-spec.yml")])]),t._v(" "),e("div",{staticClass:"language-yaml extra-class"},[e("pre",{pre:!0,attrs:{class:"language-yaml"}},[e("code",[e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("refit-cleaned")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Initial steps are omitted")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" aws.dump.to_s3\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("bucket")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" pilot"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("dm4t\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("path")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" pilot"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("dm4t/packages/refit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("cleaned\n")])])]),e("p",[t._v("Running this command again:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ dpp run ./refit-cleaned\n")])])]),e("p",[t._v("And now our data package is published to the Amazon S3 remote storage:")]),t._v(" "),e("p",[e("img",{attrs:{src:"https://i.imgur.com/5Z7EPDR.png",alt:"screenshot of S3 storage"}})]),t._v(" 
"),e("h3",{attrs:{id:"getting-insight-from-data-using-python-libraries"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#getting-insight-from-data-using-python-libraries"}},[t._v("#")]),t._v(" Getting insight from data using Python libraries")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the demostration script: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-dm4t/blob/delivery/scripts/refit-cleaned.py",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-dm4t/blob/delivery/scripts/refit-cleaned.py"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("The Frictionless Data projects provides various Python (along with other 8 languages) libraries to work with data package programatically. We used the "),e("code",[t._v("datapackage")]),t._v(" library to analyse the "),e("code",[t._v("refit-cleaned")]),t._v(" data package:")]),t._v(" "),e("div",{staticClass:"language-python extra-class"},[e("pre",{pre:!0,attrs:{class:"language-python"}},[e("code",[e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" datetime\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" statistics\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" datapackage "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" Package\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Get aggregates")]),t._v("\nconsumption "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\npackage "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Package"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'packages/refit-cleaned/datapackage.json'")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(")")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" resource "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" package"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("resources"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" row "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" resource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),e("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("iter")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("keyed"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("True")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n hour "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Time'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("hour\n consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("setdefault"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(".")]),t._v("append"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("row"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Aggregate'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Get averages")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" hour "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" statistics"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("mean"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Print results")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" hour "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("sorted")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("print")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Average consumption at %02d hours: %.0f'")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("%")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("Now we can run it from the command line:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ python examples/refit-cleaned.py\nAverage consumption at 00 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("232")]),t._v("\nAverage consumption at 01 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("213")]),t._v("\nAverage consumption at 02 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("247")]),t._v("\nAverage consumption at 03 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("335")]),t._v("\nAverage consumption at 04 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("215")]),t._v("\nAverage consumption at 05 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("690")]),t._v("\nAverage consumption at 06 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("722")]),t._v("\nAverage consumption at 07 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("648")]),t._v("\nAverage consumption at 08 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("506")]),t._v("\nAverage consumption at 09 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("464")]),t._v("\nAverage consumption at 
"),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("10")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("364")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("11")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("569")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("12")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("520")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("13")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("497")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("14")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("380")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("15")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("383")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("16")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("459")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("17")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("945")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("18")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("733")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("19")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("732")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("20")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("471")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("21")]),t._v(" hours: 
"),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("478")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("22")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("325")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("23")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("231")]),t._v("\n")])])]),e("p",[t._v("Here we we’re able to get the averages for electricity consumption grouped by hour. We could have achieved this in different ways, but using the Frictionless Data specs and software provides some important advantages:")]),t._v(" "),e("ul",[e("li",[t._v("The fact that we have data wrapped into a data package has allowed us to validate and read the data already converted for its correct types (e.g native python "),e("code",[t._v("datetime")]),t._v(" object). No need for any kind of string parsing.")]),t._v(" "),e("li",[t._v("The Frictionless Data software uses file streams under the hood. This means that only the current row is kept in memory, so we’re able to handle datasets bigger than the available RAM memory.")])]),t._v(" "),e("h3",{attrs:{id:"exporting-data-to-an-elasticsearch-cluster"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#exporting-data-to-an-elasticsearch-cluster"}},[t._v("#")]),t._v(" Exporting data to an ElasticSearch cluster")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the export script: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-dm4t/blob/delivery/scripts/refit-cleaned.py",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-dm4t/blob/delivery/scripts/refit-cleaned.py"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("The Frictionless Data software provides plugins to export data to various backends like SQL, BigQuery etc. 
We will export the first resource from our data package for future analysis:")]),t._v(" "),e("div",{staticClass:"language-python extra-class"},[e("pre",{pre:!0,attrs:{class:"language-python"}},[e("code",[e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" elasticsearch "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" Elasticsearch\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" datapackage "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" Package\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" tableschema_elasticsearch "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" Storage\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Get resource")]),t._v("\npackage "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Package"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'packages/refit-cleaned/datapackage.json'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nresource "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" package"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("get_resource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'house-1'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Create storage")]),t._v("\nengine "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Elasticsearch"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nstorage "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Storage"),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("(")]),t._v("engine"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Write data")]),t._v("\nstorage"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("create"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'refit-cleaned'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'house-1'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" resource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("schema"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("descriptor"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("list")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("storage"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("write"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'refit-cleaned'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'house-1'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" resource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("read"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("keyed"),e("span",{pre:!0,attrs:{class:"token 
operator"}},[t._v("=")]),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("True")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Unix'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n")])])]),e("p",[t._v("Now we are able to check that our documents are indexed:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ http http://localhost:9200/_cat/indices?v\n")])])]),e("h3",{attrs:{id:"getting-insight-from-data-using-kibana"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#getting-insight-from-data-using-kibana"}},[t._v("#")]),t._v(" Getting insight from data using Kibana")]),t._v(" "),e("p",[t._v("To demonstrate how the Frictionless Data specs and software empower the usage of other analytics tools, we will use the ElasticSearch/Kibana project. In the previous step we imported our data package into an ElasticSearch cluster. It allows us to visualize data using a simple UI:")]),t._v(" "),e("p",[e("img",{attrs:{src:"https://i.imgur.com/Fm373F4.png",alt:"screenshot of elasticsearch cluster"}})]),t._v(" "),e("p",[t._v("In this screenshot we see the distribution of the average electricity consumption. 
This is just an example of what you can do by having the ability to easily load datasets into other analytical software.")]),t._v(" "),e("h2",{attrs:{id:"review"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#review"}},[t._v("#")]),t._v(" Review")]),t._v(" "),e("h3",{attrs:{id:"the-results"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#the-results"}},[t._v("#")]),t._v(" The results")]),t._v(" "),e("p",[t._v("In this pilot, we have been able to demonstrate the following:")]),t._v(" "),e("ul",[e("li",[t._v("Packaging the "),e("code",[t._v("refit-cleaned")]),t._v(" dataset as a data package using the Data Package Pipelines library")]),t._v(" "),e("li",[t._v("Validating the data package using the Goodtables library")]),t._v(" "),e("li",[t._v("Modifying data package metadata using the Packagist UI")]),t._v(" "),e("li",[t._v("Uploading the dataset to Amazon S3 and an ElasticSearch cluster using Frictionless Data tools")]),t._v(" "),e("li",[t._v("Reading and analysing the created data package in Python using the Frictionless Data library")])]),t._v(" "),e("h3",{attrs:{id:"current-limitations"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#current-limitations"}},[t._v("#")]),t._v(" Current limitations")]),t._v(" "),e("p",[t._v("The central challenge of working with these datasets is the size. Publishing the results of these research projects as flat files for immediate analysis is beneficial; however, the scale of each of these datasets (gigabytes of data, millions of rows) is a challenge to deal with no matter how you store it. 
Processing this data through Data Package pipelines takes a long time.")]),t._v(" "),e("h3",{attrs:{id:"next-steps"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#next-steps"}},[t._v("#")]),t._v(" Next Steps")]),t._v(" "),e("ul",[e("li",[t._v("Improve the speed of the data package creation step")])]),t._v(" "),e("h3",{attrs:{id:"find-out-more"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#find-out-more"}},[t._v("#")]),t._v(" Find Out More")]),t._v(" "),e("ul",[e("li",[e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-pnnl",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-pnnl"),e("OutboundLink")],1)])]),t._v(" "),e("h3",{attrs:{id:"source-material"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#source-material"}},[t._v("#")]),t._v(" Source Material")]),t._v(" "),e("ul",[e("li",[e("a",{attrs:{href:"https://app.hubspot.com/sales/2281421/deal/146418008",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://app.hubspot.com/sales/2281421/deal/146418008"),e("OutboundLink")],1)]),t._v(" "),e("li",[e("a",{attrs:{href:"https://discuss.okfn.org/c/working-groups/open-archaeology",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://discuss.okfn.org/c/working-groups/open-archaeology"),e("OutboundLink")],1)]),t._v(" "),e("li",[e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-open-archaeology",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-open-archaeology"),e("OutboundLink")],1)])])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[49],{438:function(t,a,s){t.exports=s.p+"assets/img/packagist.835e6f2c.png"},584:function(t,a,s){"use strict";s.r(a);var e=s(29),n=Object(e.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return 
e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("h1",{attrs:{id:"dm4t-pilot"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#dm4t-pilot"}},[t._v("#")]),t._v(" DM4T Pilot")]),t._v(" "),e("h2",{attrs:{id:"pilot-name"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#pilot-name"}},[t._v("#")]),t._v(" Pilot Name")]),t._v(" "),e("p",[t._v("Data Management for TEDDINET (DM4T)")]),t._v(" "),e("h2",{attrs:{id:"authors"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#authors"}},[t._v("#")]),t._v(" Authors")]),t._v(" "),e("p",[t._v("Julian Padget (DM4T), Dan Fowler (OKI), Evgeny Karev (OKI)")]),t._v(" "),e("h2",{attrs:{id:"field"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#field"}},[t._v("#")]),t._v(" Field")]),t._v(" "),e("p",[t._v("Energy Data")]),t._v(" "),e("h2",{attrs:{id:"fd-tech-involved"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#fd-tech-involved"}},[t._v("#")]),t._v(" FD Tech Involved")]),t._v(" "),e("ul",[e("li",[t._v("Frictionless Data specs: "),e("a",{attrs:{href:"http://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://specs.frictionlessdata.io/"),e("OutboundLink")],1)]),t._v(" "),e("li",[t._v("Data Package Pipelines: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/datapackage-pipelines"),e("OutboundLink")],1)]),t._v(" "),e("li",[t._v("Goodtables: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/goodtables-py"),e("OutboundLink")],1)])]),t._v(" "),e("p",[e("code",[t._v("packagist")]),t._v(" has now moved to "),e("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("create.frictionlessdata.io"),e("OutboundLink")],1)]),t._v(" 
"),e("h2",{attrs:{id:"context"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#context"}},[t._v("#")]),t._v(" Context")]),t._v(" "),e("h3",{attrs:{id:"problem-we-were-trying-to-solve"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#problem-we-were-trying-to-solve"}},[t._v("#")]),t._v(" Problem We Were Trying To Solve")]),t._v(" "),e("p",[t._v("Open Knowledge International and the Data Management for TEDDINET project (DM4T) agreed to work together on a proof-of-concept pilot to attempt to use Frictionless Data approaches to address some of the data legacy issues facing the TEDDINET project, a research network addressing the challenges of transforming energy demand in our buildings, as a key component of the transition to an affordable, low carbon energy system. The problem as described on the DM4T Website:")]),t._v(" "),e("blockquote",[e("p",[t._v("The Engineering and Physical Sciences Research Council (EPSRC), the UK’s main agency for funding research in engineering and the physical sciences, funded 22 projects over two calls in 2010 and 2012 to investigate Transforming Energy Demand through Digital Innovation’ (TEDDI) as a means to find out how people use energy in homes and what can be done reduce energy consumption. A lot of data is being collected at different levels of detail in a variety of housing throughout the UK, but the level of detail are largely defined by the needs of each individual project. At the same time, the Research Councils UK (RCUK) are defining guidelines for what happens to data generated by projects they fund which require researchers to take concrete actions to store, preserve, and document their data for future reference.")])]),t._v(" "),e("blockquote",[e("p",[t._v("The problem, however, is that there is relatively little awareness, limited experience and only emerging practice of how to incorporate data management into much of physical science research. 
This is in contrast to established procedures for data formats and sharing in the biosciences, stemming from international collaboration on the Human Genome Project, and in the social sciences, where data from national surveys, including census data, have been centrally archived for many years. Consequently, current solutions may be able to meet a minimal interpretation of the requirements, but not effectively deliver the desired data legacy.")])]),t._v(" "),e("p",[t._v("The DM4T group selected three suitable datasets on which to base this work and provided domain knowledge to ensure the pilot is applicable to real use cases.")]),t._v(" "),e("p",[t._v("Output was tracked here: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-dm4t/issues",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-dm4t/issues"),e("OutboundLink")],1)]),t._v(" "),e("h2",{attrs:{id:"the-work"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#the-work"}},[t._v("#")]),t._v(" The work")]),t._v(" "),e("p",[t._v("We will use the "),e("code",[t._v("refit-cleaned")]),t._v(" dataset to show the capabilities of the Frictionless Data specs and software. For this work, we limited the size of this dataset in order to keep the showcase running time reasonable. However, the Frictionless Data software is designed to scale, and this process could be reproduced for the whole dataset. 
For now, though, it is worth noting that processing speed for such large datasets could be a bottleneck for research work.")]),t._v(" "),e("h3",{attrs:{id:"refit-electrical-load-measurements-cleaned"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#refit-electrical-load-measurements-cleaned"}},[t._v("#")]),t._v(" REFIT: Electrical Load Measurements (Cleaned)")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the dataset: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-dm4t/tree/delivery/datasets/refit-cleaned",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-dm4t/tree/delivery/datasets/refit-cleaned"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("For each house in the study, this dataset consists of granular readings of electrical load. There were 20 houses in total, and each house had a different mix of devices plugged into the electrical load sensor. The dataset was distributed as a zipped file (~500MB) containing 20 CSVs with a combined ~120 million rows.")]),t._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[t._v("Time,Unix,Aggregate,Appliance1,Appliance2,Appliance3,Appliance4,Appliance5,Appliance6,Appliance7,Appliance8,Appliance9\n2013-10-09 13:06:17,1381323977,523,74,0,69,0,0,0,0,0,1\n2013-10-09 13:06:31,1381323991,526,75,0,69,0,0,0,0,0,1\n2013-10-09 13:06:46,1381324006,540,74,0,68,0,0,0,0,0,1\n2013-10-09 13:07:01,1381324021,532,74,0,68,0,0,0,0,0,1\n2013-10-09 13:07:15,1381324035,540,74,0,69,0,0,0,0,0,1\n2013-10-09 13:07:18,1381324038,539,74,0,69,0,0,0,0,0,1\n2013-10-09 13:07:30,1381324050,537,74,0,69,0,0,0,0,0,1\n2013-10-09 13:07:32,1381324052,537,74,0,69,0,0,0,0,0,1\n2013-10-09 13:07:44,1381324064,548,74,0,69,0,0,0,0,0,1\n")])])]),e("p",[t._v("Given that these datasets were already provided in well-structured CSV files, it was straightforward to translate the data dictionary found in the dataset’s README into the relevant fields in the 
datapackage.json. We did not need to alter the CSVs that comprise the dataset.")]),t._v(" "),e("h3",{attrs:{id:"creating-a-data-package-using-datapackage-pipelines"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#creating-a-data-package-using-datapackage-pipelines"}},[t._v("#")]),t._v(" Creating a data package using Datapackage Pipelines")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the Datapackage Pipelines project: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/datapackage-pipelines"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("Datapackage Pipelines is a framework for declarative stream-processing of tabular data. It is built upon the concepts and tooling of the Frictionless Data project. The basic concept in this framework is the pipeline. A pipeline has a list of processing steps, and it generates a single data package as its output. Pipelines are defined in a declarative way, not in code. One or more pipelines can be defined in a "),e("code",[t._v("pipeline-spec.yaml")]),t._v(" file. This file specifies the list of processors (referenced by name) and their execution parameters.")]),t._v(" "),e("p",[t._v("One of the main purposes of the Frictionless Data project is data containerization. This means that instead of keeping two separate sources of knowledge about the data (the data files and a text README), we put both of them into a container based on the "),e("code",[t._v("Data Package")]),t._v(" specification. 
This allows us to:")]),t._v(" "),e("ul",[e("li",[t._v("Ensure that the dataset description is shipped with the data files")]),t._v(" "),e("li",[t._v("Provide column data type information to allow type validation")]),t._v(" "),e("li",[t._v("Use the Frictionless Data tooling for reading and validating datasets")]),t._v(" "),e("li",[t._v("Enable usage of other software which supports Frictionless Data specifications")])]),t._v(" "),e("p",[t._v("First, we used the "),e("code",[t._v("datapackage-pipelines")]),t._v(" library to create a data package from the raw dataset. This requires a declarative file called "),e("code",[t._v("datapackage-pipelines.yaml")]),t._v(" that describes the data transformation steps:")]),t._v(" "),e("blockquote",[e("p",[t._v("datapackage-pipelines.yaml")])]),t._v(" "),e("div",{staticClass:"language-yaml extra-class"},[e("pre",{pre:!0,attrs:{class:"language-yaml"}},[e("code",[e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("refit-cleaned")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("pipeline")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" add_metadata\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("name")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" refit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("electrical"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("load"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("measurements\n 
"),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("title")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'REFIT: Electrical Load Measurements'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("license")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" CC"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("BY"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("4.0")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Collection of this dataset was supported by the Engineering and Physical Sciences Research Council (EPSRC) via the project entitled Personalised Retrofit Decision Support Tools for UK Homes using Smart Home Technology (REFIT)"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" which is a collaboration among the Universities of Strathclyde"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" Loughborough and East Anglia. The dataset includes data from 20 households from the Loughborough area over the period 2013 "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" 2015. 
Additional information about REFIT is available from www.refitsmarthomes.org.\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("sources")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("title")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'REFIT: Electrical Load Measurements (Cleaned)'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("web")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'https://pure.strath.ac.uk/portal/en/datasets/refit-electrical-load-measurements-cleaned(9ab14b0e-19ac-4279-938f-27f643078cec).html'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("email")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" researchdataproject@strath.ac.uk\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" add_resource\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("name")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'house-1'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("url")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'datasets/refit-cleaned/House_1.csv'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key 
atrule"}},[t._v("format")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" csv\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Other resources are omitted")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" stream_remote_resources\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" set_types\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("resources")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"house-[0-9]{1,2}"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("types")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"Time"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("type")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" datetime\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("format")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"fmt:%Y-%m-%d %H:%M:%S"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Unix")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("type")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(":")]),t._v(" integer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Aggregate")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("type")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" integer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v('"Appliance[1-9]"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("type")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" integer\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" processors.modify_descriptions\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("resources")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" house"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("descriptions")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance1")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Fridge\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance2")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key 
atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Chest Freezer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance3")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Upright Freezer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance4")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Tumble Dryer\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance5")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Washing Machine\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance6")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Dishwasher\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance7")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Computer Site\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance8")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" 
Television Site\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("Appliance9")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("description")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" Electric Heater\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Other resources are omitted")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" dump.to_path\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("out-path")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" packages/refit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("cleaned\n")])])]),e("p",[t._v("The process consists of these steps:")]),t._v(" "),e("ul",[e("li",[t._v("Create the data package metadata")]),t._v(" "),e("li",[t._v("Add all data files from disk")]),t._v(" "),e("li",[t._v("Start streaming resources into the data package")]),t._v(" "),e("li",[t._v("Update resource descriptions using a custom processor")]),t._v(" "),e("li",[t._v("Save the data package to disk")])]),t._v(" "),e("p",[t._v("Now we’re ready to run this pipeline:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ dpp run ./refit-cleaned\n")])])]),e("p",[t._v("After this step we have a data package containing a descriptor:")]),t._v(" "),e("blockquote",[e("p",[t._v("packages/refit-cleaned/datapackage.json")])]),t._v(" "),e("div",{staticClass:"language-json 
extra-class"},[e("pre",{pre:!0,attrs:{class:"language-json"}},[e("code",[e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"bytes"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("1121187")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"count_of_rows"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("19980")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Collection of this dataset was supported by the Engineering and Physical Sciences Research Council (EPSRC) via the project entitled Personalised Retrofit Decision Support Tools for UK Homes using Smart Home Technology (REFIT), which is a collaboration among the Universities of Strathclyde, Loughborough and East Anglia. The dataset includes data from 20 households from the Loughborough area over the period 2013 - 2015. 
Additional information about REFIT is available from www.refitsmarthomes.org."')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"hash"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"433ff35135e0a43af6f00f04cb8e666d"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"license"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"CC-BY-4.0"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"refit-electrical-load-measurements"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"resources"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"bytes"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("55251")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"count_of_rows"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("999")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
property"}},[t._v('"dialect"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"delimiter"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('","')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"doubleQuote"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("true")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"lineTerminator"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"\\r\\n"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"quoteChar"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"\\""')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"skipInitialSpace"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("false")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"dpp:streamedFrom"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"datasets/refit-cleaned/House_1.csv"')]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"dpp:streaming"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("true")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"encoding"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"utf-8"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"format"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"csv"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"hash"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"ad42fbf1302cabe30e217ff105d5a7fd"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"house-1"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"path"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data/house-1.csv"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"schema"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n 
"),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"fields"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"format"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"%Y-%m-%d %H:%M:%S"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Time"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"datetime"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Unix"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Aggregate"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Fridge"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance1"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Chest 
Freezer"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance2"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Upright Freezer"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance3"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" 
"),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Tumble Dryer"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance4"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"descripion"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Washing Machine"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance5"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Dishwasher"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance6"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Computer Site"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance7"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Television Site"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance8"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"description"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Electric Heater"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Appliance9"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"integer"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("]")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n\n # Other resources are omitted\n\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),e("p",[t._v("And a list of data files linked in the descriptor:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ "),e("span",{pre:!0,attrs:{class:"token function"}},[t._v("ls")]),t._v(" packages/refit-cleaned/data\nhouse-10.csv house-13.csv house-17.csv house-1.csv house-2.csv house-5.csv house-8.csv\nhouse-11.csv house-15.csv house-18.csv house-20.csv house-3.csv house-6.csv house-9.csv\nhouse-12.csv house-16.csv house-19.csv house-21.csv house-4.csv house-7.csv\n")])])]),e("h3",{attrs:{id:"validating-a-data-package-using-goodtables"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#validating-a-data-package-using-goodtables"}},[t._v("#")]),t._v(" Validating a data package using Goodtables")]),t._v(" "),e("p",[t._v("Goodtables is a software family for tabular data validation. 
It’s available as a Python library, a command line tool, a "),e("a",{attrs:{href:"https://try.goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("web application"),e("OutboundLink")],1),t._v(", and a "),e("a",{attrs:{href:"https://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("continuous validation service"),e("OutboundLink")],1),t._v(".")]),t._v(" "),e("p",[t._v("The main features of Goodtables are:")]),t._v(" "),e("ul",[e("li",[t._v("Structural checks: Ensure that there are no empty rows, no blank headers, etc.")]),t._v(" "),e("li",[t._v("Content checks: Ensure that the values have the correct types (“string”, “number”, “date”, etc.), that their format is valid (“string must be an e-mail”), and that they respect the constraints (“age must be a number greater than 18”).")]),t._v(" "),e("li",[t._v("Support for multiple tabular formats: CSV, Excel files, LibreOffice, Data Package, etc.")]),t._v(" "),e("li",[t._v("Parallelized validations for multi-table datasets")])]),t._v(" "),e("p",[t._v("Because we have provided data types for the columns at the wrapping stage, here we validate both the data structure and compliance with the data types using the Goodtables command line interface:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ goodtables packages/refit-cleaned/datapackage.json\nDATASET\n"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("==")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("==")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("==")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'error-count'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v(",\n 
"),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'preset'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'datapackage'")]),t._v(",\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'table-count'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("20")]),t._v(",\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'time'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("4.694")]),t._v(",\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'valid'")]),e("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v(":")]),t._v(" True"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),e("h3",{attrs:{id:"modifying-a-data-package-using-packagist"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#modifying-a-data-package-using-packagist"}},[t._v("#")]),t._v(" Modifying a data package using Packagist")]),t._v(" "),e("p",[t._v("If we need to modify our data package, we can use "),e("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Packagist"),e("OutboundLink")],1),t._v(". It incorporates a straightforward UI to modify and validate data package descriptors. 
With its easy-to-use interface we are able to:")]),t._v(" "),e("ul",[e("li",[t._v("Load/validate/save a data package")]),t._v(" "),e("li",[t._v("Update data package metadata")]),t._v(" "),e("li",[t._v("Add/remove/modify data package resources")]),t._v(" "),e("li",[t._v("Add/remove/modify data resource fields")]),t._v(" "),e("li",[t._v("Set type/format for data values")])]),t._v(" "),e("p",[e("img",{attrs:{src:s(438),alt:"ADBio"}})]),t._v(" "),e("p",[t._v("In the figure above, we have loaded the "),e("code",[t._v("refit-cleaned")]),t._v(" data package into the Packagist UI to make changes to the data package as needed.")]),t._v(" "),e("h3",{attrs:{id:"publishing-a-data-package-to-amazon-s3"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#publishing-a-data-package-to-amazon-s3"}},[t._v("#")]),t._v(" Publishing a data package to Amazon S3")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the published package: "),e("a",{attrs:{href:"https://s3.eu-central-1.amazonaws.com/pilot-dm4t/pilot-dm4t/packages/refit-cleaned/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://s3.eu-central-1.amazonaws.com/pilot-dm4t/pilot-dm4t/packages/refit-cleaned/datapackage.json"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("In this section we will show how data packages can be moved from one data storage system to another. This is possible because the data has been containerised in a data package.")]),t._v(" "),e("p",[t._v("One important feature of the "),e("code",[t._v("datapackage-pipelines")]),t._v(" project is that it works as a conveyor. We can push our data package not only to the local disk but also to other destinations. 
For example, to Amazon S3:")]),t._v(" "),e("blockquote",[e("p",[t._v("pipelines-spec.yml")])]),t._v(" "),e("div",{staticClass:"language-yaml extra-class"},[e("pre",{pre:!0,attrs:{class:"language-yaml"}},[e("code",[e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("refit-cleaned")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Initial steps are omitted")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("run")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" aws.dump.to_s3\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("parameters")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("bucket")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" pilot"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("dm4t\n "),e("span",{pre:!0,attrs:{class:"token key atrule"}},[t._v("path")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" pilot"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("dm4t/packages/refit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("-")]),t._v("cleaned\n")])])]),e("p",[t._v("We run the pipeline again:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ dpp run ./refit-cleaned\n")])])]),e("p",[t._v("And now our data package is published to the Amazon S3 remote storage:")]),t._v(" "),e("p",[e("img",{attrs:{src:"https://i.imgur.com/5Z7EPDR.png",alt:"screenshot of S3 storage"}})]),t._v(" 
"),e("h3",{attrs:{id:"getting-insight-from-data-using-python-libraries"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#getting-insight-from-data-using-python-libraries"}},[t._v("#")]),t._v(" Getting insight from data using Python libraries")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the demonstration script: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-dm4t/blob/delivery/scripts/refit-cleaned.py",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-dm4t/blob/delivery/scripts/refit-cleaned.py"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("The Frictionless Data project provides libraries in Python (and 8 other languages) to work with data packages programmatically. We used the "),e("code",[t._v("datapackage")]),t._v(" library to analyse the "),e("code",[t._v("refit-cleaned")]),t._v(" data package:")]),t._v(" "),e("div",{staticClass:"language-python extra-class"},[e("pre",{pre:!0,attrs:{class:"language-python"}},[e("code",[e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" datetime\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" statistics\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" datapackage "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" Package\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Get aggregates")]),t._v("\nconsumption "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\npackage "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Package"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'packages/refit-cleaned/datapackage.json'")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(")")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" resource "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" package"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("resources"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" row "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" resource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),e("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("iter")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("keyed"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("True")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n hour "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Time'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("hour\n consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("setdefault"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(".")]),t._v("append"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("row"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Aggregate'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Get averages")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" hour "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" statistics"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("mean"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Print results")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" hour "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("sorted")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("print")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Average consumption at %02d hours: %.0f'")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("%")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" consumption"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("hour"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("Now we can run it from the command line:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ python examples/refit-cleaned.py\nAverage consumption at 00 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("232")]),t._v("\nAverage consumption at 01 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("213")]),t._v("\nAverage consumption at 02 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("247")]),t._v("\nAverage consumption at 03 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("335")]),t._v("\nAverage consumption at 04 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("215")]),t._v("\nAverage consumption at 05 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("690")]),t._v("\nAverage consumption at 06 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("722")]),t._v("\nAverage consumption at 07 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("648")]),t._v("\nAverage consumption at 08 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("506")]),t._v("\nAverage consumption at 09 hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("464")]),t._v("\nAverage consumption at 
"),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("10")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("364")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("11")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("569")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("12")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("520")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("13")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("497")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("14")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("380")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("15")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("383")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("16")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("459")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("17")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("945")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("18")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("733")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("19")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("732")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("20")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("471")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("21")]),t._v(" hours: 
"),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("478")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("22")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("325")]),t._v("\nAverage consumption at "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("23")]),t._v(" hours: "),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("231")]),t._v("\n")])])]),e("p",[t._v("Here we were able to get the averages for electricity consumption grouped by hour. We could have achieved this in different ways, but using the Frictionless Data specs and software provides some important advantages:")]),t._v(" "),e("ul",[e("li",[t._v("The fact that we have data wrapped into a data package has allowed us to validate and read the data already converted to its correct types (e.g. a native Python "),e("code",[t._v("datetime")]),t._v(" object). No need for any kind of string parsing.")]),t._v(" "),e("li",[t._v("The Frictionless Data software uses file streams under the hood. This means that only the current row is kept in memory, so we’re able to handle datasets bigger than the available RAM.")])]),t._v(" "),e("h3",{attrs:{id:"exporting-data-to-an-elasticsearch-cluster"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#exporting-data-to-an-elasticsearch-cluster"}},[t._v("#")]),t._v(" Exporting data to an ElasticSearch cluster")]),t._v(" "),e("blockquote",[e("p",[t._v("Link to the export script: "),e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-dm4t/blob/delivery/scripts/refit-cleaned.py",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-dm4t/blob/delivery/scripts/refit-cleaned.py"),e("OutboundLink")],1)])]),t._v(" "),e("p",[t._v("The Frictionless Data software provides plugins to export data to various backends such as SQL, BigQuery, etc. 
We will export the first resource from our data package for future analysis:")]),t._v(" "),e("div",{staticClass:"language-python extra-class"},[e("pre",{pre:!0,attrs:{class:"language-python"}},[e("code",[e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" elasticsearch "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" Elasticsearch\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" datapackage "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" Package\n"),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" tableschema_elasticsearch "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" Storage\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Get resource")]),t._v("\npackage "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Package"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'packages/refit-cleaned/datapackage.json'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nresource "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" package"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("get_resource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'house-1'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Create storage")]),t._v("\nengine "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Elasticsearch"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nstorage "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Storage"),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("(")]),t._v("engine"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# Write data")]),t._v("\nstorage"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("create"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'refit-cleaned'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'house-1'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" resource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("schema"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("descriptor"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),e("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("list")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("storage"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("write"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'refit-cleaned'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'house-1'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" resource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("read"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("keyed"),e("span",{pre:!0,attrs:{class:"token 
operator"}},[t._v("=")]),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("True")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Unix'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n")])])]),e("p",[t._v("Now we are able to check that our documents are indexed:")]),t._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[t._v("$ http http://localhost:9200/_cat/indices?v\n")])])]),e("h3",{attrs:{id:"getting-insight-from-data-using-kibana"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#getting-insight-from-data-using-kibana"}},[t._v("#")]),t._v(" Getting insight from data using Kibana")]),t._v(" "),e("p",[t._v("To demonstrate how the Frictionless Data specs and software empower the usage of other analytics tools, we will use the ElasticSearch/Kibana project. In the previous step, we imported our data package into an ElasticSearch cluster. It allows us to visualize data using a simple UI:")]),t._v(" "),e("p",[e("img",{attrs:{src:"https://i.imgur.com/Fm373F4.png",alt:"screenshot of elasticsearch cluster"}})]),t._v(" "),e("p",[t._v("In this screenshot we see the distribution of the average electricity consumption. 
This is just an example of what you can do by having the ability to easily load datasets into other analytical software.")]),t._v(" "),e("h2",{attrs:{id:"review"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#review"}},[t._v("#")]),t._v(" Review")]),t._v(" "),e("h3",{attrs:{id:"the-results"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#the-results"}},[t._v("#")]),t._v(" The results")]),t._v(" "),e("p",[t._v("In this pilot, we have been able to demonstrate the following:")]),t._v(" "),e("ul",[e("li",[t._v("Packaging the "),e("code",[t._v("refit-cleaned")]),t._v(" dataset as a data package using the Data Package Pipelines library")]),t._v(" "),e("li",[t._v("Validating the data package using the Goodtables library")]),t._v(" "),e("li",[t._v("Modifying data package metadata using the Packagist UI")]),t._v(" "),e("li",[t._v("Uploading the dataset to Amazon S3 and an ElasticSearch cluster using Frictionless Data tools")]),t._v(" "),e("li",[t._v("Reading and analysing the created Data Package in Python using the Frictionless Data library")])]),t._v(" "),e("h3",{attrs:{id:"current-limitations"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#current-limitations"}},[t._v("#")]),t._v(" Current limitations")]),t._v(" "),e("p",[t._v("The central challenge of working with these datasets is the size. Publishing the results of these research projects as flat files for immediate analysis is beneficial; however, the scale of each of these datasets (gigabytes of data, millions of rows) is a challenge to deal with no matter how you store them. 
Processing this data through Data Package pipelines takes a long time.")]),t._v(" "),e("h3",{attrs:{id:"next-steps"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#next-steps"}},[t._v("#")]),t._v(" Next Steps")]),t._v(" "),e("ul",[e("li",[t._v("Improve the speed of the data package creation step")])]),t._v(" "),e("h3",{attrs:{id:"find-out-more"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#find-out-more"}},[t._v("#")]),t._v(" Find Out More")]),t._v(" "),e("ul",[e("li",[e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-pnnl",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-pnnl"),e("OutboundLink")],1)])]),t._v(" "),e("h3",{attrs:{id:"source-material"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#source-material"}},[t._v("#")]),t._v(" Source Material")]),t._v(" "),e("ul",[e("li",[e("a",{attrs:{href:"https://app.hubspot.com/sales/2281421/deal/146418008",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://app.hubspot.com/sales/2281421/deal/146418008"),e("OutboundLink")],1)]),t._v(" "),e("li",[e("a",{attrs:{href:"https://discuss.okfn.org/c/working-groups/open-archaeology",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://discuss.okfn.org/c/working-groups/open-archaeology"),e("OutboundLink")],1)]),t._v(" "),e("li",[e("a",{attrs:{href:"https://github.com/frictionlessdata/pilot-open-archaeology",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://github.com/frictionlessdata/pilot-open-archaeology"),e("OutboundLink")],1)])])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/5.74ceecdf.js b/assets/js/5.79072a51.js similarity index 99% rename from assets/js/5.74ceecdf.js rename to assets/js/5.79072a51.js index 12d94edce..cdda982d8 100644 --- a/assets/js/5.74ceecdf.js +++ b/assets/js/5.79072a51.js @@ -1 +1 @@ 
-(window.webpackJsonp=window.webpackJsonp||[]).push([[5],{478:function(e,a,t){e.exports=t.p+"assets/img/figure-1.0b5d5da2.png"},479:function(e,a,t){e.exports=t.p+"assets/img/figure-2.a4cda338.png"},480:function(e,a,t){e.exports=t.p+"assets/img/figure-3.e234c78e.png"},481:function(e,a){e.exports=""},482:function(e,a,t){e.exports=t.p+"assets/img/figure-4.65dd4176.png"},483:function(e,a,t){e.exports=t.p+"assets/img/figure-5.a7c23193.png"},484:function(e,a,t){e.exports=t.p+"assets/img/figure-6.5520e95a.png"},485:function(e,a,t){e.exports=t.p+"assets/img/figure-7.866156c7.png"},486:function(e,a,t){e.exports=t.p+"assets/img/figure-8.3afbab68.png"},612:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,o=e._self._c||a;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("h2",{attrs:{id:"include-a-data-schema"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#include-a-data-schema"}},[e._v("#")]),e._v(" Include a data schema")]),e._v(" "),o("p",[e._v("Simply put, a schema is a blueprint that tells us how your data is structured, and what type of content is to be expected in it. You can think of it as a data dictionary. 
Having a table schema at hand makes it possible to run more precise validation checks on your data, both at a structural and content level.")]),e._v(" "),o("p",[e._v("For this section, we will use the "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" and "),o("a",{attrs:{href:"http://datahub.io/core/gdp",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gross Domestic Product dataset for all countries (1960 - 2014)"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("Data Package")]),e._v(" is a format that makes it possible to put your data collection and relevant information that provides context about your data in one container before you share it. All contextual information, such as metadata and your data schema, is published in a JSON file named "),o("em",[e._v("datapackage.json")]),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("Data Package Creator")]),e._v(" is an online service that facilitates the creation and editing of data packages. The service automatically generates a "),o("em",[e._v("datapackage.json")]),e._v(" file for you as you add and edit data that is part of your data collection. We refer to each piece of data in a data collection as a "),o("strong",[e._v("data resource")]),e._v(".")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" loads with dummy data to make it easy to understand how metadata and sample resources help generate the "),o("em",[e._v("datapackage.json")]),e._v(" file. 
There are three ways in which a user can add data resources on "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(":")]),e._v(" "),o("ol",[o("li",[e._v("Provide a hyperlink to your data resource (highly recommended).")])]),e._v(" "),o("p",[e._v("If your data resource is publicly available, like on GitHub or in a data repository, simply obtain the URL and paste it in the "),o("strong",[e._v("Path")]),e._v(" section. To learn how to publish your data resource online, check the publish your dataset section.")]),e._v(" "),o("ol",{attrs:{start:"2"}},[o("li",[e._v("Create your data resource within the service.")])]),e._v(" "),o("p",[e._v("If your data resource isn’t published online, you’ll have to define its fields from scratch. Depending on how complex your data is, this can be time consuming, as each field of a data resource has to be created manually, but it’s still easier than creating the descriptor JSON file from scratch.")]),e._v(" "),o("ol",{attrs:{start:"3"}},[o("li",[o("strong",[e._v("Load a Data Package")]),e._v(" option")])]),e._v(" "),o("p",[e._v("With this option, you can load a pre-existing "),o("em",[e._v("datapackage.json")]),e._v(" file to view and edit its metadata and resource fields.")]),e._v(" "),o("p",[e._v("Our "),o("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/blob/master/gross-domestic-product-all/data/gdp.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gross Domestic Product dataset for all countries (1960 - 2014)"),o("OutboundLink")],1),e._v(" is publicly available on GitHub.")]),e._v(" "),o("p",[e._v("Obtain a link to the raw CSV file by clicking on the Raw button at the top right corner of the GitHub file preview page, as shown in Figure 1 below. 
The resulting hyperlink looks like "),o("a",{attrs:{href:"https://raw.githubusercontent.com/datasets/continent-codes/master/data/continent-codes.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://raw.githubusercontent.com/datasets/continent-codes/master/data/continent-codes.csv"),o("OutboundLink")],1)]),e._v(" "),o("p",[o("img",{attrs:{src:t(478),alt:"Above, raw button highlighted in red"}}),o("br"),e._v(" "),o("em",[e._v("Figure 1: Above, raw button highlighted in red.")])]),e._v(" "),o("p",[e._v("Paste your hyperlink in the "),o("em",[e._v("Path")]),e._v(" section and click on the "),o("em",[e._v("Load")]),e._v(" button. Each column in your table translates to a "),o("em",[e._v("field")]),e._v(". You should be prompted to add all fields identified in your data resource, as in Figure 2 below. Click on the prompt to load the fields.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(479),alt:"annotated in red, a prompt to add all fields inferred from your data resource"}}),o("br"),e._v(" "),o("em",[e._v("Figure 2: annotated in red, a prompt to add all fields inferred from your data resource.")])]),e._v(" "),o("p",[e._v("The page that follows looks like Figure 3 below. Each column from the GDP dataset has been mapped to a "),o("em",[e._v("field")]),e._v(". The data type for each column has been inferred correctly, and we can preview data under each field by hovering over the field name. It is also possible to edit all sections of our data resource’s fields as we can see below.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(480),alt:"all fields inferred from your data resource"}}),o("br"),e._v(" "),o("em",[e._v("Figure 3: all fields inferred from your data resource.")])]),e._v(" "),o("p",[e._v("You can now edit data types and formats as necessary, and optionally add titles and descriptive information to your fields. 
For example, the data type for our {Year} field should be "),o("em",[o("strong",[e._v("year")])]),e._v(" and not "),o("em",[o("strong",[e._v("integer")])]),e._v(". Our {Value} column has numeric information with decimal places.")]),e._v(" "),o("p",[e._v("By definition, values under the "),o("em",[o("strong",[e._v("integer")])]),e._v(" data type are whole numbers. The "),o("em",[o("strong",[e._v("number")])]),e._v(" data type is more appropriate for the {Value} column. When in doubt about what data type to use, consult the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#types-and-formats",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema data types cheat sheet"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Click on the "),o("img",{attrs:{src:t(481),alt:""}}),e._v(" icon to pick a suitable profile for your data resource. "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/profiles/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Here’s more information about Frictionless Data profiles"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("If your dataset has other data resources, add them by scrolling to the bottom of the page, clicking on "),o("strong",[e._v("Add Resource")]),e._v(", and repeating the same process as we just did.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(482),alt:"Prompt to add more data resources"}}),o("br"),e._v(" "),o("em",[e._v("Figure 4: Prompt to add more data resources.")])]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"add-descriptive-metadata"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#add-descriptive-metadata"}},[e._v("#")]),e._v(" Add descriptive metadata")]),e._v(" "),o("p",[e._v("In the previous section, we described metadata for each of our datasets, but we’re still missing 
metadata for our collection of datasets. You can add it via the "),o("strong",[e._v("Metadata")]),e._v(" section on the left side bar, describing things like the dataset name, description, author, license, etc.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(483),alt:"Add Data Package Metadata"}})]),e._v(" "),o("p",[e._v("The "),o("strong",[e._v("Profile")]),e._v(" section under metadata allows us to specify what kind of data collection we are packaging.")]),e._v(" "),o("ul",[o("li",[o("p",[o("em",[e._v("Data Package")]),o("br"),e._v("\nThis is the base, more general profile. Use it if your dataset contains resources of mixed formats, like tabular and geographical data. The base requirement for a valid Data Package profile is the "),o("em",[e._v("datapackage.json")]),e._v(" file. See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])]),e._v(" "),o("li",[o("p",[o("em",[e._v("Tabular Data Package")]),o("br"),e._v("\nIf your data contains only tabular resources like CSVs and spreadsheets, use the Tabular Data Package profile. See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Tabular Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])]),e._v(" "),o("li",[o("p",[o("em",[e._v("Fiscal Data Package")]),o("br"),e._v("\nIf your data contains fiscal information like budgets and expenditure data, use the Fiscal Data Package profile. 
See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/fiscal-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fiscal Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])])]),e._v(" "),o("p",[e._v("In our example, as we only have a CSV data resource, the "),o("em",[e._v("Tabular Data Package")]),e._v(" profile is the best option.")]),e._v(" "),o("p",[e._v("In the "),o("strong",[e._v("Keywords")]),e._v(" section, you can add any keywords that help make your data collection more discoverable. For our dataset, we might use the keywords "),o("em",[e._v("GDP, National Accounts, National GDP, Regional GDP")]),e._v(". Other datasets could include the country name, dataset area (e.g. “health” or “environmental”), etc.")]),e._v(" "),o("p",[e._v("Now that we have created a Data Package, we can "),o("strong",[e._v("Validate")]),e._v(" or "),o("strong",[e._v("Download")]),e._v(" it. But first, let’s see what our datapackage.json file looks like. With every addition and modification, the "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" has been populating the "),o("em",[e._v("datapackage.json")]),e._v(" file for us. Click on the "),o("strong",[e._v("{···}")]),e._v(" icon to view the "),o("em",[e._v("datapackage.json")]),e._v(" file. As you can see below, any edit we make to the description of the Value field is reflected in the JSON file in real time.")]),e._v(" "),o("p",[e._v("The "),o("strong",[e._v("Validate")]),e._v(" button allows us to confirm whether we chose the correct Profile for our Data Package. The two possible outcomes at this stage are:")]),e._v(" "),o("p",[o("img",{attrs:{src:t(484),alt:"Data Package is Invalid"}})]),e._v(" "),o("p",[e._v("This message appears when there is a validation error, for example if we omit a required attribute (e.g. 
the data package name), or have picked an incorrect profile (e.g. Tabular Data Package with geographical data)… Review the metadata and profiles to find the mistake and try validating again.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(485),alt:"Data Package is Valid"}})]),e._v(" "),o("p",[e._v("All good! This message means that your data package is valid, and we can download it.")]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"create-data-packages"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#create-data-packages"}},[e._v("#")]),e._v(" Create Data Packages")]),e._v(" "),o("p",[e._v("As we said earlier, the base requirement for a valid Data Package profile is the "),o("em",[e._v("datapackage.json")]),e._v(" file, which contains your data schema and metadata. We call this the descriptor file. You can download your descriptor file by clicking on the "),o("strong",[e._v("Download")]),e._v(" button.")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("If your data resources, like ours, were linked from an online public source, sharing the "),o("em",[e._v("datapackage.json")]),e._v(" file is sufficient, since it contains URLs to your data resources.")])]),e._v(" "),o("li",[o("p",[e._v("If you manually created a data resource and its fields, remember to add all your data resources and the downloaded "),o("em",[e._v("datapackage.json")]),e._v(" file in one folder before sharing it.")])])]),e._v(" "),o("p",[e._v("The way to structure your dataset depends on your data, and what extra artifacts it contains (e.g. images, scripts, reports, etc.). In this section, we’ll show a complete example with:")]),e._v(" "),o("ul",[o("li",[o("strong",[e._v("Data files")]),e._v(": The files with the actual data (e.g. 
CSV, XLS, GeoJSON, …)")]),e._v(" "),o("li",[o("strong",[e._v("Documentation")]),e._v(": How was the data collected, any caveats, how to update it, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Metadata")]),e._v(": Where the data comes from, what’s in the files, what’s their source and license, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Scripts")]),e._v(": Software scripts that were used to generate, update, or modify the data.")])]),e._v(" "),o("p",[e._v("Your final Data Package file directory should look like this:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v("data/\n dataresource1.csv\n dataresource2.csv\ndatapackage.json\n")])])]),o("ul",[o("li",[o("p",[o("strong",[e._v("data/")]),e._v(": All data files are contained in this folder. In our example, there is only one: "),o("code",[e._v("data/gdp.csv")]),e._v(" .")])]),e._v(" "),o("li",[o("p",[o("strong",[e._v("datapackage.json")]),e._v(": This file describes the dataset’s metadata. For example, what is the dataset, where are its files, what they contain, what each column means (for tabular data), what’s the source, license, and authors, and so on. As it’s a machine-readable specification, other software can import and validate your files.")])])]),e._v(" "),o("p",[e._v("Congratulations! 
You have now created a schema for your data, and combined it with descriptive metadata and your data collection to create your first data package!")]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"validate-your-packaged-data-automatically"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-your-packaged-data-automatically"}},[e._v("#")]),e._v(" Validate your packaged data automatically")]),e._v(" "),o("p",[e._v("Running continuous checks on data provides regular feedback and contributes to better data quality as errors can be flagged and fixed early on.")]),e._v(" "),o("p",[e._v("In this section, you will learn how to set up automatic tabular data validation using goodtables, so your data is validated every time it’s updated. Although not strictly necessary, it’s useful to "),o("RouterLink",{attrs:{to:"/blog/2018/03/07/well-packaged-datasets/"}},[e._v("know about Data Packages and Table Schema")]),e._v(" before proceeding, as they allow you to describe your data in more detail, allowing more advanced validations.")],1),e._v(" "),o("p",[e._v("We will show how to set up automated tabular data validations for data published on:")]),e._v(" "),o("ul",[o("li",[o("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),o("OutboundLink")],1),e._v(", an open source platform for publishing data in the open, that makes it easy to discover, use and share data;")]),e._v(" "),o("li",[o("a",{attrs:{href:"https://github.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),o("OutboundLink")],1),e._v(", a web platform for collaborating on projects as well as publishing, sharing and storing resources, such as data files;")]),e._v(" "),o("li",[o("a",{attrs:{href:"https://aws.amazon.com/s3/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Amazon S3"),o("OutboundLink")],1),e._v(", a data storage service by Amazon.")])]),e._v(" "),o("p",[e._v("Even if you don’t use any of these platforms, you can still set up the validation 
using "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",title:"Goodtables.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables-py"),o("OutboundLink")],1),e._v("; it will just require some technical knowledge.")]),e._v(" "),o("h3",{attrs:{id:"validate-tabular-data-automatically-on-ckan"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-ckan"}},[e._v("#")]),e._v(" Validate tabular data automatically on CKAN")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),o("OutboundLink")],1),e._v(" is an open source platform for publishing data online. It is widely used across the planet, including by the national governments of the USA, the United Kingdom, Brazil, and others.")]),e._v(" "),o("p",[e._v("To automatically validate tabular data on CKAN, enable the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation"),o("OutboundLink")],1),e._v(" extension, which uses goodtables to run continuous checks on your data. 
The "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation"),o("OutboundLink")],1),e._v(" extension:")]),e._v(" "),o("ul",[o("li",[e._v("Adds a badge next to each dataset showing the status of their validation (valid or invalid), and")]),e._v(" "),o("li",[e._v("allows users to access the validation report, making it possible for errors to be identified and fixed.")])]),e._v(" "),o("p",[o("img",{attrs:{src:t(486),alt:"annotated in red, automated validation checks on datasets in CKAN"}}),o("br"),e._v(" "),o("em",[e._v("Figure 8: annotated in red, automated validation checks on datasets in CKAN.")])]),e._v(" "),o("p",[e._v("The installation and usage instructions for "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation"),o("OutboundLink")],1),e._v(" extension are available on "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("Github"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"validate-tabular-data-automatically-on-github"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-github"}},[e._v("#")]),e._v(" Validate tabular data automatically on GitHub")]),e._v(" "),o("p",[e._v("If your data is hosted on GitHub, you can use "),o("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://goodtables.io"),o("OutboundLink")],1),e._v(" to automatically validate it on every change.")]),e._v(" "),o("p",[e._v("For this section, you will first need to create a "),o("a",{attrs:{href:"https://help.github.com/articles/create-a-repo/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repository"),o("OutboundLink")],1),e._v(" and add tabular data to it.")]),e._v(" "),o("p",[e._v("Once you have tabular data in your 
Github repository:")]),e._v(" "),o("ol",[o("li",[e._v("Login on "),o("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" using your GitHub account and accept the permissions confirmation.")]),e._v(" "),o("li",[e._v("Once we’ve synchronized your repository list, go to the "),o("a",{attrs:{href:"https://goodtables.io/settings",target:"_blank",rel:"noopener noreferrer"}},[e._v("Manage Sources"),o("OutboundLink")],1),e._v(" page and enable the repository with the data you want to validate.\n"),o("ul",[o("li",[e._v("If you can’t find the repository, try clicking on the Refresh button on the Manage Sources page")])])])]),e._v(" "),o("p",[e._v("Goodtables will then validate all tabular data files (CSV, XLS, XLSX, ODS) and "),o("RouterLink",{attrs:{to:"/data-package/"}},[e._v("data packages")]),e._v(" in the repository. These validations will be executed on every change, including pull requests.")],1),e._v(" "),o("h3",{attrs:{id:"validate-tabular-data-automatically-on-amazon-s3"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-amazon-s3"}},[e._v("#")]),e._v(" Validate tabular data automatically on Amazon S3")]),e._v(" "),o("p",[e._v("If your data is hosted on Amazon S3, you can use "),o("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://goodtables.io"),o("OutboundLink")],1),e._v(" to automatically validate it on every change.")]),e._v(" "),o("p",[e._v("It is a technical process to set up, as you need to know how to configure your Amazon S3 bucket. However, once it’s configured, the validations happen automatically on any tabular data created or updated. 
Find the detailed instructions "),o("a",{attrs:{href:"https://docs.goodtables.io/getting_started/s3.html",title:"Validating data on Amazon S3",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"custom-setup-of-automatic-tabular-data-validation"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#custom-setup-of-automatic-tabular-data-validation"}},[e._v("#")]),e._v(" Custom setup of automatic tabular data validation")]),e._v(" "),o("p",[e._v("If you don’t use any of the officially supported data publishing platforms, you can use "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",title:"Goodtables.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables-py"),o("OutboundLink")],1),e._v(" directly to validate your data. This is the most flexible option, as you can configure exactly when and how your tabular data is validated. For example, if your data comes from an external source, you could validate it once before you process it (so you catch errors in the source data), and once after cleaning, just before you publish it, so you catch errors introduced by your data processing.")]),e._v(" "),o("p",[e._v("The instructions on how to do this are technical and can be found at "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",title:"Goodtables.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables-py"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"publish-packaged-data"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#publish-packaged-data"}},[e._v("#")]),e._v(" Publish packaged data")]),e._v(" "),o("p",[e._v("Creating and sharing Data Packages is important for both data publishers and data users because it provides a common and open specification to describe your dataset’s metadata. 
This facilitates data reuse, as users don’t need to understand each data publisher’s specific metadata format, and, as the specification is machine-readable, it also allows tools to parse the metadata. This enables software to:")]),e._v(" "),o("ul",[o("li",[e._v("Import the data packages into different tools and languages, like Python and R")]),e._v(" "),o("li",[e._v("Validate the data contents according to the schema described in the data package")]),e._v(" "),o("li",[e._v("Convert the data package into other formats, for example loading it into a SQL database for further analysis")])]),e._v(" "),o("p",[e._v("Although these reasons are not unique to publishing data as data packages, here’s why we think data publishers should consider publishing in this format:")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("Archiving data collections using data packages ensures data publishers can update data more efficiently at any time. The associated schema is a guide to existing data fields and acceptable data types for individual tabular data resources and can be easily built upon.")])]),e._v(" "),o("li",[o("p",[e._v("Sharing data with descriptive metadata and its associated schema provides context for your data no matter where it is used, and significantly cuts down on time spent researching data provenance before using acquired data.")])]),e._v(" "),o("li",[o("p",[e._v("Data Packages allow for accountability and enrich the feedback process as data publishers can add metadata with contact information for users to reach out to them and licensing to spell out accepted use of published data.")])])]),e._v(" "),o("p",[e._v("If you don’t need your own data portal, there are many platforms where you can publish your data (if you need your own, check "),o("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),o("OutboundLink")],1),e._v("). In the section below, we dive into a few options. 
Read along and decide what option is most suitable:")]),e._v(" "),o("h3",{attrs:{id:"publish-packaged-data-in-our-community-ckan-instance"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#publish-packaged-data-in-our-community-ckan-instance"}},[e._v("#")]),e._v(" Publish packaged data in our community CKAN instance")]),e._v(" "),o("p",[e._v("CKAN is an open source platform for publishing data that makes it easy to discover, use and share data. "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" is a public instance of CKAN that allows anyone to publish their data.")]),e._v(" "),o("p",[e._v("Here’s why you should consider creating an organization on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" and publishing datasets therein:")]),e._v(" "),o("ul",[o("li",[o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" is free for all to use! 
The file upload size limit on the platform is currently 100mb.")]),e._v(" "),o("li",[e._v("The decision on whether to publicly or privately publish datasets rests with data publishers.")]),e._v(" "),o("li",[o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" organizations allow for multiple users to collaborate with varied privileges:")]),e._v(" "),o("li",[o("strong",[e._v("Admin")]),e._v(": Can add/edit and delete datasets, as well as manage organization members.")]),e._v(" "),o("li",[o("strong",[e._v("Editor")]),e._v(": Can add and edit datasets, but not manage organization members.")]),e._v(" "),o("li",[o("strong",[e._v("Member")]),e._v(": Can view the organization’s private datasets, but not add new datasets.")])]),e._v(" "),o("p",[e._v("To publish data on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(":")]),e._v(" "),o("ol",[o("li",[o("p",[e._v("Request for a new Organization to be created on the platform for you via "),o("a",{attrs:{href:"https://discuss.okfn.org/c/open-knowledge-labs/datahub",target:"_blank",rel:"noopener noreferrer"}},[e._v("our community page"),o("OutboundLink")],1),e._v("."),o("br"),e._v("\nThis is required only to ensure spammers don’t take up space and hog resources on the platform.")]),e._v(" "),o("p",[e._v("The request format is simple and requires:")]),e._v(" "),o("ul",[o("li",[o("p",[o("strong",[e._v("Title")]),e._v(": This will be the name of your Organization on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" i.e."),o("br"),o("br"),e._v(" "),o("em",[e._v("My New Organization")])])]),e._v(" "),o("li",[o("p",[o("strong",[e._v("Slug")]),e._v(": This is an acronym, word or hyphenated phrase that will be added to the end of the 
"),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" url to uniquely identify your Organization and associate your data collections with it i.e."),o("br"),o("br"),e._v(" "),o("em",[e._v("my-new-organization")])])]),e._v(" "),o("li",[o("p",[o("strong",[e._v("Username")]),e._v(": The username you provide is associated with an email address on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" and allows us to give you admin access to your Organization on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(".")])])])]),e._v(" "),o("li",[o("p",[e._v("Log In and add new datasets")])])]),e._v(" "),o("p",[e._v("Adding datasets on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" is no different from using any other CKAN platform, but "),o("a",{attrs:{href:"http://okfnlabs.org/blog/2016/07/25/publish-data-packages-to-datahub-ckan.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("here’s a good guide by Dan Fowler"),o("OutboundLink")],1),e._v(" for first timers.")]),e._v(" "),o("p",[e._v("3.Publish and share public datasets widely.")]),e._v(" "),o("p",[e._v("On "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(", you can either publish datasets privately, meaning only members of your organization have access to them, or publicly, as open data. 
"),o("a",{attrs:{href:"http://okfnlabs.org/blog/2016/07/25/publish-data-packages-to-datahub-ckan.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Find out more"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"publish-packaged-data-on-datahub-io"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#publish-packaged-data-on-datahub-io"}},[e._v("#")]),e._v(" Publish packaged data on "),o("a",{attrs:{href:"http://DataHub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1)]),e._v(" "),o("p",[o("a",{attrs:{href:"http://DataHub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1),e._v(" is a platform for finding, sharing and publishing high quality data online.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1),e._v(" and "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" share the same name for historical reasons. 
"),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datahub.ckan.io"),o("OutboundLink")],1),e._v(" used to be the DataHub, but was moved to its current address, and the current DataHub uses new software developed from scratch.")]),e._v(" "),o("ol",[o("li",[e._v("Set up a data publisher / user account on "),o("a",{attrs:{href:"https://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1)])]),e._v(" "),o("p",[e._v("Join the "),o("a",{attrs:{href:"https://gitter.im/datahubio/chat",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.io community group"),o("OutboundLink")],1),e._v(", introduce yourself and request for an account.")]),e._v(" "),o("ol",{attrs:{start:"2"}},[o("li",[e._v("Publish Datasets on "),o("a",{attrs:{href:"https://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1)])]),e._v(" "),o("p",[o("a",{attrs:{href:"http://datahub.io/docs/getting-started/publishing-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("This post"),o("OutboundLink")],1),e._v(" provides helpful information on publishing datasets on "),o("a",{attrs:{href:"https://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"publish-packaged-data-on-github"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#publish-packaged-data-on-github"}},[e._v("#")]),e._v(" Publish packaged data on GitHub")]),e._v(" "),o("p",[e._v("GitHub is the largest repository of source code, with "),o("a",{attrs:{href:"https://github.com/blog/2345-celebrating-nine-years-of-github-with-an-anniversary-sale",target:"_blank",rel:"noopener noreferrer"}},[e._v("more than 20 million"),o("br"),e._v("\nusers"),o("OutboundLink")],1),e._v(". Although the focus is on hosting source code, any type of file can be hosted. 
You can host documents, theses, images, and shapefiles, or even an entire static website with "),o("a",{attrs:{href:"https://pages.github.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub Pages"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("By using GitHub, you get all the advantages of using a version control system like Git, where every modification to your files is tracked. You also get an issue ticketing system, wiki pages, milestone tracking, and other useful"),o("br"),e._v("\ncollaboration tools.")]),e._v(" "),o("p",[o("strong",[e._v("What types of datasets can be hosted on GitHub?")])]),e._v(" "),o("p",[e._v("Although GitHub offers many useful functionalities, not all datasets are a good fit for it. The main limitations are:")]),e._v(" "),o("ul",[o("li",[e._v("Each individual file must be smaller than 100 MB")]),e._v(" "),o("li",[e._v("The entire repository must be smaller than 1 GB\n"),o("ul",[o("li",[e._v("The repository size includes not only the current files, but all of their previous versions.")])])])]),e._v(" "),o("p",[e._v("You can store larger files using "),o("a",{attrs:{href:"https://git-lfs.github.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("git-lfs"),o("OutboundLink")],1),e._v(", but we won’t go into detail about it in this section.")]),e._v(" "),o("p",[e._v("It’s also useful if your data files use text-based file formats like CSV or GeoJSON, as then git is able to show you exactly what changed between two versions of the files. However, even if you use binary file formats like XLS, GitHub is still useful.")]),e._v(" "),o("p",[o("strong",[e._v("Step 1. Organise your dataset folder structure")])]),e._v(" "),o("p",[e._v("The way to structure your dataset depends on your data, and what extra artifacts it contains (e.g. images, scripts, reports, etc.). In this section, we’ll show a complete example with:")]),e._v(" "),o("ul",[o("li",[o("strong",[e._v("Data files")]),e._v(": The files with the actual data (e.g. 
CSV, XLS, GeoJSON, …)")]),e._v(" "),o("li",[o("strong",[e._v("Documentation")]),e._v(": How was the data collected, any caveats, how to update it, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Metadata")]),e._v(": Where the data comes from, what’s in the files, what’s their source and license, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Scripts")]),e._v(": Software scripts that were used to generate, update, or modify the data.")])]),e._v(" "),o("p",[e._v("Even though we’ll see an example that has all of these different types of files, this isn’t always the case. For example, datasets that were manually collected might not have any scripts.")]),e._v(" "),o("p",[e._v("Consider this folder structure:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v("data/\n schools.csv\n cities.csv\ndocs/\n screenshot.png\nscripts/\n clean_data.py\nMakefile\ndatapackage.json\nREADME.md\n")])])]),o("ul",[o("li",[o("strong",[e._v("data/")]),e._v(": All data files are contained in this folder. In our example, there are two: "),o("code",[e._v("data/schools.csv")]),e._v(" and "),o("code",[e._v("data/cities.csv")]),e._v(".")]),e._v(" "),o("li",[o("strong",[e._v("docs/")]),e._v(": Images, sample analysis, and other documentation files regarding the dataset. The main documentation is in "),o("code",[e._v("README.md")]),e._v(", but in this folder you can add any images used in the README, and other writings about the dataset.")]),e._v(" "),o("li",[o("strong",[e._v("scripts/")]),e._v(": All scripts are contained in this folder. There could be scripts to scrape the data, join different files, clean them, etc. 
Depending on the programming language you use, you might also add requirements files like "),o("code",[e._v("requirements.txt")]),e._v(" for Python, or "),o("code",[e._v("package.json")]),e._v(" for Node.js.")]),e._v(" "),o("li",[o("strong",[e._v("Makefile")]),e._v(": The scripts are only part of the puzzle; we also need to know how to run them: in which order they should be executed, which one updates the data, and so on. You could document this information textually in the "),o("code",[e._v("README.md")]),e._v(" file, but the "),o("code",[e._v("Makefile")]),e._v(" allows you to have executable documentation. You can think of it as a script to run the scripts. If you have never written a Makefile, read "),o("a",{attrs:{href:"https://bost.ocks.org/mike/make/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Why Use Make"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("li",[o("strong",[e._v("datapackage.json")]),e._v(": This file describes the dataset’s metadata. For example, what the dataset is, where its files are, what they contain, what each column means (for tabular data), what the source, license, and authors are, and so on. As it’s a machine-readable specification, other software can import and validate your files. See "),o("RouterLink",{attrs:{to:"/blog/2018/03/07/well-packaged-datasets/"}},[e._v("how to create a data package")]),e._v(" for instructions on writing this file.")],1),e._v(" "),o("li",[o("strong",[o("a",{attrs:{href:"http://README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("README.md"),o("OutboundLink")],1)]),e._v(": This is where the dataset is described for humans. We recommend the following sections:\n"),o("ul",[o("li",[o("strong",[e._v("Introduction")]),e._v(": A short description of the dataset, what it contains, the time or geographical area it covers")]),e._v(" "),o("li",[o("strong",[e._v("Data")]),e._v(": What is the data structure? Does it use any codes? How do you define missing values (e.g. 
‘N/A’ or ‘-1’)")]),e._v(" "),o("li",[o("strong",[e._v("Preparation")]),e._v(": How was the data collected? How do I update the data? Was it modified in any way? If you have a "),o("code",[e._v("Makefile")]),e._v(", this section will mostly document how to run it. Otherwise you can describe how to run the scripts, or how to collect the data manually.")]),e._v(" "),o("li",[o("strong",[e._v("License")]),e._v(": There are two issues here: the license of the data itself, and the license of the package you are creating (including any scripts). Our recommendation is to license the package you created as "),o("a",{attrs:{href:"https://creativecommons.org/publicdomain/zero/1.0/",title:"Creative Commons Public Domain Dedication",target:"_blank",rel:"noopener noreferrer"}},[e._v("CC0"),o("OutboundLink")],1),e._v(", and add any relevant information or disclaimers regarding the source data’s license.")])])])]),e._v(" "),o("p",[e._v("To summarize, these are the folders, files, and their respective contents in this structure:")]),e._v(" "),o("table",[o("thead",[o("tr",[o("th",[e._v("Path")]),e._v(" "),o("th",[e._v("Type")]),e._v(" "),o("th",[e._v("Contents")])])]),e._v(" "),o("tbody",[o("tr",[o("td",[e._v("data/")]),e._v(" "),o("td",[e._v("Data")]),e._v(" "),o("td",[e._v("Dataset’s data files.")])]),e._v(" "),o("tr",[o("td",[e._v("docs/")]),e._v(" "),o("td",[e._v("Documentation")]),e._v(" "),o("td",[e._v("Images, analysis, and other documentation files.")])]),e._v(" "),o("tr",[o("td",[e._v("scripts/")]),e._v(" "),o("td",[e._v("Scripts")]),e._v(" "),o("td",[e._v("Scripts used for creating, modifying, or analysing the dataset.")])]),e._v(" "),o("tr",[o("td",[e._v("Makefile")]),e._v(" "),o("td",[e._v("Scripts")]),e._v(" "),o("td",[e._v("Executable documentation on how to run the scripts.")])]),e._v(" "),o("tr",[o("td",[e._v("datapackage.json")]),e._v(" "),o("td",[e._v("Metadata")]),e._v(" "),o("td",[e._v("Data Package descriptor file.")])]),e._v(" 
"),o("tr",[o("td",[o("a",{attrs:{href:"http://README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("README.md"),o("OutboundLink")],1)]),e._v(" "),o("td",[e._v("Documentation")]),e._v(" "),o("td",[e._v("Textual description of the dataset with description, preparation steps, license, etc.")])])])]),e._v(" "),o("p",[e._v("** Step 2. Upload the dataset to GitHub **")]),e._v(" "),o("ol",[o("li",[e._v("Login (or create) a new account on GitHub")]),e._v(" "),o("li",[e._v("Create "),o("a",{attrs:{href:"https://github.com/new",title:"GitHub New Repository",target:"_blank",rel:"noopener noreferrer"}},[e._v("a new repository"),o("OutboundLink")],1),e._v(" "),o("ul",[o("li",[e._v("Write a short description about the dataset")])])]),e._v(" "),o("li",[e._v("On your repository page, click on the “Upload files” link")]),e._v(" "),o("li",[e._v("Upload the files you created in the previous step\n"),o("ul",[o("li",[e._v("If your have files larger than 25 MB, you’ll need to either "),o("a",{attrs:{href:"https://help.github.com/articles/adding-a-file-to-a-repository-using-the-command-line/",title:"Adding a file to a repository using the command line",target:"_blank",rel:"noopener noreferrer"}},[e._v("upload using the command line"),o("OutboundLink")],1),e._v(", or the "),o("a",{attrs:{href:"https://desktop.github.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub Desktop client"),o("OutboundLink")],1),e._v(".")])])])]),e._v(" "),o("p",[e._v("** (Optional) Step 3. Enable automatic tabular data validation **")]),e._v(" "),o("p",[e._v("You can automatically validate your tabular data files using "),o("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(". This will take only a few minutes, and will ensure you’ll always know when there are errors with your dataset, maintaining its quality. 
"),o("a",{attrs:{href:"/blog/2018/03/12/automatically-validated-tabular-data"}},[e._v("Read the walkthrough here")]),e._v(".")]),e._v(" "),o("p",[e._v("The sample datasets used in this example, that is, List of schools in Birmingham, UK are available "),o("a",{attrs:{href:"https://github.com/vitorbaptista/birmingham_schools",target:"_blank",rel:"noopener noreferrer"}},[e._v("in this repository"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"share-packaged-data-effectively"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#share-packaged-data-effectively"}},[e._v("#")]),e._v(" Share packaged data effectively")]),e._v(" "),o("p",[e._v("Publishing packaged data is not enough. To avoid hiding useful information in open archives online, it is necessary to engage communities that could be interested in your data. Community engagement should not be viewed as a one-off assignment, but rather, as a continuous effort to increase the impact of your data publishing work.")]),e._v(" "),o("p",[e._v("Some best practices:")]),e._v(" "),o("ol",[o("li",[o("p",[e._v("Publish quickly, update often."),o("br"),e._v("\nThe true value of published data lies in its use and reuse by open data communities. Publish data as soon as possible and update it regularly so users have access to the latest information.")])]),e._v(" "),o("li",[o("p",[e._v("Set up feedback loops."),o("br"),e._v("\nYour data publishing platform should aim to get community “buy in”, by encouraging participatory processes. 
Feedback loops are important because:")])])]),e._v(" "),o("ul",[o("li",[e._v("they allow data users to ask for clarifications and request additional information about specific datasets, if need be.")]),e._v(" "),o("li",[e._v("they allow data publishers to understand what communities need and publish data driven by user demand, increasing the chance it’ll be used.")]),e._v(" "),o("li",[e._v("they provide an avenue for data publishers to learn how their data is used, so they can gauge its impact.")])]),e._v(" "),o("p",[e._v("Examples of feedback loops that data publishers can set up include:")]),e._v(" "),o("ul",[o("li",[e._v("A comments section on the data platform. Needless to say, the comments section should be monitored closely to ensure that responses are sent in time, and that discussions remain respectful and on topic.")]),e._v(" "),o("li",[e._v("A dedicated social platform channel, such as a Google Group or Facebook group, with a prominently placed link from the data platform, for sharing updates and for collating and responding to feedback.")]),e._v(" "),o("li",[e._v("An e-mail address where users can contact the people responsible for the datasets for clarifications, suggestions, or to report errors.")])]),e._v(" "),o("ol",{attrs:{start:"3"}},[o("li",[e._v("Meet open data communities in the places they already meet")])]),e._v(" "),o("p",[e._v("Communities thrive when there’s continued discourse over similar interests."),o("br"),e._v("\nData publishers should be active in existing networks, as supporters and collaborators in community data initiatives. 
Some of the ways this can be done, leveraging on Open Knowledge communities and others, include:")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("Kickstarting and joining discussions in online forums")])]),e._v(" "),o("li",[o("p",[e._v("Blogs"),o("br"),e._v("\nAs a data publisher, running a data blog is a great way to create awareness about the data you publish, and an avenue to highlight how data users are drawing insight from it. This encourages use and reuse of your data. If you don’t run a data blog, there are plenty of open data blogs that welcome external contributions i.e. "),o("a",{attrs:{href:"https://blog.okfn.org/submit/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here’s how"),o("OutboundLink")],1),e._v(" you can contribute guest posts on "),o("a",{attrs:{href:"http://blog.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("blog.okfn.org"),o("OutboundLink")],1),e._v(".")])]),e._v(" "),o("li",[o("p",[e._v("Open Knowledge Discuss"),o("br"),e._v("\nThe Open Knowledge discussion platform is a great place to invoke and contribute to conversation on specific subjects. "),o("a",{attrs:{href:"https://discuss.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dive in"),o("OutboundLink")],1),e._v("!")])]),e._v(" "),o("li",[o("p",[e._v("Gitter"),o("br"),e._v("\nGitter is a chat platform that’s well suited for more technical discussions around open data. 
If you are looking to engage technical data users, consider joining our "),o("a",{attrs:{href:"https://gitter.im/okfn/chat",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Foundation channel"),o("OutboundLink")],1),e._v(" or the "),o("a",{attrs:{href:"https://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data project channel"),o("OutboundLink")],1),e._v(".")])]),e._v(" "),o("li",[o("p",[e._v("In-person meetups"),o("br"),e._v("\nOrganizing and participating in meetups, hackathons and domain-specific conferences is a good way to engage with communities.")])]),e._v(" "),o("li",[o("p",[e._v("Community calls, webinars and podcasts")])])]),e._v(" "),o("p",[e._v("Finally, to maintain an active community of data users as a data publisher:")]),e._v(" "),o("ul",[o("li",[e._v("Keep your datasets updated and highlight changes that might be of interest to the community. For example, if the changes are relevant to a specific data request, reach out and let the user know.")]),e._v(" "),o("li",[e._v("Have a human representative play an active role in community activities. Bots can be fun and efficient, but they are limited and can get in the way of meaningful interactions.")]),e._v(" "),o("li",[e._v("Be flexible and transparent. Listen to your community’s needs and respond appropriately and in a timely fashion, e.g. consider publishing datasets that are in high demand first, or more regularly. Archive, rather than delete, datasets; if one must be deleted, give advance warning and explain why.")]),e._v(" "),o("li",[e._v("Set up a sharing system to regularly showcase notable data use cases by the community, e.g. 
fortnightly to inspire other community members.")])])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[5],{478:function(e,a,t){e.exports=t.p+"assets/img/figure-1.0b5d5da2.png"},479:function(e,a,t){e.exports=t.p+"assets/img/figure-2.a4cda338.png"},480:function(e,a,t){e.exports=t.p+"assets/img/figure-3.e234c78e.png"},481:function(e,a){e.exports=""},482:function(e,a,t){e.exports=t.p+"assets/img/figure-4.65dd4176.png"},483:function(e,a,t){e.exports=t.p+"assets/img/figure-5.a7c23193.png"},484:function(e,a,t){e.exports=t.p+"assets/img/figure-6.5520e95a.png"},485:function(e,a,t){e.exports=t.p+"assets/img/figure-7.866156c7.png"},486:function(e,a,t){e.exports=t.p+"assets/img/figure-8.3afbab68.png"},609:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,o=e._self._c||a;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("h2",{attrs:{id:"include-a-data-schema"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#include-a-data-schema"}},[e._v("#")]),e._v(" Include a data schema")]),e._v(" "),o("p",[e._v("Simply put, a schema is a blueprint that tells us how your data is structured, and what type of content is to be expected in it. You can think of it as a data dictionary. 
Having a table schema at hand makes it possible to run more precise validation checks on your data, both at a structural and content level.")]),e._v(" "),o("p",[e._v("For this section, we will use the "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" and "),o("a",{attrs:{href:"http://datahub.io/core/gdp",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gross Domestic Product dataset for all countries (1960 - 2014)"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("Data Package")]),e._v(" is a format that makes it possible to put your data collection and relevant information that provides context about your data in one container before you share it. All contextual information, such as metadata and your data schema, is published in a JSON file named "),o("em",[e._v("datapackage.json")]),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("Data Package Creator")]),e._v(" is an online service that facilitates the creation and editing of data packages. The service automatically generates a "),o("em",[e._v("datapackage.json")]),e._v(" file for you as you add and edit data that is part of your data collection. We refer to each piece of data in a data collection as a "),o("strong",[e._v("data resource")]),e._v(".")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" loads with dummy data to make it easy to understand how metadata and sample resources help generate the "),o("em",[e._v("datapackage.json")]),e._v(" file. 
There are three ways in which a user can add data resources on "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(":")]),e._v(" "),o("ol",[o("li",[e._v("Provide a hyperlink to your data resource (highly recommended).")])]),e._v(" "),o("p",[e._v("If your data resource is publicly available, like on GitHub or in a data repository, simply obtain the URL and paste it in the "),o("strong",[e._v("Path")]),e._v(" section. To learn how to publish your data resource online, see the section on publishing your dataset.")]),e._v(" "),o("ol",{attrs:{start:"2"}},[o("li",[e._v("Create your data resource within the service.")])]),e._v(" "),o("p",[e._v("If your data resource isn’t published online, you’ll have to define its fields from scratch. Depending on how complex your data is, this can be time consuming, as each field has to be created manually, but it’s still easier than writing the descriptor JSON file from scratch.")]),e._v(" "),o("ol",{attrs:{start:"3"}},[o("li",[e._v("Use the "),o("strong",[e._v("Load a Data Package")]),e._v(" option.")])]),e._v(" "),o("p",[e._v("With this option, you can load a pre-existing "),o("em",[e._v("datapackage.json")]),e._v(" file to view and edit its metadata and resource fields.")]),e._v(" "),o("p",[e._v("Our "),o("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/blob/master/gross-domestic-product-all/data/gdp.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gross Domestic Product dataset for all countries (1960 - 2014)"),o("OutboundLink")],1),e._v(" is publicly available on GitHub.")]),e._v(" "),o("p",[e._v("Obtain a link to the raw CSV file by clicking on the Raw button at the top right corner of the GitHub file preview page, as shown in Figure 1 below. 
The resulting hyperlink looks like "),o("a",{attrs:{href:"https://raw.githubusercontent.com/datasets/continent-codes/master/data/continent-codes.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://raw.githubusercontent.com/datasets/continent-codes/master/data/continent-codes.csv"),o("OutboundLink")],1)]),e._v(" "),o("p",[o("img",{attrs:{src:t(478),alt:"Above, raw button highlighted in red"}}),o("br"),e._v(" "),o("em",[e._v("Figure 1: Above, raw button highlighted in red.")])]),e._v(" "),o("p",[e._v("Paste your hyperlink in the "),o("em",[e._v("Path")]),e._v(" section and click on the "),o("em",[e._v("Load")]),e._v(" button. Each column in your table translates to a "),o("em",[e._v("field")]),e._v(". You should be prompted to add all fields identified in your data resource, as in Figure 2 below. Click on the prompt to load the fields.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(479),alt:"annotated in red, a prompt to add all fields inferred from your data resource"}}),o("br"),e._v(" "),o("em",[e._v("Figure 2: annotated in red, a prompt to add all fields inferred from your data resource.")])]),e._v(" "),o("p",[e._v("The page that follows looks like Figure 3 below. Each column from the GDP dataset has been mapped to a "),o("em",[e._v("field")]),e._v(". The data type for each column has been inferred correctly, and we can preview data under each field by hovering over the field name. It is also possible to edit all sections of our data resource’s fields as we can see below.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(480),alt:"all fields inferred from your data resource"}}),o("br"),e._v(" "),o("em",[e._v("Figure 3: all fields inferred from your data resource.")])]),e._v(" "),o("p",[e._v("You can now edit data types and formats as necessary, and optionally add titles and descriptive information to your fields. 
For example, the data type for our {Year} field should be "),o("em",[o("strong",[e._v("year")])]),e._v(" and not "),o("em",[o("strong",[e._v("integer")])]),e._v(". Our {Value} column has numeric information with decimal places.")]),e._v(" "),o("p",[e._v("By definition, values under the "),o("em",[o("strong",[e._v("integer")])]),e._v(" data type are whole numbers. The "),o("em",[o("strong",[e._v("number")])]),e._v(" data type is more appropriate for the {Value} column. When in doubt about what data type to use, consult the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#types-and-formats",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema data types cheat sheet"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Click on the "),o("img",{attrs:{src:t(481),alt:""}}),e._v(" icon to pick a suitable profile for your data resource. "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/profiles/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Here’s more information about Frictionless Data profiles"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("If your dataset has other data resources, add them by scrolling to the bottom of the page, clicking on "),o("strong",[e._v("Add Resource")]),e._v(", and repeating the same process as we just did.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(482),alt:"Prompt to add more data resources"}}),o("br"),e._v(" "),o("em",[e._v("Figure 4: Prompt to add more data resources.")])]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"add-descriptive-metadata"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#add-descriptive-metadata"}},[e._v("#")]),e._v(" Add descriptive metadata")]),e._v(" "),o("p",[e._v("In the previous section, we described metadata for each of our datasets, but we’re still missing 
metadata for our collection of datasets. You can add it via the "),o("strong",[e._v("Metadata")]),e._v(" section on the left side bar, describing things like the dataset name, description, author, license, etc.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(483),alt:"Add Data Package Metadata"}})]),e._v(" "),o("p",[e._v("The "),o("strong",[e._v("Profile")]),e._v(" section under metadata allows us to specify what kind of data collection we are packaging.")]),e._v(" "),o("ul",[o("li",[o("p",[o("em",[e._v("Data Package")]),o("br"),e._v("\nThis is the base, more general profile. Use it if your dataset contains resources of mixed formats, like tabular and geographical data. The base requirement for a valid Data Package profile is the "),o("em",[e._v("datapackage.json")]),e._v(" file. See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])]),e._v(" "),o("li",[o("p",[o("em",[e._v("Tabular Data Package")]),o("br"),e._v("\nIf your data contains only tabular resources like CSVs and spreadsheets, use the Tabular Data Package profile. See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Tabular Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])]),e._v(" "),o("li",[o("p",[o("em",[e._v("Fiscal Data Package")]),o("br"),e._v("\nIf your data contains fiscal information like budgets and expenditure data, use the Fiscal Data Package profile. 
See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/fiscal-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fiscal Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])])]),e._v(" "),o("p",[e._v("In our example, as we only have a CSV data resource, the "),o("em",[e._v("Tabular Data Package")]),e._v(" profile is the best option.")]),e._v(" "),o("p",[e._v("In the "),o("strong",[e._v("Keywords")]),e._v(" section, you can add any keywords that help make your data collection more discoverable. For our dataset, we might use the keywords "),o("em",[e._v("GDP, National Accounts, National GDP, Regional GDP")]),e._v(". Other datasets could include the country name, dataset area (e.g. “health” or “environmental”), etc.")]),e._v(" "),o("p",[e._v("Now that we have created a Data Package, we can "),o("strong",[e._v("Validate")]),e._v(" or "),o("strong",[e._v("Download")]),e._v(" it. But first, let’s see what our datapackage.json file looks like. With every addition and modification, the "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" has been populating the "),o("em",[e._v("datapackage.json")]),e._v(" file for us. Click on the "),o("strong",[e._v("{···}")]),e._v(" icon to view the "),o("em",[e._v("datapackage.json")]),e._v(" file. As you can see below, any edit we make to the description of the Value field is reflected in the JSON file in real time.")]),e._v(" "),o("p",[e._v("The "),o("strong",[e._v("Validate")]),e._v(" button allows us to confirm whether we chose the correct Profile for our Data Package. The two possible outcomes at this stage are:")]),e._v(" "),o("p",[o("img",{attrs:{src:t(484),alt:"Data Package is Invalid"}})]),e._v(" "),o("p",[e._v("This message appears when there is a validation error, for example if we omit a required attribute (e.g. 
the data package name), or have picked an incorrect profile (e.g. Tabular Data Package with geographical data)… Review the metadata and profiles to find the mistake and try validating again.")]),e._v(" "),o("p",[o("img",{attrs:{src:t(485),alt:"Data Package is Valid"}})]),e._v(" "),o("p",[e._v("All good! This message means that your data package is valid, and we can download it.")]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"create-data-packages"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#create-data-packages"}},[e._v("#")]),e._v(" Create Data Packages")]),e._v(" "),o("p",[e._v("As we said earlier, the base requirement for a valid Data Package profile is the "),o("em",[e._v("datapackage.json")]),e._v(" file, which contains your data schema and metadata. We call this the descriptor file. You can download your descriptor file by clicking on the "),o("strong",[e._v("Download")]),e._v(" button.")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("If your data resources, like ours, were linked from an online public source, sharing the "),o("em",[e._v("datapackage.json")]),e._v(" file is sufficient, since it contains URLs to your data resources.")])]),e._v(" "),o("li",[o("p",[e._v("If you manually created a data resource and its fields, remember to add all your data resources and the downloaded "),o("em",[e._v("datapackage.json")]),e._v(" file in one folder before sharing it.")])])]),e._v(" "),o("p",[e._v("The way to structure your dataset depends on your data, and what extra artifacts it contains (e.g. images, scripts, reports, etc.). In this section, we’ll show a complete example with:")]),e._v(" "),o("ul",[o("li",[o("strong",[e._v("Data files")]),e._v(": The files with the actual data (e.g. 
CSV, XLS, GeoJSON, …)")]),e._v(" "),o("li",[o("strong",[e._v("Documentation")]),e._v(": How was the data collected, any caveats, how to update it, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Metadata")]),e._v(": Where the data comes from, what’s in the files, what’s their source and license, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Scripts")]),e._v(": Software scripts that were used to generate, update, or modify the data.")])]),e._v(" "),o("p",[e._v("Your final Data Package file directory should look like this:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v("data/\n dataresource1.csv\n dataresource2.csv\ndatapackage.json\n")])])]),o("ul",[o("li",[o("p",[o("strong",[e._v("data/")]),e._v(": All data files are contained in this folder. In our example, there is only one: "),o("code",[e._v("data/gdp.csv")]),e._v(" .")])]),e._v(" "),o("li",[o("p",[o("strong",[e._v("datapackage.json")]),e._v(": This file describes the dataset’s metadata. For example, what is the dataset, where are its files, what they contain, what each column means (for tabular data), what’s the source, license, and authors, and so on. As it’s a machine-readable specification, other software can import and validate your files.")])])]),e._v(" "),o("p",[e._v("Congratulations! 
You have now created a schema for your data, and combined it with descriptive metadata and your data collection to create your first data package!")]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"validate-your-packaged-data-automatically"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-your-packaged-data-automatically"}},[e._v("#")]),e._v(" Validate your packaged data automatically")]),e._v(" "),o("p",[e._v("Running continuous checks on data provides regular feedback and contributes to better data quality, as errors can be flagged and fixed early on.")]),e._v(" "),o("p",[e._v("In this section, you will learn how to set up automatic tabular data validation using goodtables, so your data is validated every time it’s updated. Although not strictly necessary, it’s useful to "),o("RouterLink",{attrs:{to:"/blog/2018/03/07/well-packaged-datasets/"}},[e._v("know about Data Packages and Table Schema")]),e._v(" before proceeding, as they allow you to describe your data in more detail, which enables more advanced validations.")],1),e._v(" "),o("p",[e._v("We will show how to set up automated tabular data validations for data published on:")]),e._v(" "),o("ul",[o("li",[o("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),o("OutboundLink")],1),e._v(", an open source platform for publishing data that makes it easy to discover, use and share data;")]),e._v(" "),o("li",[o("a",{attrs:{href:"https://github.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),o("OutboundLink")],1),e._v(", a web platform for collaborating on projects as well as publishing, sharing and storing resources, such as data files;")]),e._v(" "),o("li",[o("a",{attrs:{href:"https://aws.amazon.com/s3/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Amazon S3"),o("OutboundLink")],1),e._v(", a data storage service by Amazon.")])]),e._v(" "),o("p",[e._v("Even if you don’t use any of these platforms, you can still set up the validation 
using "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",title:"Goodtables.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables-py"),o("OutboundLink")],1),e._v("; it will just require some technical knowledge.")]),e._v(" "),o("h3",{attrs:{id:"validate-tabular-data-automatically-on-ckan"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-ckan"}},[e._v("#")]),e._v(" Validate tabular data automatically on CKAN")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),o("OutboundLink")],1),e._v(" is an open source platform for publishing data online. It is widely used across the planet, including by the federal governments of the USA, United Kingdom, Brazil, and others.")]),e._v(" "),o("p",[e._v("To automatically validate tabular data on CKAN, enable the "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation"),o("OutboundLink")],1),e._v(" extension, which uses goodtables to run continuous checks on your data. 
The "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation"),o("OutboundLink")],1),e._v(" extension:")]),e._v(" "),o("ul",[o("li",[e._v("Adds a badge next to each dataset showing the status of their validation (valid or invalid), and")]),e._v(" "),o("li",[e._v("allows users to access the validation report, making it possible for errors to be identified and fixed.")])]),e._v(" "),o("p",[o("img",{attrs:{src:t(486),alt:"annotated in red, automated validation checks on datasets in CKAN"}}),o("br"),e._v(" "),o("em",[e._v("Figure 8: annotated in red, automated validation checks on datasets in CKAN.")])]),e._v(" "),o("p",[e._v("The installation and usage instructions for "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("ckanext-validation"),o("OutboundLink")],1),e._v(" extension are available on "),o("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-validation",target:"_blank",rel:"noopener noreferrer"}},[e._v("Github"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"validate-tabular-data-automatically-on-github"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-github"}},[e._v("#")]),e._v(" Validate tabular data automatically on GitHub")]),e._v(" "),o("p",[e._v("If your data is hosted on GitHub, you can use "),o("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://goodtables.io"),o("OutboundLink")],1),e._v(" to automatically validate it on every change.")]),e._v(" "),o("p",[e._v("For this section, you will first need to create a "),o("a",{attrs:{href:"https://help.github.com/articles/create-a-repo/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub repository"),o("OutboundLink")],1),e._v(" and add tabular data to it.")]),e._v(" "),o("p",[e._v("Once you have tabular data in your 
GitHub repository:")]),e._v(" "),o("ol",[o("li",[e._v("Log in to "),o("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(" using your GitHub account and accept the permissions confirmation.")]),e._v(" "),o("li",[e._v("Once your repository list has been synchronized, go to the "),o("a",{attrs:{href:"https://goodtables.io/settings",target:"_blank",rel:"noopener noreferrer"}},[e._v("Manage Sources"),o("OutboundLink")],1),e._v(" page and enable the repository with the data you want to validate.\n"),o("ul",[o("li",[e._v("If you can’t find the repository, try clicking on the Refresh button on the Manage Sources page.")])])])]),e._v(" "),o("p",[e._v("Goodtables will then validate all tabular data files (CSV, XLS, XLSX, ODS) and "),o("RouterLink",{attrs:{to:"/data-package/"}},[e._v("data packages")]),e._v(" in the repository. These validations will be executed on every change, including pull requests.")],1),e._v(" "),o("h3",{attrs:{id:"validate-tabular-data-automatically-on-amazon-s3"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#validate-tabular-data-automatically-on-amazon-s3"}},[e._v("#")]),e._v(" Validate tabular data automatically on Amazon S3")]),e._v(" "),o("p",[e._v("If your data is hosted on Amazon S3, you can use "),o("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://goodtables.io"),o("OutboundLink")],1),e._v(" to automatically validate it on every change.")]),e._v(" "),o("p",[e._v("Setting this up is a technical process, as you need to know how to configure your Amazon S3 bucket. However, once it’s configured, the validations happen automatically on any tabular data created or updated. 
Find the detailed instructions "),o("a",{attrs:{href:"https://docs.goodtables.io/getting_started/s3.html",title:"Validating data on Amazon S3",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"custom-setup-of-automatic-tabular-data-validation"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#custom-setup-of-automatic-tabular-data-validation"}},[e._v("#")]),e._v(" Custom setup of automatic tabular data validation")]),e._v(" "),o("p",[e._v("If you don’t use any of the officially supported data publishing platforms, you can use "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",title:"Goodtables.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables-py"),o("OutboundLink")],1),e._v(" directly to validate your data. This is the most flexible option, as you can configure exactly when and how your tabular data is validated. For example, if your data comes from an external source, you could validate it once before you process it (so you catch errors in the source data), and once after cleaning, just before you publish it, so you catch errors introduced by your data processing.")]),e._v(" "),o("p",[e._v("The instructions on how to do this are technical and can be found at "),o("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",title:"Goodtables.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/goodtables-py"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"publish-packaged-data"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#publish-packaged-data"}},[e._v("#")]),e._v(" Publish packaged data")]),e._v(" "),o("p",[e._v("Creating and sharing Data Packages is important for both data publishers and data users because it provides a common and open specification to describe your dataset’s metadata. 
This facilitates data reuse, as users don’t need to understand each data publisher’s specific metadata format, and as the specification is machine-readable, it also allows tools to parse the metadata. This enables software to:")]),e._v(" "),o("ul",[o("li",[e._v("Import the data packages into different tools and languages, like Python and R")]),e._v(" "),o("li",[e._v("Validate the data contents according to the schema described in the data package")]),e._v(" "),o("li",[e._v("Convert the data package into other formats, for example loading it into a SQL database for further analysis")])]),e._v(" "),o("p",[e._v("Although these reasons are not unique to publishing data as data packages, here’s why we think data publishers should consider publishing in this format:")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("Archiving data collections using data packages ensures data publishers can update data more efficiently at any time. The associated schema is a guide to existing data fields and acceptable data types for individual tabular data resources and can be easily built upon.")])]),e._v(" "),o("li",[o("p",[e._v("Sharing data with descriptive metadata and its associated schema provides context for your data no matter where it is used, and significantly cuts down on time spent researching data provenance before using acquired data.")])]),e._v(" "),o("li",[o("p",[e._v("Data Packages allow for accountability and enrich the feedback process, as data publishers can add metadata with contact information for users to reach out to them and licensing to spell out accepted use of published data.")])])]),e._v(" "),o("p",[e._v("If you don’t need your own data portal, there are many platforms where you can publish your data (if you need your own, check "),o("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),o("OutboundLink")],1),e._v("). In the sections below, we dive into a few options. 
Read along and decide what option is most suitable:")]),e._v(" "),o("h3",{attrs:{id:"publish-packaged-data-in-our-community-ckan-instance"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#publish-packaged-data-in-our-community-ckan-instance"}},[e._v("#")]),e._v(" Publish packaged data in our community CKAN instance")]),e._v(" "),o("p",[e._v("CKAN is an open source platform for publishing data that makes it easy to discover, use and share data. "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" is a public instance of CKAN that allows anyone to publish their data.")]),e._v(" "),o("p",[e._v("Here’s why you should consider creating an organization on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" and publishing datasets therein:")]),e._v(" "),o("ul",[o("li",[o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" is free for all to use! 
The file upload size limit on the platform is currently 100mb.")]),e._v(" "),o("li",[e._v("The decision on whether to publicly or privately publish datasets rests with data publishers.")]),e._v(" "),o("li",[o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" organizations allow for multiple users to collaborate with varied privileges:")]),e._v(" "),o("li",[o("strong",[e._v("Admin")]),e._v(": Can add/edit and delete datasets, as well as manage organization members.")]),e._v(" "),o("li",[o("strong",[e._v("Editor")]),e._v(": Can add and edit datasets, but not manage organization members.")]),e._v(" "),o("li",[o("strong",[e._v("Member")]),e._v(": Can view the organization’s private datasets, but not add new datasets.")])]),e._v(" "),o("p",[e._v("To publish data on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(":")]),e._v(" "),o("ol",[o("li",[o("p",[e._v("Request for a new Organization to be created on the platform for you via "),o("a",{attrs:{href:"https://discuss.okfn.org/c/open-knowledge-labs/datahub",target:"_blank",rel:"noopener noreferrer"}},[e._v("our community page"),o("OutboundLink")],1),e._v("."),o("br"),e._v("\nThis is required only to ensure spammers don’t take up space and hog resources on the platform.")]),e._v(" "),o("p",[e._v("The request format is simple and requires:")]),e._v(" "),o("ul",[o("li",[o("p",[o("strong",[e._v("Title")]),e._v(": This will be the name of your Organization on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" i.e."),o("br"),o("br"),e._v(" "),o("em",[e._v("My New Organization")])])]),e._v(" "),o("li",[o("p",[o("strong",[e._v("Slug")]),e._v(": This is an acronym, word or hyphenated phrase that will be added to the end of the 
"),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" url to uniquely identify your Organization and associate your data collections with it i.e."),o("br"),o("br"),e._v(" "),o("em",[e._v("my-new-organization")])])]),e._v(" "),o("li",[o("p",[o("strong",[e._v("Username")]),e._v(": The username you provide is associated with an email address on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" and allows us to give you admin access to your Organization on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(".")])])])]),e._v(" "),o("li",[o("p",[e._v("Log In and add new datasets")])])]),e._v(" "),o("p",[e._v("Adding datasets on "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" is no different from using any other CKAN platform, but "),o("a",{attrs:{href:"http://okfnlabs.org/blog/2016/07/25/publish-data-packages-to-datahub-ckan.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("here’s a good guide by Dan Fowler"),o("OutboundLink")],1),e._v(" for first timers.")]),e._v(" "),o("p",[e._v("3.Publish and share public datasets widely.")]),e._v(" "),o("p",[e._v("On "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(", you can either publish datasets privately, meaning only members of your organization have access to them, or publicly, as open data. 
"),o("a",{attrs:{href:"http://okfnlabs.org/blog/2016/07/25/publish-data-packages-to-datahub-ckan.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Find out more"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"publish-packaged-data-on-datahub-io"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#publish-packaged-data-on-datahub-io"}},[e._v("#")]),e._v(" Publish packaged data on "),o("a",{attrs:{href:"http://DataHub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1)]),e._v(" "),o("p",[o("a",{attrs:{href:"http://DataHub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1),e._v(" is a platform for finding, sharing and publishing high quality data online.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1),e._v(" and "),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.ckan.io"),o("OutboundLink")],1),e._v(" share the same name for historical reasons. 
"),o("a",{attrs:{href:"https://datahub.ckan.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Datahub.ckan.io"),o("OutboundLink")],1),e._v(" used to be the DataHub, but was moved to its current address, and the current DataHub uses new software developed from scratch.")]),e._v(" "),o("ol",[o("li",[e._v("Set up a data publisher / user account on "),o("a",{attrs:{href:"https://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1)])]),e._v(" "),o("p",[e._v("Join the "),o("a",{attrs:{href:"https://gitter.im/datahubio/chat",target:"_blank",rel:"noopener noreferrer"}},[e._v("datahub.io community group"),o("OutboundLink")],1),e._v(", introduce yourself and request for an account.")]),e._v(" "),o("ol",{attrs:{start:"2"}},[o("li",[e._v("Publish Datasets on "),o("a",{attrs:{href:"https://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1)])]),e._v(" "),o("p",[o("a",{attrs:{href:"http://datahub.io/docs/getting-started/publishing-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("This post"),o("OutboundLink")],1),e._v(" provides helpful information on publishing datasets on "),o("a",{attrs:{href:"https://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("DataHub.io"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("h3",{attrs:{id:"publish-packaged-data-on-github"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#publish-packaged-data-on-github"}},[e._v("#")]),e._v(" Publish packaged data on GitHub")]),e._v(" "),o("p",[e._v("GitHub is the largest repository of source code, with "),o("a",{attrs:{href:"https://github.com/blog/2345-celebrating-nine-years-of-github-with-an-anniversary-sale",target:"_blank",rel:"noopener noreferrer"}},[e._v("more than 20 million"),o("br"),e._v("\nusers"),o("OutboundLink")],1),e._v(". Although the focus is on hosting source code, any type of file can be hosted. 
Documents, theses, images, shapefiles: you can even host an entire static website with "),o("a",{attrs:{href:"https://pages.github.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub Pages"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("By using GitHub, you get all the advantages of using a version control system such as Git, where every modification to your files is tracked. You also get an issue ticketing system, wiki pages, milestone tracking, and other useful collaboration tools.")]),e._v(" "),o("p",[o("strong",[e._v("What types of datasets can be hosted on GitHub?")])]),e._v(" "),o("p",[e._v("Although GitHub offers many useful functionalities, not all datasets are a good fit for it. The main limitations are:")]),e._v(" "),o("ul",[o("li",[e._v("Individual files must be smaller than 100 MB")]),e._v(" "),o("li",[e._v("The entire repository must be smaller than 1 GB\n"),o("ul",[o("li",[e._v("The repository size includes not only the current files, but all of their previous versions.")])])])]),e._v(" "),o("p",[e._v("You can store larger files using "),o("a",{attrs:{href:"https://git-lfs.github.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("git-lfs"),o("OutboundLink")],1),e._v(", but we won’t go into detail about it in this section.")]),e._v(" "),o("p",[e._v("It also helps if your data files use text-based file formats like CSV or GeoJSON, because Git can then show you exactly what changed between two versions of the files. However, even if you use binary file formats like XLS, GitHub is still useful.")]),e._v(" "),o("p",[o("strong",[e._v("Step 1. Organise your dataset folder structure")])]),e._v(" "),o("p",[e._v("The way to structure your dataset depends on your data, and what extra artifacts it contains (e.g. images, scripts, reports, etc.). In this section, we’ll show a complete example with:")]),e._v(" "),o("ul",[o("li",[o("strong",[e._v("Data files")]),e._v(": The files with the actual data (e.g. 
CSV, XLS, GeoJSON, …)")]),e._v(" "),o("li",[o("strong",[e._v("Documentation")]),e._v(": How was the data collected, any caveats, how to update it, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Metadata")]),e._v(": Where the data comes from, what’s in the files, what’s their source and license, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Scripts")]),e._v(": Software scripts that were used to generate, update, or modify the data.")])]),e._v(" "),o("p",[e._v("Even though we’ll see an example that has all of these different types of files, this isn’t always the case. For example, datasets that were manually collected might not have any scripts.")]),e._v(" "),o("p",[e._v("Consider this folder structure:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v("data/\n schools.csv\n cities.csv\ndocs/\n screenshot.png\nscripts/\n clean_data.py\nMakefile\ndatapackage.json\nREADME.md\n")])])]),o("ul",[o("li",[o("strong",[e._v("data/")]),e._v(": All data files are contained in this folder. In our example, there are two: "),o("code",[e._v("data/schools.csv")]),e._v(" and "),o("code",[e._v("data/cities.csv")]),e._v(".")]),e._v(" "),o("li",[o("strong",[e._v("docs/")]),e._v(": Images, sample analysis, and other documentation files regarding the dataset. The main documentation is in "),o("code",[e._v("README.md")]),e._v(", but in this folder you can add any images used in the README, and other writings about the dataset.")]),e._v(" "),o("li",[o("strong",[e._v("scripts/")]),e._v(": All scripts are contained in this folder. There could be scripts to scrape the data, join different files, clean them, etc. 
Depending on the programming language you use, you might also add requirements files like "),o("code",[e._v("requirements.txt")]),e._v(" for Python, or "),o("code",[e._v("package.json")]),e._v(" for Node.js.")]),e._v(" "),o("li",[o("strong",[e._v("Makefile")]),e._v(": The scripts are only part of the puzzle; we also need to know how to run them. In which order should they be executed? Which one should you run to update the data? You could document this information textually in the "),o("code",[e._v("README.md")]),e._v(" file, but the "),o("code",[e._v("Makefile")]),e._v(" allows you to have executable documentation. You can think of it as a script to run the scripts. If you have never written a Makefile, read "),o("a",{attrs:{href:"https://bost.ocks.org/mike/make/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Why Use Make"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("li",[o("strong",[e._v("datapackage.json")]),e._v(": This file describes the dataset’s metadata. For example, what the dataset is, where its files are, what they contain, what each column means (for tabular data), what the source, license, and authors are, and so on. As it’s a machine-readable specification, other software can import and validate your files. See "),o("RouterLink",{attrs:{to:"/blog/2018/03/07/well-packaged-datasets/"}},[e._v("how to create a data package")]),e._v(" for instructions on writing this file.")],1),e._v(" "),o("li",[o("strong",[o("a",{attrs:{href:"http://README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("README.md"),o("OutboundLink")],1)]),e._v(": This is where the dataset is described for humans. We recommend the following sections:\n"),o("ul",[o("li",[o("strong",[e._v("Introduction")]),e._v(": A short description of the dataset, what it contains, the time or geographical area it covers")]),e._v(" "),o("li",[o("strong",[e._v("Data")]),e._v(": What is the data structure? Does it use any codes? How do you define missing values (e.g. 
‘N/A’ or ‘-1’)")]),e._v(" "),o("li",[o("strong",[e._v("Preparation")]),e._v(": How was the data collected? How do I update the data? Was it modified in any way? If you have a "),o("code",[e._v("Makefile")]),e._v(", this section will mostly document how to run it. Otherwise you can describe how to run the scripts, or how to collect the data manually.")]),e._v(" "),o("li",[o("strong",[e._v("License")]),e._v(": There are two issues here: the license of the data itself, and the license of the package you are creating (including any scripts). Our recommendation is to license the package you created as "),o("a",{attrs:{href:"https://creativecommons.org/publicdomain/zero/1.0/",title:"Creative Commons Public Domain Dedication",target:"_blank",rel:"noopener noreferrer"}},[e._v("CC0"),o("OutboundLink")],1),e._v(", and add any relevant information or disclaimers regarding the source data’s license.")])])])]),e._v(" "),o("p",[e._v("To summarize, these are the folders, files, and their respective contents in this structure:")]),e._v(" "),o("table",[o("thead",[o("tr",[o("th",[e._v("Path")]),e._v(" "),o("th",[e._v("Type")]),e._v(" "),o("th",[e._v("Contents")])])]),e._v(" "),o("tbody",[o("tr",[o("td",[e._v("data/")]),e._v(" "),o("td",[e._v("Data")]),e._v(" "),o("td",[e._v("Dataset’s data files.")])]),e._v(" "),o("tr",[o("td",[e._v("docs/")]),e._v(" "),o("td",[e._v("Documentation")]),e._v(" "),o("td",[e._v("Images, analysis, and other documentation files.")])]),e._v(" "),o("tr",[o("td",[e._v("scripts/")]),e._v(" "),o("td",[e._v("Scripts")]),e._v(" "),o("td",[e._v("Scripts used for creating, modifying, or analysing the dataset.")])]),e._v(" "),o("tr",[o("td",[e._v("Makefile")]),e._v(" "),o("td",[e._v("Scripts")]),e._v(" "),o("td",[e._v("Executable documentation on how to run the scripts.")])]),e._v(" "),o("tr",[o("td",[e._v("datapackage.json")]),e._v(" "),o("td",[e._v("Metadata")]),e._v(" "),o("td",[e._v("Data Package descriptor file.")])]),e._v(" 
"),o("tr",[o("td",[o("a",{attrs:{href:"http://README.md",target:"_blank",rel:"noopener noreferrer"}},[e._v("README.md"),o("OutboundLink")],1)]),e._v(" "),o("td",[e._v("Documentation")]),e._v(" "),o("td",[e._v("Textual description of the dataset with description, preparation steps, license, etc.")])])])]),e._v(" "),o("p",[e._v("** Step 2. Upload the dataset to GitHub **")]),e._v(" "),o("ol",[o("li",[e._v("Login (or create) a new account on GitHub")]),e._v(" "),o("li",[e._v("Create "),o("a",{attrs:{href:"https://github.com/new",title:"GitHub New Repository",target:"_blank",rel:"noopener noreferrer"}},[e._v("a new repository"),o("OutboundLink")],1),e._v(" "),o("ul",[o("li",[e._v("Write a short description about the dataset")])])]),e._v(" "),o("li",[e._v("On your repository page, click on the “Upload files” link")]),e._v(" "),o("li",[e._v("Upload the files you created in the previous step\n"),o("ul",[o("li",[e._v("If your have files larger than 25 MB, you’ll need to either "),o("a",{attrs:{href:"https://help.github.com/articles/adding-a-file-to-a-repository-using-the-command-line/",title:"Adding a file to a repository using the command line",target:"_blank",rel:"noopener noreferrer"}},[e._v("upload using the command line"),o("OutboundLink")],1),e._v(", or the "),o("a",{attrs:{href:"https://desktop.github.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub Desktop client"),o("OutboundLink")],1),e._v(".")])])])]),e._v(" "),o("p",[e._v("** (Optional) Step 3. Enable automatic tabular data validation **")]),e._v(" "),o("p",[e._v("You can automatically validate your tabular data files using "),o("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("goodtables.io"),o("OutboundLink")],1),e._v(". This will take only a few minutes, and will ensure you’ll always know when there are errors with your dataset, maintaining its quality. 
"),o("a",{attrs:{href:"/blog/2018/03/12/automatically-validated-tabular-data"}},[e._v("Read the walkthrough here")]),e._v(".")]),e._v(" "),o("p",[e._v("The sample datasets used in this example, that is, List of schools in Birmingham, UK are available "),o("a",{attrs:{href:"https://github.com/vitorbaptista/birmingham_schools",target:"_blank",rel:"noopener noreferrer"}},[e._v("in this repository"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("hr"),e._v(" "),o("h2",{attrs:{id:"share-packaged-data-effectively"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#share-packaged-data-effectively"}},[e._v("#")]),e._v(" Share packaged data effectively")]),e._v(" "),o("p",[e._v("Publishing packaged data is not enough. To avoid hiding useful information in open archives online, it is necessary to engage communities that could be interested in your data. Community engagement should not be viewed as a one-off assignment, but rather, as a continuous effort to increase the impact of your data publishing work.")]),e._v(" "),o("p",[e._v("Some best practices:")]),e._v(" "),o("ol",[o("li",[o("p",[e._v("Publish quickly, update often."),o("br"),e._v("\nThe true value of published data lies in its use and reuse by open data communities. Publish data as soon as possible and update it regularly so users have access to the latest information.")])]),e._v(" "),o("li",[o("p",[e._v("Set up feedback loops."),o("br"),e._v("\nYour data publishing platform should aim to get community “buy in”, by encouraging participatory processes. 
Feedback loops are important because:")])])]),e._v(" "),o("ul",[o("li",[e._v("they allow data users to ask for clarifications and request additional information about specific datasets, if need be.")]),e._v(" "),o("li",[e._v("they allow data publishers to understand what communities need and publish data driven by user demand, increasing the chance it’ll be used.")]),e._v(" "),o("li",[e._v("they provide an avenue for data publishers to learn how their data is used, so they can gauge its impact.")])]),e._v(" "),o("p",[e._v("Examples of feedback loops that data publishers can set up include:")]),e._v(" "),o("ul",[o("li",[e._v("A comments section on the data platform. Needless to say, the comments section should be monitored closely to ensure that responses are sent in time, and that discussions remain respectful and on topic.")]),e._v(" "),o("li",[e._v("A dedicated social platform channel, such as a Google Group or Facebook group, with a prominently placed link from the data platform for sharing updates, collating and responding to feedback.")]),e._v(" "),o("li",[e._v("An e-mail address where users can contact the people responsible for the datasets for clarifications, suggestions, or to report errors.")])]),e._v(" "),o("ol",{attrs:{start:"3"}},[o("li",[e._v("Meet open data communities in the places they already meet")])]),e._v(" "),o("p",[e._v("Communities thrive when there’s continued discourse over similar interests."),o("br"),e._v("\nData publishers should be active in existing networks, as supporters and collaborators in community data initiatives. 
Some of the ways this can be done, drawing on Open Knowledge communities and others, include:")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("Kickstarting and joining discussions in online forums")])]),e._v(" "),o("li",[o("p",[e._v("Blogs"),o("br"),e._v("\nAs a data publisher, running a data blog is a great way to create awareness about the data you publish, and an avenue to highlight how data users are drawing insight from it. This encourages use and reuse of your data. If you don’t run a data blog, there are plenty of open data blogs that welcome external contributions; for example, "),o("a",{attrs:{href:"https://blog.okfn.org/submit/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here’s how"),o("OutboundLink")],1),e._v(" you can contribute guest posts on "),o("a",{attrs:{href:"http://blog.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("blog.okfn.org"),o("OutboundLink")],1),e._v(".")])]),e._v(" "),o("li",[o("p",[e._v("Open Knowledge Discuss"),o("br"),e._v("\nThe Open Knowledge discussion platform is a great place to start and contribute to conversations on specific subjects. "),o("a",{attrs:{href:"https://discuss.okfn.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dive in"),o("OutboundLink")],1),e._v("!")])]),e._v(" "),o("li",[o("p",[e._v("Gitter"),o("br"),e._v("\nGitter is a chat platform that’s well suited for more technical discussions around open data. 
If you are looking to engage technical data users, consider joining our "),o("a",{attrs:{href:"https://gitter.im/okfn/chat",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Foundation channel"),o("OutboundLink")],1),e._v(" or the "),o("a",{attrs:{href:"https://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data project channel"),o("OutboundLink")],1),e._v(".")])]),e._v(" "),o("li",[o("p",[e._v("In-person meetups"),o("br"),e._v("\nOrganizing and participating in meetups, hackathons and domain-specific conferences is a good way to engage with communities.")])]),e._v(" "),o("li",[o("p",[e._v("Community calls, webinars and podcasts")])])]),e._v(" "),o("p",[e._v("Finally, to maintain an active community of data users as a data publisher:")]),e._v(" "),o("ul",[o("li",[e._v("Keep your datasets updated and highlight changes that might be of interest to the community. For example, if the changes are relevant to a specific data request, reach out and let the user know.")]),e._v(" "),o("li",[e._v("Have a human representative play an active role in community activities. Bots can be fun and efficient, but they are limited and can get in the way of meaningful interactions.")]),e._v(" "),o("li",[e._v("Be flexible and transparent. Listen to your community’s needs and respond appropriately and in a timely fashion, e.g. consider publishing datasets that are in high demand first, or more regularly. Archive datasets rather than deleting them, but if one must be deleted, issue a forewarning and explain why.")]),e._v(" "),o("li",[e._v("Set up a sharing system to regularly showcase notable data use cases by the community, e.g. 
fortnightly to inspire other community members.")])])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/50.db9ab2c9.js b/assets/js/50.6eaad3c7.js similarity index 99% rename from assets/js/50.db9ab2c9.js rename to assets/js/50.6eaad3c7.js index dab66fc3a..8cda56162 100644 --- a/assets/js/50.db9ab2c9.js +++ b/assets/js/50.6eaad3c7.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[50],{457:function(t,a,s){t.exports=s.p+"assets/img/gdp_map_example.c3bf2487.png"},595:function(t,a,s){"use strict";s.r(a);var n=s(29),e=Object(n.a)({},(function(){var t=this,a=t.$createElement,n=t._self._c||a;return n("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[n("p",[t._v("Joining multiple datasets on a common value or set of values is a common data wrangling task. For instance, one might have a dataset listing Gross Domestic Product (GDP) per country and a separate dataset containing geographic outlines of country borders. If these independent datasets have a shared property (for instance, the three-letter country code as "),n("a",{attrs:{href:"https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3",target:"_blank",rel:"noopener noreferrer"}},[t._v("defined in ISO 3166-1"),n("OutboundLink")],1),t._v("), we should be able to create one consolidated dataset to generate a map of GDP per country. This guide will walk through this simple use case.")]),t._v(" "),n("h2",{attrs:{id:"example-data"}},[n("a",{staticClass:"header-anchor",attrs:{href:"#example-data"}},[t._v("#")]),t._v(" Example Data")]),t._v(" "),n("p",[t._v("For this example, we are going to use two example Data Packages from our "),n("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/",target:"_blank",rel:"noopener noreferrer"}},[t._v("example data packages repository"),n("OutboundLink")],1),t._v(" with the properties described above. The first is an example of a Data Package containing a GeoJSON file. 
"),n("a",{attrs:{href:"http://geojson.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("GeoJSON"),n("OutboundLink")],1),t._v(" is a format for representing geographical features in "),n("a",{attrs:{href:"http://json.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("JSON"),n("OutboundLink")],1),t._v(". This particular GeoJSON file lists countries on its "),n("code",[t._v("features")]),t._v(" array and specifies the country code as a property on each “feature”. In this case, the country code is stored on the key “ISO_A3” of the feature’s "),n("code",[t._v("properties")]),t._v(" object.")]),t._v(" "),n("div",{staticClass:"language-json extra-class"},[n("pre",{pre:!0,attrs:{class:"language-json"}},[n("code",[n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"FeatureCollection"')]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"features"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Feature"')]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"properties"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token 
property"}},[t._v('"ADMIN"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Ukraine"')]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"ISO_A3"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"UKR"')]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"geometry"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Polygon"')]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"coordinates"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"..."')]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),n("p",[t._v("The second Data Package is a typical "),n("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data 
Package"),n("OutboundLink")],1),t._v(" containing a GDP measure for each country in the world for the year 2014. Country codes are stored, naturally, in the “Country Code” column.")]),t._v(" "),n("table",[n("thead",[n("tr",[n("th",[t._v("Country Name")]),t._v(" "),n("th",[t._v("Country Code")]),t._v(" "),n("th",[t._v("Year")]),t._v(" "),n("th",[t._v("Value")])])]),t._v(" "),n("tbody",[n("tr",[n("td",[t._v("Ukraine")]),t._v(" "),n("td",[t._v("UKR")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("131805126738.287")])]),t._v(" "),n("tr",[n("td",[t._v("United Arab Emirates")]),t._v(" "),n("td",[t._v("ARE")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("401646583173.427")])]),t._v(" "),n("tr",[n("td",[t._v("United Kingdom")]),t._v(" "),n("td",[t._v("GBR")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("2941885537461.48")])]),t._v(" "),n("tr",[n("td",[t._v("United States")]),t._v(" "),n("td",[t._v("USA")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("17419000000000")])]),t._v(" "),n("tr",[n("td",[t._v("Uruguay")]),t._v(" "),n("td",[t._v("URY")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("57471277325.1312")])])])]),t._v(" "),n("h2",{attrs:{id:"reading-and-joining-data"}},[n("a",{staticClass:"header-anchor",attrs:{href:"#reading-and-joining-data"}},[t._v("#")]),t._v(" Reading and Joining Data")]),t._v(" "),n("p",[t._v("As in our "),n("RouterLink",{attrs:{to:"/blog/2016/08/29/using-data-packages-in-python/"}},[t._v("Using Data Packages in Python guide")]),t._v(", the first step before joining is to read the data for each Data Package onto our computer. We do this by importing the "),n("code",[t._v("datapackage")]),t._v(" library and passing the Data Package URL to its "),n("code",[t._v("Package")]),t._v(" class. 
We are also importing the standard Python "),n("code",[t._v("json")]),t._v(" library to read and write our GeoJSON file.")],1),t._v(" "),n("div",{staticClass:"language-python extra-class"},[n("pre",{pre:!0,attrs:{class:"language-python"}},[n("code",[n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" json\n"),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" datapackage\n\ncountries_url "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'https://raw.githubusercontent.com/frictionlessdata/example-data-packages/master/geo-countries/datapackage.json'")]),t._v("\ngdp_url "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'https://raw.githubusercontent.com/frictionlessdata/example-data-packages/master/gross-domestic-product-2014/datapackage.json'")]),t._v("\n\ncountries_dp "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" datapackage"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Package"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("countries_url"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\ngdp_dp "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" datapackage"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Package"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("gdp_url"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\nworld "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" json"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("loads"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("countries_dp"),n("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(".")]),t._v("get_resource"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'countries'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("raw_read"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("decode"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'UTF-8'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n")])])]),n("p",[t._v("Learn more about creating data packages in Python "),n("RouterLink",{attrs:{to:"/blog/2016/07/21/creating-tabular-data-packages-in-python/"}},[t._v("in this tutorial")]),t._v(".")],1),t._v(" "),n("p",[t._v("Our GeoJSON data is stored as a "),n("code",[t._v("bytes")]),t._v(" object in the "),n("code",[t._v("data")]),t._v(" attribute of the first (and only) element of the Data Package "),n("code",[t._v("resources")]),t._v(" array. 
To create our "),n("code",[t._v("world")]),t._v(" GeoJSON dict, we first need to decode this "),n("code",[t._v("bytes")]),t._v(" object to a UTF-8 string and pass it to "),n("code",[t._v("json.loads")]),t._v(".")]),t._v(" "),n("div",{staticClass:"language-python extra-class"},[n("pre",{pre:!0,attrs:{class:"language-python"}},[n("code",[t._v("world "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" json"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("loads"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("countries_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("get_resource"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'countries'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("raw_read"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("decode"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'UTF-8'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),n("p",[t._v("At this point, joining the data can be accomplished by iterating through each country in the "),n("code",[t._v("world['features']")]),t._v(" array and adding a property “GDP (2014)” if “Country Code” on the "),n("code",[t._v("gdp_dp")]),t._v(" Data Package object matches “ISO_A3” on the given GeoJSON feature. 
The value of “GDP (2014)” is derived from the “Value” column on the "),n("code",[t._v("gdp_dp")]),t._v(" Data Package object.")]),t._v(" "),n("div",{staticClass:"language-python extra-class"},[n("pre",{pre:!0,attrs:{class:"language-python"}},[n("code",[n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" feature "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" world"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'features'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n matches "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("gdp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Value'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" gdp "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" gdp_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("resources"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("data "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" gdp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Country Code'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("==")]),t._v(" feature"),n("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'properties'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'ISO_A3'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" matches"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n feature"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'properties'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'GDP (2014)'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("float")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("matches"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("else")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n feature"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'properties'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'GDP 
(2014)'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n")])])]),n("p",[t._v("Finally, we can output our consolidated GeoJSON dataset into a new file called “world_gdp_2014.geojson” using "),n("code",[t._v("json.dump")]),t._v(" and create a new Data Package container for it. For a more thorough walkthrough on creating a Data Package, please consult the"),n("br"),t._v(" "),n("RouterLink",{attrs:{to:"/blog/2016/07/21/creating-tabular-data-packages-in-python/"}},[t._v("Creating Data Packages in Python")]),t._v(" guide.")],1),t._v(" "),n("div",{staticClass:"language-python extra-class"},[n("pre",{pre:!0,attrs:{class:"language-python"}},[n("code",[t._v("new_dp "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" datapackage"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Package"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nnew_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("descriptor"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'name'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'consolidated-dataset'")]),t._v("\nnew_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("descriptor"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'resources'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" 
"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'name'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'data'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'path'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'world_gdp_2014.geojson'")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nnew_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("commit"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nnew_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("save"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'datapackage.zip'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),n("p",[t._v("We can now quickly render this GeoJSON file into a "),n("a",{attrs:{href:"https://en.wikipedia.org/wiki/Choropleth_map",target:"_blank",rel:"noopener noreferrer"}},[t._v("chloropleth map"),n("OutboundLink")],1),t._v(" using "),n("a",{attrs:{href:"http://qgis.org/en/site/",target:"_blank",rel:"noopener noreferrer"}},[t._v("QGIS"),n("OutboundLink")],1),t._v(":")]),t._v(" "),n("p",[n("img",{attrs:{src:s(457),alt:"GDP Map Example"}})]),t._v(" "),n("p",[t._v("Or we can rely on GitHub to render our GeoJSON for us. 
When you click a country, its property list will show up featuring “ADMIN”, “ISO_A3”, and the newly added “GDP (2014)” property.")])])}),[],!1,null,null,null);a.default=e.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[50],{457:function(t,a,s){t.exports=s.p+"assets/img/gdp_map_example.c3bf2487.png"},596:function(t,a,s){"use strict";s.r(a);var n=s(29),e=Object(n.a)({},(function(){var t=this,a=t.$createElement,n=t._self._c||a;return n("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[n("p",[t._v("Joining multiple datasets on a common value or set of values is a common data wrangling task. For instance, one might have a dataset listing Gross Domestic Product (GDP) per country and a separate dataset containing geographic outlines of country borders. If these independent datasets have a shared property (for instance, the three-letter country code as "),n("a",{attrs:{href:"https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3",target:"_blank",rel:"noopener noreferrer"}},[t._v("defined in ISO 3166-1"),n("OutboundLink")],1),t._v("), we should be able to create one consolidated dataset to generate a map of GDP per country. This guide will walk through this simple use case.")]),t._v(" "),n("h2",{attrs:{id:"example-data"}},[n("a",{staticClass:"header-anchor",attrs:{href:"#example-data"}},[t._v("#")]),t._v(" Example Data")]),t._v(" "),n("p",[t._v("For this example, we are going to use two example Data Packages from our "),n("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/",target:"_blank",rel:"noopener noreferrer"}},[t._v("example data packages repository"),n("OutboundLink")],1),t._v(" with the properties described above. The first is an example of a Data Package containing a GeoJSON file. 
"),n("a",{attrs:{href:"http://geojson.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("GeoJSON"),n("OutboundLink")],1),t._v(" is a format for representing geographical features in "),n("a",{attrs:{href:"http://json.org/",target:"_blank",rel:"noopener noreferrer"}},[t._v("JSON"),n("OutboundLink")],1),t._v(". This particular GeoJSON file lists countries on its "),n("code",[t._v("features")]),t._v(" array and specifies the country code as a property on each “feature”. In this case, the country code is stored on the key “ISO_A3” of the feature’s "),n("code",[t._v("properties")]),t._v(" object.")]),t._v(" "),n("div",{staticClass:"language-json extra-class"},[n("pre",{pre:!0,attrs:{class:"language-json"}},[n("code",[n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"FeatureCollection"')]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"features"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Feature"')]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"properties"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token 
property"}},[t._v('"ADMIN"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Ukraine"')]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"ISO_A3"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"UKR"')]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"geometry"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"type"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Polygon"')]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token property"}},[t._v('"coordinates"')]),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v('"..."')]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),n("p",[t._v("The second Data Package is a typical "),n("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data 
Package"),n("OutboundLink")],1),t._v(" containing a GDP measure for each country in the world for the year 2014. Country codes are stored in the “Country Code” column.")]),t._v(" "),n("table",[n("thead",[n("tr",[n("th",[t._v("Country Name")]),t._v(" "),n("th",[t._v("Country Code")]),t._v(" "),n("th",[t._v("Year")]),t._v(" "),n("th",[t._v("Value")])])]),t._v(" "),n("tbody",[n("tr",[n("td",[t._v("Ukraine")]),t._v(" "),n("td",[t._v("UKR")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("131805126738.287")])]),t._v(" "),n("tr",[n("td",[t._v("United Arab Emirates")]),t._v(" "),n("td",[t._v("ARE")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("401646583173.427")])]),t._v(" "),n("tr",[n("td",[t._v("United Kingdom")]),t._v(" "),n("td",[t._v("GBR")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("2941885537461.48")])]),t._v(" "),n("tr",[n("td",[t._v("United States")]),t._v(" "),n("td",[t._v("USA")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("17419000000000")])]),t._v(" "),n("tr",[n("td",[t._v("Uruguay")]),t._v(" "),n("td",[t._v("URY")]),t._v(" "),n("td",[t._v("2014")]),t._v(" "),n("td",[t._v("57471277325.1312")])])])]),t._v(" "),n("h2",{attrs:{id:"reading-and-joining-data"}},[n("a",{staticClass:"header-anchor",attrs:{href:"#reading-and-joining-data"}},[t._v("#")]),t._v(" Reading and Joining Data")]),t._v(" "),n("p",[t._v("As in our "),n("RouterLink",{attrs:{to:"/blog/2016/08/29/using-data-packages-in-python/"}},[t._v("Using Data Packages in Python guide")]),t._v(", the first step before joining is to read the data for each Data Package onto our computer. We do this by importing the "),n("code",[t._v("datapackage")]),t._v(" library and passing the Data Package URL to its "),n("code",[t._v("Package")]),t._v(" class. 
We are also importing the standard Python "),n("code",[t._v("json")]),t._v(" library to read and write our GeoJSON file.")],1),t._v(" "),n("div",{staticClass:"language-python extra-class"},[n("pre",{pre:!0,attrs:{class:"language-python"}},[n("code",[n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" json\n"),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" datapackage\n\ncountries_url "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'https://raw.githubusercontent.com/frictionlessdata/example-data-packages/master/geo-countries/datapackage.json'")]),t._v("\ngdp_url "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'https://raw.githubusercontent.com/frictionlessdata/example-data-packages/master/gross-domestic-product-2014/datapackage.json'")]),t._v("\n\ncountries_dp "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" datapackage"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Package"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("countries_url"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\ngdp_dp "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" datapackage"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Package"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("gdp_url"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\nworld "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" json"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("loads"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("countries_dp"),n("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(".")]),t._v("get_resource"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'countries'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("raw_read"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("decode"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'UTF-8'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n")])])]),n("p",[t._v("Learn more about creating data packages in Python "),n("RouterLink",{attrs:{to:"/blog/2016/07/21/creating-tabular-data-packages-in-python/"}},[t._v("in this tutorial")]),t._v(".")],1),t._v(" "),n("p",[t._v("Our GeoJSON data is stored as a "),n("code",[t._v("bytes")]),t._v(" object in the "),n("code",[t._v("data")]),t._v(" attribute of the first (and only) element of the Data Package "),n("code",[t._v("resources")]),t._v(" array. 
To create our "),n("code",[t._v("world")]),t._v(" GeoJSON dict, we first need to decode this "),n("code",[t._v("bytes")]),t._v(" object to a UTF-8 string and pass it to "),n("code",[t._v("json.loads")]),t._v(".")]),t._v(" "),n("div",{staticClass:"language-python extra-class"},[n("pre",{pre:!0,attrs:{class:"language-python"}},[n("code",[t._v("world "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" json"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("loads"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("countries_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("get_resource"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'countries'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("raw_read"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("decode"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'UTF-8'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),n("p",[t._v("At this point, joining the data can be accomplished by iterating through each country in the "),n("code",[t._v("world['features']")]),t._v(" array and adding a property “GDP (2014)” if “Country Code” on the "),n("code",[t._v("gdp_dp")]),t._v(" Data Package object matches “ISO_A3” on the given GeoJSON feature. 
The value of “GDP (2014)” is derived from the “Value” column on the "),n("code",[t._v("gdp_dp")]),t._v(" Data Package object.")]),t._v(" "),n("div",{staticClass:"language-python extra-class"},[n("pre",{pre:!0,attrs:{class:"language-python"}},[n("code",[n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" feature "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" world"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'features'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n matches "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("gdp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Value'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" gdp "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("in")]),t._v(" gdp_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("resources"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("data "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" gdp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Country Code'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("==")]),t._v(" feature"),n("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'properties'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'ISO_A3'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" matches"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n feature"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'properties'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'GDP (2014)'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("float")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("matches"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("else")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v("\n feature"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'properties'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'GDP 
(2014)'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),t._v("\n")])])]),n("p",[t._v("Finally, we can output our consolidated GeoJSON dataset into a new file called “world_gdp_2014.geojson” using "),n("code",[t._v("json.dump")]),t._v(" and create a new Data Package container for it. For a more thorough walkthrough on creating a Data Package, please consult the"),n("br"),t._v(" "),n("RouterLink",{attrs:{to:"/blog/2016/07/21/creating-tabular-data-packages-in-python/"}},[t._v("Creating Data Packages in Python")]),t._v(" guide.")],1),t._v(" "),n("div",{staticClass:"language-python extra-class"},[n("pre",{pre:!0,attrs:{class:"language-python"}},[n("code",[t._v("new_dp "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" datapackage"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Package"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nnew_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("descriptor"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'name'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'consolidated-dataset'")]),t._v("\nnew_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("descriptor"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'resources'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" 
"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'name'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'data'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'path'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(":")]),t._v(" "),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'world_gdp_2014.geojson'")]),t._v("\n "),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nnew_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("commit"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nnew_dp"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("save"),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),n("span",{pre:!0,attrs:{class:"token string"}},[t._v("'datapackage.zip'")]),n("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),n("p",[t._v("We can now quickly render this GeoJSON file into a "),n("a",{attrs:{href:"https://en.wikipedia.org/wiki/Choropleth_map",target:"_blank",rel:"noopener noreferrer"}},[t._v("choropleth map"),n("OutboundLink")],1),t._v(" using "),n("a",{attrs:{href:"http://qgis.org/en/site/",target:"_blank",rel:"noopener noreferrer"}},[t._v("QGIS"),n("OutboundLink")],1),t._v(":")]),t._v(" "),n("p",[n("img",{attrs:{src:s(457),alt:"GDP Map Example"}})]),t._v(" "),n("p",[t._v("Or we can rely on GitHub to render our GeoJSON for us. 
When you click a country, its property list will show up featuring “ADMIN”, “ISO_A3”, and the newly added “GDP (2014)” property.")])])}),[],!1,null,null,null);a.default=e.exports}}]); \ No newline at end of file diff --git a/assets/js/51.e5c66212.js b/assets/js/51.e7f7bca2.js similarity index 97% rename from assets/js/51.e5c66212.js rename to assets/js/51.e7f7bca2.js index 7c73012e0..f286a0ed5 100644 --- a/assets/js/51.e5c66212.js +++ b/assets/js/51.e7f7bca2.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[51],{501:function(e,a,t){e.exports=t.p+"assets/img/OR.8407f111.png"},622:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,r=e._self._c||a;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("This blog is part of a series showcasing projects developed during the 2019 Tool Fund."),r("br"),e._v(" "),r("br"),e._v("\nOriginally published at "),r("a",{attrs:{href:"https://blog.okfn.org/2020/01/15/frictionless-data-tool-fund-update-shelby-switzer-and-greg-bloom-open-referral/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2020/01/15/frictionless-data-tool-fund-update-shelby-switzer-and-greg-bloom-open-referral/"),r("OutboundLink")],1)]),e._v(" "),r("p",[r("em",[e._v("The 2019 Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. 
This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.")])]),e._v(" "),r("p",[e._v("Open Referral creates standards for health, human, and social services data – the data found in community resource directories used to help find resources for people in need. In many organizations, this data lives in a multitude of formats, from handwritten notes to Excel files on a laptop to Microsoft SQL databases in the cloud. For community resource directories to be maximally useful to the public, this disparate data must be converted into an interoperable format. Many organizations have decided to use Open Referral’s Human Services Data Specification (HSDS) as that format. However, to accurately represent this data, HSDS uses multiple linked tables, which can be challenging to work with. To make this process easier, Greg Bloom and Shelby Switzer from Open Referral decided to implement datapackage bundling of their CSV files using the Frictionless Data Tool Fund.")]),e._v(" "),r("p",[e._v("In order to accurately represent the relationships between organizations, the services they provide, and the locations where they are offered, Open Referral’s Human Services Data Specification (HSDS) makes sense of disparate data by linking multiple CSV files together through foreign keys. 
Open Referral used Frictionless Data’s datapackage to specify the tables’ contents and relationships in a single machine-readable file, so that this standardized format could transport HSDS-compliant data in a way that all of the teams who work with this data can use: CSVs of linked data.")]),e._v(" "),r("p",[e._v("In the Tool Fund, Open Referral worked on their HSDS Transformer tool, which enables a group or person to transform data into an HSDS-compliant data package, so that it can then be combined with other data or used in any number of applications. 
The HSDS-Transformer is a Ruby library that can be used during the extract, transform, load (ETL) workflow of raw community resource data. This library extracts the community resource data, transforms that data into HSDS-compliant CSVs, and generates a datapackage.json that describes the data output. The Transformer can also output the datapackage as a zip file, called HSDS Zip, enabling systems to send and receive a single compressed file rather than multiple files. The Transformer can be spun up in a docker container — and once it’s live, the API can deliver a payload that includes links to the source data and to the configuration file that maps the source data to HSDS fields. The Transformer then grabs the source data and uses the configuration file to transform the data and return a zip file of the HSDS-compliant datapackage.")]),e._v(" "),r("p",[r("img",{attrs:{src:t(501),alt:"DemoApp"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("A demo app consuming the API generated from the HSDS Zip")])]),e._v(" "),r("p",[e._v("The Open Referral team has also been working on projects related to the HSDS Transformer and HSDS Zip. For example, the HSDS Validator checks that a given datapackage of community service data is HSDS-compliant. Additionally, they have used these tools in the field with a project in Miami. For this project, the HSDS Transformer was used to transform data from a Microsoft SQL Server into an HSDS Zip. Then that zipped datapackage was used to populate a Human Services Data API with a generated developer portal and OpenAPI Specification.")]),e._v(" "),r("p",[e._v("Further, as part of this work, the team also contributed to the original source code for the datapackage-rb Ruby gem. 
They added a new feature to infer a datapackage.json schema from a given set of CSVs, so that you can generate the json file automatically from your dataset.")]),e._v(" "),r("p",[e._v("Greg and Shelby are eager for the Open Referral community to use these new tools and provide feedback. To use these tools currently, users should either be a Ruby developer who can use the gem as part of another Ruby project, or be familiar enough with Docker and HTTP APIs to start a Docker container and make an HTTP request to it. You can use the HSDS Transformer as a Ruby gem in another project or as a standalone API. In the future, the project might expand to include hosting the HSDS Transformer as a cloud service that anyone can use to transform their data, eliminating many of these technical requirements.")]),e._v(" "),r("p",[e._v("Interested in using these new tools? Open Referral wants to hear your feedback. For example, would it be useful to develop an extract-transform-load API, hosted in the cloud, that enables recurring transformation of nonstandardized human service directory data source into an HSDS-compliant datapackage? 
You can reach them via their GitHub repos.")]),e._v(" "),r("p",[e._v("Further reading:")]),e._v(" "),r("p",[e._v("Repository: "),r("a",{attrs:{href:"https://github.com/openreferral/hsds-transformer",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/openreferral/hsds-transformer"),r("OutboundLink")],1),r("br"),e._v("\nHSDS Transformer: "),r("a",{attrs:{href:"https://openreferral.github.io/hsds-transformer/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://openreferral.github.io/hsds-transformer/"),r("OutboundLink")],1)])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[51],{494:function(e,a,t){e.exports=t.p+"assets/img/OR.8407f111.png"},620:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,r=e._self._c||a;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("This blog is part of a series showcasing projects developed during the 2019 Tool Fund."),r("br"),e._v(" "),r("br"),e._v("\nOriginally published at "),r("a",{attrs:{href:"https://blog.okfn.org/2020/01/15/frictionless-data-tool-fund-update-shelby-switzer-and-greg-bloom-open-referral/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://blog.okfn.org/2020/01/15/frictionless-data-tool-fund-update-shelby-switzer-and-greg-bloom-open-referral/"),r("OutboundLink")],1)]),e._v(" "),r("p",[r("em",[e._v("The 2019 Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. 
This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.")])]),e._v(" "),r("p",[e._v("Open Referral creates standards for health, human, and social services data – the data found in community resource directories used to help find resources for people in need. In many organizations, this data lives in a multitude of formats, from handwritten notes to Excel files on a laptop to Microsoft SQL databases in the cloud. For community resource directories to be maximally useful to the public, this disparate data must be converted into an interoperable format. Many organizations have decided to use Open Referral’s Human Services Data Specification (HSDS) as that format. However, to accurately represent this data, HSDS uses multiple linked tables, which can be challenging to work with. To make this process easier, Greg Bloom and Shelby Switzer from Open Referral decided to implement datapackage bundling of their CSV files using the Frictionless Data Tool Fund.")]),e._v(" "),r("p",[e._v("In order to accurately represent the relationships between organizations, the services they provide, and the locations where they are offered, Open Referral’s Human Services Data Specification (HSDS) makes sense of disparate data by linking multiple CSV files together through foreign keys. 
Open Referral used Frictionless Data’s datapackage to specify the tables’ contents and relationships in a single machine-readable file, so that this standardized format could transport HSDS-compliant data in a way that all of the teams who work with this data can use: CSVs of linked data.")]),e._v(" "),r("p",[e._v("In the Tool Fund, Open Referral worked on their HSDS Transformer tool, which enables a group or person to transform data into an HSDS-compliant data package, so that it can then be combined with other data or used in any number of applications. 
The HSDS-Transformer is a Ruby library that can be used during the extract, transform, load (ETL) workflow of raw community resource data. This library extracts the community resource data, transforms that data into HSDS-compliant CSVs, and generates a datapackage.json that describes the data output. The Transformer can also output the datapackage as a zip file, called HSDS Zip, enabling systems to send and receive a single compressed file rather than multiple files. The Transformer can be spun up in a docker container — and once it’s live, the API can deliver a payload that includes links to the source data and to the configuration file that maps the source data to HSDS fields. The Transformer then grabs the source data and uses the configuration file to transform the data and return a zip file of the HSDS-compliant datapackage.")]),e._v(" "),r("p",[r("img",{attrs:{src:t(494),alt:"DemoApp"}}),e._v(" "),r("br"),e._v(" "),r("em",[e._v("A demo app consuming the API generated from the HSDS Zip")])]),e._v(" "),r("p",[e._v("The Open Referral team has also been working on projects related to the HSDS Transformer and HSDS Zip. For example, the HSDS Validator checks that a given datapackage of community service data is HSDS-compliant. Additionally, they have used these tools in the field with a project in Miami. For this project, the HSDS Transformer was used to transform data from a Microsoft SQL Server into an HSDS Zip. Then that zipped datapackage was used to populate a Human Services Data API with a generated developer portal and OpenAPI Specification.")]),e._v(" "),r("p",[e._v("Further, as part of this work, the team also contributed to the original source code for the datapackage-rb Ruby gem. 
They added a new feature to infer a datapackage.json schema from a given set of CSVs, so that you can generate the json file automatically from your dataset.")]),e._v(" "),r("p",[e._v("Greg and Shelby are eager for the Open Referral community to use these new tools and provide feedback. To use these tools currently, users should either be a Ruby developer who can use the gem as part of another Ruby project, or be familiar enough with Docker and HTTP APIs to start a Docker container and make an HTTP request to it. You can use the HSDS Transformer as a Ruby gem in another project or as a standalone API. In the future, the project might expand to include hosting the HSDS Transformer as a cloud service that anyone can use to transform their data, eliminating many of these technical requirements.")]),e._v(" "),r("p",[e._v("Interested in using these new tools? Open Referral wants to hear your feedback. For example, would it be useful to develop an extract-transform-load API, hosted in the cloud, that enables recurring transformation of nonstandardized human service directory data source into an HSDS-compliant datapackage? 
You can reach them via their GitHub repos.")]),e._v(" "),r("p",[e._v("Further reading:")]),e._v(" "),r("p",[e._v("Repository: "),r("a",{attrs:{href:"https://github.com/openreferral/hsds-transformer",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/openreferral/hsds-transformer"),r("OutboundLink")],1),r("br"),e._v("\nHSDS Transformer: "),r("a",{attrs:{href:"https://openreferral.github.io/hsds-transformer/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://openreferral.github.io/hsds-transformer/"),r("OutboundLink")],1)])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/52.f8dc673f.js b/assets/js/52.76a34651.js similarity index 99% rename from assets/js/52.f8dc673f.js rename to assets/js/52.76a34651.js index b9b12f80f..d3d16cc4a 100644 --- a/assets/js/52.f8dc673f.js +++ b/assets/js/52.76a34651.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[52],{502:function(e,t){e.exports=""},623:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("This blog post describes a Frictionless Data Pilot with the Public Utility Data Liberation project. Pilot projects are part of the "),o("a",{attrs:{href:"https://frictionlessdata.io/reproducible-research/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data for Reproducible Research project"),o("OutboundLink")],1),e._v(". Written by Zane Selvans, Christina Gosnell, and Lilly Winfree.")]),e._v(" "),o("p",[e._v("The Public Utility Data Liberation project, "),o("a",{attrs:{href:"https://catalyst.coop/pudl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("PUDL"),o("OutboundLink")],1),e._v(", aims to make US energy data easier to access and use. 
Much of this data, including information about the cost of electricity, how much fuel is being burned, powerplant usage, and emissions, is not well documented or is in difficult-to-use formats. Last year, PUDL joined forces with the Frictionless Data for Reproducible Research team as a Pilot project to release this public utility data. PUDL takes the original spreadsheets, CSV files, and databases and turns them into unified Frictionless "),o("a",{attrs:{href:"https://frictionlessdata.io/docs/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("tabular data packages"),o("OutboundLink")],1),e._v(" that can be used to populate a database, or read in directly with Python, R, Microsoft Access, and many other tools.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(502),alt:"Catalyst Logo"}})]),e._v(" "),o("h2",{attrs:{id:"what-is-pudl"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-is-pudl"}},[e._v("#")]),e._v(" What is PUDL?")]),e._v(" "),o("p",[e._v("The PUDL project, which is coordinated by "),o("a",{attrs:{href:"https://catalyst.coop/pudl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Catalyst Cooperative"),o("OutboundLink")],1),e._v(", is focused on creating an energy utility data product that can serve a wide range of users. PUDL was inspired to make this data more accessible because the current US utility data ecosystem is fragmented, and commercial products are expensive. There are hundreds of gigabytes of information available from government agencies, but it is often difficult to work with, and different sources can be hard to combine.")]),e._v(" "),o("p",[e._v("PUDL users include researchers, activists, journalists, and policy makers. 
They have a wide range of technical backgrounds, from grassroots organizers who might only feel comfortable with spreadsheets, to PhDs with cloud computing resources, so it was important to provide data that would work for all users.")]),e._v(" "),o("p",[e._v("Before PUDL, much of this data was freely available to download from various sources, but it was typically messy and not well documented. This led to a lack of uniformity and reproducibility amongst projects that were using this data. The users were scraping the data together in their own way, making it hard to compare analyses or understand outcomes. Therefore, one of the goals for PUDL was to minimize these duplicated efforts, and enable the creation of lasting, cumulative outputs.")]),e._v(" "),o("h2",{attrs:{id:"what-were-the-main-pilot-goals"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-were-the-main-pilot-goals"}},[e._v("#")]),e._v(" What were the main Pilot goals?")]),e._v(" "),o("p",[e._v("The main focus of this Pilot was to create a way to openly share the utility data in a reproducible way that would be understandable to PUDL’s many potential users. The first change Catalyst wanted to make during the Pilot was to their data storage medium. PUDL was previously creating a PostgreSQL database as the main data output. However, many users, even those with technical experience, found setting up the separate database software a major hurdle that prevented them from accessing and using the processed data. They also desired a static, archivable, platform-independent format. Therefore, Catalyst decided to transition PUDL away from PostgreSQL, and instead try Frictionless Tabular Data Packages. 
They also wanted a way to share the processed data without needing to commit to long-term maintenance and curation, meaning they needed the outputs to continue being useful to users even if they only had minimal resources to dedicate to the maintenance and updates. 
The team decided to package their data into Tabular Data Packages and identified Zenodo as a good option for openly hosting that packaged data.")]),e._v(" "),o("p",[e._v("Catalyst also recognized that most users only want to download the outputs and use them directly, and did not care about reproducing the data processing pipeline themselves, but it was still important to provide the processing pipeline code publicly to support transparency and reproducibility. Therefore, in this Pilot, they focused on transitioning their existing ETL pipeline from outputting a PostgreSQL database, that was defined using SQLAlchemy, to outputting datapackages which could then be archived publicly on Zenodo. Importantly, they needed this pipeline to maintain the metadata, information about data type, and database structural information that had already been accumulated. This rich metadata needed to be stored alongside the data itself, so future users could understand where the data came from and understand its meaning. The Catalyst team used Tabular Data Packages to record and store this metadata (see the code here: "),o("a",{attrs:{href:"https://github.com/catalyst-cooperative/pudl/blob/master/src/pudl/load/metadata.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/catalyst-cooperative/pudl/blob/master/src/pudl/load/metadata.py"),o("OutboundLink")],1),e._v(").")]),e._v(" "),o("p",[e._v("Another complicating factor is that many of the PUDL datasets are fairly entangled with each other. The PUDL team ideally wanted users to be able to pick and choose which datasets they actually wanted to download and use without requiring them to download it all (currently about 100GB of data when uncompressed). However, they were worried that if single datasets were downloaded, the users might miss that some of the datasets were meant to be used together. 
So, the PUDL team created information, which they call “glue”, that shows which datasets are linked together and that should ideally be used in tandem.")]),e._v(" "),o("p",[e._v("The culmination of this Pilot was a release of the PUDL data (access it here – "),o("a",{attrs:{href:"https://zenodo.org/record/3672068",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://zenodo.org/record/3672068"),o("OutboundLink")],1),e._v(" and read the corresponding documentation here – "),o("a",{attrs:{href:"https://catalystcoop-pudl.readthedocs.io/en/v0.3.2/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://catalystcoop-pudl.readthedocs.io/en/v0.3.2/"),o("OutboundLink")],1),e._v("), which includes integrated data from the EIA Form 860, EIA Form 923, the EPA Continuous Emissions Monitoring System (CEMS), the EPA Integrated Planning Model (IPM), and FERC Form 1.")]),e._v(" "),o("h2",{attrs:{id:"what-problems-were-encountered-during-this-pilot"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-problems-were-encountered-during-this-pilot"}},[e._v("#")]),e._v(" What problems were encountered during this Pilot?")]),e._v(" "),o("p",[e._v("One issue that the group encountered during the Pilot was that the data types available in Postgres are substantially richer than those natively available in the Tabular Data Package standard. However, this issue is an endemic problem of wanting to work with several different platforms, and so the team compromised and worked with the least common denominator. In the future, PUDL might store several different sets of data types for use in different contexts, for example, one for freezing the data out into data packages, one for SQLite, and one for Pandas.")]),e._v(" "),o("p",[e._v("Another problem encountered during the Pilot resulted from testing the limits of the draft Tabular Data Package specifications. 
There were aspects of the specifications that the Catalyst team assumed were fully implemented in the reference (Python) implementation of the Frictionless toolset, but were in fact still works in progress. This work led the Frictionless team to start a documentation improvement project, including a revision of the specifications website to incorporate this feedback.")]),e._v(" "),o("p",[e._v("Through the pilot, the teams worked to implement new Frictionless features, including the specification of composite primary keys and foreign key references that point to external data packages. Other new Frictionless functionality that was created with this Pilot included partitioning of large resources into resource groups in which all resources use identical table schemas, and adding gzip compression of resources. The Pilot also focused on implementing more complete validation through goodtables, including bytes/hash checks, foreign keys checks, and primary keys checks, though there is still more work to be done here.")]),e._v(" "),o("h2",{attrs:{id:"future-directions"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#future-directions"}},[e._v("#")]),e._v(" Future Directions")]),e._v(" "),o("p",[e._v("A common problem with using publicly available energy data is that the federal agencies creating the data do not use version control or maintain change logs for the data they publish, but they do frequently go back years after the fact to revise or alter previously published data — with no notification. To combat this problem, Catalyst is using data packages to encapsulate the raw inputs to the ETL process. They are setting up a process which will periodically check to see if the federal agencies’ posted data has been updated or changed, create an archive, and upload it to Zenodo. They will also store metadata in non-tabular data packages, indicating which information is stored in each file (year, state, month, etc.) 
so that there can be a uniform process of querying those raw input data packages. This will mean the raw inputs won’t have to be archived alongside every data release. Instead one can simply refer to these other versioned archives of the inputs. Catalyst hopes these version controlled raw archives will also be useful to other researchers.")]),e._v(" "),o("p",[e._v("Another next step for Catalyst will be to make the ETL and new dataset integration more modular to hopefully make it easier for others to integrate new datasets. For instance, they are planning on integrating the EIA 861 and the ISO/RTO LMP data next. Other future plans include simplifying metadata storage, using Docker to containerize the ETL process for better reproducibility, and setting up a "),o("a",{attrs:{href:"https://pangeo.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Pangeo"),o("OutboundLink")],1),e._v(" instance for live interactive data access without requiring anyone to download any data at all. The team would also like to build visualizations that sit on top of the database, making an interactive, regularly updated map of US coal plants and their operating costs, compared to new renewable energy in the same area. They would also like to visualize power plant operational attributes from EPA CEMS (e.g., ramp rates, min/max operating loads, relationship between load factor and heat rate, marginal additional fuel required for a startup event…).")]),e._v(" "),o("p",[e._v("Have you used PUDL? The team would love to hear feedback from users of the published data so that they can understand how to improve it, based on real user experiences. If you are integrating other US energy/electricity data of interest, please talk to the PUDL team about whether they might want to integrate it into PUDL to help ensure that it’s all more standardized and can be maintained long term. Also let them know what other datasets you would find useful (E.g. FERC EQR, FERC 714, PHMSA Pipelines, MSHA mines…). 
If you have questions, please ask them on GitHub ("),o("a",{attrs:{href:"https://github.com/catalyst-cooperative/pudl",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/catalyst-cooperative/pudl"),o("OutboundLink")],1),e._v(") so that the answers will be public for others to find as well.")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[52],{495:function(e,t){e.exports=""},621:function(e,t,a){"use strict";a.r(t);var o=a(29),r=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("This blog post describes a Frictionless Data Pilot with the Public Utility Data Liberation project. Pilot projects are part of the "),o("a",{attrs:{href:"https://frictionlessdata.io/reproducible-research/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data for Reproducible Research project"),o("OutboundLink")],1),e._v(". Written by Zane Selvans, Christina Gosnell, and Lilly Winfree.")]),e._v(" "),o("p",[e._v("The Public Utility Data Liberation project, "),o("a",{attrs:{href:"https://catalyst.coop/pudl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("PUDL"),o("OutboundLink")],1),e._v(", aims to make US energy data easier to access and use. Much of this data, including information about the cost of electricity, how much fuel is being burned, powerplant usage, and emissions, is not well documented or is in difficult to use formats. Last year, PUDL joined forces with the Frictionless Data for Reproducible Research team as a Pilot project to release this public utility data. 
PUDL takes the original spreadsheets, CSV files, and databases and turns them into unified Frictionless [tabular data packages("),o("a",{attrs:{href:"https://frictionlessdata.io/docs/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://frictionlessdata.io/docs/tabular-data-package/"),o("OutboundLink")],1),e._v(")] that can be used to populate a database, or read in directly with Python, R, Microsoft Access, and many other tools.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(495),alt:"Catalyst Logo"}})]),e._v(" "),o("h2",{attrs:{id:"what-is-pudl"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-is-pudl"}},[e._v("#")]),e._v(" What is PUDL?")]),e._v(" "),o("p",[e._v("The PUDL project, which is coordinated by "),o("a",{attrs:{href:"https://catalyst.coop/pudl/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Catalyst Cooperative"),o("OutboundLink")],1),e._v(", is focused on creating an energy utility data product that can serve a wide range of users. PUDL was inspired to make this data more accessible because the current US utility data ecosystem is fragmented, and commercial products are expensive. There are hundreds of gigabytes of information available from government agencies, but they are often difficult to work with, and different sources can be hard to combine.")]),e._v(" "),o("p",[e._v("PUDL users include researchers, activists, journalists, and policy makers. They have a wide range of technical backgrounds, from grassroots organizers who might only feel comfortable with spreadsheets, to PhDs with cloud computing resources, so it was important to provide data that would work for all users.")]),e._v(" "),o("p",[e._v("Before PUDL, much of this data was freely available to download from various sources, but it was typically messy and not well documented. This led to a lack of uniformity and reproducibility amongst projects that were using this data. 
The users were scraping the data together in their own way, making it hard to compare analyses or understand outcomes. Therefore, one of the goals for PUDL was to minimize these duplicated efforts, and enable the creation of lasting, cumulative outputs.")]),e._v(" "),o("h2",{attrs:{id:"what-were-the-main-pilot-goals"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-were-the-main-pilot-goals"}},[e._v("#")]),e._v(" What were the main Pilot goals?")]),e._v(" "),o("p",[e._v("The main focus of this Pilot was to create a way to openly share the utility data in a reproducible way that would be understandable to PUDL’s many potential users. The first change Catalyst identified they wanted to make during the Pilot was with their data storage medium. PUDL was previously creating a PostgreSQL database as the main data output. However, many users, even those with technical experience, found setting up the separate database software a major hurdle that prevented them from accessing and using the processed data. They also desired a static, archivable, platform-independent format. Therefore, Catalyst decided to transition PUDL away from PostgreSQL, and instead try Frictionless Tabular Data Packages. They also wanted a way to share the processed data without needing to commit to long-term maintenance and curation, meaning they needed the outputs to continue being useful to users even if they only had minimal resources to dedicate to the maintenance and updates. The team decided to package their data into Tabular Data Packages and identified Zenodo as a good option for openly hosting that packaged data.")]),e._v(" "),o("p",[e._v("Catalyst also recognized that most users only want to download the outputs and use them directly, and did not care about reproducing the data processing pipeline themselves, but it was still important to provide the processing pipeline code publicly to support transparency and reproducibility. 
Therefore, in this Pilot, they focused on transitioning their existing ETL pipeline from outputting a PostgreSQL database, that was defined using SQLAlchemy, to outputting datapackages which could then be archived publicly on Zenodo. Importantly, they needed this pipeline to maintain the metadata, information about data type, and database structural information that had already been accumulated. This rich metadata needed to be stored alongside the data itself, so future users could understand where the data came from and understand its meaning. The Catalyst team used Tabular Data Packages to record and store this metadata (see the code here: "),o("a",{attrs:{href:"https://github.com/catalyst-cooperative/pudl/blob/master/src/pudl/load/metadata.py",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/catalyst-cooperative/pudl/blob/master/src/pudl/load/metadata.py"),o("OutboundLink")],1),e._v(").")]),e._v(" "),o("p",[e._v("Another complicating factor is that many of the PUDL datasets are fairly entangled with each other. The PUDL team ideally wanted users to be able to pick and choose which datasets they actually wanted to download and use without requiring them to download it all (currently about 100GB of data when uncompressed). However, they were worried that if single datasets were downloaded, the users might miss that some of the datasets were meant to be used together. 
So, the PUDL team created information, which they call “glue”, that shows which datasets are linked together and that should ideally be used in tandem.")]),e._v(" "),o("p",[e._v("The culmination of this Pilot was a release of the PUDL data (access it here – "),o("a",{attrs:{href:"https://zenodo.org/record/3672068",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://zenodo.org/record/3672068"),o("OutboundLink")],1),e._v(" and read the corresponding documentation here – "),o("a",{attrs:{href:"https://catalystcoop-pudl.readthedocs.io/en/v0.3.2/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://catalystcoop-pudl.readthedocs.io/en/v0.3.2/"),o("OutboundLink")],1),e._v("), which includes integrated data from the EIA Form 860, EIA Form 923, the EPA Continuous Emissions Monitoring System (CEMS), the EPA Integrated Planning Model (IPM), and FERC Form 1.")]),e._v(" "),o("h2",{attrs:{id:"what-problems-were-encountered-during-this-pilot"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#what-problems-were-encountered-during-this-pilot"}},[e._v("#")]),e._v(" What problems were encountered during this Pilot?")]),e._v(" "),o("p",[e._v("One issue that the group encountered during the Pilot was that the data types available in Postgres are substantially richer than those natively available in the Tabular Data Package standard. However, this issue is an endemic problem of wanting to work with several different platforms, and so the team compromised and worked with the least common denominator. In the future, PUDL might store several different sets of data types for use in different contexts, for example, one for freezing the data out into data packages, one for SQLite, and one for Pandas.")]),e._v(" "),o("p",[e._v("Another problem encountered during the Pilot resulted from testing the limits of the draft Tabular Data Package specifications. 
There were aspects of the specifications that the Catalyst team assumed were fully implemented in the reference (Python) implementation of the Frictionless toolset, but were in fact still works in progress. This work led the Frictionless team to start a documentation improvement project, including a revision of the specifications website to incorporate this feedback.")]),e._v(" "),o("p",[e._v("Through the pilot, the teams worked to implement new Frictionless features, including the specification of composite primary keys and foreign key references that point to external data packages. Other new Frictionless functionality that was created with this Pilot included partitioning of large resources into resource groups in which all resources use identical table schemas, and adding gzip compression of resources. The Pilot also focused on implementing more complete validation through goodtables, including bytes/hash checks, foreign keys checks, and primary keys checks, though there is still more work to be done here.")]),e._v(" "),o("h2",{attrs:{id:"future-directions"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#future-directions"}},[e._v("#")]),e._v(" Future Directions")]),e._v(" "),o("p",[e._v("A common problem with using publicly available energy data is that the federal agencies creating the data do not use version control or maintain change logs for the data they publish, but they do frequently go back years after the fact to revise or alter previously published data — with no notification. To combat this problem, Catalyst is using data packages to encapsulate the raw inputs to the ETL process. They are setting up a process which will periodically check to see if the federal agencies’ posted data has been updated or changed, create an archive, and upload it to Zenodo. They will also store metadata in non-tabular data packages, indicating which information is stored in each file (year, state, month, etc.) 
so that there can be a uniform process of querying those raw input data packages. This will mean the raw inputs won’t have to be archived alongside every data release. Instead one can simply refer to these other versioned archives of the inputs. Catalyst hopes these version controlled raw archives will also be useful to other researchers.")]),e._v(" "),o("p",[e._v("Another next step for Catalyst will be to make the ETL and new dataset integration more modular to hopefully make it easier for others to integrate new datasets. For instance, they are planning on integrating the EIA 861 and the ISO/RTO LMP data next. Other future plans include simplifying metadata storage, using Docker to containerize the ETL process for better reproducibility, and setting up a "),o("a",{attrs:{href:"https://pangeo.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Pangeo"),o("OutboundLink")],1),e._v(" instance for live interactive data access without requiring anyone to download any data at all. The team would also like to build visualizations that sit on top of the database, making an interactive, regularly updated map of US coal plants and their operating costs, compared to new renewable energy in the same area. They would also like to visualize power plant operational attributes from EPA CEMS (e.g., ramp rates, min/max operating loads, relationship between load factor and heat rate, marginal additional fuel required for a startup event…).")]),e._v(" "),o("p",[e._v("Have you used PUDL? The team would love to hear feedback from users of the published data so that they can understand how to improve it, based on real user experiences. If you are integrating other US energy/electricity data of interest, please talk to the PUDL team about whether they might want to integrate it into PUDL to help ensure that it’s all more standardized and can be maintained long term. Also let them know what other datasets you would find useful (E.g. FERC EQR, FERC 714, PHMSA Pipelines, MSHA mines…). 
If you have questions, please ask them on GitHub ("),o("a",{attrs:{href:"https://github.com/catalyst-cooperative/pudl",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/catalyst-cooperative/pudl"),o("OutboundLink")],1),e._v(") so that the answers will be public for others to find as well.")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/53.f8912803.js b/assets/js/53.202bb7e3.js similarity index 84% rename from assets/js/53.f8912803.js rename to assets/js/53.202bb7e3.js index 1c5d1fd96..f2fc47880 100644 --- a/assets/js/53.f8912803.js +++ b/assets/js/53.202bb7e3.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[53],{503:function(t,e,o){t.exports=o.p+"assets/img/community.33c3b55f.jpeg"},628:function(t,e,o){"use strict";o.r(e);var n=o(29),r=Object(n.a)({},(function(){var t=this,e=t.$createElement,n=t._self._c||e;return n("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[n("p",[t._v("We are hosting another round of our virtual community hangout to share recent developments in the Frictionless Data community and it’s also an avenue to connect with other community members. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.")]),t._v(" "),n("p",[n("img",{attrs:{src:o(503),alt:"Photo by Perry Grone on Unsplash"}})]),t._v(" "),n("p",[t._v("The hangout is scheduled to hold on "),n("strong",[t._v("21st May 2020 at 5 pm BST")]),t._v(". 
If you would like to attend the hangout, "),n("a",{attrs:{href:"https://us02web.zoom.us/meeting/register/tZMsf-qrrjopHtGZwMyM7tCmp_YyPlNms6wK",target:"_blank",rel:"noopener noreferrer"}},[t._v("you can sign up for the event here"),n("OutboundLink")],1)]),t._v(" "),n("p",[t._v("Looking forward to seeing you there!")])])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[53],{506:function(t,e,o){t.exports=o.p+"assets/img/community.33c3b55f.jpeg"},629:function(t,e,o){"use strict";o.r(e);var n=o(29),r=Object(n.a)({},(function(){var t=this,e=t.$createElement,n=t._self._c||e;return n("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[n("p",[t._v("We are hosting another round of our virtual community hangout to share recent developments in the Frictionless Data community and it’s also an avenue to connect with other community members. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.")]),t._v(" "),n("p",[n("img",{attrs:{src:o(506),alt:"Photo by Perry Grone on Unsplash"}})]),t._v(" "),n("p",[t._v("The hangout is scheduled to hold on "),n("strong",[t._v("21st May 2020 at 5 pm BST")]),t._v(". 
If you would like to attend the hangout, "),n("a",{attrs:{href:"https://us02web.zoom.us/meeting/register/tZMsf-qrrjopHtGZwMyM7tCmp_YyPlNms6wK",target:"_blank",rel:"noopener noreferrer"}},[t._v("you can sign up for the event here"),n("OutboundLink")],1)]),t._v(" "),n("p",[t._v("Looking forward to seeing you there!")])])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/54.2b15fed4.js b/assets/js/54.9e9cbb98.js similarity index 98% rename from assets/js/54.2b15fed4.js rename to assets/js/54.9e9cbb98.js index 2b77c10b6..b4a93296f 100644 --- a/assets/js/54.2b15fed4.js +++ b/assets/js/54.9e9cbb98.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[54],{516:function(e,a,t){e.exports=t.p+"assets/img/fellows-img-1.967b02cd.png"},691:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,r=e._self._c||a;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("With the Frictionless Data Reproducible Research Fellows Programme, supported by the Sloan Foundation and Open Knowledge Foundation, we are recruiting and training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science.")]),e._v(" "),r("p",[e._v("As part of their training, we asked the 3rd cohort of Frictionless Fellows to package their research data in Frictionless "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Packages"),r("OutboundLink")],1),e._v(". 
Here’s what they reported on their experience:")]),e._v(" "),r("h2",{attrs:{id:"victoria"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#victoria"}},[e._v("#")]),e._v(" Victoria")]),e._v(" "),r("p",[e._v("Constantly under the impression that I’m six months behind on lab work, I am capital Q - Queen - of bad data practices. My computer is a graveyard of poorly labeled .csv files, featuring illustrative headers such as “redo,” “negative pressure why?” and “weird - see notes.” I was vaguely aware of the existence of data packages, but like learning Italian or traveling more, implementing them in my workflow got slotted in the category of “would be nice if I had the time.” That clemency, however, was not extended to my research lifeblood - molecular spectroscopy databases, you disorganised beauties you - nor to collaborators who often invoked the following feeling:")]),e._v(" "),r("p",[r("img",{attrs:{src:t(516),alt:"fellows-img-1"}})]),e._v(" "),r("p",[e._v("Particularly in fields where measurables aren’t tangible macro concepts (see: population) but abstract and insular conventions with many varied representations, clear descriptors of multivariate data are a must in order for that data to be easily used and reproduced. This is where data packages come in; they bundle up your data with a human and machine readable file containing, at minimum, standardised information regarding structure and contents. 
In this lil’ post here, we’re going to walk through this process together by packaging data together with its metadata, and then validating the data using Frictionless tools.")]),e._v(" "),r("p",[e._v("Keep on reading about Victoria’s experience packaging data in her blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/victoria-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"lindsay"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#lindsay"}},[e._v("#")]),e._v(" Lindsay")]),e._v(" "),r("p",[e._v("The first tenet of the American Library Association’s Bill of Rights states: “Books and other library resources should be provided for the interest, information, and enlightenment of all people of the community the library serves” (American Library Association). Libraries are supposed to be for everyone. Unfortunately, like many other institutions, libraries were founded upon outdated and racist patriarchal heteronormative ideals that ostracise users from marginalized backgrounds. Most academic libraries in the United States use the Library of Congress Classification System to organize books, a system that inadvertently centers christian, heterosexual white males. Critical librarianship, or critical cataloging is “a movement of library workers dedicated to bringing social justice principles into our work in libraries” "),r("a",{attrs:{href:"http://critlib.org/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("critlib"),r("OutboundLink")],1),e._v(". 
I would like to use data science principles to explore bias in library MARC (machine readable catalog) records.")]),e._v(" "),r("p",[e._v("Read Lindsay’s Data Package blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lindsay-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"zarena"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#zarena"}},[e._v("#")]),e._v(" Zarena")]),e._v(" "),r("p",[e._v("As a social science researcher studying the research landscape in Central Asian countries, I decided to share a part of my dataset with key bibliometric information about the journal articles published by Kyrgyzstani authors between 1991-2021. The data I am going to share comes from the "),r("a",{attrs:{href:"https://www.lens.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Lens"),r("OutboundLink")],1),e._v(" platform. To ensure the data quality, and to comply with the "),r("a",{attrs:{href:"https://howtofair.dk/what-is-fair/#fair-principles",target:"_blank",rel:"noopener noreferrer"}},[e._v("FAIR principles"),r("OutboundLink")],1),e._v(", before sharing my data, I created a data package that consists of the cleaned raw data, "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/#metadata-properties",target:"_blank",rel:"noopener noreferrer"}},[e._v("metadata"),r("OutboundLink")],1),e._v(", and "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#language",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("I tested two methods to create such a package. First, I tried to use the "),r("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data package programming libraries"),r("OutboundLink")],1),e._v(". 
This method lets you do more than just create a data package (e.g., describe, extract, transform, and validate your data). But I found the programming libraries a bit complicated. So, I ended up using the second method, that is the browser tool "),r("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Package Creator"),r("OutboundLink")],1),e._v(". It lets you create a data package without any technical knowledge. The tool is comparatively simple and easy to navigate. It allows you to clean your dataset, change datatypes, provide a short description of your data as well as to add and edit associated metadata…")]),e._v(" "),r("p",[e._v("Keep on reading about how Zarena packaged her data in her blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/zarena-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"kevin"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#kevin"}},[e._v("#")]),e._v(" Kevin")]),e._v(" "),r("p",[e._v("My research aims at understanding the transmission mechanisms of neglected vector-borne diseases. I mostly deal with data on the distribution and diversity of vectors of diseases and their infection status. The metadata would include but not be limited to the date of sample collection, location and GPS coordinates of the sites of sample collection, type of sample (blood or fly sample), the concentration of RNA or DNA extracted from the samples, and the infection status of the samples (whether the samples are infected with pathogens or not) as well as the blood meal sources of the insect vectors. All these datasets are supposed to be presented in a way that they can be understood by whoever accesses them and that information regarding the licensing and other attribution information can easily be accessed. 
One way to reduce friction when dealing with such huge datasets is to put them in a container that groups all the descriptive data and schema together. A schema tells us how the data is structured and the type of content that is expected in it. All this is contained in a data package that can be generated by a data package creator.")]),e._v(" "),r("p",[e._v("I am going to take you through a step by step process on how I created a data package for my dataset on sandflies diversity, infection status, and their blood-meal sources, using Frictionless Data Package Creator…")]),e._v(" "),r("p",[e._v("Read Kevin’s blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/kk-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(" to know more about how he created data packages for his data.")]),e._v(" "),r("h2",{attrs:{id:"guo-qiang"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#guo-qiang"}},[e._v("#")]),e._v(" Guo Qiang")]),e._v(" "),r("p",[e._v("The dataset I am going to package is from a project which we have recently completed – “"),r("a",{attrs:{href:"https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1003731",target:"_blank",rel:"noopener noreferrer"}},[e._v("Menopausal hormone therapy and women’s health: An umbrella review"),r("OutboundLink")],1),e._v("” which summarizes the clinical evidence on various health effects of menopausal hormone therapy in menopausal women. The full datasets are publicly available in the "),r("a",{attrs:{href:"https://osf.io/dsy37/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Science Framework"),r("OutboundLink")],1),e._v(". 
I am going to use one of the datasets –All-Cause Mortality.xlsx, which summarizes all the clinical trials published until 2017 investigating the effect of menopausal hormone therapy on all-cause mortality in menopausal women – to illustrate the process of creating a Data Package.")]),e._v(" "),r("p",[e._v("As the Data Package Creator currently accepts only .csv format, first I need to convert All-Cause Mortality.xlsx to .csv format…")]),e._v(" "),r("p",[e._v("Keep on reading about Guo Qiang’s experience of packaging is data in his blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/guo-qiang-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"melvin"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#melvin"}},[e._v("#")]),e._v(" Melvin")]),e._v(" "),r("p",[e._v("Being a soil science student, I felt using soil data would be useful for me to better understand this process of packaging data for future use. I got data on the impact of fertiliser recommendations on yield and felt it would be great to use it. However, this wasn’t such a good idea as I got so many error messages and clean-ups to do to suit the tabular data accepted by the data package creator ("),r("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("create.frictionlessdata.io"),r("OutboundLink")],1),e._v("). Similarly in case you want to create a data package using someone else’s data it should either have a licence or ask to use the data.Afterwards, I got around to working with a different data set that was more straightforward and easy to work with.The data was on the infection prevalence of ‘Ca. 
Anaplasma camelii’ in camels and camel keds evaluated in different seasons within a year…")]),e._v(" "),r("p",[e._v("To read about the errors that Melvin got and what she learned from them, read her blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/melvin-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("hr"),e._v(" "),r("p",[e._v("You can read all the Frictionless Data Fellows’ blogs on the dedicated website: "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://fellows.frictionlessdata.io/"),r("OutboundLink")],1)])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[54],{514:function(e,a,t){e.exports=t.p+"assets/img/fellows-img-1.967b02cd.png"},688:function(e,a,t){"use strict";t.r(a);var r=t(29),o=Object(r.a)({},(function(){var e=this,a=e.$createElement,r=e._self._c||a;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("With the Frictionless Data Reproducible Research Fellows Programme, supported by the Sloan Foundation and Open Knowledge Foundation, we are recruiting and training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science.")]),e._v(" "),r("p",[e._v("As part of their training, we asked the 3rd cohort of Frictionless Fellows to package their research data in Frictionless "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Packages"),r("OutboundLink")],1),e._v(". 
Here’s what they reported on their experience:")]),e._v(" "),r("h2",{attrs:{id:"victoria"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#victoria"}},[e._v("#")]),e._v(" Victoria")]),e._v(" "),r("p",[e._v("Constantly under the impression that I’m six months behind on lab work, I am capital Q - Queen - of bad data practices. My computer is a graveyard of poorly labeled .csv files, featuring illustrative headers such as “redo,” “negative pressure why?” and “weird - see notes.” I was vaguely aware of the existence of data packages, but like learning Italian or traveling more, implementing them in my workflow got slotted in the category of “would be nice if I had the time.” That clemency, however, was not extended to my research lifeblood - molecular spectroscopy databases, you disorganised beauties you - nor to collaborators who often invoked the following feeling:")]),e._v(" "),r("p",[r("img",{attrs:{src:t(514),alt:"fellows-img-1"}})]),e._v(" "),r("p",[e._v("Particularly in fields where measurables aren’t tangible macro concepts (see: population) but abstract and insular conventions with many varied representations, clear descriptors of multivariate data are a must in order for that data to be easily used and reproduced. This is where data packages come in; they bundle up your data with a human and machine readable file containing, at minimum, standardised information regarding structure and contents. 
In this lil’ post here, we’re going to walk through this process together by packaging data together with its metadata, and then validating the data using Frictionless tools.")]),e._v(" "),r("p",[e._v("Keep on reading about Victoria’s experience packaging data in her blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/victoria-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"lindsay"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#lindsay"}},[e._v("#")]),e._v(" Lindsay")]),e._v(" "),r("p",[e._v("The first tenet of the American Library Association’s Bill of Rights states: “Books and other library resources should be provided for the interest, information, and enlightenment of all people of the community the library serves” (American Library Association). Libraries are supposed to be for everyone. Unfortunately, like many other institutions, libraries were founded upon outdated and racist patriarchal heteronormative ideals that ostracise users from marginalized backgrounds. Most academic libraries in the United States use the Library of Congress Classification System to organize books, a system that inadvertently centers christian, heterosexual white males. Critical librarianship, or critical cataloging is “a movement of library workers dedicated to bringing social justice principles into our work in libraries” "),r("a",{attrs:{href:"http://critlib.org/about/",target:"_blank",rel:"noopener noreferrer"}},[e._v("critlib"),r("OutboundLink")],1),e._v(". 
I would like to use data science principles to explore bias in library MARC (machine readable catalog) records.")]),e._v(" "),r("p",[e._v("Read Lindsay’s Data Package blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/lindsay-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"zarena"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#zarena"}},[e._v("#")]),e._v(" Zarena")]),e._v(" "),r("p",[e._v("As a social science researcher studying the research landscape in Central Asian countries, I decided to share a part of my dataset with key bibliometric information about the journal articles published by Kyrgyzstani authors between 1991-2021. The data I am going to share comes from the "),r("a",{attrs:{href:"https://www.lens.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Lens"),r("OutboundLink")],1),e._v(" platform. To ensure the data quality, and to comply with the "),r("a",{attrs:{href:"https://howtofair.dk/what-is-fair/#fair-principles",target:"_blank",rel:"noopener noreferrer"}},[e._v("FAIR principles"),r("OutboundLink")],1),e._v(", before sharing my data, I created a data package that consists of the cleaned raw data, "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/#metadata-properties",target:"_blank",rel:"noopener noreferrer"}},[e._v("metadata"),r("OutboundLink")],1),e._v(", and "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#language",target:"_blank",rel:"noopener noreferrer"}},[e._v("schema"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("p",[e._v("I tested two methods to create such a package. First, I tried to use the "),r("a",{attrs:{href:"https://framework.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data package programming libraries"),r("OutboundLink")],1),e._v(". 
This method lets you do more than just to create a data package (e.g., describe, extract, transform, and validate your data). But I found the programming libraries a bit complicated. So, I ended up using the second method, that is the browser tool "),r("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Package Creator"),r("OutboundLink")],1),e._v(". It lets you create a data package without ay technical knowledge. The tool is comparatively simple and easy to navigate. It allows you to clean your dataset, change datatypes, provide a short description to your data as well as to add and edit associated metadata…")]),e._v(" "),r("p",[e._v("Keep on reading about how Zarena packaged here data in her blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/zarena-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"kevin"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#kevin"}},[e._v("#")]),e._v(" Kevin")]),e._v(" "),r("p",[e._v("My research aims at understanding the transmission mechanisms of neglected vector-borne diseases. I mostly deal with data on the distribution and diversity of vectors of diseases and their infection status. The metadata would include but not be limited to the date of sample collection, location and GPS coordinates of the sites of sample collection, type of sample (blood or fly sample), the concentration of RNA or DNA extracted from the samples, and the infection status of the samples (whether the samples are infected with pathogens or not) as well as the blood meal sources of the insect vectors. All these datasets are supposed to be presented in a way that it can be understood by whoever accesses it and that information regarding the licensing and other attribution information can easily be accessed. 
One way to reduce friction when dealing with such huge datasets is to put them in a container that groups all the descriptive data and schema together. A schema tells us how the data is structured and the type of content that is expected in it. All this is contained in a data package that can be generated by a data package creator.")]),e._v(" "),r("p",[e._v("I am going to take you through a step by step process on how I created a data package for my dataset on sandflies diversity, infection status, and their blood-meal sources, using Frictionless Data Package Creator…")]),e._v(" "),r("p",[e._v("Read Kevin’s blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/kk-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(" to know more about how he created data packages for his data.")]),e._v(" "),r("h2",{attrs:{id:"guo-qiang"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#guo-qiang"}},[e._v("#")]),e._v(" Guo Qiang")]),e._v(" "),r("p",[e._v("The dataset I am going to package is from a project which we have recently completed – “"),r("a",{attrs:{href:"https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1003731",target:"_blank",rel:"noopener noreferrer"}},[e._v("Menopausal hormone therapy and women’s health: An umbrella review"),r("OutboundLink")],1),e._v("” which summarizes the clinical evidence on various health effects of menopausal hormone therapy in menopausal women. The full datasets are publicly available in the "),r("a",{attrs:{href:"https://osf.io/dsy37/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Science Framework"),r("OutboundLink")],1),e._v(". 
I am going to use one of the datasets –All-Cause Mortality.xlsx, which summarizes all the clinical trials published until 2017 investigating the effect of menopausal hormone therapy on all-cause mortality in menopausal women – to illustrate the process of creating a Data Package.")]),e._v(" "),r("p",[e._v("As the Data Package Creator currently accepts only .csv format, first I need to convert All-Cause Mortality.xlsx to .csv format…")]),e._v(" "),r("p",[e._v("Keep on reading about Guo Qiang’s experience of packaging is data in his blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/guo-qiang-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("h2",{attrs:{id:"melvin"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#melvin"}},[e._v("#")]),e._v(" Melvin")]),e._v(" "),r("p",[e._v("Being a soil science student, I felt using soil data would be useful for me to better understand this process of packaging data for future use. I got data on the impact of fertiliser recommendations on yield and felt it would be great to use it. However, this wasn’t such a good idea as I got so many error messages and clean-ups to do to suit the tabular data accepted by the data package creator ("),r("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("create.frictionlessdata.io"),r("OutboundLink")],1),e._v("). Similarly in case you want to create a data package using someone else’s data it should either have a licence or ask to use the data.Afterwards, I got around to working with a different data set that was more straightforward and easy to work with.The data was on the infection prevalence of ‘Ca. 
Anaplasma camelii’ in camels and camel keds evaluated in different seasons within a year…")]),e._v(" "),r("p",[e._v("To read about the errors that Melvin got and what she learned from them, read her blog "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/blog/melvin-datapackage-blog/",target:"_blank",rel:"noopener noreferrer"}},[e._v("here"),r("OutboundLink")],1),e._v(".")]),e._v(" "),r("hr"),e._v(" "),r("p",[e._v("You can read all the Frictionless Data Fellows’ blogs on the dedicated website: "),r("a",{attrs:{href:"https://fellows.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://fellows.frictionlessdata.io/"),r("OutboundLink")],1)])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/58.389b1eba.js b/assets/js/58.2b3bd3bf.js similarity index 99% rename from assets/js/58.389b1eba.js rename to assets/js/58.2b3bd3bf.js index e068d6539..8a287c6c6 100644 --- a/assets/js/58.389b1eba.js +++ b/assets/js/58.2b3bd3bf.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[58],{555:function(a,t,e){"use strict";e.r(t);var r=e(29),s=Object(r.a)({},(function(){var a=this,t=a.$createElement,e=a._self._c||t;return e("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[e("p",[a._v("FAQs and best practice patterns for publishing data packages.")]),a._v(" "),e("p",[a._v("Complete specifications are available at "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[a._v("specs/data-package"),e("OutboundLink")],1),a._v(".")]),a._v(" "),e("h2",{attrs:{id:"data-package-name"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#data-package-name"}},[a._v("#")]),a._v(" Data Package Name")]),a._v(" "),e("p",[a._v("The Data Package name is used in the "),e("code",[a._v("name")]),a._v(" field of the "),e("code",[a._v("datapackage.json")]),a._v(".")]),a._v(" "),e("p",[e("em",[a._v("This name is also frequently used 
for the folder/directory in which the Data Package is stored.")])]),a._v(" "),e("p",[a._v("As per the Data Package spec The name SHOULD be:")]),a._v(" "),e("ul",[e("li",[a._v("lower-case")]),a._v(" "),e("li",[a._v("use ‘-’ for word separators")]),a._v(" "),e("li",[a._v("reasonably concise (3-4 words)")])]),a._v(" "),e("p",[e("strong",[a._v("Naming conventions")])]),a._v(" "),e("p",[a._v("For country specific datasets:")]),a._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[a._v("{topic} # e.g. gdp\n{topic}-{2-digit-iso} # e.g. gdp-us\n")])])]),e("p",[a._v("For time series data:")]),a._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[a._v("[...-]year\n[...-]quarter\n[...-]month\n[...-]day\n")])])]),e("hr"),a._v(" "),e("h2",{attrs:{id:"resource-and-file-names"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#resource-and-file-names"}},[a._v("#")]),a._v(" Resource and File Names")]),a._v(" "),e("p",[a._v("Similar to Data Package Names:")]),a._v(" "),e("ul",[e("li",[a._v("lower-case")]),a._v(" "),e("li",[a._v("use ‘-’ for word separators")])]),a._v(" "),e("p",[a._v("Resource names SHOULD, usually, be the same as the name of the associated file on disk but without the file extension. 
e.g.")]),a._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[a._v("gdp-quarterly # resource name\ngdp-quarterly.csv # on disk\n")])])]),e("p",[a._v("Naming conventions of files follow that for data packages in terms of country or time series facets.")]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"descriptor-datapackage-json"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#descriptor-datapackage-json"}},[a._v("#")]),a._v(" Descriptor "),e("code",[a._v("datapackage.json")])]),a._v(" "),e("h3",{attrs:{id:"alignment"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#alignment"}},[a._v("#")]),a._v(" Alignment")]),a._v(" "),e("p",[a._v("With JSON, data is structured in a nested way through curly and squared brackets. Though the alignment of these structures is not relevant for computer programs, it makes it easier for the human reader if they are properly aligned.")]),a._v(" "),e("p",[a._v("Good alignment:")]),a._v(" "),e("div",{staticClass:"language-json extra-class"},[e("pre",{pre:!0,attrs:{class:"language-json"}},[e("code",[e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"corruption-perceptions-index"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"title"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"Corruption Perceptions Index (CPI)"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"sources"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[a._v("[")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"Transparency International"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"web"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"http://www.transparency.org/research/cpi/overview"')]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n...\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n")])])]),e("p",[a._v("Bad alignment:")]),a._v(" "),e("div",{staticClass:"language-json extra-class"},[e("pre",{pre:!0,attrs:{class:"language-json"}},[e("code",[e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"corruption-perceptions-index"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"title"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"Corruption Perceptions Index (CPI)"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"sources"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[a._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"Transparency International"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"web"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"http://www.transparency.org/research/cpi/overview"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n...\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n")])])]),e("p",[a._v("Please make sure to have your "),e("code",[a._v("datapackage.json")]),a._v(" well structured to ease the understanding of your Data Package content. 
The "),e("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Online DataPackage.json Creator"),e("OutboundLink")],1),a._v(" can help you create the general structure.")]),a._v(" "),e("h3",{attrs:{id:"contributors-fields"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#contributors-fields"}},[a._v("#")]),a._v(" Contributors fields")]),a._v(" "),e("p",[a._v("Add the ‘contributors’ field (original author of the package - see "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[a._v("specs/data-package"),e("OutboundLink")],1),a._v(" if you wish to keep the credits for the package.")]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"data-package-folder-names-and-structure"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#data-package-folder-names-and-structure"}},[a._v("#")]),a._v(" Data Package Folder Names and Structure")]),a._v(" "),e("p",[a._v("It is standard practice to use the Data Package name (from the "),e("code",[a._v("datapackage.json")]),a._v(") for the name of the folder/directory in which the Data Package is kept.")]),a._v(" "),e("p",[a._v("If storing in e.g. 
git(hub) this would also be the the name of the repository.")]),a._v(" "),e("p",[a._v("If you include scripts allowing to automate the data extraction process, these should be stored in a "),e("code",[a._v("script")]),a._v(" folder/directory.")]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"readme"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#readme"}},[a._v("#")]),a._v(" README")]),a._v(" "),e("p",[a._v("A README is a text file giving (human-readable) information about your dataset.")]),a._v(" "),e("p",[a._v("Data Packages SHOULD have a README.")]),a._v(" "),e("h3",{attrs:{id:"formatting"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#formatting"}},[a._v("#")]),a._v(" Formatting")]),a._v(" "),e("p",[a._v("The README SHOULD be a plain text file (no word or rich text etc) and SHOULD use markdown to allow for formatting")]),a._v(" "),e("h3",{attrs:{id:"file-name"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#file-name"}},[a._v("#")]),a._v(" File Name")]),a._v(" "),e("p",[a._v("If markdown is used the file SHOULD be named "),e("code",[a._v("README.md")]),a._v(" and otherwise SHOULD be named "),e("code",[a._v("README.txt")]),a._v(".")]),a._v(" "),e("h3",{attrs:{id:"sections"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#sections"}},[a._v("#")]),a._v(" Sections")]),a._v(" "),e("p",[a._v("You can include anything you like in your README. It is standard practice to include some (if possible all) of the following sections: "),e("strong",[a._v("Introduction, Data, Preparation, License")]),a._v(".")]),a._v(" "),e("p",[a._v("We SHOULD NOT include the title of the Data Package at the top of the README.")]),a._v(" "),e("p",[a._v("Each section other than the introduction should be headed with its name using level 2 heading in markdown e.g. 
for the data section you would have the following markdown in your README:")]),a._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[a._v("## Data\n")])])]),e("h4",{attrs:{id:"introduction"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[a._v("#")]),a._v(" Introduction")]),a._v(" "),e("p",[a._v("Start with a short description of the dataset (the first sentence and first paragraph should be extractable to provide short standalone descriptions).")]),a._v(" "),e("p",[a._v("Unlike other sections "),e("strong",[a._v("this section SHOULD NOT have a heading")]),a._v(" as it starts the README. (i.e. you do not need the heading "),e("code",[a._v("## Introduction")])]),a._v(" "),e("h4",{attrs:{id:"data"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#data"}},[a._v("#")]),a._v(" Data")]),a._v(" "),e("p",[a._v("Put specific information about the data in a Data section. This can be things like information about the source of the data, the specific structure of the data, missing values etc.")]),a._v(" "),e("h4",{attrs:{id:"preparation"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#preparation"}},[a._v("#")]),a._v(" Preparation")]),a._v(" "),e("p",[a._v("Put information on preparing the data in a Preparation section. 
In particular, any instructions about how to run any preparation and processing scripts to generate the data should go here.")]),a._v(" "),e("h4",{attrs:{id:"license"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#license"}},[a._v("#")]),a._v(" License")]),a._v(" "),e("p",[a._v("Put additional information on the permissions and licensing of the data in the Data Package in the License section.")]),a._v(" "),e("p",[a._v("Since licensing information is often not clear from the data producers, the guideline here is to license the Data Package under the Public Domain Dedication and License, and then to add any relevant information or disclaimers regarding the source data.")]),a._v(" "),e("p",[a._v("See, for example:")]),a._v(" "),e("ul",[e("li",[e("a",{attrs:{href:"http://datahub.io/core/corruption-perceptions-index#readme",target:"_blank",rel:"noopener noreferrer"}},[a._v("http://datahub.io/core/corruption-perceptions-index#readme"),e("OutboundLink")],1)]),a._v(" "),e("li",[e("a",{attrs:{href:"http://datahub.io/core/geo-nuts-administrative-boundaries#readme",target:"_blank",rel:"noopener noreferrer"}},[a._v("http://datahub.io/core/geo-nuts-administrative-boundaries#readme"),e("OutboundLink")],1)])]),a._v(" "),e("p",[a._v("See also the following thread "),e("a",{attrs:{href:"https://discuss.okfn.org/t/copyright-on-data-sources/189",target:"_blank",rel:"noopener noreferrer"}},[a._v("https://discuss.okfn.org/t/copyright-on-data-sources/189"),e("OutboundLink")],1)]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"validate-and-preview-your-data-package"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#validate-and-preview-your-data-package"}},[a._v("#")]),a._v(" Validate and preview your Data Package")]),a._v(" "),e("p",[a._v("Use the "),e("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Creator"),e("OutboundLink")],1),a._v(" to check that your "),e("code",[a._v("datapackage.json")]),a._v(" and Data 
Package are good to go. Simply drop the URL to your "),e("code",[a._v("datapackage.json")]),a._v(" file in the input box, or upload from a local source, and press "),e("code",[a._v("Validate")]),a._v(". If everything is fine, "),e("code",[a._v("Status: Valid")]),a._v(" is returned.")]),a._v(" "),e("p",[a._v("Then use the "),e("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[a._v("Online Data Package viewer app"),e("OutboundLink")],1),a._v(" to have a preview of your Data Package.")]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"examples"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#examples"}},[a._v("#")]),a._v(" Examples")]),a._v(" "),e("p",[a._v("For examples of well-structured Data Package see:")]),a._v(" "),e("ul",[e("li",[a._v("For tabular data: "),e("a",{attrs:{href:"http://datahub.io/core/corruption-perceptions-index",target:"_blank",rel:"noopener noreferrer"}},[a._v("http://datahub.io/core/corruption-perceptions-index"),e("OutboundLink")],1)]),a._v(" "),e("li",[a._v("For geospatial data: "),e("a",{attrs:{href:"http://datahub.io/core/geo-nuts-administrative-boundaries",target:"_blank",rel:"noopener noreferrer"}},[a._v("http://datahub.io/core/geo-nuts-administrative-boundaries"),e("OutboundLink")],1)])]),a._v(" "),e("p",[a._v("Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our new and comprehensive "),e("a",{attrs:{href:"/tag/field-guide"}},[a._v("Frictionless Data Field Guide")]),a._v(".")])])}),[],!1,null,null,null);t.default=s.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[58],{556:function(a,t,e){"use strict";e.r(t);var r=e(29),s=Object(r.a)({},(function(){var a=this,t=a.$createElement,e=a._self._c||t;return e("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[e("p",[a._v("FAQs and best practice patterns for publishing data packages.")]),a._v(" "),e("p",[a._v("Complete 
specifications are available at "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[a._v("specs/data-package"),e("OutboundLink")],1),a._v(".")]),a._v(" "),e("h2",{attrs:{id:"data-package-name"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#data-package-name"}},[a._v("#")]),a._v(" Data Package Name")]),a._v(" "),e("p",[a._v("The Data Package name is used in the "),e("code",[a._v("name")]),a._v(" field of the "),e("code",[a._v("datapackage.json")]),a._v(".")]),a._v(" "),e("p",[e("em",[a._v("This name is also frequently used for the folder/directory in which the Data Package is stored.")])]),a._v(" "),e("p",[a._v("As per the Data Package spec The name SHOULD be:")]),a._v(" "),e("ul",[e("li",[a._v("lower-case")]),a._v(" "),e("li",[a._v("use ‘-’ for word separators")]),a._v(" "),e("li",[a._v("reasonably concise (3-4 words)")])]),a._v(" "),e("p",[e("strong",[a._v("Naming conventions")])]),a._v(" "),e("p",[a._v("For country specific datasets:")]),a._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[a._v("{topic} # e.g. gdp\n{topic}-{2-digit-iso} # e.g. gdp-us\n")])])]),e("p",[a._v("For time series data:")]),a._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[a._v("[...-]year\n[...-]quarter\n[...-]month\n[...-]day\n")])])]),e("hr"),a._v(" "),e("h2",{attrs:{id:"resource-and-file-names"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#resource-and-file-names"}},[a._v("#")]),a._v(" Resource and File Names")]),a._v(" "),e("p",[a._v("Similar to Data Package Names:")]),a._v(" "),e("ul",[e("li",[a._v("lower-case")]),a._v(" "),e("li",[a._v("use ‘-’ for word separators")])]),a._v(" "),e("p",[a._v("Resource names SHOULD, usually, be the same as the name of the associated file on disk but without the file extension. 
e.g.")]),a._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[a._v("gdp-quarterly # resource name\ngdp-quarterly.csv # on disk\n")])])]),e("p",[a._v("Naming conventions of files follow that for data packages in terms of country or time series facets.")]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"descriptor-datapackage-json"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#descriptor-datapackage-json"}},[a._v("#")]),a._v(" Descriptor "),e("code",[a._v("datapackage.json")])]),a._v(" "),e("h3",{attrs:{id:"alignment"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#alignment"}},[a._v("#")]),a._v(" Alignment")]),a._v(" "),e("p",[a._v("With JSON, data is structured in a nested way through curly and squared brackets. Though the alignment of these structures is not relevant for computer programs, it makes it easier for the human reader if they are properly aligned.")]),a._v(" "),e("p",[a._v("Good alignment:")]),a._v(" "),e("div",{staticClass:"language-json extra-class"},[e("pre",{pre:!0,attrs:{class:"language-json"}},[e("code",[e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"corruption-perceptions-index"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"title"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"Corruption Perceptions Index (CPI)"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"sources"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[a._v("[")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"Transparency International"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"web"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"http://www.transparency.org/research/cpi/overview"')]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n...\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n")])])]),e("p",[a._v("Bad alignment:")]),a._v(" "),e("div",{staticClass:"language-json extra-class"},[e("pre",{pre:!0,attrs:{class:"language-json"}},[e("code",[e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"corruption-perceptions-index"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"title"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"Corruption Perceptions Index (CPI)"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"sources"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[a._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"Transparency International"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token property"}},[a._v('"web"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[a._v('"http://www.transparency.org/research/cpi/overview"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),a._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n...\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n")])])]),e("p",[a._v("Please make sure to have your "),e("code",[a._v("datapackage.json")]),a._v(" well structured to ease the understanding of your Data Package content. 
The "),e("a",{attrs:{href:"https://create.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[a._v("Online DataPackage.json Creator"),e("OutboundLink")],1),a._v(" can help you create the general structure.")]),a._v(" "),e("h3",{attrs:{id:"contributors-fields"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#contributors-fields"}},[a._v("#")]),a._v(" Contributors fields")]),a._v(" "),e("p",[a._v("Add the ‘contributors’ field (original author of the package - see "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[a._v("specs/data-package"),e("OutboundLink")],1),a._v(") if you wish to keep the credits for the package.")]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"data-package-folder-names-and-structure"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#data-package-folder-names-and-structure"}},[a._v("#")]),a._v(" Data Package Folder Names and Structure")]),a._v(" "),e("p",[a._v("It is standard practice to use the Data Package name (from the "),e("code",[a._v("datapackage.json")]),a._v(") for the name of the folder/directory in which the Data Package is kept.")]),a._v(" "),e("p",[a._v("If storing in e.g.
git(hub) this would also be the name of the repository.")]),a._v(" "),e("p",[a._v("If you include scripts that automate the data extraction process, these should be stored in a "),e("code",[a._v("script")]),a._v(" folder/directory.")]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"readme"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#readme"}},[a._v("#")]),a._v(" README")]),a._v(" "),e("p",[a._v("A README is a text file giving (human-readable) information about your dataset.")]),a._v(" "),e("p",[a._v("Data Packages SHOULD have a README.")]),a._v(" "),e("h3",{attrs:{id:"formatting"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#formatting"}},[a._v("#")]),a._v(" Formatting")]),a._v(" "),e("p",[a._v("The README SHOULD be a plain text file (no Word or rich text, etc.) and SHOULD use markdown to allow for formatting.")]),a._v(" "),e("h3",{attrs:{id:"file-name"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#file-name"}},[a._v("#")]),a._v(" File Name")]),a._v(" "),e("p",[a._v("If markdown is used the file SHOULD be named "),e("code",[a._v("README.md")]),a._v(" and otherwise SHOULD be named "),e("code",[a._v("README.txt")]),a._v(".")]),a._v(" "),e("h3",{attrs:{id:"sections"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#sections"}},[a._v("#")]),a._v(" Sections")]),a._v(" "),e("p",[a._v("You can include anything you like in your README. It is standard practice to include some (if possible all) of the following sections: "),e("strong",[a._v("Introduction, Data, Preparation, License")]),a._v(".")]),a._v(" "),e("p",[a._v("We SHOULD NOT include the title of the Data Package at the top of the README.")]),a._v(" "),e("p",[a._v("Each section other than the introduction should be headed with its name using a level 2 heading in markdown e.g.
for the data section you would have the following markdown in your README:")]),a._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[a._v("## Data\n")])])]),e("h4",{attrs:{id:"introduction"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[a._v("#")]),a._v(" Introduction")]),a._v(" "),e("p",[a._v("Start with a short description of the dataset (the first sentence and first paragraph should be extractable to provide short standalone descriptions).")]),a._v(" "),e("p",[a._v("Unlike other sections "),e("strong",[a._v("this section SHOULD NOT have a heading")]),a._v(" as it starts the README. (i.e. you do not need the heading "),e("code",[a._v("## Introduction")])]),a._v(" "),e("h4",{attrs:{id:"data"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#data"}},[a._v("#")]),a._v(" Data")]),a._v(" "),e("p",[a._v("Put specific information about the data in a Data section. This can be things like information about the source of the data, the specific structure of the data, missing values etc.")]),a._v(" "),e("h4",{attrs:{id:"preparation"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#preparation"}},[a._v("#")]),a._v(" Preparation")]),a._v(" "),e("p",[a._v("Put information on preparing the data in a Preparation section. 
In particular, any instructions about how to run any preparation and processing scripts to generate the data should go here.")]),a._v(" "),e("h4",{attrs:{id:"license"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#license"}},[a._v("#")]),a._v(" License")]),a._v(" "),e("p",[a._v("Put additional information on the permissions and licensing of the data in the Data Package in the License section.")]),a._v(" "),e("p",[a._v("Since licensing information is often not clear from the data producers, the guideline here is to license the Data Package under the Public Domain Dedication and License, and then to add any relevant information or disclaimers regarding the source data.")]),a._v(" "),e("p",[a._v("See, for example:")]),a._v(" "),e("ul",[e("li",[e("a",{attrs:{href:"http://datahub.io/core/corruption-perceptions-index#readme",target:"_blank",rel:"noopener noreferrer"}},[a._v("http://datahub.io/core/corruption-perceptions-index#readme"),e("OutboundLink")],1)]),a._v(" "),e("li",[e("a",{attrs:{href:"http://datahub.io/core/geo-nuts-administrative-boundaries#readme",target:"_blank",rel:"noopener noreferrer"}},[a._v("http://datahub.io/core/geo-nuts-administrative-boundaries#readme"),e("OutboundLink")],1)])]),a._v(" "),e("p",[a._v("See also the following thread "),e("a",{attrs:{href:"https://discuss.okfn.org/t/copyright-on-data-sources/189",target:"_blank",rel:"noopener noreferrer"}},[a._v("https://discuss.okfn.org/t/copyright-on-data-sources/189"),e("OutboundLink")],1)]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"validate-and-preview-your-data-package"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#validate-and-preview-your-data-package"}},[a._v("#")]),a._v(" Validate and preview your Data Package")]),a._v(" "),e("p",[a._v("Use the "),e("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package Creator"),e("OutboundLink")],1),a._v(" to check that your "),e("code",[a._v("datapackage.json")]),a._v(" and Data 
Package are good to go. Simply drop the URL to your "),e("code",[a._v("datapackage.json")]),a._v(" file in the input box, or upload from a local source, and press "),e("code",[a._v("Validate")]),a._v(". If everything is fine, "),e("code",[a._v("Status: Valid")]),a._v(" is returned.")]),a._v(" "),e("p",[a._v("Then use the "),e("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[a._v("Online Data Package viewer app"),e("OutboundLink")],1),a._v(" to have a preview of your Data Package.")]),a._v(" "),e("hr"),a._v(" "),e("h2",{attrs:{id:"examples"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#examples"}},[a._v("#")]),a._v(" Examples")]),a._v(" "),e("p",[a._v("For examples of well-structured Data Package see:")]),a._v(" "),e("ul",[e("li",[a._v("For tabular data: "),e("a",{attrs:{href:"http://datahub.io/core/corruption-perceptions-index",target:"_blank",rel:"noopener noreferrer"}},[a._v("http://datahub.io/core/corruption-perceptions-index"),e("OutboundLink")],1)]),a._v(" "),e("li",[a._v("For geospatial data: "),e("a",{attrs:{href:"http://datahub.io/core/geo-nuts-administrative-boundaries",target:"_blank",rel:"noopener noreferrer"}},[a._v("http://datahub.io/core/geo-nuts-administrative-boundaries"),e("OutboundLink")],1)])]),a._v(" "),e("p",[a._v("Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our new and comprehensive "),e("a",{attrs:{href:"/tag/field-guide"}},[a._v("Frictionless Data Field Guide")]),a._v(".")])])}),[],!1,null,null,null);t.default=s.exports}}]); \ No newline at end of file diff --git a/assets/js/59.cc1721f0.js b/assets/js/59.3359464c.js similarity index 98% rename from assets/js/59.cc1721f0.js rename to assets/js/59.3359464c.js index e34de85d0..0cb446b13 100644 --- a/assets/js/59.cc1721f0.js +++ b/assets/js/59.3359464c.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[59],{557:function(a,e,t){"use strict";t.r(e);var 
o=t(29),s=Object(o.a)({},(function(){var a=this,e=a.$createElement,t=a._self._c||e;return t("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[t("p",[a._v("Publishing your Geodata as Data Packages is very easy.")]),a._v(" "),t("p",[a._v("You have two options for publishing your geodata:")]),a._v(" "),t("ul",[t("li",[t("strong",[a._v("Geo Data Package")]),a._v(" (Recommended). This is a basic Data Package with the requirement that data be in GeoJSON and with a few special additions to the metadata for geodata. See the next section for instructions on how to do this.")]),a._v(" "),t("li",[t("strong",[a._v("Generic Data Package")]),a._v(". This allows you to publish geodata in any kind of format (KML, Shapefiles, Spatialite etc). If you choose this option you will want to follow the standard "),t("RouterLink",{attrs:{to:"/blog/2016/07/21/publish-any/"}},[a._v("instructions for packaging any kind of data as a Data Package")]),a._v(".")],1)]),a._v(" "),t("p",[a._v("We recommend Geo Data Package if that is possible as it makes it much easier for you to use 3rd party tools with your Data Package. For example, the "),t("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[a._v("datapackage viewer"),t("OutboundLink")],1),a._v(" on this site will automatically preview a Geo Data Package.")]),a._v(" "),t("div",{staticClass:"custom-block tip"},[t("p",{staticClass:"custom-block-title"},[a._v("TIP")]),a._v(" "),t("p",[t("em",[a._v("Note: this document focuses on "),t("em",[a._v("vector")]),a._v(" geodata – i.e. 
points, lines polygons etc (not imagery or raster data).")])])]),a._v(" "),t("h2",{attrs:{id:"geo-data-packages"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#geo-data-packages"}},[a._v("#")]),a._v(" Geo Data Packages")]),a._v(" "),t("h3",{attrs:{id:"examples"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#examples"}},[a._v("#")]),a._v(" Examples")]),a._v(" "),t("h4",{attrs:{id:"traffic-signs-of-hansbeke-belgium"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#traffic-signs-of-hansbeke-belgium"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://github.com/peterdesmet/traffic-signs-hansbeke",target:"_blank",rel:"noopener noreferrer"}},[a._v("Traffic signs of Hansbeke, Belgium"),t("OutboundLink")],1)]),a._v(" "),t("p",[a._v("Example of using "),t("code",[a._v("point")]),a._v(" geometries with described properties in real world situation.")]),a._v(" "),t("p",[t("a",{attrs:{href:"http://data.okfn.org/tools/view?url=https%3A%2F%2Fgithub.com%2Fpeterdesmet%2Ftraffic-signs-hansbeke",target:"_blank",rel:"noopener noreferrer"}},[a._v("View it with the Data Package Viewer"),t("OutboundLink")],1),a._v("("),t("em",[a._v("deprecated")]),a._v(")")]),a._v(" "),t("h4",{attrs:{id:"geojson-example-on-datahub"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#geojson-example-on-datahub"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://datahub.io/examples/geojson-tutorial",target:"_blank",rel:"noopener noreferrer"}},[a._v("GeoJSON example on DataHub"),t("OutboundLink")],1)]),a._v(" "),t("h4",{attrs:{id:"see-more-geo-data-packages-in-the-example-data-packages-github-repository"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#see-more-geo-data-packages-in-the-example-data-packages-github-repository"}},[a._v("#")]),a._v(" See more Geo Data Packages in the "),t("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages",target:"_blank",rel:"noopener noreferrer"}},[a._v("example data packages"),t("OutboundLink")],1),a._v(" GitHub 
repository.")]),a._v(" "),t("div",{staticClass:"custom-block tip"},[t("p",{staticClass:"custom-block-title"},[a._v("TIP")]),a._v(" "),t("p",[a._v("Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our "),t("a",{attrs:{href:"/introduction"}},[a._v("Introduction")]),a._v(".")])])])}),[],!1,null,null,null);e.default=s.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[59],{555:function(a,e,t){"use strict";t.r(e);var o=t(29),s=Object(o.a)({},(function(){var a=this,e=a.$createElement,t=a._self._c||e;return t("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[t("p",[a._v("Publishing your Geodata as Data Packages is very easy.")]),a._v(" "),t("p",[a._v("You have two options for publishing your geodata:")]),a._v(" "),t("ul",[t("li",[t("strong",[a._v("Geo Data Package")]),a._v(" (Recommended). This is a basic Data Package with the requirement that data be in GeoJSON and with a few special additions to the metadata for geodata. See the next section for instructions on how to do this.")]),a._v(" "),t("li",[t("strong",[a._v("Generic Data Package")]),a._v(". This allows you to publish geodata in any kind of format (KML, Shapefiles, Spatialite etc). If you choose this option you will want to follow the standard "),t("RouterLink",{attrs:{to:"/blog/2016/07/21/publish-any/"}},[a._v("instructions for packaging any kind of data as a Data Package")]),a._v(".")],1)]),a._v(" "),t("p",[a._v("We recommend Geo Data Package if that is possible as it makes it much easier for you to use 3rd party tools with your Data Package. 
For example, the "),t("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[a._v("datapackage viewer"),t("OutboundLink")],1),a._v(" on this site will automatically preview a Geo Data Package.")]),a._v(" "),t("div",{staticClass:"custom-block tip"},[t("p",{staticClass:"custom-block-title"},[a._v("TIP")]),a._v(" "),t("p",[t("em",[a._v("Note: this document focuses on "),t("em",[a._v("vector")]),a._v(" geodata – i.e. points, lines polygons etc (not imagery or raster data).")])])]),a._v(" "),t("h2",{attrs:{id:"geo-data-packages"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#geo-data-packages"}},[a._v("#")]),a._v(" Geo Data Packages")]),a._v(" "),t("h3",{attrs:{id:"examples"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#examples"}},[a._v("#")]),a._v(" Examples")]),a._v(" "),t("h4",{attrs:{id:"traffic-signs-of-hansbeke-belgium"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#traffic-signs-of-hansbeke-belgium"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://github.com/peterdesmet/traffic-signs-hansbeke",target:"_blank",rel:"noopener noreferrer"}},[a._v("Traffic signs of Hansbeke, Belgium"),t("OutboundLink")],1)]),a._v(" "),t("p",[a._v("Example of using "),t("code",[a._v("point")]),a._v(" geometries with described properties in real world situation.")]),a._v(" "),t("p",[t("a",{attrs:{href:"http://data.okfn.org/tools/view?url=https%3A%2F%2Fgithub.com%2Fpeterdesmet%2Ftraffic-signs-hansbeke",target:"_blank",rel:"noopener noreferrer"}},[a._v("View it with the Data Package Viewer"),t("OutboundLink")],1),a._v("("),t("em",[a._v("deprecated")]),a._v(")")]),a._v(" "),t("h4",{attrs:{id:"geojson-example-on-datahub"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#geojson-example-on-datahub"}},[a._v("#")]),a._v(" "),t("a",{attrs:{href:"https://datahub.io/examples/geojson-tutorial",target:"_blank",rel:"noopener noreferrer"}},[a._v("GeoJSON example on DataHub"),t("OutboundLink")],1)]),a._v(" 
"),t("h4",{attrs:{id:"see-more-geo-data-packages-in-the-example-data-packages-github-repository"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#see-more-geo-data-packages-in-the-example-data-packages-github-repository"}},[a._v("#")]),a._v(" See more Geo Data Packages in the "),t("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages",target:"_blank",rel:"noopener noreferrer"}},[a._v("example data packages"),t("OutboundLink")],1),a._v(" GitHub repository.")]),a._v(" "),t("div",{staticClass:"custom-block tip"},[t("p",{staticClass:"custom-block-title"},[a._v("TIP")]),a._v(" "),t("p",[a._v("Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our "),t("a",{attrs:{href:"/introduction"}},[a._v("Introduction")]),a._v(".")])])])}),[],!1,null,null,null);e.default=s.exports}}]); \ No newline at end of file diff --git a/assets/js/6.52ce3381.js b/assets/js/6.c19d05ab.js similarity index 95% rename from assets/js/6.52ce3381.js rename to assets/js/6.c19d05ab.js index 6f18446ca..be06e9a62 100644 --- a/assets/js/6.52ce3381.js +++ b/assets/js/6.c19d05ab.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[6],{444:function(e,a,t){e.exports=t.p+"assets/img/figure-1.0b5d5da2.png"},445:function(e,a,t){e.exports=t.p+"assets/img/figure-2.a4cda338.png"},446:function(e,a,t){e.exports=t.p+"assets/img/figure-3.e234c78e.png"},447:function(e,a){e.exports=""},448:function(e,a,t){e.exports=t.p+"assets/img/figure-4.65dd4176.png"},449:function(e,a,t){e.exports=t.p+"assets/img/figure-5.a7c23193.png"},450:function(e,a,t){e.exports=t.p+"assets/img/figure-6.5520e95a.png"},451:function(e,a,t){e.exports=t.p+"assets/img/figure-7.866156c7.png"},589:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,o=e._self._c||a;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("When sharing multiple datasets on a specific 
subject with a varied audience, it is important to ensure that whoever accesses the data understands the context around it, and can quickly access licensing and other attribution information.")]),e._v(" "),o("p",[e._v("In this section, you will learn how to collate related datasets in one place, and easily create a schema that contains descriptive metadata for your data collection.")]),e._v(" "),o("h2",{attrs:{id:"write-a-table-schema"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#write-a-table-schema"}},[e._v("#")]),e._v(" Write a Table Schema")]),e._v(" "),o("p",[e._v("Simply put, a schema is a blueprint that tells us how your data is structured, and what type of content is to be expected in it. You can think of it as a data dictionary. Having a table schema at hand makes it possible to run more precise validation checks on your data, both at a structural and content level.")]),e._v(" "),o("p",[e._v("For this section, we will use the "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" and "),o("a",{attrs:{href:"http://datahub.io/core/gdp",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gross Domestic Product dataset for all countries (1960 - 2014)"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("Data Package")]),e._v(" is a format that makes it possible to put your data collection and relevant information that provides context about your data in one container before you share it. All contextual information, such as metadata and your data schema, is published in a JSON file named "),o("em",[e._v("datapackage.json")]),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("Data Package Creator")]),e._v(" is an online service that facilitates the creation and editing of data packages. 
The service automatically generates a "),o("em",[e._v("datapackage.json")]),e._v(" file for you as you add and edit data that is part of your data collection. We refer to each piece of data in a data collection as a "),o("strong",[e._v("data resource")]),e._v(".")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" loads with dummy data to make it easy to understand how metadata and sample resources help generate the "),o("em",[e._v("datapackage.json")]),e._v(" file. There are three ways in which a user can add data resources on "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(":")]),e._v(" "),o("ol",[o("li",[e._v("Provide a hyperlink to your data resource (highly recommended).")])]),e._v(" "),o("p",[e._v("If your data resource is publicly available, like on GitHub or in a data repository, simply obtain the URL and paste it in the "),o("strong",[e._v("Path")]),e._v(" section. To learn how to publish your data resource online, check the publish your dataset section.")]),e._v(" "),o("ol",{attrs:{start:"2"}},[o("li",[e._v("Create your data resource within the service.")])]),e._v(" "),o("p",[e._v("If your data resource isn’t published online, you’ll have to define its fields from scratch. Depending on how complex your data is, this can be time consuming, as each field has to be created manually,
but it’s still easier than creating the descriptor JSON file from scratch.")]),e._v(" "),o("ol",{attrs:{start:"3"}},[o("li",[o("strong",[e._v("Load a Data Package")]),e._v(" option")])]),e._v(" "),o("p",[e._v("With this option, you can load a pre-existing "),o("em",[e._v("datapackage.json")]),e._v(" file to view and edit its metadata and resource fields.")]),e._v(" "),o("hr"),e._v(" "),o("p",[e._v("Let’s use our "),o("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/blob/master/gross-domestic-product-all/data/gdp.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gross Domestic Product dataset for all countries (1960 - 2014)"),o("OutboundLink")],1),e._v(", which is publicly available on GitHub.")]),e._v(" "),o("p",[e._v("Obtain a link to the raw CSV file by clicking on the Raw button at the top right corner of the GitHub file preview page, as shown in Figure 1 below. The resulting hyperlink looks like "),o("code",[e._v("https://raw.githubusercontent.com/datasets/continent-codes/master/data/continent-codes.csv")])]),e._v(" "),o("figure",[o("img",{attrs:{src:t(444),alt:"Above, raw button highlighted in red"}}),e._v(" "),o("figcaption",[e._v("\n Figure 1: Above, raw button highlighted in red.\n ")])]),e._v(" "),o("p",[e._v("Paste your hyperlink in the "),o("em",[e._v("Path")]),e._v(" section and click on the "),o("em",[e._v("Load")]),e._v(" button. Each column in your table translates to a "),o("em",[e._v("field")]),e._v(". You should be prompted to add all fields identified in your data resource, as in Figure 2 below. Click on the prompt to load the fields.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(445),alt:"annotated in red, a prompt to add all fields inferred from your data resource"}}),e._v(" "),o("figcaption",[e._v("\n Figure 2: annotated in red, a prompt to add all fields inferred from your data resource.\n ")])]),e._v(" "),o("p",[e._v("The page that follows looks like Figure 3 below.
Each column from the GDP dataset has been mapped to a "),o("em",[e._v("field")]),e._v(". The data type for each column has been inferred automatically, and we can preview data under each field by hovering over the field name. It is also possible to edit all sections of our data resource’s fields as we can see below.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(446),alt:"all fields inferred from your data resource"}}),e._v(" "),o("figcaption",[e._v("\n Figure 3: all fields inferred from your data resource.\n ")])]),e._v(" "),o("p",[e._v("You can now edit data types and formats as necessary, and optionally add titles and descriptive information to your fields. For example, the data type for our {Year} field should be "),o("em",[o("strong",[e._v("year")])]),e._v(" and not "),o("em",[o("strong",[e._v("integer")])]),e._v(". Our {Value} column has numeric information with decimal places.")]),e._v(" "),o("p",[e._v("By definition, values under the "),o("em",[o("strong",[e._v("integer")])]),e._v(" data type are whole numbers. The "),o("em",[o("strong",[e._v("number")])]),e._v(" data type is more appropriate for the {Value} column. When in doubt about what data type to use, consult the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#types-and-formats",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema data types cheat sheet"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Click on the "),o("img",{attrs:{src:t(447),alt:"settings"}}),e._v(" icon to pick a suitable profile for your data resource.
"),o("a",{attrs:{href:"https://specs.frictionlessdata.io/profiles/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Here’s more information about Frictionless Data profiles"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("If your dataset has other data resources, add them by scrolling to the bottom of the page, clicking on "),o("strong",[e._v("Add Resource")]),e._v(", and repeating the same process as we just did.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(448),alt:"Prompt to add more data resources"}}),e._v(" "),o("figcaption",[e._v("\n Figure 4: Prompt to add more data resources.\n ")])]),e._v(" "),o("h2",{attrs:{id:"add-your-dataset-s-metadata"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#add-your-dataset-s-metadata"}},[e._v("#")]),e._v(" Add your dataset’s metadata")]),e._v(" "),o("p",[e._v("In the previous section, we described metadata for each of our datasets, but we’re still missing metadata for our collection of datasets. You can add it via the "),o("strong",[e._v("Metadata")]),e._v(" section on the left side bar, describing things like the dataset name, description, author, license, etc.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(449),alt:"Add Data Package Metadata "}})]),e._v(" "),o("p",[e._v("The "),o("strong",[e._v("Profile")]),e._v(" section under metadata allows us to specify what kind of data collection we are packaging.")]),e._v(" "),o("ul",[o("li",[o("p",[o("em",[e._v("Data Package")]),o("br"),e._v("\nThis is the base, more general profile. Use it if your dataset contains resources of mixed formats, like tabular and geographical data. The base requirement for a valid Data Package profile is the "),o("em",[e._v("datapackage.json")]),e._v(" file.
See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])]),e._v(" "),o("li",[o("p",[o("em",[e._v("Tabular Data Package")]),o("br"),e._v("\nIf your data contains only tabular resources like CSVs and spreadsheets, use the Tabular Data Package profile. See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Tabular Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])]),e._v(" "),o("li",[o("p",[o("em",[e._v("Fiscal Data Package")]),o("br"),e._v("\nIf your data contains fiscal information like budgets and expenditure data, use the Fiscal Data Package profile. See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/fiscal-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fiscal Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])])]),e._v(" "),o("p",[e._v("In our example, as we only have a CSV data resource, the "),o("em",[e._v("Tabular Data Package")]),e._v(" profile is the best option.")]),e._v(" "),o("p",[e._v("In the "),o("strong",[e._v("Keywords")]),e._v(" section, you can add any keywords that helps make your data collection more discoverable. For our dataset, we might use the keywords "),o("em",[e._v("GDP, National Accounts, National GDP, Regional GDP")]),e._v(". Other datasets could include the country name, dataset area (e.g. “health” or “environmental”), etc.")]),e._v(" "),o("p",[e._v("Now that we have created a Data Package, we can "),o("strong",[e._v("Validate")]),e._v(" or "),o("strong",[e._v("Download")]),e._v(" it. But first, let’s see what our datapackage.json file looks like. 
With every addition and modification, the "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" has been populating the "),o("em",[e._v("datapackage.json")]),e._v(" file for us. Click on the "),o("strong",[e._v("{···}")]),e._v(" icon to view the "),o("em",[e._v("datapackage.json")]),e._v(" file. As you can see below, any edit we make to the description of the Value field reflects on the JSON file in real time.")]),e._v(" "),o("p",[e._v("The "),o("strong",[e._v("Validate")]),e._v(" button allows us to confirm whether we chose the correct Profile for our Data Package. The two possible outcomes at this stage are:")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(450),alt:"Data Package is Invalid"}})]),e._v(" "),o("p",[e._v("This message appears when there is some validation error like if we miss some required attribute (e.g. the data package name), or have picked an incorrect profile (e.g. Tabular Data Package with geographical data)… Review the metadata and profiles to find the mistake and try validating again.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(451),alt:"Data Package is Valid"}})]),e._v(" "),o("p",[e._v("All good! This message means that your data package is valid, and we can download it.")]),e._v(" "),o("h2",{attrs:{id:"download-your-data-package"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#download-your-data-package"}},[e._v("#")]),e._v(" Download your Data Package")]),e._v(" "),o("p",[e._v("As we said earlier, the base requirement for a valid Data Package profile is the "),o("em",[e._v("datapackage.json")]),e._v(" file, which contains your data schema and metadata. We call this the descriptor file. 
You can download your descriptor file by clicking on the "),o("strong",[e._v("Download")]),e._v(" button.")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("If your data resources, like ours, were linked from an online public source, sharing the "),o("em",[e._v("datapackage.json")]),e._v(" file is sufficient, since it contains URLs to your data resources.")])]),e._v(" "),o("li",[o("p",[e._v("If you manually created a data resource and its fields, remember to add all your data resources and the downloaded "),o("em",[e._v("datapackage.json")]),e._v(" file in one folder before sharing it.")])])]),e._v(" "),o("p",[e._v("The way to structure your dataset depends on your data, and what extra artifacts it contains (e.g. images, scripts, reports, etc.). In this section, we’ll show a complete example with:")]),e._v(" "),o("ul",[o("li",[o("strong",[e._v("Data files")]),e._v(": The files with the actual data (e.g. CSV, XLS, GeoJSON, …)")]),e._v(" "),o("li",[o("strong",[e._v("Documentation")]),e._v(": How was the data collected, any caveats, how to update it, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Metadata")]),e._v(": Where the data comes from, what’s in the files, what’s their source and license, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Scripts")]),e._v(": Software scripts that were used to generate, update, or modify the data.")])]),e._v(" "),o("p",[e._v("Your final Data Package file directory should look like this:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v("data/\n dataresource1.csv\n dataresource2.csv\ndatapackage.json\n")])])]),o("ul",[o("li",[o("p",[o("strong",[e._v("data/")]),e._v(": All data files are contained in this folder. In our example, there is only one: "),o("code",[e._v("data/gdp.csv")]),e._v(" .")])]),e._v(" "),o("li",[o("p",[o("strong",[e._v("datapackage.json")]),e._v(": This file describes the dataset’s metadata. 
For example, what is the dataset, where are its files, what they contain, what each column means (for tabular data), what’s the source, license, and authors, and so on. As it’s a machine-readable specification, other software can import and validate your files.")])])]),e._v(" "),o("p",[e._v("Congratulations! You have now created a schema for your data, and combined it with descriptive metadata and your data collection to create your first data package!")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[6],{442:function(e,a,t){e.exports=t.p+"assets/img/figure-1.0b5d5da2.png"},443:function(e,a,t){e.exports=t.p+"assets/img/figure-2.a4cda338.png"},444:function(e,a,t){e.exports=t.p+"assets/img/figure-3.e234c78e.png"},445:function(e,a){e.exports=""},446:function(e,a,t){e.exports=t.p+"assets/img/figure-4.65dd4176.png"},447:function(e,a,t){e.exports=t.p+"assets/img/figure-5.a7c23193.png"},448:function(e,a,t){e.exports=t.p+"assets/img/figure-6.5520e95a.png"},449:function(e,a,t){e.exports=t.p+"assets/img/figure-7.866156c7.png"},589:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,o=e._self._c||a;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("When sharing multiple datasets on a specific subject with a varied audience, it is important to ensure that whoever accesses the data understands the context around it, and can quickly access licensing and other attribution information.")]),e._v(" "),o("p",[e._v("In this section, you will learn how to collate related datasets in one place, and easily create a schema that contains descriptive metadata for your data collection.")]),e._v(" "),o("h2",{attrs:{id:"write-a-table-schema"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#write-a-table-schema"}},[e._v("#")]),e._v(" Write a Table Schema")]),e._v(" "),o("p",[e._v("Simply put, a schema is a blueprint 
that tells us how your data is structured, and what type of content is to be expected in it. You can think of it as a data dictionary. Having a table schema at hand makes it possible to run more precise validation checks on your data, both at a structural and content level.")]),e._v(" "),o("p",[e._v("For this section, we will use the "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" and "),o("a",{attrs:{href:"http://datahub.io/core/gdp",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gross Domestic Product dataset for all countries (1960 - 2014)"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("Data Package")]),e._v(" is a format that makes it possible to put your data collection and relevant information that provides context about your data in one container before you share it. All contextual information, such as metadata and your data schema, is published in a JSON file named "),o("em",[e._v("datapackage.json")]),e._v(".")]),e._v(" "),o("p",[o("strong",[e._v("Data Package Creator")]),e._v(" is an online service that facilitates the creation and editing of data packages. The service automatically generates a "),o("em",[e._v("datapackage.json")]),e._v(" file for you as you add and edit data that is part of your data collection. We refer to each piece of data in a data collection as a "),o("strong",[e._v("data resource")]),e._v(".")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" loads with dummy data to make it easy to understand how metadata and sample resources help generate the "),o("em",[e._v("datapackage.json")]),e._v(" file. 
There are three ways in which a user can add data resources on "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(":")]),e._v(" "),o("ol",[o("li",[e._v("Provide a hyperlink to your data resource (highly recommended).")])]),e._v(" "),o("p",[e._v("If your data resource is publicly available, like on GitHub or in a data repository, simply obtain the URL and paste it in the "),o("strong",[e._v("Path")]),e._v(" section. To learn how to publish your data resource online, check the publish your dataset section.")]),e._v(" "),o("ol",{attrs:{start:"2"}},[o("li",[e._v("Create your data resource within the service.")])]),e._v(" "),o("p",[e._v("If your data resource isn’t published online, you’ll have to define its fields from scratch. Depending on how complex your data is, manually creating each field can be time consuming. 
However, this is simpler than learning how to create a JSON file from scratch.")]),e._v(" "),o("ol",{attrs:{start:"3"}},[o("li",[o("strong",[e._v("Load a Data Package")]),e._v(" option")])]),e._v(" "),o("p",[e._v("With this option, you can load a pre-existing "),o("em",[e._v("datapackage.json")]),e._v(" file to view and edit its metadata and resource fields.")]),e._v(" "),o("hr"),e._v(" "),o("p",[e._v("Let’s use our "),o("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/blob/master/gross-domestic-product-all/data/gdp.csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("Gross Domestic Product dataset for all countries (1960 - 2014)"),o("OutboundLink")],1),e._v(" dataset, which is publicly available on GitHub.")]),e._v(" "),o("p",[e._v("Obtain a link to the raw CSV file by clicking on the Raw button at the top right corner of the GitHub file preview page, as shown in figure 1 below. The resulting hyperlink looks like "),o("code",[e._v("https://raw.githubusercontent.com/datasets/continent-codes/master/data/continent-codes.csv")])]),e._v(" "),o("figure",[o("img",{attrs:{src:t(442),alt:"Above, raw button highlighted in red"}}),e._v(" "),o("figcaption",[e._v("\n Figure 1: Above, raw button highlighted in red.\n ")])]),e._v(" "),o("p",[e._v("Paste your hyperlink in the "),o("em",[e._v("Path")]),e._v(" section and click on the "),o("em",[e._v("Load")]),e._v(" button. Each column in your table translates to a "),o("em",[e._v("field")]),e._v(". You should be prompted to add all fields identified in your data resource, as in Figure 2 below. Click on the prompt to load the fields.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(443),alt:"annotated in red, a prompt to add all fields inferred from your data resource"}}),e._v(" "),o("figcaption",[e._v("\n Figure 2: annotated in red, a prompt to add all fields inferred from your data resource.\n ")])]),e._v(" "),o("p",[e._v("The page that follows looks like Figure 3 below. 
Each column from the GDP dataset has been mapped to a "),o("em",[e._v("field")]),e._v(". The data type for each column has been inferred correctly, and we can preview data under each field by hovering over the field name. It is also possible to edit all sections of our data resource’s fields as we can see below.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(444),alt:"all fields inferred from your data resource"}}),e._v(" "),o("figcaption",[e._v("\n Figure 3: all fields inferred from your data resource.\n ")])]),e._v(" "),o("p",[e._v("You can now edit data types and formats as necessary, and optionally add titles and descriptive information to your fields. For example, the data type for our {Year} field should be "),o("em",[o("strong",[e._v("year")])]),e._v(" and not "),o("em",[o("strong",[e._v("integer")])]),e._v(". Our {Value} column has numeric information with decimal places.")]),e._v(" "),o("p",[e._v("By definition, values under the "),o("em",[o("strong",[e._v("integer")])]),e._v(" data type are whole numbers. The "),o("em",[o("strong",[e._v("number")])]),e._v(" data type is more appropriate for the {Value} column. When in doubt about what data type to use, consult the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/#types-and-formats",target:"_blank",rel:"noopener noreferrer"}},[e._v("Table Schema data types cheat sheet"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Click on the "),o("img",{attrs:{src:t(445),alt:"settings"}}),e._v(" icon to pick a suitable profile for your data resource. 
"),o("a",{attrs:{href:"https://specs.frictionlessdata.io/profiles/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Here’s more information about Frictionless Data profiles"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("If your dataset has other data resources, add them by scrolling to the bottom of the page, clicking on Add Resource, and repeating the same process as we just did.")]),e._v(" "),o("p",[e._v("If your dataset has other data resources, add them by scrolling to the bottom of the page, clicking on "),o("strong",[e._v("Add Resource")]),e._v(", and repeating the same process as we just did.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(446),alt:"Prompt to add more data resources"}}),e._v(" "),o("figcaption",[e._v("\n Figure 4: Prompt to add more data resources.\n ")])]),e._v(" "),o("h2",{attrs:{id:"add-your-dataset-s-metadata"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#add-your-dataset-s-metadata"}},[e._v("#")]),e._v(" Add your dataset’s metadata")]),e._v(" "),o("p",[e._v("In the previous section, we described metadata for each of our datasets, but we’re still missing metadata for our collection of datasets. You can add it via the "),o("strong",[e._v("Metadata")]),e._v(" section on the left side bar, describing things like the dataset name, description, author, license, etc.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(447),alt:"Add Data Package Metadata "}})]),e._v(" "),o("p",[e._v("The "),o("strong",[e._v("Profile")]),e._v(" section under metadata allows us to specify what kind of data collection we are packaging.")]),e._v(" "),o("ul",[o("li",[o("p",[o("em",[e._v("Data Package")]),o("br"),e._v("\nThis is the base, more general profile. Use it if your dataset contains resources of mixed formats, like tabular and geographical data. The base requirement for a valid Data Package profile is the "),o("em",[e._v("datapackage.json")]),e._v(" file. 
See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])]),e._v(" "),o("li",[o("p",[o("em",[e._v("Tabular Data Package")]),o("br"),e._v("\nIf your data contains only tabular resources like CSVs and spreadsheets, use the Tabular Data Package profile. See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Tabular Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])]),e._v(" "),o("li",[o("p",[o("em",[e._v("Fiscal Data Package")]),o("br"),e._v("\nIf your data contains fiscal information like budgets and expenditure data, use the Fiscal Data Package profile. See the "),o("a",{attrs:{href:"https://specs.frictionlessdata.io/fiscal-data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Fiscal Data Package specification"),o("OutboundLink")],1),e._v(" for more information.")])])]),e._v(" "),o("p",[e._v("In our example, as we only have a CSV data resource, the "),o("em",[e._v("Tabular Data Package")]),e._v(" profile is the best option.")]),e._v(" "),o("p",[e._v("In the "),o("strong",[e._v("Keywords")]),e._v(" section, you can add any keywords that help make your data collection more discoverable. For our dataset, we might use the keywords "),o("em",[e._v("GDP, National Accounts, National GDP, Regional GDP")]),e._v(". Other datasets could include the country name, dataset area (e.g. “health” or “environmental”), etc.")]),e._v(" "),o("p",[e._v("Now that we have created a Data Package, we can "),o("strong",[e._v("Validate")]),e._v(" or "),o("strong",[e._v("Download")]),e._v(" it. But first, let’s see what our datapackage.json file looks like. 
With every addition and modification, the "),o("a",{attrs:{href:"https://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Creator"),o("OutboundLink")],1),e._v(" has been populating the "),o("em",[e._v("datapackage.json")]),e._v(" file for us. Click on the "),o("strong",[e._v("{···}")]),e._v(" icon to view the "),o("em",[e._v("datapackage.json")]),e._v(" file. As you can see below, any edit we make to the description of the Value field reflects on the JSON file in real time.")]),e._v(" "),o("p",[e._v("The "),o("strong",[e._v("Validate")]),e._v(" button allows us to confirm whether we chose the correct Profile for our Data Package. The two possible outcomes at this stage are:")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(448),alt:"Data Package is Invalid"}})]),e._v(" "),o("p",[e._v("This message appears when there is some validation error like if we miss some required attribute (e.g. the data package name), or have picked an incorrect profile (e.g. Tabular Data Package with geographical data)… Review the metadata and profiles to find the mistake and try validating again.")]),e._v(" "),o("figure",[o("img",{attrs:{src:t(449),alt:"Data Package is Valid"}})]),e._v(" "),o("p",[e._v("All good! This message means that your data package is valid, and we can download it.")]),e._v(" "),o("h2",{attrs:{id:"download-your-data-package"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#download-your-data-package"}},[e._v("#")]),e._v(" Download your Data Package")]),e._v(" "),o("p",[e._v("As we said earlier, the base requirement for a valid Data Package profile is the "),o("em",[e._v("datapackage.json")]),e._v(" file, which contains your data schema and metadata. We call this the descriptor file. 
You can download your descriptor file by clicking on the "),o("strong",[e._v("Download")]),e._v(" button.")]),e._v(" "),o("ul",[o("li",[o("p",[e._v("If your data resources, like ours, were linked from an online public source, sharing the "),o("em",[e._v("datapackage.json")]),e._v(" file is sufficient, since it contains URLs to your data resources.")])]),e._v(" "),o("li",[o("p",[e._v("If you manually created a data resource and its fields, remember to add all your data resources and the downloaded "),o("em",[e._v("datapackage.json")]),e._v(" file in one folder before sharing it.")])])]),e._v(" "),o("p",[e._v("The way to structure your dataset depends on your data, and what extra artifacts it contains (e.g. images, scripts, reports, etc.). In this section, we’ll show a complete example with:")]),e._v(" "),o("ul",[o("li",[o("strong",[e._v("Data files")]),e._v(": The files with the actual data (e.g. CSV, XLS, GeoJSON, …)")]),e._v(" "),o("li",[o("strong",[e._v("Documentation")]),e._v(": How was the data collected, any caveats, how to update it, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Metadata")]),e._v(": Where the data comes from, what’s in the files, what’s their source and license, etc.")]),e._v(" "),o("li",[o("strong",[e._v("Scripts")]),e._v(": Software scripts that were used to generate, update, or modify the data.")])]),e._v(" "),o("p",[e._v("Your final Data Package file directory should look like this:")]),e._v(" "),o("div",{staticClass:"language- extra-class"},[o("pre",{pre:!0,attrs:{class:"language-text"}},[o("code",[e._v("data/\n dataresource1.csv\n dataresource2.csv\ndatapackage.json\n")])])]),o("ul",[o("li",[o("p",[o("strong",[e._v("data/")]),e._v(": All data files are contained in this folder. In our example, there is only one: "),o("code",[e._v("data/gdp.csv")]),e._v(" .")])]),e._v(" "),o("li",[o("p",[o("strong",[e._v("datapackage.json")]),e._v(": This file describes the dataset’s metadata. 
For example, what is the dataset, where are its files, what they contain, what each column means (for tabular data), what’s the source, license, and authors, and so on. As it’s a machine-readable specification, other software can import and validate your files.")])])]),e._v(" "),o("p",[e._v("Congratulations! You have now created a schema for your data, and combined it with descriptive metadata and your data collection to create your first data package!")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/60.00bd78b5.js b/assets/js/60.385842bd.js similarity index 99% rename from assets/js/60.00bd78b5.js rename to assets/js/60.385842bd.js index 53572c761..8f419b79c 100644 --- a/assets/js/60.00bd78b5.js +++ b/assets/js/60.385842bd.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[60],{556:function(a,t,s){"use strict";s.r(t);var e=s(29),n=Object(e.a)({},(function(){var a=this,t=a.$createElement,s=a._self._c||t;return s("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[s("p",[a._v("This tutorial will show you how to install the Python library for working with Data Packages and Table Schema, load a CSV file, infer its schema, and write a Tabular Data Package.")]),a._v(" "),s("h2",{attrs:{id:"setup"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[a._v("#")]),a._v(" Setup")]),a._v(" "),s("p",[a._v("For this tutorial, we will need the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package library"),s("OutboundLink")],1),a._v(" ("),s("a",{attrs:{href:"https://pypi.python.org/pypi/datapackage",target:"_blank",rel:"noopener noreferrer"}},[a._v("PyPI"),s("OutboundLink")],1),a._v(") library.")]),a._v(" "),s("div",{staticClass:"language-bash extra-class"},[s("pre",{pre:!0,attrs:{class:"language-bash"}},[s("code",[a._v("pip "),s("span",{pre:!0,attrs:{class:"token 
function"}},[a._v("install")]),a._v(" datapackage\n")])])]),s("h2",{attrs:{id:"creating-basic-metadata"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#creating-basic-metadata"}},[a._v("#")]),a._v(" Creating basic metadata")]),a._v(" "),s("p",[a._v("You can start using the library by importing "),s("code",[a._v("datapackage")]),a._v(".")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[a._v("import")]),a._v(" datapackage\n")])])]),s("p",[a._v("The Package() class allows you to work with data packages. Use it to create a blank data package called package like so:")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package "),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v("=")]),a._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("Package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(")")]),a._v("\n")])])]),s("p",[a._v("You can then add useful metadata by adding keys to the package’s descriptor dict attribute. Below, we are adding the required "),s("code",[a._v("name")]),a._v(" key as well as a human-readable "),s("code",[a._v("title")]),a._v(" key. For the supported keys, please consult the full "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#metadata",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package spec"),s("OutboundLink")],1),a._v(". 
Note that we will create the required "),s("code",[a._v("resources")]),a._v(" key further below.")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'name'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v("=")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'periodic-table'")]),a._v("\npackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'title'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v("=")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'Periodic Table'")]),a._v("\n")])])]),s("p",[a._v("To view your descriptor at any time, simply type")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor\n")])])]),s("h2",{attrs:{id:"inferring-a-csv-schema"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#inferring-a-csv-schema"}},[a._v("#")]),a._v(" Inferring a CSV Schema")]),a._v(" "),s("p",[a._v("Let’s say we have a file called "),s("code",[a._v("data.csv")]),a._v(" ("),s("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/blob/master/periodic-table/data.csv",target:"_blank",rel:"noopener noreferrer"}},[a._v("download"),s("OutboundLink")],1),a._v(") in our working directory that looks like this:")]),a._v(" 
"),s("table",[s("thead",[s("tr",[s("th",[a._v("atomic number")]),a._v(" "),s("th",[a._v("symbol")]),a._v(" "),s("th",[a._v("name")]),a._v(" "),s("th",[a._v("atomic mass")]),a._v(" "),s("th",[a._v("metal or nonmetal?")])])]),a._v(" "),s("tbody",[s("tr",[s("td",[a._v("1")]),a._v(" "),s("td",[a._v("H")]),a._v(" "),s("td",[a._v("Hydrogen")]),a._v(" "),s("td",[a._v("1.00794")]),a._v(" "),s("td",[a._v("nonmetal")])]),a._v(" "),s("tr",[s("td",[a._v("2")]),a._v(" "),s("td",[a._v("He")]),a._v(" "),s("td",[a._v("Helium")]),a._v(" "),s("td",[a._v("4.002602")]),a._v(" "),s("td",[a._v("noble gas")])]),a._v(" "),s("tr",[s("td",[a._v("3")]),a._v(" "),s("td",[a._v("Li")]),a._v(" "),s("td",[a._v("Lithium")]),a._v(" "),s("td",[a._v("6.941")]),a._v(" "),s("td",[a._v("alkali metal")])]),a._v(" "),s("tr",[s("td",[a._v("4")]),a._v(" "),s("td",[a._v("Be")]),a._v(" "),s("td",[a._v("Beryllium")]),a._v(" "),s("td",[a._v("9.012182")]),a._v(" "),s("td",[a._v("alkaline earth metal")])]),a._v(" "),s("tr",[s("td",[a._v("5")]),a._v(" "),s("td",[a._v("B")]),a._v(" "),s("td",[a._v("Boron")]),a._v(" "),s("td",[a._v("10.811")]),a._v(" "),s("td",[a._v("metalloid")])])])]),a._v(" "),s("p",[a._v("We can extrapolate our CSV’s schema by using "),s("code",[a._v("infer")]),a._v(" from the Table Schema library. The "),s("code",[a._v("infer")]),a._v(" function checks a small subset of your dataset and summarizes expected datatypes against each column, etc. 
To infer a schema for our dataset and view it, we simply run")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("infer"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'periodic-table/data.csv'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(")")]),a._v("\npackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor\n")])])]),s("p",[a._v("If you need to infer schemas for more than one tabular data resource, use the glob pattern "),s("code",[a._v("**/*.csv")]),a._v(" instead:")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("infer"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'**/*.csv'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(")")]),a._v("\npackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor\n")])])]),s("p",[a._v("We are now ready to save our "),s("code",[a._v("datapackage.json")]),a._v(" file locally. 
The package.save() function makes this possible.")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("save"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'datapackage.json'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(")")]),a._v("\n")])])]),s("p",[a._v("The "),s("code",[a._v("datapackage.json")]),s("br"),a._v("\n("),s("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/blob/master/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[a._v("download"),s("OutboundLink")],1),a._v(") is inlined below. Note that atomic number has been correctly inferred as an "),s("code",[a._v("integer")]),a._v(" and atomic mass as a "),s("code",[a._v("number")]),a._v(" (float) while every other column is a "),s("code",[a._v("string")]),a._v(".")]),a._v(" "),s("div",{staticClass:"language-json extra-class"},[s("pre",{pre:!0,attrs:{class:"language-json"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'profile'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'tabular-data-package'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'resources'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'path'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'data.csv'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'profile'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'tabular-data-resource'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 
'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'data'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'csv'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'mediatype'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'text/csv'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'encoding'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'UTF"),s("span",{pre:!0,attrs:{class:"token number"}},[a._v("-8")]),a._v("'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'schema'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'fields'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'atomic number'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'integer'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'symbol'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 
'string'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'name'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'string'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'atomic mass'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'number'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'metal or nonmetal?'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'string'"),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'missingValues'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),a._v("''"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'periodic-table'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'title'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'Periodic Table'\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n")])])]),s("h2",{attrs:{id:"publishing"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#publishing"}},[a._v("#")]),a._v(" Publishing")]),a._v(" "),s("p",[a._v("Now that you have created your Data Package, you might want to "),s("RouterLink",{attrs:{to:"/blog/2016/08/29/publish-online/"}},[a._v("publish your data online")]),a._v(" so that you can share it with others.")],1)])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[60],{557:function(a,t,s){"use strict";s.r(t);var e=s(29),n=Object(e.a)({},(function(){var a=this,t=a.$createElement,s=a._self._c||t;return s("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[s("p",[a._v("This tutorial will show you how to install the Python library 
for working with Data Packages and Table Schema, load a CSV file, infer its schema, and write a Tabular Data Package.")]),a._v(" "),s("h2",{attrs:{id:"setup"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[a._v("#")]),a._v(" Setup")]),a._v(" "),s("p",[a._v("For this tutorial, we will need the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package library"),s("OutboundLink")],1),a._v(" ("),s("a",{attrs:{href:"https://pypi.python.org/pypi/datapackage",target:"_blank",rel:"noopener noreferrer"}},[a._v("PyPI"),s("OutboundLink")],1),a._v(") library.")]),a._v(" "),s("div",{staticClass:"language-bash extra-class"},[s("pre",{pre:!0,attrs:{class:"language-bash"}},[s("code",[a._v("pip "),s("span",{pre:!0,attrs:{class:"token function"}},[a._v("install")]),a._v(" datapackage\n")])])]),s("h2",{attrs:{id:"creating-basic-metadata"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#creating-basic-metadata"}},[a._v("#")]),a._v(" Creating basic metadata")]),a._v(" "),s("p",[a._v("You can start using the library by importing "),s("code",[a._v("datapackage")]),a._v(".")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[a._v("import")]),a._v(" datapackage\n")])])]),s("p",[a._v("The Package() class allows you to work with data packages. 
Use it to create a blank datapackage called package like so:")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package "),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v("=")]),a._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("Package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(")")]),a._v("\n")])])]),s("p",[a._v("You can then add useful metadata by adding keys to metadata dict attribute. Below, we are adding the required "),s("code",[a._v("name")]),a._v(" key as well as a human-readable "),s("code",[a._v("title")]),a._v(" key. For the keys supported, please consult the full "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#metadata",target:"_blank",rel:"noopener noreferrer"}},[a._v("Data Package spec"),s("OutboundLink")],1),a._v(". Note, we will be creating the required "),s("code",[a._v("resources")]),a._v(" key further down below.")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'name'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v("=")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'period-table'")]),a._v("\npackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'title'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),a._v(" 
"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v("=")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'Periodic Table'")]),a._v("\n")])])]),s("p",[a._v("To view your descriptor file at any time, simply type")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor\n")])])]),s("h2",{attrs:{id:"inferring-a-csv-schema"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#inferring-a-csv-schema"}},[a._v("#")]),a._v(" Inferring a CSV Schema")]),a._v(" "),s("p",[a._v("Let’s say we have a file called "),s("code",[a._v("data.csv")]),a._v(" ("),s("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/blob/master/periodic-table/data.csv",target:"_blank",rel:"noopener noreferrer"}},[a._v("download"),s("OutboundLink")],1),a._v(") in our working directory that looks like this:")]),a._v(" "),s("table",[s("thead",[s("tr",[s("th",[a._v("atomic number")]),a._v(" "),s("th",[a._v("symbol")]),a._v(" "),s("th",[a._v("name")]),a._v(" "),s("th",[a._v("atomic mass")]),a._v(" "),s("th",[a._v("metal or nonmetal?")])])]),a._v(" "),s("tbody",[s("tr",[s("td",[a._v("1")]),a._v(" "),s("td",[a._v("H")]),a._v(" "),s("td",[a._v("Hydrogen")]),a._v(" "),s("td",[a._v("1.00794")]),a._v(" "),s("td",[a._v("nonmetal")])]),a._v(" "),s("tr",[s("td",[a._v("2")]),a._v(" "),s("td",[a._v("He")]),a._v(" "),s("td",[a._v("Helium")]),a._v(" "),s("td",[a._v("4.002602")]),a._v(" "),s("td",[a._v("noble gas")])]),a._v(" "),s("tr",[s("td",[a._v("3")]),a._v(" "),s("td",[a._v("Li")]),a._v(" "),s("td",[a._v("Lithium")]),a._v(" "),s("td",[a._v("6.941")]),a._v(" "),s("td",[a._v("alkali metal")])]),a._v(" "),s("tr",[s("td",[a._v("4")]),a._v(" "),s("td",[a._v("Be")]),a._v(" "),s("td",[a._v("Beryllium")]),a._v(" "),s("td",[a._v("9.012182")]),a._v(" "),s("td",[a._v("alkaline earth metal")])]),a._v(" 
"),s("tr",[s("td",[a._v("5")]),a._v(" "),s("td",[a._v("B")]),a._v(" "),s("td",[a._v("Boron")]),a._v(" "),s("td",[a._v("10.811")]),a._v(" "),s("td",[a._v("metalloid")])])])]),a._v(" "),s("p",[a._v("We can extrapolate our CSV’s schema by using "),s("code",[a._v("infer")]),a._v(" from the Table Schema library. The "),s("code",[a._v("infer")]),a._v(" function checks a small subset of your dataset and summarizes expected datatypes against each column, etc. To infer a schema for our dataset and view it, we will simply run")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("infer"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'periodic-table/data.csv'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(")")]),a._v("\npackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor\n")])])]),s("p",[a._v("Where there’s need to infer a schema for more than one tabular data resource, use the glob pattern "),s("code",[a._v("**/*.csv")]),a._v(" instead to infer a schema:")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("package"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("infer"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'**/*.csv'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(")")]),a._v("\npackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("descriptor\n")])])]),s("p",[a._v("We are now ready to save our "),s("code",[a._v("datapackage.json")]),a._v(" file locally. 
The dp.save() function makes this possible.")]),a._v(" "),s("div",{staticClass:"language-python extra-class"},[s("pre",{pre:!0,attrs:{class:"language-python"}},[s("code",[a._v("dp"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(".")]),a._v("save"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[a._v("'datapackage.json'")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(")")]),a._v("\n")])])]),s("p",[a._v("The "),s("code",[a._v("datapackage.json")]),s("br"),a._v("\n("),s("a",{attrs:{href:"https://github.com/frictionlessdata/example-data-packages/blob/master/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[a._v("download"),s("OutboundLink")],1),a._v(") is inlined below. Note that atomic number has been correctly inferred as an "),s("code",[a._v("integer")]),a._v(" and atomic mass as a "),s("code",[a._v("number")]),a._v(" (float) while every other column is a "),s("code",[a._v("string")]),a._v(".")]),a._v(" "),s("div",{staticClass:"language-json extra-class"},[s("pre",{pre:!0,attrs:{class:"language-json"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'profile'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'tabular-data-package'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'resources'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'path'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'data.csv'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'profile'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'tabular-data-resource'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 
'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'data'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'csv'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'mediatype'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'text/csv'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'encoding'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'UTF"),s("span",{pre:!0,attrs:{class:"token number"}},[a._v("-8")]),a._v("'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'schema'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'fields'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'atomic number'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'integer'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'symbol'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 
'string'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'name'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'string'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'atomic mass'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'number'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("{")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'metal or nonmetal?'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'type'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'string'"),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[a._v(",")]),a._v("\n 'format'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'default'\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'missingValues'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("[")]),a._v("''"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'name'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'periodic-table'"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v(",")]),a._v("\n 'title'"),s("span",{pre:!0,attrs:{class:"token operator"}},[a._v(":")]),a._v(" 'Periodic Table'\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[a._v("}")]),a._v("\n")])])]),s("h2",{attrs:{id:"publishing"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#publishing"}},[a._v("#")]),a._v(" Publishing")]),a._v(" "),s("p",[a._v("Now that you have created your Data Package, you might want to "),s("RouterLink",{attrs:{to:"/blog/2016/08/29/publish-online/"}},[a._v("publish your data online")]),a._v(" so that you can share it with others.")],1)])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/64.2e216624.js b/assets/js/64.c108e977.js similarity index 99% rename from assets/js/64.2e216624.js rename to assets/js/64.c108e977.js index 1f5c4e49d..a12b10d09 100644 --- a/assets/js/64.2e216624.js +++ b/assets/js/64.c108e977.js @@ -1 +1 @@ 
-(window.webpackJsonp=window.webpackJsonp||[]).push([[64],{564:function(e,t,a){"use strict";a.r(t);var s=a(29),i=Object(s.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("a",{attrs:{href:"https://www.johnsnowlabs.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("John Snow Labs"),a("OutboundLink")],1),e._v(" accelerates data science and analytics teams, by providing clean, rich and current data sets for analysis. Our customers typically license between 50 and 500 data sets for a given project, so providing both data and metadata in a simple, standard format that is easily usable with a wide range of tools is important.")]),e._v(" "),a("p",[e._v("Each data set we license is curated by a domain expert, which then goes through both an automated DataOps platform and a manual review process. This is done in order to deal with a string of data challenges. First, it’s often hard to find the right data sets for a given problem. Second, data files come in different formats, and include dirty and missing data. Data types are inconsistent across different files, making it hard to join multiple data sets in one analysis. Null values, dates, currencies, units and identifiers are represented differently. Datasets aren’t updated on a standard or public schedule, which often requires manual labor to know when they’ve been updated. And then, data sets from different sources have different licenses - we use over 100 data sources which means well over 100 different"),a("br"),e._v("\ndata licenses that we help our clients be compliant with.")]),e._v(" "),a("p",[e._v("The most popular data format in which we deliver data is the Data Package "),a("sup",{staticClass:"footnote-ref"},[a("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(". 
Each of our datasets is available, among other formats, as a pair of data.csv and datapackage.json files, complying with the specs "),a("sup",{staticClass:"footnote-ref"},[a("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(". We currently provide over 900 data sets that leverage the Frictionless Data specifications.")]),e._v(" "),a("p",[e._v("Two years ago, when we were defining the product requirements and architecture, we researched six different standards for metadata definition over a few months. We found Frictionless Data as part of that research, and after careful consideration have decided to adopt it for all the datasets we curate. The Frictionless Data specifications were the simplest to implement, the simplest to explain to our customers, and enable immediate loading of data into the widest variety of analytical tools.")]),e._v(" "),a("p",[e._v("Our data curation guidelines have added more specific requirements, that are currently underspecified in the Frictionless Data specifications. For example, there are guidelines for dataset naming, keywords, length of the description, field naming, identifier field naming and types, and some of the properties supported for each field. Adding these to Frictionless Data would make it harder to comply with the specifications, but would also raise the quality bar of standard datasets; so it may be best to add them as recommendation.")]),e._v(" "),a("p",[e._v("Another area where the Frictionless Data specifications are worth expanding is more explicit definition of the properties of each data type - in particular geospatial data, timestamp data, identifiers, currencies and units. We have found a need to extend the type system and properties for each field’s type, in order to enable consistent mapping of schemas to different analytics tools that our customers use (Hadoop, Spark, MySQL, ElasticSearch, etc). 
We recommend adding these to the specifications.")]),e._v(" "),a("p",[e._v("We are working with "),a("a",{attrs:{href:"http://www.okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International"),a("OutboundLink")],1),e._v(" on open sourcing some of the libraries and tools we’re building. Internally, we are adding more automated validations, additional output file formats, and automated pipelines to load data into ElasticSearch"),a("sup",{staticClass:"footnote-ref"},[a("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(" and Kibana"),a("sup",{staticClass:"footnote-ref"},[a("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(", to enable interactive data discovery & visualization.")]),e._v(" "),a("p",[e._v("The core use case we see for Frictionless Data specs is making data ready for analytics. There is a lot of Open Data out there, but a lot of effort is still required to make it usable. This single use case expands into as many variations as there are BI & data management tools, so we have many years of work ahead of us to address this one core use case.")]),e._v(" "),a("hr",{staticClass:"footnotes-sep"}),e._v(" "),a("section",{staticClass:"footnotes"},[a("ol",{staticClass:"footnotes-list"},[a("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[a("p",[e._v("Data Package: "),a("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/data-package/"),a("OutboundLink")],1),e._v(" "),a("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),a("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[a("p",[e._v("Frictionless Data Specifications "),a("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("specs"),a("OutboundLink")],1),e._v(" "),a("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" 
"),a("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[a("p",[e._v("Elastic Search "),a("a",{attrs:{href:"https://www.elastic.co/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.elastic.co/"),a("OutboundLink")],1),e._v(" "),a("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" "),a("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[a("p",[e._v("kibana "),a("a",{attrs:{href:"https://www.elastic.co/products/kibana",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.elastic.co/products/kibana"),a("OutboundLink")],1),e._v(" "),a("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[64],{565:function(e,t,a){"use strict";a.r(t);var s=a(29),i=Object(s.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("a",{attrs:{href:"https://www.johnsnowlabs.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("John Snow Labs"),a("OutboundLink")],1),e._v(" accelerates data science and analytics teams, by providing clean, rich and current data sets for analysis. Our customers typically license between 50 and 500 data sets for a given project, so providing both data and metadata in a simple, standard format that is easily usable with a wide range of tools is important.")]),e._v(" "),a("p",[e._v("Each data set we license is curated by a domain expert, which then goes through both an automated DataOps platform and a manual review process. This is done in order to deal with a string of data challenges. First, it’s often hard to find the right data sets for a given problem. Second, data files come in different formats, and include dirty and missing data. Data types are inconsistent across different files, making it hard to join multiple data sets in one analysis. 
Null values, dates, currencies, units and identifiers are represented differently. Datasets aren’t updated on a standard or public schedule, which often requires manual labor to know when they’ve been updated. And then, data sets from different sources have different licenses - we use over 100 data sources which means well over 100 different"),a("br"),e._v("\ndata licenses that we help our clients be compliant with.")]),e._v(" "),a("p",[e._v("The most popular data format in which we deliver data is the Data Package "),a("sup",{staticClass:"footnote-ref"},[a("a",{attrs:{href:"#fn1",id:"fnref1"}},[e._v("[1]")])]),e._v(". Each of our datasets is available, among other formats, as a pair of data.csv and datapackage.json files, complying with the specs "),a("sup",{staticClass:"footnote-ref"},[a("a",{attrs:{href:"#fn2",id:"fnref2"}},[e._v("[2]")])]),e._v(". We currently provide over 900 data sets that leverage the Frictionless Data specifications.")]),e._v(" "),a("p",[e._v("Two years ago, when we were defining the product requirements and architecture, we researched six different standards for metadata definition over a few months. We found Frictionless Data as part of that research, and after careful consideration have decided to adopt it for all the datasets we curate. The Frictionless Data specifications were the simplest to implement, the simplest to explain to our customers, and enable immediate loading of data into the widest variety of analytical tools.")]),e._v(" "),a("p",[e._v("Our data curation guidelines have added more specific requirements, that are currently underspecified in the Frictionless Data specifications. For example, there are guidelines for dataset naming, keywords, length of the description, field naming, identifier field naming and types, and some of the properties supported for each field. 
Adding these to Frictionless Data would make it harder to comply with the specifications, but would also raise the quality bar of standard datasets; so it may be best to add them as recommendation.")]),e._v(" "),a("p",[e._v("Another area where the Frictionless Data specifications are worth expanding is more explicit definition of the properties of each data type - in particular geospatial data, timestamp data, identifiers, currencies and units. We have found a need to extend the type system and properties for each field’s type, in order to enable consistent mapping of schemas to different analytics tools that our customers use (Hadoop, Spark, MySQL, ElasticSearch, etc). We recommend adding these to the specifications.")]),e._v(" "),a("p",[e._v("We are working with "),a("a",{attrs:{href:"http://www.okfn.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International"),a("OutboundLink")],1),e._v(" on open sourcing some of the libraries and tools we’re building. Internally, we are adding more automated validations, additional output file formats, and automated pipelines to load data into ElasticSearch"),a("sup",{staticClass:"footnote-ref"},[a("a",{attrs:{href:"#fn3",id:"fnref3"}},[e._v("[3]")])]),e._v(" and Kibana"),a("sup",{staticClass:"footnote-ref"},[a("a",{attrs:{href:"#fn4",id:"fnref4"}},[e._v("[4]")])]),e._v(", to enable interactive data discovery & visualization.")]),e._v(" "),a("p",[e._v("The core use case we see for Frictionless Data specs is making data ready for analytics. There is a lot of Open Data out there, but a lot of effort is still required to make it usable. 
This single use case expands into as many variations as there are BI & data management tools, so we have many years of work ahead of us to address this one core use case.")]),e._v(" "),a("hr",{staticClass:"footnotes-sep"}),e._v(" "),a("section",{staticClass:"footnotes"},[a("ol",{staticClass:"footnotes-list"},[a("li",{staticClass:"footnote-item",attrs:{id:"fn1"}},[a("p",[e._v("Data Package: "),a("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://specs.frictionlessdata.io/data-package/"),a("OutboundLink")],1),e._v(" "),a("a",{staticClass:"footnote-backref",attrs:{href:"#fnref1"}},[e._v("↩︎")])])]),e._v(" "),a("li",{staticClass:"footnote-item",attrs:{id:"fn2"}},[a("p",[e._v("Frictionless Data Specifications "),a("a",{attrs:{href:"https://specs.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("specs"),a("OutboundLink")],1),e._v(" "),a("a",{staticClass:"footnote-backref",attrs:{href:"#fnref2"}},[e._v("↩︎")])])]),e._v(" "),a("li",{staticClass:"footnote-item",attrs:{id:"fn3"}},[a("p",[e._v("Elastic Search "),a("a",{attrs:{href:"https://www.elastic.co/",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.elastic.co/"),a("OutboundLink")],1),e._v(" "),a("a",{staticClass:"footnote-backref",attrs:{href:"#fnref3"}},[e._v("↩︎")])])]),e._v(" "),a("li",{staticClass:"footnote-item",attrs:{id:"fn4"}},[a("p",[e._v("kibana "),a("a",{attrs:{href:"https://www.elastic.co/products/kibana",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.elastic.co/products/kibana"),a("OutboundLink")],1),e._v(" "),a("a",{staticClass:"footnote-backref",attrs:{href:"#fnref4"}},[e._v("↩︎")])])])])])])}),[],!1,null,null,null);t.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/66.291b99fd.js b/assets/js/66.a6ffa49d.js similarity index 92% rename from assets/js/66.291b99fd.js rename to assets/js/66.a6ffa49d.js index 4b6185164..3a5caadc7 100644 --- 
a/assets/js/66.291b99fd.js +++ b/assets/js/66.a6ffa49d.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[66],{571:function(t,e,s){"use strict";s.r(e);var a=s(29),o=Object(a.a)({},(function(){var t=this.$createElement,e=this._self._c||t;return e("ContentSlotsDistributor",{attrs:{"slot-key":this.$parent.slotKey}},[e("p",[this._v("This blog post was "),e("a",{attrs:{href:"https://collectionsasdata.github.io/facet2/",target:"_blank",rel:"noopener noreferrer"}},[this._v("originally published as part of the Collections as Data Facets document collections"),e("OutboundLink")],1),this._v(" on the Always Already Computational - Collections as Data website."),e("br"),this._v(" "),e("iframe",{staticStyle:{width:"100%",height:"1000px"},attrs:{src:"https://collectionsasdata.github.io/facet2/"}},[e("br")])])])}),[],!1,null,null,null);e.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[66],{570:function(t,e,s){"use strict";s.r(e);var a=s(29),o=Object(a.a)({},(function(){var t=this.$createElement,e=this._self._c||t;return e("ContentSlotsDistributor",{attrs:{"slot-key":this.$parent.slotKey}},[e("p",[this._v("This blog post was "),e("a",{attrs:{href:"https://collectionsasdata.github.io/facet2/",target:"_blank",rel:"noopener noreferrer"}},[this._v("originally published as part of the Collections as Data Facets document collections"),e("OutboundLink")],1),this._v(" on the Always Already Computational - Collections as Data website."),e("br"),this._v(" "),e("iframe",{staticStyle:{width:"100%",height:"1000px"},attrs:{src:"https://collectionsasdata.github.io/facet2/"}},[e("br")])])])}),[],!1,null,null,null);e.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/67.3e7762b2.js b/assets/js/67.c668ed43.js similarity index 98% rename from assets/js/67.3e7762b2.js rename to assets/js/67.c668ed43.js index 261879b05..63916aa41 100644 --- a/assets/js/67.3e7762b2.js +++ b/assets/js/67.c668ed43.js @@ -1 
+1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[67],{573:function(t,a,e){"use strict";e.r(a);var r=e(29),o=Object(r.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("p",[t._v("This post provides you with a template for writing Frictionless Data tutorials. Specifically, tutorials of the form: "),e("strong",[t._v("How to do X thing using Y Frictionless Data tool")]),t._v(".")]),t._v(" "),e("h2",{attrs:{id:"introduction"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[t._v("#")]),t._v(" Introduction")]),t._v(" "),e("p",[t._v("You want to start introducting what you are doing e.g.")]),t._v(" "),e("blockquote",[e("p",[t._v("In this tutorial you’ll learn how to {do a thing using a tool} to {provide some benefit} (This first sentence may be inspired by a "),e("a",{attrs:{href:"http://frictionlessdata.io/user-stories/",target:"_blank",rel:"noopener noreferrer"}},[t._v("user story"),e("OutboundLink")],1),t._v(").")])]),t._v(" "),e("p",[t._v("Clearly state the objective of your tutorial in the title and then once again in more detail at the very beginning of the tutorial. 
This gives readers an idea of what to expect and helps them determine if they want to continue reading.")]),t._v(" "),e("blockquote",[e("p",[e("strong",[t._v("Tutorial time")]),t._v(" : 20 minutes")]),t._v(" "),e("p",[e("strong",[t._v("Audience")]),t._v(" : Beginner Data Packagers {user role} with {skill level}.")])]),t._v(" "),e("p",[t._v("Then continue like this:")]),t._v(" "),e("blockquote",[e("h2",{attrs:{id:"what-you-ll-need"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#what-you-ll-need"}},[t._v("#")]),t._v(" What you’ll need")]),t._v(" "),e("p",[t._v("You’ll need a basic understanding of:")]),t._v(" "),e("ul",[e("li",[t._v("JSON syntax")]),t._v(" "),e("li",[t._v("how to run commands in Terminal")])]),t._v(" "),e("p",[t._v("To complete this tutorial you’ll need:")]),t._v(" "),e("ul",[e("li",[t._v("a computer (macOS or Windows) with access to the internet")]),t._v(" "),e("li",[t._v("an account on "),e("a",{attrs:{href:"http://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),e("OutboundLink")],1),t._v(" ("),e("a",{attrs:{href:"https://datahub.ckan.io/about",target:"_blank",rel:"noopener noreferrer"}},[t._v("here’s how"),e("OutboundLink")],1),t._v(")")])]),t._v(" "),e("h2",{attrs:{id:"introduction-2"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#introduction-2"}},[t._v("#")]),t._v(" Introduction")]),t._v(" "),e("p",[t._v("Introduce any basic concepts.")]),t._v(" "),e("p",[t._v("To {achieve the benefit} we’ll guide you through these steps:")]),t._v(" "),e("ol",[e("li",[e("a",{attrs:{href:"#1-import-the-data"}},[t._v("import the data")])]),t._v(" "),e("li",[e("a",{attrs:{href:"#2-generate-a-table-schema"}},[t._v("generate a table schema")])]),t._v(" "),e("li",[e("a",{attrs:{href:"#3-create-a-data-package"}},[t._v("create a data package")])]),t._v(" "),e("li",[e("a",{attrs:{href:"#4-publish-the-data-package"}},[t._v("publish the data package")])])]),t._v(" 
"),e("h3",{attrs:{id:"_1-import-the-data"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#_1-import-the-data"}},[t._v("#")]),t._v(" 1. Import the data")]),t._v(" "),e("p",[t._v("Write in a friendly, conversational style. Using humor is fine.")]),t._v(" "),e("h3",{attrs:{id:"_2-generate-a-table-schema"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#_2-generate-a-table-schema"}},[t._v("#")]),t._v(" 2. Generate a table schema")]),t._v(" "),e("p",[t._v("Include pictures. Highlight key items on screenshots. Make sure pictures can be view in fullsize.")]),t._v(" "),e("h3",{attrs:{id:"_3-create-a-data-package"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#_3-create-a-data-package"}},[t._v("#")]),t._v(" 3. Create a data package")]),t._v(" "),e("p",[t._v("Explain why something must be done, not just how to do it.")]),t._v(" "),e("h3",{attrs:{id:"_4-publish-the-data-package"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#_4-publish-the-data-package"}},[t._v("#")]),t._v(" 4. Publish the data package")]),t._v(" "),e("p",[t._v("In this step you’ll…")]),t._v(" "),e("h2",{attrs:{id:"congratulations"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#congratulations"}},[t._v("#")]),t._v(" Congratulations")]),t._v(" "),e("p",[t._v("In 4 simple steps you’ve learned how {do a thing}. 
With this new knowledge, now you can {achieve a benefit}.")]),t._v(" "),e("p",[t._v("Now go {do something}")]),t._v(" "),e("h2",{attrs:{id:"learn-more"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#learn-more"}},[t._v("#")]),t._v(" Learn more")]),t._v(" "),e("h3",{attrs:{id:"related-guides"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#related-guides"}},[t._v("#")]),t._v(" Related Guides")]),t._v(" "),e("ul",[e("li",[t._v("Tabular Data Package guide - "),e("a",{attrs:{href:"http://frictionlessdata.io/docs/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://frictionlessdata.io/docs/tabular-data-package/"),e("OutboundLink")],1)])]),t._v(" "),e("h3",{attrs:{id:"references"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#references"}},[t._v("#")]),t._v(" References")]),t._v(" "),e("ul",[e("li",[e("RouterLink",{attrs:{to:"/specs/tabular-data-package/"}},[t._v("Tabular Data Package specification")])],1)])])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[67],{572:function(t,a,e){"use strict";e.r(a);var r=e(29),o=Object(r.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("p",[t._v("This post provides you with a template for writing Frictionless Data tutorials. 
Specifically, tutorials of the form: "),e("strong",[t._v("How to do X thing using Y Frictionless Data tool")]),t._v(".")]),t._v(" "),e("h2",{attrs:{id:"introduction"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#introduction"}},[t._v("#")]),t._v(" Introduction")]),t._v(" "),e("p",[t._v("You want to start introducting what you are doing e.g.")]),t._v(" "),e("blockquote",[e("p",[t._v("In this tutorial you’ll learn how to {do a thing using a tool} to {provide some benefit} (This first sentence may be inspired by a "),e("a",{attrs:{href:"http://frictionlessdata.io/user-stories/",target:"_blank",rel:"noopener noreferrer"}},[t._v("user story"),e("OutboundLink")],1),t._v(").")])]),t._v(" "),e("p",[t._v("Clearly state the objective of your tutorial in the title and then once again in more detail at the very beginning of the tutorial. This gives readers an idea of what to expect and helps them determine if they want to continue reading.")]),t._v(" "),e("blockquote",[e("p",[e("strong",[t._v("Tutorial time")]),t._v(" : 20 minutes")]),t._v(" "),e("p",[e("strong",[t._v("Audience")]),t._v(" : Beginner Data Packagers {user role} with {skill level}.")])]),t._v(" "),e("p",[t._v("Then continue like this:")]),t._v(" "),e("blockquote",[e("h2",{attrs:{id:"what-you-ll-need"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#what-you-ll-need"}},[t._v("#")]),t._v(" What you’ll need")]),t._v(" "),e("p",[t._v("You’ll need a basic understanding of:")]),t._v(" "),e("ul",[e("li",[t._v("JSON syntax")]),t._v(" "),e("li",[t._v("how to run commands in Terminal")])]),t._v(" "),e("p",[t._v("To complete this tutorial you’ll need:")]),t._v(" "),e("ul",[e("li",[t._v("a computer (macOS or Windows) with access to the internet")]),t._v(" "),e("li",[t._v("an account on "),e("a",{attrs:{href:"http://datahub.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("datahub.io"),e("OutboundLink")],1),t._v(" ("),e("a",{attrs:{href:"https://datahub.ckan.io/about",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("here’s how"),e("OutboundLink")],1),t._v(")")])]),t._v(" "),e("h2",{attrs:{id:"introduction-2"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#introduction-2"}},[t._v("#")]),t._v(" Introduction")]),t._v(" "),e("p",[t._v("Introduce any basic concepts.")]),t._v(" "),e("p",[t._v("To {achieve the benefit} we’ll guide you through these steps:")]),t._v(" "),e("ol",[e("li",[e("a",{attrs:{href:"#1-import-the-data"}},[t._v("import the data")])]),t._v(" "),e("li",[e("a",{attrs:{href:"#2-generate-a-table-schema"}},[t._v("generate a table schema")])]),t._v(" "),e("li",[e("a",{attrs:{href:"#3-create-a-data-package"}},[t._v("create a data package")])]),t._v(" "),e("li",[e("a",{attrs:{href:"#4-publish-the-data-package"}},[t._v("publish the data package")])])]),t._v(" "),e("h3",{attrs:{id:"_1-import-the-data"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#_1-import-the-data"}},[t._v("#")]),t._v(" 1. Import the data")]),t._v(" "),e("p",[t._v("Write in a friendly, conversational style. Using humor is fine.")]),t._v(" "),e("h3",{attrs:{id:"_2-generate-a-table-schema"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#_2-generate-a-table-schema"}},[t._v("#")]),t._v(" 2. Generate a table schema")]),t._v(" "),e("p",[t._v("Include pictures. Highlight key items on screenshots. Make sure pictures can be view in fullsize.")]),t._v(" "),e("h3",{attrs:{id:"_3-create-a-data-package"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#_3-create-a-data-package"}},[t._v("#")]),t._v(" 3. Create a data package")]),t._v(" "),e("p",[t._v("Explain why something must be done, not just how to do it.")]),t._v(" "),e("h3",{attrs:{id:"_4-publish-the-data-package"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#_4-publish-the-data-package"}},[t._v("#")]),t._v(" 4. 
Publish the data package")]),t._v(" "),e("p",[t._v("In this step you’ll…")]),t._v(" "),e("h2",{attrs:{id:"congratulations"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#congratulations"}},[t._v("#")]),t._v(" Congratulations")]),t._v(" "),e("p",[t._v("In 4 simple steps you’ve learned how {do a thing}. With this new knowledge, now you can {achieve a benefit}.")]),t._v(" "),e("p",[t._v("Now go {do something}")]),t._v(" "),e("h2",{attrs:{id:"learn-more"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#learn-more"}},[t._v("#")]),t._v(" Learn more")]),t._v(" "),e("h3",{attrs:{id:"related-guides"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#related-guides"}},[t._v("#")]),t._v(" Related Guides")]),t._v(" "),e("ul",[e("li",[t._v("Tabular Data Package guide - "),e("a",{attrs:{href:"http://frictionlessdata.io/docs/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://frictionlessdata.io/docs/tabular-data-package/"),e("OutboundLink")],1)])]),t._v(" "),e("h3",{attrs:{id:"references"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#references"}},[t._v("#")]),t._v(" References")]),t._v(" "),e("ul",[e("li",[e("RouterLink",{attrs:{to:"/specs/tabular-data-package/"}},[t._v("Tabular Data Package specification")])],1)])])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/7.3954119c.js b/assets/js/7.340e27ee.js similarity index 99% rename from assets/js/7.3954119c.js rename to assets/js/7.340e27ee.js index 06c6f9b8b..a6e682867 100644 --- a/assets/js/7.3954119c.js +++ b/assets/js/7.340e27ee.js @@ -1 +1 @@ 
-(window.webpackJsonp=window.webpackJsonp||[]).push([[7],{461:function(t,a,s){t.exports=s.p+"assets/img/goodtables-screenshot.48a106ce.png"},462:function(t,a,s){t.exports=s.p+"assets/img/goodtables-provide-data.78adcaea.png"},463:function(t,a){t.exports=""},464:function(t,a,s){t.exports=s.p+"assets/img/goodtables-valid.5e65080f.png"},465:function(t,a,s){t.exports=s.p+"assets/img/goodtables-invalid.d1ae3ac6.png"},466:function(t,a,s){t.exports=s.p+"assets/img/goodtables-provide-schema.3e7cdcb9.png"},467:function(t,a,s){t.exports=s.p+"assets/img/goodtables-continuous-validation.2d4abd27.png"},601:function(t,a,s){"use strict";s.r(a);var e=s(29),o=Object(e.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("p",[t._v("Tabular data (e.g. data stored in "),e("RouterLink",{attrs:{to:"/blog/2018/07/09/csv/"}},[t._v("CSV")]),t._v(" and Excel worksheets) is one of the most common forms of data available on the web. This guide will walk through validating tabular data using Frictionless Data software.")],1),t._v(" "),e("p",[t._v("This guide show how you can validate your tabular data and check both:")]),t._v(" "),e("ul",[e("li",[t._v("Structure: are there too many rows or columns in some places?")]),t._v(" "),e("li",[t._v("Schema: does the data fit its schema. Are the values in the date column actually dates? Are all the numbers greater than zero?")])]),t._v(" "),e("p",[t._v("We will walk through two methods of performing validation:")]),t._v(" "),e("ul",[e("li",[t._v("Web service: an online service called "),e("strong",[t._v("goodtables")]),t._v(". This option requires no technical knowledge or expertise.")]),t._v(" "),e("li",[t._v("Using the "),e("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("Python goodtables library"),e("OutboundLink")],1),t._v(". 
This allows you full control over the validation process but requires knowledge of Python.")])]),t._v(" "),e("h2",{attrs:{id:"goodtables"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#goodtables"}},[t._v("#")]),t._v(" goodtables")]),t._v(" "),e("p",[e("a",{attrs:{href:"http://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables"),e("OutboundLink")],1),t._v(" is a free, open-source, hosted service for validating tabular data. goodtables checks your data for its "),e("em",[t._v("structure")]),t._v(", and, optionally, its adherence to a specified "),e("em",[t._v("schema")]),t._v(". Where the latter fails, goodtables highlights content errors so you can fix them speedily.")]),t._v(" "),e("p",[t._v("goodtables will give quick and simple feedback on where your tabular data may not yet be quite perfect.")]),t._v(" "),e("p",[e("img",{attrs:{src:s(461),alt:"goodtables screenshot"}})]),t._v(" "),e("p",[t._v("To get started with one-off validation of your tabular datasets, use "),e("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),e("OutboundLink")],1),t._v(". All you need to do is upload or provide a link to a CSV file and hit the “Validate” button.")]),t._v(" "),e("p",[e("img",{attrs:{src:s(462),alt:"goodtables Provide URL"}})]),t._v(" "),e("p",[e("img",{attrs:{src:s(463),alt:"goodtables Validate button"}})]),t._v(" "),e("p",[t._v("If your data is structurally valid, you should receive the following result:")]),t._v(" "),e("p",[e("img",{attrs:{src:s(464),alt:"goodtables Valid"}})]),t._v(" "),e("p",[t._v("If not…")]),t._v(" "),e("p",[e("img",{attrs:{src:s(465),alt:"goodtables Invalid"}})]),t._v(" "),e("p",[t._v("The report should highlight the structural issues found in your data for correction. 
For instance, a poorly structured tabular dataset may consist of a header row with too many (or too few) columns when compared to of data rows with an equal amount of columns.")]),t._v(" "),e("p",[t._v("You can also provide a schema for your tabular data defined using JSON Table Schema.")]),t._v(" "),e("p",[e("img",{attrs:{src:s(466),alt:"goodtables Provide Schema"}})]),t._v(" "),e("p",[t._v("Briefly, the format allows users to specify not only the types of information within each column in a tabular dataset, but also expected values. For more information, see the "),e("RouterLink",{attrs:{to:"/introduction/"}},[t._v("introduction")]),t._v(" or "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[t._v("the full standard"),e("OutboundLink")],1),t._v(".")],1),t._v(" "),e("h2",{attrs:{id:"python-goodtables"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#python-goodtables"}},[t._v("#")]),t._v(" Python + goodtables")]),t._v(" "),e("p",[t._v("goodtables is also available as a Python library. 
The following short snippets demonstrate examples of loading and validating data in a file called "),e("code",[t._v("data.csv")]),t._v("(and in the second example, validating the same data file against "),e("code",[t._v("schema.json")]),t._v(")")]),t._v(" "),e("h3",{attrs:{id:"validating-structure"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#validating-structure"}},[t._v("#")]),t._v(" Validating Structure")]),t._v(" "),e("div",{staticClass:"language-python extra-class"},[e("pre",{pre:!0,attrs:{class:"language-python"}},[e("code",[e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" goodtables "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" validate\n\nreport "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" validate"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'data.csv'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'valid'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'table-count'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'error-count'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'tables'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token 
number"}},[t._v("0")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'valid'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'tables'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'source'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'tables'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'errors'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'code'")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("]")]),t._v("\n")])])]),e("h3",{attrs:{id:"validating-schema"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#validating-schema"}},[t._v("#")]),t._v(" Validating Schema")]),t._v(" "),e("div",{staticClass:"language-python extra-class"},[e("pre",{pre:!0,attrs:{class:"language-python"}},[e("code",[e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" goodtables "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" validate\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# sync source/schema fields order")]),t._v("\nreport "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" validate"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'data.csv'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" schema"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'schema.json'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" order_fields"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("True")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("\n")])])]),e("h2",{attrs:{id:"continuous-data-validation"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#continuous-data-validation"}},[t._v("#")]),t._v(" Continuous Data Validation")]),t._v(" "),e("p",[t._v("In a bid to streamline the process of data validation and ensure seamless integration is possible in different publishing workflows, we have set up a continuous data validation hosted service that builds on top of Frictionless Data libraries. 
"),e("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables.io"),e("OutboundLink")],1),t._v(" provides support for different backends. At this time, users can use it to check any datasets hosted on GitHub and Amazon S3 buckets, automatically running validation against data files every time they are updated, and providing a user friendly report of any issues found.")]),t._v(" "),e("p",[e("img",{attrs:{src:s(467),alt:"Data Valid"}})]),t._v(" "),e("p",[t._v("Start your continuous data validation here: "),e("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://goodtables.io"),e("OutboundLink")],1)]),t._v(" "),e("p",[t._v("Blog post on goodtables python library and goodtables web service: "),e("a",{attrs:{href:"http://okfnlabs.org/blog/2017/05/22/introducing-the-new-goodtables-library-and-goodtablesio.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://okfnlabs.org/blog/2017/05/22/introducing-the-new-goodtables-library-and-goodtablesio.html"),e("OutboundLink")],1)]),t._v(" "),e("p",[t._v("See the "),e("code",[t._v("README.md")]),t._v(" for more information.")]),t._v(" "),e("p",[t._v("Find more examples on validating tabular data in the "),e("a",{attrs:{href:"/tag/field-guide"}},[t._v("Frictionless Data Field Guide")])])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file 
+(window.webpackJsonp=window.webpackJsonp||[]).push([[7],{461:function(t,a,s){t.exports=s.p+"assets/img/goodtables-screenshot.48a106ce.png"},462:function(t,a,s){t.exports=s.p+"assets/img/goodtables-provide-data.78adcaea.png"},463:function(t,a){t.exports=""},464:function(t,a,s){t.exports=s.p+"assets/img/goodtables-valid.5e65080f.png"},465:function(t,a,s){t.exports=s.p+"assets/img/goodtables-invalid.d1ae3ac6.png"},466:function(t,a,s){t.exports=s.p+"assets/img/goodtables-provide-schema.3e7cdcb9.png"},467:function(t,a,s){t.exports=s.p+"assets/img/goodtables-continuous-validation.2d4abd27.png"},600:function(t,a,s){"use strict";s.r(a);var e=s(29),o=Object(e.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("p",[t._v("Tabular data (e.g. data stored in "),e("RouterLink",{attrs:{to:"/blog/2018/07/09/csv/"}},[t._v("CSV")]),t._v(" and Excel worksheets) is one of the most common forms of data available on the web. This guide will walk through validating tabular data using Frictionless Data software.")],1),t._v(" "),e("p",[t._v("This guide show how you can validate your tabular data and check both:")]),t._v(" "),e("ul",[e("li",[t._v("Structure: are there too many rows or columns in some places?")]),t._v(" "),e("li",[t._v("Schema: does the data fit its schema. Are the values in the date column actually dates? Are all the numbers greater than zero?")])]),t._v(" "),e("p",[t._v("We will walk through two methods of performing validation:")]),t._v(" "),e("ul",[e("li",[t._v("Web service: an online service called "),e("strong",[t._v("goodtables")]),t._v(". This option requires no technical knowledge or expertise.")]),t._v(" "),e("li",[t._v("Using the "),e("a",{attrs:{href:"https://github.com/frictionlessdata/goodtables-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("Python goodtables library"),e("OutboundLink")],1),t._v(". 
This allows you full control over the validation process but requires knowledge of Python.")])]),t._v(" "),e("h2",{attrs:{id:"goodtables"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#goodtables"}},[t._v("#")]),t._v(" goodtables")]),t._v(" "),e("p",[e("a",{attrs:{href:"http://goodtables.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables"),e("OutboundLink")],1),t._v(" is a free, open-source, hosted service for validating tabular data. goodtables checks your data for its "),e("em",[t._v("structure")]),t._v(", and, optionally, its adherence to a specified "),e("em",[t._v("schema")]),t._v(". Where the latter fails, goodtables highlights content errors so you can fix them speedily.")]),t._v(" "),e("p",[t._v("goodtables will give quick and simple feedback on where your tabular data may not yet be quite perfect.")]),t._v(" "),e("p",[e("img",{attrs:{src:s(461),alt:"goodtables screenshot"}})]),t._v(" "),e("p",[t._v("To get started with one-off validation of your tabular datasets, use "),e("a",{attrs:{href:"http://try.goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("try.goodtables.io"),e("OutboundLink")],1),t._v(". All you need to do is upload or provide a link to a CSV file and hit the “Validate” button.")]),t._v(" "),e("p",[e("img",{attrs:{src:s(462),alt:"goodtables Provide URL"}})]),t._v(" "),e("p",[e("img",{attrs:{src:s(463),alt:"goodtables Validate button"}})]),t._v(" "),e("p",[t._v("If your data is structurally valid, you should receive the following result:")]),t._v(" "),e("p",[e("img",{attrs:{src:s(464),alt:"goodtables Valid"}})]),t._v(" "),e("p",[t._v("If not…")]),t._v(" "),e("p",[e("img",{attrs:{src:s(465),alt:"goodtables Invalid"}})]),t._v(" "),e("p",[t._v("The report should highlight the structural issues found in your data for correction. 
For instance, a poorly structured tabular dataset may consist of a header row with too many (or too few) columns when compared to of data rows with an equal amount of columns.")]),t._v(" "),e("p",[t._v("You can also provide a schema for your tabular data defined using JSON Table Schema.")]),t._v(" "),e("p",[e("img",{attrs:{src:s(466),alt:"goodtables Provide Schema"}})]),t._v(" "),e("p",[t._v("Briefly, the format allows users to specify not only the types of information within each column in a tabular dataset, but also expected values. For more information, see the "),e("RouterLink",{attrs:{to:"/introduction/"}},[t._v("introduction")]),t._v(" or "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[t._v("the full standard"),e("OutboundLink")],1),t._v(".")],1),t._v(" "),e("h2",{attrs:{id:"python-goodtables"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#python-goodtables"}},[t._v("#")]),t._v(" Python + goodtables")]),t._v(" "),e("p",[t._v("goodtables is also available as a Python library. 
The following short snippets demonstrate examples of loading and validating data in a file called "),e("code",[t._v("data.csv")]),t._v("(and in the second example, validating the same data file against "),e("code",[t._v("schema.json")]),t._v(")")]),t._v(" "),e("h3",{attrs:{id:"validating-structure"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#validating-structure"}},[t._v("#")]),t._v(" Validating Structure")]),t._v(" "),e("div",{staticClass:"language-python extra-class"},[e("pre",{pre:!0,attrs:{class:"language-python"}},[e("code",[e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" goodtables "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" validate\n\nreport "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" validate"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'data.csv'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'valid'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'table-count'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'error-count'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'tables'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token 
number"}},[t._v("0")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'valid'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'tables'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'source'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\nreport"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'tables'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'errors'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'code'")]),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("]")]),t._v("\n")])])]),e("h3",{attrs:{id:"validating-schema"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#validating-schema"}},[t._v("#")]),t._v(" Validating Schema")]),t._v(" "),e("div",{staticClass:"language-python extra-class"},[e("pre",{pre:!0,attrs:{class:"language-python"}},[e("code",[e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("from")]),t._v(" goodtables "),e("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" validate\n\n"),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# sync source/schema fields order")]),t._v("\nreport "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" validate"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'data.csv'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" schema"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'schema.json'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" order_fields"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),e("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("True")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("\n")])])]),e("h2",{attrs:{id:"continuous-data-validation"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#continuous-data-validation"}},[t._v("#")]),t._v(" Continuous Data Validation")]),t._v(" "),e("p",[t._v("In a bid to streamline the process of data validation and ensure seamless integration is possible in different publishing workflows, we have set up a continuous data validation hosted service that builds on top of Frictionless Data libraries. 
"),e("a",{attrs:{href:"http://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("goodtables.io"),e("OutboundLink")],1),t._v(" provides support for different backends. At this time, users can use it to check any datasets hosted on GitHub and Amazon S3 buckets, automatically running validation against data files every time they are updated, and providing a user friendly report of any issues found.")]),t._v(" "),e("p",[e("img",{attrs:{src:s(467),alt:"Data Valid"}})]),t._v(" "),e("p",[t._v("Start your continuous data validation here: "),e("a",{attrs:{href:"https://goodtables.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("https://goodtables.io"),e("OutboundLink")],1)]),t._v(" "),e("p",[t._v("Blog post on goodtables python library and goodtables web service: "),e("a",{attrs:{href:"http://okfnlabs.org/blog/2017/05/22/introducing-the-new-goodtables-library-and-goodtablesio.html",target:"_blank",rel:"noopener noreferrer"}},[t._v("http://okfnlabs.org/blog/2017/05/22/introducing-the-new-goodtables-library-and-goodtablesio.html"),e("OutboundLink")],1)]),t._v(" "),e("p",[t._v("See the "),e("code",[t._v("README.md")]),t._v(" for more information.")]),t._v(" "),e("p",[t._v("Find more examples on validating tabular data in the "),e("a",{attrs:{href:"/tag/field-guide"}},[t._v("Frictionless Data Field Guide")])])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/72.58029e7d.js b/assets/js/72.c80b57c7.js similarity index 98% rename from assets/js/72.58029e7d.js rename to assets/js/72.c80b57c7.js index c570b451b..8757a6810 100644 --- a/assets/js/72.58029e7d.js +++ b/assets/js/72.c80b57c7.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[72],{581:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("This grantee profile features Daniel 
Fireman for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")]),e._v(" "),a("p",[e._v("I was born in "),a("a",{attrs:{href:"https://www.google.com/search?site=&tbm=isch&source=hp&biw=1600&bih=783&q=Macei%C3%B3&oq=Macei%C3%B3&gs_l=img.3..0l7j0i30k1l3.707.4892.0.5214.9.7.0.0.0.0.245.904.0j1j3.4.0....0...1.1.64.img..5.4.903.0..35i39k1.p1SYqvZtcYw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Maceió"),a("OutboundLink")],1),e._v(", a sunny coastal city in the Northeast of Brazil. It was 20th century still when I had a first contact with an Intel 80386 and installed Conectiva Linux Guarani 3.0. A lot has happened since, for instance, a bachelor’s degree in Computer Science at UFCG after three years as a research assistant in the Distributed Systems Lab (LSD). It was already the 21st century when I realized that distributed and scalable systems were the way to go. I kept on studying the field and pursued a MSc at UFMG. From there I joined Google and spent 6 happy years working at multiple offices (NYC, ZRH, BHZ). I’ve got the chance to work on a myriad of projects, ranging from social networks to Google’s default Java HTTP/RPC server framework. Currently, I’m back to UFCG doing a Ph.D. in cloud computing performance. It is easy to find me at hackathons and other efforts to increase transparency of public data. 
I have also been busy working on projects like "),a("a",{attrs:{href:"http://www.madrid.org/cs/Satellite?pagename=PortalContratacion/Page/PCON_home",target:"_blank",rel:"noopener noreferrer"}},[e._v("contratospublicos.info"),a("OutboundLink")],1),e._v(" and Frictionless Data, using Go to improve data transparency in Brazil and around the world.")]),e._v(" "),a("p",[e._v("I started following "),a("a",{attrs:{href:"https://twitter.com/OKFN",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International (OKI) on Twitter"),a("OutboundLink")],1),e._v(" after watching a talk from "),a("a",{attrs:{href:"https://github.com/vitorbaptista",target:"_blank",rel:"noopener noreferrer"}},[e._v("Vitor Baptista"),a("OutboundLink")],1),e._v(" at UFCG. I learnt about Frictionless Data from posts by OKI and liked the overall idea a lot. I have been a Golang enthusiast for a while now, but I hadn’t thought of applying to the fund until I had a quick chat with "),a("a",{attrs:{href:"https://github.com/nazareno",target:"_blank",rel:"noopener noreferrer"}},[e._v("Nazareno Andrade"),a("OutboundLink")],1),e._v(" that started with Golang and ended with: “what about the "),a("a",{attrs:{href:"https://toolfund.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Tool Fund"),a("OutboundLink")],1),e._v("?”")]),e._v(" "),a("p",[e._v("Go has a lot to deliver in terms of approximating simplicity of reading/writing, correctness, and performance. 
I believe bringing the experience and solid specifications of Frictionless Data to the Go ecosystem will not only make data description, validation and processing easier and faster, but also help to decrease the distance between data analysis/processing and production serving systems, resulting in simpler and more solid infrastructure.")]),e._v(" "),a("p",[e._v("In the coming weeks, I hope to use the Tool Fund grant I received to bring Go’s performance and concurrency capabilities to data processing and to have a set of tools distributed as standalone and multi-platform binaries which are very easy to download and install. I am currently working on my Ph.D. and one pitfall I have come across is the use of one environment/system to collect/generate data and another to process it. I will be working to alleviate this issue in order to make it easier to process tabular data in Go.")]),e._v(" "),a("p",[e._v("From the developer’s perspective, it is really great to use open source software. This is especially true when the community around the software fosters its usage and welcomes contributors. That ends up increasing the overall quality of the software, which benefits all users.")]),e._v(" "),a("p",[e._v("The source code will be hosted at GitHub’s "),a("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-go",target:"_blank",rel:"noopener noreferrer"}},[e._v("tableschema-go"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-go",target:"_blank",rel:"noopener noreferrer"}},[e._v("datapackage-go"),a("OutboundLink")],1),e._v(" repositories. 
We are going to use issues to track development progress and next steps.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[72],{582:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("This grantee profile features Daniel Fireman for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")]),e._v(" "),a("p",[e._v("I was born in "),a("a",{attrs:{href:"https://www.google.com/search?site=&tbm=isch&source=hp&biw=1600&bih=783&q=Macei%C3%B3&oq=Macei%C3%B3&gs_l=img.3..0l7j0i30k1l3.707.4892.0.5214.9.7.0.0.0.0.245.904.0j1j3.4.0....0...1.1.64.img..5.4.903.0..35i39k1.p1SYqvZtcYw",target:"_blank",rel:"noopener noreferrer"}},[e._v("Maceió"),a("OutboundLink")],1),e._v(", a sunny coastal city in the Northeast of Brazil. It was 20th century still when I had a first contact with an Intel 80386 and installed Conectiva Linux Guarani 3.0. A lot has happened since, for instance, a bachelor’s degree in Computer Science at UFCG after three years as a research assistant in the Distributed Systems Lab (LSD). It was already the 21st century when I realized that distributed and scalable systems were the way to go. I kept on studying the field and pursued a MSc at UFMG. From there I joined Google and spent 6 happy years working at multiple offices (NYC, ZRH, BHZ). I’ve got the chance to work on a myriad of projects, ranging from social networks to Google’s default Java HTTP/RPC server framework. Currently, I’m back to UFCG doing a Ph.D. in cloud computing performance. It is easy to find me at hackathons and other efforts to increase transparency of public data. 
I have also been busy working on projects like "),a("a",{attrs:{href:"http://www.madrid.org/cs/Satellite?pagename=PortalContratacion/Page/PCON_home",target:"_blank",rel:"noopener noreferrer"}},[e._v("contratospublicos.info"),a("OutboundLink")],1),e._v(" and Frictionless Data, using Go to improve data transparency in Brazil and around the world.")]),e._v(" "),a("p",[e._v("I started following "),a("a",{attrs:{href:"https://twitter.com/OKFN",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge International (OKI) on Twitter"),a("OutboundLink")],1),e._v(" after watching a talk from "),a("a",{attrs:{href:"https://github.com/vitorbaptista",target:"_blank",rel:"noopener noreferrer"}},[e._v("Vitor Baptista"),a("OutboundLink")],1),e._v(" at UFCG. I learnt about Frictionless Data from posts by OKI and liked the overall idea a lot. I have been a Golang enthusiast for a while now, but I hadn’t thought of applying to the fund until I had a quick chat with "),a("a",{attrs:{href:"https://github.com/nazareno",target:"_blank",rel:"noopener noreferrer"}},[e._v("Nazareno Andrade"),a("OutboundLink")],1),e._v(" that started with Golang and ended with: “what about the "),a("a",{attrs:{href:"https://toolfund.frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data Tool Fund"),a("OutboundLink")],1),e._v("?”")]),e._v(" "),a("p",[e._v("Go has a lot to deliver in terms of approximating simplicity of reading/writing, correctness, and performance. 
I believe bringing the experience and solid specifications of Frictionless Data to the Go ecosystem will not only make data description, validation and processing easier and faster, but also help to decrease the distance between data analysis/processing and production serving systems, resulting in simpler and more solid infrastructure.")]),e._v(" "),a("p",[e._v("In the coming weeks, I hope to use the Tool Fund grant I received to bring Go’s performance and concurrency capabilities to data processing and to have a set of tools distributed as standalone and multi-platform binaries which are very easy to download and install. I am currently working on my Ph.D. and one pitfall I have come across is the use of one environment/system to collect/generate data and another to process it. I will be working to alleviate this issue in order to make it easier to process tabular data in Go.")]),e._v(" "),a("p",[e._v("From the developer’s perspective, it is really great to use open source software. This is especially true when the community around the software fosters its usage and welcomes contributors. That ends up increasing the overall quality of the software, which benefits all users.")]),e._v(" "),a("p",[e._v("The source code will be hosted at GitHub’s "),a("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-go",target:"_blank",rel:"noopener noreferrer"}},[e._v("tableschema-go"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-go",target:"_blank",rel:"noopener noreferrer"}},[e._v("datapackage-go"),a("OutboundLink")],1),e._v(" repositories. 
We are going to use issues to track development progress and next steps.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/73.39edf2a1.js b/assets/js/73.bff5f5c0.js similarity index 99% rename from assets/js/73.39edf2a1.js rename to assets/js/73.bff5f5c0.js index ae7e45e01..d346124ee 100644 --- a/assets/js/73.39edf2a1.js +++ b/assets/js/73.bff5f5c0.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[73],{590:function(t,a,e){"use strict";e.r(a);var s=e(29),n=Object(s.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("p",[e("a",{attrs:{href:"http://okfn.gr/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Open Knowledge Greece"),e("OutboundLink")],1),t._v(" was one of 2017’s "),e("a",{attrs:{href:"https://toolfund.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Tool Fund"),e("OutboundLink")],1),t._v(" grantees tasked with extending implementation of core Frictionless Data libraries in R programming language. You can read more about this in "),e("a",{attrs:{href:"https://frictionlessdata.io/articles/open-knowledge-greece/",target:"_blank",rel:"noopener noreferrer"}},[t._v("their grantee profile"),e("OutboundLink")],1),t._v(". 
In this tutorial, "),e("a",{attrs:{href:"https://gr.linkedin.com/in/kleanthis-koupidis-8348b88b",target:"_blank",rel:"noopener noreferrer"}},[t._v("Kleanthis Koupidis"),e("OutboundLink")],1),t._v(", a Data Scientist and Statistician at Open Knowledge Greece, explains how to create Data Packages in R.")]),t._v(" "),e("h2",{attrs:{id:"creating-data-packages-in-r"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#creating-data-packages-in-r"}},[t._v("#")]),t._v(" Creating Data Packages in R")]),t._v(" "),e("p",[t._v("This tutorial will show you how to install the R library for working with Data Packages and Table Schema, load a CSV file, infer its schema, and write a Tabular Data Package.")]),t._v(" "),e("h2",{attrs:{id:"load"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#load"}},[t._v("#")]),t._v(" Load")]),t._v(" "),e("p",[t._v("For this tutorial, we will need the Data Package R library ("),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-r",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage.r"),e("OutboundLink")],1),t._v(")."),e("br"),t._v("\nYou can start using the library by loading "),e("code",[t._v("datapackage.r")]),t._v(".")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" library"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("datapackage.r"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("You can add useful metadata by adding keys to metadata dict attribute. Below, we are adding the required "),e("code",[t._v("name")]),t._v(" key as well as a human-readable "),e("code",[t._v("title")]),t._v(" key. For the keys supported, please consult the full "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package spec"),e("OutboundLink")],1),t._v(". 
Note, we will be creating the required "),e("code",[t._v("resources")]),t._v(" key further down below.")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" dataPackage "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Package.load"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("descriptor"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'name'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'period-table'")]),t._v("\n dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("descriptor"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'title'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Periodic Table'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# commit the changes to Package class")]),t._v("\n dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("commit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("## [1] TRUE")]),t._v("\n")])])]),e("h2",{attrs:{id:"infer-a-csv-schema"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#infer-a-csv-schema"}},[t._v("#")]),t._v(" Infer a CSV Schema")]),t._v(" "),e("p",[t._v("We will use 
periodic-table data from "),e("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/datapackage-r/9eed05d1710fd69a0cb74f7941c7f142563f571b/vignettes/example_data/data.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("remote path"),e("OutboundLink")],1)]),t._v(" "),e("table",[e("thead",[e("tr",[e("th",[t._v("atomic.number")]),t._v(" "),e("th",[t._v("symbol")]),t._v(" "),e("th",[t._v("name")]),t._v(" "),e("th",[t._v("atomic.mass")]),t._v(" "),e("th",[t._v("metal.or.nonmetal.")])])]),t._v(" "),e("tbody",[e("tr",[e("td",[t._v("1")]),t._v(" "),e("td",[t._v("H")]),t._v(" "),e("td",[t._v("Hydrogen")]),t._v(" "),e("td",[t._v("1.00794")]),t._v(" "),e("td",[t._v("nonmetal")])]),t._v(" "),e("tr",[e("td",[t._v("2")]),t._v(" "),e("td",[t._v("He")]),t._v(" "),e("td",[t._v("Helium")]),t._v(" "),e("td",[t._v("4.002602")]),t._v(" "),e("td",[t._v("noble gas")])]),t._v(" "),e("tr",[e("td",[t._v("3")]),t._v(" "),e("td",[t._v("Li")]),t._v(" "),e("td",[t._v("Lithium")]),t._v(" "),e("td",[t._v("6.941")]),t._v(" "),e("td",[t._v("alkali metal")])]),t._v(" "),e("tr",[e("td",[t._v("4")]),t._v(" "),e("td",[t._v("Be")]),t._v(" "),e("td",[t._v("Beryllium")]),t._v(" "),e("td",[t._v("9.012182")]),t._v(" "),e("td",[t._v("alkaline earth metal")])]),t._v(" "),e("tr",[e("td",[t._v("5")]),t._v(" "),e("td",[t._v("B")]),t._v(" "),e("td",[t._v("Boron")]),t._v(" "),e("td",[t._v("10.811")]),t._v(" "),e("td",[t._v("metalloid")])]),t._v(" "),e("tr",[e("td",[t._v("6")]),t._v(" "),e("td",[t._v("C")]),t._v(" "),e("td",[t._v("Carbon")]),t._v(" "),e("td",[t._v("12.0107")]),t._v(" "),e("td",[t._v("nonmetal")])]),t._v(" "),e("tr",[e("td",[t._v("7")]),t._v(" "),e("td",[t._v("N")]),t._v(" "),e("td",[t._v("Nitrogen")]),t._v(" "),e("td",[t._v("14.0067")]),t._v(" "),e("td",[t._v("nonmetal")])]),t._v(" "),e("tr",[e("td",[t._v("8")]),t._v(" "),e("td",[t._v("O")]),t._v(" "),e("td",[t._v("Oxygen")]),t._v(" "),e("td",[t._v("15.9994")]),t._v(" "),e("td",[t._v("nonmetal")])]),t._v(" 
"),e("tr",[e("td",[t._v("9")]),t._v(" "),e("td",[t._v("F")]),t._v(" "),e("td",[t._v("Fluorine")]),t._v(" "),e("td",[t._v("18.9984032")]),t._v(" "),e("td",[t._v("halogen")])]),t._v(" "),e("tr",[e("td",[t._v("10")]),t._v(" "),e("td",[t._v("Ne")]),t._v(" "),e("td",[t._v("Neon")]),t._v(" "),e("td",[t._v("20.1797")]),t._v(" "),e("td",[t._v("noble gas")])])])]),t._v(" "),e("p",[t._v("We can guess our CSV’s "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[t._v("schema"),e("OutboundLink")],1),t._v(" by using "),e("code",[t._v("infer")]),t._v(" from the Table Schema library. We pass directly the remote link to the infer function, the result of which is an inferred schema. For example, if the processor detects only integers in a given column, it will assign "),e("code",[t._v("integer")]),t._v(" as a column type.")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" filepath "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'https://raw.githubusercontent.com/okgreece/datapackage-r/master/vignettes/exampledata/data.csv'")]),t._v("\n\n schema "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" tableschema.r"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("::")]),t._v("infer"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("filepath"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("Once we have a schema, we are now ready to add a "),e("code",[t._v("resource")]),t._v(" key to the Data Package which points to the resource path and its newly created schema. 
Below we define resources with three ways, using json text format with usual assignment operator in R list objects and directly using "),e("code",[t._v("addResource")]),t._v(" function of "),e("code",[t._v("Package")]),t._v(" class:")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# define resources using json text")]),t._v("\n resources "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" helpers.from.json.to.list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n '"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"path"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"filepath"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"schema"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"schema"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("'\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n resources"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token 
number"}},[t._v("1")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("schema "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" schema\n resources"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("path "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" filepath\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# or define resources using list object")]),t._v("\n resources "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n name "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n path "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" filepath"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n schema "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" schema\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("And now, add resources to the Data Package:")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" 
dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("descriptor"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'resources'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" resources\n dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("commit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("## [1] TRUE")]),t._v("\n")])])]),e("p",[t._v("Or you can directly add resources using "),e("code",[t._v("addResources")]),t._v(" function of "),e("code",[t._v("Package")]),t._v(" class:")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" resources "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n name "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n path "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" filepath"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n schema "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" schema\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n 
dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("addResource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("resources"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("Now we are ready to write our "),e("code",[t._v("datapackage.json")]),t._v(" file to the current working directory.")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("save"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'example_data'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("The "),e("code",[t._v("datapackage.json")]),t._v(" ("),e("a",{attrs:{href:"https://raw.githubusercontent.com/okgreece/datapackage-r/master/vignettes/exampledata/package.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("download"),e("OutboundLink")],1),t._v(") is inlined below. 
Note that atomic number has been correctly inferred as an "),e("code",[t._v("integer")]),t._v(" and atomic mass as a "),e("code",[t._v("number")]),t._v(" (float) while every other column is a "),e("code",[t._v("string")]),t._v(".")]),t._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[t._v(' jsonlite::prettify(helpers.from.list.to.json(dataPackage$descriptor))\n\n ## {\n ## "profile": "data-package",\n ## "name": "period-table",\n ## "title": "Periodic Table",\n ## "resources": [\n ## {\n ## "name": "data",\n ## "path": "https://raw.githubusercontent.com/okgreece/datapackage-r/master/vignettes/exampledata/data.csv",\n ## "schema": {\n ## "fields": [\n ## {\n ## "name": "atomic number",\n ## "type": "integer",\n ## "format": "default"\n ## },\n ## {\n ## "name": "symbol",\n ## "type": "string",\n ## "format": "default"\n ## },\n ## {\n ## "name": "name",\n ## "type": "string",\n ## "format": "default"\n ## },\n ## {\n ## "name": "atomic mass",\n ## "type": "number",\n ## "format": "default"\n ## },\n ## {\n ## "name": "metal or nonmetal?",\n ## "type": "string",\n ## "format": "default"\n ## }\n ## ],\n ## "missingValues": [\n ## ""\n ## ]\n ## },\n ## "profile": "data-resource",\n ## "encoding": "utf-8"\n ## }\n ## ]\n ## }\n ##\n')])])]),e("h2",{attrs:{id:"publishing"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#publishing"}},[t._v("#")]),t._v(" Publishing")]),t._v(" "),e("p",[t._v("Now that you have created your Data Package, you might want to "),e("RouterLink",{attrs:{to:"/blog/2016/08/29/publish-online/"}},[t._v("publish your data online")]),t._v(" so that you can share it with others.")],1),t._v(" "),e("p",[t._v("Now that you have created a data package in R, "),e("RouterLink",{attrs:{to:"/blog/2018/02/14/using-data-packages-in-r/"}},[t._v("find out how to use data packages in R in this tutorial")]),t._v(".")],1)])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file 
+(window.webpackJsonp=window.webpackJsonp||[]).push([[73],{586:function(t,a,e){"use strict";e.r(a);var s=e(29),n=Object(s.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("p",[e("a",{attrs:{href:"http://okfn.gr/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Open Knowledge Greece"),e("OutboundLink")],1),t._v(" was one of 2017’s "),e("a",{attrs:{href:"https://toolfund.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Tool Fund"),e("OutboundLink")],1),t._v(" grantees tasked with extending implementation of core Frictionless Data libraries in R programming language. You can read more about this in "),e("a",{attrs:{href:"https://frictionlessdata.io/articles/open-knowledge-greece/",target:"_blank",rel:"noopener noreferrer"}},[t._v("their grantee profile"),e("OutboundLink")],1),t._v(". In this tutorial, "),e("a",{attrs:{href:"https://gr.linkedin.com/in/kleanthis-koupidis-8348b88b",target:"_blank",rel:"noopener noreferrer"}},[t._v("Kleanthis Koupidis"),e("OutboundLink")],1),t._v(", a Data Scientist and Statistician at Open Knowledge Greece, explains how to create Data Packages in R.")]),t._v(" "),e("h2",{attrs:{id:"creating-data-packages-in-r"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#creating-data-packages-in-r"}},[t._v("#")]),t._v(" Creating Data Packages in R")]),t._v(" "),e("p",[t._v("This tutorial will show you how to install the R library for working with Data Packages and Table Schema, load a CSV file, infer its schema, and write a Tabular Data Package.")]),t._v(" "),e("h2",{attrs:{id:"load"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#load"}},[t._v("#")]),t._v(" Load")]),t._v(" "),e("p",[t._v("For this tutorial, we will need the Data Package R library ("),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-r",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("datapackage.r"),e("OutboundLink")],1),t._v(")."),e("br"),t._v("\nYou can start using the library by loading "),e("code",[t._v("datapackage.r")]),t._v(".")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" library"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("datapackage.r"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("You can add useful metadata by adding keys to metadata dict attribute. Below, we are adding the required "),e("code",[t._v("name")]),t._v(" key as well as a human-readable "),e("code",[t._v("title")]),t._v(" key. For the keys supported, please consult the full "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package spec"),e("OutboundLink")],1),t._v(". Note, we will be creating the required "),e("code",[t._v("resources")]),t._v(" key further down below.")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" dataPackage "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" Package.load"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("descriptor"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'name'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'period-table'")]),t._v("\n dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("descriptor"),e("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'title'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'Periodic Table'")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# commit the changes to Package class")]),t._v("\n dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("commit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("## [1] TRUE")]),t._v("\n")])])]),e("h2",{attrs:{id:"infer-a-csv-schema"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#infer-a-csv-schema"}},[t._v("#")]),t._v(" Infer a CSV Schema")]),t._v(" "),e("p",[t._v("We will use periodic-table data from "),e("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/datapackage-r/9eed05d1710fd69a0cb74f7941c7f142563f571b/vignettes/example_data/data.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("remote path"),e("OutboundLink")],1)]),t._v(" "),e("table",[e("thead",[e("tr",[e("th",[t._v("atomic.number")]),t._v(" "),e("th",[t._v("symbol")]),t._v(" "),e("th",[t._v("name")]),t._v(" "),e("th",[t._v("atomic.mass")]),t._v(" "),e("th",[t._v("metal.or.nonmetal.")])])]),t._v(" "),e("tbody",[e("tr",[e("td",[t._v("1")]),t._v(" "),e("td",[t._v("H")]),t._v(" "),e("td",[t._v("Hydrogen")]),t._v(" "),e("td",[t._v("1.00794")]),t._v(" "),e("td",[t._v("nonmetal")])]),t._v(" "),e("tr",[e("td",[t._v("2")]),t._v(" "),e("td",[t._v("He")]),t._v(" "),e("td",[t._v("Helium")]),t._v(" "),e("td",[t._v("4.002602")]),t._v(" "),e("td",[t._v("noble gas")])]),t._v(" "),e("tr",[e("td",[t._v("3")]),t._v(" "),e("td",[t._v("Li")]),t._v(" "),e("td",[t._v("Lithium")]),t._v(" "),e("td",[t._v("6.941")]),t._v(" "),e("td",[t._v("alkali 
metal")])]),t._v(" "),e("tr",[e("td",[t._v("4")]),t._v(" "),e("td",[t._v("Be")]),t._v(" "),e("td",[t._v("Beryllium")]),t._v(" "),e("td",[t._v("9.012182")]),t._v(" "),e("td",[t._v("alkaline earth metal")])]),t._v(" "),e("tr",[e("td",[t._v("5")]),t._v(" "),e("td",[t._v("B")]),t._v(" "),e("td",[t._v("Boron")]),t._v(" "),e("td",[t._v("10.811")]),t._v(" "),e("td",[t._v("metalloid")])]),t._v(" "),e("tr",[e("td",[t._v("6")]),t._v(" "),e("td",[t._v("C")]),t._v(" "),e("td",[t._v("Carbon")]),t._v(" "),e("td",[t._v("12.0107")]),t._v(" "),e("td",[t._v("nonmetal")])]),t._v(" "),e("tr",[e("td",[t._v("7")]),t._v(" "),e("td",[t._v("N")]),t._v(" "),e("td",[t._v("Nitrogen")]),t._v(" "),e("td",[t._v("14.0067")]),t._v(" "),e("td",[t._v("nonmetal")])]),t._v(" "),e("tr",[e("td",[t._v("8")]),t._v(" "),e("td",[t._v("O")]),t._v(" "),e("td",[t._v("Oxygen")]),t._v(" "),e("td",[t._v("15.9994")]),t._v(" "),e("td",[t._v("nonmetal")])]),t._v(" "),e("tr",[e("td",[t._v("9")]),t._v(" "),e("td",[t._v("F")]),t._v(" "),e("td",[t._v("Fluorine")]),t._v(" "),e("td",[t._v("18.9984032")]),t._v(" "),e("td",[t._v("halogen")])]),t._v(" "),e("tr",[e("td",[t._v("10")]),t._v(" "),e("td",[t._v("Ne")]),t._v(" "),e("td",[t._v("Neon")]),t._v(" "),e("td",[t._v("20.1797")]),t._v(" "),e("td",[t._v("noble gas")])])])]),t._v(" "),e("p",[t._v("We can guess our CSV’s "),e("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[t._v("schema"),e("OutboundLink")],1),t._v(" by using "),e("code",[t._v("infer")]),t._v(" from the Table Schema library. We pass directly the remote link to the infer function, the result of which is an inferred schema. 
For example, if the processor detects only integers in a given column, it will assign "),e("code",[t._v("integer")]),t._v(" as the column type.")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" filepath "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'https://raw.githubusercontent.com/okgreece/datapackage-r/master/vignettes/exampledata/data.csv'")]),t._v("\n\n schema "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" tableschema.r"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("::")]),t._v("infer"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("filepath"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("Once we have a schema, we are ready to add a "),e("code",[t._v("resources")]),t._v(" key to the Data Package, which points to the resource path and its newly created schema. 
Below we define resources in three ways: from JSON text, as R list objects using the usual assignment operator, and directly using the "),e("code",[t._v("addResource")]),t._v(" function of the "),e("code",[t._v("Package")]),t._v(" class:")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# define resources using json text")]),t._v("\n resources "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" helpers.from.json.to.list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n '"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"name"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"path"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"filepath"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"schema"')]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"schema"')]),t._v("\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("'\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n resources"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token 
number"}},[t._v("1")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("schema "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" schema\n resources"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("path "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" filepath\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# or define resources using list object")]),t._v("\n resources "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n name "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n path "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" filepath"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n schema "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" schema\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("And now, add resources to the Data Package:")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" 
dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("descriptor"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'resources'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" resources\n dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("commit"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n "),e("span",{pre:!0,attrs:{class:"token comment"}},[t._v("## [1] TRUE")]),t._v("\n")])])]),e("p",[t._v("Or you can add resources directly using the "),e("code",[t._v("addResource")]),t._v(" function of the "),e("code",[t._v("Package")]),t._v(" class:")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" resources "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("list"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n name "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),e("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n path "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" filepath"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v("\n schema "),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" schema\n "),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n 
dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("addResource"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("resources"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("Now we are ready to write our "),e("code",[t._v("datapackage.json")]),t._v(" file to the current working directory.")]),t._v(" "),e("div",{staticClass:"language-r extra-class"},[e("pre",{pre:!0,attrs:{class:"language-r"}},[e("code",[t._v(" dataPackage"),e("span",{pre:!0,attrs:{class:"token operator"}},[t._v("$")]),t._v("save"),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),e("span",{pre:!0,attrs:{class:"token string"}},[t._v("'example_data'")]),e("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),e("p",[t._v("The "),e("code",[t._v("datapackage.json")]),t._v(" ("),e("a",{attrs:{href:"https://raw.githubusercontent.com/okgreece/datapackage-r/master/vignettes/exampledata/package.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("download"),e("OutboundLink")],1),t._v(") is inlined below. 
Note that atomic number has been correctly inferred as an "),e("code",[t._v("integer")]),t._v(" and atomic mass as a "),e("code",[t._v("number")]),t._v(" (float) while every other column is a "),e("code",[t._v("string")]),t._v(".")]),t._v(" "),e("div",{staticClass:"language- extra-class"},[e("pre",{pre:!0,attrs:{class:"language-text"}},[e("code",[t._v(' jsonlite::prettify(helpers.from.list.to.json(dataPackage$descriptor))\n\n ## {\n ## "profile": "data-package",\n ## "name": "period-table",\n ## "title": "Periodic Table",\n ## "resources": [\n ## {\n ## "name": "data",\n ## "path": "https://raw.githubusercontent.com/okgreece/datapackage-r/master/vignettes/exampledata/data.csv",\n ## "schema": {\n ## "fields": [\n ## {\n ## "name": "atomic number",\n ## "type": "integer",\n ## "format": "default"\n ## },\n ## {\n ## "name": "symbol",\n ## "type": "string",\n ## "format": "default"\n ## },\n ## {\n ## "name": "name",\n ## "type": "string",\n ## "format": "default"\n ## },\n ## {\n ## "name": "atomic mass",\n ## "type": "number",\n ## "format": "default"\n ## },\n ## {\n ## "name": "metal or nonmetal?",\n ## "type": "string",\n ## "format": "default"\n ## }\n ## ],\n ## "missingValues": [\n ## ""\n ## ]\n ## },\n ## "profile": "data-resource",\n ## "encoding": "utf-8"\n ## }\n ## ]\n ## }\n ##\n')])])]),e("h2",{attrs:{id:"publishing"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#publishing"}},[t._v("#")]),t._v(" Publishing")]),t._v(" "),e("p",[t._v("Now that you have created your Data Package, you might want to "),e("RouterLink",{attrs:{to:"/blog/2016/08/29/publish-online/"}},[t._v("publish your data online")]),t._v(" so that you can share it with others.")],1),t._v(" "),e("p",[t._v("Now that you have created a data package in R, "),e("RouterLink",{attrs:{to:"/blog/2018/02/14/using-data-packages-in-r/"}},[t._v("find out how to use data packages in R in this tutorial")]),t._v(".")],1)])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file 
diff --git a/assets/js/75.591b6129.js b/assets/js/75.ec5d7d14.js similarity index 99% rename from assets/js/75.591b6129.js rename to assets/js/75.ec5d7d14.js index 893fc5e1d..9ef1385ae 100644 --- a/assets/js/75.591b6129.js +++ b/assets/js/75.ec5d7d14.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[75],{587:function(t,a,s){"use strict";s.r(a);var e=s(29),n=Object(e.a)({},(function(){var t=this,a=t.$createElement,s=t._self._c||a;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("p",[t._v("Daniel Fireman was one of 2017’s "),s("a",{attrs:{href:"https://toolfund.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Tool Fund"),s("OutboundLink")],1),t._v(" grantees tasked with extending implementation of core Frictionless Data libraries in Go programming language. You can read more about this in "),s("RouterLink",{attrs:{to:"/blog/2017/11/01/daniel-fireman/"}},[t._v("his grantee profile")]),t._v(". In this post, Fireman will show you how to install and use the "),s("a",{attrs:{href:"http://golang.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Go"),s("OutboundLink")],1),t._v(" libraries for working with "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Packages"),s("OutboundLink")],1),t._v(".")],1),t._v(" "),s("p",[t._v("Our goal in this tutorial is to load a data package from the web and read its metadata and contents.")]),t._v(" "),s("h2",{attrs:{id:"setup"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[t._v("#")]),t._v(" Setup")]),t._v(" "),s("p",[t._v("For this tutorial, we will need the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-go",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-go"),s("OutboundLink")],1),t._v(" and "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-go",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("tableschema-go"),s("OutboundLink")],1),t._v(" packages, which provide all the functionality to deal with a Data Package’s metadata and its contents.")]),t._v(" "),s("p",[t._v("We are going to use the "),s("a",{attrs:{href:"https://golang.github.io/dep/",target:"_blank",rel:"noopener noreferrer"}},[t._v("dep tool"),s("OutboundLink")],1),t._v(" to manage the dependencies of our new project:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ "),s("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v("cd")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token variable"}},[t._v("$GOPATH")]),t._v("/src/newdataproj\n$ dep init\n")])])]),s("h2",{attrs:{id:"the-periodic-table-data-package"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#the-periodic-table-data-package"}},[t._v("#")]),t._v(" The Periodic Table Data Package")]),t._v(" "),s("p",[t._v("A "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package"),s("OutboundLink")],1),t._v(" is a simple container format used to describe and package a collection of data. It consists of two parts:")]),t._v(" "),s("ul",[s("li",[t._v("Metadata that describes the structure and contents of the package")]),t._v(" "),s("li",[t._v("Resources such as data files that form the contents of the package")])]),t._v(" "),s("p",[t._v("In this tutorial, we are using a "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Package"),s("OutboundLink")],1),t._v(" containing the periodic table. 
The package descriptor ("),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage.json"),s("OutboundLink")],1),t._v(") and contents ("),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/data.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("data.csv"),s("OutboundLink")],1),t._v(") are stored on GitHub. This dataset includes the atomic number, symbol, element name, atomic mass, and the metallicity of the element. Here are the header and the first three rows:")]),t._v(" "),s("table",[s("thead",[s("tr",[s("th",[t._v("atomic number")]),t._v(" "),s("th",[t._v("symbol")]),t._v(" "),s("th",[t._v("name")]),t._v(" "),s("th",[t._v("atomic mass")]),t._v(" "),s("th",[t._v("metal or nonmetal?")])])]),t._v(" "),s("tbody",[s("tr",[s("td",[t._v("1")]),t._v(" "),s("td",[t._v("H")]),t._v(" "),s("td",[t._v("Hydrogen")]),t._v(" "),s("td",[t._v("1.00794")]),t._v(" "),s("td",[t._v("nonmetal")])]),t._v(" "),s("tr",[s("td",[t._v("2")]),t._v(" "),s("td",[t._v("He")]),t._v(" "),s("td",[t._v("Helium")]),t._v(" "),s("td",[t._v("4.002602")]),t._v(" "),s("td",[t._v("noble gas")])]),t._v(" "),s("tr",[s("td",[t._v("3")]),t._v(" "),s("td",[t._v("Li")]),t._v(" "),s("td",[t._v("Lithium")]),t._v(" "),s("td",[t._v("6.941")]),t._v(" "),s("td",[t._v("alkali metal")])])])]),t._v(" "),s("h2",{attrs:{id:"inspecting-package-metadata"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#inspecting-package-metadata"}},[t._v("#")]),t._v(" Inspecting Package Metadata")]),t._v(" "),s("p",[t._v("Let’s start off by creating the "),s("code",[t._v("main.go")]),t._v(", which loads the data package and inspects some of its metadata.")]),t._v(" "),s("div",{staticClass:"language-go 
extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("package")]),t._v(" main\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"fmt"')]),t._v("\n\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"github.com/frictionlessdata/datapackage-go/datapackage"')]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" err "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" err "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("!=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("nil")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("panic")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("(")]),t._v("err"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Println")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Package loaded successfully."')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("Before running the code, you need to tell the dep tool to update our project dependencies. Don’t worry; you won’t need to do it again in this tutorial.")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ dep ensure\n$ go run main.go\nPackage loaded successfully.\n")])])]),s("p",[t._v("Now that you have loaded the periodic table Data Package, you have access to its "),s("code",[t._v("title")]),t._v(" and "),s("code",[t._v("name")]),t._v(" fields through the "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Package.Descriptor",target:"_blank",rel:"noopener noreferrer"}},[t._v("Package.Descriptor() function"),s("OutboundLink")],1),t._v(". 
To do so, let’s change our main function to (omitting error handling for the sake of brevity, but we know it is "),s("em",[t._v("very")]),t._v(" important):")]),t._v(" "),s("div",{staticClass:"language-go extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Println")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Name:"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Descriptor")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"name"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Println")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Title:"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Descriptor")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"title"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("And rerun the program:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ go run main.go\nName: period-table\nTitle: Periodic Table\n")])])]),s("p",[t._v("And as you can see, the printed fields match the "),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("package descriptor"),s("OutboundLink")],1),t._v(". 
For more information about the Data Package structure, please take a look at the "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("specification"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("h2",{attrs:{id:"quick-look-at-the-data"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#quick-look-at-the-data"}},[t._v("#")]),t._v(" Quick Look At the Data")]),t._v(" "),s("p",[t._v("Now that you have loaded your Data Package, it is time to process its contents. The package content consists of one or more resources. You can access "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Resource",target:"_blank",rel:"noopener noreferrer"}},[t._v("Resources"),s("OutboundLink")],1),t._v(" via the "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Package.GetResource()",target:"_blank",rel:"noopener noreferrer"}},[t._v("Package.GetResource()"),s("OutboundLink")],1),t._v(" method. 
Let’s print the periodic table "),s("code",[t._v("data")]),t._v(" resource contents.")]),t._v(" "),s("div",{staticClass:"language-go extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n res "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("GetResource")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n table"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" res"),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("ReadAll")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" row "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("range")]),t._v(" table "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Println")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ go run main.go\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("atomic number symbol name atomic mass metal or nonmetal?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v(" H Hydrogen "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1.00794")]),t._v(" nonmetal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v(" He Helium "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4.002602")]),t._v(" noble 
gas"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),t._v(" Li Lithium "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("6.941")]),t._v(" alkali metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4")]),t._v(" Be Beryllium "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("9.012182")]),t._v(" alkaline earth metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("..")]),t._v(".\n")])])]),s("p",[t._v("The "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Resource.ReadAll",target:"_blank",rel:"noopener noreferrer"}},[t._v("Resource.ReadAll()"),s("OutboundLink")],1),t._v(" method loads the whole table in memory as raw strings and returns it as a Go "),s("code",[t._v("[][]string")]),t._v(". This can be quite useful for taking a quick look at the data or performing a visual sanity check.")]),t._v(" "),s("h2",{attrs:{id:"processing-the-data-package-s-content"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#processing-the-data-package-s-content"}},[t._v("#")]),t._v(" Processing the Data Package’s Content")]),t._v(" "),s("p",[t._v("Even though the string representation can be useful for a quick sanity check, you probably want to use actual language types to process the data. Don’t worry, you won’t need to fight the casting battle yourself. 
Data Package Go libraries provide a rich set of methods to deal with data loading in a very idiomatic way (very similar to "),s("a",{attrs:{href:"https://golang.org/pkg/encoding/json/",target:"_blank",rel:"noopener noreferrer"}},[t._v("encoding/json"),s("OutboundLink")],1),t._v(").")]),t._v(" "),s("p",[t._v("As an example, let’s change our "),s("code",[t._v("main")]),t._v(" function to use actual types to store the periodic table and print the elements with atomic mass smaller than 10.")]),t._v(" "),s("div",{staticClass:"language-go extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("package")]),t._v(" main\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"fmt"')]),t._v("\n\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"github.com/frictionlessdata/datapackage-go/datapackage"')]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"github.com/frictionlessdata/tableschema-go/csv"')]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("type")]),t._v(" element "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("struct")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n Number "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("int")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('`tableheader:"atomic number"`')]),t._v("\n Symbol "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("string")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('`tableheader:"symbol"`')]),t._v("\n Name "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("string")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token 
string"}},[t._v('`tableheader:"name"`')]),t._v("\n Mass "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("float64")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('`tableheader:"atomic mass"`')]),t._v("\n Metal "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("string")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('`tableheader:"metal or nonmetal?"`')]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n resource "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("GetResource")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n 
"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("var")]),t._v(" elements "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("element\n resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Cast")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("&")]),t._v("elements"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" csv"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("LoadHeaders")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" e "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("range")]),t._v(" elements "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Mass "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("<")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("10")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Printf")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token 
string"}},[t._v('"%+v\\n"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ go run main.go\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:1 Symbol:H Name:Hydrogen Mass:1.00794 Metal:nonmetal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:2 Symbol:He Name:Helium Mass:4.002602 Metal:noble gas"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:3 Symbol:Li Name:Lithium Mass:6.941 Metal:alkali metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:4 Symbol:Be Name:Beryllium Mass:9.012182 Metal:alkaline earth metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("In the example above, all rows in the table are loaded into memory. Then every row is parsed into an "),s("code",[t._v("element")]),t._v(" object and appended to the slice. 
The "),s("code",[t._v("resource.Cast")]),t._v(" call returns an error if the whole table cannot be successfully parsed.")]),t._v(" "),s("p",[t._v("If you don’t want to load all data in memory at once, you can lazily access each row using "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Resource.Iter",target:"_blank",rel:"noopener noreferrer"}},[t._v("Resource.Iter"),s("OutboundLink")],1),t._v(" and use "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/tableschema-go/schema#Schema.CastRow",target:"_blank",rel:"noopener noreferrer"}},[t._v("Schema.CastRow"),s("OutboundLink")],1),t._v(" to cast each row into an "),s("code",[t._v("element")]),t._v(" object. That would change our main function to:")]),t._v(" "),s("div",{staticClass:"language-go extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n resource "),s("span",{pre:!0,attrs:{class:"token 
operator"}},[t._v(":=")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("GetResource")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Iter")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("csv"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("LoadHeaders")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n sch"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("GetSchema")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("var")]),t._v(" e element\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token 
function"}},[t._v("Next")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n sch"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("CastRow")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Row")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("&")]),t._v("e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Mass "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("<")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("10")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Printf")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"%+v\\n"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("}")]),t._v("\n")])])]),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ go run main.go\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:1 Symbol:H Name:Hydrogen Mass:1.00794 Metal:nonmetal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:2 Symbol:He Name:Helium Mass:4.002602 Metal:noble gas"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:3 Symbol:Li Name:Lithium Mass:6.941 Metal:alkali metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:4 Symbol:Be Name:Beryllium Mass:9.012182 Metal:alkaline earth metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("And our code is ready to deal with the growth of the periodic table in a very memory-efficient way 😃")]),t._v(" "),s("p",[t._v("We welcome your feedback and questions via our "),s("a",{attrs:{href:"http://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Gitter chat"),s("OutboundLink")],1),t._v(" or via "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-go/issues",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub issues"),s("OutboundLink")],1),t._v(" on the datapackage-go repository.")])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[75],{588:function(t,a,s){"use strict";s.r(a);var e=s(29),n=Object(e.a)({},(function(){var t=this,a=t.$createElement,s=t._self._c||a;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("p",[t._v("Daniel Fireman was one 
of 2017’s "),s("a",{attrs:{href:"https://toolfund.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Tool Fund"),s("OutboundLink")],1),t._v(" grantees tasked with extending the implementation of core Frictionless Data libraries in the Go programming language. You can read more about this in "),s("RouterLink",{attrs:{to:"/blog/2017/11/01/daniel-fireman/"}},[t._v("his grantee profile")]),t._v(". In this post, Fireman will show you how to install and use the "),s("a",{attrs:{href:"http://golang.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Go"),s("OutboundLink")],1),t._v(" libraries for working with "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Packages"),s("OutboundLink")],1),t._v(".")],1),t._v(" "),s("p",[t._v("Our goal in this tutorial is to load a data package from the web and read its metadata and contents.")]),t._v(" "),s("h2",{attrs:{id:"setup"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[t._v("#")]),t._v(" Setup")]),t._v(" "),s("p",[t._v("For this tutorial, we will need the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-go",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-go"),s("OutboundLink")],1),t._v(" and "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-go",target:"_blank",rel:"noopener noreferrer"}},[t._v("tableschema-go"),s("OutboundLink")],1),t._v(" packages, which provide all the functionality to deal with a Data Package’s metadata and its contents.")]),t._v(" "),s("p",[t._v("We are going to use the "),s("a",{attrs:{href:"https://golang.github.io/dep/",target:"_blank",rel:"noopener noreferrer"}},[t._v("dep tool"),s("OutboundLink")],1),t._v(" to manage the dependencies of our new project:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ 
"),s("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v("cd")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token variable"}},[t._v("$GOPATH")]),t._v("/src/newdataproj\n$ dep init\n")])])]),s("h2",{attrs:{id:"the-periodic-table-data-package"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#the-periodic-table-data-package"}},[t._v("#")]),t._v(" The Periodic Table Data Package")]),t._v(" "),s("p",[t._v("A "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package"),s("OutboundLink")],1),t._v(" is a simple container format used to describe and package a collection of data. It consists of two parts:")]),t._v(" "),s("ul",[s("li",[t._v("Metadata that describes the structure and contents of the package")]),t._v(" "),s("li",[t._v("Resources such as data files that form the contents of the package")])]),t._v(" "),s("p",[t._v("In this tutorial, we are using a "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Package"),s("OutboundLink")],1),t._v(" containing the periodic table. The package descriptor ("),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage.json"),s("OutboundLink")],1),t._v(") and contents ("),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/data.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("data.csv"),s("OutboundLink")],1),t._v(") are stored on GitHub. This dataset includes the atomic number, symbol, element name, atomic mass, and the metallicity of the element. 
Here are the header and the first three rows:")]),t._v(" "),s("table",[s("thead",[s("tr",[s("th",[t._v("atomic number")]),t._v(" "),s("th",[t._v("symbol")]),t._v(" "),s("th",[t._v("name")]),t._v(" "),s("th",[t._v("atomic mass")]),t._v(" "),s("th",[t._v("metal or nonmetal?")])])]),t._v(" "),s("tbody",[s("tr",[s("td",[t._v("1")]),t._v(" "),s("td",[t._v("H")]),t._v(" "),s("td",[t._v("Hydrogen")]),t._v(" "),s("td",[t._v("1.00794")]),t._v(" "),s("td",[t._v("nonmetal")])]),t._v(" "),s("tr",[s("td",[t._v("2")]),t._v(" "),s("td",[t._v("He")]),t._v(" "),s("td",[t._v("Helium")]),t._v(" "),s("td",[t._v("4.002602")]),t._v(" "),s("td",[t._v("noble gas")])]),t._v(" "),s("tr",[s("td",[t._v("3")]),t._v(" "),s("td",[t._v("Li")]),t._v(" "),s("td",[t._v("Lithium")]),t._v(" "),s("td",[t._v("6.941")]),t._v(" "),s("td",[t._v("alkali metal")])])])]),t._v(" "),s("h2",{attrs:{id:"inspecting-package-metadata"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#inspecting-package-metadata"}},[t._v("#")]),t._v(" Inspecting Package Metadata")]),t._v(" "),s("p",[t._v("Let’s start off by creating the "),s("code",[t._v("main.go")]),t._v(", which loads the data package and inspects some of its metadata.")]),t._v(" "),s("div",{staticClass:"language-go extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("package")]),t._v(" main\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"fmt"')]),t._v("\n\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"github.com/frictionlessdata/datapackage-go/datapackage"')]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token 
function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" err "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" err "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("!=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("nil")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("panic")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("err"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Println")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Package loaded successfully."')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("Before running the code, you need to tell the dep 
tool to update our project dependencies. Don’t worry; you won’t need to do it again in this tutorial.")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ dep ensure\n$ go run main.go\nPackage loaded successfully.\n")])])]),s("p",[t._v("Now that you have loaded the periodic table Data Package, you have access to its "),s("code",[t._v("title")]),t._v(" and "),s("code",[t._v("name")]),t._v(" fields through the "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Package.Descriptor",target:"_blank",rel:"noopener noreferrer"}},[t._v("Package.Descriptor() function"),s("OutboundLink")],1),t._v(". To do so, let’s change our main function to (omitting error handling for the sake of brevity, but we know it is "),s("em",[t._v("very")]),t._v(" important):")]),t._v(" "),s("div",{staticClass:"language-go extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token 
string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Println")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Name:"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Descriptor")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"name"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Println")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Title:"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Descriptor")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"title"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("And rerun the program:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ go run main.go\nName: period-table\nTitle: Periodic Table\n")])])]),s("p",[t._v("And as you can see, the printed fields match the "),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("package descriptor"),s("OutboundLink")],1),t._v(". For more information about the Data Package structure, please take a look at the "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("specification"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("h2",{attrs:{id:"quick-look-at-the-data"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#quick-look-at-the-data"}},[t._v("#")]),t._v(" Quick Look At the Data")]),t._v(" "),s("p",[t._v("Now that you have loaded your Data Package, it is time to process its contents. The package content consists of one or more resources. You can access "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Resource",target:"_blank",rel:"noopener noreferrer"}},[t._v("Resources"),s("OutboundLink")],1),t._v(" via the "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Package.GetResource()",target:"_blank",rel:"noopener noreferrer"}},[t._v("Package.GetResource()"),s("OutboundLink")],1),t._v(" method. 
Let’s print the periodic table "),s("code",[t._v("data")]),t._v(" resource contents.")]),t._v(" "),s("div",{staticClass:"language-go extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n res "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("GetResource")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n table"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" res"),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("ReadAll")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" row "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("range")]),t._v(" table "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Println")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ go run main.go\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("atomic number symbol name atomic mass metal or nonmetal?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v(" H Hydrogen "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1.00794")]),t._v(" nonmetal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v(" He Helium "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4.002602")]),t._v(" noble 
gas"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),t._v(" Li Lithium "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("6.941")]),t._v(" alkali metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4")]),t._v(" Be Beryllium "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("9.012182")]),t._v(" alkaline earth metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("..")]),t._v(".\n")])])]),s("p",[t._v("The "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Resource.ReadAll",target:"_blank",rel:"noopener noreferrer"}},[t._v("Resource.ReadAll()"),s("OutboundLink")],1),t._v(" method loads the whole table into memory as raw strings and returns it as a Go "),s("code",[t._v("[][]string")]),t._v(". This can be quite useful for taking a quick look at the data or performing a visual sanity check.")]),t._v(" "),s("h2",{attrs:{id:"processing-the-data-package-s-content"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#processing-the-data-package-s-content"}},[t._v("#")]),t._v(" Processing the Data Package’s Content")]),t._v(" "),s("p",[t._v("Even though the string representation can be useful for a quick sanity check, you probably want to use actual language types to process the data. Don’t worry, you won’t need to fight the casting battle yourself. 
Data Package Go libraries provide a rich set of methods to deal with data loading in a very idiomatic way (very similar to "),s("a",{attrs:{href:"https://golang.org/pkg/encoding/json/",target:"_blank",rel:"noopener noreferrer"}},[t._v("encoding/json"),s("OutboundLink")],1),t._v(").")]),t._v(" "),s("p",[t._v("As an example, let’s change our "),s("code",[t._v("main")]),t._v(" function to use actual types to store the periodic table and print the elements with atomic mass smaller than 10.")]),t._v(" "),s("div",{staticClass:"language-go extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("package")]),t._v(" main\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("import")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"fmt"')]),t._v("\n\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"github.com/frictionlessdata/datapackage-go/datapackage"')]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"github.com/frictionlessdata/tableschema-go/csv"')]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("type")]),t._v(" element "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("struct")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n Number "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("int")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('`tableheader:"atomic number"`')]),t._v("\n Symbol "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("string")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('`tableheader:"symbol"`')]),t._v("\n Name "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("string")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token 
string"}},[t._v('`tableheader:"name"`')]),t._v("\n Mass "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("float64")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('`tableheader:"atomic mass"`')]),t._v("\n Metal "),s("span",{pre:!0,attrs:{class:"token builtin"}},[t._v("string")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('`tableheader:"metal or nonmetal?"`')]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n resource "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("GetResource")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n 
"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("var")]),t._v(" elements "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("element\n resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Cast")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("&")]),t._v("elements"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" csv"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("LoadHeaders")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" e "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("range")]),t._v(" elements "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Mass "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("<")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("10")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Printf")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token 
string"}},[t._v('"%+v\\n"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ go run main.go\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:1 Symbol:H Name:Hydrogen Mass:1.00794 Metal:nonmetal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:2 Symbol:He Name:Helium Mass:4.002602 Metal:noble gas"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:3 Symbol:Li Name:Lithium Mass:6.941 Metal:alkali metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:4 Symbol:Be Name:Beryllium Mass:9.012182 Metal:alkaline earth metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("In the example above, all rows in the table are loaded into memory. Then every row is parsed into an "),s("code",[t._v("element")]),t._v(" object and appended to the slice. 
The "),s("code",[t._v("resource.Cast")]),t._v(" call returns an error if the whole table cannot be successfully parsed.")]),t._v(" "),s("p",[t._v("If you don’t want to load all data in memory at once, you can lazily access each row using "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/datapackage-go/datapackage#Resource.Iter",target:"_blank",rel:"noopener noreferrer"}},[t._v("Resource.Iter"),s("OutboundLink")],1),t._v(" and use "),s("a",{attrs:{href:"https://godoc.org/github.com/frictionlessdata/tableschema-go/schema#Schema.CastRow",target:"_blank",rel:"noopener noreferrer"}},[t._v("Schema.CastRow"),s("OutboundLink")],1),t._v(" to cast each row into an "),s("code",[t._v("element")]),t._v(" object. That would change our main function to:")]),t._v(" "),s("div",{staticClass:"language-go extra-class"},[s("pre",{pre:!0,attrs:{class:"language-go"}},[s("code",[s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("func")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("main")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" datapackage"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Load")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n resource "),s("span",{pre:!0,attrs:{class:"token 
operator"}},[t._v(":=")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("GetResource")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Iter")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("csv"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("LoadHeaders")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n sch"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("_")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(":=")]),t._v(" resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("GetSchema")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("var")]),t._v(" e element\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("for")]),t._v(" iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token 
function"}},[t._v("Next")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n sch"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("CastRow")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Row")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("&")]),t._v("e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),t._v("Mass "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("<")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("10")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n fmt"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("Printf")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"%+v\\n"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("}")]),t._v("\n")])])]),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ go run main.go\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:1 Symbol:H Name:Hydrogen Mass:1.00794 Metal:nonmetal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:2 Symbol:He Name:Helium Mass:4.002602 Metal:noble gas"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:3 Symbol:Li Name:Lithium Mass:6.941 Metal:alkali metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("Number:4 Symbol:Be Name:Beryllium Mass:9.012182 Metal:alkaline earth metal"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("And our code is ready to deal with the growth of the periodic table in a very memory-efficient way 😃")]),t._v(" "),s("p",[t._v("We welcome your feedback and questions via our "),s("a",{attrs:{href:"http://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Gitter chat"),s("OutboundLink")],1),t._v(" or via "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-go/issues",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub issues"),s("OutboundLink")],1),t._v(" on the datapackage-go repository.")])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/76.8fa5c677.js b/assets/js/76.8a93fa60.js similarity index 99% rename from assets/js/76.8fa5c677.js rename to assets/js/76.8a93fa60.js index 07fdd0342..85dd0689c 100644 --- a/assets/js/76.8fa5c677.js +++ b/assets/js/76.8a93fa60.js @@ -1 +1 @@ 
-(window.webpackJsonp=window.webpackJsonp||[]).push([[76],{592:function(e,a,t){"use strict";t.r(a);var n=t(29),i=Object(n.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("Applying licenses, waivers or public domain marks to "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data packages"),t("OutboundLink")],1),e._v(" and "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data resources"),t("OutboundLink")],1),e._v(" helps people understand how they can use, modify and share the contents of a data package.")]),e._v(" "),t("p",[e._v("It is recommended that you apply a license, waiver or public domain mark to a data package using the "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#licenses",target:"_blank",rel:"noopener noreferrer"}},[t("code",[e._v("licenses")]),t("OutboundLink")],1),e._v(" property. The value assigned to the data package "),t("code",[e._v("licenses")]),e._v(" property applies to all the data, files and metadata in the data package unless specified otherwise.")]),e._v(" "),t("p",[e._v("You can optionally apply a license to a data resource. This allows a license that differs from the data package license to be applied to the data resource. 
If the data resource "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/#optional-properties",target:"_blank",rel:"noopener noreferrer"}},[t("code",[e._v("licenses")]),t("OutboundLink")],1),e._v(" property is not specified, it inherits the data package "),t("code",[e._v("licenses")]),e._v(".")]),e._v(" "),t("h2",{attrs:{id:"specifying-a-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#specifying-a-license"}},[e._v("#")]),e._v(" Specifying a license")]),e._v(" "),t("p",[e._v("The Frictionless Data specification states that a "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#licenses",target:"_blank",rel:"noopener noreferrer"}},[e._v("license"),t("OutboundLink")],1),e._v(" must contain a "),t("code",[e._v("name")]),e._v(" property and/or a "),t("code",[e._v("path")]),e._v(" property, and may contain a "),t("code",[e._v("title")]),e._v(" property.")]),e._v(" "),t("blockquote",[t("ul",[t("li",[t("code",[e._v("name")]),e._v(": The name MUST be an "),t("a",{attrs:{href:"http://licenses.opendefinition.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Definition license ID"),t("OutboundLink")],1)]),e._v(" "),t("li",[t("code",[e._v("path")]),e._v(": A "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/#url-or-path",target:"_blank",rel:"noopener noreferrer"}},[e._v("url-or-path"),t("OutboundLink")],1),e._v(" string, that is a fully qualified HTTP address, or a relative POSIX path")]),e._v(" "),t("li",[t("code",[e._v("title")]),e._v(": A human-readable title")])])]),e._v(" "),t("p",[e._v("You can specify the location of a license using a URL or a Path.")]),e._v(" "),t("h3",{attrs:{id:"specify-a-license-using-a-url"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#specify-a-license-using-a-url"}},[e._v("#")]),e._v(" Specify a license using a URL")]),e._v(" "),t("p",[e._v("To specify a license using a URL, use the fully qualified HTTP address as the value in the 
"),t("code",[e._v("path")]),e._v(" property, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "https://cdla.io/sharing-1-0/",\n "title": "Community Data License Agreement – Sharing, Version 1.0"\n}]\n')])])]),t("h3",{attrs:{id:"specify-a-license-using-a-path"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#specify-a-license-using-a-path"}},[e._v("#")]),e._v(" Specify a license using a Path")]),e._v(" "),t("p",[e._v("To specify a license using a path, use a relative POSIX path to the file in the data package as the value in the "),t("code",[e._v("path")]),e._v(" property, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "LICENSE.pdf"\n}]\n')])])]),t("p",[e._v("In this example, LICENSE.pdf would be in the root of the data package folder, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v("folder\n |- datapackage.json\n |- LICENSE.pdf\n |- README.md\n |- data\n |- data.csv\n |- reference-data.csv\n\n")])])]),t("p",[e._v("It is recommended that the licence is provided in "),t("a",{attrs:{href:"http://commonmark.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("markdown"),t("OutboundLink")],1),e._v(" format to simplify its display in data platforms and other software.")]),e._v(" "),t("p",[e._v("The license can be a separate file or included in the "),t("code",[e._v("README.md")]),e._v(" file. 
If license information is included in the "),t("code",[e._v("README.md")]),e._v(" file, it is recommended that it follows the "),t("RouterLink",{attrs:{to:"/blog/2016/04/20/publish-faq/#readme"}},[e._v("guide for formatting a README file")]),e._v(".")],1),e._v(" "),t("h2",{attrs:{id:"applying-a-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#applying-a-license"}},[e._v("#")]),e._v(" Applying a license")]),e._v(" "),t("p",[e._v("These scenarios apply to either the data package or a data resource.")]),e._v(" "),t("ol",[t("li",[t("a",{attrs:{href:"#apply-an-open-license"}},[e._v("Apply an open license")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#apply-a-non-open-license"}},[e._v("Apply a non-open license")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#apply-a-waiver"}},[e._v("Apply a waiver")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#apply-a-public-domain-mark"}},[e._v("Apply a public domain mark")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#do-not-apply-a-license"}},[e._v("Do not apply a license")])])]),e._v(" "),t("p",[e._v("Other considerations:")]),e._v(" "),t("ul",[t("li",[t("a",{attrs:{href:"#provide-additional-license-information"}},[e._v("Provide additional license information")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#copyright-belongs-to-multiple-parties"}},[e._v("Copyright belongs to multiple parties")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#license-may-become-legally-binding"}},[e._v("License may become legally binding")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#software-may-not-fully-support-the-frictionless-data-specification"}},[e._v("Software may not fully support the Frictionless Data specification")])])]),e._v(" "),t("h3",{attrs:{id:"apply-an-open-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#apply-an-open-license"}},[e._v("#")]),e._v(" Apply an open license")]),e._v(" "),t("p",[e._v("For an "),t("a",{attrs:{href:"http://opendefinition.org/licenses/",target:"_blank",rel:"noopener noreferrer"}},[e._v("open 
license"),t("OutboundLink")],1),e._v(", use "),t("code",[e._v("name")]),e._v(", "),t("code",[e._v("path")]),e._v(" and "),t("code",[e._v("title")]),e._v(", e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "name": "CC-BY-4.0",\n "path": "https://creativecommons.org/licenses/by/4.0/",\n "title": "Creative Commons Attribution 4.0"\n}]\n')])])]),t("p",[t("code",[e._v("name")]),e._v(" must be an "),t("a",{attrs:{href:"http://licenses.opendefinition.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Definition license ID"),t("OutboundLink")],1),e._v("; however, note that some license IDs are placeholders or have been retired and should not be used, e.g. "),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/other-at.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("other-at"),t("OutboundLink")],1),e._v(", "),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/other-open.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("other-open"),t("OutboundLink")],1),e._v(", "),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/other-pd.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("other-pd"),t("OutboundLink")],1),e._v(", "),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/notspecified.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("notspecified"),t("OutboundLink")],1),e._v(", "),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/ukcrown-withrights.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("ukcrown-withrights"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("h3",{attrs:{id:"apply-a-non-open-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#apply-a-non-open-license"}},[e._v("#")]),e._v(" Apply a non-open license")]),e._v(" "),t("p",[e._v("To apply a non-open license, use the "),t("code",[e._v("path")]),e._v(" and optionally the 
"),t("code",[e._v("title")]),e._v(" properties. It is preferred that the license is published at a URL (a fully qualified HTTP address), e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "https://creativecommons.org/licenses/by-nc-nd/4.0/",\n "title": "Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)"\n}]\n')])])]),t("p",[e._v("If the license is not available at a URL, you can "),t("a",{attrs:{href:"#specify-a-license-using-a-path"}},[e._v("specify a license using a path")]),e._v(".")]),e._v(" "),t("h3",{attrs:{id:"apply-a-waiver"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#apply-a-waiver"}},[e._v("#")]),e._v(" Apply a waiver")]),e._v(" "),t("p",[e._v("You can indicate that copyright has been waived by referencing a waiver at a URL in the "),t("code",[e._v("path")]),e._v(" property, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "name": "CC0-1.0",\n "path": "https://creativecommons.org/publicdomain/zero/1.0/",\n "title": "CC0 1.0"\n}]\n')])])]),t("p",[e._v("If the waiver is not available at a URL, you can "),t("a",{attrs:{href:"#specify-a-license-using-a-path"}},[e._v("specify a waiver using a path")]),e._v(".")]),e._v(" "),t("h3",{attrs:{id:"apply-a-public-domain-mark"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#apply-a-public-domain-mark"}},[e._v("#")]),e._v(" Apply a public domain mark")]),e._v(" "),t("p",[e._v("You can indicate that there is no copyright in the data or that copyright has expired, using the "),t("a",{attrs:{href:"https://creativecommons.org/share-your-work/public-domain/pdm/",target:"_blank",rel:"noopener noreferrer"}},[e._v("public domain mark"),t("OutboundLink")],1),e._v(" or other public domain dedications, e.g.")]),e._v(" "),t("div",{staticClass:"language- 
extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "http://creativecommons.org/publicdomain/mark/1.0/",\n "title": "Public Domain Mark"\n}]\n')])])]),t("p",[e._v("If the public domain dedication is not available at a URL, you can "),t("a",{attrs:{href:"#specify-a-license-using-a-path"}},[e._v("specify the public domain dedication using a path")]),e._v(".")]),e._v(" "),t("h3",{attrs:{id:"do-not-apply-a-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#do-not-apply-a-license"}},[e._v("#")]),e._v(" Do not apply a license")]),e._v(" "),t("p",[e._v("If you have not decided what license to apply but still want to publish the data package, describe the situation in a file in the data package, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "README.md"\n}]\n')])])]),t("h2",{attrs:{id:"other-considerations"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#other-considerations"}},[e._v("#")]),e._v(" Other considerations")]),e._v(" "),t("h3",{attrs:{id:"provide-additional-license-information"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#provide-additional-license-information"}},[e._v("#")]),e._v(" Provide additional license information")]),e._v(" "),t("p",[e._v("It can be helpful to data consumers to provide additional copyright or attribution information such as:")]),e._v(" "),t("ul",[t("li",[e._v("copyright notice - this allows a data publisher to specify a short copyright notice")]),e._v(" "),t("li",[e._v("copyright statement URL - a URL to a copyright statement")]),e._v(" "),t("li",[e._v("preferred attribution text - the text to be used when attributing the creator(s) of the data")]),e._v(" "),t("li",[e._v("attribution URL - a URL to be used when building an attribution link")])]),e._v(" "),t("p",[e._v("This is explained in the ODI 
"),t("a",{attrs:{href:"https://theodi.org/guides/publishers-guide-to-the-open-data-rights-statement-vocabulary",target:"_blank",rel:"noopener noreferrer"}},[e._v("Publisher’s Guide to the Open Data Rights Statement Vocabulary"),t("OutboundLink")],1),e._v(" and "),t("a",{attrs:{href:"https://theodi.org/guides/odrs-reusers-guide",target:"_blank",rel:"noopener noreferrer"}},[e._v("Re-users Guide to the Open Data Rights Statement Vocabulary"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("Some licenses require that data consumers provide the copyright notice in the attribution (e.g. "),t("a",{attrs:{href:"https://creativecommons.org/licenses/by/4.0/legalcode#s3",target:"_blank",rel:"noopener noreferrer"}},[e._v("CC BY 4.0 Section 3"),t("OutboundLink")],1),e._v(").")]),e._v(" "),t("p",[e._v("Some data publishers may waive some of their rights under a license, e.g.")]),e._v(" "),t("blockquote",[t("p",[t("a",{attrs:{href:"https://data.gov.au/dataset/noosa-wedding-locations",target:"_blank",rel:"noopener noreferrer"}},[e._v("Noosa Wedding Locations"),t("OutboundLink")],1),e._v(" data by "),t("a",{attrs:{href:"https://www.noosa.qld.gov.au",target:"_blank",rel:"noopener noreferrer"}},[e._v("Noosa Shire Council"),t("OutboundLink")],1),e._v(" is licensed under a "),t("a",{attrs:{href:"https://creativecommons.org/licenses/by/4.0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Creative Commons Attribution 4.0"),t("OutboundLink")],1),e._v(" licence."),t("br"),e._v("\nNoosa Shire Council waives the requirements of attribution under this licence, for this data.")])]),e._v(" "),t("p",[e._v("You can include this information, either:")]),e._v(" "),t("ul",[t("li",[e._v("in the file containing license information (e.g. 
"),t("code",[e._v("README.md")]),e._v(")")]),e._v(" "),t("li",[e._v("as additional metadata properties in the datapackage.json")])]),e._v(" "),t("p",[e._v("The data package specification supports adding "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#descriptor",target:"_blank",rel:"noopener noreferrer"}},[e._v("additional metadata properties"),t("OutboundLink")],1),e._v(" to the datapackage.json, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('{\n "name" : "coastal-data-system-near-real-time-wave-data",\n "title" : "Coastal Data System – Near real time wave data",\n "licenses" : [{\n "name": "CC-BY-4.0",\n "path": "https://creativecommons.org/licenses/by/4.0/",\n "title": "Creative Commons Attribution 4.0"\n }],\n "copyrightNotice": "© The State of Queensland 1995–2017",\n "copyrightStatement": "https://www.qld.gov.au/legal/copyright",\n "attributionText": "Science, Information Technology and Innovation, Queensland Government, Coastal Data System – Near real time wave data, licensed under Creative Commons Attribution 4.0 sourced on 26 December 2017",\n "resources": [\n {\n "path": "https://data.qld.gov.au/dataset/coastal-data-system-near-real-time-wave-data",\n ...\n }\n ]\n}\n')])])]),t("h3",{attrs:{id:"copyright-belongs-to-multiple-parties"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#copyright-belongs-to-multiple-parties"}},[e._v("#")]),e._v(" Copyright belongs to multiple parties")]),e._v(" "),t("p",[e._v("Sometimes data in a resource may be combined from multiple sources that are licensed in different ways. You can indicate this by placing two or more licenses in the "),t("code",[e._v("licenses")]),e._v(" property. 
Further explanation should be given in the "),t("code",[e._v("README.md")]),e._v(".")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "name": "PDDL-1.0",\n "path": "http://opendatacommons.org/licenses/pddl/",\n "title": "Open Data Commons Public Domain Dedication and License v1.0"\n },\n {\n "name": "CC-BY-SA-4.0",\n "path": "https://creativecommons.org/licenses/by-sa/4.0/",\n "title": "Creative Commons Attribution Share-Alike 4.0"\n }]\n')])])]),t("h3",{attrs:{id:"license-may-become-legally-binding"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#license-may-become-legally-binding"}},[e._v("#")]),e._v(" License may become legally binding")]),e._v(" "),t("p",[e._v("The "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#licenses",target:"_blank",rel:"noopener noreferrer"}},[e._v("specification"),t("OutboundLink")],1),e._v(" for "),t("code",[e._v("licenses")]),e._v(" states:")]),e._v(" "),t("blockquote",[t("p",[t("strong",[e._v("This property is not legally binding and does not guarantee the package is licensed under the terms defined in this property.")])])]),e._v(" "),t("p",[e._v("A data package may be uploaded to a data platform and the "),t("code",[e._v("licenses")]),e._v(" applied to the data resources may be publicly displayed. This may make, or give the perception that, the license is legally binding. Please check your specific situation before publishing the data.")]),e._v(" "),t("h3",{attrs:{id:"software-may-not-fully-support-the-frictionless-data-specification"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#software-may-not-fully-support-the-frictionless-data-specification"}},[e._v("#")]),e._v(" Software may not fully support the Frictionless Data specification")]),e._v(" "),t("p",[e._v("Be aware that some data platforms or software may not fully support the Frictionless Data specification. 
This may result in license information being lost or other issues. Always test your data publication to ensure you communicate the correct license information.")]),e._v(" "),t("p",[e._v("For example, at the time of writing:")]),e._v(" "),t("ul",[t("li",[t("p",[t("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-datapackager",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN Data Package extension"),t("OutboundLink")],1),e._v(":")]),e._v(" "),t("ul",[t("li",[e._v("does not upload the "),t("code",[e._v("README.md")]),e._v(" file in a data package. If you have described licence information in the "),t("code",[e._v("README.md")]),e._v(" file, this will be lost ("),t("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-datapackager/issues/60",target:"_blank",rel:"noopener noreferrer"}},[e._v("issue #60"),t("OutboundLink")],1),e._v(")")]),e._v(" "),t("li",[e._v("does not display license information in the datapackage.json file correctly ("),t("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-datapackager/issues/62",target:"_blank",rel:"noopener noreferrer"}},[e._v("issue #62"),t("OutboundLink")],1),e._v(")")])])]),e._v(" "),t("li",[t("p",[t("RouterLink",{attrs:{to:"/blog/2019/03/01/datacurator/"}},[e._v("Data Curator")]),e._v(" only allows the user to select from a limited set of open licenses to describe the data package and data resource licenses.")],1)])])])}),[],!1,null,null,null);a.default=i.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[76],{593:function(e,a,t){"use strict";t.r(a);var n=t(29),i=Object(n.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("Applying licenses, waivers or public domain marks to "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data packages"),t("OutboundLink")],1),e._v(" and 
"),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/",target:"_blank",rel:"noopener noreferrer"}},[e._v("data resources"),t("OutboundLink")],1),e._v(" helps people understand how they can use, modify and share the contents of a data package.")]),e._v(" "),t("p",[e._v("It is recommended that you apply a license, waiver or public domain mark to a data package using the "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#licenses",target:"_blank",rel:"noopener noreferrer"}},[t("code",[e._v("licenses")]),t("OutboundLink")],1),e._v(" property. The value assigned to the data package "),t("code",[e._v("licenses")]),e._v(" property applies to all the data, files and metadata in the data package unless specified otherwise.")]),e._v(" "),t("p",[e._v("You can optionally apply a license to a data resource. This allows a license that differs from the data package license to be applied to the data resource. If the data resource "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/#optional-properties",target:"_blank",rel:"noopener noreferrer"}},[t("code",[e._v("licenses")]),t("OutboundLink")],1),e._v(" property is not specified, it inherits the data package "),t("code",[e._v("licenses")]),e._v(".")]),e._v(" "),t("h2",{attrs:{id:"specifying-a-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#specifying-a-license"}},[e._v("#")]),e._v(" Specifying a license")]),e._v(" "),t("p",[e._v("The Frictionless Data specification states that a "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#licenses",target:"_blank",rel:"noopener noreferrer"}},[e._v("license"),t("OutboundLink")],1),e._v(" must contain a "),t("code",[e._v("name")]),e._v(" property and/or a "),t("code",[e._v("path")]),e._v(" property, and may contain a "),t("code",[e._v("title")]),e._v(" property.")]),e._v(" "),t("blockquote",[t("ul",[t("li",[t("code",[e._v("name")]),e._v(": The name MUST be an 
"),t("a",{attrs:{href:"http://licenses.opendefinition.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Definition license ID"),t("OutboundLink")],1)]),e._v(" "),t("li",[t("code",[e._v("path")]),e._v(": A "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-resource/#url-or-path",target:"_blank",rel:"noopener noreferrer"}},[e._v("url-or-path"),t("OutboundLink")],1),e._v(" string, that is a fully qualified HTTP address, or a relative POSIX path")]),e._v(" "),t("li",[t("code",[e._v("title")]),e._v(": A human-readable title")])])]),e._v(" "),t("p",[e._v("You can specify the location of a license using a URL or a Path.")]),e._v(" "),t("h3",{attrs:{id:"specify-a-license-using-a-url"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#specify-a-license-using-a-url"}},[e._v("#")]),e._v(" Specify a license using a URL")]),e._v(" "),t("p",[e._v("To specify a license using a URL, use the fully qualified HTTP address as the value in the "),t("code",[e._v("path")]),e._v(" property, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "https://cdla.io/sharing-1-0/",\n "title": "Community Data License Agreement – Sharing, Version 1.0"\n}]\n')])])]),t("h3",{attrs:{id:"specify-a-license-using-a-path"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#specify-a-license-using-a-path"}},[e._v("#")]),e._v(" Specify a license using a Path")]),e._v(" "),t("p",[e._v("To specify a license using a path, use a relative POSIX path to the file in the data package as the value in the "),t("code",[e._v("path")]),e._v(" property, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "LICENSE.pdf"\n}]\n')])])]),t("p",[e._v("In this example, LICENSE.pdf would be in the root of the data package folder, e.g.")]),e._v(" "),t("div",{staticClass:"language- 
extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v("folder\n |- datapackage.json\n |- LICENSE.pdf\n |- README.md\n |- data\n |- data.csv\n |- reference-data.csv\n\n")])])]),t("p",[e._v("It is recommended that the license is provided in "),t("a",{attrs:{href:"http://commonmark.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Markdown"),t("OutboundLink")],1),e._v(" format to simplify its display in data platforms and other software.")]),e._v(" "),t("p",[e._v("The license can be a separate file or included in the "),t("code",[e._v("README.md")]),e._v(" file. If license information is included in the "),t("code",[e._v("README.md")]),e._v(" file, it is recommended that it follows the "),t("RouterLink",{attrs:{to:"/blog/2016/04/20/publish-faq/#readme"}},[e._v("guide for formatting a README file")]),e._v(".")],1),e._v(" "),t("h2",{attrs:{id:"applying-a-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#applying-a-license"}},[e._v("#")]),e._v(" Applying a license")]),e._v(" "),t("p",[e._v("These scenarios apply to either the data package or a data resource.")]),e._v(" "),t("ol",[t("li",[t("a",{attrs:{href:"#apply-an-open-license"}},[e._v("Apply an open license")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#apply-a-non-open-license"}},[e._v("Apply a non-open license")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#apply-a-waiver"}},[e._v("Apply a waiver")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#apply-a-public-domain-mark"}},[e._v("Apply a public domain mark")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#do-not-apply-a-license"}},[e._v("Do not apply a license")])])]),e._v(" "),t("p",[e._v("Other considerations:")]),e._v(" "),t("ul",[t("li",[t("a",{attrs:{href:"#provide-additional-license-information"}},[e._v("Provide additional license information")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#copyright-belongs-to-multiple-parties"}},[e._v("Copyright belongs to multiple parties")])]),e._v(" 
"),t("li",[t("a",{attrs:{href:"#license-may-become-legally-binding"}},[e._v("License may become legally binding")])]),e._v(" "),t("li",[t("a",{attrs:{href:"#software-may-not-fully-support-the-frictionless-data-specification"}},[e._v("Software may not fully support the Frictionless Data specification")])])]),e._v(" "),t("h3",{attrs:{id:"apply-an-open-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#apply-an-open-license"}},[e._v("#")]),e._v(" Apply an open license")]),e._v(" "),t("p",[e._v("For an "),t("a",{attrs:{href:"http://opendefinition.org/licenses/",target:"_blank",rel:"noopener noreferrer"}},[e._v("open license"),t("OutboundLink")],1),e._v(", use "),t("code",[e._v("name")]),e._v(", "),t("code",[e._v("path")]),e._v(" and "),t("code",[e._v("title")]),e._v(", e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "name": "CC-BY-4.0",\n "path": "https://creativecommons.org/licenses/by/4.0/",\n "title": "Creative Commons Attribution 4.0"\n}]\n')])])]),t("p",[t("code",[e._v("name")]),e._v(" must be an "),t("a",{attrs:{href:"http://licenses.opendefinition.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Definition license ID"),t("OutboundLink")],1),e._v("; however, note that some license IDs are placeholders or have been retired and should not be used, e.g. 
"),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/other-at.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("other-at"),t("OutboundLink")],1),e._v(", "),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/other-open.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("other-open"),t("OutboundLink")],1),e._v(", "),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/other-pd.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("other-pd"),t("OutboundLink")],1),e._v(", "),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/notspecified.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("notspecified"),t("OutboundLink")],1),e._v(", "),t("a",{attrs:{href:"http://licenses.opendefinition.org/licenses/ukcrown-withrights.json",target:"_blank",rel:"noopener noreferrer"}},[e._v("ukcrown-withrights"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("h3",{attrs:{id:"apply-a-non-open-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#apply-a-non-open-license"}},[e._v("#")]),e._v(" Apply a non-open license")]),e._v(" "),t("p",[e._v("To apply a non-open license, use the "),t("code",[e._v("path")]),e._v(" and optionally the "),t("code",[e._v("title")]),e._v(" properties. 
It is preferred that the license is published at a URL (a fully qualified HTTP address), e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "https://creativecommons.org/licenses/by-nc-nd/4.0/",\n "title": "Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)"\n}]\n')])])]),t("p",[e._v("If the license is not available at a URL, you can "),t("a",{attrs:{href:"#specify-a-license-using-a-path"}},[e._v("specify a license using a path")]),e._v(".")]),e._v(" "),t("h3",{attrs:{id:"apply-a-waiver"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#apply-a-waiver"}},[e._v("#")]),e._v(" Apply a waiver")]),e._v(" "),t("p",[e._v("You can indicate that copyright has been waived by referencing a waiver at a URL in the "),t("code",[e._v("path")]),e._v(" property, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "name": "CC0-1.0",\n "path": "https://creativecommons.org/publicdomain/zero/1.0/",\n "title": "CC0 1.0"\n}]\n')])])]),t("p",[e._v("If the waiver is not available at a URL, you can "),t("a",{attrs:{href:"#specify-a-license-using-a-path"}},[e._v("specify a waiver using a path")]),e._v(".")]),e._v(" "),t("h3",{attrs:{id:"apply-a-public-domain-mark"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#apply-a-public-domain-mark"}},[e._v("#")]),e._v(" Apply a public domain mark")]),e._v(" "),t("p",[e._v("You can indicate that there is no copyright in the data or that copyright has expired, using the "),t("a",{attrs:{href:"https://creativecommons.org/share-your-work/public-domain/pdm/",target:"_blank",rel:"noopener noreferrer"}},[e._v("public domain mark"),t("OutboundLink")],1),e._v(" or other public domain dedications, e.g.")]),e._v(" "),t("div",{staticClass:"language- 
extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "http://creativecommons.org/publicdomain/mark/1.0/",\n "title": "Public Domain Mark"\n}]\n')])])]),t("p",[e._v("If the public domain dedication is not available at a URL, you can "),t("a",{attrs:{href:"#specify-a-license-using-a-path"}},[e._v("specify the public domain dedication using a path")]),e._v(".")]),e._v(" "),t("h3",{attrs:{id:"do-not-apply-a-license"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#do-not-apply-a-license"}},[e._v("#")]),e._v(" Do not apply a license")]),e._v(" "),t("p",[e._v("If you have not decided what license to apply but still want to publish the data package, describe the situation in a file in the data package, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "path": "README.md"\n}]\n')])])]),t("h2",{attrs:{id:"other-considerations"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#other-considerations"}},[e._v("#")]),e._v(" Other considerations")]),e._v(" "),t("h3",{attrs:{id:"provide-additional-license-information"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#provide-additional-license-information"}},[e._v("#")]),e._v(" Provide additional license information")]),e._v(" "),t("p",[e._v("It can be helpful to data consumers to provide additional copyright or attribution information such as:")]),e._v(" "),t("ul",[t("li",[e._v("copyright notice - this allows a data publisher to specify a short copyright notice")]),e._v(" "),t("li",[e._v("copyright statement URL - a URL to a copyright statement")]),e._v(" "),t("li",[e._v("preferred attribution text - the text to be used when attributing the creator(s) of the data")]),e._v(" "),t("li",[e._v("attribution URL - a URL to be used when building an attribution link")])]),e._v(" "),t("p",[e._v("This is explained in the ODI 
"),t("a",{attrs:{href:"https://theodi.org/guides/publishers-guide-to-the-open-data-rights-statement-vocabulary",target:"_blank",rel:"noopener noreferrer"}},[e._v("Publisher’s Guide to the Open Data Rights Statement Vocabulary"),t("OutboundLink")],1),e._v(" and "),t("a",{attrs:{href:"https://theodi.org/guides/odrs-reusers-guide",target:"_blank",rel:"noopener noreferrer"}},[e._v("Re-users Guide to the Open Data Rights Statement Vocabulary"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("Some licenses require that data consumers provide the copyright notice in the attribution (e.g. "),t("a",{attrs:{href:"https://creativecommons.org/licenses/by/4.0/legalcode#s3",target:"_blank",rel:"noopener noreferrer"}},[e._v("CC BY 4.0 Section 3"),t("OutboundLink")],1),e._v(").")]),e._v(" "),t("p",[e._v("Some data publishers may waive some of their rights under a license, e.g.")]),e._v(" "),t("blockquote",[t("p",[t("a",{attrs:{href:"https://data.gov.au/dataset/noosa-wedding-locations",target:"_blank",rel:"noopener noreferrer"}},[e._v("Noosa Wedding Locations"),t("OutboundLink")],1),e._v(" data by "),t("a",{attrs:{href:"https://www.noosa.qld.gov.au",target:"_blank",rel:"noopener noreferrer"}},[e._v("Noosa Shire Council"),t("OutboundLink")],1),e._v(" is licensed under a "),t("a",{attrs:{href:"https://creativecommons.org/licenses/by/4.0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Creative Commons Attribution 4.0"),t("OutboundLink")],1),e._v(" licence."),t("br"),e._v("\nNoosa Shire Council waives the requirements of attribution under this licence, for this data.")])]),e._v(" "),t("p",[e._v("You can include this information, either:")]),e._v(" "),t("ul",[t("li",[e._v("in the file containing license information (e.g. 
"),t("code",[e._v("README.md")]),e._v(")")]),e._v(" "),t("li",[e._v("as additional metadata properties in the datapackage.json")])]),e._v(" "),t("p",[e._v("The data package specification supports adding "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#descriptor",target:"_blank",rel:"noopener noreferrer"}},[e._v("additional metadata properties"),t("OutboundLink")],1),e._v(" to the datapackage.json, e.g.")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('{\n "name" : "coastal-data-system-near-real-time-wave-data",\n "title" : "Coastal Data System – Near real time wave data",\n "licenses" : [{\n "name": "CC-BY-4.0",\n "path": "https://creativecommons.org/licenses/by/4.0/",\n "title": "Creative Commons Attribution 4.0"\n }],\n "copyrightNotice": "© The State of Queensland 1995–2017",\n "copyrightStatement": "https://www.qld.gov.au/legal/copyright",\n "attributionText": "Science, Information Technology and Innovation, Queensland Government, Coastal Data System – Near real time wave data, licensed under Creative Commons Attribution 4.0 sourced on 26 December 2017",\n "resources": [\n {\n "path": "https://data.qld.gov.au/dataset/coastal-data-system-near-real-time-wave-data",\n ...\n }\n ]\n}\n')])])]),t("h3",{attrs:{id:"copyright-belongs-to-multiple-parties"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#copyright-belongs-to-multiple-parties"}},[e._v("#")]),e._v(" Copyright belongs to multiple parties")]),e._v(" "),t("p",[e._v("Sometimes data in a resource may be combined from multiple sources that are licensed in different ways. You can indicate this by placing two or more licenses in the "),t("code",[e._v("licenses")]),e._v(" property. 
Further explanation should be given in the "),t("code",[e._v("README.md")]),e._v(".")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v('"licenses": [{\n "name": "PDDL-1.0",\n "path": "http://opendatacommons.org/licenses/pddl/",\n "title": "Open Data Commons Public Domain Dedication and License v1.0"\n },\n {\n "name": "CC-BY-SA-4.0",\n "path": "https://creativecommons.org/licenses/by-sa/4.0/",\n "title": "Creative Commons Attribution Share-Alike 4.0"\n }]\n')])])]),t("h3",{attrs:{id:"license-may-become-legally-binding"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#license-may-become-legally-binding"}},[e._v("#")]),e._v(" License may become legally binding")]),e._v(" "),t("p",[e._v("The "),t("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/#licenses",target:"_blank",rel:"noopener noreferrer"}},[e._v("specification"),t("OutboundLink")],1),e._v(" for "),t("code",[e._v("licenses")]),e._v(" states:")]),e._v(" "),t("blockquote",[t("p",[t("strong",[e._v("This property is not legally binding and does not guarantee the package is licensed under the terms defined in this property.")])])]),e._v(" "),t("p",[e._v("A data package may be uploaded to a data platform and the "),t("code",[e._v("licenses")]),e._v(" applied to the data resources may be publicly displayed. This may make, or give the perception that, the license is legally binding. Please check your specific situation before publishing the data.")]),e._v(" "),t("h3",{attrs:{id:"software-may-not-fully-support-the-frictionless-data-specification"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#software-may-not-fully-support-the-frictionless-data-specification"}},[e._v("#")]),e._v(" Software may not fully support the Frictionless Data specification")]),e._v(" "),t("p",[e._v("Be aware that some data platforms or software may not fully support the Frictionless Data specification. 
This may result in license information being lost or other issues. Always test your data publication to ensure you communicate the correct license information.")]),e._v(" "),t("p",[e._v("For example, at the time of writing:")]),e._v(" "),t("ul",[t("li",[t("p",[t("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-datapackager",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN Data Package extension"),t("OutboundLink")],1),e._v(":")]),e._v(" "),t("ul",[t("li",[e._v("does not upload the "),t("code",[e._v("README.md")]),e._v(" file in a data package. If you have described licence information in the "),t("code",[e._v("README.md")]),e._v(" file, this will be lost ("),t("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-datapackager/issues/60",target:"_blank",rel:"noopener noreferrer"}},[e._v("issue #60"),t("OutboundLink")],1),e._v(")")]),e._v(" "),t("li",[e._v("does not display license information in the datapackage.json file correctly ("),t("a",{attrs:{href:"https://github.com/frictionlessdata/ckanext-datapackager/issues/62",target:"_blank",rel:"noopener noreferrer"}},[e._v("issue #62"),t("OutboundLink")],1),e._v(")")])])]),e._v(" "),t("li",[t("p",[t("RouterLink",{attrs:{to:"/blog/2019/03/01/datacurator/"}},[e._v("Data Curator")]),e._v(" only allows the user to select from a limited set of open licenses to describe the data package and data resource licenses.")],1)])])])}),[],!1,null,null,null);a.default=i.exports}}]); \ No newline at end of file diff --git a/assets/js/77.6a6fc44c.js b/assets/js/77.ad0bf117.js similarity index 97% rename from assets/js/77.6a6fc44c.js rename to assets/js/77.ad0bf117.js index 6168e4171..8bf573ed4 100644 --- a/assets/js/77.6a6fc44c.js +++ b/assets/js/77.ad0bf117.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[77],{593:function(a,t,e){"use strict";e.r(t);var s=e(29),o=Object(s.a)({},(function(){var a=this,t=a.$createElement,e=a._self._c||t;return 
e("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[e("p",[a._v("This tutorial will show you how to install the JavaScript libraries for working with Data Packages and Table Schema, load a CSV file, infer its schema, and write a Tabular Data Package.")]),a._v(" "),e("h2",{attrs:{id:"setup"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[a._v("#")]),a._v(" Setup")]),a._v(" "),e("p",[a._v("For this tutorial we will need "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-js",target:"_blank",rel:"noopener noreferrer"}},[a._v("datapackage-js"),e("OutboundLink")],1),a._v(" which is a JavaScript library for working with Data Packages.")]),a._v(" "),e("p",[a._v("Using Node Package Manager ("),e("code",[a._v("npm")]),a._v("), install the latest version of "),e("code",[a._v("datapackage-js")]),a._v(" by entering the following into your command line:")]),a._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[e("span",{pre:!0,attrs:{class:"token function"}},[a._v("npm")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token function"}},[a._v("install")]),a._v(" datapackage@latest\n")])])]),e("p",[a._v("Run the "),e("code",[a._v("datapackage --help")]),a._v(" command to find out all options available to you.")]),a._v(" "),e("h2",{attrs:{id:"creating-a-package"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#creating-a-package"}},[a._v("#")]),a._v(" Creating a package")]),a._v(" "),e("p",[a._v("The basic building block of a data package is the "),e("code",[a._v("datapackage.json")]),a._v(" file. It contains the schema and metadata of your data collections.")]),a._v(" "),e("p",[a._v("Now that the node package for working with data packages has been installed, create a directory for your project, and use the command "),e("code",[a._v("datapackage infer path/to/file.csv")]),a._v(" to generate a schema for your dataset. 
To save this file in the directory for editing and sharing, simply append "),e("code",[a._v("> datapackage.json")]),a._v(" to the command above, like so:")]),a._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[a._v("datapackage infer path/to/file.csv "),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(">")]),a._v(" datapackage.json\n")])])]),e("p",[a._v("This creates a "),e("code",[a._v("datapackage.json")]),a._v(" file in this directory.")]),a._v(" "),e("h2",{attrs:{id:"publishing"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#publishing"}},[a._v("#")]),a._v(" Publishing")]),a._v(" "),e("p",[a._v("Now that you have created your Data Package, you might want to "),e("RouterLink",{attrs:{to:"/blog/2016/08/29/publish-online/"}},[a._v("publish your data online")]),a._v(" so that you can share it with others.")],1)])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[77],{592:function(a,t,e){"use strict";e.r(t);var s=e(29),o=Object(s.a)({},(function(){var a=this,t=a.$createElement,e=a._self._c||t;return e("ContentSlotsDistributor",{attrs:{"slot-key":a.$parent.slotKey}},[e("p",[a._v("This tutorial will show you how to install the JavaScript libraries for working with Data Packages and Table Schema, load a CSV file, infer its schema, and write a Tabular Data Package.")]),a._v(" "),e("h2",{attrs:{id:"setup"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[a._v("#")]),a._v(" Setup")]),a._v(" "),e("p",[a._v("For this tutorial we will need "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-js",target:"_blank",rel:"noopener noreferrer"}},[a._v("datapackage-js"),e("OutboundLink")],1),a._v(" which is a JavaScript library for working with Data Packages.")]),a._v(" "),e("p",[a._v("Using Node Package Manager ("),e("code",[a._v("npm")]),a._v("), install the latest version of 
"),e("code",[a._v("datapackage-js")]),a._v(" by entering the following into your command line:")]),a._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[e("span",{pre:!0,attrs:{class:"token function"}},[a._v("npm")]),a._v(" "),e("span",{pre:!0,attrs:{class:"token function"}},[a._v("install")]),a._v(" datapackage@latest\n")])])]),e("p",[a._v("Run the "),e("code",[a._v("datapackage --help")]),a._v(" command to find out all options available to you.")]),a._v(" "),e("h2",{attrs:{id:"creating-a-package"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#creating-a-package"}},[a._v("#")]),a._v(" Creating a package")]),a._v(" "),e("p",[a._v("The basic building block of a data package is the "),e("code",[a._v("datapackage.json")]),a._v(" file. It contains the schema and metadata of your data collections.")]),a._v(" "),e("p",[a._v("Now that the node package for working with data packages has been installed, create a directory for your project, and use the command "),e("code",[a._v("datapackage infer path/to/file.csv")]),a._v(" to generate a schema for your dataset. 
To save this file in the directory for editing and sharing, simply append "),e("code",[a._v("> datapackage.json")]),a._v(" to the command above, like so:")]),a._v(" "),e("div",{staticClass:"language-bash extra-class"},[e("pre",{pre:!0,attrs:{class:"language-bash"}},[e("code",[a._v("datapackage infer path/to/file.csv "),e("span",{pre:!0,attrs:{class:"token operator"}},[a._v(">")]),a._v(" datapackage.json\n")])])]),e("p",[a._v("This creates a "),e("code",[a._v("datapackage.json")]),a._v(" file in this directory.")]),a._v(" "),e("h2",{attrs:{id:"publishing"}},[e("a",{staticClass:"header-anchor",attrs:{href:"#publishing"}},[a._v("#")]),a._v(" Publishing")]),a._v(" "),e("p",[a._v("Now that you have created your Data Package, you might want to "),e("RouterLink",{attrs:{to:"/blog/2016/08/29/publish-online/"}},[a._v("publish your data online")]),a._v(" so that you can share it with others.")],1)])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/79.01904507.js b/assets/js/79.df9ef6ef.js similarity index 99% rename from assets/js/79.01904507.js rename to assets/js/79.df9ef6ef.js index 673676473..43fce605e 100644 --- a/assets/js/79.01904507.js +++ b/assets/js/79.df9ef6ef.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[79],{596:function(t,a,s){"use strict";s.r(a);var e=s(29),n=Object(e.a)({},(function(){var t=this,a=t.$createElement,s=t._self._c||a;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("p",[t._v("Georges Labrèche was one of 2017’s "),s("a",{attrs:{href:"https://toolfund.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Tool Fund"),s("OutboundLink")],1),t._v(" grantees tasked with extending implementation of core Frictionless Data libraries in Java programming language. 
You can read more about this in "),s("RouterLink",{attrs:{to:"/blog/2017/10/24/georges-labreche/"}},[t._v("his grantee profile")]),t._v(".")],1),t._v(" "),s("p",[t._v("In this post, Labrèche will show you how to install and use the "),s("a",{attrs:{href:"https://www.java.com/en/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Java"),s("OutboundLink")],1),t._v(" libraries for working with "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Packages"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("Our goal in this tutorial is to load tabular data from a CSV file, infer data types and the table’s schema.")]),t._v(" "),s("h2",{attrs:{id:"setup"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[t._v("#")]),t._v(" Setup")]),t._v(" "),s("p",[t._v("First things first, you’ll want to grab "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-java",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-java"),s("OutboundLink")],1),t._v(" and the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-java",target:"_blank",rel:"noopener noreferrer"}},[t._v("tableschema-java"),s("OutboundLink")],1),t._v(" libraries.")]),t._v(" "),s("h2",{attrs:{id:"the-data"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#the-data"}},[t._v("#")]),t._v(" The Data")]),t._v(" "),s("p",[t._v("For our example, we will use a "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Package"),s("OutboundLink")],1),t._v(" containing the periodic table. 
You can find the "),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("data package descriptor"),s("OutboundLink")],1),t._v(" and the "),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/data.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("data"),s("OutboundLink")],1),t._v(" on GitHub.")]),t._v(" "),s("p",[t._v("A "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package"),s("OutboundLink")],1),t._v(" is a simple container format used to describe and package a collection of data. It consists of two parts:")]),t._v(" "),s("ul",[s("li",[t._v("Metadata that describes the structure and contents of the package")]),t._v(" "),s("li",[t._v("Resources such as data files that form the contents of the package")])]),t._v(" "),s("h2",{attrs:{id:"packaging"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#packaging"}},[t._v("#")]),t._v(" Packaging")]),t._v(" "),s("p",[t._v("Let’s start by fetching and packaging the data:")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// fetch the data")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("URL")]),t._v(" url "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("new")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("URL")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token 
string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// package the data")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Package")]),t._v(" dp "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("new")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Package")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("url"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n")])])]),s("p",[t._v("That’s it, you’re all set to start playing with the packaged data. There are parameters you can set such as loading a schema or imposing strict validation so be sure to go through the project’s "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-java/blob/master/README.md",target:"_blank",rel:"noopener noreferrer"}},[t._v("README"),s("OutboundLink")],1),t._v(" for more detail.")]),t._v(" "),s("h2",{attrs:{id:"iterating"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#iterating"}},[t._v("#")]),t._v(" Iterating")]),t._v(" "),s("p",[t._v("Now that you have a Data Package instance, let’s see what the data looks like. 
A data package can contain more than one resource so you have to use the "),s("code",[t._v("Package.getResource()")]),t._v(" method to specify which resource you’d like to access.")]),t._v(" "),s("p",[t._v("Let’s iterate over the data:")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Get a resource named data from the data package")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Resource")]),t._v(" resource "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("getResource")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Get the Iterator")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Iterator")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("<")]),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(">")]),t._v(" iter "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("iter")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(";")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Iterate")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("while")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("hasNext")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n\t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" row "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("next")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" atomicNumber "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" symbol "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token 
number"}},[t._v("1")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" name "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" atomicMass "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" metalOrNonMetal "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n\n")])])]),s("p",[t._v("Notice how we’re fetching all values as "),s("code",[t._v("String")]),t._v(". This may not be what you want, particularly for the atomic number and mass. 
Alternatively, you can trigger data type inference and casting like this:")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Get Iterator")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Third boolean is the cast flag.")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Iterator")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("<")]),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Object")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(">")]),t._v(" iter "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("iter")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("false")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("false")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("true")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Iterator")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("while")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token 
function"}},[t._v("hasNext")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n\t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" row "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("next")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("int")]),t._v(" atomicNumber "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" symbol "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" name "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("float")]),t._v(" atomicMass "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" metalOrNonMetal "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n\n")])])]),s("p",[t._v("And that’s it, your data is now associated with the appropriate data types!")]),t._v(" "),s("h2",{attrs:{id:"inferring-the-schema"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#inferring-the-schema"}},[t._v("#")]),t._v(" Inferring the Schema")]),t._v(" "),s("p",[t._v("We wouldn’t have had to infer the data types if we had included a "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Table Schema"),s("OutboundLink")],1),t._v(" when creating an instance of our Data Package. 
If a Table Schema is not available, then it’s something that can also be inferred and created with "),s("code",[t._v("tableschema-java")]),t._v(":")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("URL")]),t._v(" url "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("new")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("URL")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/data.csv"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Table")]),t._v(" table "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("new")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Table")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("url"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Schema")]),t._v(" schema "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" table"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("inferSchema")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(";")]),t._v("\nschema"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("write")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"/path/to/write/schema.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n")])])]),s("p",[t._v("The type inference algorithm tries to cast to available types and each successful type casting increments a popularity score for the successful type cast in question. At the end, the best score so far is returned.")]),t._v(" "),s("p",[t._v("The inference algorithm traverses all of the table’s rows and attempts to cast every single value of the table. When dealing with large tables, you might want to limit the number of rows that the inference algorithm processes:")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Only process the first 25 rows for type inference.")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Schema")]),t._v(" schema "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" table"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("inferSchema")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("25")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n")])])]),s("p",[t._v("Be sure to go through "),s("code",[t._v("tableschema-java")]),t._v("'s 
"),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-java/blob/master/README.md",target:"_blank",rel:"noopener noreferrer"}},[t._v("README"),s("OutboundLink")],1),t._v(" as well to learn more about how to operate with "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Table Schema"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("h2",{attrs:{id:"contributing"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#contributing"}},[t._v("#")]),t._v(" Contributing")]),t._v(" "),s("p",[t._v("In case you discovered an issue that you’d like to contribute a fix for, or if you would like to extend functionality:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# install jabba and maven2")]),t._v("\n$ "),s("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v("cd")]),t._v(" tableschema-java\n$ jabba "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("install")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1.8")]),t._v("\n$ jabba use "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1.8")]),t._v("\n$ mvn "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("install")]),t._v(" -DskipTests"),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v("true -Dmaven.javadoc.skip"),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v("true -B -V\n$ mvn "),s("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v("test")]),t._v(" -B\n\n")])])]),s("p",[t._v("Make sure that all tests pass, and submit a PR with your contributions once you’re ready.")]),t._v(" "),s("p",[t._v("We also welcome your feedback and questions via our "),s("a",{attrs:{href:"http://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Gitter 
chat"),s("OutboundLink")],1),t._v(" or via "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-java/issues",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub issues"),s("OutboundLink")],1),t._v(" on the datapackage-java repository.")])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[79],{595:function(t,a,s){"use strict";s.r(a);var e=s(29),n=Object(e.a)({},(function(){var t=this,a=t.$createElement,s=t._self._c||a;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("p",[t._v("Georges Labrèche was one of 2017’s "),s("a",{attrs:{href:"https://toolfund.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Tool Fund"),s("OutboundLink")],1),t._v(" grantees tasked with extending implementation of core Frictionless Data libraries in Java programming language. You can read more about this in "),s("RouterLink",{attrs:{to:"/blog/2017/10/24/georges-labreche/"}},[t._v("his grantee profile")]),t._v(".")],1),t._v(" "),s("p",[t._v("In this post, Labrèche will show you how to install and use the "),s("a",{attrs:{href:"https://www.java.com/en/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Java"),s("OutboundLink")],1),t._v(" libraries for working with "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Packages"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("Our goal in this tutorial is to load tabular data from a CSV file, infer data types and the table’s schema.")]),t._v(" "),s("h2",{attrs:{id:"setup"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[t._v("#")]),t._v(" Setup")]),t._v(" "),s("p",[t._v("First things first, you’ll want to grab "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-java",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("datapackage-java"),s("OutboundLink")],1),t._v(" and the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-java",target:"_blank",rel:"noopener noreferrer"}},[t._v("tableschema-java"),s("OutboundLink")],1),t._v(" libraries.")]),t._v(" "),s("h2",{attrs:{id:"the-data"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#the-data"}},[t._v("#")]),t._v(" The Data")]),t._v(" "),s("p",[t._v("For our example, we will use a "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Package"),s("OutboundLink")],1),t._v(" containing the periodic table. You can find the "),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("data package descriptor"),s("OutboundLink")],1),t._v(" and the "),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/data.csv",target:"_blank",rel:"noopener noreferrer"}},[t._v("data"),s("OutboundLink")],1),t._v(" on GitHub.")]),t._v(" "),s("p",[t._v("A "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package"),s("OutboundLink")],1),t._v(" is a simple container format used to describe and package a collection of data. 
It consists of two parts:")]),t._v(" "),s("ul",[s("li",[t._v("Metadata that describes the structure and contents of the package")]),t._v(" "),s("li",[t._v("Resources such as data files that form the contents of the package")])]),t._v(" "),s("h2",{attrs:{id:"packaging"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#packaging"}},[t._v("#")]),t._v(" Packaging")]),t._v(" "),s("p",[t._v("Let’s start by fetching and packaging the data:")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// fetch the data")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("URL")]),t._v(" url "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("new")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("URL")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// package the data")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Package")]),t._v(" dp "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("new")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Package")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("url"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n")])])]),s("p",[t._v("That’s 
it, you’re all set to start playing with the packaged data. There are parameters you can set such as loading a schema or imposing strict validation so be sure to go through the project’s "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-java/blob/master/README.md",target:"_blank",rel:"noopener noreferrer"}},[t._v("README"),s("OutboundLink")],1),t._v(" for more detail.")]),t._v(" "),s("h2",{attrs:{id:"iterating"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#iterating"}},[t._v("#")]),t._v(" Iterating")]),t._v(" "),s("p",[t._v("Now that you have a Data Package instance, let’s see what the data looks like. A data package can contain more than one resource so you have to use the "),s("code",[t._v("Package.getResource()")]),t._v(" method to specify which resource you’d like to access.")]),t._v(" "),s("p",[t._v("Let’s iterate over the data:")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Get a resource named data from the data package")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Resource")]),t._v(" resource "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" pkg"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("getResource")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"data"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Get the Iterator")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Iterator")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("<")]),s("span",{pre:!0,attrs:{class:"token 
class-name"}},[t._v("String")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(">")]),t._v(" iter "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("iter")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Iterate")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("while")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("hasNext")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n\t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" row "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("next")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" 
atomicNumber "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" symbol "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" name "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" atomicMass "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" metalOrNonMetal "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n\n")])])]),s("p",[t._v("Notice how we’re fetching all values as "),s("code",[t._v("String")]),t._v(". This may not be what you want, particularly for the atomic number and mass. Alternatively, you can trigger data type inference and casting like this:")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Get Iterator")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Third boolean is the cast flag.")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Iterator")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("<")]),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Object")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v(">")]),t._v(" iter "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" resource"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("iter")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("false")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("false")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(",")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token boolean"}},[t._v("true")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(";")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Iterate")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("while")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("hasNext")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("\n\t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Object")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v(" row "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" iter"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("next")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("int")]),t._v(" atomicNumber "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("int")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("0")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" symbol "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token 
number"}},[t._v("1")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" name "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("float")]),t._v(" atomicMass "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("float")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n \t"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),t._v(" metalOrNonMetal "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("String")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n\n")])])]),s("p",[t._v("And that’s it, your data is now associated with the appropriate data types!")]),t._v(" "),s("h2",{attrs:{id:"inferring-the-schema"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#inferring-the-schema"}},[t._v("#")]),t._v(" Inferring the Schema")]),t._v(" "),s("p",[t._v("We wouldn’t have had to infer the data types if we had included a 
"),s("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Table Schema"),s("OutboundLink")],1),t._v(" when creating an instance of our Data Package. If a Table Schema is not available, then it’s something that can also be inferred and created with "),s("code",[t._v("tableschema-java")]),t._v(":")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("URL")]),t._v(" url "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("new")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("URL")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/data.csv"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Table")]),t._v(" table "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("new")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Table")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),t._v("url"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Schema")]),t._v(" schema "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" table"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token 
function"}},[t._v("inferSchema")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\nschema"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("write")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"/path/to/write/schema.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n")])])]),s("p",[t._v("The type inference algorithm tries to cast each value to the available types, and each successful cast increments a popularity score for that type. At the end, the type with the best score is returned.")]),t._v(" "),s("p",[t._v("The inference algorithm traverses all of the table’s rows and attempts to cast every single value of the table. 
When dealing with large tables, you might want to limit the number of rows that the inference algorithm processes:")]),t._v(" "),s("div",{staticClass:"language-java extra-class"},[s("pre",{pre:!0,attrs:{class:"language-java"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("// Only process the first 25 rows for type inference.")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token class-name"}},[t._v("Schema")]),t._v(" schema "),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v(" table"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(".")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("inferSchema")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("25")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(";")]),t._v("\n\n")])])]),s("p",[t._v("Be sure to go through "),s("code",[t._v("tableschema-java")]),t._v("'s "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-java/blob/master/README.md",target:"_blank",rel:"noopener noreferrer"}},[t._v("README"),s("OutboundLink")],1),t._v(" as well to learn more about how to operate with "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/table-schema/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Table Schema"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("h2",{attrs:{id:"contributing"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#contributing"}},[t._v("#")]),t._v(" Contributing")]),t._v(" "),s("p",[t._v("In case you discovered an issue that you’d like to contribute a fix for, or if you would like to extend functionality:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("\n"),s("span",{pre:!0,attrs:{class:"token comment"}},[t._v("# install jabba and maven2")]),t._v("\n$ "),s("span",{pre:!0,attrs:{class:"token 
builtin class-name"}},[t._v("cd")]),t._v(" tableschema-java\n$ jabba "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("install")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1.8")]),t._v("\n$ jabba use "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1.8")]),t._v("\n$ mvn "),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("install")]),t._v(" -DskipTests"),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v("true -Dmaven.javadoc.skip"),s("span",{pre:!0,attrs:{class:"token operator"}},[t._v("=")]),t._v("true -B -V\n$ mvn "),s("span",{pre:!0,attrs:{class:"token builtin class-name"}},[t._v("test")]),t._v(" -B\n\n")])])]),s("p",[t._v("Make sure that all tests pass, and submit a PR with your contributions once you’re ready.")]),t._v(" "),s("p",[t._v("We also welcome your feedback and questions via our "),s("a",{attrs:{href:"http://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Gitter chat"),s("OutboundLink")],1),t._v(" or via "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-java/issues",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub issues"),s("OutboundLink")],1),t._v(" on the datapackage-java repository.")])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/8.215bd622.js b/assets/js/8.51c3d6d9.js similarity index 94% rename from assets/js/8.215bd622.js rename to assets/js/8.51c3d6d9.js index 9f1f3a9c6..0b996db8f 100644 --- a/assets/js/8.215bd622.js +++ b/assets/js/8.51c3d6d9.js @@ -1 +1 @@ 
-(window.webpackJsonp=window.webpackJsonp||[]).push([[8],{494:function(e,t,a){e.exports=a.p+"assets/img/bcodmoLogo.958d74b9.jpg"},495:function(e,t,a){e.exports=a.p+"assets/img/bcodmo1.1e0069cf.png"},496:function(e,t,a){e.exports=a.p+"assets/img/bcodmo2.1e6fde83.png"},497:function(e,t,a){e.exports=a.p+"assets/img/bcodmo3.a2871755.png"},498:function(e,t,a){e.exports=a.p+"assets/img/bcodmo4.74b606a5.png"},499:function(e,t,a){e.exports=a.p+"assets/img/bcodmo5.ab522411.png"},500:function(e,t,a){e.exports=a.p+"assets/img/bcodmo6.c90593b8.png"},620:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("This blog post describes a Frictionless Data Pilot with the Biological and Chemical Oceanography Data Management Office (BCO-DMO). Pilot projects are part of the "),o("a",{attrs:{href:"https://frictionlessdata.io/reproducible-research/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data for Reproducible Research project"),o("OutboundLink")],1),e._v(". Written by the BCO-DMO team members Adam Shepherd, Amber York, Danie Kinkade, and development by Conrad Schloer.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(494),alt:"BCO-DMO logo"}})]),e._v(" "),o("p",[e._v("Scientific research is implicitly reliant upon the creation, management, analysis, synthesis, and interpretation of data. When properly stewarded, data hold great potential to demonstrate the reproducibility of scientific results and accelerate scientific discovery. 
"),o("a",{attrs:{href:"https://www.bco-dmo.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("The Biological and Chemical Oceanography Data Management Office (BCO-DMO)"),o("OutboundLink")],1),e._v(" is a publicly accessible earth science data repository established by the National Science Foundation "),o("a",{attrs:{href:"https://www.nsf.gov/",target:"_blank",rel:"noopener noreferrer"}},[e._v("(NSF)"),o("OutboundLink")],1),e._v(" for the curation of biological, chemical, and biogeochemical oceanographic data from research in coastal, marine, and laboratory environments. With the groundswell surrounding the "),o("a",{attrs:{href:"https://doi.org/10.1038/sdata.2016.18",target:"_blank",rel:"noopener noreferrer"}},[e._v("FAIR data principles"),o("OutboundLink")],1),e._v(", BCO-DMO recognized an opportunity to improve its curation services to better support reproducibility of results, while increasing process efficiencies for incoming data submissions. "),o("strong",[e._v("In 2019, BCO-DMO worked with the Frictionless Data team at Open Knowledge Foundation to develop a web application called Laminar for creating Frictionlessdata Data Package Pipelines that help data managers process data efficiently while recording the provenance of their activities to support reproducibility of results.")])]),e._v(" "),o("p",[e._v("The mission of BCO-DMO is to provide investigators with data management services that span the full data lifecycle from data management planning, to data publication, and archiving.")]),e._v(" "),o("p",[e._v("BCO-DMO provides free access to oceanographic data through a web-based catalog with tools and features facilitating assessment of fitness for purpose. 
The result of this effort is a database containing over "),o("strong",[e._v("9,000 datasets from a variety of oceanographic and limnological measurements")]),e._v(" including those from: in situ sampling, moorings, floats and gliders, sediment traps; laboratory and mesocosm experiments; satellite images; derived parameters and model output; and synthesis products from data integration efforts. The project has worked with over 2,600 data contributors representing over 1,000 funded projects.")]),e._v(" "),o("p",[e._v("As the catalog of data holdings continued to grow in both size and the variety of data types it curates, BCO-DMO needed to retool its data infrastructure with three goals. First, to improve the transportation of data to, from, and within BCO-DMO’s ecosystem. Second, to support reproducibility of research by making all curation activities of the office completely transparent and traceable. Finally, to improve the efficiency and consistency across data management staff. Until recently, data curation activities in the office were largely dependent on the individual capabilities of each data manager. While some of the staff were fluent in Python and other scripting languages, others were dependent on in-house custom developed tools. These in-house tools were extremely useful and flexible, but they were developed for an aging computing paradigm grounded in physical hardware accessing local data resources on disk. 
While locally stored data is still the convention at BCO-DMO, the distributed nature of the web coupled with the challenges of big data stretched this toolset beyond its original intention.")]),e._v(" "),o("p",[e._v("In 2015, we were introduced to the idea of data containerization and the Frictionless Data project in a "),o("a",{attrs:{href:"https://www.rd-alliance.org/data-packages-bof-p6-bof-session.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Packages BoF"),o("OutboundLink")],1),e._v(" at the "),o("a",{attrs:{href:"https://www.rd-alliance.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Research Data Alliance"),o("OutboundLink")],1),e._v(" conference in Paris, France. After evaluating the Frictionless Data specifications and tools, BCO-DMO developed a strategy to build its new data infrastructure on the ideas behind this project.")]),e._v(" "),o("p",[e._v("While the concept of data packaging is not new, the simplicity and extendibility of the Frictionless Data implementation made it easy to adopt within an existing infrastructure. "),o("strong",[e._v("BCO-DMO identified the Data Package Pipelines (DPP) project in the Frictionless Data toolset as key to achieving its data curation goals.")]),e._v(" DPP implements the philosophy of declarative workflows, which trade imperative code in a specific programming language that tells a computer how a task should be completed for declarative, structured statements that detail what should be done. These structured statements abstract the user writing the statements from the actual code executing them, and are useful for reproducibility over long periods of time as programming languages age and change or algorithms improve. This flexibility was appealing because it meant the intent of the data manager could be translated into many varying programming (and data) languages over time without having to refactor older workflows. 
In data management, that means that one of the languages a DPP workflow captures is provenance – a common need across oceanographic datasets for reproducibility. DPP workflows translated into records of provenance explicitly communicate to data submitters and future data users what BCO-DMO had done during the curation phase. Secondly, because workflow steps need to be interpreted by computers into code that carries out the instructions, it helped data management staff converge on a declarative language they could all share. This convergence meant cohesiveness, consistency, and efficiency across the team if we could implement DPP in a way they could all use.")]),e._v(" "),o("p",[o("strong",[e._v("In 2018, BCO-DMO formed a partnership with Open Knowledge Foundation (OKF) to develop a web application that would help any BCO-DMO data manager use the declarative language they had developed in a consistent way.")]),e._v(" Why develop a web application for DPP? As the data management staff evaluated DPP and Frictionless Data, they found that there was a learning curve to setting up the DPP environment and a deep understanding of the Frictionlessdata ‘Data Package’ specification was required. The web application abstracted this required knowledge to achieve two main goals: 1) consistently structured Data Packages (datapackage.json) with all the required metadata employed at BCO-DMO, and 2) efficiencies of time by eliminating typos and syntax errors made by data managers. Thus, the partnership with OKF focused on making the needs of scientific research data a possibility within the Frictionless Data ecosystem of specs and tools.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Pipelines"),o("OutboundLink")],1),e._v(" is implemented in Python and comes with some built-in processors that can be used in a workflow. 
BCO-DMO took its own declarative language and identified gaps in the built-in processors. For these gaps, BCO-DMO and OKF developed Python implementations for the missing declarations to support the curation of oceanographic data, and the result was a new set of processors made available on "),o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Some notable BCO-DMO processors are:")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors#bcodmo_pipeline_processorsboolean_add_computed_field",target:"_blank",rel:"noopener noreferrer"}},[e._v("boolean_add_computed_field"),o("OutboundLink")],1),e._v(" – Computes a new field to add to the data based on whether a particular row satisfies a certain set of criteria."),o("br"),e._v("\nExample: Where Cruise_ID = ‘AT39-05’ and Station = 6, set Latitude to 22.1645.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors#bcodmo_pipeline_processorsconvert_date",target:"_blank",rel:"noopener noreferrer"}},[e._v("convert_date"),o("OutboundLink")],1),e._v(" – Converts any number of fields containing date information into a single date field with display format and timezone options. Often date information is reported in multiple columns such as "),o("code",[e._v("year")]),e._v(", "),o("code",[e._v("month")]),e._v(", "),o("code",[e._v("day")]),e._v(", "),o("code",[e._v("hours_local_time")]),e._v(", "),o("code",[e._v("minutes_local_time")]),e._v(", "),o("code",[e._v("seconds_local_time")]),e._v(". For spatio-temporal datasets, it’s important to know the UTC date and time of the recorded data to ensure that searches for data with a time range are accurate. 
Here, these columns are combined to form an ISO 8601-compliant UTC datetime value.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors#bcodmo_pipeline_processorsconvert_to_decimal_degrees",target:"_blank",rel:"noopener noreferrer"}},[e._v("convert_to_decimal_degrees"),o("OutboundLink")],1),e._v(" – Convert a single field containing coordinate information from degrees-minutes-seconds or degrees-decimal_minutes to decimal_degrees. The standard representation at BCO-DMO for spatial data conforms to the decimal degrees specification.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors#bcodmo_pipeline_processorsreorder_fields",target:"_blank",rel:"noopener noreferrer"}},[e._v("reorder_fields"),o("OutboundLink")],1),e._v(" – Changes the order of columns within the data. This is a convention within the oceanographic data community to put certain columns at the beginning of tabular data to help contextualize the following columns. Examples of columns that are typically moved to the beginning are: dates, locations, instrument or vessel identifiers, and depth at collection.")]),e._v(" "),o("p",[e._v("The remaining processors used by BCO-DMO can be found at "),o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/BCODMO/bcodmo_processors"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("How does Laminar work?"),o("br"),e._v("\nIn our collaboration with OKF, BCO-DMO developed use cases based on real-world data submissions. 
One such example is a recent Arctic Nitrogen Fixation Rates dataset.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(495),alt:"Arctic dataset"}})]),e._v(" "),o("p",[e._v("The original dataset shown above needed the following curation steps to make the data more interoperable and reusable:")]),e._v(" "),o("p",[e._v("Convert lat/lon to decimal degrees"),o("br"),e._v("\nAdd timestamp (UTC) in ISO format"),o("br"),e._v("\n‘Collection Depth’ with value “surface” should be changed to 0"),o("br"),e._v("\nRemove parentheses and units from column names (field descriptions and units captured in metadata)."),o("br"),e._v("\nRemove spaces from column names")]),e._v(" "),o("p",[e._v("The web application, named Laminar and built on top of DPP, helps data managers at BCO-DMO perform these operations in a consistent way. First, Laminar prompts us to name and describe the pipeline being developed. It then assumes that the data manager wants to load some data to start the pipeline, and prompts for a source location.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(496),alt:"Laminar"}})]),e._v(" "),o("p",[e._v("After providing a name and description of our DPP workflow, we provide a data source to load, and give it the name, ‘nfix’.")]),e._v(" "),o("p",[e._v("In subsequent pipeline steps, we refer to ‘nfix’ as the resource we want to transform. 
For example, to convert the latitude and longitude into decimal degrees, we add a new step to the pipeline, select the ‘Convert to decimal degrees’ processor, a proxy for our custom processor ‘convert_to_decimal_degrees’, select the ‘nfix’ resource, select a field from that ‘nfix’ data source, and specify the Python regex pattern identifying where the values for the degrees, minutes and seconds can be found in each value of the latitude column.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(497),alt:"processor step"}})]),e._v(" "),o("p",[e._v("Similarly, in step 7 of this pipeline, we want to generate an ISO 8601-compliant UTC datetime value by combining the pre-existing ‘Date’ and ‘Local Time’ columns. This step is depicted below:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(498),alt:"date processing step"}})]),e._v(" "),o("p",[e._v("After the pipeline is completed, the interface displays all steps, and lets the data manager execute the pipeline by clicking the green ‘play’ button at the bottom. This button then generates the pipeline-spec.yaml file, executes the pipeline, and can display the resulting dataset.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(499),alt:"all steps"}})]),e._v(" "),o("p",[o("img",{attrs:{src:a(500),alt:"data"}})]),e._v(" "),o("p",[e._v("The resulting DPP workflow contained 223 lines across this 12-step operation, and for a data manager, the web application reduces the chance of error compared with generating this pipeline by hand. Ultimately, our work with OKF helped us develop processors that follow the DPP conventions.")]),e._v(" "),o("p",[e._v("Our goal for the pilot project with OKF was to have BCO-DMO data managers using Laminar for processing 80% of the data submissions we receive. 
The pilot was so successful, that data managers have processed 95% of new data submissions to the repository using the application.")]),e._v(" "),o("p",[e._v("This is exciting from a data management processing perspective because the use of Laminar is more sustainable, and acted to bring the team together to determine best strategies for processing, documentation, etc. This increase in consistency and efficiency is welcomed from an administrative perspective and helps with the training of any new data managers coming to the team.")]),e._v(" "),o("p",[e._v("The OKF team are excellent partners, who were the catalysts to a successful project. The next steps for BCO-DMO are to build on the success of The Frictionlessdata Data Package Pipelines by implementing the Frictionlessdata Goodtables specification for data validation to help us develop submission guidelines for common data types. Special thanks to the OKF team – Lilly Winfree, Evgeny Karev, and Jo Barrett.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[8],{496:function(e,t,a){e.exports=a.p+"assets/img/bcodmoLogo.958d74b9.jpg"},497:function(e,t,a){e.exports=a.p+"assets/img/bcodmo1.1e0069cf.png"},498:function(e,t,a){e.exports=a.p+"assets/img/bcodmo2.1e6fde83.png"},499:function(e,t,a){e.exports=a.p+"assets/img/bcodmo3.a2871755.png"},500:function(e,t,a){e.exports=a.p+"assets/img/bcodmo4.74b606a5.png"},501:function(e,t,a){e.exports=a.p+"assets/img/bcodmo5.ab522411.png"},502:function(e,t,a){e.exports=a.p+"assets/img/bcodmo6.c90593b8.png"},625:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[e._v("This blog post describes a Frictionless Data Pilot with the Biological and Chemical Oceanography Data Management Office (BCO-DMO). 
Pilot projects are part of the "),o("a",{attrs:{href:"https://frictionlessdata.io/reproducible-research/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Frictionless Data for Reproducible Research project"),o("OutboundLink")],1),e._v(". Written by the BCO-DMO team members Adam Shepherd, Amber York, Danie Kinkade, and development by Conrad Schloer.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(496),alt:"BCO-DMO logo"}})]),e._v(" "),o("p",[e._v("Scientific research is implicitly reliant upon the creation, management, analysis, synthesis, and interpretation of data. When properly stewarded, data hold great potential to demonstrate the reproducibility of scientific results and accelerate scientific discovery. "),o("a",{attrs:{href:"https://www.bco-dmo.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("The Biological and Chemical Oceanography Data Management Office (BCO-DMO)"),o("OutboundLink")],1),e._v(" is a publicly accessible earth science data repository established by the National Science Foundation "),o("a",{attrs:{href:"https://www.nsf.gov/",target:"_blank",rel:"noopener noreferrer"}},[e._v("(NSF)"),o("OutboundLink")],1),e._v(" for the curation of biological, chemical, and biogeochemical oceanographic data from research in coastal, marine, and laboratory environments. With the groundswell surrounding the "),o("a",{attrs:{href:"https://doi.org/10.1038/sdata.2016.18",target:"_blank",rel:"noopener noreferrer"}},[e._v("FAIR data principles"),o("OutboundLink")],1),e._v(", BCO-DMO recognized an opportunity to improve its curation services to better support reproducibility of results, while increasing process efficiencies for incoming data submissions. 
"),o("strong",[e._v("In 2019, BCO-DMO worked with the Frictionless Data team at Open Knowledge Foundation to develop a web application called Laminar for creating Frictionlessdata Data Package Pipelines that help data managers process data efficiently while recording the provenance of their activities to support reproducibility of results.")])]),e._v(" "),o("p",[e._v("The mission of BCO-DMO is to provide investigators with data management services that span the full data lifecycle from data management planning, to data publication, and archiving.")]),e._v(" "),o("p",[e._v("BCO-DMO provides free access to oceanographic data through a web-based catalog with tools and features facilitating assessment of fitness for purpose. The result of this effort is a database containing over "),o("strong",[e._v("9,000 datasets from a variety of oceanographic and limnological measurements")]),e._v(" including those from: in situ sampling, moorings, floats and gliders, sediment traps; laboratory and mesocosm experiments; satellite images; derived parameters and model output; and synthesis products from data integration efforts. The project has worked with over 2,600 data contributors representing over 1,000 funded projects.")]),e._v(" "),o("p",[e._v("As the catalog of data holdings continued to grow in both size and the variety of data types it curates, BCO-DMO needed to retool its data infrastructure with three goals. First, to improve the transportation of data to, from, and within BCO-DMO’s ecosystem. Second, to support reproducibility of research by making all curation activities of the office completely transparent and traceable. Finally, to improve the efficiency and consistency across data management staff. Until recently, data curation activities in the office were largely dependent on the individual capabilities of each data manager. While some of the staff were fluent in Python and other scripting languages, others were dependent on in-house custom developed tools. 
These in-house tools were extremely useful and flexible, but they were developed for an aging computing paradigm grounded in physical hardware accessing local data resources on disk. While locally stored data is still the convention at BCO-DMO, the distributed nature of the web coupled with the challenges of big data stretched this toolset beyond its original intention.")]),e._v(" "),o("p",[e._v("In 2015, we were introduced to the idea of data containerization and the Frictionless Data project in a "),o("a",{attrs:{href:"https://www.rd-alliance.org/data-packages-bof-p6-bof-session.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Packages BoF"),o("OutboundLink")],1),e._v(" at the "),o("a",{attrs:{href:"https://www.rd-alliance.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Research Data Alliance"),o("OutboundLink")],1),e._v(" conference in Paris, France. After evaluating the Frictionless Data specifications and tools, BCO-DMO developed a strategy to underpin its new data infrastructure on the ideas behind this project.")]),e._v(" "),o("p",[e._v("While the concept of data packaging is not new, the simplicity and extendibility of the Frictionless Data implementation made it easy to adopt within an existing infrastructure. "),o("strong",[e._v("BCO-DMO identified the Data Package Pipelines (DPP) project in the Frictionless Data toolset as key to achieving its data curation goals.")]),e._v(" DPP implements the philosophy of declarative workflows, which trade code in a specific programming language that tells a computer how a task should be completed for declarative, structured statements that detail what should be done. These structured statements abstract the user writing the statements from the actual code executing them, and are useful for reproducibility over long periods of time where programming languages age, change or algorithms improve. 
This flexibility was appealing because it meant the intent of the data manager could be translated into many varying programming (and data) languages over time without having to refactor older workflows. In data management, that means that one of the languages a DPP workflow captures is provenance – a common need across oceanographic datasets for reproducibility. DPP workflows translated into records of provenance explicitly communicate to data submitters and future data users what BCO-DMO had done during the curation phase. Secondly, because workflow steps need to be interpreted by computers into code that carries out the instructions, it helped data management staff converge on a declarative language they could all share. This convergence meant cohesiveness, consistency, and efficiency across the team if we could implement DPP in a way they could all use.")]),e._v(" "),o("p",[o("strong",[e._v("In 2018, BCO-DMO formed a partnership with Open Knowledge Foundation (OKF) to develop a web application that would help any BCO-DMO data manager use the declarative language they had developed in a consistent way.")]),e._v(" Why develop a web application for DPP? As the data management staff evaluated DPP and Frictionless Data, they found that there was a learning curve to setting up the DPP environment and that a deep understanding of the Frictionlessdata ‘Data Package’ specification was required. The web application abstracted this required knowledge to achieve two main goals: 1) consistently structured Data Packages (datapackage.json) with all the required metadata employed at BCO-DMO, and 2) time savings from eliminating typos and syntax errors made by data managers. 
Thus, the partnership with OKF focused on meeting the needs of scientific research data within the Frictionless Data ecosystem of specs and tools.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-pipelines",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Package Pipelines"),o("OutboundLink")],1),e._v(" is implemented in Python and comes with some built-in processors that can be used in a workflow. BCO-DMO took its own declarative language and identified gaps in the built-in processors. For these gaps, BCO-DMO and OKF developed Python implementations for the missing declarations to support the curation of oceanographic data, and the result was a new set of processors made available on "),o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("Some notable BCO-DMO processors are:")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors#bcodmo_pipeline_processorsboolean_add_computed_field",target:"_blank",rel:"noopener noreferrer"}},[e._v("boolean_add_computed_field"),o("OutboundLink")],1),e._v(" – Computes a new field to add to the data based on whether a particular row satisfies a certain set of criteria."),o("br"),e._v("\nExample: Where Cruise_ID = ‘AT39-05’ and Station = 6, set Latitude to 22.1645.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors#bcodmo_pipeline_processorsconvert_date",target:"_blank",rel:"noopener noreferrer"}},[e._v("convert_date"),o("OutboundLink")],1),e._v(" – Converts any number of fields containing date information into a single date field with display format and timezone options. 
Often date information is reported in multiple columns such as "),o("code",[e._v("year")]),e._v(", "),o("code",[e._v("month")]),e._v(", "),o("code",[e._v("day")]),e._v(", "),o("code",[e._v("hours_local_time")]),e._v(", "),o("code",[e._v("minutes_local_time")]),e._v(", "),o("code",[e._v("seconds_local_time")]),e._v(". For spatio-temporal datasets, it’s important to know the UTC date and time of the recorded data to ensure that searches for data with a time range are accurate. Here, these columns are combined to form an ISO 8601-compliant UTC datetime value.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors#bcodmo_pipeline_processorsconvert_to_decimal_degrees",target:"_blank",rel:"noopener noreferrer"}},[e._v("convert_to_decimal_degrees"),o("OutboundLink")],1),e._v(" – Converts a single field containing coordinate information from degrees-minutes-seconds or degrees-decimal_minutes to decimal_degrees. The standard representation at BCO-DMO for spatial data conforms to the decimal degrees specification.")]),e._v(" "),o("p",[o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors#bcodmo_pipeline_processorsreorder_fields",target:"_blank",rel:"noopener noreferrer"}},[e._v("reorder_fields"),o("OutboundLink")],1),e._v(" – Changes the order of columns within the data. It is a convention within the oceanographic data community to put certain columns at the beginning of tabular data to help contextualize the following columns. 
Examples of columns that are typically moved to the beginning are: dates, locations, instrument or vessel identifiers, and depth at collection.")]),e._v(" "),o("p",[e._v("The remaining processors used by BCO-DMO can be found at "),o("a",{attrs:{href:"https://github.com/BCODMO/bcodmo_processors",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/BCODMO/bcodmo_processors"),o("OutboundLink")],1),e._v(".")]),e._v(" "),o("p",[e._v("How does Laminar work?"),o("br"),e._v("\nIn our collaboration with OKF, BCO-DMO developed use cases based on real-world data submissions. One such example is a recent Arctic Nitrogen Fixation Rates dataset.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(497),alt:"Arctic dataset"}})]),e._v(" "),o("p",[e._v("The original dataset shown above needed the following curation steps to make the data more interoperable and reusable:")]),e._v(" "),o("p",[e._v("Convert lat/lon to decimal degrees"),o("br"),e._v("\nAdd timestamp (UTC) in ISO format"),o("br"),e._v("\n‘Collection Depth’ with value “surface” should be changed to 0"),o("br"),e._v("\nRemove parentheses and units from column names (field descriptions and units captured in metadata)."),o("br"),e._v("\nRemove spaces from column names"),o("br"),e._v("\nLaminar, the web application built on top of DPP, helps data managers at BCO-DMO perform these operations in a consistent way. First, Laminar prompts us to name and describe the pipeline being developed. It then assumes that the data manager wants to load some data to start the pipeline, and prompts for a source location.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(498),alt:"Laminar"}})]),e._v(" "),o("p",[e._v("After providing a name and description of our DPP workflow, we provide a data source to load, and give it the name ‘nfix’.")]),e._v(" "),o("p",[e._v("In subsequent pipeline steps, we refer to ‘nfix’ as the resource we want to transform. 
For example, to convert the latitude and longitude into decimal degrees, we add a new step to the pipeline, select the ‘Convert to decimal degrees’ processor, a proxy for our custom processor ‘convert_to_decimal_degrees’, select the ‘nfix’ resource, select a field from that ‘nfix’ data source, and specify the Python regex pattern identifying where the values for the degrees, minutes and seconds can be found in each value of the latitude column.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(499),alt:"processor step"}})]),e._v(" "),o("p",[e._v("Similarly, in step 7 of this pipeline, we want to generate an ISO 8601-compliant UTC datetime value by combining the pre-existing ‘Date’ and ‘Local Time’ columns. This step is depicted below:")]),e._v(" "),o("p",[o("img",{attrs:{src:a(500),alt:"date processing step"}})]),e._v(" "),o("p",[e._v("After the pipeline is completed, the interface displays all steps, and lets the data manager execute the pipeline by clicking the green ‘play’ button at the bottom. This button then generates the pipeline-spec.yaml file, executes the pipeline, and can display the resulting dataset.")]),e._v(" "),o("p",[o("img",{attrs:{src:a(501),alt:"all steps"}})]),e._v(" "),o("p",[o("img",{attrs:{src:a(502),alt:"data"}})]),e._v(" "),o("p",[e._v("The resulting DPP workflow contained 223 lines across this 12-step operation. For a data manager, the web application avoids the errors that would be likely if this pipeline were written by hand. Ultimately, our work with OKF helped us develop processors that follow the DPP conventions.")]),e._v(" "),o("p",[e._v("Our goal for the pilot project with OKF was to have BCO-DMO data managers using Laminar for processing 80% of the data submissions we receive. 
The pilot was so successful, that data managers have processed 95% of new data submissions to the repository using the application.")]),e._v(" "),o("p",[e._v("This is exciting from a data management processing perspective because the use of Laminar is more sustainable, and acted to bring the team together to determine best strategies for processing, documentation, etc. This increase in consistency and efficiency is welcomed from an administrative perspective and helps with the training of any new data managers coming to the team.")]),e._v(" "),o("p",[e._v("The OKF team are excellent partners, who were the catalysts to a successful project. The next steps for BCO-DMO are to build on the success of The Frictionlessdata Data Package Pipelines by implementing the Frictionlessdata Goodtables specification for data validation to help us develop submission guidelines for common data types. Special thanks to the OKF team – Lilly Winfree, Evgeny Karev, and Jo Barrett.")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/80.b4a05a86.js b/assets/js/80.fe41b3f3.js similarity index 99% rename from assets/js/80.b4a05a86.js rename to assets/js/80.fe41b3f3.js index 5802a5995..d22212cbb 100644 --- a/assets/js/80.b4a05a86.js +++ b/assets/js/80.fe41b3f3.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[80],{597:function(t,a,s){"use strict";s.r(a);var e=s(29),n=Object(e.a)({},(function(){var t=this,a=t.$createElement,s=t._self._c||a;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("p",[t._v("Matt Thompson was one of 2017’s "),s("a",{attrs:{href:"https://toolfund.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Tool Fund"),s("OutboundLink")],1),t._v(" grantees tasked with extending implementation of core Frictionless Data "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-clj",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("data package"),s("OutboundLink")],1),t._v(" and "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-clj",target:"_blank",rel:"noopener noreferrer"}},[t._v("table schema"),s("OutboundLink")],1),t._v(" libraries in Clojure programming language. You can read more about this in "),s("RouterLink",{attrs:{to:"/blog/2017/10/26/matt-thompson/"}},[t._v("his grantee profile")]),t._v(". In this post, Thompson will show you how to set up and use the "),s("a",{attrs:{href:"http://clojure.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Clojure"),s("OutboundLink")],1),t._v(" libraries for working with "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Packages"),s("OutboundLink")],1),t._v(".")],1),t._v(" "),s("p",[t._v("This tutorial uses a worked example of downloading a data package from a remote location on the web, and using the Frictionless Data tools to read its contents and metadata into Clojure data structures.")]),t._v(" "),s("h2",{attrs:{id:"setup"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[t._v("#")]),t._v(" Setup")]),t._v(" "),s("p",[t._v("First, we need to set up the project structure using the "),s("a",{attrs:{href:"http://leiningen.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Leiningen"),s("OutboundLink")],1),t._v(" tool. If you don’t have Leiningen set up on your system, follow the link to download and install it. Once it is set up, run the following command from the command line to create the folders and files for a basic Clojure project:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("\nlein new periodic-table\n\n")])])]),s("p",[t._v("This will create the "),s("em",[t._v("periodic-table")]),t._v(" folder. 
Inside the "),s("em",[t._v("periodic-table/src/periodic_table")]),t._v(" folder should be a file named "),s("em",[t._v("core.clj")]),t._v(". This is the file you need to edit during this tutorial.")]),t._v(" "),s("h2",{attrs:{id:"the-data"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#the-data"}},[t._v("#")]),t._v(" The Data")]),t._v(" "),s("p",[t._v("For this tutorial, we will use a pre-created data package, the Periodic Table Data Package hosted by the Frictionless Data project. A "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package"),s("OutboundLink")],1),t._v(" is a simple container format used to describe and package a collection of data. It consists of two parts:")]),t._v(" "),s("ul",[s("li",[t._v("Metadata that describes the structure and contents of the package")]),t._v(" "),s("li",[t._v("Resources such as data files that form the contents of the package")])]),t._v(" "),s("p",[t._v("Our Clojure code will download the data package and process it using the metadata information contained in the package. The data package can be found "),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("here on GitHub"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("The data package contains data about elements in the periodic table, including each element’s name, atomic number, symbol and atomic weight. 
The table below shows a sample taken from the first three rows of the CSV file:")]),t._v(" "),s("table",[s("thead",[s("tr",[s("th",[t._v("atomic number")]),t._v(" "),s("th",[t._v("symbol")]),t._v(" "),s("th",[t._v("name")]),t._v(" "),s("th",[t._v("atomic mass")]),t._v(" "),s("th",[t._v("metal or nonmetal?")])])]),t._v(" "),s("tbody",[s("tr",[s("td",[t._v("1")]),t._v(" "),s("td",[t._v("H")]),t._v(" "),s("td",[t._v("Hydrogen")]),t._v(" "),s("td",[t._v("1.00794")]),t._v(" "),s("td",[t._v("nonmetal")])]),t._v(" "),s("tr",[s("td",[t._v("2")]),t._v(" "),s("td",[t._v("He")]),t._v(" "),s("td",[t._v("Helium")]),t._v(" "),s("td",[t._v("4.002602")]),t._v(" "),s("td",[t._v("noble gas")])]),t._v(" "),s("tr",[s("td",[t._v("3")]),t._v(" "),s("td",[t._v("Li")]),t._v(" "),s("td",[t._v("Lithium")]),t._v(" "),s("td",[t._v("6.941")]),t._v(" "),s("td",[t._v("alkali metal")])])])]),t._v(" "),s("h2",{attrs:{id:"loading-the-data-package"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#loading-the-data-package"}},[t._v("#")]),t._v(" Loading the Data Package")]),t._v(" "),s("p",[t._v("The first step is to load the data package into a Clojure data structure (a map). The initial step is to require the data package library in our code (which we will give the alias "),s("strong",[t._v("dp")]),t._v("). Then we can use the "),s("strong",[t._v("load")]),t._v(" function to load our data package into our project. 
Enter the following code into the core.clj file:")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("ns")]),t._v(" periodic-table.core\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":require")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("frictionlessdata.datapackage "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" dp"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("frictionlessdata.tableschema "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" ts"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("clojure.spec.alpha "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" s"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" pkg\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/load")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("This pulls the data in from the remote GitHub location and converts the metadata into a Clojure map. We can access this metadata by using the "),s("code",[t._v("descriptor")]),t._v(" function along with keys such as "),s("code",[t._v(":name")]),t._v(" and "),s("code",[t._v(":title")]),t._v(" to get the relevant information:")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("println")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("str")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Package name:"')]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/descriptor")]),t._v(" pkg "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":name")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("println")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("str")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Package title:"')]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/descriptor")]),t._v(" pkg "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":title")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("The package descriptor contains metadata that describes the contents of the data package. What about accessing the data itself? We can get to it using the "),s("code",[t._v("get-resources")]),t._v(" function:")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" table "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/get-resources")]),t._v(" pkg "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":data")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("doseq")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("row table"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("println")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("The above code locates the data in the data package, then goes through it line by line and prints the contents.")]),t._v(" "),s("h2",{attrs:{id:"casting-types-with-core-spec"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#casting-types-with-core-spec"}},[t._v("#")]),t._v(" Casting Types with core.spec")]),t._v(" "),s("p",[t._v("We can use Clojure’s 
"),s("a",{attrs:{href:"https://clojure.org/guides/spec",target:"_blank",rel:"noopener noreferrer"}},[t._v("spec"),s("OutboundLink")],1),t._v(" library to define a schema for our data, which can then be used to cast the types of the data in the CSV file.")]),t._v(" "),s("p",[t._v("Below is a spec description of a periodic element type, consisting of an atomic number, atomic symbol, the element’s name, its mass, and whether or not the element is a metal or non-metal:")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::number")]),t._v(" int?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::symbol")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::name")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::mass")]),t._v(" float?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token 
symbol"}},[t._v("::metal")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::element")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/keys")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":req")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::number")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::symbol")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::name")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::mass")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::metal")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("The above spec can be used to cast values in our tabular data so that they match the specified schema. The example below shows our tabular data values being cast to fit the spec description. 
Then the "),s("code",[t._v("-main")]),t._v(" function loops through the elements, printing only those with an atomic mass under 10.")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("ns")]),t._v(" periodic-table.core\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":require")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("frictionlessdata.datapackage "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" dp"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("frictionlessdata.tableschema "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" ts"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("clojure.spec.alpha "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" s"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::number")]),t._v(" int?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::symbol")]),t._v(" 
string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::name")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::mass")]),t._v(" float?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::metal")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::element")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/keys")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":req")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::number")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::symbol")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::name")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::mass")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::metal")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" pkg\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/load")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" resources "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/get-resources")]),t._v(" pkg "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":data")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" elements "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/cast")]),t._v(" resources element"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("defn")]),t._v(" -main "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n 
"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("doseq")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("e elements"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("<")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":mass")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("10")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("println")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("When run, the program produces the following output:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ lein run\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("::number "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v(" ::symbol "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"H"')]),t._v(" ::name "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Hydrogen"')]),t._v(" ::mass "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1.00794")]),t._v(" ::metal 
"),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"nonmetal"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("::number "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v(" ::symbol "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"He"')]),t._v(" ::name "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Helium"')]),t._v(" ::mass "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4.002602")]),t._v(" ::metal "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"noble gas"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("::number "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),t._v(" ::symbol "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Li"')]),t._v(" ::name "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Lithium"')]),t._v(" ::mass "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("6.941")]),t._v(" ::metal "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"alkali gas"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("::number "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4")]),t._v(" ::symbol "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Be"')]),t._v(" ::name "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Beryllium"')]),t._v(" ::mass "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("9.012182")]),t._v(" ::metal "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"alkaline earth metal"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("This concludes our simple tutorial for using the Clojure libraries for Frictionless Data.")]),t._v(" "),s("p",[t._v("We welcome your feedback and 
questions via our "),s("a",{attrs:{href:"http://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Gitter chat"),s("OutboundLink")],1),t._v(" or via "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-clj/issues",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub issues"),s("OutboundLink")],1),t._v(" on the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-clj",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-clj"),s("OutboundLink")],1),t._v(" repository.")])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[80],{598:function(t,a,s){"use strict";s.r(a);var e=s(29),n=Object(e.a)({},(function(){var t=this,a=t.$createElement,s=t._self._c||a;return s("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[s("p",[t._v("Matt Thompson was one of 2017’s "),s("a",{attrs:{href:"https://toolfund.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Tool Fund"),s("OutboundLink")],1),t._v(" grantees tasked with extending implementation of core Frictionless Data "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-clj",target:"_blank",rel:"noopener noreferrer"}},[t._v("data package"),s("OutboundLink")],1),t._v(" and "),s("a",{attrs:{href:"https://github.com/frictionlessdata/tableschema-clj",target:"_blank",rel:"noopener noreferrer"}},[t._v("table schema"),s("OutboundLink")],1),t._v(" libraries in Clojure programming language. You can read more about this in "),s("RouterLink",{attrs:{to:"/blog/2017/10/26/matt-thompson/"}},[t._v("his grantee profile")]),t._v(". 
In this post, Thompson will show you how to set up and use the "),s("a",{attrs:{href:"http://clojure.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Clojure"),s("OutboundLink")],1),t._v(" libraries for working with "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/tabular-data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Tabular Data Packages"),s("OutboundLink")],1),t._v(".")],1),t._v(" "),s("p",[t._v("This tutorial uses a worked example of downloading a data package from a remote location on the web, and using the Frictionless Data tools to read its contents and metadata into Clojure data structures.")]),t._v(" "),s("h2",{attrs:{id:"setup"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#setup"}},[t._v("#")]),t._v(" Setup")]),t._v(" "),s("p",[t._v("First, we need to set up the project structure using the "),s("a",{attrs:{href:"http://leiningen.org",target:"_blank",rel:"noopener noreferrer"}},[t._v("Leiningen"),s("OutboundLink")],1),t._v(" tool. If you don’t have Leiningen set up on your system, follow the link to download and install it. Once it is set up, run the following command from the command line to create the folders and files for a basic Clojure project:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("\nlein new periodic-table\n\n")])])]),s("p",[t._v("This will create the "),s("em",[t._v("periodic-table")]),t._v(" folder. Inside the "),s("em",[t._v("periodic-table/src/periodic-table")]),t._v(" folder should be a file named "),s("em",[t._v("core.clj")]),t._v(". This is the file you need to edit during this tutorial.")]),t._v(" "),s("h2",{attrs:{id:"the-data"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#the-data"}},[t._v("#")]),t._v(" The Data")]),t._v(" "),s("p",[t._v("For this tutorial, we will use a pre-created data package, the Periodic Table Data Package hosted by the Frictionless Data project. 
A "),s("a",{attrs:{href:"https://specs.frictionlessdata.io/data-package/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package"),s("OutboundLink")],1),t._v(" is a simple container format used to describe and package a collection of data. It consists of two parts:")]),t._v(" "),s("ul",[s("li",[t._v("Metadata that describes the structure and contents of the package")]),t._v(" "),s("li",[t._v("Resources such as data files that form the contents of the package")])]),t._v(" "),s("p",[t._v("Our Clojure code will download the data package and process it using the metadata information contained in the"),s("br"),t._v("\npackage. The data package can be found "),s("a",{attrs:{href:"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json",target:"_blank",rel:"noopener noreferrer"}},[t._v("here on GitHub"),s("OutboundLink")],1),t._v(".")]),t._v(" "),s("p",[t._v("The data package contains data about elements in the periodic table, including each element’s name, atomic number, symbol and atomic weight. 
The table below shows the first three rows of the CSV file:")]),t._v(" "),s("table",[s("thead",[s("tr",[s("th",[t._v("atomic number")]),t._v(" "),s("th",[t._v("symbol")]),t._v(" "),s("th",[t._v("name")]),t._v(" "),s("th",[t._v("atomic mass")]),t._v(" "),s("th",[t._v("metal or nonmetal?")])])]),t._v(" "),s("tbody",[s("tr",[s("td",[t._v("1")]),t._v(" "),s("td",[t._v("H")]),t._v(" "),s("td",[t._v("Hydrogen")]),t._v(" "),s("td",[t._v("1.00794")]),t._v(" "),s("td",[t._v("nonmetal")])]),t._v(" "),s("tr",[s("td",[t._v("2")]),t._v(" "),s("td",[t._v("He")]),t._v(" "),s("td",[t._v("Helium")]),t._v(" "),s("td",[t._v("4.002602")]),t._v(" "),s("td",[t._v("noble gas")])]),t._v(" "),s("tr",[s("td",[t._v("3")]),t._v(" "),s("td",[t._v("Li")]),t._v(" "),s("td",[t._v("Lithium")]),t._v(" "),s("td",[t._v("6.941")]),t._v(" "),s("td",[t._v("alkali metal")])])])]),t._v(" "),s("h2",{attrs:{id:"loading-the-data-package"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#loading-the-data-package"}},[t._v("#")]),t._v(" Loading the Data Package")]),t._v(" "),s("p",[t._v("The first step is to load the data package into a Clojure data structure (a map). To do so, we first require the data package library in our code (which we will give the alias "),s("strong",[t._v("dp")]),t._v("). Then we can use the "),s("strong",[t._v("load")]),t._v(" function to load our data package into our project. 
Enter the following code into the core.clj file:")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("ns")]),t._v(" periodic-table.core\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":require")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("frictionlessdata.datapackage "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" dp"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("frictionlessdata.tableschema "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" ts"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("clojure.spec.alpha "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" s"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" pkg\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/load")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("This pulls the data in from the remote GitHub location and converts the metadata into a Clojure map. We can access this metadata by using the "),s("code",[t._v("descriptor")]),t._v(" function along with keys such as "),s("code",[t._v(":name")]),t._v(" and "),s("code",[t._v(":title")]),t._v(" to get the relevant information:")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("println")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("str")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Package name:"')]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/descriptor")]),t._v(" pkg "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":name")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("println")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("str")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Package title:"')]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/descriptor")]),t._v(" pkg "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":title")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("The package descriptor contains metadata that describes the contents of the data package. What about accessing the data itself? We can get to it using the "),s("code",[t._v("get-resources")]),t._v(" function:")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" table "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/get-resources")]),t._v(" pkg "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":data")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("doseq")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("row table"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("println")]),t._v(" row"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("The above code locates the data in the data package, then goes through it line by line and prints the contents.")]),t._v(" "),s("h2",{attrs:{id:"casting-types-with-core-spec"}},[s("a",{staticClass:"header-anchor",attrs:{href:"#casting-types-with-core-spec"}},[t._v("#")]),t._v(" Casting Types with core.spec")]),t._v(" "),s("p",[t._v("We can use Clojure’s 
"),s("a",{attrs:{href:"https://clojure.org/guides/spec",target:"_blank",rel:"noopener noreferrer"}},[t._v("spec"),s("OutboundLink")],1),t._v(" library to define a schema for our data, which can then be used to cast the types of the data in the CSV file.")]),t._v(" "),s("p",[t._v("Below is a spec description of a periodic element type, consisting of an atomic number, atomic symbol, the element’s name, its mass, and whether or not the element is a metal or non-metal:")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::number")]),t._v(" int?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::symbol")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::name")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::mass")]),t._v(" float?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token 
symbol"}},[t._v("::metal")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::element")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/keys")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":req")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::number")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::symbol")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::name")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::mass")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::metal")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("The above spec can be used to cast values in our tabular data so that they match the specified schema. The example below shows our tabular data values being cast to fit the spec description. 
Then the "),s("code",[t._v("-main")]),t._v(" function loops through the elements, printing only those with an atomic mass below 10.")]),t._v(" "),s("div",{staticClass:"language-clojure extra-class"},[s("pre",{pre:!0,attrs:{class:"language-clojure"}},[s("code",[s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("ns")]),t._v(" periodic-table.core\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":require")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("frictionlessdata.datapackage "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" dp"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("frictionlessdata.tableschema "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" ts"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("clojure.spec.alpha "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":as")]),t._v(" s"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::number")]),t._v(" int?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::symbol")]),t._v(" 
string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::name")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::mass")]),t._v(" float?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::metal")]),t._v(" string?"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/def")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::element")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("s/keys")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":req")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::number")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::symbol")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::name")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::mass")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v("::metal")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),s("span",{pre:!0,attrs:{class:"token 
punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" pkg\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/load")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" resources "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/get-resources")]),t._v(" pkg "),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":data")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("def")]),t._v(" elements "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token function"}},[t._v("dp/cast")]),t._v(" resources element"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("defn")]),t._v(" -main "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n 
"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("doseq")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("[")]),t._v("e elements"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("]")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("if")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("<")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token symbol"}},[t._v(":mass")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v(" "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("10")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n "),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("(")]),s("span",{pre:!0,attrs:{class:"token keyword"}},[t._v("println")]),t._v(" e"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v(")")]),t._v("\n")])])]),s("p",[t._v("When run, the program produces the following output:")]),t._v(" "),s("div",{staticClass:"language-sh extra-class"},[s("pre",{pre:!0,attrs:{class:"language-sh"}},[s("code",[t._v("$ lein run\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("::number "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1")]),t._v(" ::symbol "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"H"')]),t._v(" ::name "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Hydrogen"')]),t._v(" ::mass "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("1.00794")]),t._v(" ::metal 
"),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"nonmetal"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("::number "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("2")]),t._v(" ::symbol "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"He"')]),t._v(" ::name "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Helium"')]),t._v(" ::mass "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4.002602")]),t._v(" ::metal "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"noble gas"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("::number "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("3")]),t._v(" ::symbol "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Li"')]),t._v(" ::name "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Lithium"')]),t._v(" ::mass "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("6.941")]),t._v(" ::metal "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"alkali metal"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n"),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("{")]),t._v("::number "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("4")]),t._v(" ::symbol "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Be"')]),t._v(" ::name "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"Beryllium"')]),t._v(" ::mass "),s("span",{pre:!0,attrs:{class:"token number"}},[t._v("9.012182")]),t._v(" ::metal "),s("span",{pre:!0,attrs:{class:"token string"}},[t._v('"alkaline earth metal"')]),s("span",{pre:!0,attrs:{class:"token punctuation"}},[t._v("}")]),t._v("\n")])])]),s("p",[t._v("This concludes our simple tutorial for using the Clojure libraries for Frictionless Data.")]),t._v(" "),s("p",[t._v("We welcome your feedback and 
questions via our "),s("a",{attrs:{href:"http://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data Gitter chat"),s("OutboundLink")],1),t._v(" or via "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-clj/issues",target:"_blank",rel:"noopener noreferrer"}},[t._v("GitHub issues"),s("OutboundLink")],1),t._v(" on the "),s("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-clj",target:"_blank",rel:"noopener noreferrer"}},[t._v("datapackage-clj"),s("OutboundLink")],1),t._v(" repository.")])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/81.119cea57.js b/assets/js/81.f80c9b50.js similarity index 99% rename from assets/js/81.119cea57.js rename to assets/js/81.f80c9b50.js index d2c85097b..ec7f87bec 100644 --- a/assets/js/81.119cea57.js +++ b/assets/js/81.f80c9b50.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[81],{598:function(e,t,r){"use strict";r.r(t);var a=r(29),o=Object(a.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("This page provides an overview of the CSV (Comma Separated Values) format for data.")]),e._v(" "),r("p",[e._v("CSV is a very old, very simple and very common “standard” for (tabular) data."),r("br"),e._v("\nWe say “standard” in quotes because there was never a formal standard for CSV,"),r("br"),e._v("\nthough in 2005 someone did put together an "),r("a",{attrs:{href:"http://tools.ietf.org/html/rfc4180",target:"_blank",rel:"noopener noreferrer"}},[e._v("RFC"),r("OutboundLink")],1),e._v(" for it.")]),e._v(" "),r("p",[e._v("CSV is supported by a "),r("strong",[e._v("huge")]),e._v(" number of tools from spreadsheets like Excel,"),r("br"),e._v("\nOpenOffice and Google Docs to complex databases to almost all programming"),r("br"),e._v("\nlanguages. 
As such it is probably the most widely supported structured data"),r("br"),e._v("\nformat in the world.")]),e._v(" "),r("hr"),e._v(" "),r("h2",{attrs:{id:"the-format"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#the-format"}},[e._v("#")]),e._v(" The Format")]),e._v(" "),r("p",[e._v("Key points are:")]),e._v(" "),r("ul",[r("li",[e._v("CSV is probably the simplest possible structured format for data")]),e._v(" "),r("li",[e._v("CSV strikes a delicate balance, remaining readable by both machines & humans")]),e._v(" "),r("li",[e._v("CSV is a two-dimensional structure consisting of rows of data, each row"),r("br"),e._v("\ncontaining multiple cells. Rows are (usually) separated by line terminators"),r("br"),e._v("\nso each row corresponds to one line. Cells within a row are separated by"),r("br"),e._v("\ncommas (hence the C(omma) part)\n"),r("ul",[r("li",[e._v("Note that strictly we’re really talking about DSV (delimiter-separated values) files in that we can"),r("br"),e._v("\nallow ‘delimiters’ between cells other than a comma. However, many people"),r("br"),e._v("\nand many programs still call such data CSV (since comma is so common as the"),r("br"),e._v("\ndelimiter)")])])]),e._v(" "),r("li",[e._v("CSV is a “text-based” format, i.e. a CSV file "),r("em",[e._v("is")]),e._v(" a text file. 
This makes it"),r("br"),e._v("\namenable for processing with all kinds of text-oriented tools (from text"),r("br"),e._v("\neditors to "),r("a",{attrs:{href:"https://github.com/rgrp/command-line-data-wrangling",target:"_blank",rel:"noopener noreferrer"}},[e._v("unix tools like sed, grep etc"),r("OutboundLink")],1),e._v(")")])]),e._v(" "),r("h3",{attrs:{id:"what-a-csv-looks-like"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-a-csv-looks-like"}},[e._v("#")]),e._v(" What a CSV looks like")]),e._v(" "),r("p",[e._v("If you open up a CSV file in a text editor it would look something like:")]),e._v(" "),r("div",{staticClass:"language- extra-class"},[r("pre",[r("code",[e._v('A,B,C\n1,2,3\n4,"5,3",6\n')])])]),r("p",[e._v("Here there are 3 rows each of 3 columns. Notice how the second column in the last line is"),r("br"),e._v("\n“quoted” because the content of that value actually contains a “,” character. Without"),r("br"),e._v("\nthe quotes this character would be interpreted as a column separator. To avoid this"),r("br"),e._v("\nconfusion we put quotes around the whole value. The result is that we have 3 rows each"),r("br"),e._v("\nof 3 columns (Note a CSV file does not "),r("em",[e._v("have")]),e._v(" to have"),r("br"),e._v("\nthe same number of columns in each row).")]),e._v(" "),r("h3",{attrs:{id:"dialects-of-csvs"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#dialects-of-csvs"}},[e._v("#")]),e._v(" Dialects of CSVs")]),e._v(" "),r("p",[e._v("As mentioned above, CSV files can have quite a bit of variation in"),r("br"),e._v("\nstructure. 
Key options are:")]),e._v(" "),r("ul",[r("li",[e._v("Field delimiter: rather than comma "),r("code",[e._v(",")]),e._v(" people often use things like "),r("code",[e._v("\\t")]),r("br"),e._v("\n(tab), "),r("code",[e._v(";")]),e._v(" or "),r("code",[e._v("|")])]),e._v(" "),r("li",[e._v("Record terminator / line terminator: is "),r("code",[e._v("\\n")]),e._v(" (unix), "),r("code",[e._v("\\n\\r")]),e._v(" (dos) or something else …")]),e._v(" "),r("li",[e._v("How do you quote records that contain your delimiter")])]),e._v(" "),r("p",[e._v("You can read more in the "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/csv-dialect/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV Dialect Description Format"),r("OutboundLink")],1),e._v(" which defines"),r("br"),e._v("\na small JSON-oriented structure for specifying what options a CSV uses.")]),e._v(" "),r("h3",{attrs:{id:"what-is-missing-in-csv"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-is-missing-in-csv"}},[e._v("#")]),e._v(" What is Missing in CSV?")]),e._v(" "),r("ul",[r("li",[e._v("CSV lacks any way to specify type information: that is, there is no way to"),r("br"),e._v("\ndistinguish “1” the string from 1 the number. This shortcoming can be"),r("br"),e._v("\naddressed by adding some form of simple schema. For example "),r("RouterLink",{attrs:{to:"/table-schema/"}},[e._v("Table"),r("br"),e._v("\nSchema")]),e._v(" provides a very simple way to describe your schema externally"),r("br"),e._v("\nwhilst "),r("a",{attrs:{href:"http://jenit.github.io/linked-csv/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Linked CSV"),r("OutboundLink")],1),e._v(" is an example of doing this “inline” (that"),r("br"),e._v("\nis, in the CSV).")],1),e._v(" "),r("li",[e._v("No support for relationships between different “tables”. 
This is similar to"),r("br"),e._v("\nthe previous point and again "),r("RouterLink",{attrs:{to:"/table-schema/"}},[e._v("Table Schema")]),e._v(" provides a way to address"),r("br"),e._v("\nthis by providing additional schema information externally.")],1),e._v(" "),r("li",[e._v("CSV is really only for tabular data – it is not so good for data with"),r("br"),e._v("\nnesting or where structure is not especially tabular (though remember most"),r("br"),e._v("\ndata can be put into tabular form if you try hard enough!)")])]),e._v(" "),r("h3",{attrs:{id:"links"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#links"}},[e._v("#")]),e._v(" Links")]),e._v(" "),r("p",[e._v("Specifications and overviews:")]),e._v(" "),r("ul",[r("li",[r("a",{attrs:{href:"http://tools.ietf.org/html/rfc4180",target:"_blank",rel:"noopener noreferrer"}},[e._v("RFC specification of CSV"),r("OutboundLink")],1)]),e._v(" "),r("li",[e._v("[CSV Dialect Description Format][csvddf]")]),e._v(" "),r("li",[r("a",{attrs:{href:"http://en.wikipedia.org/wiki/Comma-separated_values",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV on Wikipedia"),r("OutboundLink")],1)])]),e._v(" "),r("hr"),e._v(" "),r("h2",{attrs:{id:"tools"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#tools"}},[e._v("#")]),e._v(" Tools")]),e._v(" "),r("p",[e._v("The great thing about CSV is the huge level of tool support. 
The following is"),r("br"),e._v("\nnot intended to be comprehensive but is more at the electic end of the spectrum.")]),e._v(" "),r("h3",{attrs:{id:"desktop"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#desktop"}},[e._v("#")]),e._v(" Desktop")]),e._v(" "),r("p",[e._v("All spreadsheet programs including Excel, OpenOffice, Google Docs"),r("br"),e._v("\nSpreadsheets supporting opening, editing and saving CSVs.")]),e._v(" "),r("h3",{attrs:{id:"view-a-csv-file-in-your-browser"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#view-a-csv-file-in-your-browser"}},[e._v("#")]),e._v(" View a CSV file in your Browser")]),e._v(" "),r("p",[e._v("You can view a CSV file (saving you the hassle of downloading it and opening"),r("br"),e._v("\nit). Options include:")]),e._v(" "),r("ul",[r("li",[r("p",[e._v("You can use datapipes: "),r("a",{attrs:{href:"http://datapipes.okfnlabs.org/csv/html",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://datapipes.okfnlabs.org/csv/html"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("Just paste your CSV file and away you go.")])]),e._v(" "),r("li",[r("p",[e._v("Install this "),r("a",{attrs:{href:"https://chrome.google.com/webstore/detail/recline-csv-viewer/ibfcfelnbfhlbpelldnngdcklnndhael",target:"_blank",rel:"noopener noreferrer"}},[e._v("Chrome Browser Extension"),r("OutboundLink")],1),e._v(". 
This can be used both"),r("br"),e._v("\nfor online files and for files on your local disk (if you open them with your"),r("br"),e._v("\nbrowser!)")])])]),e._v(" "),r("h3",{attrs:{id:"unix-command-line-manipulation"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#unix-command-line-manipulation"}},[e._v("#")]),e._v(" Unix Command Line Manipulation")]),e._v(" "),r("p",[e._v("See")]),e._v(" "),r("ul",[r("li",[e._v("Using "),r("a",{attrs:{href:"https://github.com/rgrp/command-line-data-wrangling",target:"_blank",rel:"noopener noreferrer"}},[e._v("unix command line tools on CSV"),r("OutboundLink")],1)]),e._v(" "),r("li",[e._v("The wonderful "),r("a",{attrs:{href:"http://csvkit.readthedocs.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csvkit"),r("OutboundLink")],1),e._v(" (python)")])]),e._v(" "),r("h3",{attrs:{id:"power-tools"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#power-tools"}},[e._v("#")]),e._v(" Power Tools")]),e._v(" "),r("ul",[r("li",[r("a",{attrs:{href:"http://openrefine.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenRefine"),r("OutboundLink")],1),e._v(" is a powerful tool for editing and manipulating data and works"),r("br"),e._v("\nvery well with CSV")]),e._v(" "),r("li",[r("a",{attrs:{href:"http://explorer.okfnlabs.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Explorer"),r("OutboundLink")],1),e._v(" supports importing CSVs and manipulating and changing"),r("br"),e._v("\nthem using javascript in the browser")])]),e._v(" "),r("h3",{attrs:{id:"libraries"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#libraries"}},[e._v("#")]),e._v(" Libraries")]),e._v(" "),r("p",[e._v("This is heavily biased towards python!")]),e._v(" "),r("h4",{attrs:{id:"python"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#python"}},[e._v("#")]),e._v(" Python")]),e._v(" "),r("ul",[r("li",[e._v("Built in csv library is good")]),e._v(" "),r("li",[e._v("The wonderful 
"),r("a",{attrs:{href:"http://csvkit.readthedocs.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csvkit"),r("OutboundLink")],1),e._v(" (python)")]),e._v(" "),r("li",[r("a",{attrs:{href:"http://messytables.readthedocs.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("messytables"),r("OutboundLink")],1),e._v(" (python) - convert lots of badly structured data into CSV (or"),r("br"),e._v("\nother formats)")])]),e._v(" "),r("h4",{attrs:{id:"node"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#node"}},[e._v("#")]),e._v(" Node")]),e._v(" "),r("p",[e._v("Nothing in standard lib yet and best option seems to be:")]),e._v(" "),r("ul",[r("li",[r("a",{attrs:{href:"https://github.com/wdavidw/node-csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/wdavidw/node-csv"),r("OutboundLink")],1)])]),e._v(" "),r("hr"),e._v(" "),r("h2",{attrs:{id:"tips-and-tricks"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#tips-and-tricks"}},[e._v("#")]),e._v(" Tips and Tricks")]),e._v(" "),r("h3",{attrs:{id:"csvs-and-git"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#csvs-and-git"}},[e._v("#")]),e._v(" CSVs and Git")]),e._v(" "),r("p",[e._v("Get git to handle CSV diffs in a sensible way (very useful if you are "),r("a",{attrs:{href:"http://blog.okfn.org/2013/07/02/git-and-github-for-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("using"),r("br"),e._v("\ngit or another version control system to store data"),r("OutboundLink")],1),e._v(").")]),e._v(" "),r("p",[e._v("Make these changes to config files:")]),e._v(" "),r("div",{staticClass:"language- extra-class"},[r("pre",[r("code",[e._v('# ~/.config/git/attributes\n*.csv diff=csv\n\n# ~/.gitconfig\n[diff "csv"]\n wordRegex = [^,\\n]+[,\\n]|[,]\n')])])]),r("p",[e._v("Then do:")]),e._v(" "),r("div",{staticClass:"language- extra-class"},[r("pre",[r("code",[e._v("git diff --word-diff\n# make it even nicer\ngit diff --word-diff --color-words\n")])])]),r("p",[e._v("Credit for these fixups to 
"),r("a",{attrs:{href:"http://opendata.stackexchange.com/questions/748/is-there-a-git-for-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("contributors on this question on"),r("br"),e._v("\nStackExchange"),r("OutboundLink")],1),r("br"),e._v("\nand to "),r("a",{attrs:{href:"http://theodi.org/blog/adapting-git-simple-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("James Smith"),r("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[81],{597:function(e,t,r){"use strict";r.r(t);var a=r(29),o=Object(a.a)({},(function(){var e=this,t=e.$createElement,r=e._self._c||t;return r("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[r("p",[e._v("This page provides an overview CSV (Comma Separated Values) format for data.")]),e._v(" "),r("p",[e._v("CSV is a very old, very simple and very common “standard” for (tabular) data."),r("br"),e._v("\nWe say “standard” in quotes because there was never a formal standard for CSV,"),r("br"),e._v("\nthough in 2005 someone did put together a "),r("a",{attrs:{href:"http://tools.ietf.org/html/rfc4180",target:"_blank",rel:"noopener noreferrer"}},[e._v("RFC"),r("OutboundLink")],1),e._v(" for it.")]),e._v(" "),r("p",[e._v("CSV is supported by a "),r("strong",[e._v("huge")]),e._v(" number of tools from spreadsheets like Excel,"),r("br"),e._v("\nOpenOffice and Google Docs to complex databases to almost all programming"),r("br"),e._v("\nlanguages. 
As such it is probably the most widely supported structured data"),r("br"),e._v("\nformat in the world.")]),e._v(" "),r("hr"),e._v(" "),r("h2",{attrs:{id:"the-format"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#the-format"}},[e._v("#")]),e._v(" The Format")]),e._v(" "),r("p",[e._v("Key points are:")]),e._v(" "),r("ul",[r("li",[e._v("CSV is probably the simplest possible structured format for data")]),e._v(" "),r("li",[e._v("CSV strikes a delicate balance, remaining readable by both machines & humans")]),e._v(" "),r("li",[e._v("CSV is a two dimensional structure consisting of rows of data, each row"),r("br"),e._v("\ncontaining multiple cells. Rows are (usually) separated by line terminators"),r("br"),e._v("\nso each row corresponds to one line. Cells within a row are separated by"),r("br"),e._v("\ncommas (hence the C(ommmas) part)\n"),r("ul",[r("li",[e._v("Note that strictly we’re really talking about DSV files in that we can"),r("br"),e._v("\nallow ‘delimiters’ between cells other than a comma. However, many people"),r("br"),e._v("\nand many programs still call such data CSV (since comma is so common as the"),r("br"),e._v("\ndelimiter)")])])]),e._v(" "),r("li",[e._v("CSV is a “text-based” format, i.e. a CSV file "),r("em",[e._v("is")]),e._v(" a text file. 
This makes it"),r("br"),e._v("\namenable for processing with all kinds of text-oriented tools (from text"),r("br"),e._v("\neditors to "),r("a",{attrs:{href:"https://github.com/rgrp/command-line-data-wrangling",target:"_blank",rel:"noopener noreferrer"}},[e._v("unix tools like sed, grep etc"),r("OutboundLink")],1),e._v(")")])]),e._v(" "),r("h3",{attrs:{id:"what-a-csv-looks-like"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-a-csv-looks-like"}},[e._v("#")]),e._v(" What a CSV looks like")]),e._v(" "),r("p",[e._v("If you open up a CSV file in a text editor it would look something like:")]),e._v(" "),r("div",{staticClass:"language- extra-class"},[r("pre",[r("code",[e._v('A,B,C\n1,2,3\n4,"5,3",6\n')])])]),r("p",[e._v("Here there are 3 rows each of 3 columns. Notice how the second column in the last line is"),r("br"),e._v("\n“quoted” because the content of that value actually contains a “,” character. Without"),r("br"),e._v("\nthe quotes this character would be interpreted as a column separator. To avoid this"),r("br"),e._v("\nconfusion we put quotes around the whole value. The result is that we have 3 rows each"),r("br"),e._v("\nof 3 columns (Note a CSV file does not "),r("em",[e._v("have")]),e._v(" to have"),r("br"),e._v("\nthe same number of columns in each row).")]),e._v(" "),r("h3",{attrs:{id:"dialects-of-csvs"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#dialects-of-csvs"}},[e._v("#")]),e._v(" Dialects of CSVs")]),e._v(" "),r("p",[e._v("As mentioned above, CSV files can have quite a bit of variation in"),r("br"),e._v("\nstructure. 
Key options are:")]),e._v(" "),r("ul",[r("li",[e._v("Field delimiter: rather than comma "),r("code",[e._v(",")]),e._v(" people often use things like "),r("code",[e._v("\\t")]),r("br"),e._v("\n(tab), "),r("code",[e._v(";")]),e._v(" or "),r("code",[e._v("|")])]),e._v(" "),r("li",[e._v("Record terminator / line terminator: is "),r("code",[e._v("\\n")]),e._v(" (unix), "),r("code",[e._v("\\n\\r")]),e._v(" (dos) or something else …")]),e._v(" "),r("li",[e._v("How do you quote records that contain your delimiter")])]),e._v(" "),r("p",[e._v("You can read more in the "),r("a",{attrs:{href:"https://specs.frictionlessdata.io/csv-dialect/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV Dialect Description Format"),r("OutboundLink")],1),e._v(" which defines"),r("br"),e._v("\na small JSON-oriented structure for specifying what options a CSV uses.")]),e._v(" "),r("h3",{attrs:{id:"what-is-missing-in-csv"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#what-is-missing-in-csv"}},[e._v("#")]),e._v(" What is Missing in CSV?")]),e._v(" "),r("ul",[r("li",[e._v("CSV lacks any way to specify type information: that is, there is no way to"),r("br"),e._v("\ndistinguish “1” the string from 1 the number. This shortcoming can be"),r("br"),e._v("\naddressed by adding some form of simple schema. For example "),r("RouterLink",{attrs:{to:"/table-schema/"}},[e._v("Table"),r("br"),e._v("\nSchema")]),e._v(" provides a very simple way to describe your schema externally"),r("br"),e._v("\nwhilst "),r("a",{attrs:{href:"http://jenit.github.io/linked-csv/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Linked CSV"),r("OutboundLink")],1),e._v(" is an example of doing this “inline” (that"),r("br"),e._v("\nis, in the CSV).")],1),e._v(" "),r("li",[e._v("No support for relationships between different “tables”. 
This is similar to"),r("br"),e._v("\nthe previous point and again "),r("RouterLink",{attrs:{to:"/table-schema/"}},[e._v("Table Schema")]),e._v(" provides a way to address"),r("br"),e._v("\nthis by providing additional schema information externally.")],1),e._v(" "),r("li",[e._v("CSV is really only for tabular data – it is not so good for data with"),r("br"),e._v("\nnesting or where structure is not especially tabular (though remember most"),r("br"),e._v("\ndata can be put into tabular form if you try hard enough!)")])]),e._v(" "),r("h3",{attrs:{id:"links"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#links"}},[e._v("#")]),e._v(" Links")]),e._v(" "),r("p",[e._v("Specifications and overviews:")]),e._v(" "),r("ul",[r("li",[r("a",{attrs:{href:"http://tools.ietf.org/html/rfc4180",target:"_blank",rel:"noopener noreferrer"}},[e._v("RFC specification of CSV"),r("OutboundLink")],1)]),e._v(" "),r("li",[e._v("[CSV Dialect Description Format][csvddf]")]),e._v(" "),r("li",[r("a",{attrs:{href:"http://en.wikipedia.org/wiki/Comma-separated_values",target:"_blank",rel:"noopener noreferrer"}},[e._v("CSV on Wikipedia"),r("OutboundLink")],1)])]),e._v(" "),r("hr"),e._v(" "),r("h2",{attrs:{id:"tools"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#tools"}},[e._v("#")]),e._v(" Tools")]),e._v(" "),r("p",[e._v("The great thing about CSV is the huge level of tool support. 
The following is"),r("br"),e._v("\nnot intended to be comprehensive but is more at the electic end of the spectrum.")]),e._v(" "),r("h3",{attrs:{id:"desktop"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#desktop"}},[e._v("#")]),e._v(" Desktop")]),e._v(" "),r("p",[e._v("All spreadsheet programs including Excel, OpenOffice, Google Docs"),r("br"),e._v("\nSpreadsheets supporting opening, editing and saving CSVs.")]),e._v(" "),r("h3",{attrs:{id:"view-a-csv-file-in-your-browser"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#view-a-csv-file-in-your-browser"}},[e._v("#")]),e._v(" View a CSV file in your Browser")]),e._v(" "),r("p",[e._v("You can view a CSV file (saving you the hassle of downloading it and opening"),r("br"),e._v("\nit). Options include:")]),e._v(" "),r("ul",[r("li",[r("p",[e._v("You can use datapipes: "),r("a",{attrs:{href:"http://datapipes.okfnlabs.org/csv/html",target:"_blank",rel:"noopener noreferrer"}},[e._v("http://datapipes.okfnlabs.org/csv/html"),r("OutboundLink")],1)]),e._v(" "),r("p",[e._v("Just paste your CSV file and away you go.")])]),e._v(" "),r("li",[r("p",[e._v("Install this "),r("a",{attrs:{href:"https://chrome.google.com/webstore/detail/recline-csv-viewer/ibfcfelnbfhlbpelldnngdcklnndhael",target:"_blank",rel:"noopener noreferrer"}},[e._v("Chrome Browser Extension"),r("OutboundLink")],1),e._v(". 
This can be used both"),r("br"),e._v("\nfor online files and for files on your local disk (if you open them with your"),r("br"),e._v("\nbrowser!)")])])]),e._v(" "),r("h3",{attrs:{id:"unix-command-line-manipulation"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#unix-command-line-manipulation"}},[e._v("#")]),e._v(" Unix Command Line Manipulation")]),e._v(" "),r("p",[e._v("See")]),e._v(" "),r("ul",[r("li",[e._v("Using "),r("a",{attrs:{href:"https://github.com/rgrp/command-line-data-wrangling",target:"_blank",rel:"noopener noreferrer"}},[e._v("unix command line tools on CSV"),r("OutboundLink")],1)]),e._v(" "),r("li",[e._v("The wonderful "),r("a",{attrs:{href:"http://csvkit.readthedocs.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csvkit"),r("OutboundLink")],1),e._v(" (python)")])]),e._v(" "),r("h3",{attrs:{id:"power-tools"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#power-tools"}},[e._v("#")]),e._v(" Power Tools")]),e._v(" "),r("ul",[r("li",[r("a",{attrs:{href:"http://openrefine.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OpenRefine"),r("OutboundLink")],1),e._v(" is a powerful tool for editing and manipulating data and works"),r("br"),e._v("\nvery well with CSV")]),e._v(" "),r("li",[r("a",{attrs:{href:"http://explorer.okfnlabs.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Data Explorer"),r("OutboundLink")],1),e._v(" supports importing CSVs and manipulating and changing"),r("br"),e._v("\nthem using javascript in the browser")])]),e._v(" "),r("h3",{attrs:{id:"libraries"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#libraries"}},[e._v("#")]),e._v(" Libraries")]),e._v(" "),r("p",[e._v("This is heavily biased towards python!")]),e._v(" "),r("h4",{attrs:{id:"python"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#python"}},[e._v("#")]),e._v(" Python")]),e._v(" "),r("ul",[r("li",[e._v("Built in csv library is good")]),e._v(" "),r("li",[e._v("The wonderful 
"),r("a",{attrs:{href:"http://csvkit.readthedocs.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("csvkit"),r("OutboundLink")],1),e._v(" (python)")]),e._v(" "),r("li",[r("a",{attrs:{href:"http://messytables.readthedocs.org",target:"_blank",rel:"noopener noreferrer"}},[e._v("messytables"),r("OutboundLink")],1),e._v(" (python) - convert lots of badly structured data into CSV (or"),r("br"),e._v("\nother formats)")])]),e._v(" "),r("h4",{attrs:{id:"node"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#node"}},[e._v("#")]),e._v(" Node")]),e._v(" "),r("p",[e._v("Nothing in standard lib yet and best option seems to be:")]),e._v(" "),r("ul",[r("li",[r("a",{attrs:{href:"https://github.com/wdavidw/node-csv",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/wdavidw/node-csv"),r("OutboundLink")],1)])]),e._v(" "),r("hr"),e._v(" "),r("h2",{attrs:{id:"tips-and-tricks"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#tips-and-tricks"}},[e._v("#")]),e._v(" Tips and Tricks")]),e._v(" "),r("h3",{attrs:{id:"csvs-and-git"}},[r("a",{staticClass:"header-anchor",attrs:{href:"#csvs-and-git"}},[e._v("#")]),e._v(" CSVs and Git")]),e._v(" "),r("p",[e._v("Get git to handle CSV diffs in a sensible way (very useful if you are "),r("a",{attrs:{href:"http://blog.okfn.org/2013/07/02/git-and-github-for-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("using"),r("br"),e._v("\ngit or another version control system to store data"),r("OutboundLink")],1),e._v(").")]),e._v(" "),r("p",[e._v("Make these changes to config files:")]),e._v(" "),r("div",{staticClass:"language- extra-class"},[r("pre",[r("code",[e._v('# ~/.config/git/attributes\n*.csv diff=csv\n\n# ~/.gitconfig\n[diff "csv"]\n wordRegex = [^,\\n]+[,\\n]|[,]\n')])])]),r("p",[e._v("Then do:")]),e._v(" "),r("div",{staticClass:"language- extra-class"},[r("pre",[r("code",[e._v("git diff --word-diff\n# make it even nicer\ngit diff --word-diff --color-words\n")])])]),r("p",[e._v("Credit for these fixups to 
"),r("a",{attrs:{href:"http://opendata.stackexchange.com/questions/748/is-there-a-git-for-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("contributors on this question on"),r("br"),e._v("\nStackExchange"),r("OutboundLink")],1),r("br"),e._v("\nand to "),r("a",{attrs:{href:"http://theodi.org/blog/adapting-git-simple-data",target:"_blank",rel:"noopener noreferrer"}},[e._v("James Smith"),r("OutboundLink")],1),e._v(".")])])}),[],!1,null,null,null);t.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/82.2f296314.js b/assets/js/82.ea0e1c19.js similarity index 99% rename from assets/js/82.2f296314.js rename to assets/js/82.ea0e1c19.js index 26d2eb22b..dfcc1bf95 100644 --- a/assets/js/82.2f296314.js +++ b/assets/js/82.ea0e1c19.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[82],{600:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("em",[e._v("This grantee profile features Oleg Lavrovsky for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")])]),e._v(" "),a("p",[e._v("We are digital natives, dazzled by the boundless information and cultural resources of electronic networks, tuned in to a life on - and offline, dimly aware of all kinds of borders being rewritten. 
I was born in the Soviet Union and grew up in Canada, immersed in the wonders of creative code on Apple II and DOS-era personal computers, doing fun things in programming environments from "),a("a",{attrs:{href:"https://www.scullinsteel.com/apple2/",target:"_blank",rel:"noopener noreferrer"}},[e._v("BASIC"),a("OutboundLink")],1),e._v(" to C++/C#/.NET (hey "),a("a",{attrs:{href:"https://github.com/ooswald",target:"_blank",rel:"noopener noreferrer"}},[e._v("@ooswald"),a("OutboundLink")],1),e._v("!) to Perl (hey "),a("a",{attrs:{href:"https://github.com/virtualsue",target:"_blank",rel:"noopener noreferrer"}},[e._v("@virtualsue"),a("OutboundLink")],1),e._v("!) to Java (hey "),a("a",{attrs:{href:"https://github.com/timcolson",target:"_blank",rel:"noopener noreferrer"}},[e._v("@timcolson"),a("OutboundLink")],1),e._v("!) to JavaScript (hey "),a("a",{attrs:{href:"https://github.com/jermolene",target:"_blank",rel:"noopener noreferrer"}},[e._v("@jermolene"),a("OutboundLink")],1),e._v("!) to Python (hey "),a("a",{attrs:{href:"https://github.com/gasman",target:"_blank",rel:"noopener noreferrer"}},[e._v("@gasman"),a("OutboundLink")],1),e._v("!), all of which find some use in the freelance work I now do based in my adoptive home of Switzerland - a country of "),a("a",{attrs:{href:"https://en.wikipedia.org/wiki/Swiss_people",target:"_blank",rel:"noopener noreferrer"}},[e._v("plurality"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Over the years, I have tried other languages like Clojure and Pascal, Groovy and Go, Erlang and Haskell, Scala and R, even ARM C/C++ and x86 assembly. Some have stuck in my dev chain, others have not. 
As far as possible, I hope to keep a beginner’s mind open to new paradigms, a solid craft of working on code and data with care, and the wisdom to avoid jumping off every tempting new thing on the horizon.")]),e._v(" "),a("p",[e._v("I first came across tendrils of Open Knowledge ten years ago while living in Oxford, a vibrant community of thinkers and civic reformers. After we started a "),a("a",{attrs:{href:"https://oxhack.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("hackspace"),a("OutboundLink")],1),e._v(", I got more involved in extracurricular open source activities, joined barcamps and hackathons, started contributing to projects. I started to see so-called ‘big IT’ or ‘enterprise software’ challenges to be, on many levels, problems of incompatible or intractable data standards. It was in the U.K. that I also discovered civic tech and open data activism.")]),e._v(" "),a("p",[e._v("Helping to start a Swiss "),a("a",{attrs:{href:"http://make.opendata.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge chapter"),a("OutboundLink")],1),e._v(" presented me with the opportunity to be involved in an ambitious and exciting techno-political movement, and to learn from some of the most deeply ethical and forward-thinking people in Information Technology. 
Running the "),a("a",{attrs:{href:"http://forum.schoolofdata.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("School of Data"),a("OutboundLink")],1),e._v(" working group and supporting many projects in the Swiss "),a("a",{attrs:{href:"https://opendata.ch",target:"_blank",rel:"noopener noreferrer"}},[e._v("Opendata.ch"),a("OutboundLink")],1),e._v(" association and international network is today no longer just a weekend activity: it is my "),a("code",[e._v("master")]),e._v(" branch.")]),e._v(" "),a("p",[e._v("I first heard the term "),a("em",[e._v("frictionless")]),e._v(" from a "),a("a",{attrs:{href:"https://andrewjtaggart.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("philosopher"),a("OutboundLink")],1),e._v(" who warned of a world where IT removes friction to the point where we live anywhere, and do anything, at the cost of social alienation - and, along with it, grave consequences to our well-being. There are parallels here to “closed datasets”, which may well be padlocked for a reason. Throwing them into the wind may deprive them of the nurturing care of the original owners. The open data community offers them a softer landing.")]),e._v(" "),a("p",[e._v("Some of the conversations that led to "),a("em",[e._v("Frictionless Data")]),e._v(" took place at "),a("a",{attrs:{href:"https://opendata.ch/2013/09/okcon-2013-some-swiss-highlights/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OKCon 2013"),a("OutboundLink")],1),e._v(" in Geneva, where I was busy "),a("a",{attrs:{href:"https://make.opendata.ch/legal/",target:"_blank",rel:"noopener noreferrer"}},[e._v("mining the Law"),a("OutboundLink")],1),e._v(". 
Max Ogden mentioned related ideas in his "),a("a",{attrs:{href:"https://vimeo.com/channels/okcon2013/79932550",target:"_blank",rel:"noopener noreferrer"}},[e._v("talk"),a("OutboundLink")],1),e._v(" there on "),a("a",{attrs:{href:"https://datproject.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dat Project"),a("OutboundLink")],1),e._v(". It later became a regular topic in the "),a("a",{attrs:{href:"http://okfnlabs.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Labs hangouts"),a("OutboundLink")],1),e._v(" and elsewhere. My first impression was mixed: I liked the idea in principle, but found it hard to foresee what the standardization process could accomplish. It took me a couple of years to catch up, gain experience in putting the "),a("a",{attrs:{href:"http://opendefinition.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Definition"),a("OutboundLink")],1),e._v(" to use, struggle with some of the fundamental issues myself - just to wholly accept the idea of an open data ecosystem.")]),e._v(" "),a("p",[e._v("Working with more unwieldy data as well as having an interest in Data Science, and the great vibe of a growing community all led me to test the waters with the "),a("a",{attrs:{href:"https://julialang.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Julia language"),a("OutboundLink")],1),e._v(". I quickly became a fan, and started looking for ways to include it in my workflow. Thanks to the collaboration enabled by the Frictionless Data Tool Fund, I will now be able to focus on this goal and start connecting the dots more quickly. 
More bridges need to be built to help open data users use Julia’s computing environment, and Julia users could use sturdier access to open data.")]),e._v(" "),a("p",[e._v("There are two high level use cases which I think are particularly interesting when it comes to Frictionless Data: strongly typed and easy to validate dataset schema leading to a “light” version of semantic interoperability, helping data analysts, developers, even automated agents, to see at a glance how compatible datasets might be. Take a look at "),a("RouterLink",{attrs:{to:"/blog/2016/11/15/dataship/"}},[e._v("dataship")]),e._v(", "),a("RouterLink",{attrs:{to:"/blog/2016/11/15/open-power-system-data/"}},[e._v("open power system data")]),e._v(" and other case studies at "),a("RouterLink",{attrs:{to:"/"}},[e._v("Frictionlessdata.io")]),e._v(" for examples. The other is the pipelines approach which, as a "),a("a",{attrs:{href:"https://en.wikipedia.org/wiki/Pipeline_(Unix)"}},[e._v(" feature of Unix")]),e._v(" and "),a("a",{attrs:{href:"https://docs.microsoft.com/en-us/powershell/scripting/learn/understanding-the-powershell-pipeline?view=powershell-7",target:"_blank",rel:"noopener noreferrer"}},[e._v("other OS"),a("OutboundLink")],1),e._v(" is the basis for an incredibly powerful system building tool, now laying the foundation of a rich and reliable world of "),a("a",{attrs:{href:"http://datahub.io/blog/core-data-essential-datasets-for-data-wranglers-and-data-scientists",target:"_blank",rel:"noopener noreferrer"}},[e._v("shared data"),a("OutboundLink")],1),e._v(".")],1),e._v(" "),a("p",[e._v("At a more practical level, I have been using Data Packages to publish data for "),a("a",{attrs:{href:"http://hack.opendata.ch",target:"_blank",rel:"noopener noreferrer"}},[e._v("hackathons"),a("OutboundLink")],1),e._v(", School of Data "),a("a",{attrs:{href:"http://schoolofdata.ch",target:"_blank",rel:"noopener noreferrer"}},[e._v("workshops"),a("OutboundLink")],1),e._v(" and other activities in my Open 
Knowledge chapter, and regularly explaining the concepts and training people to use Frictionless Data tools in the Open Data module I teach at the "),a("a",{attrs:{href:"https://www.bfh.ch/en/home.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Bern University of Applied Sciences"),a("OutboundLink")],1),e._v(". I have built support for them into "),a("a",{attrs:{href:"http://datalets.ch/dribdat",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dribdat"),a("OutboundLink")],1),e._v(", a tool we use for connecting the dots between people, code and data.")]),e._v(" "),a("p",[e._v("Over the years, I have made small contributions to OKI’s codebases on projects like "),a("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),a("OutboundLink")],1),e._v(". Contributing to the Frictionless Data project clears the way to the frontlines of development: putting better tools in users’ hands, committing directly to the needs of the community, setting an elevated expectation of responsibility and quality. That said, I am a novice in Julia. But my initial ambition is modest: make a working set of tools, produce a stable "),a("a",{attrs:{href:"https://blog.okfn.org/2017/09/05/frictionless-data-v1-0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("v1.0 specification"),a("OutboundLink")],1),e._v(" release. Run tests, get reviewed, interact with the community, and iterate. 
This project will be a learning process, and my intention is to widen the goalposts as much as I can for others to follow.")]),e._v(" "),a("p",[e._v("The Julia language also needs to be better known, so I will start threads on the "),a("a",{attrs:{href:"https://discuss.okfn.org/u/loleg",target:"_blank",rel:"noopener noreferrer"}},[e._v("OKI forums"),a("OutboundLink")],1),e._v(", at the "),a("a",{attrs:{href:"https://schoolofdata.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("School of Data"),a("OutboundLink")],1),e._v(", in technical and academic circles. I am likewise really looking forward to representing Frictionless Data in the diverse and wide-ranging "),a("a",{attrs:{href:"https://julialang.org/community/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Julia community"),a("OutboundLink")],1),e._v(", sharing whatever questions and needs arise both ways. The specifications, libraries and tools will help to preserve key information on widely used datasets, foster a more in-depth technical discussion between everyone involved in data sharing, and open the door to more critical feedback loops between creators, publishers and users of open data.")]),e._v(" "),a("p",[e._v("I will be developing the "),a("a",{attrs:{href:"https://github.com/loleg/datapackage-jl",target:"_blank",rel:"noopener noreferrer"}},[e._v("datapackage-jl"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://github.com/loleg/tableschema-jl",target:"_blank",rel:"noopener noreferrer"}},[e._v("tableschema-jl"),a("OutboundLink")],1),e._v(" libraries on GitHub, and you can follow me on "),a("a",{attrs:{href:"http://github.com/loleg/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),a("OutboundLink")],1),e._v(" to see how this develops and read stories about putting Frictionless Data libraries to use. 
Please feel free to "),a("a",{attrs:{href:"http://datalets.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("write me a note"),a("OutboundLink")],1),e._v(", send in your use case, respond to anything I’m working on or writing about, share a tricky dataset or any other kind of challenge - and "),a("a",{attrs:{href:"https://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[e._v("let’s chat"),a("OutboundLink")],1),e._v("!")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[82],{601:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[a("em",[e._v("This grantee profile features Oleg Lavrovsky for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")])]),e._v(" "),a("p",[e._v("We are digital natives, dazzled by the boundless information and cultural resources of electronic networks, tuned in to a life on - and offline, dimly aware of all kinds of borders being rewritten. I was born in the Soviet Union and grew up in Canada, immersed in the wonders of creative code on Apple II and DOS-era personal computers, doing fun things in programming environments from "),a("a",{attrs:{href:"https://www.scullinsteel.com/apple2/",target:"_blank",rel:"noopener noreferrer"}},[e._v("BASIC"),a("OutboundLink")],1),e._v(" to C++/C#/.NET (hey "),a("a",{attrs:{href:"https://github.com/ooswald",target:"_blank",rel:"noopener noreferrer"}},[e._v("@ooswald"),a("OutboundLink")],1),e._v("!) to Perl (hey "),a("a",{attrs:{href:"https://github.com/virtualsue",target:"_blank",rel:"noopener noreferrer"}},[e._v("@virtualsue"),a("OutboundLink")],1),e._v("!) 
to Java (hey "),a("a",{attrs:{href:"https://github.com/timcolson",target:"_blank",rel:"noopener noreferrer"}},[e._v("@timcolson"),a("OutboundLink")],1),e._v("!) to JavaScript (hey "),a("a",{attrs:{href:"https://github.com/jermolene",target:"_blank",rel:"noopener noreferrer"}},[e._v("@jermolene"),a("OutboundLink")],1),e._v("!) to Python (hey "),a("a",{attrs:{href:"https://github.com/gasman",target:"_blank",rel:"noopener noreferrer"}},[e._v("@gasman"),a("OutboundLink")],1),e._v("!), all of which find some use in the freelance work I now do based in my adoptive home of Switzerland - a country of "),a("a",{attrs:{href:"https://en.wikipedia.org/wiki/Swiss_people",target:"_blank",rel:"noopener noreferrer"}},[e._v("plurality"),a("OutboundLink")],1),e._v(".")]),e._v(" "),a("p",[e._v("Over the years, I have tried other languages like Clojure and Pascal, Groovy and Go, Erlang and Haskell, Scala and R, even ARM C/C++ and x86 assembly. Some have stuck in my dev chain, others have not. As far as possible, I hope to keep a beginner’s mind open to new paradigms, a solid craft of working on code and data with care, and the wisdom to avoid jumping off every tempting new thing on the horizon.")]),e._v(" "),a("p",[e._v("I first came across tendrils of Open Knowledge ten years ago while living in Oxford, a vibrant community of thinkers and civic reformers. After we started a "),a("a",{attrs:{href:"https://oxhack.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("hackspace"),a("OutboundLink")],1),e._v(", I got more involved in extracurricular open source activities, joined barcamps and hackathons, started contributing to projects. I started to see so-called ‘big IT’ or ‘enterprise software’ challenges to be, on many levels, problems of incompatible or intractable data standards. It was in the U.K. 
that I also discovered civic tech and open data activism.")]),e._v(" "),a("p",[e._v("Helping to start a Swiss "),a("a",{attrs:{href:"http://make.opendata.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge chapter"),a("OutboundLink")],1),e._v(" presented me with the opportunity to be involved in an ambitious and exciting techno-political movement, and to learn from some of the most deeply ethical and forward-thinking people in Information Technology. Running the "),a("a",{attrs:{href:"http://forum.schoolofdata.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("School of Data"),a("OutboundLink")],1),e._v(" working group and supporting many projects in the Swiss "),a("a",{attrs:{href:"https://opendata.ch",target:"_blank",rel:"noopener noreferrer"}},[e._v("Opendata.ch"),a("OutboundLink")],1),e._v(" association and international network is today no longer just a weekend activity: it is my "),a("code",[e._v("master")]),e._v(" branch.")]),e._v(" "),a("p",[e._v("I first heard the term "),a("em",[e._v("frictionless")]),e._v(" from a "),a("a",{attrs:{href:"https://andrewjtaggart.com/",target:"_blank",rel:"noopener noreferrer"}},[e._v("philosopher"),a("OutboundLink")],1),e._v(" who warned of a world where IT removes friction to the point where we live anywhere, and do anything, at the cost of social alienation - and, along with it, grave consequences to our well-being. There are parallels here to “closed datasets”, which may well be padlocked for a reason. Throwing them into the wind may deprive them of the nurturing care of the original owners. 
The open data community offers them a softer landing.")]),e._v(" "),a("p",[e._v("Some of the conversations that led to "),a("em",[e._v("Frictionless Data")]),e._v(" took place at "),a("a",{attrs:{href:"https://opendata.ch/2013/09/okcon-2013-some-swiss-highlights/",target:"_blank",rel:"noopener noreferrer"}},[e._v("OKCon 2013"),a("OutboundLink")],1),e._v(" in Geneva, where I was busy "),a("a",{attrs:{href:"https://make.opendata.ch/legal/",target:"_blank",rel:"noopener noreferrer"}},[e._v("mining the Law"),a("OutboundLink")],1),e._v(". Max Ogden mentioned related ideas in his "),a("a",{attrs:{href:"https://vimeo.com/channels/okcon2013/79932550",target:"_blank",rel:"noopener noreferrer"}},[e._v("talk"),a("OutboundLink")],1),e._v(" there on "),a("a",{attrs:{href:"https://datproject.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dat Project"),a("OutboundLink")],1),e._v(". It later became a regular topic in the "),a("a",{attrs:{href:"http://okfnlabs.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Knowledge Labs hangouts"),a("OutboundLink")],1),e._v(" and elsewhere. My first impression was mixed: I liked the idea in principle, but found it hard to foresee what the standardization process could accomplish. It took me a couple of years to catch up, gain experience in putting the "),a("a",{attrs:{href:"http://opendefinition.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Open Definition"),a("OutboundLink")],1),e._v(" to use, struggle with some of the fundamental issues myself - just to wholly accept the idea of an open data ecosystem.")]),e._v(" "),a("p",[e._v("Working with more unwieldy data as well as having an interest in Data Science, and the great vibe of a growing community all led me to test the waters with the "),a("a",{attrs:{href:"https://julialang.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Julia language"),a("OutboundLink")],1),e._v(". I quickly became a fan, and started looking for ways to include it in my workflow. 
Thanks to the collaboration enabled by the Frictionless Data Tool Fund, I will now be able to focus on this goal and start connecting the dots more quickly. More bridges need to be built to help open data users use Julia’s computing environment, and Julia users could use sturdier access to open data.")]),e._v(" "),a("p",[e._v("There are two high level use cases which I think are particularly interesting when it comes to Frictionless Data: strongly typed and easy to validate dataset schema leading to a “light” version of semantic interoperability, helping data analysts, developers, even automated agents, to see at a glance how compatible datasets might be. Take a look at "),a("RouterLink",{attrs:{to:"/blog/2016/11/15/dataship/"}},[e._v("dataship")]),e._v(", "),a("RouterLink",{attrs:{to:"/blog/2016/11/15/open-power-system-data/"}},[e._v("open power system data")]),e._v(" and other case studies at "),a("RouterLink",{attrs:{to:"/"}},[e._v("Frictionlessdata.io")]),e._v(" for examples. The other is the pipelines approach which, as a "),a("a",{attrs:{href:"https://en.wikipedia.org/wiki/Pipeline_(Unix)"}},[e._v(" feature of Unix")]),e._v(" and "),a("a",{attrs:{href:"https://docs.microsoft.com/en-us/powershell/scripting/learn/understanding-the-powershell-pipeline?view=powershell-7",target:"_blank",rel:"noopener noreferrer"}},[e._v("other OS"),a("OutboundLink")],1),e._v(" is the basis for an incredibly powerful system building tool, now laying the foundation of a rich and reliable world of "),a("a",{attrs:{href:"http://datahub.io/blog/core-data-essential-datasets-for-data-wranglers-and-data-scientists",target:"_blank",rel:"noopener noreferrer"}},[e._v("shared data"),a("OutboundLink")],1),e._v(".")],1),e._v(" "),a("p",[e._v("At a more practical level, I have been using Data Packages to publish data for "),a("a",{attrs:{href:"http://hack.opendata.ch",target:"_blank",rel:"noopener noreferrer"}},[e._v("hackathons"),a("OutboundLink")],1),e._v(", School of Data 
"),a("a",{attrs:{href:"http://schoolofdata.ch",target:"_blank",rel:"noopener noreferrer"}},[e._v("workshops"),a("OutboundLink")],1),e._v(" and other activities in my Open Knowledge chapter, and regularly explaining the concepts and training people to use Frictionless Data tools in the Open Data module I teach at the "),a("a",{attrs:{href:"https://www.bfh.ch/en/home.html",target:"_blank",rel:"noopener noreferrer"}},[e._v("Bern University of Applied Sciences"),a("OutboundLink")],1),e._v(". I have built support for them into "),a("a",{attrs:{href:"http://datalets.ch/dribdat",target:"_blank",rel:"noopener noreferrer"}},[e._v("Dribdat"),a("OutboundLink")],1),e._v(", a tool we use for connecting the dots between people, code and data.")]),e._v(" "),a("p",[e._v("Over the years, I have made small contributions to OKI’s codebases on projects like "),a("a",{attrs:{href:"https://ckan.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("CKAN"),a("OutboundLink")],1),e._v(". Contributing to the Frictionless Data project clears the way to the frontlines of development: putting better tools in users’ hands, committing directly to the needs of the community, setting an elevated expectation of responsibility and quality. That said, I am a novice in Julia. But my initial ambition is modest: make a working set of tools, produce a stable "),a("a",{attrs:{href:"https://blog.okfn.org/2017/09/05/frictionless-data-v1-0/",target:"_blank",rel:"noopener noreferrer"}},[e._v("v1.0 specification"),a("OutboundLink")],1),e._v(" release. Run tests, get reviewed, interact with the community, and iterate. 
This project will be a learning process, and my intention is to widen the goalposts as much as I can for others to follow.")]),e._v(" "),a("p",[e._v("The Julia language also needs to be better known, so I will start threads on the "),a("a",{attrs:{href:"https://discuss.okfn.org/u/loleg",target:"_blank",rel:"noopener noreferrer"}},[e._v("OKI forums"),a("OutboundLink")],1),e._v(", at the "),a("a",{attrs:{href:"https://schoolofdata.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("School of Data"),a("OutboundLink")],1),e._v(", in technical and academic circles. I am likewise really looking forward to representing Frictionless Data in the diverse and wide-ranging "),a("a",{attrs:{href:"https://julialang.org/community/",target:"_blank",rel:"noopener noreferrer"}},[e._v("Julia community"),a("OutboundLink")],1),e._v(", sharing whatever questions and needs arise both ways. The specifications, libraries and tools will help to preserve key information on widely used datasets, foster a more in-depth technical discussion between everyone involved in data sharing, and open the door to more critical feedback loops between creators, publishers and users of open data.")]),e._v(" "),a("p",[e._v("I will be developing the "),a("a",{attrs:{href:"https://github.com/loleg/datapackage-jl",target:"_blank",rel:"noopener noreferrer"}},[e._v("datapackage-jl"),a("OutboundLink")],1),e._v(" and "),a("a",{attrs:{href:"https://github.com/loleg/tableschema-jl",target:"_blank",rel:"noopener noreferrer"}},[e._v("tableschema-jl"),a("OutboundLink")],1),e._v(" libraries on GitHub, and you can follow me on "),a("a",{attrs:{href:"http://github.com/loleg/",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),a("OutboundLink")],1),e._v(" to see how this develops and read stories about putting Frictionless Data libraries to use. 
Please feel free to "),a("a",{attrs:{href:"http://datalets.ch/",target:"_blank",rel:"noopener noreferrer"}},[e._v("write me a note"),a("OutboundLink")],1),e._v(", send in your use case, respond to anything I’m working on or writing about, share a tricky dataset or any other kind of challenge - and "),a("a",{attrs:{href:"https://gitter.im/frictionlessdata/chat",target:"_blank",rel:"noopener noreferrer"}},[e._v("let’s chat"),a("OutboundLink")],1),e._v("!")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/85.b4892b6b.js b/assets/js/85.0abba798.js similarity index 98% rename from assets/js/85.b4892b6b.js rename to assets/js/85.0abba798.js index 750dcb119..d3182ccce 100644 --- a/assets/js/85.b4892b6b.js +++ b/assets/js/85.0abba798.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[85],{604:function(t,a,e){"use strict";e.r(a);var r=e(29),o=Object(r.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("p",[t._v("You can package any kind of data as a Data Package."),e("br")]),t._v(" "),e("ol",[e("li",[t._v("Get your data together\n"),e("ol",[e("li",[t._v("Get your data together in one folder (you can have data in subfolders of that folder too if you wish).")])])]),t._v(" "),e("li",[t._v("Add a "),e("code",[t._v("datapackage.json")]),t._v(" file to package those data files into a useful whole (with key information like the license, title and format)\n"),e("ol",[e("li",[t._v("The datapackage.json is a small file in JSON format that gives a bit of information about your dataset. 
You’ll need to create this file and then place it in the directory you created.")]),t._v(" "),e("li",[t._v("Don’t worry if you don’t know what JSON is - we provide some tools that can automatically create your this file for you.")]),t._v(" "),e("li",[t._v("There are 2 options for creating the datapackage.json:\n"),e("ol",[e("li",[t._v("Use the "),e("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package Creator"),e("OutboundLink")],1),t._v(") tool\n"),e("ol",[e("li",[t._v("Just answer a few questions and give it your data files and it will spit out a datapackage.json for you to include in your project")])])]),t._v(" "),e("li",[t._v("Use the "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("Python"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-js",target:"_blank",rel:"noopener noreferrer"}},[t._v("JavaScript"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-php",target:"_blank",rel:"noopener noreferrer"}},[t._v("PHP"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-jl",target:"_blank",rel:"noopener noreferrer"}},[t._v("Julia"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-r",target:"_blank",rel:"noopener noreferrer"}},[t._v("R"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-clj",target:"_blank",rel:"noopener noreferrer"}},[t._v("Clojure"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-java",target:"_blank",rel:"noopener noreferrer"}},[t._v("Java"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-rb",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("Ruby"),e("OutboundLink")],1),t._v(" or "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-go",target:"_blank",rel:"noopener noreferrer"}},[t._v("Go"),e("OutboundLink")],1),t._v(" libraries for working with data packages.")])])])])])]),t._v(" "),e("p",[t._v("Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our new and comprehensive "),e("a",{attrs:{href:"/tag/field-guide"}},[t._v("Frictionless Data Field Guide")]),t._v(".")])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[85],{605:function(t,a,e){"use strict";e.r(a);var r=e(29),o=Object(r.a)({},(function(){var t=this,a=t.$createElement,e=t._self._c||a;return e("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[e("p",[t._v("You can package any kind of data as a Data Package."),e("br")]),t._v(" "),e("ol",[e("li",[t._v("Get your data together\n"),e("ol",[e("li",[t._v("Get your data together in one folder (you can have data in subfolders of that folder too if you wish).")])])]),t._v(" "),e("li",[t._v("Add a "),e("code",[t._v("datapackage.json")]),t._v(" file to package those data files into a useful whole (with key information like the license, title and format)\n"),e("ol",[e("li",[t._v("The datapackage.json is a small file in JSON format that gives a bit of information about your dataset. 
You’ll need to create this file and then place it in the directory you created.")]),t._v(" "),e("li",[t._v("Don’t worry if you don’t know what JSON is - we provide some tools that can automatically create your this file for you.")]),t._v(" "),e("li",[t._v("There are 2 options for creating the datapackage.json:\n"),e("ol",[e("li",[t._v("Use the "),e("a",{attrs:{href:"http://create.frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Data Package Creator"),e("OutboundLink")],1),t._v(") tool\n"),e("ol",[e("li",[t._v("Just answer a few questions and give it your data files and it will spit out a datapackage.json for you to include in your project")])])]),t._v(" "),e("li",[t._v("Use the "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-py",target:"_blank",rel:"noopener noreferrer"}},[t._v("Python"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-js",target:"_blank",rel:"noopener noreferrer"}},[t._v("JavaScript"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-php",target:"_blank",rel:"noopener noreferrer"}},[t._v("PHP"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-jl",target:"_blank",rel:"noopener noreferrer"}},[t._v("Julia"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-r",target:"_blank",rel:"noopener noreferrer"}},[t._v("R"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-clj",target:"_blank",rel:"noopener noreferrer"}},[t._v("Clojure"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-java",target:"_blank",rel:"noopener noreferrer"}},[t._v("Java"),e("OutboundLink")],1),t._v(", "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-rb",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("Ruby"),e("OutboundLink")],1),t._v(" or "),e("a",{attrs:{href:"https://github.com/frictionlessdata/datapackage-go",target:"_blank",rel:"noopener noreferrer"}},[t._v("Go"),e("OutboundLink")],1),t._v(" libraries for working with data packages.")])])])])])]),t._v(" "),e("p",[t._v("Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our new and comprehensive "),e("a",{attrs:{href:"/tag/field-guide"}},[t._v("Frictionless Data Field Guide")]),t._v(".")])])}),[],!1,null,null,null);a.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/87.69707213.js b/assets/js/87.142f6345.js similarity index 99% rename from assets/js/87.69707213.js rename to assets/js/87.142f6345.js index dc05894f7..3fa06bf8f 100644 --- a/assets/js/87.69707213.js +++ b/assets/js/87.142f6345.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[87],{608:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("This tutorial is about how to publish your Data Package online for others to find and use.")]),e._v(" "),t("p",[e._v("It assumes you have already finished packaging up your data as a Data Package (if not, "),t("RouterLink",{attrs:{to:"/blog/2018/07/16/publish-data-as-data-packages/"}},[e._v("check out the instructions here")]),e._v(").")],1),e._v(" "),t("h2",{attrs:{id:"it-s-only-files-online"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#it-s-only-files-online"}},[e._v("#")]),e._v(" It’s Only Files Online")]),e._v(" "),t("p",[e._v("Publishing your Data Package is incredibly simple: you just need to post it online somewhere that others can access.")]),e._v(" "),t("p",[t("strong",[e._v("Note:")]),e._v(" if you just want to to share your Data Package with a few others you can just send it directly, for example via email. 
Since a Data Package is just some files there are as many ways to do this as there are ways to put files online. Here we will just provide some general tips and illustrate some of the most popular publishing options.")]),e._v(" "),t("p",[t("strong",[e._v("Advertise it")])]),e._v(" "),t("p",[e._v("Once you have published your data package you may want to advertise it to others. One way to advertise the existence of your dataset is to add it to the catalog-list file in the "),t("a",{attrs:{href:"https://github.com/datasets/registry/",target:"_blank",rel:"noopener noreferrer"}},[e._v("registry repo"),t("OutboundLink")],1),e._v(", it will then automagically appear as a community dataset on the "),t("a",{attrs:{href:"http://data.okfn.org/data",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.okfn.org"),t("OutboundLink")],1),e._v(" site")]),e._v(" "),t("h2",{attrs:{id:"github-bitbucket-etc"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#github-bitbucket-etc"}},[e._v("#")]),e._v(" Github, Bitbucket etc")]),e._v(" "),t("p",[e._v("One nice option for the more sophisticated is to manage your Data Package in a git or mercurial repo and push it to github, gitorious, bitbucket or similar.")]),e._v(" "),t("h2",{attrs:{id:"s3-google-storage-etc"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#s3-google-storage-etc"}},[e._v("#")]),e._v(" S3, Google Storage etc")]),e._v(" "),t("p",[e._v("Cloud storage like S3 and Google Storage are perfect for storing your Data Packages.")]),e._v(" "),t("h2",{attrs:{id:"google-drive"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#google-drive"}},[e._v("#")]),e._v(" Google Drive")]),e._v(" "),t("p",[e._v("The directory structure of a Data Package shared on Google Drive must be flat; that is, the Data Package must not contain any folders.")]),e._v(" "),t("p",[t("strong",[e._v("OK")])]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v("shared-folder\n|-- 
datapackage.json\n|-- README.md\n|-- data.csv\n")])])]),t("p",[t("strong",[e._v("Not OK")])]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v("shared-folder\n|-- datapackage.json\n|-- README.md\n|-- data\n |-- data.csv\n")])])]),t("ol",[t("li",[t("p",[e._v("Upload your Data Package folder ("),t("a",{attrs:{href:"https://support.google.com/drive/answer/2424368",target:"_blank",rel:"noopener noreferrer"}},[e._v("help"),t("OutboundLink")],1),e._v(")")])]),e._v(" "),t("li",[t("p",[e._v("Change your folder’s share setting to "),t("strong",[e._v("Public on the web - Anyone on the Internet can find and view")]),e._v(" ("),t("a",{attrs:{href:"https://support.google.com/drive/answer/2494886",target:"_blank",rel:"noopener noreferrer"}},[e._v("help"),t("OutboundLink")],1),e._v(")")])]),e._v(" "),t("li",[t("p",[e._v("Get a shareable link for your folder ("),t("a",{attrs:{href:"https://support.google.com/drive/answer/2494822",target:"_blank",rel:"noopener noreferrer"}},[e._v("help"),t("OutboundLink")],1),e._v(")")])]),e._v(" "),t("li",[t("p",[e._v("Find your folder’s ID in the link")])])]),e._v(" "),t("ul",[t("li",[t("em",[e._v("Example Link:")]),e._v(" "),t("ul",[t("li",[t("code",[e._v("https://drive.google.com/open?id=0B-f6D5RM8awSfkdtRWpiTlpxdmhPblJRd2NhdHpHMFZPOFZKcWhpT2NkQlZCUlNWUnFwaHM&authuser=0")])])])]),e._v(" "),t("li",[t("em",[e._v("Example ID:")]),e._v(" "),t("ul",[t("li",[t("code",[e._v("0B-f6D5RM8awSfkdtRWpiTlpxdmhPblJRd2NhdHpHMFZPOFZKcWhpT2NkQlZCUlNWUnFwaHM")])])])])]),e._v(" "),t("ol",{attrs:{start:"5"}},[t("li",[e._v("Your "),t("code",[e._v("datapackage.json")]),e._v(" link is "),t("code",[e._v("https://googledrive.com/host/{ID}/datapackage.json")]),e._v("; for example, using the "),t("em",[e._v("Example ID")]),e._v(" from the previous step, the "),t("code",[e._v("datapackage.json")]),e._v(" link is:")])]),e._v(" 
"),t("ul",[t("li",[t("code",[e._v("https://googledrive.com/host/0B-f6D5RM8awSfkdtRWpiTlpxdmhPblJRd2NhdHpHMFZPOFZKcWhpT2NkQlZCUlNWUnFwaHM/datapackage.json")])])]),e._v(" "),t("h2",{attrs:{id:"dropbox"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#dropbox"}},[e._v("#")]),e._v(" Dropbox")]),e._v(" "),t("p",[e._v("Just upload your files to Dropbox.")]),e._v(" "),t("p",[e._v("You do need to be a bit careful as Dropbox does not always replicate your local file layout in its online URLs. Therefore, make sure you read the "),t("a",{attrs:{href:"#key-tips"}},[e._v("Key Tips")]),e._v(" section below.")]),e._v(" "),t("h2",{attrs:{id:"key-tips"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#key-tips"}},[e._v("#")]),e._v(" Key Tips")]),e._v(" "),t("p",[e._v("However you publish your Data Package there are a few key points to keep in"),t("br"),e._v("\nmind:")]),e._v(" "),t("ul",[t("li",[t("p",[e._v("All the files in the Data Package should be accessible online")])]),e._v(" "),t("li",[t("p",[e._v("The structure of your Data Package should be preserved. Specifically the paths between your "),t("code",[e._v("datapackage.json")]),e._v(" and the data files must be preserved. For example, if your Data Package directory looked like this on disk:")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",[t("code",[e._v("datapackage.json\ndata.csv\nsomedir/other-data.csv\n")])])]),t("p",[e._v("then online it should look like:")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",[t("code",[e._v("http://your.website.com/mydatapackage/datapackage.json\nhttp://your.website.com/mydatapackage/data.csv\nhttp://your.website.com/mydatapackage/somedir/other-data.csv\n")])])]),t("p",[e._v("This can be a problem with services like e.g. Google Drive where files in a given folder don’t have a web address that relates to that folder. 
The reason we need to preserve relative paths is that when using the Data Package client software will compute the full path from the location of the "),t("code",[e._v("datapackage.json")]),e._v(" itself plus the relative path for the file give in the "),t("code",[e._v("datapackage.json")]),e._v(" resources section.")])])]),e._v(" "),t("p",[e._v("Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our new and comprehensive "),t("a",{attrs:{href:"/tag/field-guide"}},[e._v("Frictionless Data Field Guide")]),e._v(".")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[87],{610:function(e,a,t){"use strict";t.r(a);var o=t(29),r=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("This tutorial is about how to publish your Data Package online for others to find and use.")]),e._v(" "),t("p",[e._v("It assumes you have already finished packaging up your data as a Data Package (if not, "),t("RouterLink",{attrs:{to:"/blog/2018/07/16/publish-data-as-data-packages/"}},[e._v("check out the instructions here")]),e._v(").")],1),e._v(" "),t("h2",{attrs:{id:"it-s-only-files-online"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#it-s-only-files-online"}},[e._v("#")]),e._v(" It’s Only Files Online")]),e._v(" "),t("p",[e._v("Publishing your Data Package is incredibly simple: you just need to post it online somewhere that others can access.")]),e._v(" "),t("p",[t("strong",[e._v("Note:")]),e._v(" if you just want to to share your Data Package with a few others you can just send it directly, for example via email. Since a Data Package is just some files there are as many ways to do this as there are ways to put files online. 
Here we will just provide some general tips and illustrate some of the most popular publishing options.")]),e._v(" "),t("p",[t("strong",[e._v("Advertise it")])]),e._v(" "),t("p",[e._v("Once you have published your data package you may want to advertise it to others. One way to advertise the existence of your dataset is to add it to the catalog-list file in the "),t("a",{attrs:{href:"https://github.com/datasets/registry/",target:"_blank",rel:"noopener noreferrer"}},[e._v("registry repo"),t("OutboundLink")],1),e._v(", it will then automagically appear as a community dataset on the "),t("a",{attrs:{href:"http://data.okfn.org/data",target:"_blank",rel:"noopener noreferrer"}},[e._v("data.okfn.org"),t("OutboundLink")],1),e._v(" site")]),e._v(" "),t("h2",{attrs:{id:"github-bitbucket-etc"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#github-bitbucket-etc"}},[e._v("#")]),e._v(" Github, Bitbucket etc")]),e._v(" "),t("p",[e._v("One nice option for the more sophisticated is to manage your Data Package in a git or mercurial repo and push it to github, gitorious, bitbucket or similar.")]),e._v(" "),t("h2",{attrs:{id:"s3-google-storage-etc"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#s3-google-storage-etc"}},[e._v("#")]),e._v(" S3, Google Storage etc")]),e._v(" "),t("p",[e._v("Cloud storage like S3 and Google Storage are perfect for storing your Data Packages.")]),e._v(" "),t("h2",{attrs:{id:"google-drive"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#google-drive"}},[e._v("#")]),e._v(" Google Drive")]),e._v(" "),t("p",[e._v("The directory structure of a Data Package shared on Google Drive must be flat; that is, the Data Package must not contain any folders.")]),e._v(" "),t("p",[t("strong",[e._v("OK")])]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v("shared-folder\n|-- datapackage.json\n|-- README.md\n|-- data.csv\n")])])]),t("p",[t("strong",[e._v("Not OK")])]),e._v(" 
"),t("div",{staticClass:"language- extra-class"},[t("pre",{pre:!0,attrs:{class:"language-text"}},[t("code",[e._v("shared-folder\n|-- datapackage.json\n|-- README.md\n|-- data\n |-- data.csv\n")])])]),t("ol",[t("li",[t("p",[e._v("Upload your Data Package folder ("),t("a",{attrs:{href:"https://support.google.com/drive/answer/2424368",target:"_blank",rel:"noopener noreferrer"}},[e._v("help"),t("OutboundLink")],1),e._v(")")])]),e._v(" "),t("li",[t("p",[e._v("Change your folder’s share setting to "),t("strong",[e._v("Public on the web - Anyone on the Internet can find and view")]),e._v(" ("),t("a",{attrs:{href:"https://support.google.com/drive/answer/2494886",target:"_blank",rel:"noopener noreferrer"}},[e._v("help"),t("OutboundLink")],1),e._v(")")])]),e._v(" "),t("li",[t("p",[e._v("Get a shareable link for your folder ("),t("a",{attrs:{href:"https://support.google.com/drive/answer/2494822",target:"_blank",rel:"noopener noreferrer"}},[e._v("help"),t("OutboundLink")],1),e._v(")")])]),e._v(" "),t("li",[t("p",[e._v("Find your folder’s ID in the link")])])]),e._v(" "),t("ul",[t("li",[t("em",[e._v("Example Link:")]),e._v(" "),t("ul",[t("li",[t("code",[e._v("https://drive.google.com/open?id=0B-f6D5RM8awSfkdtRWpiTlpxdmhPblJRd2NhdHpHMFZPOFZKcWhpT2NkQlZCUlNWUnFwaHM&authuser=0")])])])]),e._v(" "),t("li",[t("em",[e._v("Example ID:")]),e._v(" "),t("ul",[t("li",[t("code",[e._v("0B-f6D5RM8awSfkdtRWpiTlpxdmhPblJRd2NhdHpHMFZPOFZKcWhpT2NkQlZCUlNWUnFwaHM")])])])])]),e._v(" "),t("ol",{attrs:{start:"5"}},[t("li",[e._v("Your "),t("code",[e._v("datapackage.json")]),e._v(" link is "),t("code",[e._v("https://googledrive.com/host/{ID}/datapackage.json")]),e._v("; for example, using the "),t("em",[e._v("Example ID")]),e._v(" from the previous step, the "),t("code",[e._v("datapackage.json")]),e._v(" link is:")])]),e._v(" "),t("ul",[t("li",[t("code",[e._v("https://googledrive.com/host/0B-f6D5RM8awSfkdtRWpiTlpxdmhPblJRd2NhdHpHMFZPOFZKcWhpT2NkQlZCUlNWUnFwaHM/datapackage.json")])])]),e._v(" 
"),t("h2",{attrs:{id:"dropbox"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#dropbox"}},[e._v("#")]),e._v(" Dropbox")]),e._v(" "),t("p",[e._v("Just upload your files to Dropbox.")]),e._v(" "),t("p",[e._v("You do need to be a bit careful as Dropbox does not always replicate your local file layout in its online URLs. Therefore, make sure you read the "),t("a",{attrs:{href:"#key-tips"}},[e._v("Key Tips")]),e._v(" section below.")]),e._v(" "),t("h2",{attrs:{id:"key-tips"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#key-tips"}},[e._v("#")]),e._v(" Key Tips")]),e._v(" "),t("p",[e._v("However you publish your Data Package there are a few key points to keep in"),t("br"),e._v("\nmind:")]),e._v(" "),t("ul",[t("li",[t("p",[e._v("All the files in the Data Package should be accessible online")])]),e._v(" "),t("li",[t("p",[e._v("The structure of your Data Package should be preserved. Specifically the paths between your "),t("code",[e._v("datapackage.json")]),e._v(" and the data files must be preserved. For example, if your Data Package directory looked like this on disk:")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",[t("code",[e._v("datapackage.json\ndata.csv\nsomedir/other-data.csv\n")])])]),t("p",[e._v("then online it should look like:")]),e._v(" "),t("div",{staticClass:"language- extra-class"},[t("pre",[t("code",[e._v("http://your.website.com/mydatapackage/datapackage.json\nhttp://your.website.com/mydatapackage/data.csv\nhttp://your.website.com/mydatapackage/somedir/other-data.csv\n")])])]),t("p",[e._v("This can be a problem with services like e.g. Google Drive where files in a given folder don’t have a web address that relates to that folder. 
The reason we need to preserve relative paths is that when using the Data Package, client software will compute the full path from the location of the "),t("code",[e._v("datapackage.json")]),e._v(" itself plus the relative path for the file given in the "),t("code",[e._v("datapackage.json")]),e._v(" resources section.")])])]),e._v(" "),t("p",[e._v("Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our new and comprehensive "),t("a",{attrs:{href:"/tag/field-guide"}},[e._v("Frictionless Data Field Guide")]),e._v(".")])])}),[],!1,null,null,null);a.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/88.993fe8bc.js b/assets/js/88.151550eb.js similarity index 98% rename from assets/js/88.993fe8bc.js rename to assets/js/88.151550eb.js index 94e53240b..8ef3f1e78 100644 --- a/assets/js/88.993fe8bc.js +++ b/assets/js/88.151550eb.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[88],{611:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("This grantee profile features Stephan Max for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")]),e._v(" "),a("h3",{attrs:{id:"meet-stephan-max"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#meet-stephan-max"}},[e._v("#")]),e._v(" Meet Stephan Max")]),e._v(" "),a("p",[e._v("Hi, my name is Stephan Max and I am a computer scientist based in Cologne, Germany. I’ve been in the industry for over 10 years now and worked for all kinds of companies, ranging from own startup (crowd-funded online journalism), over big corporate (IBM), to established African business data startup (Asoko Insight). 
I am now a filter engineer at eyeo trying to make the web a fair, open, and safe place for everybody.")]),e._v(" "),a("p",[e._v("I love working with kids and teenagers, cooking, and doing music—I just recently started drum lessons!")]),e._v(" "),a("h3",{attrs:{id:"how-did-you-first-hear-about-frictionless-data"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-did-you-first-hear-about-frictionless-data"}},[e._v("#")]),e._v(" How did you first hear about Frictionless Data?")]),e._v(" "),a("p",[e._v("I’ve been following the work of the Open Knowledge Foundation for a while now and contributed to the German branch as a mentor for the teenage hackathon weekends project “Jugend Hackt” (Youth Hacks). I first heard about the Frictionless Data program when the OKF announced funding by the Sloan Foundation in 2018. After watching Serah Njambi Rono’s talk on Youtube ("),a("a",{attrs:{href:"https://www.youtube.com/watch?v=3Ranx9Jz0Ro",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.youtube.com/watch?v=3Ranx9Jz0Ro"),a("OutboundLink")],1),e._v(") and reading about the Reproducible Research Tool Fund on Twitter, I knew I wanted to contribute.")]),e._v(" "),a("h3",{attrs:{id:"why-did-you-apply-for-a-tool-fund-grant"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-did-you-apply-for-a-tool-fund-grant"}},[e._v("#")]),e._v(" Why did you apply for a Tool Fund grant?")]),e._v(" "),a("p",[e._v("I first heard about the concepts and challenges around Reproducible Research when taking the MOOC “Data Science” from Johns Hopkins University on Coursera. Since I had my fair share of work inside proprietary data formats and tools, I was happy to see that there are people out there making serious efforts to remedy the loss of attribution and data manipulation steps. After browsing through OKF’s Frictionless Data website, I was even happier that there are actual tools, libraries, and standards already available. 
Applying for the tool fund and contributing my own humble idea was a no-brainer for me.")]),e._v(" "),a("h3",{attrs:{id:"what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[e._v("#")]),e._v(" What specific issues are you looking to address with the Tool Fund?")]),e._v(" "),a("p",[e._v("My goal is to add a Data Package import/export add-on to Google Sheets. I understand that a lot of data wrangling is still done in Sheets, Excel, and files being swapped around. A lot of information is lost that way. Where did the data initially come from? How was it manipulated, cleaned, or otherwise altered? How can we feed spreadsheets back into a Reproducible Research pipeline? I think Data Packages is a brilliant format to model and preserve exactly that information. While I do not want to lure people away from the tools they are already familiar with, I think we can bridge the gap between Google Sheets and Frictionless Data by making Data Packages a first-class citizen.")]),e._v(" "),a("h3",{attrs:{id:"how-can-the-open-data-open-source-community-engage-with-the-work-you-are-doing-around-frictionless-data-google-sheets-add-on"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-the-open-data-open-source-community-engage-with-the-work-you-are-doing-around-frictionless-data-google-sheets-add-on"}},[e._v("#")]),e._v(" How can the open data, open source, community engage with the work you are doing around Frictionless Data Google Sheets add-on?")]),e._v(" "),a("p",[e._v("I think open source and data is a unique and wonderful opportunity to get access to the “wisdom of the crowd” and ensure that software and information is and remains accessible to everyone. In the first few weeks I will focus on getting a first prototype and sufficient documentation up, so you can all play with the Data Package import/export add-on as soon as possible. 
After that, I invite you to take a look at our Github repository ("),a("a",{attrs:{href:"https://github.com/frictionlessdata/googlesheets-datapackage-tools",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/googlesheets-datapackage-tools"),a("OutboundLink")],1),e._v("), play around with the tool, and contribute. Raising an issue, opening a pull request, improving the documentation, giving feedback on the user experience—everything counts! I am so stoked to be part of this Frictionless Data journey and can’t wait to see what we will accomplish. Thank you very much in advance!")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[88],{612:function(e,t,a){"use strict";a.r(t);var o=a(29),n=Object(o.a)({},(function(){var e=this,t=e.$createElement,a=e._self._c||t;return a("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[a("p",[e._v("This grantee profile features Stephan Max for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")]),e._v(" "),a("h3",{attrs:{id:"meet-stephan-max"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#meet-stephan-max"}},[e._v("#")]),e._v(" Meet Stephan Max")]),e._v(" "),a("p",[e._v("Hi, my name is Stephan Max and I am a computer scientist based in Cologne, Germany. I’ve been in the industry for over 10 years now and worked for all kinds of companies, ranging from own startup (crowd-funded online journalism), over big corporate (IBM), to established African business data startup (Asoko Insight). 
I am now a filter engineer at eyeo trying to make the web a fair, open, and safe place for everybody.")]),e._v(" "),a("p",[e._v("I love working with kids and teenagers, cooking, and doing music—I just recently started drum lessons!")]),e._v(" "),a("h3",{attrs:{id:"how-did-you-first-hear-about-frictionless-data"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-did-you-first-hear-about-frictionless-data"}},[e._v("#")]),e._v(" How did you first hear about Frictionless Data?")]),e._v(" "),a("p",[e._v("I’ve been following the work of the Open Knowledge Foundation for a while now and contributed to the German branch as a mentor for the teenage hackathon weekends project “Jugend Hackt” (Youth Hacks). I first heard about the Frictionless Data program when the OKF announced funding by the Sloan Foundation in 2018. After watching Serah Njambi Rono’s talk on Youtube ("),a("a",{attrs:{href:"https://www.youtube.com/watch?v=3Ranx9Jz0Ro",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://www.youtube.com/watch?v=3Ranx9Jz0Ro"),a("OutboundLink")],1),e._v(") and reading about the Reproducible Research Tool Fund on Twitter, I knew I wanted to contribute.")]),e._v(" "),a("h3",{attrs:{id:"why-did-you-apply-for-a-tool-fund-grant"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#why-did-you-apply-for-a-tool-fund-grant"}},[e._v("#")]),e._v(" Why did you apply for a Tool Fund grant?")]),e._v(" "),a("p",[e._v("I first heard about the concepts and challenges around Reproducible Research when taking the MOOC “Data Science” from Johns Hopkins University on Coursera. Since I had my fair share of work inside proprietary data formats and tools, I was happy to see that there are people out there making serious efforts to remedy the loss of attribution and data manipulation steps. After browsing through OKF’s Frictionless Data website, I was even happier that there are actual tools, libraries, and standards already available. 
Applying for the tool fund and contributing my own humble idea was a no-brainer for me.")]),e._v(" "),a("h3",{attrs:{id:"what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[e._v("#")]),e._v(" What specific issues are you looking to address with the Tool Fund?")]),e._v(" "),a("p",[e._v("My goal is to add a Data Package import/export add-on to Google Sheets. I understand that a lot of data wrangling is still done in Sheets, Excel, and files being swapped around. A lot of information is lost that way. Where did the data initially come from? How was it manipulated, cleaned, or otherwise altered? How can we feed spreadsheets back into a Reproducible Research pipeline? I think Data Packages is a brilliant format to model and preserve exactly that information. While I do not want to lure people away from the tools they are already familiar with, I think we can bridge the gap between Google Sheets and Frictionless Data by making Data Packages a first-class citizen.")]),e._v(" "),a("h3",{attrs:{id:"how-can-the-open-data-open-source-community-engage-with-the-work-you-are-doing-around-frictionless-data-google-sheets-add-on"}},[a("a",{staticClass:"header-anchor",attrs:{href:"#how-can-the-open-data-open-source-community-engage-with-the-work-you-are-doing-around-frictionless-data-google-sheets-add-on"}},[e._v("#")]),e._v(" How can the open data, open source, community engage with the work you are doing around Frictionless Data Google Sheets add-on?")]),e._v(" "),a("p",[e._v("I think open source and data is a unique and wonderful opportunity to get access to the “wisdom of the crowd” and ensure that software and information is and remains accessible to everyone. In the first few weeks I will focus on getting a first prototype and sufficient documentation up, so you can all play with the Data Package import/export add-on as soon as possible. 
After that, I invite you to take a look at our Github repository ("),a("a",{attrs:{href:"https://github.com/frictionlessdata/googlesheets-datapackage-tools",target:"_blank",rel:"noopener noreferrer"}},[e._v("https://github.com/frictionlessdata/googlesheets-datapackage-tools"),a("OutboundLink")],1),e._v("), play around with the tool, and contribute. Raising an issue, opening a pull request, improving the documentation, giving feedback on the user experience—everything counts! I am so stoked to be part of this Frictionless Data journey and can’t wait to see what we will accomplish. Thank you very much in advance!")])])}),[],!1,null,null,null);t.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/89.2288f7b1.js b/assets/js/89.757a6bde.js similarity index 99% rename from assets/js/89.2288f7b1.js rename to assets/js/89.757a6bde.js index f42edcf4b..3e18400c9 100644 --- a/assets/js/89.2288f7b1.js +++ b/assets/js/89.757a6bde.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[89],{610:function(e,a,t){"use strict";t.r(a);var o=t(29),n=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("This grantee profile features Carlos Eduardo Ribas and João Alexandre Peschanski from the Neuroscience Experiments System (NES) for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")]),e._v(" "),t("h2",{attrs:{id:"meet-carlos-joao-and-ridc-neuromat"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#meet-carlos-joao-and-ridc-neuromat"}},[e._v("#")]),e._v(" Meet Carlos, João, and RIDC NeuroMat")]),e._v(" "),t("p",[e._v("João Alexandre Peschanski is the "),t("a",{attrs:{href:"https://en.wikipedia.org/wiki/Faculdade_C%C3%A1sper_L%C3%ADbero",target:"_blank",rel:"noopener noreferrer"}},[e._v("Cásper Líbero 
"),t("OutboundLink")],1),e._v("Professor of Digital Media and Computational Journalism and the research supervisor of the dissemination team of the "),t("a",{attrs:{href:"https://en.wikipedia.org/wiki/NeuroMat",target:"_blank",rel:"noopener noreferrer"}},[e._v("Research, Innovation and Dissemination Center for Neuromathematics"),t("OutboundLink")],1),e._v(" (RIDC NeuroMat), from the São Paulo Research Foundation. He is also the president of the "),t("a",{attrs:{href:"https://meta.wikimedia.org/wiki/Wikimedia_Community_User_Group_Brasil",target:"_blank",rel:"noopener noreferrer"}},[e._v("Wiki Movimento Brasi"),t("OutboundLink")],1),e._v("l, the Brazilian affiliate of the Wikimedia movement. As an academic, he has worked on open crowdsourcing resources as well as structured narratives and semantic web.")]),e._v(" "),t("p",[e._v("Carlos Eduardo Ribas is the leading software developer at the RIDC NeuroMat. He holds a position at the "),t("a",{attrs:{href:"https://en.wikipedia.org/wiki/University_of_S%C3%A3o_Paulo",target:"_blank",rel:"noopener noreferrer"}},[e._v("University of São Paulo"),t("OutboundLink")],1),e._v(" as a systems analyst. He is the development team leader of the "),t("a",{attrs:{href:"https://github.com/neuromat/nes",target:"_blank",rel:"noopener noreferrer"}},[e._v("Neuroscience Experiments System"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("The RIDC NeuroMat is a research center established in 2013 at the University of São Paulo, in Brazil. Among the core missions of NeuroMat are the development of open-source computational tools, keeping an active role under the context of open knowledge, open science and scientific dissemination. 
The NeuroMat project was recently renewed until July 31, 2024.")]),e._v(" "),t("h2",{attrs:{id:"how-did-you-first-hear-about-frictionless-data-and-why-did-you-apply-for-a-tool-fund-grant"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#how-did-you-first-hear-about-frictionless-data-and-why-did-you-apply-for-a-tool-fund-grant"}},[e._v("#")]),e._v(" How did you first hear about Frictionless Data and why did you apply for a Tool Fund grant?")]),e._v(" "),t("p",[e._v("We learned about the Tool Fund from an "),t("a",{attrs:{href:"https://br.okfn.org/2019/02/21/open-knowledge-internacional-anuncia-fundo-para-ferramenta-de-frictionless-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("announcement"),t("OutboundLink")],1),e._v(" in Portuguese that was posted by Open Knowledge Brasil. The Frictionless Data Tool Fund grant is also an opportunity to connect with like-minded professionals and their projects, and eventually to build and support a community deeply engaged with the development of open science and tools.")]),e._v(" "),t("p",[e._v("Public databases are seen as crucial by many members of the neuroscientific community as a means of moving forward more effectively in understanding the functioning and treatment of brain pathologies. However, open data alone are not enough; the data should be created in a way that can be easily shared and used. Data and metadata should be readable by both researchers and machines, and Frictionless Data can certainly help with this.")]),e._v(" "),t("p",[e._v("In our case, NES and the NeuroMat Open Database were developed to establish a standard for data collection in neuroscientific experiments. The standardization of data collection is key for reproducible science. 
For us, the main advantage of the Frictionless Data approach is the ability to standardize data opening and sharing within the scientific community.")]),e._v(" "),t("h2",{attrs:{id:"what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[e._v("#")]),e._v(" What specific issues are you looking to address with the Tool Fund?")]),e._v(" "),t("p",[e._v("NES is an open-source tool under development that aims to assist neuroscience research laboratories in routine procedures for data collection. NES was developed to store a large amount of data in a structured way, allowing researchers to seek and share data and metadata of neuroscience experiments. To the best of our knowledge, there are no open-source software tools which provide a way to record data and metadata involved in all steps of an electrophysiological experiment and also register experimental data and its fundamental provenance information. With the anonymization of sensitive information, the data collected using NES can be made publicly available through the "),t("a",{attrs:{href:"https://neuromatdb.numec.prp.usp.br/",target:"_blank",rel:"noopener noreferrer"}},[e._v("NeuroMat Open Database"),t("OutboundLink")],1),e._v(", which allows any researcher to reproduce the experiment or simply use the data in a different study.")]),e._v(" "),t("p",[e._v("The system already has some features ready to use, such as Participant registration, Experiment management, Questionnaire management and Data exportation. Some types of data that NES deals with are tasks, stimuli, instructions, EEG, EMG, TMS and questionnaires. 
Questionnaires are produced with "),t("a",{attrs:{href:"https://www.limesurvey.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("LimeSurvey"),t("OutboundLink")],1),e._v(" (open-source software).")]),e._v(" "),t("p",[e._v("We propose to change NES to follow the Frictionless Data philosophy. The data exportation module can be adjusted to reflect the set of specifications for data and metadata interoperability and to export in the Data Package format, and other features can likewise be brought into accordance with the proposed philosophy. A major feature to be developed is a JSON file “descriptor” with initial information related to the experiment. However, as sensitive information may be present at this stage, public access to such data will be granted only after the anonymization and submission of the experiment to the NeuroMat Open Database.")]),e._v(" "),t("p",[e._v("Aligning NES with the Frictionless Data philosophy opens up an opportunity for scientists to have access not only to a universe of well-documented and labeled data, but also to understand the process that generated this data.")]),e._v(" "),t("h2",{attrs:{id:"how-can-the-open-data-open-source-community-engage-with-the-work-you-are-doing-around-frictionless-data-and-nes"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#how-can-the-open-data-open-source-community-engage-with-the-work-you-are-doing-around-frictionless-data-and-nes"}},[e._v("#")]),e._v(" How can the open data, open source, community engage with the work you are doing around Frictionless Data and NES?")]),e._v(" "),t("p",[e._v("The source code is available on "),t("a",{attrs:{href:"https://github.com/neuromat/nes",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),t("OutboundLink")],1),e._v(" ("),t("a",{attrs:{href:"https://nes.readthedocs.io/en/latest/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation link"),t("OutboundLink")],1),e._v("). The development has been done on the Django framework. 
The license is Mozilla Public License Version 2.0. NES is an open source project managed using the Git version control system, so contributing is as easy as forking the project and committing your enhancements.")]),e._v(" "),t("p",[e._v("As the RIDC NeuroMat has published "),t("a",{attrs:{href:"https://neuromat.numec.prp.usp.br/content/in-defense-of-public-scientific-data-sharing-a-neuromat-op-ed/",target:"_blank",rel:"noopener noreferrer"}},[e._v("elsewhere"),t("OutboundLink")],1),e._v(", the work on NES is part of a broader agenda for the development of a database that allows public access to neuroscientific data (physiological measures and functional assessments). We hope our engagement with the Frictionless Data community will open up possibilities of sharing and partnering up for moving this agenda forward.")])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[89],{611:function(e,a,t){"use strict";t.r(a);var o=t(29),n=Object(o.a)({},(function(){var e=this,a=e.$createElement,t=e._self._c||a;return t("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[t("p",[e._v("This grantee profile features Carlos Eduardo Ribas and João Alexandre Peschanski from the Neuroscience Experiments System (NES) for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.")]),e._v(" "),t("h2",{attrs:{id:"meet-carlos-joao-and-ridc-neuromat"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#meet-carlos-joao-and-ridc-neuromat"}},[e._v("#")]),e._v(" Meet Carlos, João, and RIDC NeuroMat")]),e._v(" "),t("p",[e._v("João Alexandre Peschanski is the "),t("a",{attrs:{href:"https://en.wikipedia.org/wiki/Faculdade_C%C3%A1sper_L%C3%ADbero",target:"_blank",rel:"noopener noreferrer"}},[e._v("Cásper Líbero "),t("OutboundLink")],1),e._v("Professor of Digital Media and 
Computational Journalism and the research supervisor of the dissemination team of the "),t("a",{attrs:{href:"https://en.wikipedia.org/wiki/NeuroMat",target:"_blank",rel:"noopener noreferrer"}},[e._v("Research, Innovation and Dissemination Center for Neuromathematics"),t("OutboundLink")],1),e._v(" (RIDC NeuroMat), from the São Paulo Research Foundation. He is also the president of the "),t("a",{attrs:{href:"https://meta.wikimedia.org/wiki/Wikimedia_Community_User_Group_Brasil",target:"_blank",rel:"noopener noreferrer"}},[e._v("Wiki Movimento Brasi"),t("OutboundLink")],1),e._v("l, the Brazilian affiliate of the Wikimedia movement. As an academic, he has worked on open crowdsourcing resources as well as structured narratives and semantic web.")]),e._v(" "),t("p",[e._v("Carlos Eduardo Ribas is the leading software developer at the RIDC NeuroMat. He holds a position at the "),t("a",{attrs:{href:"https://en.wikipedia.org/wiki/University_of_S%C3%A3o_Paulo",target:"_blank",rel:"noopener noreferrer"}},[e._v("University of São Paulo"),t("OutboundLink")],1),e._v(" as a systems analyst. He is the development team leader of the "),t("a",{attrs:{href:"https://github.com/neuromat/nes",target:"_blank",rel:"noopener noreferrer"}},[e._v("Neuroscience Experiments System"),t("OutboundLink")],1),e._v(".")]),e._v(" "),t("p",[e._v("The RIDC NeuroMat is a research center established in 2013 at the University of São Paulo, in Brazil. Among the core missions of NeuroMat are the development of open-source computational tools, keeping an active role under the context of open knowledge, open science and scientific dissemination. 
The NeuroMat project was recently renewed until July 31, 2024.")]),e._v(" "),t("h2",{attrs:{id:"how-did-you-first-hear-about-frictionless-data-and-why-did-you-apply-for-a-tool-fund-grant"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#how-did-you-first-hear-about-frictionless-data-and-why-did-you-apply-for-a-tool-fund-grant"}},[e._v("#")]),e._v(" How did you first hear about Frictionless Data and why did you apply for a Tool Fund grant?")]),e._v(" "),t("p",[e._v("We learned about the Tool Fund from an "),t("a",{attrs:{href:"https://br.okfn.org/2019/02/21/open-knowledge-internacional-anuncia-fundo-para-ferramenta-de-frictionless-data/",target:"_blank",rel:"noopener noreferrer"}},[e._v("announcement"),t("OutboundLink")],1),e._v(" in Portuguese that was posted by Open Knowledge Brasil. The Frictionless Data Tool Fund grant is also an opportunity to connect with like-minded professionals and their projects, and eventually to build and support a community deeply engaged with the development of open science and tools.")]),e._v(" "),t("p",[e._v("Public databases are seen as crucial by many members of the neuroscientific community as a means of moving forward more effectively in understanding the functioning and treatment of brain pathologies. However, open data alone are not enough; the data should be created in a way that can be easily shared and used. Data and metadata should be readable by both researchers and machines, and Frictionless Data can certainly help with this.")]),e._v(" "),t("p",[e._v("In our case, NES and the NeuroMat Open Database were developed to establish a standard for data collection in neuroscientific experiments. The standardization of data collection is key for reproducible science. 
For us, the main advantage of the Frictionless Data approach is the ability to standardize data opening and sharing within the scientific community.")]),e._v(" "),t("h2",{attrs:{id:"what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#what-specific-issues-are-you-looking-to-address-with-the-tool-fund"}},[e._v("#")]),e._v(" What specific issues are you looking to address with the Tool Fund?")]),e._v(" "),t("p",[e._v("NES is an open-source tool under development that aims to assist neuroscience research laboratories in routine procedures for data collection. NES was developed to store a large amount of data in a structured way, allowing researchers to seek and share data and metadata of neuroscience experiments. To the best of our knowledge, there are no open-source software tools which provide a way to record data and metadata involved in all steps of an electrophysiological experiment and also register experimental data and its fundamental provenance information. With the anonymization of sensitive information, the data collected using NES can be made publicly available through the "),t("a",{attrs:{href:"https://neuromatdb.numec.prp.usp.br/",target:"_blank",rel:"noopener noreferrer"}},[e._v("NeuroMat Open Database"),t("OutboundLink")],1),e._v(", which allows any researcher to reproduce the experiment or simply use the data in a different study.")]),e._v(" "),t("p",[e._v("The system already has some features ready to use, such as Participant registration, Experiment management, Questionnaire management and Data exportation. Some types of data that NES deals with are tasks, stimuli, instructions, EEG, EMG, TMS and questionnaires. 
Questionnaires are produced with "),t("a",{attrs:{href:"https://www.limesurvey.org/",target:"_blank",rel:"noopener noreferrer"}},[e._v("LimeSurvey"),t("OutboundLink")],1),e._v(" (an open-source software).")]),e._v(" "),t("p",[e._v("We propose to change the NES to rely on the philosophy for Frictionless Data. The data exportation module can be adjusted to reflect the set of specifications for data and metadata interoperability and also to be in the Data Package format, as well as any other feature to be in accordance to the philosophy proposed. A major feature to be developed is a JSON file “descriptor” with initial information related to the experiment. However, as sensitive information may be presented at this stage, public access to such data will be done after the anonymization and submission of the experiment to the NeuroMat Open Database.")]),e._v(" "),t("p",[e._v("To bring NES to the philosophy for Frictionless Data opens up an opportunity for scientists to have access not only to a universe of well-documented and labeled data, but also to understand the process that generated this data.")]),e._v(" "),t("h2",{attrs:{id:"how-can-the-open-data-open-source-community-engage-with-the-work-you-are-doing-around-frictionless-data-and-nes"}},[t("a",{staticClass:"header-anchor",attrs:{href:"#how-can-the-open-data-open-source-community-engage-with-the-work-you-are-doing-around-frictionless-data-and-nes"}},[e._v("#")]),e._v(" How can the open data, open source, community engage with the work you are doing around Frictionless Data and NES?")]),e._v(" "),t("p",[e._v("The source code is available on "),t("a",{attrs:{href:"https://github.com/neuromat/nes",target:"_blank",rel:"noopener noreferrer"}},[e._v("GitHub"),t("OutboundLink")],1),e._v(" ("),t("a",{attrs:{href:"https://nes.readthedocs.io/en/latest/",target:"_blank",rel:"noopener noreferrer"}},[e._v("documentation link"),t("OutboundLink")],1),e._v("). The development has been done on Django framework. 
The license is Mozilla Public License Version 2.0. NES is an open source project managed using the Git version control system, so contributing is as easy as forking the project and committing your enhancements.")]),e._v(" "),t("p",[e._v("As the RIDC NeuroMat has published "),t("a",{attrs:{href:"https://neuromat.numec.prp.usp.br/content/in-defense-of-public-scientific-data-sharing-a-neuromat-op-ed/",target:"_blank",rel:"noopener noreferrer"}},[e._v("elsewhere"),t("OutboundLink")],1),e._v(", the work on NES is part of a broader agenda for the development of a database that allows public access to neuroscientific data (physiological measures and functional assessments). We hope our engagement with the Frictionless Data community will open up possibilities of sharing and partnering up for moving this agenda forward.")])])}),[],!1,null,null,null);a.default=n.exports}}]); \ No newline at end of file diff --git a/assets/js/94.d1f3ce48.js b/assets/js/94.3a1b2358.js similarity index 96% rename from assets/js/94.d1f3ce48.js rename to assets/js/94.3a1b2358.js index 3305b21b1..7862786d3 100644 --- a/assets/js/94.d1f3ce48.js +++ b/assets/js/94.3a1b2358.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[94],{621:function(t,e,n){"use strict";n.r(e);var a=n(29),o=Object(a.a)({},(function(){var t=this,e=t.$createElement,n=t._self._c||e;return n("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[n("p",[t._v("Hi there, My name is "),n("a",{attrs:{href:"https://giftegwuenu.com",target:"_blank",rel:"noopener noreferrer"}},[t._v("Gift Egwuenu"),n("OutboundLink")],1),t._v(" and I’m super excited to share I joined "),n("a",{attrs:{href:"https://datopian.com/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Datopian"),n("OutboundLink")],1),t._v(" as a Frontend Developer and Developer Evangelist! 
🎉")]),t._v(" "),n("p",[n("a",{attrs:{href:"https://frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data"),n("OutboundLink")],1),t._v(" is an open-source toolkit that brings simplicity and grace to the data experience. We want every Data Engineer or Data Scientist to know about it and benefit from it.")]),t._v(" "),n("p",[t._v("Part of my job involves spreading the word about Frictionless Data and encouraging community involvement by sharing what you can achieve with the toolkit 😃")]),t._v(" "),n("p",[t._v("My other day-to-day activities include the following and more:")]),t._v(" "),n("ul",[n("li",[t._v("Working on Frictionless Data tools")]),t._v(" "),n("li",[t._v("Working closely and interacting with the Frictionless Data Community via (chats, remote hangouts, and in-person events)")]),t._v(" "),n("li",[t._v("Writing documentation, guide and blog posts for Frictionless Data")])]),t._v(" "),n("p",[t._v("I’m glad I get to do this as a full-time job because I’m passionate about teaching and learning 🚀 and I’m excited to be a part of the "),n("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data community"),n("OutboundLink")],1),t._v(" where I get to contribute, share, learn and interact with the data community.")])])}),[],!1,null,null,null);e.default=o.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[94],{622:function(t,e,n){"use strict";n.r(e);var a=n(29),o=Object(a.a)({},(function(){var t=this,e=t.$createElement,n=t._self._c||e;return n("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[n("p",[t._v("Hi there, My name is "),n("a",{attrs:{href:"https://giftegwuenu.com",target:"_blank",rel:"noopener noreferrer"}},[t._v("Gift Egwuenu"),n("OutboundLink")],1),t._v(" and I’m super excited to share I joined "),n("a",{attrs:{href:"https://datopian.com/",target:"_blank",rel:"noopener 
noreferrer"}},[t._v("Datopian"),n("OutboundLink")],1),t._v(" as a Frontend Developer and Developer Evangelist! 🎉")]),t._v(" "),n("p",[n("a",{attrs:{href:"https://frictionlessdata.io",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data"),n("OutboundLink")],1),t._v(" is an open-source toolkit that brings simplicity and grace to the data experience. We want every Data Engineer or Data Scientist to know about it and benefit from it.")]),t._v(" "),n("p",[t._v("Part of my job involves spreading the word about Frictionless Data and encouraging community involvement by sharing what you can achieve with the toolkit 😃")]),t._v(" "),n("p",[t._v("My other day-to-day activities include the following and more:")]),t._v(" "),n("ul",[n("li",[t._v("Working on Frictionless Data tools")]),t._v(" "),n("li",[t._v("Working closely and interacting with the Frictionless Data Community via (chats, remote hangouts, and in-person events)")]),t._v(" "),n("li",[t._v("Writing documentation, guide and blog posts for Frictionless Data")])]),t._v(" "),n("p",[t._v("I’m glad I get to do this as a full-time job because I’m passionate about teaching and learning 🚀 and I’m excited to be a part of the "),n("a",{attrs:{href:"https://frictionlessdata.io/",target:"_blank",rel:"noopener noreferrer"}},[t._v("Frictionless Data community"),n("OutboundLink")],1),t._v(" where I get to contribute, share, learn and interact with the data community.")])])}),[],!1,null,null,null);e.default=o.exports}}]); \ No newline at end of file diff --git a/assets/js/95.d8ab2d48.js b/assets/js/95.27d7db14.js similarity index 96% rename from assets/js/95.d8ab2d48.js rename to assets/js/95.27d7db14.js index 2258b17aa..c073f62b6 100644 --- a/assets/js/95.d8ab2d48.js +++ b/assets/js/95.27d7db14.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[95],{625:function(e,t,o){"use strict";o.r(t);var n=o(29),r=Object(n.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return 
o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("img",{attrs:{src:"https://i.imgur.com/rls4pCT.jpg",alt:"Photo by William White on Unsplash"}})]),e._v(" "),o("p",[o("strong",[e._v("We are thrilled to announce we’ll be hosting a virtual community hangout to share recent developments in the Frictionless Data community. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.")])]),e._v(" "),o("p",[o("strong",[e._v("Here are some key discussions we hope to cover:")])]),e._v(" "),o("ul",[o("li",[e._v("Introductions & share the purpose of this hangout.")]),e._v(" "),o("li",[e._v("Share the update on the new website release and general Frictionless Data related updates.")]),e._v(" "),o("li",[e._v("Have community members share their thoughts and general feedback on Frictionless Data.")]),e._v(" "),o("li",[e._v("Share information about CSV Conf.")])]),e._v(" "),o("p",[e._v("The hangout is scheduled to happen on "),o("strong",[e._v("20th April 2020 at 5 pm CET")]),e._v(". 
If you would like to attend, "),o("a",{attrs:{href:"https://zoom.us/meeting/register/tJEqdOyspzgvG9wlVM_3Z_6yyL8wzc-v03Bq",target:"_blank",rel:"noopener noreferrer"}},[e._v("you can sign up for the event in advance here."),o("OutboundLink")],1),e._v(" Everyone is welcome.")]),e._v(" "),o("p",[e._v("Looking forward to seeing you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[95],{623:function(e,t,o){"use strict";o.r(t);var n=o(29),r=Object(n.a)({},(function(){var e=this,t=e.$createElement,o=e._self._c||t;return o("ContentSlotsDistributor",{attrs:{"slot-key":e.$parent.slotKey}},[o("p",[o("img",{attrs:{src:"https://i.imgur.com/rls4pCT.jpg",alt:"Photo by William White on Unsplash"}})]),e._v(" "),o("p",[o("strong",[e._v("We are thrilled to announce we’ll be hosting a virtual community hangout to share recent developments in the Frictionless Data community. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.")])]),e._v(" "),o("p",[o("strong",[e._v("Here are some key discussions we hope to cover:")])]),e._v(" "),o("ul",[o("li",[e._v("Introductions & share the purpose of this hangout.")]),e._v(" "),o("li",[e._v("Share the update on the new website release and general Frictionless Data related updates.")]),e._v(" "),o("li",[e._v("Have community members share their thoughts and general feedback on Frictionless Data.")]),e._v(" "),o("li",[e._v("Share information about CSV Conf.")])]),e._v(" "),o("p",[e._v("The hangout is scheduled to happen on "),o("strong",[e._v("20th April 2020 at 5 pm CET")]),e._v(". 
If you would like to attend, "),o("a",{attrs:{href:"https://zoom.us/meeting/register/tJEqdOyspzgvG9wlVM_3Z_6yyL8wzc-v03Bq",target:"_blank",rel:"noopener noreferrer"}},[e._v("you can sign up for the event in advance here."),o("OutboundLink")],1),e._v(" Everyone is welcome.")]),e._v(" "),o("p",[e._v("Looking forward to seeing you there!")])])}),[],!1,null,null,null);t.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/99.9e674d82.js b/assets/js/99.27adfc5c.js similarity index 96% rename from assets/js/99.9e674d82.js rename to assets/js/99.27adfc5c.js index 0a9d9924b..5d028ae2c 100644 --- a/assets/js/99.9e674d82.js +++ b/assets/js/99.27adfc5c.js @@ -1 +1 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[99],{629:function(t,e,o){"use strict";o.r(e);var n=o(29),r=Object(n.a)({},(function(){var t=this,e=t.$createElement,o=t._self._c||e;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[t._v("We are hosting a virtual community hangout to share recent developments in the Frictionless Data community and it’s also an avenue to connect with other community members. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.")]),t._v(" "),o("p",[o("img",{attrs:{src:"/img/blog/community.jpg",alt:"Photo by Perry Grone on Unsplash"}})]),t._v(" "),o("p",[t._v("The hangout is scheduled to hold on "),o("strong",[t._v("25th June 2020 at 5 pm BST / 4 PM UTC")]),t._v(". 
If you would like to attend the hangout, "),o("a",{attrs:{href:"https://forms.gle/3wEGBy2q4Q6pdNfK8",target:"_blank",rel:"noopener noreferrer"}},[t._v("you can sign up for the event using this form"),o("OutboundLink")],1)]),t._v(" "),o("p",[t._v("Looking forward to seeing you there!")]),t._v(" "),o("h2",{attrs:{id:"community-hangout-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#community-hangout-recording"}},[t._v("#")]),t._v(" Community Hangout Recording")]),t._v(" "),o("p",[t._v("If you missed the community hangout and will like to catch up on what was discussed, here’s a recording of the hangout.")]),t._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/xBu855rFiOM",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file +(window.webpackJsonp=window.webpackJsonp||[]).push([[99],{630:function(t,e,o){"use strict";o.r(e);var n=o(29),r=Object(n.a)({},(function(){var t=this,e=t.$createElement,o=t._self._c||e;return o("ContentSlotsDistributor",{attrs:{"slot-key":t.$parent.slotKey}},[o("p",[t._v("We are hosting a virtual community hangout to share recent developments in the Frictionless Data community and it’s also an avenue to connect with other community members. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.")]),t._v(" "),o("p",[o("img",{attrs:{src:"/img/blog/community.jpg",alt:"Photo by Perry Grone on Unsplash"}})]),t._v(" "),o("p",[t._v("The hangout is scheduled to hold on "),o("strong",[t._v("25th June 2020 at 5 pm BST / 4 PM UTC")]),t._v(". 
If you would like to attend the hangout, "),o("a",{attrs:{href:"https://forms.gle/3wEGBy2q4Q6pdNfK8",target:"_blank",rel:"noopener noreferrer"}},[t._v("you can sign up for the event using this form"),o("OutboundLink")],1)]),t._v(" "),o("p",[t._v("Looking forward to seeing you there!")]),t._v(" "),o("h2",{attrs:{id:"community-hangout-recording"}},[o("a",{staticClass:"header-anchor",attrs:{href:"#community-hangout-recording"}},[t._v("#")]),t._v(" Community Hangout Recording")]),t._v(" "),o("p",[t._v("If you missed the community hangout and will like to catch up on what was discussed, here’s a recording of the hangout.")]),t._v(" "),o("iframe",{attrs:{width:"560",height:"315",src:"https://www.youtube.com/embed/xBu855rFiOM",frameborder:"0",allow:"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture",allowfullscreen:""}})])}),[],!1,null,null,null);e.default=r.exports}}]); \ No newline at end of file diff --git a/assets/js/app.86770f7e.js b/assets/js/app.a2902b81.js similarity index 55% rename from assets/js/app.86770f7e.js rename to assets/js/app.a2902b81.js index 0fa09ad1d..7016cebc7 100644 --- a/assets/js/app.86770f7e.js +++ b/assets/js/app.a2902b81.js @@ -1,8 +1,8 @@ -(window.webpackJsonp=window.webpackJsonp||[]).push([[0],[]]);!function(t){function e(e){for(var a,i,l=e[0],c=e[1],s=e[2],g=0,u=[];g=0&&Math.floor(e)===e&&isFinite(t)}function d(t){return r(t)&&"function"==typeof t.then&&"function"==typeof t.catch}function m(t){return null==t?"":Array.isArray(t)||p(t)&&t.toString===s?JSON.stringify(t,null,2):String(t)}function h(t){var e=parseFloat(t);return isNaN(e)?t:e}function f(t,e){for(var o=Object.create(null),a=t.split(","),n=0;n-1)return t.splice(o,1)}}var v=Object.prototype.hasOwnProperty;function w(t,e){return v.call(t,e)}function k(t){var e=Object.create(null);return function(o){return e[o]||(e[o]=t(o))}}var P=/-(\w)/g,D=k((function(t){return t.replace(P,(function(t,e){return e?e.toUpperCase():""}))})),F=k((function(t){return 
t.charAt(0).toUpperCase()+t.slice(1)})),x=/\B([A-Z])/g,A=k((function(t){return t.replace(x,"-$1").toLowerCase()}));var T=Function.prototype.bind?function(t,e){return t.bind(e)}:function(t,e){function o(o){var a=arguments.length;return a?a>1?t.apply(e,arguments):t.call(e,o):t.call(e)}return o._length=t.length,o};function C(t,e){e=e||0;for(var o=t.length-e,a=new Array(o);o--;)a[o]=t[o+e];return a}function _(t,e){for(var o in e)t[o]=e[o];return t}function E(t){for(var e={},o=0;o0,Y=K&&K.indexOf("edge/")>0,Q=(K&&K.indexOf("android"),K&&/iphone|ipad|ipod|ios/.test(K)||"ios"===Z),X=(K&&/chrome\/\d+/.test(K),K&&/phantomjs/.test(K),K&&K.match(/firefox\/(\d+)/)),tt={}.watch,et=!1;if(V)try{var ot={};Object.defineProperty(ot,"passive",{get:function(){et=!0}}),window.addEventListener("test-passive",null,ot)}catch(t){}var at=function(){return void 0===G&&(G=!V&&!J&&"undefined"!=typeof global&&(global.process&&"server"===global.process.env.VUE_ENV)),G},nt=V&&window.__VUE_DEVTOOLS_GLOBAL_HOOK__;function rt(t){return"function"==typeof t&&/native code/.test(t.toString())}var it,lt="undefined"!=typeof Symbol&&rt(Symbol)&&"undefined"!=typeof Reflect&&rt(Reflect.ownKeys);it="undefined"!=typeof Set&&rt(Set)?Set:function(){function t(){this.set=Object.create(null)}return t.prototype.has=function(t){return!0===this.set[t]},t.prototype.add=function(t){this.set[t]=!0},t.prototype.clear=function(){this.set=Object.create(null)},t}();var ct=S,st=0,pt=function(){this.id=st++,this.subs=[]};pt.prototype.addSub=function(t){this.subs.push(t)},pt.prototype.removeSub=function(t){b(this.subs,t)},pt.prototype.depend=function(){pt.target&&pt.target.addDep(this)},pt.prototype.notify=function(){var t=this.subs.slice();for(var e=0,o=t.length;e-1)if(r&&!w(n,"default"))i=!1;else if(""===i||i===A(t)){var 
c=Wt(String,n.type);(c<0||l0&&(pe((c=t(c,(o||"")+"_"+a))[0])&&pe(p)&&(g[s]=yt(p.text+c[0].text),c.shift()),g.push.apply(g,c)):l(c)?pe(p)?g[s]=yt(p.text+c):""!==c&&g.push(yt(c)):pe(c)&&pe(p)?g[s]=yt(p.text+c.text):(i(e._isVList)&&r(c.tag)&&n(c.key)&&r(o)&&(c.key="__vlist"+o+"_"+a+"__"),g.push(c)));return g}(t):void 0}function pe(t){return r(t)&&r(t.text)&&!1===t.isComment}function ge(t,e){if(t){for(var o=Object.create(null),a=lt?Reflect.ownKeys(t):Object.keys(t),n=0;n0,i=t?!!t.$stable:!r,l=t&&t.$key;if(t){if(t._normalized)return t._normalized;if(i&&o&&o!==a&&l===o.$key&&!r&&!o.$hasNormal)return o;for(var c in n={},t)t[c]&&"$"!==c[0]&&(n[c]=he(e,c,t[c]))}else n={};for(var s in e)s in n||(n[s]=fe(e,s));return t&&Object.isExtensible(t)&&(t._normalized=n),W(n,"$stable",i),W(n,"$key",l),W(n,"$hasNormal",r),n}function he(t,e,o){var a=function(){var t=arguments.length?o.apply(null,arguments):o({});return(t=t&&"object"==typeof t&&!Array.isArray(t)?[t]:se(t))&&(0===t.length||1===t.length&&t[0].isComment)?void 0:t};return o.proxy&&Object.defineProperty(t,e,{get:a,enumerable:!0,configurable:!0}),a}function fe(t,e){return function(){return t[e]}}function ye(t,e){var o,a,n,i,l;if(Array.isArray(t)||"string"==typeof t)for(o=new Array(t.length),a=0,n=t.length;adocument.createEvent("Event").timeStamp&&(lo=function(){return co.now()})}function so(){var t,e;for(io=lo(),no=!0,to.sort((function(t,e){return t.id-e.id})),ro=0;roro&&to[o].id>t.id;)o--;to.splice(o+1,0,t)}else to.push(t);ao||(ao=!0,ee(so))}}(this)},go.prototype.run=function(){if(this.active){var t=this.get();if(t!==this.value||c(t)||this.deep){var e=this.value;if(this.value=t,this.user)try{this.cb.call(this.vm,t,e)}catch(t){Ht(t,this.vm,'callback for watcher "'+this.expression+'"')}else this.cb.call(this.vm,t,e)}}},go.prototype.evaluate=function(){this.value=this.get(),this.dirty=!1},go.prototype.depend=function(){for(var 
t=this.deps.length;t--;)this.deps[t].depend()},go.prototype.teardown=function(){if(this.active){this.vm._isBeingDestroyed||b(this.vm._watchers,this);for(var t=this.deps.length;t--;)this.deps[t].removeSub(this);this.active=!1}};var uo={enumerable:!0,configurable:!0,get:S,set:S};function mo(t,e,o){uo.get=function(){return this[e][o]},uo.set=function(t){this[e][o]=t},Object.defineProperty(t,o,uo)}function ho(t){t._watchers=[];var e=t.$options;e.props&&function(t,e){var o=t.$options.propsData||{},a=t._props={},n=t.$options._propKeys=[];t.$parent&&Dt(!1);var r=function(r){n.push(r);var i=$t(r,e,o,t);At(a,r,i),r in t||mo(t,"_props",r)};for(var i in e)r(i);Dt(!0)}(t,e.props),e.methods&&function(t,e){t.$options.props;for(var o in e)t[o]="function"!=typeof e[o]?S:T(e[o],t)}(t,e.methods),e.data?function(t){var e=t.$options.data;p(e=t._data="function"==typeof e?function(t,e){ut();try{return t.call(e,e)}catch(t){return Ht(t,e,"data()"),{}}finally{dt()}}(e,t):e||{})||(e={});var o=Object.keys(e),a=t.$options.props,n=(t.$options.methods,o.length);for(;n--;){var r=o[n];0,a&&w(a,r)||(i=void 0,36!==(i=(r+"").charCodeAt(0))&&95!==i&&mo(t,"_data",r))}var i;xt(e,!0)}(t):xt(t._data={},!0),e.computed&&function(t,e){var o=t._computedWatchers=Object.create(null),a=at();for(var n in e){var r=e[n],i="function"==typeof r?r:r.get;0,a||(o[n]=new go(t,i||S,S,fo)),n in t||yo(t,n,r)}}(t,e.computed),e.watch&&e.watch!==tt&&function(t,e){for(var o in e){var a=e[o];if(Array.isArray(a))for(var n=0;n-1:"string"==typeof t?t.split(",").indexOf(e)>-1:!!g(t)&&t.test(e)}function To(t,e){var o=t.cache,a=t.keys,n=t._vnode;for(var r in o){var i=o[r];if(i){var l=xo(i.componentOptions);l&&!e(l)&&Co(o,r,a,n)}}}function Co(t,e,o,a){var n=t[e];!n||a&&n.tag===a.tag||n.componentInstance.$destroy(),t[e]=null,b(o,e)}!function(t){t.prototype._init=function(t){var e=this;e._uid=ko++,e._isVue=!0,t&&t._isComponent?function(t,e){var 
o=t.$options=Object.create(t.constructor.options),a=e._parentVnode;o.parent=e.parent,o._parentVnode=a;var n=a.componentOptions;o.propsData=n.propsData,o._parentListeners=n.listeners,o._renderChildren=n.children,o._componentTag=n.tag,e.render&&(o.render=e.render,o.staticRenderFns=e.staticRenderFns)}(e,t):e.$options=Lt(Po(e.constructor),t||{},e),e._renderProxy=e,e._self=e,function(t){var e=t.$options,o=e.parent;if(o&&!e.abstract){for(;o.$options.abstract&&o.$parent;)o=o.$parent;o.$children.push(t)}t.$parent=o,t.$root=o?o.$root:t,t.$children=[],t.$refs={},t._watcher=null,t._inactive=null,t._directInactive=!1,t._isMounted=!1,t._isDestroyed=!1,t._isBeingDestroyed=!1}(e),function(t){t._events=Object.create(null),t._hasHookEvent=!1;var e=t.$options._parentListeners;e&&Ke(t,e)}(e),function(t){t._vnode=null,t._staticTrees=null;var e=t.$options,o=t.$vnode=e._parentVnode,n=o&&o.context;t.$slots=ue(e._renderChildren,n),t.$scopedSlots=a,t._c=function(e,o,a,n){return Ue(t,e,o,a,n,!1)},t.$createElement=function(e,o,a,n){return Ue(t,e,o,a,n,!0)};var r=o&&o.data;At(t,"$attrs",r&&r.attrs||a,null,!0),At(t,"$listeners",e._parentListeners||a,null,!0)}(e),Xe(e,"beforeCreate"),function(t){var e=ge(t.$options.inject,t);e&&(Dt(!1),Object.keys(e).forEach((function(o){At(t,o,e[o])})),Dt(!0))}(e),ho(e),function(t){var e=t.$options.provide;e&&(t._provided="function"==typeof e?e.call(t):e)}(e),Xe(e,"created"),e.$options.el&&e.$mount(e.$options.el)}}(Do),function(t){var e={get:function(){return this._data}},o={get:function(){return this._props}};Object.defineProperty(t.prototype,"$data",e),Object.defineProperty(t.prototype,"$props",o),t.prototype.$set=Tt,t.prototype.$delete=Ct,t.prototype.$watch=function(t,e,o){if(p(e))return wo(this,t,e,o);(o=o||{}).user=!0;var a=new go(this,t,e,o);if(o.immediate)try{e.call(this,a.value)}catch(t){Ht(t,this,'callback for immediate watcher "'+a.expression+'"')}return function(){a.teardown()}}}(Do),function(t){var e=/^hook:/;t.prototype.$on=function(t,o){var 
a=this;if(Array.isArray(t))for(var n=0,r=t.length;n1?C(o):o;for(var a=C(arguments,1),n='event handler for "'+t+'"',r=0,i=o.length;rparseInt(this.max)&&Co(i,l[0],l,this._vnode)),e.data.keepAlive=!0}return e||t&&t[0]}}};!function(t){var e={get:function(){return U}};Object.defineProperty(t,"config",e),t.util={warn:ct,extend:_,mergeOptions:Lt,defineReactive:At},t.set=Tt,t.delete=Ct,t.nextTick=ee,t.observable=function(t){return xt(t),t},t.options=Object.create(null),I.forEach((function(e){t.options[e+"s"]=Object.create(null)})),t.options._base=t,_(t.options.components,Eo),function(t){t.use=function(t){var e=this._installedPlugins||(this._installedPlugins=[]);if(e.indexOf(t)>-1)return this;var o=C(arguments,1);return o.unshift(this),"function"==typeof t.install?t.install.apply(t,o):"function"==typeof t&&t.apply(null,o),e.push(t),this}}(t),function(t){t.mixin=function(t){return this.options=Lt(this.options,t),this}}(t),Fo(t),function(t){I.forEach((function(e){t[e]=function(t,o){return o?("component"===e&&p(o)&&(o.name=o.name||t,o=this.options._base.extend(o)),"directive"===e&&"function"==typeof o&&(o={bind:o,update:o}),this.options[e+"s"][t]=o,o):this.options[e+"s"][t]}}))}(t)}(Do),Object.defineProperty(Do.prototype,"$isServer",{get:at}),Object.defineProperty(Do.prototype,"$ssrContext",{get:function(){return this.$vnode&&this.$vnode.ssrContext}}),Object.defineProperty(Do,"FunctionalRenderContext",{value:je}),Do.version="2.6.12";var 
So=f("style,class"),jo=f("input,textarea,option,select,progress"),Mo=f("contenteditable,draggable,spellcheck"),Oo=f("events,caret,typing,plaintext-only"),Ro=f("allowfullscreen,async,autofocus,autoplay,checked,compact,controls,declare,default,defaultchecked,defaultmuted,defaultselected,defer,disabled,enabled,formnovalidate,hidden,indeterminate,inert,ismap,itemscope,loop,multiple,muted,nohref,noresize,noshade,novalidate,nowrap,open,pauseonexit,readonly,required,reversed,scoped,seamless,selected,sortable,translate,truespeed,typemustmatch,visible"),Lo="http://www.w3.org/1999/xlink",Io=function(t){return":"===t.charAt(5)&&"xlink"===t.slice(0,5)},$o=function(t){return Io(t)?t.slice(6,t.length):""},Uo=function(t){return null==t||!1===t};function No(t){for(var e=t.data,o=t,a=t;r(a.componentInstance);)(a=a.componentInstance._vnode)&&a.data&&(e=Wo(a.data,e));for(;r(o=o.parent);)o&&o.data&&(e=Wo(e,o.data));return function(t,e){if(r(t)||r(e))return Ho(t,Go(e));return""}(e.staticClass,e.class)}function Wo(t,e){return{staticClass:Ho(t.staticClass,e.staticClass),class:r(t.class)?[t.class,e.class]:e.class}}function Ho(t,e){return t?e?t+" "+e:t:e||""}function Go(t){return Array.isArray(t)?function(t){for(var e,o="",a=0,n=t.length;a-1?ua(t,e,o):Ro(e)?Uo(o)?t.removeAttribute(e):(o="allowfullscreen"===e&&"EMBED"===t.tagName?"true":e,t.setAttribute(e,o)):Mo(e)?t.setAttribute(e,function(t,e){return Uo(e)||"false"===e?"false":"contenteditable"===t&&Oo(e)?e:"true"}(e,o)):Io(e)?Uo(o)?t.removeAttributeNS(Lo,$o(e)):t.setAttributeNS(Lo,e,o):ua(t,e,o)}function ua(t,e,o){if(Uo(o))t.removeAttribute(e);else{if(z&&!q&&"TEXTAREA"===t.tagName&&"placeholder"===e&&""!==o&&!t.__ieph){var a=function(e){e.stopImmediatePropagation(),t.removeEventListener("input",a)};t.addEventListener("input",a),t.__ieph=!0}t.setAttribute(e,o)}}var da={create:pa,update:pa};function ma(t,e){var o=e.elm,a=e.data,i=t.data;if(!(n(a.staticClass)&&n(a.class)&&(n(i)||n(i.staticClass)&&n(i.class)))){var 
l=No(e),c=o._transitionClasses;r(c)&&(l=Ho(l,Go(c))),l!==o._prevClass&&(o.setAttribute("class",l),o._prevClass=l)}}var ha,fa={create:ma,update:ma};function ya(t,e,o){var a=ha;return function n(){var r=e.apply(null,arguments);null!==r&&wa(t,n,o,a)}}var ba=Zt&&!(X&&Number(X[1])<=53);function va(t,e,o,a){if(ba){var n=io,r=e;e=r._wrapper=function(t){if(t.target===t.currentTarget||t.timeStamp>=n||t.timeStamp<=0||t.target.ownerDocument!==document)return r.apply(this,arguments)}}ha.addEventListener(t,e,et?{capture:o,passive:a}:o)}function wa(t,e,o,a){(a||ha).removeEventListener(t,e._wrapper||e,o)}function ka(t,e){if(!n(t.data.on)||!n(e.data.on)){var o=e.data.on||{},a=t.data.on||{};ha=e.elm,function(t){if(r(t.__r)){var e=z?"change":"input";t[e]=[].concat(t.__r,t[e]||[]),delete t.__r}r(t.__c)&&(t.change=[].concat(t.__c,t.change||[]),delete t.__c)}(o),ie(o,a,va,wa,ya,e.context),ha=void 0}}var Pa,Da={create:ka,update:ka};function Fa(t,e){if(!n(t.data.domProps)||!n(e.data.domProps)){var o,a,i=e.elm,l=t.data.domProps||{},c=e.data.domProps||{};for(o in r(c.__ob__)&&(c=e.data.domProps=_({},c)),l)o in c||(i[o]="");for(o in c){if(a=c[o],"textContent"===o||"innerHTML"===o){if(e.children&&(e.children.length=0),a===l[o])continue;1===i.childNodes.length&&i.removeChild(i.childNodes[0])}if("value"===o&&"PROGRESS"!==i.tagName){i._value=a;var s=n(a)?"":String(a);xa(i,s)&&(i.value=s)}else if("innerHTML"===o&&Jo(i.tagName)&&n(i.innerHTML)){(Pa=Pa||document.createElement("div")).innerHTML=""+a+"";for(var p=Pa.firstChild;i.firstChild;)i.removeChild(i.firstChild);for(;p.firstChild;)i.appendChild(p.firstChild)}else if(a!==l[o])try{i[o]=a}catch(t){}}}}function xa(t,e){return!t.composing&&("OPTION"===t.tagName||function(t,e){var o=!0;try{o=document.activeElement!==t}catch(t){}return o&&t.value!==e}(t,e)||function(t,e){var o=t.value,a=t._vModifiers;if(r(a)){if(a.number)return h(o)!==h(e);if(a.trim)return o.trim()!==e.trim()}return o!==e}(t,e))}var Aa={create:Fa,update:Fa},Ta=k((function(t){var 
e={},o=/:(.+)/;return t.split(/;(?![^(]*\))/g).forEach((function(t){if(t){var a=t.split(o);a.length>1&&(e[a[0].trim()]=a[1].trim())}})),e}));function Ca(t){var e=_a(t.style);return t.staticStyle?_(t.staticStyle,e):e}function _a(t){return Array.isArray(t)?E(t):"string"==typeof t?Ta(t):t}var Ea,Sa=/^--/,ja=/\s*!important$/,Ma=function(t,e,o){if(Sa.test(e))t.style.setProperty(e,o);else if(ja.test(o))t.style.setProperty(A(e),o.replace(ja,""),"important");else{var a=Ra(e);if(Array.isArray(o))for(var n=0,r=o.length;n-1?e.split($a).forEach((function(e){return t.classList.add(e)})):t.classList.add(e);else{var o=" "+(t.getAttribute("class")||"")+" ";o.indexOf(" "+e+" ")<0&&t.setAttribute("class",(o+e).trim())}}function Na(t,e){if(e&&(e=e.trim()))if(t.classList)e.indexOf(" ")>-1?e.split($a).forEach((function(e){return t.classList.remove(e)})):t.classList.remove(e),t.classList.length||t.removeAttribute("class");else{for(var o=" "+(t.getAttribute("class")||"")+" ",a=" "+e+" ";o.indexOf(a)>=0;)o=o.replace(a," ");(o=o.trim())?t.setAttribute("class",o):t.removeAttribute("class")}}function Wa(t){if(t){if("object"==typeof t){var e={};return!1!==t.css&&_(e,Ha(t.name||"v")),_(e,t),e}return"string"==typeof t?Ha(t):void 0}}var Ha=k((function(t){return{enterClass:t+"-enter",enterToClass:t+"-enter-to",enterActiveClass:t+"-enter-active",leaveClass:t+"-leave",leaveToClass:t+"-leave-to",leaveActiveClass:t+"-leave-active"}})),Ga=V&&!q,Ba="transition",Va="transitionend",Ja="animation",Za="animationend";Ga&&(void 0===window.ontransitionend&&void 0!==window.onwebkittransitionend&&(Ba="WebkitTransition",Va="webkitTransitionEnd"),void 0===window.onanimationend&&void 0!==window.onwebkitanimationend&&(Ja="WebkitAnimation",Za="webkitAnimationEnd"));var Ka=V?window.requestAnimationFrame?window.requestAnimationFrame.bind(window):setTimeout:function(t){return t()};function za(t){Ka((function(){Ka(t)}))}function qa(t,e){var 
o=t._transitionClasses||(t._transitionClasses=[]);o.indexOf(e)<0&&(o.push(e),Ua(t,e))}function Ya(t,e){t._transitionClasses&&b(t._transitionClasses,e),Na(t,e)}function Qa(t,e,o){var a=tn(t,e),n=a.type,r=a.timeout,i=a.propCount;if(!n)return o();var l="transition"===n?Va:Za,c=0,s=function(){t.removeEventListener(l,p),o()},p=function(e){e.target===t&&++c>=i&&s()};setTimeout((function(){c0&&(o="transition",p=i,g=r.length):"animation"===e?s>0&&(o="animation",p=s,g=c.length):g=(o=(p=Math.max(i,s))>0?i>s?"transition":"animation":null)?"transition"===o?r.length:c.length:0,{type:o,timeout:p,propCount:g,hasTransform:"transition"===o&&Xa.test(a[Ba+"Property"])}}function en(t,e){for(;t.length1}function cn(t,e){!0!==e.data.show&&an(e)}var sn=function(t){var e,o,a={},c=t.modules,s=t.nodeOps;for(e=0;em?v(t,n(o[y+1])?null:o[y+1].elm,o,d,y,a):d>y&&k(e,u,m)}(u,f,y,o,p):r(y)?(r(t.text)&&s.setTextContent(u,""),v(u,null,y,0,y.length-1,o)):r(f)?k(f,0,f.length-1):r(t.text)&&s.setTextContent(u,""):t.text!==e.text&&s.setTextContent(u,e.text),r(m)&&r(d=m.hook)&&r(d=d.postpatch)&&d(t,e)}}}function x(t,e,o){if(i(o)&&r(t.parent))t.parent.data.pendingInsert=e;else for(var a=0;a-1,i.selected!==r&&(i.selected=r);else if(O(mn(i),a))return void(t.selectedIndex!==l&&(t.selectedIndex=l));n||(t.selectedIndex=-1)}}function dn(t,e){return e.every((function(e){return!O(e,t)}))}function mn(t){return"_value"in t?t._value:t.value}function hn(t){t.target.composing=!0}function fn(t){t.target.composing&&(t.target.composing=!1,yn(t.target,"input"))}function yn(t,e){var o=document.createEvent("HTMLEvents");o.initEvent(e,!0,!0),t.dispatchEvent(o)}function bn(t){return!t.componentInstance||t.data&&t.data.transition?t:bn(t.componentInstance._vnode)}var vn={model:pn,show:{bind:function(t,e,o){var 
a=e.value,n=(o=bn(o)).data&&o.data.transition,r=t.__vOriginalDisplay="none"===t.style.display?"":t.style.display;a&&n?(o.data.show=!0,an(o,(function(){t.style.display=r}))):t.style.display=a?r:"none"},update:function(t,e,o){var a=e.value;!a!=!e.oldValue&&((o=bn(o)).data&&o.data.transition?(o.data.show=!0,a?an(o,(function(){t.style.display=t.__vOriginalDisplay})):nn(o,(function(){t.style.display="none"}))):t.style.display=a?t.__vOriginalDisplay:"none")},unbind:function(t,e,o,a,n){n||(t.style.display=t.__vOriginalDisplay)}}},wn={name:String,appear:Boolean,css:Boolean,mode:String,type:String,enterClass:String,leaveClass:String,enterToClass:String,leaveToClass:String,enterActiveClass:String,leaveActiveClass:String,appearClass:String,appearActiveClass:String,appearToClass:String,duration:[Number,String,Object]};function kn(t){var e=t&&t.componentOptions;return e&&e.Ctor.options.abstract?kn(Be(e.children)):t}function Pn(t){var e={},o=t.$options;for(var a in o.propsData)e[a]=t[a];var n=o._parentListeners;for(var r in n)e[D(r)]=n[r];return e}function Dn(t,e){if(/\d-keep-alive$/.test(e.tag))return t("keep-alive",{props:e.componentOptions.propsData})}var Fn=function(t){return t.tag||Ge(t)},xn=function(t){return"show"===t.name},An={name:"transition",props:wn,abstract:!0,render:function(t){var e=this,o=this.$slots.default;if(o&&(o=o.filter(Fn)).length){0;var a=this.mode;0;var n=o[0];if(function(t){for(;t=t.parent;)if(t.data.transition)return!0}(this.$vnode))return n;var r=kn(n);if(!r)return n;if(this._leaving)return Dn(t,n);var i="__transition-"+this._uid+"-";r.key=null==r.key?r.isComment?i+"comment":i+r.tag:l(r.key)?0===String(r.key).indexOf(i)?r.key:i+r.key:r.key;var c=(r.data||(r.data={})).transition=Pn(this),s=this._vnode,p=kn(s);if(r.data.directives&&r.data.directives.some(xn)&&(r.data.show=!0),p&&p.data&&!function(t,e){return e.key===t.key&&e.tag===t.tag}(r,p)&&!Ge(p)&&(!p.componentInstance||!p.componentInstance._vnode.isComment)){var 
g=p.data.transition=_({},c);if("out-in"===a)return this._leaving=!0,le(g,"afterLeave",(function(){e._leaving=!1,e.$forceUpdate()})),Dn(t,n);if("in-out"===a){if(Ge(r))return s;var u,d=function(){u()};le(c,"afterEnter",d),le(c,"enterCancelled",d),le(g,"delayLeave",(function(t){u=t}))}}return n}}},Tn=_({tag:String,moveClass:String},wn);function Cn(t){t.elm._moveCb&&t.elm._moveCb(),t.elm._enterCb&&t.elm._enterCb()}function _n(t){t.data.newPos=t.elm.getBoundingClientRect()}function En(t){var e=t.data.pos,o=t.data.newPos,a=e.left-o.left,n=e.top-o.top;if(a||n){t.data.moved=!0;var r=t.elm.style;r.transform=r.WebkitTransform="translate("+a+"px,"+n+"px)",r.transitionDuration="0s"}}delete Tn.mode;var Sn={Transition:An,TransitionGroup:{props:Tn,beforeMount:function(){var t=this,e=this._update;this._update=function(o,a){var n=qe(t);t.__patch__(t._vnode,t.kept,!1,!0),t._vnode=t.kept,n(),e.call(t,o,a)}},render:function(t){for(var e=this.tag||this.$vnode.data.tag||"span",o=Object.create(null),a=this.prevChildren=this.children,n=this.$slots.default||[],r=this.children=[],i=Pn(this),l=0;l-1?Ko[t]=e.constructor===window.HTMLUnknownElement||e.constructor===window.HTMLElement:Ko[t]=/HTMLUnknownElement/.test(e.toString())},_(Do.options.directives,vn),_(Do.options.components,Sn),Do.prototype.__patch__=V?sn:S,Do.prototype.$mount=function(t,e){return function(t,e,o){var a;return t.$el=e,t.$options.render||(t.$options.render=ft),Xe(t,"beforeMount"),a=function(){t._update(t._render(),o)},new go(t,a,S,{before:function(){t._isMounted&&!t._isDestroyed&&Xe(t,"beforeUpdate")}},!0),o=!1,null==t.$vnode&&(t._isMounted=!0,Xe(t,"mounted")),t}(this,t=t&&V?function(t){if("string"==typeof t){var e=document.querySelector(t);return e||document.createElement("div")}return t}(t):void 0,e)},V&&setTimeout((function(){U.devtools&&nt&&nt.emit("init",Do)}),0),e.a=Do},function(t,e,o){var a=o(3),n=o(26).f,r=o(14),i=o(16),l=o(82),c=o(116),s=o(79);t.exports=function(t,e){var 
o,p,g,u,d,m=t.target,h=t.global,f=t.stat;if(o=h?a:f?a[m]||l(m,{}):(a[m]||{}).prototype)for(p in e){if(u=e[p],g=t.noTargetGet?(d=n(o,p))&&d.value:o[p],!s(h?p:m+(f?".":"#")+p,t.forced)&&void 0!==g){if(typeof u==typeof g)continue;c(u,g)}(t.sham||g&&g.sham)&&r(u,"sham",!0),i(o,p,u,t)}}},function(t,e){t.exports=function(t){try{return!!t()}catch(t){return!0}}},function(t,e){var o=function(t){return t&&t.Math==Math&&t};t.exports=o("object"==typeof globalThis&&globalThis)||o("object"==typeof window&&window)||o("object"==typeof self&&self)||o("object"==typeof global&&global)||function(){return this}()||Function("return this")()},function(t,e,o){var a=o(3),n=o(53),r=o(9),i=o(54),l=o(84),c=o(112),s=n("wks"),p=a.Symbol,g=c?p:p&&p.withoutSetter||i;t.exports=function(t){return r(s,t)&&(l||"string"==typeof s[t])||(l&&r(p,t)?s[t]=p[t]:s[t]=g("Symbol."+t)),s[t]}},function(t,e){t.exports=function(t){return"object"==typeof t?null!==t:"function"==typeof t}},function(t,e,o){var a=o(5);t.exports=function(t){if(!a(t))throw TypeError(String(t)+" is not an object");return t}},function(t,e,o){var a=o(2);t.exports=!a((function(){return 7!=Object.defineProperty({},1,{get:function(){return 7}})[1]}))},function(t,e,o){var a=o(7),n=o(110),r=o(6),i=o(39),l=Object.defineProperty;e.f=a?l:function(t,e,o){if(r(t),e=i(e,!0),r(o),n)try{return l(t,e,o)}catch(t){}if("get"in o||"set"in o)throw TypeError("Accessors not supported");return"value"in o&&(t[e]=o.value),t}},function(t,e,o){var a=o(10),n={}.hasOwnProperty;t.exports=function(t,e){return n.call(a(t),e)}},function(t,e,o){var a=o(24);t.exports=function(t){return Object(a(t))}},function(t,e,o){var a=o(92),n=o(16),r=o(207);a||n(Object.prototype,"toString",r,{unsafe:!0})},function(t,e,o){"use strict";function a(t,e){if(!(t instanceof e))throw new TypeError("Cannot call a class as a function")}o.d(e,"a",(function(){return a}))},function(t,e,o){"use strict";var a=o(131).charAt,n=o(35),r=o(115),i=n.set,l=n.getterFor("String 
Iterator");r(String,"String",(function(t){i(this,{type:"String Iterator",string:String(t),index:0})}),(function(){var t,e=l(this),o=e.string,n=e.index;return n>=o.length?{value:void 0,done:!0}:(t=a(o,n),e.index+=t.length,{value:t,done:!1})}))},function(t,e,o){var a=o(7),n=o(8),r=o(40);t.exports=a?function(t,e,o){return n.f(t,e,r(1,o))}:function(t,e,o){return t[e]=o,t}},function(t,e,o){var a=o(56),n=Math.min;t.exports=function(t){return t>0?n(a(t),9007199254740991):0}},function(t,e,o){var a=o(3),n=o(14),r=o(9),i=o(82),l=o(88),c=o(35),s=c.get,p=c.enforce,g=String(String).split("String");(t.exports=function(t,e,o,l){var c,s=!!l&&!!l.unsafe,u=!!l&&!!l.enumerable,d=!!l&&!!l.noTargetGet;"function"==typeof o&&("string"!=typeof e||r(o,"name")||n(o,"name",e),(c=p(o)).source||(c.source=g.join("string"==typeof e?e:""))),t!==a?(s?!d&&t[e]&&(u=!0):delete t[e],u?t[e]=o:n(t,e,o)):u?t[e]=o:i(e,o)})(Function.prototype,"toString",(function(){return"function"==typeof this&&s(this).source||l(this)}))},function(t,e,o){var a=o(3),n=o(132),r=o(109),i=o(14),l=o(4),c=l("iterator"),s=l("toStringTag"),p=r.values;for(var g in n){var u=a[g],d=u&&u.prototype;if(d){if(d[c]!==p)try{i(d,c,p)}catch(t){d[c]=p}if(d[s]||i(d,s,g),n[g])for(var m in r)if(d[m]!==r[m])try{i(d,m,r[m])}catch(t){d[m]=r[m]}}}},function(t,e){var o=Array.isArray;t.exports=o},function(t,e,o){var a=o(38),n=o(24);t.exports=function(t){return a(n(t))}},function(t,e,o){var a=o(143),n="object"==typeof self&&self&&self.Object===Object&&self,r=a||n||Function("return this")();t.exports=r},function(t,e,o){"use strict";o.d(e,"a",(function(){return n}));o(162);function a(t,e){for(var o=0;o1?arguments[1]:void 0)}})},function(t,e){t.exports=function(t){return null!=t&&"object"==typeof t}},function(t,e,o){var a,n=o(6),r=o(191),i=o(87),l=o(42),c=o(114),s=o(83),p=o(57),g=p("IE_PROTO"),u=function(){},d=function(t){return" + diff --git a/blog/2016/04/30/publish-geo/index.html b/blog/2016/04/30/publish-geo/index.html index 3dd294c9c..c01c42c17 
100644 --- a/blog/2016/04/30/publish-geo/index.html +++ b/blog/2016/04/30/publish-geo/index.html @@ -33,7 +33,7 @@ - + @@ -106,6 +106,6 @@ (opens new window)

Publishing Geospatial Data as a Data Package

Price icons created by Pixel perfect - Flaticon

Publishing your Geodata as Data Packages is very easy.

You have two options for publishing your geodata:

  • Geo Data Package (Recommended). This is a basic Data Package with the requirement that data be in GeoJSON and with a few special additions to the metadata for geodata. See the next section for instructions on how to do this.
  • Generic Data Package. This allows you to publish geodata in any kind of format (KML, Shapefiles, Spatialite etc). If you choose this option you will want to follow the standard instructions for packaging any kind of data as a Data Package.

We recommend Geo Data Package if that is possible as it makes it much easier for you to use 3rd party tools with your Data Package. For example, the datapackage viewer (opens new window) on this site will automatically preview a Geo Data Package.
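As a rough illustration, the descriptor for a Geo Data Package could be produced with nothing but Python's standard library. This is a minimal sketch: the package name, the path `data/traffic-signs.geojson`, and the helper `build_geo_descriptor` are hypothetical, and the geodata-specific metadata additions mentioned above are left out because they are not spelled out on this page.

```python
import json


def build_geo_descriptor(name, geojson_path):
    """Return a minimal Data Package descriptor with one GeoJSON resource.

    Hypothetical helper for illustration; not part of any Frictionless library.
    """
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": geojson_path,  # relative to the datapackage.json file
                "format": "geojson",
            }
        ],
    }


# Assumed example values; replace with your own package name and file path.
descriptor = build_geo_descriptor("traffic-signs", "data/traffic-signs.geojson")
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Saving the printed JSON as `datapackage.json` next to the `data` folder would give a package that tools can locate and read.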

TIP

Note: this document focuses on vector geodata – i.e. points, lines, polygons etc. (not imagery or raster data).

# Geo Data Packages

# Examples

# Traffic signs of Hansbeke, Belgium (opens new window)

An example of using point geometries with described properties in a real-world situation.

View it with the Data Package Viewer (opens new window) (deprecated)

# GeoJSON example on DataHub (opens new window)

# See more Geo Data Packages in the example data packages (opens new window) GitHub repository.

TIP

Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our Introduction.

- + diff --git a/blog/2016/07/21/creating-tabular-data-packages-in-python/index.html b/blog/2016/07/21/creating-tabular-data-packages-in-python/index.html index 9bd81b9ae..b7639136a 100644 --- a/blog/2016/07/21/creating-tabular-data-packages-in-python/index.html +++ b/blog/2016/07/21/creating-tabular-data-packages-in-python/index.html @@ -33,7 +33,7 @@ - + @@ -161,6 +161,6 @@ 'title': 'Periodic Table' }

# Publishing

Now that you have created your Data Package, you might want to publish your data online so that you can share it with others.

- + diff --git a/blog/2016/07/21/publish-any/index.html b/blog/2016/07/21/publish-any/index.html index caba05455..188faaf81 100644 --- a/blog/2016/07/21/publish-any/index.html +++ b/blog/2016/07/21/publish-any/index.html @@ -33,7 +33,7 @@ - + @@ -106,6 +106,6 @@ (opens new window)

Publish Any Kind of Data as a Data Package

Price icons created by Pixel perfect - Flaticon

You can publish any kind of data as a Data Package. It’s as simple as 1-2-3:

  1. Get your data together
  2. Add a datapackage.json file to wrap those data files up into a useful whole (with key information like the license, title and format)
  3. [optional] Share it with others, for example, by uploading the data package online

# 1. Get your data together

Get your data together in one folder (you can have data in subfolders of that folder too, if you wish).

# 2. Add a datapackage.json file

The datapackage.json is a small file in JSON (opens new window) format that describes your dataset. You’ll need to create this file and then place it in the directory you created.

Don’t worry if you don’t know what JSON is - we provide some tools such as Data Package Creator (opens new window) that can automatically create this file for you.

There are 2 options for creating the datapackage.json:

Option 1: Use the online datapackage.json creator tool (opens new window) - just answer a few questions and give it your data files and it will spit out a datapackage.json for you to include in your project

Option 2: Do it yourself - if you’re familiar with JSON you can create this yourself. Take a look at the Data Package Specification (opens new window).
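For the do-it-yourself route, the file can also be generated with a few lines of Python. The sketch below is one possible approach, not an official tool: `write_descriptor` is a hypothetical helper, and the package name, title, license identifier, and file names are placeholder values.

```python
import json
import tempfile
from pathlib import Path


def write_descriptor(folder, name, title, license_name, data_files):
    """Write a minimal datapackage.json into `folder` and return its path."""
    descriptor = {
        "name": name,
        "title": title,
        "licenses": [{"name": license_name}],
        "resources": [
            {
                "name": Path(f).stem.lower(),
                "path": f,  # kept relative to the package folder
                "format": Path(f).suffix.lstrip("."),
            }
            for f in data_files
        ],
    }
    target = Path(folder) / "datapackage.json"
    target.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return target


# A temporary folder stands in for your real data folder.
with tempfile.TemporaryDirectory() as folder:
    path = write_descriptor(
        folder,
        "my-dataset",
        "My Dataset",
        "CC0-1.0",
        ["data/report.pdf", "data/points.kml"],
    )
    saved = json.loads(path.read_text(encoding="utf-8"))
```

The descriptor only lists the files; the data files themselves still have to be placed at the paths it names.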

# 3. Put the data package online

See the step-by-step instructions for putting your Data Package online.

TIP

Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our Introduction.

- + diff --git a/blog/2016/07/21/publish-tabular/index.html b/blog/2016/07/21/publish-tabular/index.html index 7be7dfe93..11dad20e4 100644 --- a/blog/2016/07/21/publish-tabular/index.html +++ b/blog/2016/07/21/publish-tabular/index.html @@ -33,7 +33,7 @@ - + @@ -106,6 +106,6 @@ (opens new window)

Publish Tabular Data as a Data Package

Price icons created by Pixel perfect - Flaticon

Here’s how to publish your tabular data as Tabular Data Packages (opens new window). There are 4 steps:

  1. Create a folder (directory) - this folder will hold your “data package”
  2. Put your data into comma-separated values files (CSV) and add them to that folder
  3. Add a datapackage.json file to hold some information about the data package and the data in it e.g. a title, who created it, how other people can use it (licensing), etc
  4. Upload the data package online

# 1. Create a Directory (Folder)

# 2. Create your CSV files

CSV is a common file format for storing a (single) table of data (for example, a single sheet in a spreadsheet). If you’ve got more than one table you can save multiple CSV files, one for each table.

Put the CSV files in the directory you created – we suggest putting them in a subdirectory called data so that your base directory does not get too cluttered up.

You can produce CSV files from almost any application that handles data, including spreadsheets like Excel and databases like MySQL or PostgreSQL.
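If your data already lives in a script, writing a CSV into the suggested `data` subdirectory takes only the standard library. In this sketch the helper `write_table`, the table name, and the rows are illustrative assumptions.

```python
import csv
import tempfile
from pathlib import Path


def write_table(package_dir, table_name, header, rows):
    """Write one table as <package_dir>/data/<table_name>.csv and return the path."""
    data_dir = Path(package_dir) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    csv_path = data_dir / f"{table_name}.csv"
    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return csv_path


# A temporary folder stands in for the data package directory.
with tempfile.TemporaryDirectory() as package_dir:
    path = write_table(
        package_dir,
        "elements",
        ["symbol", "name"],
        [["H", "Hydrogen"], ["He", "Helium"]],
    )
    relative_path = path.relative_to(package_dir).as_posix()
    content = path.read_text(encoding="utf-8")
```

`relative_path` is the value you would later use as the resource `path` in the datapackage.json.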

You can find out more about CSVs and how to produce them in our guide to CSV or by doing a quick search online for CSV + the name of your tool.

# 3. Add a datapackage.json file

The datapackage.json is a small file in JSON (opens new window) format that gives a bit of information about your dataset. You’ll need to create this file and then place it in the directory you created.

Don’t worry if you don’t know what JSON is - we provide some tools that can automatically create this file for you.

There are three options for creating the datapackage.json:

Option 1: Use the online datapackage.json creator tool (opens new window) - answer a few questions and give it your data files and it will spit out a datapackage.json for you to include in your project.

Option 2: Do it yourself - if you’re familiar with JSON you can create this yourself. Take a look at the Data Package (opens new window) and Tabular Data Format (opens new window) specifications.

Option 3: Use the Python, JavaScript, PHP, Julia, R, Clojure, Java, Ruby or Go libraries for working with data packages.
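To give a feel for what the finished file contains, here is a minimal sketch that derives a resource entry from a CSV header using only the standard library rather than one of the libraries listed above. The helper `describe_csv` and the sample columns are hypothetical, and typing every column as `string` is a placeholder you would refine by hand.

```python
import csv
import io
import json


def describe_csv(resource_name, path, csv_text):
    """Build a resource entry whose schema lists every column as a string field."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "name": resource_name,
        "path": path,
        "format": "csv",
        "schema": {"fields": [{"name": col, "type": "string"} for col in header]},
    }


# Sample CSV content, assumed for illustration.
sample = "symbol,name,atomic_number\nH,Hydrogen,1\n"
descriptor = {
    "name": "periodic-table",
    "title": "Periodic Table",
    "resources": [describe_csv("elements", "data/elements.csv", sample)],
}
print(json.dumps(descriptor, indent=2))
```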

# 4. Put the data package online

See Putting Your Data Package online


# Appendix: Examples of Tabular Data Packages

Pay special attention to the scripts directory (and look at the commit logs!)

TIP

Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our introduction.

- + diff --git a/blog/2016/08/29/publish-online/index.html b/blog/2016/08/29/publish-online/index.html index 1ae0c2547..6555bf948 100644 --- a/blog/2016/08/29/publish-online/index.html +++ b/blog/2016/08/29/publish-online/index.html @@ -30,7 +30,7 @@ - + @@ -119,6 +119,6 @@ http://your.website.com/mydatapackage/data.csv http://your.website.com/mydatapackage/somedir/other-data.csv

This can be a problem with services such as Google Drive, where files in a given folder don’t have a web address that relates to that folder. We need to preserve relative paths because Data Package client software computes the full path of each file from the location of the datapackage.json itself plus the relative path given for that file in the resources section of the datapackage.json.
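The path computation described here can be sketched with `urllib.parse.urljoin` from Python's standard library. The descriptor URL is assumed to sit in the same folder as the example files above; real client software may resolve paths differently.

```python
from urllib.parse import urljoin

# Assumed location of the descriptor, matching the example URLs above.
descriptor_url = "http://your.website.com/mydatapackage/datapackage.json"

# Relative paths as they would appear in the resources section.
resource_paths = ["data.csv", "somedir/other-data.csv"]

# Each relative path is resolved against the folder holding datapackage.json.
resolved = [urljoin(descriptor_url, p) for p in resource_paths]
for url in resolved:
    print(url)
```

If the hosting service gives each file an unrelated address, this resolution step produces URLs that do not exist, which is why the folder layout has to survive the upload.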

Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our new and comprehensive Frictionless Data Field Guide.

- + diff --git a/blog/2016/08/29/using-data-packages-in-python/index.html b/blog/2016/08/29/using-data-packages-in-python/index.html index 102c0da22..5900c1315 100644 --- a/blog/2016/08/29/using-data-packages-in-python/index.html +++ b/blog/2016/08/29/using-data-packages-in-python/index.html @@ -33,7 +33,7 @@ - + @@ -167,6 +167,6 @@ # check if data has been saved successfully list(engine.execute('SELECT * from data'))
- + diff --git a/blog/2016/08/30/publish/index.html b/blog/2016/08/30/publish/index.html index bb1705415..5a59ca07b 100644 --- a/blog/2016/08/30/publish/index.html +++ b/blog/2016/08/30/publish/index.html @@ -33,7 +33,7 @@ - + @@ -106,6 +106,6 @@ (opens new window)

Publish Data as Data Packages - Overview

Price icons created by Pixel perfect - Flaticon

You can publish any kind of data as a Data Package.

Making existing data into a Data Package is very straightforward. Once you have packaged up your data, you can make it available for others by putting it online or sending an email.

# I want to package up and publish data that is …

# Tabular

Rows and columns like in a spreadsheet? It’s tabular …

Here’s a tutorial on publishing tabular data

# Geospatial

Map or location related? It’s geospatial …

Here’s a tutorial on publishing geodata

# Any Kind

Any kind of data you have - graph, binary, RDF …

Here’s a tutorial on publishing other types of data

TIP

Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our introduction.

- + diff --git a/blog/2016/11/15/dataship/index.html b/blog/2016/11/15/dataship/index.html index 9b916bdc5..96752ccdc 100644 --- a/blog/2016/11/15/dataship/index.html +++ b/blog/2016/11/15/dataship/index.html @@ -38,7 +38,7 @@ - + @@ -119,6 +119,6 @@ Data CLI

Dataship (opens new window) is a way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free. It allows you to create notebooks that hold and deliver your data, as well as text, images and inline scripts for doing analysis and visualization. The people you share it with can read, execute and even edit a copy of your notebook and publish the remixed version as a fork.

One of the main challenges we face with data is that it’s hard to share it with others. Tools like Jupyter (iPython notebook)[1] make it much easier and more affordable to do analysis (with the help of open source projects like numpy[2] and pandas[3]). What they don’t do is allow you to cheaply and easily share that with the world. If it were as easy to share data and analysis as it is to share pictures of your breakfast, the world would be a more enlightened place. Dataship is helping to build that world.

Every notebook on Dataship is also a Data Package[4]. Like other Data Packages it can be downloaded, along with its data, just by giving its URL to software like data-cli[5]. Additionally, working with existing Data Packages is easy. Just as you can fork other notebooks, you can also fork existing Data Packages, even when they’re located somewhere else, like GitHub.

Dataship GIF
Dataship in action

Every cell in a notebook is represented by a resource entry[6] in an underlying Data Package. This also allows for interesting possibilities. One of these is executable Data Packages. Since the code is included inline and its dependencies are explicit and bounded, very simple software could be written to execute a Data Package-based notebook from the command line, printing the results to the console and writing images to the current directory.

It would be useful to have a JavaScript version of some of the functionality in goodtables[7] available for use, specifically header detection in parsed csv contents (output of PapaParse), as well as an option in dpm to not put things in a ‘datapackages’ folder, as I rarely need this when downloading a dataset.

dpm, mentioned above, is now deprecated. Check out DataHub’s data-cli (opens new window)

My next task will be building and integrating the machine learning and neural network components into Dataship. After that I’ll be focusing on features that allow organizations to store private encrypted data, in addition to the default public storage. The focus of the platform will always be open data, but hosting closed data sources will allow us to nudge people towards sharing, when it makes sense.

As for additional use cases, the volume of personal data is growing exponentially, from medical data to internet activity and media consumption. These are just a few existing examples. The rise of the Internet of Things will only accelerate this. People are also beginning to see the value in controlling their data themselves. Providing mechanisms for doing this will likely become important over the next ten years.


  1. Jupyter Notebook: http://jupyter.org/ (opens new window) ↩︎

  2. NumPy: Python package for scientific computing: http://www.numpy.org (opens new window) ↩︎

  3. Pandas: Python package for data analysis: http://pandas.pydata.org/ (opens new window) ↩︎

  4. Data Packages: ↩︎

  5. DataHub’s data commandline tool: https://github.com/datahq/data-cli (opens new window) ↩︎

  6. Data Package Resource: https://specs.frictionlessdata.io/data-package/#resource-information (opens new window) ↩︎

  7. goodtables: http://try.goodtables.io (opens new window) ↩︎

- + diff --git a/blog/2016/11/15/open-power-system-data/index.html b/blog/2016/11/15/open-power-system-data/index.html index d635272dd..fa306958e 100644 --- a/blog/2016/11/15/open-power-system-data/index.html +++ b/blog/2016/11/15/open-power-system-data/index.html @@ -38,7 +38,7 @@ - + @@ -114,6 +114,6 @@ case-studies

Open Power System Data (opens new window) aims at providing a free-of-charge and open platform[1] that provides the data needed for power system analysis and modeling.

All of our project members are energy researchers. We struggled to collect this kind of data in what is typically a very burdensome and lengthy process. In doing my PhD, I spent the first year collecting data and realized that not only had many others done that before, but that many others coming later would have to do it again. This is arguably a huge waste of time and resources, so we thought we (Open Power System Data) should align ourselves and join forces to do this properly, once and for all, and in a free and open manner to be used by everyone. We are funded for two years by the German government. After starting work in 2015, we have about one more year to go.

On one hand, people who are interested in European power systems are lucky because a lot of data needed for that research is available. If you work on, say, Chinese systems, and you are not employed at the Chinese power company, you probably won’t find anything. On the other hand, if you search long enough (and you know where to look), you can find stuff online (and usually free of charge) on European power systems: not everything you want, but a big chunk, so in that respect, we are all lucky. However, this data is quite problematic for many reasons.

Available Data (opens new window)

Data availability overview on the platform

Some of the problems we face in working with data include:

  • varied data sources and formats
  • licensing issues
  • ‘dirty’ data

# Inconsistent Data Sources and formats

First, it is scattered throughout the Internet and very hard to Google. For example, the Spanish government will only publish their data in the Spanish language, while the German government will publish only in German, so you need to speak 20 languages if we are talking about Europe. Second, it is often of low quality. For instance, we work a lot with time series data, that is, hourly data for electricity generation and consumption. Twice a year, during the shift between summer and winter, there is an “extra” or a “missing” hour to account for daylight saving time. Every single data source has a different approach to handling that. Some datasets just ignore it, some double the hours, and others call the repeated hour something like “3a” and “3b”. To align these data sources, you have to handle all these different approaches. In addition, some data providers publish data in one format for the years 2010 and 2011, in a different format for 2012 and 2013, and in yet another format for 2014 and 2015. A lot of that data comes in little chunks: some datasets have one file for everything (which is great), but others provide files split by the year, the month, or even the day. If you are not familiar with programming, you can’t write scripts to download that, and you have to manually download three years of daily data files: thousands of files. Worse, these files come in different formats: some companies and agencies provide CSV files, others Excel files, and still others provide formats which are not very broadly used (e.g. XML and NetCDF).
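The “extra” and “missing” hours can be made concrete with Python's `zoneinfo` module (Python 3.9+, with a time zone database available on the system). This is a minimal sketch of one common way to align such series, converting local timestamps to UTC; it is not necessarily how the project handles it, and the `Europe/Berlin` zone and the 2015 dates are chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

BERLIN = ZoneInfo("Europe/Berlin")


def hours_in_local_day(year, month, day, tz=BERLIN):
    """Count the elapsed hours between two consecutive local midnights."""
    start = datetime(year, month, day, tzinfo=tz)
    end = start + timedelta(days=1)  # wall-clock arithmetic: next local midnight
    # Convert to UTC before subtracting; datetimes that share one tzinfo
    # object are subtracted as wall-clock times, which would always give 24.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return int(elapsed.total_seconds() // 3600)


spring = hours_in_local_day(2015, 3, 29)   # clocks go forward
autumn = hours_in_local_day(2015, 10, 25)  # clocks go back
normal = hours_in_local_day(2015, 6, 1)

# The repeated local time 02:30 maps to two distinct UTC instants.
first = datetime(2015, 10, 25, 2, 30, tzinfo=BERLIN, fold=0).astimezone(timezone.utc)
second = datetime(2015, 10, 25, 2, 30, tzinfo=BERLIN, fold=1).astimezone(timezone.utc)
print(spring, autumn, normal, first.isoformat(), second.isoformat())
```

Storing every reading with a UTC timestamp sidesteps the “3a”/“3b” problem, because each hour then has exactly one label.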

# Licensing Questions

And maybe least known, but really tricky for us, is the fact that all those data are subject to copyright. These data are open in the sense that they are on the Internet to be accessed freely, but they are not open in the legal sense; you are not allowed to use them or republish them or share them with others. If you look at the terms of use that you agree to in order to download, they will usually say that all those data are subject to copyright and you are not allowed to do anything with them, essentially.

Available Data

This last fact is somewhat surprising. Mostly, the belief is that if something is free online then it’s “Open” but legally that, of course, doesn’t say anything; just because something is on YouTube and you can access that for free, that doesn’t mean you can copy, resample, and sell it to someone. And the same is true for data. So, in the project, we are trying to convince these owners and suppliers of data to change their terms of use, provide good licenses, publish data under an open license, preferably, something like Creative Commons[2] or the ODbL[3], or something else that people from the open world use. That’s a very burdensome process; we just talked to four German transmission system operators and it took us a full year of meetings and emails to convince them. They finally signed on to open licensing last month.

# ‘Dirty’ data aka the devil in the details

Some of the most annoying problems are not the major problems, but all these surprising minor problems. As I mentioned earlier, I work a lot with time series data and there are so many weird mistakes, errors, or random facts in the data. For example, we have one source where every day, the 24th hour of the day is simply missing so the days only have 23 hours. Another weird phenomenon is that another data source, a huge data source that publishes a lot, only starts the year aligned on weeks, so if the first Monday falls on January 4th, they might miss the first three days of the year. If you want to model energy consumption for a year, you can’t use the data at all because those first days are missing. Nitty-gritty, nasty stuff like this makes the work really burdensome at this scale: you have to find these errors while looking at hundreds of thousands of data points, which is, of course, not something you can easily do manually.
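Checks of this kind are easy to automate. The following sketch lists every hourly timestamp that is absent from a series within an expected period; the function name `find_missing_hours` and the synthetic readings are assumptions made for illustration, not the project's actual processing code.

```python
from datetime import datetime, timedelta


def find_missing_hours(timestamps, start, end):
    """Return each hourly timestamp in [start, end) that is not in `timestamps`."""
    present = set(timestamps)
    missing = []
    current = start
    while current < end:
        if current not in present:
            missing.append(current)
        current += timedelta(hours=1)
    return missing


# Synthetic series: two days of hourly readings with the 24th hour dropped each day.
readings = [datetime(2015, 1, day, hour) for day in (1, 2) for hour in range(23)]

gaps = find_missing_hours(
    readings,
    start=datetime(2015, 1, 1),
    end=datetime(2015, 1, 3),
)
for gap in gaps:
    print("missing:", gap.isoformat())
```

Passing the expected start and end explicitly matters: a series that silently begins a few days late has no internal gap, so the missing days only show up when compared against the period it is supposed to cover.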

Our target users are researchers, economists, or engineers interested in energy; they are mostly familiar with Excel, or some statistical software like R, SPSS, or STATA but they are not programmers or data scientists. As a result, they are not experts in data handling and not trained in detecting errors, missing data, and correct interpolation. If you know where to look to find gaps in your data, this is quickly done. However, if you are doing this kind of data wrangling for the first time (and you don’t really want to do it, but rather you want to learn something about solar power in Switzerland) then this is, of course, a long detour for a lot of our users.

We collect time series data for renewable and thermal power plants, each of which we compile into a dataset that follows the specification for a Tabular Data Package[4], consisting of a datapackage.json file for metadata and a CSV file containing the actual data. On top of this we include the same data in Excel format and also some differently structured CSV files to suit the needs of different user types. We also implemented a framework that parses the content of the datapackage.json and renders it into a more human-readable form for our website.

Where the data in each column is homogeneous in terms of the original source, as is the case with time series data, the datapackage.json file is used to document the sources per column.
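A hedged sketch of how per-column sources might be recorded and then rendered for a website follows. The `source` key on each field is a hypothetical custom property, and the column names and provider label are placeholders; the actual structure used on the platform is not shown here.

```python
def render_fields(descriptor):
    """Render each resource's fields as plain-text lines with their source."""
    lines = []
    for resource in descriptor["resources"]:
        lines.append(f"Resource: {resource['name']}")
        for field in resource["schema"]["fields"]:
            # `source` is an assumed custom property, not a standard schema key.
            source = field.get("source", "unknown source")
            lines.append(f"  {field['name']} ({field['type']}): {source}")
    return "\n".join(lines)


# Placeholder descriptor; names and sources are invented for illustration.
descriptor = {
    "name": "time-series",
    "resources": [
        {
            "name": "time-series",
            "path": "time_series.csv",
            "schema": {
                "fields": [
                    {"name": "utc_timestamp", "type": "datetime"},
                    {"name": "load_de", "type": "number", "source": "Example provider A"},
                ]
            },
        }
    ],
}

rendered = render_fields(descriptor)
print(rendered)
```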

We started this project only knowing what we wanted to do in vague terms, but with very little understanding of how to go about it, so we weren’t clear at all about how to publish this data. The first idea that we had was to build a database without any of us knowing what a database actually was.

Step-by-step, we realized we would like to offer a full “package” of all data that users can download in one click and have everything they need on their hard drive. Sort of a full model input package of everything a researcher would like with the option to just delete (or simply ignore) the data that is not useful.

We had a first workshop[5] with potential users, and I think one of us, maybe it was Ingmar, Googled you and found out about the Data Package specification (opens new window). That it perfectly fit our needs was pretty evident within a few minutes, and we decided to go along with this.

A lot of our clients are practitioners that use Microsoft Excel as a standard tool. If I look at a data source, and I open a well structured Excel sheet with colors and (visually) well structured tables, it makes it a lot easier for me to get a first glimpse of the data and an insight as to what’s in there, what’s not in there, its quality, how well it is documented, and so on. So the one difficulty I see from a user perspective with the Data Package specification (at least, in the way we use it) is that CSV and JSON files take more than one click in a browser to get a human-readable, easily understandable, picture of the data.

The stuff that is convenient for humans to structure content—colors, headlines, bolding, the right number of decimals, different types of data sorted by blocks, with visual spaces in between; this stuff makes a table aesthetically convenient to read, but is totally unnecessary for being machine-readable. The number one priority for us is to have the data in a format that’s machine-readable and my view is that Frictionless Data/Data Packages are perfect for this. But from the have-a-first-glimpse-at-the-data-as-a-human perspective, having a nice colored Excel table, from my personal point of view, is still preferable. We have decided in the end just to provide both. We publish everything as a Data Package and on top of that we also publish the data in an Excel file for those who prefer it. On top of that we publish everything in an SQLite database for our clients and users who would like it in an SQL database.
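Producing the SQLite copy from the CSV can be sketched with the standard library alone. The helper `load_csv_into_sqlite`, the table name `data`, and the sample values are assumptions; a real pipeline would also set column types from the datapackage.json schema instead of storing everything as text.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(connection, table, csv_text):
    """Create `table` from the CSV header and insert every data row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Table and column names are interpolated, so use trusted names only.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f'CREATE TABLE "{table}" ({columns})')
    connection.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    connection.commit()


# Synthetic sample data for illustration.
sample = (
    "utc_timestamp,load_mw\n"
    "2015-01-01T00:00:00Z,41.2\n"
    "2015-01-01T01:00:00Z,39.8\n"
)

connection = sqlite3.connect(":memory:")
load_csv_into_sqlite(connection, "data", sample)
rows = connection.execute('SELECT * FROM "data"').fetchall()
print(rows)
```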

We also think there is potential to expand on the Data Package Viewer (opens new window) tool provided by Open Knowledge International. In its current state, we cannot really use it, because it hangs on the big datasets we’re working with. So mainly, I would imagine that for large datasets, the Data Package Viewer should not try to show and visualize all data but just, for example, show a summary. Furthermore, it would be nice if it also offered possibilities to filter the datasets for downloading of subsets. The filter criteria could be specified as part of the datapackage.json.

The old data package viewer, referenced above, is now deprecated. The new data package viewer, available on create.frictionlessdata.io (opens new window), addresses the issues raised above.

Generally I think such an online Data Package viewer could be made more and more feature-rich as you go. It could, for example, also offer possibilities to download the data in alternative formats such as Excel or SQLite, which would be generated by the Data Package viewer automatically on the server-side (of course, the data would then need to be cached on the server side).

Advantages I see from those things are:

  • Ease of use for data providers: Just provide the CSV with a proper description of all fields in the datapackage.json, and everything else is taken care of by the online Data Package viewer.
  • Ease of use for data consumers: They get what they want (filtered) in the format they prefer.
  • Implicitly, that would also do a proper validation of the datapackage.json: if you have an error there, then things will also be messed up in the automatically generated files. So that also ensures good datapackage.json metadata quality in general, which is important for all sorts of things you can do with Data Packages.

Regarding the data processing workflow we created, I would refer you to our processing scripts[6] on GitHub. I talked a lot about time series data – this should give you an overview (opens new window); here are the processing details (opens new window).

In the coming days, we are going to extend the geographic scope and other various details—user friendliness, interpolation, data quality issues—so no big changes, just further work in the same direction.


  1. Data Platform: http://data.open-power-system-data.org/ (opens new window) ↩︎

  2. https://creativecommons.org/ (opens new window) ↩︎

  3. http://opendatacommons.org/licenses/odbl/ (opens new window) ↩︎

  4. Tabular Data Package specifications: https://specs.frictionlessdata.io/tabular-data-package/ (opens new window) ↩︎

  5. First Workshop of Open Power System Data: http://open-power-system-data.org/workshop-1/ (opens new window) ↩︎

  6. GitHub repository: https://github.com/Open-Power-System-Data (opens new window) ↩︎

- + diff --git a/blog/2016/11/15/tesera/index.html b/blog/2016/11/15/tesera/index.html index bc41c466d..3fc447d5b 100644 --- a/blog/2016/11/15/tesera/index.html +++ b/blog/2016/11/15/tesera/index.html @@ -38,7 +38,7 @@ - + @@ -135,6 +135,6 @@ END; $$ LANGUAGE plpgsql;

Again the user is presented with violations as errors or warnings and they can choose to commit the plots without errors into the shared database. Essentially, this three-step workflow, from imported to staged to committed, allows FGroW to ensure quality data that will be useful for their modeling and analysis purposes.

FGroW has built a database that currently has 2400 permanent sample plots, each containing many trees, and altogether tens of millions of measurements across a wide variety of strata, including various natural regions and natural sub-regions. This database provides the numeric power to produce and refine better growth models and enables companies to adapt their planning and management to real conditions.

There are many cases where industries might wish to bring together measurement data in a consistent way to maximize their productivity. One of the more obvious examples is in agriculture, where precision information is increasingly collected at the local or individual farm level, but bringing this information together in aggregate would produce new and greater insight with regard to productivity, broad scale change, and perhaps climate change adaptation strategies.

# 2. Mackenzie DataStream

http://www.mackenziedatastream.org/ (opens new window)

Mackenzie DataStream App

Mackenzie DataStream (opens new window) is an open access platform for exploring and sharing water data in the Mackenzie River Basin. DataStream’s mission is to promote knowledge sharing and advance collaborative and evidence-based decision making throughout the Basin. The Mackenzie River Basin is extremely large, measuring 1.8 million square kilometers and as such monitoring is a large challenge. To overcome this challenge, water quality monitoring is carried out by a variety of partners which include communities and Aboriginal, territorial, and federal governments. With multiple parties collecting and sharing information, Mackenzie DataStream had to overcome challenges of trust and interoperability.

The Mackenzie River Basin

Tesera leveraged the Data Package standard as an easy way for Government and community partners alike to import data into the system. We used Table Schema to define the structure and constraints of the Data Themes which we represented in a simple visible way.
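As a rough illustration of how Table Schema constraints can drive import checks, here is a minimal sketch in plain Python. The schema, field names, and limits are hypothetical and are not taken from DataStream; only the `required`, `minimum`, and `maximum` constraint names come from the Table Schema specification. The `check_row` helper is an assumption made for this example, not part of any Frictionless library.

```python
# Hypothetical Table Schema for a water-quality data theme; the field
# names and limits are illustrative, not taken from DataStream.
schema = {
    "fields": [
        {"name": "site_id", "type": "string", "constraints": {"required": True}},
        {"name": "ph", "type": "number", "constraints": {"minimum": 0, "maximum": 14}},
    ]
}


def check_row(row, schema):
    """Return a list of constraint violations for one row (a dict of strings)."""
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        constraints = field.get("constraints", {})
        value = row.get(name, "")
        if value == "":
            # Empty cells only violate the schema when the field is required.
            if constraints.get("required"):
                errors.append(f"{name}: value is required")
            continue
        if field["type"] == "number":
            try:
                number = float(value)
            except ValueError:
                errors.append(f"{name}: {value!r} is not a number")
                continue
            if "minimum" in constraints and number < constraints["minimum"]:
                errors.append(f"{name}: {number} is below the minimum {constraints['minimum']}")
            if "maximum" in constraints and number > constraints["maximum"]:
                errors.append(f"{name}: {number} is above the maximum {constraints['maximum']}")
    return errors


print(check_row({"site_id": "MRB-01", "ph": "7.2"}, schema))
print(check_row({"site_id": "", "ph": "15"}, schema))
```

Because the rules live in the schema rather than in the code, the same descriptor could plausibly feed both a check like this and a visible list of field rules for contributors.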

Table fields and validation rules derived from Table Schema

The backend on this system also relies on the Data Package Validator and the Relational Database Loader. The observation data is then exposed to the client via a simple Express.js (opens new window) API as JSON. The Frictionless Data specifications help us ensure clean, consistent data and make visualization a breeze. We push the data to Plotly (opens new window) to build the charts as it provides lots of options for scientific plotting, as well as a good API, at a minimal cost.

Mackenzie DataStream visualization example

The Mackenzie DataStream is gaining momentum and partners. The Fort Nelson First Nation (opens new window) has joined as a contributing partner, and the Government of Northwest Territories (opens new window) is looking to apply DataStream to a few other data types and to bring on some additional partners in water permitting and cumulative effects monitoring. We think of this as a simple and effective way to make environmental monitoring data more accessible.

Mackenzie DataStream environmental observation data

There are many ways to monitor the environment, but bringing the data together according to standards, ensuring that it is loaded correctly, and making it accessible via a simple API seems pretty universal. We are working through a UX/UI overhaul and then hope to open source the entire DataStream application for other organizations that are collecting environmental observation data and looking to increase its utility to citizens, scientists, and consultants alike.

Mackenzie DataStream summary statistics


  1. Data Packages: https://specs.frictionlessdata.io/data-package/ (opens new window) ↩︎

  2. Table Schema: https://specs.frictionlessdata.io/table-schema/ (opens new window) ↩︎

  3. Amazon Simple Storage Service (Amazon S3): https://aws.amazon.com/s3/ (opens new window) ↩︎

  4. Amazon DynamoDB: https://aws.amazon.com/dynamodb/ (opens new window) ↩︎

  5. Elastic Search: https://www.elastic.co/products/elasticsearch (opens new window) ↩︎

  6. Table Schema Field Constraints: https://specs.frictionlessdata.io/table-schema/#constraints (opens new window) ↩︎

  7. Amazon AWS Lambda: https://aws.amazon.com/lambda/ (opens new window) ↩︎

  8. Water and Environmental Hub: http://watercanada.net/2013/ (opens new window) ↩︎

  9. Amazon EC2: Virtual Server Hosting: https://aws.amazon.com/ec2/ (opens new window) ↩︎

  10. Kibana: https://www.elastic.co/products/kibana (opens new window) ↩︎

- + diff --git a/blog/2017/03/28/john-snow-labs/index.html b/blog/2017/03/28/john-snow-labs/index.html index 2750b1c93..630c3e63c 100644 --- a/blog/2017/03/28/john-snow-labs/index.html +++ b/blog/2017/03/28/john-snow-labs/index.html @@ -38,7 +38,7 @@ - + @@ -114,6 +114,6 @@ case-studies

John Snow Labs (opens new window) accelerates data science and analytics teams, by providing clean, rich and current data sets for analysis. Our customers typically license between 50 and 500 data sets for a given project, so providing both data and metadata in a simple, standard format that is easily usable with a wide range of tools is important.

Each data set we license is curated by a domain expert and then goes through both an automated DataOps platform and a manual review process. This is done in order to deal with a string of data challenges. First, it’s often hard to find the right data sets for a given problem. Second, data files come in different formats, and include dirty and missing data. Data types are inconsistent across different files, making it hard to join multiple data sets in one analysis. Null values, dates, currencies, units and identifiers are represented differently. Datasets aren’t updated on a standard or public schedule, which often requires manual labor to know when they’ve been updated. And then, data sets from different sources have different licenses - we use over 100 data sources, which means well over 100 different data licenses that we help our clients be compliant with.

The most popular data format in which we deliver data is the Data Package [1]. Each of our datasets is available, among other formats, as a pair of data.csv and datapackage.json files, complying with the specs [2]. We currently provide over 900 data sets that leverage the Frictionless Data specifications.
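To make the data.csv plus datapackage.json pairing concrete, here is a minimal sketch that writes both files with the Python standard library. The dataset name, fields, and rows are invented for illustration, and the `write_package` helper is hypothetical; it is not how John Snow Labs produces its packages.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_package(directory, name, rows, fields):
    """Write data.csv and a matching datapackage.json into `directory`."""
    directory = Path(directory)
    with open(directory / "data.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=[f["name"] for f in fields])
        writer.writeheader()
        writer.writerows(rows)
    # The descriptor points at the CSV and carries the field definitions.
    descriptor = {
        "name": name,
        "resources": [
            {"name": "data", "path": "data.csv", "schema": {"fields": fields}}
        ],
    }
    (directory / "datapackage.json").write_text(
        json.dumps(descriptor, indent=2), encoding="utf-8"
    )
    return descriptor


# Illustrative content only.
fields = [{"name": "country", "type": "string"}, {"name": "year", "type": "integer"}]
rows = [{"country": "FR", "year": 2016}, {"country": "DE", "year": 2017}]
with tempfile.TemporaryDirectory() as tmp:
    descriptor = write_package(tmp, "example-dataset", rows, fields)
    files = sorted(p.name for p in Path(tmp).iterdir())
print(files)
```

Keeping the schema next to the data in one small JSON file is what lets a consumer load the CSV with the right types without guessing.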

Two years ago, when we were defining the product requirements and architecture, we researched six different standards for metadata definition over a few months. We found Frictionless Data as part of that research, and after careful consideration have decided to adopt it for all the datasets we curate. The Frictionless Data specifications were the simplest to implement, the simplest to explain to our customers, and enable immediate loading of data into the widest variety of analytical tools.

Our data curation guidelines have added more specific requirements that are currently underspecified in the Frictionless Data specifications. For example, there are guidelines for dataset naming, keywords, length of the description, field naming, identifier field naming and types, and some of the properties supported for each field. Adding these to Frictionless Data would make it harder to comply with the specifications, but would also raise the quality bar of standard datasets; so it may be best to add them as recommendations.

Another area where the Frictionless Data specifications are worth expanding is more explicit definition of the properties of each data type - in particular geospatial data, timestamp data, identifiers, currencies and units. We have found a need to extend the type system and properties for each field’s type, in order to enable consistent mapping of schemas to different analytics tools that our customers use (Hadoop, Spark, MySQL, ElasticSearch, etc). We recommend adding these to the specifications.

We are working with Open Knowledge International (opens new window) on open sourcing some of the libraries and tools we’re building. Internally, we are adding more automated validations, additional output file formats, and automated pipelines to load data into ElasticSearch[3] and Kibana[4], to enable interactive data discovery & visualization.

The core use case we see for Frictionless Data specs is making data ready for analytics. There is a lot of Open Data out there, but a lot of effort is still required to make it usable. This single use case expands into as many variations as there are BI & data management tools, so we have many years of work ahead of us to address this one core use case.


  1. Data Package: https://specs.frictionlessdata.io/data-package/ (opens new window) ↩︎

  2. Frictionless Data Specifications specs (opens new window) ↩︎

  3. Elastic Search https://www.elastic.co/ (opens new window) ↩︎

  4. kibana https://www.elastic.co/products/kibana (opens new window) ↩︎

- + diff --git a/blog/2017/03/31/data-package-views-proposal/index.html b/blog/2017/03/31/data-package-views-proposal/index.html index b6fc7bf58..5dd677aca 100644 --- a/blog/2017/03/31/data-package-views-proposal/index.html +++ b/blog/2017/03/31/data-package-views-proposal/index.html @@ -38,7 +38,7 @@ - + @@ -1049,6 +1049,6 @@ }, ...

For the sample type, only the size of the sample is needed.

# Appendix: SQL Transforms

Just use SQL e.g.

- + diff --git a/blog/2017/04/11/dataworld/index.html b/blog/2017/04/11/dataworld/index.html index 276ad6c0c..3cf873c15 100644 --- a/blog/2017/04/11/dataworld/index.html +++ b/blog/2017/04/11/dataworld/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

data.world

At data.world (opens new window), we deal with a great diversity of data, both in terms of content and in terms of source format - most people working with data are emailing each other spreadsheets or CSVs, and not formally defining schema or semantics for what’s contained in these data files.

When data.world (opens new window) ingests tabular data, we “virtualize” the tables away from their source format, and build layers of type and semantic information on top of the raw data. What this allows us to do is to produce a clean Tabular Data Package[1] for any dataset, whether the input is CSV files, Excel Spreadsheets, JSON data, SQLite Database files - any format that we know how to extract tabular information from - we can present it as cleaned-up CSV data with a datapackage.json that describes the schema and metadata of the contents.

Available Data
Tabular Data Package structure on disk

We would also like to see graph data packages developed as part of the Frictionless Data specifications, or “Universal Data Packages” that can encapsulate both tabular and graph data. It would be great to be able to present tabular and graph data in the same package and develop software that knows how to use these things together.

To elaborate on this, it makes a lot of sense to normalize tabular data down to clean, well-formed CSVs or data that is more graph-like, and to normalize it to a standard format. RDF[2] is a well-established and standardized format, with many serialized forms that could be used interchangeably (RDF XML, Turtle, N-Triples, or JSON-LD, for example). The metadata in the datapackage.json would be extremely minimal, since the schema for RDF data is encoded into the data file itself. It might be helpful to use the datapackage.json descriptor to catalog the standard taxonomies and ontologies that were in use, for example, it would be useful to know if a file contained SKOS[3] vocabularies, or OWL[4] classes.

In the coming days, we want to continue to enrich the metadata we include in Tabular Data Packages exported from data.world (opens new window), and we’re looking into using datapackage.json as an import format as well as an export option.

data.world (opens new window) works with lots of data across many domains - what’s great about Frictionless Data is that it’s a lightweight set of content specifications that can be a starting point for building domain-specific content standards - it really helps with the “first mile” of standardizing data and making it interoperable.

Available Data
Tabular datasets can be downloaded as Tabular Data Packages

In a certain sense, a Tabular Data Package is sort of like an open-source, cross-platform, accessible replacement for spreadsheets that can act as a “binder” for several related tables of data. I could easily imagine web or desktop-based tools that look and function much like a traditional spreadsheet, but use Data Packages as their serialization format.

To read more about Data Package integration at data.world (opens new window), read our post: Try This: Frictionless data.world (opens new window). Sign up, and start playing with data.


  1. Tabular Data Package specifications: https://specs.frictionlessdata.io/tabular-data-package (opens new window) ↩︎

  2. RDF: Resource Description Framework: https://www.w3.org/RDF/ (opens new window) ↩︎

  3. SKOS: Simple Knowledge Organization System: https://www.w3.org/2004/02/skos/ (opens new window) ↩︎

  4. OWL Web Ontology Language: https://www.w3.org/TR/owl-ref/ (opens new window) ↩︎

- + diff --git a/blog/2017/05/23/cmso/index.html b/blog/2017/05/23/cmso/index.html index c20de7a1c..cae681396 100644 --- a/blog/2017/05/23/cmso/index.html +++ b/blog/2017/05/23/cmso/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

Cell Migration Standardization Organization

Researchers worldwide try to understand how cells move, a process extremely important for many physiological and pathological conditions. Cell migration (opens new window) is in fact involved in many processes, like wound healing, neuronal development and cancer invasion. The Cell Migration Standardization Organization (opens new window) (CMSO) is a community building standards for cell migration data, in order to enable data sharing in the field. The organization has three main working groups:

  • Minimal reporting requirement (developing MIACME (opens new window), i.e. the Minimum Information About a Cell Migration Experiment)
  • Controlled Vocabularies
  • Data Formats and APIs

In our last working group, we discussed where the Data Package specifications[1] could be used or expanded for the definition of a standard format and the corresponding libraries to interact with these standards. In particular, we have started to address the standardization of cell tracking data. This is data produced using tracking software that reconstructs cell movement in time based on images from a microscope.

Diagram
In pink, the ISA (opens new window) (Investigation Study Assay) model to annotate the experimental metadata; in blue, the OME (opens new window) (Open Microscopy Environment) model for the imaging data; in green, our biotracks format based on the Data Package specification for the analytics data (cell tracking, positions, features etc.); in purple, CV: Controlled Vocabulary; and in turquoise, MIACME (opens new window): Minimum Information About a Cell Migration Experiment. CC BY-SA 4.0 (opens new window) Credit: Paola Masuzzo (text) and CMSO (diagram).

CMSO deals specifically with cell migration data (a subject of cell biology). Our main challenge lies in the heterogeneity of the data. This diversity has its origin in two factors:

  • Experimentally: Cell migration data can be produced using many diverse techniques (imaging, non-imaging, dynamic, static, high-throughput/screening, etc.)
  • Analytically: These data are produced using many diverse software packages, each of these writing data to specific (sometimes proprietary) file formats.

This diversity hampers (or at least makes very difficult) procedures like meta-analysis, data integration, data mining, and last but not least, data reproducibility.

CMSO has developed and is about to release the first specification of a Cell Tracking format (opens new window). This specification is built on a tabular representation, i.e. data are stored in tables. The current v0.1 of this specification can be seen here (opens new window).

CMSO is using the Tabular Data Package[2] specification to represent cell migration-derived tracking data, as illustrated here (opens new window). The specification is used for two goals:

  1. Create a Data Package representation where the data—in our case objects (e.g. cells detected in microscopy images), links and optionally tracks—are stored in CSV files, while metadata and schema[3] information are stored in a JSON file.
  2. Write this Data Package to a pandas[4] dataframe, to aid quick inspection and visualization.
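The two goals above can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names (`object_id`, `frame`, `x`, `y`, `link_id`) are hypothetical rather than taken from the biotracks specification, the CSV resources are held in memory instead of being read from a package on disk, and plain Python dictionaries stand in for the pandas dataframe.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-ins for two CSV resources of a tracking Data Package.
# Column names are hypothetical.
OBJECTS_CSV = """object_id,frame,x,y
0,0,10.0,5.0
1,1,11.5,5.5
2,2,13.0,6.5
3,0,40.0,20.0
4,1,39.0,21.0
"""
LINKS_CSV = """link_id,object_id
0,0
0,1
0,2
1,3
1,4
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def build_paths(objects_text, links_text):
    """Return {link_id: [(frame, x, y), ...]} with points sorted by frame."""
    objects = {row["object_id"]: row for row in read_rows(objects_text)}
    paths = defaultdict(list)
    for link in read_rows(links_text):
        obj = objects[link["object_id"]]
        paths[link["link_id"]].append(
            (int(obj["frame"]), float(obj["x"]), float(obj["y"]))
        )
    return {link_id: sorted(points) for link_id, points in paths.items()}


paths = build_paths(OBJECTS_CSV, LINKS_CSV)
for link_id, points in paths.items():
    print(link_id, points)
```

Storing detected objects and their links in separate tables keeps each CSV flat, while the join needed to reconstruct a cell's path stays a few lines of code.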

You can see some examples here (opens new window).

I am an Open Science fan and advocate, so I try to keep up to date with the initiatives of the Open Knowledge International (opens new window) teams. I think I first became aware of Frictionless Data when I saw a tweet and I checked the specs out. Also, CMSO really wanted to keep a possible specification and file format light and simple. So different people on the team must have googled for ‘CSV and JSON formats’ or something like that, and Frictionless Data popped up 😃.

I have opened a couple of issues on the GitHub page of the spec (opens new window), detailing what I would like to see developed in the Frictionless Data project. The CMSO is not sure yet if the Data Package representation will be the one we’ll go for in the very end, because we would first like to know how sustainable/sustained this spec will be in the future.

CMSO is looking into expanding the list of examples (opens new window) we have so far in terms of tracking software. Personally, I would like to choose a reference data set (a live-cell, time-lapse microscopy data set), and run different cell tracking algorithms/software packages on it. Then I want to put the results into a common, light and easy-to-interpret CSV+JSON format (the biotracks format), and show people how data containerization[5] can be the way to go to enable research data exchange and knowledge discovery at large.

With most other specifications, cell tracking data are stored in tabular format, but metadata are never kept together with the data, which makes data interpretation and sharing very difficult. The Frictionless Data specifications take good care of this aspect. Some other formats are based on XML[6] annotation, which certainly does the job, but are perhaps heavier (even though perhaps more sustainable in the long term). I hate Excel formats, and unfortunately I need to parse those too. I love the integration with Python[7] and the pandas[4:1] system, this is a big plus when doing data science.

As a researcher, I mostly deal with research data. I am pretty sure if this could work for cell migration data, it could work for many cell biology disciplines as well. I recommend speaking to more researchers and data producers to determine additional use cases!


  1. Data Package: https://specs.frictionlessdata.io/data-package (opens new window) ↩︎

  2. Tabular Data Package: https://specs.frictionlessdata.io/tabular-data-package (opens new window) ↩︎

  3. Table Schema: https://specs.frictionlessdata.io/table-schema (opens new window) ↩︎

  4. Pandas: Python package for data analysis: http://pandas.pydata.org/ (opens new window) ↩︎ ↩︎

  5. Design Philosophy: specs (opens new window) ↩︎

  6. Extensible Markup Language: https://en.wikipedia.org/wiki/XML (opens new window) ↩︎

  7. Data Package-aware libraries in Python: https://github.com/frictionlessdata/datapackage-py (opens new window), https://github.com/frictionlessdata/tableschema-py (opens new window), https://github.com/frictionlessdata/goodtables-py (opens new window) ↩︎

- + diff --git a/blog/2017/05/24/the-data-retriever/index.html b/blog/2017/05/24/the-data-retriever/index.html index 4cadd0c5e..c3ca4371b 100644 --- a/blog/2017/05/24/the-data-retriever/index.html +++ b/blog/2017/05/24/the-data-retriever/index.html @@ -38,7 +38,7 @@ - + @@ -114,6 +114,6 @@ case-studies

The Data Retriever (opens new window) automates the tasks of finding, downloading, and cleaning up publicly available data, and then stores them in a variety of databases and file formats. This lets data analysts spend less time cleaning up and managing data, and more time analyzing it.

We originally built the Data Retriever starting in 2010 with a focus on ecological data. Over time, we realized that the common challenges with finding, downloading, and cleaning up ecological data applied to data in most other fields, so we rebranded and started integrating data from other fields as well.

The Data Retriever is primarily focused on tabular data, but we’re starting work on supporting spatial data as well.

Diagram
The Data Retriever automatically installing the BBS (USGS North American Breeding Bird Survey) (opens new window) dataset

Data is often messy and needs cleaning and restructuring before it can be effectively used. It is often not feasible to modify and redistribute the data due to licensing and other limitations (Editor’s note: see our Open Power System Data case study for more on this).

We need to make it as easy as possible for contributors to add new datasets (opens new window). For relatively clean datasets this means having a simple, easy-to-work-with metadata standard to describe existing data. The description for each dataset is written in a single file which gets read by our plugin infrastructure.

To describe the structure of simple data, we originally created a YAML-like[1] metadata structure. When the Data Package[2] specs were created by Open Knowledge International (opens new window), we decided to switch over to using this standard so that others could benefit from the metadata we were creating and so that we could benefit from the standards-based infrastructure being created around the specs.

The transition to the Data Package specification was fairly smooth, as most of the fields we needed were already included in the specs. The only things we needed to add were fields for restructuring poorly formatted data, since the spec assumes the data is well structured to begin with. For example, we use custom fields for describing how to convert wide data to long data (opens new window).

We first learned about Frictionless Data through the announcement (opens new window) of their funding by the Sloan Foundation. Going forward, we would love to see the Data Package spec expanded to include information about “imperfections” in data. It currently assumes that the person creating the metadata can modify the raw data files to comply with the standard rules of data structure. However, this doesn’t work if someone else is distributing the data, which is a very common use case.

The expansion of the standard would include things like a way to indicate wide versus long data with enough information to uniquely describe how to translate from one to the other as well as information on single tables that are composed from data in many separate files. We have already been adding new fields to the JSON to accomplish some of these things and would be happy to be part of a larger dialog about implementing them more widely. For the wide-data-to-long-data example mentioned above, we use ct_column and ct_names fields and a ct-type type to indicate how to transform the data into a properly normalized form.
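For readers unfamiliar with the wide-versus-long distinction, here is a minimal sketch of the transformation itself in plain Python. The function and its parameters (`wide_columns`, `names_column`, `values_column`) are hypothetical names chosen for this example; how they would map onto the Retriever's ct_column and ct_names fields is an assumption, and the bird counts are invented.

```python
def wide_to_long(rows, wide_columns, names_column, values_column):
    """Turn one row per site into one row per (site, measured column) pair."""
    long_rows = []
    for row in rows:
        # Columns that are not being unpivoted are copied onto every output row.
        shared = {key: value for key, value in row.items() if key not in wide_columns}
        for column in wide_columns:
            long_rows.append(
                {**shared, names_column: column, values_column: row[column]}
            )
    return long_rows


# Illustrative wide table: one column per species.
wide = [
    {"site": "A", "sparrow": 3, "robin": 1},
    {"site": "B", "sparrow": 0, "robin": 4},
]
long_rows = wide_to_long(wide, ["sparrow", "robin"], "species", "count")
for row in long_rows:
    print(row)
```

The point of describing this in metadata rather than code is that the list of wide columns and the names of the two new columns are all a tool needs to know to normalize a file it cannot modify at the source.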

The other thing we’ve come across is the need to develop a clear specification for semantic versioning (opens new window) of Data Packages. The specification includes an optional version field[3] for keeping track of changes to the package. This version has a standard structure from semantic versioning in software that includes major, minor, and patch level changes. Unlike in software, there is no clearly established standard for what changes in different version numbers indicate. Since we work with a lot of different datasets, we’ve been changing a lot of version numbers over the last year; this has led us to open a discussion with the OKFN team (opens new window) about developing a standard to apply to these changes.

Our next big step is working on the challenge of simple data integration. One of the major challenges data analysts have after they have cleaned up and prepared individual data sources is combining them. General solutions to the data integration problem (e.g. linked data approaches) have proven too difficult, but we are approaching the problem by tackling a small number of common use cases and involving humans in the metadata development describing the linkages between datasets.

The major specification that is available for ecological data is the Ecological Metadata Language (EML) (opens new window). It is an XML[4] based spec that includes a lot of information specific to ecological datasets. The nice thing about EML—which is also its challenge—is that it is very comprehensive. This gives it a lot of strength in a linked data context, but also means that it is difficult to drive adoption by users.

The Frictionless Data specifications line up better with our approach to data[5], which is to complement lightweight computational methods with human contributions to make data easier to work with quickly.

Community contributions to our work are welcome. We work hard to make all of our development efforts open and inclusive (see our Code of Conduct (opens new window)) and love it when new developers, data scientists, and domain specialists contribute (opens new window). A contribution can be as easy as adding a new dataset by following a set of prompts (opens new window) to create a new JSON file and submitting a PR (opens new window) on GitHub, or even just opening an issue to tell us about a dataset that would be useful to you. So, open an issue (opens new window), submit a PR, or stop by our Gitter chat channel (opens new window) and say “Hi”. We also participate in Google Summer of Code (opens new window), which is a great opportunity for students interested in being directly supported to work on the project.


  1. YAML Ain’t Markup Language: https://en.wikipedia.org/wiki/YAML (opens new window) ↩︎

  2. Data Package: https://specs.frictionlessdata.io/data-package (opens new window) ↩︎

  3. Data Package version field: /specs/#version (opens new window) ↩︎

  4. Extensible Markup Language: https://en.wikipedia.org/wiki/XML (opens new window) ↩︎

  5. Design Philosophy: /specs/#design-philosophy (opens new window) ↩︎

- + diff --git a/blog/2017/06/26/pacific-northwest-national-laboratory-active-data-biology/index.html b/blog/2017/06/26/pacific-northwest-national-laboratory-active-data-biology/index.html index 310cec808..8115c6896 100644 --- a/blog/2017/06/26/pacific-northwest-national-laboratory-active-data-biology/index.html +++ b/blog/2017/06/26/pacific-northwest-national-laboratory-active-data-biology/index.html @@ -38,7 +38,7 @@ - + @@ -130,6 +130,6 @@ "[Pending]" ]

# Software

Goodtables had existed (opens new window) as a Python library and web application developed by Open Knowledge International to support the validation of tabular datasets both in terms of structure and also with respect to a published schema as described above. This software was put to good use in a local government context.

For this pilot, and in coordination with other work in the project, we took the opportunity to drastically improve the software to support the online, automated validation referenced in the above use case. We took as inspiration the workflow in use in software development environments around the world (continuous automated testing) and applied it to data. This involved not only updating the Python library to reflect the specification development to date, but also the design of a new data publishing workflow that is applicable beyond PNNL’s needs. It is designed to be extensible, so that custom checks and custom backends (e.g. other places where one might publish a dataset) can take advantage of this workflow. For example, in addition to datasets stored on GitHub, the new goodtables supports the automated validation of datasets stored on S3, and we are currently working on validation of datasets stored on CKAN.

Goodtables supports validation of tabular data in GitHub repositories to solve the use case for Active Data Biology. On every update to the dataset, a validation task is run on the stored data.
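To give a feel for the kind of structural check that could run on every update, here is a minimal sketch using only the Python standard library. It does not use the goodtables API; the `structural_errors` function, its messages, and the sample data are assumptions made for illustration, and real validators cover many more cases.

```python
import csv
import io


def structural_errors(text):
    """Return (row_number, message) pairs for basic structural problems."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [(0, "file is empty")]
    header = rows[0]
    errors = []
    if len(set(header)) != len(header):
        errors.append((1, "duplicate header"))
    # Data rows are numbered from 2 because the header is row 1.
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            errors.append((number, "blank row"))
        elif len(row) != len(header):
            errors.append((number, f"expected {len(header)} cells, found {len(row)}"))
    return errors


good = "id,name\n1,alpha\n2,beta\n"
bad = "id,name\n1,alpha\n,\n3,gamma,extra\n"
print(structural_errors(good))
print(structural_errors(bad))
```

In a continuous validation setup, a script like this would exit with a non-zero status when the error list is non-empty, so that a bad commit is flagged the same way a failing software test would be.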

# Review

# How Effective Was It

The omics team at PNNL are still investigating the use of goodtables.io (opens new window) for their use case, but early reports are positive:

We created a schema and started testing it. So far so good! I think this is going to work for a lot of projects which want to store data in a repo.

As a real test of the generality of goodtables, we also tried to apply it to another project. This second project is a public repository describing measurements of metabolites in ion mobility mass spectrometry. Here, we are again using flat files for structured data. The data is actually a library of information describing metabolites, and we know that the library will be growing. So it was very similar to the ADBio project, in that the curated data would be continually updated. (see https://github.com/PNNL-Comp-Mass-Spec/MetabolomicsCCS (opens new window) for the project itself, and https://github.com/PNNL-Comp-Mass-Spec/metaboliteValidation (opens new window) for a validation script that leverages goodtables).

Of course, technical issues that they have encountered have been translated into GitHub issues and are being addressed:

- + diff --git a/blog/2017/08/09/collections-as-data/index.html b/blog/2017/08/09/collections-as-data/index.html index 412770672..622f180a2 100644 --- a/blog/2017/08/09/collections-as-data/index.html +++ b/blog/2017/08/09/collections-as-data/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

Collections as Data Facets - Carnegie Museum of Art Collection Data

This blog post was originally published as part of the Collections as Data Facets document collections (opens new window) on the Always Already Computational - Collections as Data website.

- + diff --git a/blog/2017/08/09/tutorial-template/index.html b/blog/2017/08/09/tutorial-template/index.html index f57683ca1..1431cbfff 100644 --- a/blog/2017/08/09/tutorial-template/index.html +++ b/blog/2017/08/09/tutorial-template/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

Template for Tutorials

Price icons created by Pixel perfect - Flaticon

This post provides you with a template for writing Frictionless Data tutorials. Specifically, tutorials of the form: How to do X thing using Y Frictionless Data tool.

# Introduction

You want to start by introducing what you are doing, e.g.

In this tutorial you’ll learn how to {do a thing using a tool} to {provide some benefit} (This first sentence may be inspired by a user story (opens new window)).

Clearly state the objective of your tutorial in the title and then once again in more detail at the very beginning of the tutorial. This gives readers an idea of what to expect and helps them determine if they want to continue reading.

Tutorial time : 20 minutes

Audience : Beginner Data Packagers {user role} with {skill level}.

Then continue like this:

# What you’ll need

You’ll need a basic understanding of:

  • JSON syntax
  • how to run commands in Terminal

To complete this tutorial you’ll need:

# Introduction

Introduce any basic concepts.

To {achieve the benefit} we’ll guide you through these steps:

  1. import the data
  2. generate a table schema
  3. create a data package
  4. publish the data package

# 1. Import the data

Write in a friendly, conversational style. Using humor is fine.

# 2. Generate a table schema

Include pictures. Highlight key items on screenshots. Make sure pictures can be viewed at full size.

# 3. Create a data package

Explain why something must be done, not just how to do it.

# 4. Publish the data package

In this step you’ll…

# Congratulations

In 4 simple steps you’ve learned how to {do a thing}. With this new knowledge, now you can {achieve a benefit}.

Now go {do something}

# Learn more

# References

- + diff --git a/blog/2017/08/15/causa-natura-pescando-datos/index.html b/blog/2017/08/15/causa-natura-pescando-datos/index.html index ac5fff250..08b92b3ca 100644 --- a/blog/2017/08/15/causa-natura-pescando-datos/index.html +++ b/blog/2017/08/15/causa-natura-pescando-datos/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

Causa Natura - Pescando Datos

# Context

Causa Natura is a non-profit organization based in Mexico. It supports public policies to allow management of natural resources respecting human rights, equity, efficiency and sustainability. This project, “Pescando Datos”, seeks to advocate for improved public policies for more than just subsidies allocation, through the collection, analysis, and visualization of data around subsidies available to fishing communities in Mexico.

After an extended period of analysis, a web platform is being built to explore and visualize the data, with launch due later in 2017. Following a meeting at csv,conf after a presentation by Adrià Mercader on ‘Continuous Data Validation for Everybody’, we have run a pilot with Causa Natura to explore how our goodtables service can support the project. We spoke to Eduardo Rolón, Executive Director of Causa Natura, and Gabriela Rodriguez, who is working on the platform.

# Problem We Were Trying To Solve

Causa Natura are making a lot of Freedom of Information requests in Mexico for information about fishers in order to understand how policies are impacting people. The data is needed to support a range of stakeholders, from the many co-op fisher communities to advocacy organisations.

Eduardo Rolón: Advocacy organizations, either from CSOs or from the fisheries sector may be more interested in data that evaluates and supports policy recommendations. Fisher communities have more immediate needs, such as how to obtain better governmental services and support.

Gabriela Rodriguez: The data is important to us because campaigns and decisions will be made based on the analysis of the data Causa Natura collected. To be able to do the required analysis we need good data.

Gabriela Rodriguez: Currently, there is a tedious process of cleaning to give us data that can be worked on. Much of the data Causa Natura was using came as PDFs and needed to be processed. We process a lot of PDFs and Excel files and there are a lot of problems getting the OCR to capture the information correctly to csv. For example, names are not consistent and this causes us a lot of problems.

# The Work

# Software

goodtables was an existing Python library and web application developed by Open Knowledge International to support the validation of tabular datasets both in terms of structure and also with respect to a published schema as described above. We introduced goodtables in a blog post (opens new window) earlier this year.

On top of that, Open Knowledge International has developed goodtables.io (opens new window), a web service for a continuous data validation that connects to different data sources to generate structure and content reports.

# What Did We Do

Let’s see how goodtables.io (opens new window) has helped to identify source and structural errors in the Causa Natura pilot dataset:

ADBio

After we’ve signed in, we synchronize our GitHub repositories and activate the repository we want to validate (https://github.com/frictionlessdata/pilot-causanatura (opens new window)):

ADBio

Once the repository is activated, every time there is an update on the data hosted on GitHub, the service will generate a validation report. This is what one of these reports looks like:

ADBio

Here, we see that there are 59 valid tables, but the report has identified source and structural errors in 41 of the other tables hosted on the repository, including:

  • duplicate rows
  • duplicate headers
  • blank rows
  • missing values
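To make these four error types concrete, here is a minimal sketch of how such structural checks could be expressed using only the Python standard library. It is not the goodtables implementation; the function name `structural_errors` and the sample data are made up for illustration, and the error labels are chosen to resemble those in the Data Quality Spec.

```python
import csv
import io


def structural_errors(csv_text):
    """Return (row_number, error_name) tuples for a CSV string.

    Illustrative only: a real validator handles many more cases.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [(0, "source-error")]

    errors = []
    headers = rows[0]

    # Duplicate headers: the same column name appears more than once.
    seen_headers = set()
    for name in headers:
        if name in seen_headers:
            errors.append((1, "duplicate-header"))
        seen_headers.add(name)

    seen_rows = set()
    for number, row in enumerate(rows[1:], start=2):
        # Blank rows: no cell contains anything but whitespace.
        if not any(cell.strip() for cell in row):
            errors.append((number, "blank-row"))
            continue
        # Missing values: the row is shorter than the header row.
        if len(row) < len(headers):
            errors.append((number, "missing-value"))
        # Duplicate rows: an identical row was already seen.
        key = tuple(row)
        if key in seen_rows:
            errors.append((number, "duplicate-row"))
        seen_rows.add(key)
    return errors


# Hypothetical sample with one of each problem.
sample = "id,name,name\n1,ana,x\n\n1,ana,x\n2,bob\n"
for row_number, error in structural_errors(sample):
    print(row_number, error)
```

Row numbers are 1-based and count the header as row 1, which is an arbitrary choice made here for readability.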

The full list of checks exercised by goodtables.io (opens new window) can be found in the Data Quality Spec (opens new window). And the full report can be found here (opens new window).

After identifying errors we went back to do a manual cleanup of the data. As we mentioned, there is no need to run goodtables.io validation manually; it happens on any GitHub push for all activated repositories:

ADBio

If we need to customize a validation process we can put a goodtables.yml configuration file on the repository root, allowing us to tweak settings like the actual checks to perform, limit of rows to check, etc:

ADBio

And instant feedback is available via GitHub commit statuses and a goodtables.io (opens new window) badge that can be included in the README file:

ADBio

# Review

Gabriela Rodriguez: Right now I have not been using it extensively yet but I have a lot of faith that it could get incorporated in the process of importing data into the Github repository. It should be easy to introduce into our workflow. I really like the process of hooks after git-push as I’m trying to get the organization to use Github for new data. I really like the validation part and that a report is generated each time data is pushed. This is very important and very useful. This makes it easier for the people who are doing the cleaning of data who may not have experience with GitHub.

Gabriela Rodriguez: The web interface needs a lot of usability work. But the idea is awesome. There are problems and it is kind of hard to use at the moment as it takes a long time to sync repositories and the process is not clear, but I think it has a huge potential to make a difference to the work we are doing, mostly if people use GitHub to store data then it could make a difference.

# Next Steps

# Areas for further work

Gabriela Rodriguez: With continuous integration it would be very helpful to be notified with messages about the problems in the data. Perhaps emails notifications would be a good way to go, or integrations with other programs - Slack for example - would be fantastic.

One thing to note is that all the errors shown following the analysis refer to the structure of the data files (missing headers, duplicate rows, etc). Including schema validation against some of the files would be a very logical next step in testing whether the contents of the data are what is expected. We are now planning to work with Causa Natura to identify a subset of the data and create a base schema/data package that will be easily expandable and extendable.

# Find Out More

To explore for yourself and collaborate, see the Pescando Datos project on GitHub and our goodtables reports from the project.

- + diff --git a/blog/2017/08/15/center-for-data-science-and-public-policy-workforce-data-initiative/index.html b/blog/2017/08/15/center-for-data-science-and-public-policy-workforce-data-initiative/index.html index 1131a4279..f0c749dbe 100644 --- a/blog/2017/08/15/center-for-data-science-and-public-policy-workforce-data-initiative/index.html +++ b/blog/2017/08/15/center-for-data-science-and-public-policy-workforce-data-initiative/index.html @@ -38,7 +38,7 @@ - + @@ -114,6 +114,6 @@ case-studies

The Workforce Data Initiative aims to modernize the US workforce through the use of data. One aspect of this initiative is to help state and local workforce boards collect, aggregate, and distribute statistics on the effectiveness of training providers and the programs they offer. The US Department of Labor mandates that every eligible training provider (ETP) work with state workforce boards to track the outcomes of their students in order to receive federal funding. We are building a suite of open-source tools using open data specifications in order to help make this a reality; this collection of tools is called the Training Provider Outcomes Toolkit (TPOT). This specific tool, the etp-uploader, is a website that state workforce boards can deploy for training providers to upload their individual-level data.

There are many hundreds or thousands of training providers within the purview of each workforce development board. Each one must securely upload their participant data to their workforce board. This means that the workforce development boards must be equipped to receive and validate the data.

Training providers range from small trade apprenticeships to community colleges to multi-state organizations, with a wide range of data sophistication. The ways in which the workforce data board collects participant outcomes must be easy and accessible to all organizations. At the same time, it must be easy for the board itself to automatically process and validate the datasets.

We use the Frictionless Data Table Schema specification to define the required columns and data value constraints. This is decoupled from the code, allowing each state to precisely define their requirements and easily create custom instances of the site. We expose this flexibility through a Heroku build script (opens new window).
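As a rough illustration of what "required columns and data value constraints" can look like, here is a minimal sketch of a Table Schema style descriptor and a tiny row checker written with the standard library only. The field names (`participant_id`, `program_code`, `wage`) and the `check_row` helper are hypothetical and are not taken from the etp-uploader; the `constraints` keys `required` and `minimum` follow the Table Schema specification.

```python
# Hypothetical descriptor: each state could ship its own version of this.
schema = {
    "fields": [
        {"name": "participant_id", "type": "string",
         "constraints": {"required": True}},
        {"name": "program_code", "type": "string",
         "constraints": {"required": True}},
        {"name": "wage", "type": "number",
         "constraints": {"minimum": 0}},
    ]
}


def check_row(row, schema):
    """Return a list of human-readable problems for one row (a dict)."""
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        constraints = field.get("constraints", {})
        value = row.get(name, "")

        # Empty cells only matter when the field is required.
        if value == "":
            if constraints.get("required"):
                problems.append(f"{name}: required value is missing")
            continue

        if field["type"] == "number":
            try:
                number = float(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not a number")
                continue
            if "minimum" in constraints and number < constraints["minimum"]:
                problems.append(
                    f"{name}: {number} is below minimum {constraints['minimum']}"
                )
    return problems


print(check_row({"participant_id": "p1", "program_code": "", "wage": "-5"}, schema))
```

Because the descriptor is plain data, it can live outside the code, which is the decoupling described above.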

We have modified the goodtables-web project (opens new window) to add support for uploading to an S3 repository. We’ve further extended it to allow for uploading metadata about the uploaded file after it is validated. This metadata is uploaded as a separate file. In the future, we may use the data package standard to describe these two files as a single data package.

I am excited to see the new developments around goodtables-py 1.0 and beyond. It will be nice to eventually move our upload website to the new APIs. One possible area for improvement in the goodtables-web validator is better error messages when specific data values do not match constraints. I’ve imagined adding a custom “data_constraint_error” field to the Table Schema that would allow for friendlier errors, or perhaps dynamically generating such error messages using the constraints themselves.
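Both ideas can be sketched in a few lines. The example below is only an illustration of the idea: `data_constraint_error` is the custom property imagined above and is not part of the Table Schema specification, and `constraint_message` is a hypothetical helper, not a goodtables function.

```python
def constraint_message(field):
    """Build a friendly message for a field from its constraints.

    Prefers the custom (non-standard) "data_constraint_error" property
    when present; otherwise derives a message from the constraints.
    """
    custom = field.get("data_constraint_error")
    if custom:
        return custom

    constraints = field.get("constraints", {})
    parts = []
    if constraints.get("required"):
        parts.append("must not be empty")
    if "minimum" in constraints:
        parts.append(f"must be at least {constraints['minimum']}")
    if "maximum" in constraints:
        parts.append(f"must be at most {constraints['maximum']}")
    if "enum" in constraints:
        options = ", ".join(str(value) for value in constraints["enum"])
        parts.append(f"must be one of: {options}")

    if not parts:
        return f"'{field['name']}' has no constraints"
    return f"'{field['name']}' " + " and ".join(parts)


wage = {"name": "wage", "type": "number",
        "constraints": {"minimum": 0, "maximum": 500}}
print(constraint_message(wage))
```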

I think that this general structure — a validated table upload software — is very useful and could be used for a wide variety of applications. It may make sense to allow for even more easy customizations to the site.

The extension to goodtables-web is open source and available here (opens new window) with a demo also running at http://send.dataatwork.org (opens new window)

- + diff --git a/blog/2017/08/15/university-of-cambridge/index.html b/blog/2017/08/15/university-of-cambridge/index.html index dc0f19256..d2bf25470 100644 --- a/blog/2017/08/15/university-of-cambridge/index.html +++ b/blog/2017/08/15/university-of-cambridge/index.html @@ -38,7 +38,7 @@ - + @@ -205,6 +205,6 @@ read this metadata—could provide an “object-oriented” style of working with experimental data.

We have not tried this out on multiple samples (this is forthcoming), so we don’t have much information yet on the usefulness of this approach, but the exercise raised several important issues to potentially address with the Data Package format:

  1. Stephen’s request for a specified location for storing units in a structured way comes up often: https://github.com/frictionlessdata/specs/issues/216 (opens new window)
  2. More iterations, with more of a variety of data sources could help in trialling this
  3. Stephen wanted to store a non-tabular data file (an image) with the tabular datasets that comprise his datasets. This is currently not allowed, but the subsequent definition of a Tabular Data Resource could pave the way for a method of specifying types of different resources and the kind of processing, validation or otherwise, that could be done with each.

# Next Steps

# Areas for future work

Stephen now has about 100 retinal mosaics that might make for a nice use case of the Data Package. In addition, the Frictionless Data Tool Fund has funded the development of the next version of the R Data Package. This will make some of the improvements brought to the Data Package specifications in the past few months available in the R library.

- + diff --git a/blog/2017/09/28/zegami/index.html b/blog/2017/09/28/zegami/index.html index 567864df3..6bd61ca0b 100644 --- a/blog/2017/09/28/zegami/index.html +++ b/blog/2017/09/28/zegami/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

Zegami

Zegami (opens new window) makes information more visual and accessible, enabling intuitive exploration, search and discovery of large data sets. Zegami combines the power of machine learning and human pattern recognition to reveal hidden insights and new perspectives.

imagesearch
image search on Zegami

It provides a more powerful tool for visual data than what’s possible with spreadsheets or typical business intelligence tools. By presenting data within a single field of view, Zegami enables users to easily discover patterns and correlations, facilitating new insights and discoveries that would otherwise not be possible.

metadatasearch
metadata search on Zegami

For Zegami to shine, our users need to be able to easily import their data so they can get actionable insight with minimal fuss. In building an analytics platform we face the unique challenge of having to support a wide variety of data sources and formats. The challenge is compounded by the fact that the data we deal with is rarely clean.

At the onset, we also faced the challenge of how best to store and transmit data between our components and micro-services. In addition to an open, extensible and simple yet powerful data format, we wanted one that can preserve data types and formatting, and be parsed by all the client applications we use, which includes server-side applications, web clients and visualisation frameworks.

We first heard about messytables[1] and of the data protocols site (currently Frictionless Data Specifications[2]) through a lightning talk at EuroSciPy 2015. This meant when we searched for various things around jsontableschema (now tableschema[3]), we landed on the Frictionless Data project.

We are currently using the specifications in the following ways:

  • We use tabulator.Stream[4] to parse data on our back end.
  • We use schema infer from tableschema-py[5] to store an extended json table schema to represent data structures in our system. We are also developing custom json parsers using json paths and the ijson library
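To give a feel for what schema inference does, here is a standard-library sketch that guesses a type for each column of a small CSV. It is deliberately simplified and is not how tableschema-py works internally; `infer_type` and `infer_schema` are illustrative names, and only the `integer`, `number` and `string` types are considered.

```python
import csv
import io


def infer_type(values):
    """Guess the narrowest type that fits every non-empty value."""
    non_empty = [value for value in values if value != ""]
    if not non_empty:
        return "string"
    # Try the stricter cast first, then fall back to wider ones.
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Return a Table Schema style descriptor for a CSV string."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    headers, body = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(headers):
        # Short rows are padded with empty strings for this column.
        column = [row[index] if index < len(row) else "" for row in body]
        fields.append({"name": name, "type": infer_type(column)})
    return {"fields": fields}


print(infer_schema("id,score,label\n1,2.5,a\n2,3,b\n"))
```

The resulting dictionary could then be extended with application-specific properties, in the spirit of the extended schema described above.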

In the coming days, we plan on using

  • datapackage-pipelines[6] as a spec for the way we treat joins and multi-step data operations in our system
  • tabulator in a polyglot persistence scenario[7] - storing data in both storage buckets and either elasticsearch[8] or another column store like druid.io (opens new window).

Diagram

Moving forward it would be interesting to see tableschema and tabulator as a communication protocol over websockets. This would allow for a really smooth experience when using handsontable[9] spreadsheets with a datapackage of some kind. A socket-to-socket version of datapackage-pipelines which runs on container orchestration systems would also be interesting. There are a few protocols similar to datapackage-pipelines, such as Dask[10], which is not serialisable and therefore unsuitable for applications where front end communication is necessary or where the pipelines need to be used by non-coders.

We are also keen to know more about repositories around the world that use datapackages[11] so that we can import the data and show users and owners of those repositories the benefits of browsing and visualising data in Zegami.

In terms of other potential use cases, it would be useful to create a python-based alternative to the dreamfactory API server[12]. wqio (opens new window) is one example, but it is quite hard to use and a lighter version would be great. Perhaps CKAN[13] datastore could be licensed in a more open way?

In terms of the next steps for us, we are currently working on a SaaS implementation of Zegami which will dramatically reduce the effort required in order to start working with Zegami. We are then planning on developing a series of APIs so developers can create their own data transformation pipelines. One of our developers, Andrew Stretton, will be running Frictionless Data sessions at PyData London[14] on Tuesday, October 3 and PyCon UK[15] on Friday, October 27.


  1. Library for parsing messy tabular data: https://github.com/okfn/messytables (opens new window) ↩︎

  2. Frictionless Data Specifications: specs (opens new window) ↩︎

  3. Table Schema: https://specs.frictionlessdata.io/table-schema (opens new window) ↩︎

  4. Tabulator: library for reading and writing tabular data https://github.com/frictionlessdata/tabulator-py (opens new window) ↩︎

  5. Table Schema Python Library: https://github.com/frictionlessdata/tableschema-py (opens new window) ↩︎

  6. Data Package Pipelines: https://github.com/frictionlessdata/datapackage-pipelines (opens new window) ↩︎

  7. Polyglot Persistence: https://en.wikipedia.org/wiki/Polyglot_persistence (opens new window) ↩︎

  8. Elastic Search: https://www.elastic.co/products/elasticsearch (opens new window) ↩︎

  9. Handsontable: Javascript spreadsheet component for web apps: https://handsontable.com (opens new window) ↩︎

  10. Dask Custom Graphs: http://dask.pydata.org/en/latest/custom-graphs.html (opens new window) ↩︎

  11. Data Packages: https://specs.frictionlessdata.io/data-package (opens new window) ↩︎

  12. Dream Factory: https://www.dreamfactory.com/ (opens new window) ↩︎

  13. CKAN: Open Source Data Portal Platform: https://ckan.org (opens new window) ↩︎

  14. PyData London, October 2017 Meetup: https://www.meetup.com/PyData-London-Meetup/events/243584161/ (opens new window) ↩︎

  15. PyCon UK 2017 Schedule: http://2017.pyconuk.org/schedule/ (opens new window) ↩︎

- + diff --git a/blog/2017/10/24/elife/index.html b/blog/2017/10/24/elife/index.html index 46c41bc7d..012065915 100644 --- a/blog/2017/10/24/elife/index.html +++ b/blog/2017/10/24/elife/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@

# Context

eLife (opens new window) is a non-profit organisation with a mission to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science. eLife publishes important research in all areas of life and biomedical sciences. The research is selected and evaluated by working scientists and is made freely available to all readers.

# Problem We Were Trying To Solve

Having met at csv,conf,v3 in Portland in May 2017, eLife’s Naomi Penfold (opens new window) and Open Knowledge International’s Adrià Mercader (opens new window) determined that eLife would be a good candidate for a Frictionless Data pilot. eLife has a strong emphasis on research data, and stood to benefit from the data validation service offered by Frictionless Data’s goodtables.

# The Work

In order to assess the potential for a goodtables integration at eLife, we first needed to measure the quality of source data shared directly through eLife.

# Software

To explore the data published in the eLife platform we used the goodtables library[1]. Both the goodtables python library and web service[2] were developed by Open Knowledge International to support the validation of tabular datasets both in terms of structure and also with respect to a published schema. You can read more about them in this introductory blog post (opens new window).

# What Did We Do

The first stage was to perform validation on all files made available through the eLife API in order to generate a report on data quality - this would allow us to understand the current state of eLife-published data and present the possibility of doing more exciting things with the data such as more comprehensive tests or visualisations.

The process:

  • We downloaded a big subset of the articles metadata made available via the eLife public API[3].
  • We parsed all metadata files in order to extract the data files linked to each article, regardless of whether it was an additional file or a figure source. This gave us a direct link to each data file linked to the parent article.
  • We then ran the validation process on each file, storing the resulting report for future analysis.
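The second step, extracting data file links from article metadata, might look roughly like the sketch below. The shape of the `article` dictionary is invented for illustration and does not reflect the real eLife API response; `find_data_links` simply walks any nested JSON structure and keeps URLs whose extension suggests tabular data. The actual scripts are in the pilot repository.

```python
DATA_EXTENSIONS = (".csv", ".xls", ".xlsx")


def find_data_links(node):
    """Recursively collect URLs that point at tabular data files."""
    links = []
    if isinstance(node, dict):
        for value in node.values():
            links.extend(find_data_links(value))
    elif isinstance(node, list):
        for item in node:
            links.extend(find_data_links(item))
    elif isinstance(node, str):
        is_url = node.startswith(("http://", "https://"))
        if is_url and node.lower().endswith(DATA_EXTENSIONS):
            links.append(node)
    return links


# Hypothetical metadata: key names and URLs are placeholders.
article = {
    "id": "00001",
    "figures": [
        {"sourceData": [{"uri": "https://example.org/fig1-data1.xlsx"}]}
    ],
    "additionalFiles": [
        {"uri": "https://example.org/supp1.csv"},
        {"uri": "https://example.org/supp2.pdf"},
    ],
}

for link in find_data_links(article):
    print(link)
```

Walking the whole structure means the function does not care whether a file is attached as a figure source or as an additional file.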

All scripts used in the process as well as the outputs can be found in our pilot repository (opens new window).

Here are some high-level statistics for the process:

We analyzed 3910 articles, 1085 of which had data files. The most common format was Microsoft Excel Open XML Format Spreadsheet (xlsx), with 89% of all 4318 files being published in this format. Older versions of Excel and CSV files made up the rest.

datasets analysed by eLife image A summary of the eLife research articles analysed as part of the Frictionless Data pilot work

In terms of validation, more than 75% of the articles analyzed contained at least one invalid file. Of course, valid data is an arbitrary term based on the tests that are set within goodtables, and results need to be reviewed to adjust the checks performed. For instance, errors raised by blank rows are really common in Excel files, as people add a title on the first row, leaving an empty row before the data, or empty rows are detected at the end of the sheet.

Other errors raised that might actually point to genuine errors included duplicated headers, extra headers, missing values, incorrect format values (e.g. date format instead of gene name) to give just some examples. Here’s a summary of the raw number of errors encountered. For a more complete description of each error, see the Data Quality Spec[4]:

| Error Type | Count |
| --- | --- |
| Blank rows | 45748 |
| Duplicate rows | 9384 |
| Duplicate headers | 6672 |
| Blank headers | 2479 |
| Missing values | 1032 |
| Extra values | 39 |
| Source errors | 11 |
| Format errors | 4 |

# Review

# How Effective Was It

Following analysis of a sample of the results, the vast majority of the errors appear to be due to the data being presented in nice-looking tables, using formatting to make particular elements more visually clear, as opposed to a machine-readable format:

example tables image shared by Naomi Data from Maddox et al. was shared in a machine-readable format (top), and adapted here to demonstrate how such data are often shared in a format that looks nice to the human reader (bottom).
Source: Source data
The data file is presented as is and adapted from Maddox et al. eLife 2015;4:e04995 under the Creative Commons Attribution License (CC BY 4.0).

This is not limited to the academic field of course, and the tendency to present data in spreadsheets so it is visually appealing is perhaps more prevalent in other areas, possibly because consumers of the data are even less likely to have the data processed by machines, or because the data is collated by people with no experience of having to use it in their work.

In general the eLife datasets had better quality than for instance those created by government organisations, where structural issues like missing headers, extra cells, etc are much more common. So although the results here have been good, the community could derive substantial benefit from researchers going that extra mile to make files more machine-friendly and embrace more robust data description techniques like Data Packages.

Because these types of ‘errors’ are so common we have introduced default ignore blank rows and ignore duplicate rows options in our standalone validator (opens new window) since this helps bring more complex errors to the surface and focusses attention on the errors which may be less trivial to resolve. Excluding duplicate and blank rows as well as duplicate headers (the most common but also relatively simple errors), 6.4% (277/4318) of data files had errors remaining, affecting 10% of research articles (112/1085).
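The proportions quoted above can be re-derived from the raw counts reported in this post. The short calculation below is just that arithmetic; the variable names are ours, and the share of "common" errors is computed from the error counts listed earlier rather than stated in the original analysis.

```python
# Counts reported in this post.
total_files = 4318
files_with_remaining_errors = 277
articles_with_data = 1085
articles_affected = 112

error_counts = {
    "Blank rows": 45748,
    "Duplicate rows": 9384,
    "Duplicate headers": 6672,
    "Blank headers": 2479,
    "Missing values": 1032,
    "Extra values": 39,
    "Source errors": 11,
    "Format errors": 4,
}

file_share = 100 * files_with_remaining_errors / total_files
article_share = 100 * articles_affected / articles_with_data

# Share of all raw errors that fall in the three most common categories.
common = ("Blank rows", "Duplicate rows", "Duplicate headers")
common_share = 100 * sum(error_counts[name] for name in common) / sum(error_counts.values())

print(f"{file_share:.1f}% of files, {article_share:.1f}% of articles")
print(f"{common_share:.1f}% of raw errors are blank/duplicate rows or duplicate headers")
```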

Having said this, the relevance of these errors should not be underplayed as blank rows, duplicate rows and other human-centered formatting preferences can still result in errors that prevent machine readability. Although the errors were often minor and easy to fix in our case, these seemingly simple errors can be obstructive to anyone trying to reuse data in a more computational workflow. Any computational analysis software, such as R[5], requires that all column headers are variables and rows are individual observations i.e. we need variables in columns and observations in rows for any R analysis.
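As a small illustration of the reshaping involved, the sketch below takes rows as they might come out of a human-formatted sheet (a title row, blank spacer rows, then the real table) and returns one dictionary per observation. The `tidy_rows` helper and its title-detection heuristic are assumptions made for this example, not part of goodtables.

```python
def tidy_rows(rows):
    """Drop blank rows and a leading title row; return a list of dicts."""
    # Keep only rows that have at least one non-empty cell.
    non_blank = [row for row in rows if any(str(cell).strip() for cell in row)]

    # Heuristic: a first row with a single filled cell, followed by a
    # wider row, is treated as a title rather than as the header.
    if len(non_blank) > 1:
        filled_first = sum(1 for cell in non_blank[0] if str(cell).strip())
        filled_second = sum(1 for cell in non_blank[1] if str(cell).strip())
        if filled_first == 1 and filled_second > 1:
            non_blank = non_blank[1:]

    if not non_blank:
        return []
    header = [str(cell).strip() for cell in non_blank[0]]
    # Variables become keys (columns); each remaining row is one observation.
    return [dict(zip(header, row)) for row in non_blank[1:]]


# Hypothetical sheet contents.
sheet = [
    ["Table 1: cell counts", "", ""],
    ["", "", ""],
    ["sample", "genotype", "count"],
    ["s1", "wt", "12"],
    ["", "", ""],
    ["s2", "mutant", "7"],
]
print(tidy_rows(sheet))
```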

Much less frequent errors were related to difficulties retrieving and opening data files. It was certainly helpful to flag articles with files that were not actually possible to open (source-error), and the eLife production team are resolving these issues. While only representing a small number of datasets, this is one key use case for goodtables: enabling publishers to regularly check continued data availability after publication.

The use case for authors is clear — to identify how a dataset could be reshaped to make it reusable. However, this demands extra work if reshaping is a job added at the point of sharing work. In fact, it is important that any issues are resolved before final publication, to avoid adding updated versions of publications/datasets. Tools that reduce this burden by making it easy to quickly edit a datafile to resolve the errors are of interest moving forward. In the meantime, it may be helpful to consider some key best practises as datasets are collected.

Overall, the findings from this pilot demonstrate that there are different ways of producing data for sharing: datasets are predominantly presented in an Excel file with human aesthetics in mind, rather than structured for use by a statistical program. We found few issues with the data itself beyond presentation preferences. This is encouraging and is a great starting point for venturing forward with helping researchers to make greater use of open data.

# Next Steps

# Areas for further work

Libraries such as goodtables help to flag the gap between the current situation and the ideal situation, which is machine-readability. Most of the issues identified by goodtables in the datasets shared with eLife relate to structuring the data for human visual consumption: adding space around the table, merging header cells, etc. We encourage researchers to make data as easy to consume as possible, and recognise that datasets built primarily to look good to humans may only be sufficient for low-level reuse.

Moving forward, we are interested in tools and workflows that help to improve data quality earlier in the research lifecycle or make it easy to reshape at the point of sharing or reuse.

# Find Out More

https://github.com/frictionlessdata/pilot-elife (opens new window)

Parts of this post are cross-posted (opens new window) on eLife Labs[6].


  1. goodtables Python library: http://github.com/frictionlessdata/goodtables-py (opens new window) ↩︎

  2. goodtables web service: http://goodtables.io (opens new window) ↩︎

  3. eLife Public API: https://api.elifesciences.org/ (opens new window) ↩︎

  4. Data Quality Spec: https://github.com/frictionlessdata/data-quality-spec/blob/master/spec.json (opens new window) ↩︎

  5. R Programming Language: Popular open-source programming language and platform for data analysis: https://www.r-project.org (opens new window) ↩︎

  6. eLife Labs: https://elifesciences.org/labs (opens new window) ↩︎

- + diff --git a/blog/2017/10/24/georges-labreche/index.html b/blog/2017/10/24/georges-labreche/index.html index e521fd8e8..9b21bae42 100644 --- a/blog/2017/10/24/georges-labreche/index.html +++ b/blog/2017/10/24/georges-labreche/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

Tool Fund Grantee: Georges Labrèche

This grantee profile features Georges Labrèche for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees and their work, and to let our technical community know how they can get involved.

I arrived in Kosovo from New York back in 2014 in order to conduct field research for my Masters thesis in International Affairs: I was studying the distinct phenomenon of Digital State-Building, i.e. the use of online digital technologies to promote statehood. I didn’t pack much on my trip here but did bring along a lot of entrepreneurial drive to start a digital agency with strong elements of corporate social responsibility and tech community building. Initially, I had hoped to leverage my background as a Software Engineer to build a small service-oriented startup, but in light of Kosovo’s ongoing state-building processes and push for good governance and anti-corruption, I saw the opportunity to establish a civic-tech NGO, Open Data Kosovo (opens new window) (ODK), as a means of getting local techies to play an active part in state-building by applying their digital skills towards contributing to increasing government transparency and accountability.

Work aside, I have a passion for continuous learning so if you were to meet me I would probably steer the conversation towards what I recently learned on my latest online edX course. My current deep-dives are around space, physics, astronautics and robotics and it is likely that you would find me happily struggling on my homework for online courses in these fields or getting excited about the next scheduled SpaceX launch in my spare time. I am also passionate about travel, particularly experiences that combine visits to UNESCO World Heritage sites, discovery of local cuisines, as well as hiking and mountain climbing in the great outdoors.

I first heard about Frictionless Data from Tin Geber (opens new window) (formerly of The Engine Room). He directly contacted me with a link to the Frictionless Data Tool Fund grant and asked me to apply. A couple of days later, Andrew Russell (opens new window), the UN Development Coordinator and UNDP Resident Representative in Kosovo, asked me about Frictionless Data and the Tool Fund grant on Twitter and I have since had the opportunity to explain the concept behind Frictionless Data to several people.

At first I was just really excited about using the already available Frictionless Data Python library for a procurement data importer we were working on for an Open Contracting Data Standard (OCDS) project. Here in Kosovo, my organization has liberated public procurement datasets that we’ve transformed into an open format, but without a strong or consistent data processing methodology. As I went through the specifications, it became clear to me that they were exactly what our procurement data liberation workflow was missing. I also wanted to do more than just use the library: I wanted to contribute to it and make it more accessible to other developer communities, especially in Java, which I am proficient in.

Data is messy and, for developers, cleaning and processing data from one project to another can quickly turn an awesome end-product idea into a burdensome chore. Data packages and Frictionless Data tools and libraries are important because they allow developers to focus more on the end-product itself without having to worry about heavy lifting in the data processing pipeline.

Members of programming communities are, as a whole, involved in infinitely diverse projects and problem-solving initiatives. Working with that diversity allows us to explore use cases that we would never have imagined when conceptualizing such libraries, and tapping into such an ecosystem of programmers would serve to enhance future versions of the libraries.

All my work on extending the implementation of Frictionless Data libraries in Java will be available on GitHub in these two repositories: datapackage-java and tableschema-java. Comments, forks and pull requests are welcome.

- + diff --git a/blog/2017/10/26/matt-thompson/index.html b/blog/2017/10/26/matt-thompson/index.html index 56946110a..693d55d97 100644 --- a/blog/2017/10/26/matt-thompson/index.html +++ b/blog/2017/10/26/matt-thompson/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

Tool Fund Grantee: Matt Thompson

This grantee profile features Matt Thompson for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

My name is Matt Thompson, I am from Bristol, UK, and work as a lecturer in Creative Computing at Bath Spa University. I have been involved in the Clojure community for a while, running the Bristol Clojurians group (opens new window) since 2014. I was involved in the DM4T project (opens new window) during my postdoc at Bath University, where we used Frictionless Data software to create metadata for large datasets recording domestic energy usage.

We worked with Open Knowledge International’s Developer Advocate, Dan Fowler, on the DM4T project at Bath in a collaboration which turned into a soon-to-be-published pilot study for the project. Dan showed us how the Frictionless Data software could allow us to quickly automate ways to annotate our datasets with metadata. We came away excited about the possibilities that the Frictionless Data software enables for the datasets we’re working with.

When the call for applications (opens new window) for Frictionless Data’s Tool Fund was made in May 2017, I was already building tools in Clojure for working with Frictionless Data as part of DM4T, and I decided to apply to enable me to flesh them out into well-tested, documented libraries.

The problem we had with the DM4T project is that the same kinds of data were being collected by many different projects run by different universities across the country. In addition to describing the energy usage of different appliances, the data also includes different types of readings (electricity usage, gas usage, humidity levels, temperature readings, etc). Different projects store their data in different ways, with some using MySQL databases and others using CSV tables. The Frictionless Data software allows us to create simple JSON files describing the metadata for each project in a uniform way that is easy for the collaborating researchers to understand and implement.
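As a rough illustration of what such a JSON metadata file can look like, the sketch below builds a minimal Data Package style descriptor with the Python standard library. The package name, file path and field names are hypothetical and not taken from DM4T; only the overall shape (`resources`, `schema`, `fields`) follows the Data Package and Table Schema specifications.

```python
import json

# Hypothetical descriptor for one project's readings. The names, path and
# fields are illustrative assumptions, not the real DM4T metadata.
descriptor = {
    "name": "example-energy-readings",
    "title": "Example household energy readings",
    "resources": [
        {
            "name": "house-1",
            "path": "data/house_1.csv",
            "format": "csv",
            "schema": {
                "fields": [
                    {"name": "timestamp", "type": "datetime"},
                    {"name": "appliance", "type": "string"},
                    {"name": "electricity_watts", "type": "number"},
                    {"name": "temperature_celsius", "type": "number"},
                ]
            },
        }
    ],
}


def to_datapackage_json(descriptor):
    """Serialise the descriptor as it might be saved to datapackage.json."""
    return json.dumps(descriptor, indent=2, sort_keys=True)


datapackage_json = to_datapackage_json(descriptor)
print(datapackage_json)
```

Because every project would describe its tables with the same keys, a collaborator can read the field names and types without knowing whether the data originally lived in a database or in CSV files.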

Once datasets are put into the public domain, it is extremely useful to also have the metadata that describe them. This would enable people to, for example, run queries across multiple datasets. One example in our case would be to ask: “What was the energy usage of all homes in Exeter for January 2014?”. This information would be contained in datasets that are curated by different people, and so we need uniform metadata in order to be able to make these kinds of queries.

We run Clojure events and workshops twice a month as part of the Bristol Clojurians group (opens new window), so interested people can drop in and discuss the work we’re doing with Frictionless Data. I’m also planning to give a talk about Frictionless Data at one of the Clojurian events.

You can follow the development of the Clojure libraries on the Clojure Data Package library and Clojure Table Schema library GitHub repositories.

- + diff --git a/blog/2017/10/27/open-knowledge-greece/index.html b/blog/2017/10/27/open-knowledge-greece/index.html index e3b5d1fd0..b7474f4dd 100644 --- a/blog/2017/10/27/open-knowledge-greece/index.html +++ b/blog/2017/10/27/open-knowledge-greece/index.html @@ -35,7 +35,7 @@ - + @@ -109,6 +109,6 @@ Blog

Tool Fund Grantee: Open Knowledge Greece

This grantee profile features Open Knowledge Greece for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

Open Knowledge Greece (opens new window), formally appointed as the Greek Chapter of Open Knowledge International, was established in 2012 by a group of academics, developers, citizens, hackers and State representatives. We are supported by a national network of volunteers, most of whom are experienced professionals in the fields of Computer Science, Mathematics, Medicine, Journalism, Agriculture etc.
Our team consists of community members who are interested in open data, linked data technologies, coding and data journalism, and who do their best to apply scientific results to community activities.

We were very excited when we read about Frictionless Data on the Open Knowledge International blog (opens new window) and have been following the progress of the project carefully. When we saw the Frictionless Data Tool Fund had been announced (opens new window), we were certain that we wanted to be part of this project and build tools that can help in removing the friction in working with data.

People and organizations working with data are interested in analyzing, visualizing or building apps based on data, but they end up spending most of their time cleaning and preparing the data.

Frictionless Data software and specifications aim to make data ready for use. There are a lot of open data repositories in different formats, and they often include dirty and missing data, with null values, dates, currencies, units and identifiers that are represented differently and need effort to make them usable. Moreover, datasets from different sources have different licenses and are often not up to date, and inconsistencies make it difficult to include them in one analysis.

We received the Frictionless Data Tool Fund grant for the development of libraries in the R language. R is a powerful open source programming language and environment for statistical computing and graphics that is widely used among statisticians and data miners for developing statistical software and data analysis.

We are going to implement two Frictionless Data libraries in R: Table Schema, for publishing and sharing tabular-style data, and Data Package, for describing a coherent collection of data in a single package, both in keeping with the Frictionless Data specifications. Comments, forks, and pull requests are welcome in the two repositories.

Users will be able to load a data package into R in seconds so that data can be used for analysis and visualizations, invalid data can be fixed, and friction in working with data is generally removed, especially when shifting from one language to another. For example, you will be able to get and analyze a data package in R, which will in turn make visualizations in other graphical interfaces easier.

We are also delighted to be hosting Open Knowledge Festival (OKFest) in Thessaloniki, Greece from 3-6 May 2018. OKFest is expected to bring together over 1,500 people from more than 60 countries. During the four-day full program of the Festival, participants will work together to share their skills and experiences and build the very tools and partnerships that will further the power of openness as a positive force for change. Open Knowledge Greece CEO Dr. Charalampos Bratsas and the rest of the team are very excited to host the biggest gathering of the open knowledge community in our country. Look out for OKFest updates on the website and on Twitter.

Interested in learning more about OK Greece? See our Facebook page, read our blog and watch our videos on YouTube.

- + diff --git a/blog/2017/11/01/daniel-fireman/index.html b/blog/2017/11/01/daniel-fireman/index.html index f800b730e..7f3dd705b 100644 --- a/blog/2017/11/01/daniel-fireman/index.html +++ b/blog/2017/11/01/daniel-fireman/index.html @@ -35,7 +35,7 @@ - + @@ -108,6 +108,6 @@ (opens new window)

Tool Fund Grantee: Daniel Fireman

This grantee profile features Daniel Fireman for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

I was born in Maceió, a sunny coastal city in the Northeast of Brazil. It was still the 20th century when I first came into contact with an Intel 80386 and installed Conectiva Linux Guarani 3.0. A lot has happened since, for instance a bachelor’s degree in Computer Science at UFCG after three years as a research assistant in the Distributed Systems Lab (LSD). It was already the 21st century when I realized that distributed and scalable systems were the way to go. I kept on studying the field and pursued an MSc at UFMG. From there I joined Google and spent 6 happy years working at multiple offices (NYC, ZRH, BHZ). I got the chance to work on a myriad of projects, ranging from social networks to Google’s default Java HTTP/RPC server framework. Currently, I’m back at UFCG doing a Ph.D. in cloud computing performance. It is easy to find me at hackathons and other efforts to increase transparency of public data. I have also been busy working on projects like contratospublicos.info and Frictionless Data, using Go to improve data transparency in Brazil and around the world.

I started following Open Knowledge International (OKI) on Twitter (opens new window) after watching a talk from Vitor Baptista (opens new window) at UFCG. I learnt about Frictionless Data from posts by OKI and liked the overall idea a lot. I have been a Golang enthusiast for a while now, but I hadn’t thought of applying to the fund until I had a quick chat with Nazareno Andrade (opens new window) that started with Golang and ended with: “what about the Frictionless Data Tool Fund (opens new window)?”

Go has a lot to deliver in terms of approximating simplicity of reading/writing, correctness, and performance. I believe bringing the experience and solid specifications of Frictionless Data to the Go ecosystem will not only make data description, validation and processing easier and faster, but also help to decrease the distance between data analysis/processing and production serving systems, resulting in simpler and more solid infrastructure.

In the coming weeks, I hope to use the Tool Fund grant I received to bring Go’s performance and concurrency capabilities to data processing and to have a set of tools distributed as standalone and multi-platform binaries which are very easy to download and install. I am currently working on my Ph.D. and one pitfall I have come across is the use of one environment/system to collect/generate data and another to process. I will be working to alleviate this issue in order to make it easier to process tabular data in Go.

From the developer’s perspective, it is really great to use open source software. This is especially true when the community around the software fosters its usage and welcomes contributors. That ends up increasing the overall quality of the software, which benefits all users.

The source code will be hosted on GitHub in the tableschema-go and datapackage-go repositories. We are going to use issues to track development progress and next steps.

- + diff --git a/blog/2017/12/04/openml/index.html b/blog/2017/12/04/openml/index.html index 88ab3f265..9b8157434 100644 --- a/blog/2017/12/04/openml/index.html +++ b/blog/2017/12/04/openml/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

OpenML

OpenML (opens new window) is an online platform and service for machine learning, whose goal is to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. People can upload and share data sets and questions (prediction tasks) on OpenML that they then collaboratively solve using machine learning algorithms.

A brief introduction to OpenML

We offer open source tools (opens new window) to download data into your favorite machine learning environments (opens new window) and work with it. You can then upload your results back onto the platform so that others can learn from you. If you have data, you can use OpenML to get insights on what machine learning method works well to answer your question. Machine Learners can use OpenML to find interesting data sets and questions that are relevant for others and also for machine learning research (e.g. learning how algorithms behave on different types of data sets).

Users typically store their data in all kinds of formats, which makes it hard to simplify the data upload process on OpenML. Currently we only allow data in ARFF format. We are looking to make it as easy as possible for users to upload, download and work with data from OpenML while keeping the datasets in machine-readable formats and making metadata available in easy-to-read formats. We would also like to make datasets from other services available on OpenML. Most of these external sources currently contain data in varied formats, but some, e.g. data.world, have started adopting and using data packages. You can read more about data.world’s adoption and use of data packages here and here.

Learn how to upload data on OpenML in 1 minute

We first heard about the Frictionless Data project through School of Data. One of the OpenML core members is also involved in School of Data and used data packages in one of the open data workshops from School of Data Switzerland. In the coming months, we are looking to adopt Frictionless Data specifications to improve user friendliness on OpenML. We hope to make it possible for users to upload and connect datasets in data package format. This will be a great shift because it will enable people to easily build and share machine learning models trained on any dataset in the Frictionless Data ecosystem.

OpenML currently works with tabular data in Attribute-Relation File Format (ARFF) accompanied by metadata in an XML or JSON file. It is actually very similar to Frictionless Data’s tabular data package specification, but with ARFF instead of CSV.


Image of dataset list on OpenML

ARFF (Attribute-Relation File Format) is essentially a CSV file with a header that lists the names of the attributes (columns) and their data types. The latter is especially important for data analysis. For instance, say that you have a column with values 1,2,3. It is very important to know whether that is just a number (1,2,3 ice creams), a rank (1st, 2nd, 3rd place), or a category (item 1, item 2, item 3). This information is missing from CSV data. ARFF also allows multiple tables to be connected together, although we don’t really use this right now.
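To make the point about typed headers concrete, here is a minimal sketch that reads the `@attribute` lines of a small ARFF document and turns them into Table Schema style field descriptors. The sample data, the `arff_fields` helper and the type mapping are assumptions made for illustration; this is not how OpenML converts data, and quoted attribute names containing spaces are not handled.

```python
import re

# A tiny, made-up ARFF document: a typed header followed by CSV-like rows.
ARFF_TEXT = """\
@relation icecream
@attribute flavour {vanilla,chocolate,mint}
@attribute scoops numeric
@attribute comment string
@data
vanilla,2,'first order'
mint,1,'no cone'
"""


def arff_fields(text):
    """Return Table Schema style field descriptors for the @attribute lines."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if line.lower() == "@data":
            break  # the header ends where the data rows begin
        match = re.match(r"@attribute\s+(\S+)\s+(.+)", line, flags=re.IGNORECASE)
        if not match:
            continue
        name, arff_type = match.group(1), match.group(2).strip()
        if arff_type.startswith("{") and arff_type.endswith("}"):
            # Nominal attribute: a category with a fixed set of allowed values.
            values = [value.strip() for value in arff_type[1:-1].split(",")]
            fields.append(
                {"name": name, "type": "string", "constraints": {"enum": values}}
            )
        elif arff_type.lower() in ("numeric", "real"):
            fields.append({"name": name, "type": "number"})
        elif arff_type.lower() == "integer":
            fields.append({"name": name, "type": "integer"})
        else:
            # Assumption: every other ARFF type is kept as a plain string.
            fields.append({"name": name, "type": "string"})
    return fields


fields = arff_fields(ARFF_TEXT)
for field in fields:
    print(field)
```

The category versus number distinction from the ice cream example survives the conversion: `flavour` becomes a string restricted to three values, while `scoops` becomes a number.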


Image of a dataset overview on OpenML

The metadata is free-form information about the dataset. It is mostly key-value data, although some values are more structured. It is stored in our database and exported to simple JSON or XML. Here’s an example. It covers basic information (textual description of the dataset, owner, format, license, etc.) as well as statistics (number of instances, number of features, number of missing values, details about the data distribution, and results of simple machine learning algorithms run on the data), and summary statistics (mainly used for the quick overview plots).

We firmly believe that if data packages become the go-to specification for sharing data in scientific communities, accessibility to data that’s currently ‘hidden’ in data platforms and university libraries will improve vastly, and we are keen to adopt and use the specification on OpenML in the coming months.

Interested in contributing to our quest to adopt the data package specification (opens new window) as an import and export option for data on the OpenML platform? Start here (opens new window).

- + diff --git a/blog/2017/12/12/ukds/index.html b/blog/2017/12/12/ukds/index.html index 3a5bfe9b9..23936ce4c 100644 --- a/blog/2017/12/12/ukds/index.html +++ b/blog/2017/12/12/ukds/index.html @@ -38,7 +38,7 @@ - + @@ -205,6 +205,6 @@ tabulator: headers: 3 # specifying which row contains headers

# Add Data Package Views

View specs can be added to the data package to enable datahub.io to create visualisations from resource data in the data package. The views property is a list of file paths to JSON files containing view-spec compatible views.

Currently, datahub.io (opens new window) supports views written either with a ‘simple’ views-spec, or using Vega (v 2.6.5). See datahub.io docs (opens new window) for more details about the supported views-spec.

# Push to datahub.io (opens new window)

Once the harvesting pipeline has been run, the resulting data packages are pushed to datahub.io using the datahub.dump.to_datahub processor.

This creates or updates an entry for the package on datahub. If a view has been defined in the entry configuration, this will be created on the datahub.io (opens new window) entry Showcase page.

# Review

We were able to demonstrate that a data processing pipeline using Frictionless Data tools can facilitate the automated harvesting, validation and transformation of datasets, and their upload to a data package-compatible third-party service, based on a simple configuration.

# Next Steps

The pilot data package pipeline runs locally in a development environment, but given each processor has been written as a separate module, these could be used within any pipeline. datahub.io (opens new window) uses datapackage-pipelines within its infrastructure, and the processors developed for this project could be used within datahub.io (opens new window) itself to facilitate the automatic harvesting of datasets from OAI-PMH enabled data sources.

Once a pipeline is in place, it can be scheduled to run each day (or week, month, etc.). This would ensure datahub.io (opens new window) is up-to-date with data on UKDS Reshare.

Working with ‘real-world’ data from UKDS Reshare has helped to identify and prioritise improvements and future features for datahub.io (opens new window).

# Additional Resources

- + diff --git a/blog/2017/12/15/university-of-pittsburgh/index.html b/blog/2017/12/15/university-of-pittsburgh/index.html index 2a2bd5f9b..5cac356ae 100644 --- a/blog/2017/12/15/university-of-pittsburgh/index.html +++ b/blog/2017/12/15/university-of-pittsburgh/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

Western Pennsylvania Regional Data Center

# Context

One of the main goals of the Frictionless Data project is to help improve data quality by providing easy to integrate libraries and services for data validation. We have integrated data validation seamlessly with different backends like GitHub and Amazon S3 via the online service goodtables.io (opens new window), but we also wanted to explore closer integrations with other platforms.

Open Data portals are an obvious choice for that. They are still one of the main forms of dissemination of Open Data, especially for governments and other organizations. They provide a single entry point to data relating to a particular region or thematic area and provide users with tools to discover and access different datasets. On the backend, publishers also have tools available for the validation and publication of datasets.

Data quality varies widely across different portals, reflecting the publication processes and requirements of the hosting organizations. In general, it is difficult for users to assess the quality of the data and there is a lack of descriptors for the actual data fields. At the publisher level, while strong emphasis has been put on metadata standards and interoperability, publishers don’t generally have the same help or guidance when dealing with data quality or description.

We believe that data quality in Open Data portals can have a central place on both these fronts, user-centric and publisher-centric, and we started this pilot to showcase a possible implementation.

To field test our implementation we chose the Western Pennsylvania Regional Data Center (WPRDC), managed by the University of Pittsburgh Center for Urban and Social Research. The Regional Data Center made for a good pilot as the project team takes an agile approach to managing their own CKAN instance along with support from OpenGov, members of the CKAN association. As the open data repository is used by a diverse array of data publishers (including project partners Allegheny County and the City of Pittsburgh), the Regional Data Center provides a good test case for the implementation across a variety of data types and publishing processes. WPRDC is a great example of a well-managed Open Data portal, where datasets are actively maintained and the portal itself is just one component of a wider Open Data strategy. It also provides a good variety of publishers, including public sector agencies, academic institutions, and nonprofit organizations. The project’s partnership with the Digital Scholarship Services team at the University Library System also provides data management expertise not typically available in many open data implementations.

# The Work

# What Did We Do

The portal software that we chose for this pilot is CKAN (opens new window), the world’s leading open source software for Open Data portals (source (opens new window)). Open Knowledge International initially fostered the CKAN project and is now a member of the CKAN Association (opens new window).

We created ckanext-validation (opens new window), a CKAN extension that provides a low level API and readily available features for data validation and reporting that can be added to any CKAN instance. This is powered by goodtables (opens new window), a library developed by Open Knowledge International to support the validation of tabular datasets.

The extension allows users to perform data validation against any tabular resource, such as CSV or Excel files. This generates a report that is stored against a particular resource, describing issues found with the data, both at the structural level (missing headers, blank rows, etc) and at the data schema level (wrong data types, values out of range etc).
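The structural checks mentioned above can be sketched with the Python standard library. This is not the goodtables or ckanext-validation API; `structural_report` is a hypothetical helper, and its labels simply mirror the error types listed in this post.

```python
import csv
import io


def structural_report(csv_text):
    """Return (error_type, row_number) tuples for structural issues in a CSV string."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    errors = []
    if not rows:
        return errors

    # Header checks: blank or repeated column names.
    headers = rows[0]
    seen_headers = set()
    for header in headers:
        if not header.strip():
            errors.append(("blank-header", 1))
        elif header in seen_headers:
            errors.append(("duplicate-header", 1))
        seen_headers.add(header)

    # Row checks: blank rows, repeated rows, and rows of the wrong width.
    seen_rows = set()
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            errors.append(("blank-row", number))
            continue
        key = tuple(row)
        if key in seen_rows:
            errors.append(("duplicate-row", number))
        seen_rows.add(key)
        if len(row) > len(headers):
            errors.append(("extra-value", number))
        elif len(row) < len(headers):
            errors.append(("missing-value", number))
    return errors


SAMPLE = "id,name,,name\n1,a,,x\n\n1,a,,x\n2,b,,y,z\n"
report = structural_report(SAMPLE)
for error_type, row_number in report:
    print(f"row {row_number}: {error_type}")
```

A real report also carries messages and column positions; the sketch only shows that these checks need no knowledge of what the data means, which is why they can run on any tabular resource.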


data validation on CKAN made possible by ckanext-validation extension

This provides a good overview of the quality of the data to users but also to publishers so they can improve the quality of the data file by addressing these issues. The reports can be easily accessed via badges that provide a quick visual indication of the quality of the data file.


badges indicating quality of data files on CKAN

There are two default modes for performing the data validation when creating or updating resources. Data validation can be performed automatically and asynchronously in the background, or as part of the dataset creation in the user interface. In the latter case the validation will be performed immediately after uploading or linking to a new tabular file, giving quick feedback to publishers.


data validation on upload or linking to a new tabular file on CKAN

The extension adds functionality to provide a schema for the data that describes the expected fields and types as well as other constraints, allowing validation of the actual contents of the data. The schema is also stored with the resource metadata, so it can be displayed in the UI or accessed via the API.
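A schema of this kind, and the content checks it enables, might look like the following sketch. The field names are invented, and `check_row` is a simplified stand-in written for illustration rather than the validation code used by the extension; only the descriptor keys (`fields`, `type`, `constraints`, `required`, `minimum`, `enum`) come from the Table Schema specification.

```python
# Hypothetical Table Schema descriptor for a three-column file.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"required": True}},
        {"name": "amount", "type": "number", "constraints": {"minimum": 0}},
        {
            "name": "status",
            "type": "string",
            "constraints": {"enum": ["open", "closed"]},
        },
    ]
}

# Simplified casting: only the three types used above are supported.
CASTS = {"integer": int, "number": float, "string": str}


def check_row(row, schema=SCHEMA):
    """Return a list of problems found in one row (a dict of raw strings)."""
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        constraints = field.get("constraints", {})
        raw = row.get(name, "")
        if raw == "":
            if constraints.get("required"):
                problems.append(f"{name}: required value is missing")
            continue
        try:
            value = CASTS[field["type"]](raw)
        except ValueError:
            problems.append(f"{name}: {raw!r} is not a valid {field['type']}")
            continue
        if "minimum" in constraints and value < constraints["minimum"]:
            problems.append(f"{name}: {value} is below the minimum")
        if "enum" in constraints and value not in constraints["enum"]:
            problems.append(f"{name}: {value!r} is not an allowed value")
    return problems


print(check_row({"id": "1", "amount": "2.5", "status": "open"}))
print(check_row({"id": "", "amount": "-1", "status": "pending"}))
```

The first row passes, while the second reports a missing required value, an out-of-range number and a value outside the allowed set, which are the kinds of schema-level issues described earlier.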

The extension also provides some utility commands for CKAN maintainers, including the generation of reports (opens new window) showing the number of valid and invalid tabular files, a breakdown of the error types and links to the individual resources. This gives maintainers a snapshot of the general quality of the data hosted in their CKAN instance at any given moment in time.

As mentioned before, we field tested the validation extension on the Western Pennsylvania Regional Data Center (WPRDC). At the time of the import, the portal hosted 258 datasets. Of these, 221 datasets had tabular resources, totalling 626 files (mainly CSV and XLSX files). Bearing in mind that we ran only the default validation, which includes structural checks but not schema-based ones, these are the results:

466 resources - validation success

156 resources - validation failure

4 resources - validation error

The four validation errors are due to current limitations of the validation extension when handling large files.

Here’s a breakdown of the formats:

Format    Valid resources    Invalid / Errored resources
CSV       443                64
XLSX      21                 57
XLS       2                  39
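As a quick cross-check of these figures, the short calculation below recomputes the totals and the share of valid files per format. The counts are copied from the table above; the variable and function names are only illustrative.

```python
# Counts copied from the breakdown of formats above.
RESULTS = {
    "CSV": {"valid": 443, "invalid_or_errored": 64},
    "XLSX": {"valid": 21, "invalid_or_errored": 57},
    "XLS": {"valid": 2, "invalid_or_errored": 39},
}

total_valid = sum(counts["valid"] for counts in RESULTS.values())
total_failed = sum(counts["invalid_or_errored"] for counts in RESULTS.values())
total = total_valid + total_failed


def valid_share(counts):
    """Percentage of files of one format that passed validation."""
    return 100 * counts["valid"] / (counts["valid"] + counts["invalid_or_errored"])


for fmt, counts in RESULTS.items():
    print(f"{fmt}: {valid_share(counts):.1f}% valid")
print(f"All formats: {100 * total_valid / total:.1f}% valid ({total_valid} of {total})")
```

The totals match the 466 successes and the 160 failures and errors (156 + 4) reported above, out of 626 files. Roughly 87% of CSV files pass the structural checks, compared with about 27% of XLSX files and 5% of XLS files.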

And of the error types (more information about each error type can be found in the Data Quality Specification (opens new window)):

Type of Error       Error Count
Blank row           19654
Duplicate row       810
Blank header        299
Duplicate header    270
Source error        30
Extra value         11
Format error        9
HTTP error          2
Missing value       1

The highest number of errors is caused by blank and duplicate rows. These are generally the result of Excel adding extra rows at the end of the file or of publishers formatting the files for human rather than machine consumption. Examples of this include adding a title in the first cell (as in this case: portal page | file) or even more complex layouts (portal page | file), with logos and links. Blank and duplicate header errors, as in this case (portal page | file), are also normally caused by Excel storing extra empty columns, something that cannot be noticed directly from Excel.

These errors are easy to spot and fix manually once the file has been opened for inspection, but this is still an extra step that data consumers need to perform before using the data in their own processes. They are also errors that could easily be fixed automatically as part of a data cleanup step before publication. Perhaps this is something that could be developed in the validation extension in the future.
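One possible shape for such an automatic cleanup is sketched below using only the Python standard library. It is an illustration of the idea rather than a feature of the validation extension: `clean_csv` is a hypothetical helper, and dropping exact duplicate rows is an assumption that will not suit every dataset.

```python
import csv
import io


def clean_csv(csv_text):
    """Drop blank rows, fully empty columns and exact duplicate rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))

    # Remove rows in which every cell is empty (for example, Excel padding).
    rows = [row for row in rows if any(cell.strip() for cell in row)]
    if not rows:
        return ""

    # Pad short rows, then keep only the columns that contain something.
    width = max(len(row) for row in rows)
    padded = [row + [""] * (width - len(row)) for row in rows]
    keep = [i for i in range(width) if any(row[i].strip() for row in padded)]

    # Remove exact duplicates while preserving the original order.
    seen = set()
    cleaned = []
    for row in padded:
        trimmed = tuple(row[i] for i in keep)
        if trimmed in seen:
            continue
        seen.add(trimmed)
        cleaned.append(trimmed)

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(cleaned)
    return out.getvalue()


messy = "id,name,,\n1,a,,\n,,,\n1,a,,\n2,b,,\n\n"
print(clean_csv(messy))
```

A column with a blank header but real data is kept on purpose, since deciding what to call it needs a human.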

Other less common errors include Source errors, which prevented the file from being read by goodtables, such as encoding issues, HTTP responses, or HTML files incorrectly marked as Excel files (as in this case: portal page | file). Extra value errors are generally caused by not properly quoting fields that contain commas, thus breaking the parser (example: portal page | file).
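The quoting problem behind Extra value errors is easy to reproduce. In this small sketch, with a made-up address record, joining values with commas by hand yields one more column than intended, while Python's `csv.writer` quotes the field that contains a comma.

```python
import csv
import io

# Made-up record whose first field contains a comma.
record = ["1200 Main St, Suite 4", "Pittsburgh", "PA"]

# Naive export: the comma inside the address is read as a delimiter.
naive_line = ",".join(record)
naive_parsed = next(csv.reader([naive_line]))

# csv.writer quotes fields that contain the delimiter by default.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(record)
quoted_line = buffer.getvalue().rstrip("\n")
quoted_parsed = next(csv.reader([quoted_line]))

print(len(naive_parsed), naive_line)
print(len(quoted_parsed), quoted_line)
```

The naive line parses into four values for a three-column record, which is exactly what a validator reports as an extra value; the quoted line round-trips to the original three fields.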

Format errors are caused by incorrectly labelling the format of the hosted file, for instance CSV when it links to an Excel file (portal page | file), CSV linking to HTML (portal page | file) or XLS linking to XLSX (portal page | file). These are all easily fixed at the metadata level.

Finally, HTTP errors just show that the linked file hosted elsewhere does not exist or has been moved.

Again, it is important to stress that the checks performed are just basic and structural checks (opens new window) that affect the general availability of the file and its general structure. The addition of standardized schemas would allow for a more thorough and precise validation, checking the data contents and ensuring that this is what was expected.

It is also interesting to note that WPRDC follows the excellent practice of publishing data dictionaries describing the contents of the data files. These are generally published in CSV format and they themselves can present validation errors as well. As we saw before, using the validation extension we can assign a schema defined in the Table Schema spec to a resource. This will be used during the validation, but the information could also be used to render it nicely on the UI or export it consistently as a CSV or PDF file.

All the generated reports can be further analyzed using the output files stored in this repository (opens new window).

Additionally, to help browse the validation reports created from the WPRDC site we have set up a demo site that mirrors the datasets, organizations and groups hosted there (at the time we did the import).

All tabular resources have the validation report attached, which can be accessed by clicking on the data valid / invalid badges.

# Next Steps

# Areas for further work

The validation extension for CKAN currently provides a very basic workflow for validation at creation and update time: if the validation fails in any way, you are not allowed to create or edit the dataset. Maintainers can define a set of default validation options to make it more permissive, but even so some publishers probably wouldn’t want to enforce all validation checks before allowing the creation of a dataset, or might want to apply validation only to datasets from a particular organization or type. Of course the underlying API is available for extension developers to implement these workflows, but the validation extension itself could provide some of them.

The user interface for defining the validation options can definitely be improved, and we are planning to integrate a Schema Creator to make it easier for publishers to describe their data with a schema based on the actual fields in the file. If the resource has a schema assigned, this information can be presented nicely on the UI to the users and exported in different formats.

The validation extension is a first iteration to demonstrate the capabilities of integrating data validation directly into CKAN, but we are keen to know about different ways in which this could be expanded or integrated in other workflows, so any feedback or thoughts are appreciated.

# Additional Resources

- + diff --git a/blog/2017/12/19/dm4t/index.html b/blog/2017/12/19/dm4t/index.html index 49491864d..d24c84d02 100644 --- a/blog/2017/12/19/dm4t/index.html +++ b/blog/2017/12/19/dm4t/index.html @@ -38,7 +38,7 @@ - + @@ -348,6 +348,6 @@ list(storage.write('refit-cleaned', 'house-1', resource.read(keyed=True), ['Unix']))

Now we are able to check that our documents are indexed:

$ http http://localhost:9200/_cat/indices?v
 

# Getting insight from data using Kibana

To demonstrate how the Frictionless Data specs and software empower the usage of other analytics tools, we will use the ElasticSearch/Kibana project. In the previous step we imported our data package into an ElasticSearch cluster. This allows us to visualize the data using a simple UI:

screenshot of elasticsearch cluster

In this screenshot we see the distribution of the average electricity consumption. This is just an example of what you can do by having the ability to easily load datasets into other analytical software.

# Review

# The results

In this pilot, we have been able to demonstrate the following:

  • Packaging the refit-cleaned dataset as a data package using the Data Package Pipelines library
  • Validating the data package using the Goodtables library
  • Modifying data packages metadata using the Packagist UI
  • Uploading the dataset to Amazon S3 and ElasticSearch cluster using Frictionless Data tools
  • Reading and analysing in Python the created Data Package using the Frictionless Data library

# Current limitations

The central challenge of working with these datasets is their size. Publishing the results of these research projects as flat files for immediate analysis is beneficial; however, the scale of each of these datasets (gigabytes of data, millions of rows) is a challenge to deal with no matter how you store them. Processing this data through Data Package Pipelines takes a long time.

# Next Steps

  • Improve the speed of the data package creation step

# Find Out More

# Source Material

- + diff --git a/blog/2018/02/14/creating-tabular-data-packages-in-r/index.html b/blog/2018/02/14/creating-tabular-data-packages-in-r/index.html index 7f965afd4..655e80b3d 100644 --- a/blog/2018/02/14/creating-tabular-data-packages-in-r/index.html +++ b/blog/2018/02/14/creating-tabular-data-packages-in-r/index.html @@ -33,7 +33,7 @@ - + @@ -191,6 +191,6 @@ ## } ##

# Publishing

Now that you have created your Data Package, you might want to publish your data online so that you can share it with others.

Now that you have created a data package in R, find out how to use data packages in R in this tutorial.

- + diff --git a/blog/2018/02/14/using-data-packages-in-r/index.html b/blog/2018/02/14/using-data-packages-in-r/index.html index bacf0638d..2f804403e 100644 --- a/blog/2018/02/14/using-data-packages-in-r/index.html +++ b/blog/2018/02/14/using-data-packages-in-r/index.html @@ -33,7 +33,7 @@ - + @@ -156,6 +156,6 @@ ## 8 8 O Oxygen 15.999400 nonmetal ## 9 9 F Fluorine 18.998403 halogen

You can find more about using databases and SQLite in R in the vignettes of the DBI (opens new window) and RSQLite (opens new window) packages.

We welcome your feedback and questions via our Frictionless Data Gitter chat (opens new window) or via GitHub issues (opens new window) on the datapackage-r (opens new window) repository.

- + diff --git a/blog/2018/02/16/using-data-packages-in-go/index.html b/blog/2018/02/16/using-data-packages-in-go/index.html index 4e18b0dec..9fceeaa0b 100644 --- a/blog/2018/02/16/using-data-packages-in-go/index.html +++ b/blog/2018/02/16/using-data-packages-in-go/index.html @@ -33,7 +33,7 @@ - + @@ -195,6 +195,6 @@ {Number:3 Symbol:Li Name:Lithium Mass:6.941 Metal:alkali metal} {Number:4 Symbol:Be Name:Beryllium Mass:9.012182 Metal:alkaline earth metal}

And our code is ready to deal with the growth of the periodic table in a very memory-efficient way 😃

We welcome your feedback and questions via our Frictionless Data Gitter chat (opens new window) or via GitHub issues (opens new window) on the datapackage-go repository.

- + diff --git a/blog/2018/03/07/well-packaged-datasets/index.html b/blog/2018/03/07/well-packaged-datasets/index.html index 1843ffec9..b071cb69c 100644 --- a/blog/2018/03/07/well-packaged-datasets/index.html +++ b/blog/2018/03/07/well-packaged-datasets/index.html @@ -38,7 +38,7 @@ - + @@ -130,6 +130,6 @@ dataresource2.csv datapackage.json
  • data/: All data files are contained in this folder. In our example, there is only one: data/gdp.csv .

  • datapackage.json: This file describes the dataset’s metadata. For example, what the dataset is, where its files are, what they contain, what each column means (for tabular data), what the source, license, and authors are, and so on. As it’s a machine-readable specification, other software can import and validate your files.
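To make the role of datapackage.json more concrete, here is a minimal sketch in Python of a descriptor for the layout above. Only the path data/gdp.csv comes from the example; the package name, title, license, and field names are illustrative assumptions, and a real descriptor would carry whatever metadata applies to your dataset.

```python
import json

# Hypothetical descriptor for the example layout (data/gdp.csv).
# All metadata values and field names below are illustrative assumptions.
descriptor = {
    "name": "gdp",
    "title": "Gross domestic product by country",
    "licenses": [
        {"name": "CC-BY-4.0", "title": "Creative Commons Attribution 4.0"}
    ],
    "resources": [
        {
            "name": "gdp",
            "path": "data/gdp.csv",
            "schema": {
                "fields": [
                    {"name": "country", "type": "string"},
                    {"name": "year", "type": "year"},
                    {"name": "gdp", "type": "number"},
                ]
            },
        }
    ],
}


def to_datapackage_json(descriptor):
    """Serialise the descriptor as it would be stored in datapackage.json."""
    return json.dumps(descriptor, indent=2)


print(to_datapackage_json(descriptor))
```

Because the descriptor is plain JSON, any tool that understands the Data Package specification can read the resource paths and the schema without knowing how the file was produced.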

Congratulations! You have now created a schema for your data, and combined it with descriptive metadata and your data collection to create your first data package!

- + diff --git a/blog/2018/03/12/automatically-validated-tabular-data/index.html b/blog/2018/03/12/automatically-validated-tabular-data/index.html index cd7eb5745..e6d11c71e 100644 --- a/blog/2018/03/12/automatically-validated-tabular-data/index.html +++ b/blog/2018/03/12/automatically-validated-tabular-data/index.html @@ -38,7 +38,7 @@ - + @@ -116,6 +116,6 @@ field-guide

One-off validation of your tabular datasets can be hectic, especially where plenty of published data is maintained and updated fairly regularly.

Running continuous checks on data provides regular feedback and contributes to better data quality as errors can be flagged and fixed early on. This section introduces you to tools that continually check your data for errors and flag content and structural issues as they arise. By eliminating the need to run manual checks on tabular datasets every time they are updated, they make your data workflow more efficient.

In this section, you will learn how to set up automatic tabular data validation using goodtables, so your data is validated every time it’s updated. Although not strictly necessary, it’s useful to know about Data Packages and Table Schema before proceeding, as they allow you to describe your data in more detail, enabling more advanced validations.

We will show how to set up automated tabular data validations for data published on:

If you don’t use any of these platforms, you can still set up the validation using goodtables-py (opens new window); it will just require some technical knowledge.

If you do use one of these platforms, the data validation report will look like this:

Figure 1: Goodtables.io (opens new window) tabular data validation report.

# Validate tabular data automatically on CKAN

CKAN (opens new window) is an open source platform for publishing data online. It is widely used across the planet, including by the federal governments of the USA, United Kingdom, Brazil, and others.

To automatically validate tabular data on CKAN, enable the ckanext-validation (opens new window) extension, which uses goodtables to run continuous checks on your data. The ckanext-validation (opens new window) extension:

  • Adds a badge next to each dataset showing the status of its validation (valid or invalid), and
  • Allows users to access the validation report, making it possible for errors to be identified and fixed.

Figure 2: Annotated in red, automated validation checks on datasets in CKAN.

The installation and usage instructions for the ckanext-validation (opens new window) extension are available on GitHub (opens new window).

# Validate tabular data automatically on GitHub

If your data is hosted on GitHub, you can use the goodtables web service to automatically validate it on every change.

For this section, you will first need to create a GitHub repository (opens new window) and add tabular data to it.

Once you have tabular data in your GitHub repository:

  1. Log in to goodtables.io (opens new window) using your GitHub account and accept the permissions confirmation.
  2. Once we’ve synchronized your repository list, go to the Manage Sources (opens new window) page and enable the repository with the data you want to validate.
    • If you can’t find the repository, try clicking on the Refresh button on the Manage Sources page

Goodtables will then validate all tabular data files (CSV, XLS, XLSX, ODS) and data packages (opens new window) in the repository. These validations will be executed on every change, including pull requests.
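To preview which files in a local checkout fall into the categories listed above, a short Python sketch like the following can help. The function name is hypothetical, and matching purely on file extension and on the datapackage.json file name is an assumption made for illustration, not a description of how the service selects files.

```python
from pathlib import Path

# Extensions taken from the list in the text; matching on the extension
# alone is an assumption made for this sketch.
TABULAR_EXTENSIONS = {".csv", ".xls", ".xlsx", ".ods"}


def find_validation_targets(repo_root):
    """Return relative paths of tabular files and datapackage.json descriptors."""
    root = Path(repo_root)
    targets = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        is_tabular = path.suffix.lower() in TABULAR_EXTENSIONS
        is_descriptor = path.name == "datapackage.json"
        if is_tabular or is_descriptor:
            targets.append(path.relative_to(root).as_posix())
    return sorted(targets)
```

Calling `find_validation_targets(".")` from the root of a checkout lists the candidate files, which is a quick way to spot data files that were committed in an unexpected format.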

# Validate tabular data automatically on Amazon S3

If your data is hosted on Amazon S3, you can use goodtables.io (opens new window) to automatically validate it on every change.

It is a technical process to set up, as you need to know how to configure your Amazon S3 bucket. However, once it’s configured, the validations happen automatically on any tabular data created or updated. Find the detailed instructions here (opens new window).

# Custom setup of automatic tabular data validation

If you don’t use any of the officially supported data publishing platforms, you can use goodtables-py (opens new window) directly to validate your data. This is the most flexible option, as you can configure exactly when and how your tabular data is validated. For example, if your data comes from an external source, you could validate it once before you process it (so you catch errors in the source data), and once after cleaning, just before you publish it, so you catch errors introduced by your data processing.
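The two-checkpoint idea can be sketched in Python as follows. To keep the example self-contained, a small standard-library function stands in for the validator; in a real setup that role would be played by goodtables-py. The function names, the cleaning step, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def structural_errors(csv_text):
    """Stand-in validator: report blank rows and rows whose length differs from the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [(1, "missing header")]
    header = rows[0]
    errors = []
    # Row numbers are 1-based and count the header as row 1.
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            errors.append((number, "blank row"))
        elif len(row) != len(header):
            errors.append(
                (number, "expected %d values, found %d" % (len(header), len(row)))
            )
    return errors


def clean(csv_text):
    """Hypothetical processing step: drop blank lines from the source data."""
    lines = [line for line in csv_text.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"


# Made-up source data with one blank row and one row with an extra value.
source = "id,name\n1,Hydrogen\n\n2,Helium,extra\n"

before = structural_errors(source)   # checkpoint 1: errors in the source data
cleaned = clean(source)
after = structural_errors(cleaned)   # checkpoint 2: errors remaining before publishing

print("before cleaning:", before)
print("after cleaning:", after)
```

Running the check at both points separates problems inherited from the source from problems that the processing step introduced or failed to fix, which is the distinction the paragraph above describes.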

The instructions on how to do this are technical, and can be found on https://github.com/frictionlessdata/goodtables-py (opens new window).

- + diff --git a/blog/2018/03/12/data-publication-workflow-example/index.html b/blog/2018/03/12/data-publication-workflow-example/index.html index 9c02c6601..a0f1e30dc 100644 --- a/blog/2018/03/12/data-publication-workflow-example/index.html +++ b/blog/2018/03/12/data-publication-workflow-example/index.html @@ -38,7 +38,7 @@ - + @@ -120,6 +120,6 @@ atomic mass is 26.9815386, it does ensure you that all atomic mass values are
numbers, among the other validations.

Now that we’ve created a data package, described our data with a table schema,
and validated it, we can finally publish it.

# Step 3. Publish the data

Our final step is to publish the dataset. The specific instructions will vary depending on where you’re publishing to. In this example, we’ll see how to publish to a public CKAN (opens new window) instance, the Datahub (opens new window). If you want to use it and don’t have an account yet, you can request one via our community page (opens new window). (Note: this example is now out of date. See the CKAN docs (opens new window) for more updated information). Let’s start.

After you’re logged in, go to the datasets list page (opens new window) and click on the Import Data Package button. On this form, click on “Upload”, select the datapackage.json file we created in the previous step, and choose your organisation. We’ll keep the visibility as private for now, so we can review the dataset before it’s made public.

Importing a data package to the DataHub

If you don’t see the “Import Data Package” button in your CKAN instance, install the ckanext-datapackager (opens new window) extension to add support for importing and exporting your datasets as data packages.

You will be redirected to the newly created dataset on CKAN, with its metadata and resource extracted from the data package. Double check if everything seems fine, and when you’re finished, click on the “Manage” button and change the visibility to “Public”.

Data package in CKAN (opens new window)

That’s it! CKAN supports data packages via the ckanext-datapackager (opens new window) extension, so importing (and exporting) data packages is trivial, as all the work on describing the dataset was done while creating the data package.

- + diff --git a/blog/2018/03/27/applying-licenses/index.html b/blog/2018/03/27/applying-licenses/index.html index fa7417a35..a0dead780 100644 --- a/blog/2018/03/27/applying-licenses/index.html +++ b/blog/2018/03/27/applying-licenses/index.html @@ -33,7 +33,7 @@ - + @@ -172,6 +172,6 @@ "title": "Creative Commons Attribution Share-Alike 4.0" }]

# License may become legally binding

The specification (opens new window) for licenses states:

This property is not legally binding and does not guarantee the package is licensed under the terms defined in this property.

A data package may be uploaded to a data platform and the licenses applied to the data resources may be publicly displayed. This may make, or give the perception that, the license is legally binding. Please check your specific situation before publishing the data.

# Software may not fully support the Frictionless Data specification

Be aware that some data platforms or software may not fully support the Frictionless Data specification. This may result in license information being lost or other issues. Always test your data publication to ensure you communicate the correct license information.

For example, at the time of writing:

- + diff --git a/blog/2018/04/04/creating-tabular-data-packages-in-javascript/index.html b/blog/2018/04/04/creating-tabular-data-packages-in-javascript/index.html index fd4149ead..038d26487 100644 --- a/blog/2018/04/04/creating-tabular-data-packages-in-javascript/index.html +++ b/blog/2018/04/04/creating-tabular-data-packages-in-javascript/index.html @@ -30,7 +30,7 @@ - + @@ -107,6 +107,6 @@

This tutorial will show you how to install the JavaScript libraries for working with Data Packages and Table Schema, load a CSV file, infer its schema, and write a Tabular Data Package.

# Setup

For this tutorial we will need datapackage-js (opens new window) which is a JavaScript library for working with Data Packages.

Using Node Package Manager (npm), install the latest version of datapackage-js by entering the following into your command line:

npm install datapackage@latest
 

Run the datapackage --help command to find out all options available to you.

# Creating a package

The basic building block of a data package is the datapackage.json file. It contains the schema and metadata of your data collections.

Now that the node package for working with data packages has been installed, create a directory for your project, and use the command datapackage infer path/to/file.csv to generate a schema for your dataset. To save this file in the directory for editing and sharing, simply append > datapackage.json to the command above, like so:

datapackage infer path/to/file.csv > datapackage.json
 

This creates a datapackage.json file in this directory.
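To give a feel for what schema inference involves, here is a deliberately simplified sketch. It is written in Python using only the standard library, and it is not the algorithm used by datapackage-js: the type-guessing rules, the function names, and the sample data are assumptions made for illustration, and the real tool recognises more types and formats.

```python
import csv
import io
import json


def infer_type(values):
    """Guess a Table Schema type name for a column of string values."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "number"
    return "string"


def infer_descriptor(csv_text, path):
    """Build a minimal descriptor; assumes a header row and rows of equal length."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    fields = [
        {"name": name, "type": infer_type([row[index] for row in body])}
        for index, name in enumerate(header)
    ]
    return {"resources": [{"path": path, "schema": {"fields": fields}}]}


# Illustrative sample data, not taken from the tutorial.
sample = "number,symbol,mass\n1,H,1.008\n2,He,4.0026\n"
print(json.dumps(infer_descriptor(sample, "path/to/file.csv"), indent=2))
```

Inferred types are only a starting point, so it is worth opening the generated datapackage.json and correcting any field whose type was guessed wrongly.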

# Publishing

Now that you have created your Data Package, you might want to publish your data online so that you can share it with others.

- + diff --git a/blog/2018/04/05/joining-tabular-data-in-python/index.html b/blog/2018/04/05/joining-tabular-data-in-python/index.html index 017214b35..cf15ae19f 100644 --- a/blog/2018/04/05/joining-tabular-data-in-python/index.html +++ b/blog/2018/04/05/joining-tabular-data-in-python/index.html @@ -30,7 +30,7 @@ - + @@ -141,6 +141,6 @@ f.write(dp.to_json()) # dp.save("real_gdp.zip")
- + diff --git a/blog/2018/04/06/joining-data-in-python/index.html b/blog/2018/04/06/joining-data-in-python/index.html index 516204115..179caaefd 100644 --- a/blog/2018/04/06/joining-data-in-python/index.html +++ b/blog/2018/04/06/joining-data-in-python/index.html @@ -33,7 +33,7 @@ - + @@ -150,6 +150,6 @@ new_dp.commit() new_dp.save('datapackage.zip')

We can now quickly render this GeoJSON file into a choropleth map (opens new window) using QGIS (opens new window):

GDP Map Example

Or we can rely on GitHub to render our GeoJSON for us. When you click a country, its property list will show up featuring “ADMIN”, “ISO_A3”, and the newly added “GDP (2014)” property.
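The step that produces the extra property can be pictured with a small standard-library sketch. This is not the code used earlier in the tutorial: the function name, the placeholder country, and the GDP figure are made up for illustration. Only the property names “ADMIN”, “ISO_A3”, and “GDP (2014)” come from the text.

```python
import json


def add_gdp(geojson, gdp_by_iso, year=2014):
    """Attach a 'GDP (<year>)' property to each feature, matched on ISO_A3.

    Features without a matching code get None. The input is modified in place.
    """
    key = "GDP (%d)" % year
    for feature in geojson["features"]:
        iso_code = feature["properties"].get("ISO_A3")
        feature["properties"][key] = gdp_by_iso.get(iso_code)
    return geojson


# Placeholder feature collection; a real file would carry country geometries.
countries = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"ADMIN": "Exampleland", "ISO_A3": "XXX"},
            "geometry": None,
        }
    ],
}

joined = add_gdp(countries, {"XXX": 123.4})  # made-up GDP value
print(json.dumps(joined["features"][0]["properties"], indent=2))
```

Keying the join on the ISO code rather than the country name avoids mismatches caused by spelling variants between the two datasets.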

- + diff --git a/blog/2018/04/28/using-data-packages-in-java/index.html b/blog/2018/04/28/using-data-packages-in-java/index.html index ecf06bad9..787fe9b14 100644 --- a/blog/2018/04/28/using-data-packages-in-java/index.html +++ b/blog/2018/04/28/using-data-packages-in-java/index.html @@ -33,7 +33,7 @@ - + @@ -155,6 +155,6 @@ $ mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V $ mvn test -B

Make sure that all tests pass, and submit a PR with your contributions once you’re ready.

We also welcome your feedback and questions via our Frictionless Data Gitter chat (opens new window) or via GitHub issues (opens new window) on the datapackage-java repository.

- + diff --git a/blog/2018/05/07/using-data-packages-in-clojure/index.html b/blog/2018/05/07/using-data-packages-in-clojure/index.html index 2d0d3cde2..8c9e531f7 100644 --- a/blog/2018/05/07/using-data-packages-in-clojure/index.html +++ b/blog/2018/05/07/using-data-packages-in-clojure/index.html @@ -33,7 +33,7 @@ - + @@ -151,6 +151,6 @@ {::number 3 ::symbol "Li" ::name "Lithium" ::mass 6.941 ::metal "alkali gas"} {::number 4 ::symbol "Be" ::name "Beryllium" ::mass 9.012182 ::metal "alkaline earth metal"}

This concludes our simple tutorial for using the Clojure libraries for Frictionless Data.

We welcome your feedback and questions via our Frictionless Data Gitter chat (opens new window) or via GitHub issues (opens new window) on the datapackage-clj (opens new window) repository.

- + diff --git a/blog/2018/07/09/csv/index.html b/blog/2018/07/09/csv/index.html index 0072c85b2..3caf34fb6 100644 --- a/blog/2018/07/09/csv/index.html +++ b/blog/2018/07/09/csv/index.html @@ -30,7 +30,7 @@ - + @@ -158,6 +158,6 @@

Credit for these fixups to contributors on this question on StackExchange (opens new window) and to James Smith (opens new window).

- + diff --git a/blog/2018/07/09/developer-guide/index.html b/blog/2018/07/09/developer-guide/index.html index 91aa3b872..8fa26566d 100644 --- a/blog/2018/07/09/developer-guide/index.html +++ b/blog/2018/07/09/developer-guide/index.html @@ -30,7 +30,7 @@ - + @@ -149,6 +149,6 @@ # import into Google BigQuery import_datapackage_into_bigquery(pathToDataPackage, bigQueryInfo)

# Examples

# Python

The main Python library for working with Data Packages is datapackage:

See http://github.com/frictionlessdata/datapackage-py (opens new window)

Additional functionality such as TS and TS integration:

tabulator is a utility library that provides a consistent interface for reading tabular data:

https://github.com/frictionlessdata/tabulator-py (opens new window)

Here’s an overview of the Python libraries available and how they fit together:

How the different tableschema libraries in Python fit together

# JavaScript

Following the “Node” style, we have partitioned the JavaScript library into pieces; see this list of libraries:

# SQL Integration

Here’s a walk-through (opens new window) of the SQL integration for Table Schema (opens new window) written in Python. This integration allows you to generate SQL tables, load and extract data based on Table Schema (opens new window) descriptors.
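As a rough illustration of the idea, and not of the integration’s actual API, the following Python sketch turns a Table Schema descriptor into a SQLite table using only the standard library. The type mapping, the function name, and the table name are assumptions chosen for this example.

```python
import sqlite3

# Assumed mapping from Table Schema types to SQLite column types;
# anything not listed falls back to TEXT.
SQL_TYPES = {
    "integer": "INTEGER",
    "number": "REAL",
    "boolean": "INTEGER",
    "string": "TEXT",
}


def quote(identifier):
    """Quote an SQL identifier, escaping embedded double quotes."""
    return '"%s"' % identifier.replace('"', '""')


def create_table_sql(table, schema):
    """Build a CREATE TABLE statement from a Table Schema descriptor."""
    columns = []
    for field in schema["fields"]:
        sql_type = SQL_TYPES.get(field.get("type", "string"), "TEXT")
        columns.append("%s %s" % (quote(field["name"]), sql_type))
    primary_key = schema.get("primaryKey")
    if primary_key:
        # primaryKey may be a single field name or a list of names.
        keys = [primary_key] if isinstance(primary_key, str) else primary_key
        columns.append("PRIMARY KEY (%s)" % ", ".join(quote(k) for k in keys))
    return "CREATE TABLE %s (%s)" % (quote(table), ", ".join(columns))


schema = {
    "fields": [
        {"name": "number", "type": "integer"},
        {"name": "symbol", "type": "string"},
        {"name": "mass", "type": "number"},
    ],
    "primaryKey": "number",
}

connection = sqlite3.connect(":memory:")
connection.execute(create_table_sql("elements", schema))
connection.execute('INSERT INTO "elements" VALUES (?, ?, ?)', (3, "Li", 6.941))
rows = connection.execute('SELECT symbol, mass FROM "elements"').fetchall()
print(rows)
```

The descriptor drives both the column types and the primary key, so the same schema file that documents the data also defines how it is stored.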

Related blog post: http://okfnlabs.org/blog/2017/10/05/frictionless-data-specs-v1-updates.html (opens new window)

- + diff --git a/blog/2018/07/09/validating-data/index.html b/blog/2018/07/09/validating-data/index.html index 6a07c3900..cb4a068f2 100644 --- a/blog/2018/07/09/validating-data/index.html +++ b/blog/2018/07/09/validating-data/index.html @@ -30,7 +30,7 @@ - + @@ -115,6 +115,6 @@ report = validate('data.csv', schema='schema.json', order_fields=True) ...

# Continuous Data Validation

In a bid to streamline the process of data validation and ensure seamless integration is possible in different publishing workflows, we have set up a continuous data validation hosted service that builds on top of Frictionless Data libraries. goodtables.io (opens new window) provides support for different backends. At this time, users can use it to check any datasets hosted on GitHub and Amazon S3 buckets, automatically running validation against data files every time they are updated, and providing a user friendly report of any issues found.

Data Valid

Start your continuous data validation here: https://goodtables.io (opens new window)

Blog post on goodtables python library and goodtables web service: http://okfnlabs.org/blog/2017/05/22/introducing-the-new-goodtables-library-and-goodtablesio.html (opens new window)

See the README.md for more information.

Find more examples on validating tabular data in the Frictionless Data Field Guide

- + diff --git a/blog/2018/07/16/oleg-lavrovsky/index.html b/blog/2018/07/16/oleg-lavrovsky/index.html index ea514145a..7fe551185 100644 --- a/blog/2018/07/16/oleg-lavrovsky/index.html +++ b/blog/2018/07/16/oleg-lavrovsky/index.html @@ -35,7 +35,7 @@ - + @@ -108,6 +108,6 @@ (opens new window)

Tool Fund Grantee: Oleg Lavrovsky

Price icons created by Pixel perfect - Flaticon

This grantee profile features Oleg Lavrovsky for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

We are digital natives, dazzled by the boundless information and cultural resources of electronic networks, tuned in to a life on - and offline, dimly aware of all kinds of borders being rewritten. I was born in the Soviet Union and grew up in Canada, immersed in the wonders of creative code on Apple II and DOS-era personal computers, doing fun things in programming environments from BASIC (opens new window) to C++/C#/.NET (hey @ooswald (opens new window)!) to Perl (hey @virtualsue (opens new window)!) to Java (hey @timcolson (opens new window)!) to JavaScript (hey @jermolene (opens new window)!) to Python (hey @gasman (opens new window)!), all of which find some use in the freelance work I now do based in my adoptive home of Switzerland - a country of plurality (opens new window).

Over the years, I have tried other languages like Clojure and Pascal, Groovy and Go, Erlang and Haskell, Scala and R, even ARM C/C++ and x86 assembly. Some have stuck in my dev chain, others have not. As far as possible, I hope to keep a beginner’s mind open to new paradigms, a solid craft of working on code and data with care, and the wisdom to avoid jumping off every tempting new thing on the horizon.

I first came across tendrils of Open Knowledge ten years ago while living in Oxford, a vibrant community of thinkers and civic reformers. After we started a hackspace (opens new window), I got more involved in extracurricular open source activities, joined barcamps and hackathons, started contributing to projects. I started to see so-called ‘big IT’ or ‘enterprise software’ challenges to be, on many levels, problems of incompatible or intractable data standards. It was in the U.K. that I also discovered civic tech and open data activism.

Helping to start a Swiss Open Knowledge chapter (opens new window) presented me with the opportunity to be involved in an ambitious and exciting techno-political movement, and to learn from some of the most deeply ethical and forward-thinking people in Information Technology. Running the School of Data (opens new window) working group and supporting many projects in the Swiss Opendata.ch (opens new window) association and international network is today no longer just a weekend activity: it is my master branch.

I first heard the term frictionless from a philosopher (opens new window) who warned of a world where IT removes friction to the point where we live anywhere, and do anything, at the cost of social alienation - and, along with it, grave consequences to our well-being. There are parallels here to “closed datasets”, which may well be padlocked for a reason. Throwing them into the wind may deprive them of the nurturing care of the original owners. The open data community offers them a softer landing.

Some of the conversations that led to Frictionless Data took place at OKCon 2013 (opens new window) in Geneva, where I was busy mining the Law (opens new window). Max Ogden mentioned related ideas in his talk (opens new window) there on Dat Project (opens new window). It later became a regular topic in the Open Knowledge Labs hangouts (opens new window) and elsewhere. My first impression was mixed: I liked the idea in principle, but found it hard to foresee what the standardization process could accomplish. It took me a couple of years to catch up, gain experience in putting the Open Definition (opens new window) to use, struggle with some of the fundamental issues myself - just to wholly accept the idea of an open data ecosystem.

Working with more unwieldy data as well as having an interest in Data Science, and the great vibe of a growing community all led me to test the waters with the Julia language (opens new window). I quickly became a fan, and started looking for ways to include it in my workflow. Thanks to the collaboration enabled by the Frictionless Data Tool Fund, I will now be able to focus on this goal and start connecting the dots more quickly. More bridges need to be built to help open data users use Julia’s computing environment, and Julia users could use sturdier access to open data.

There are two high level use cases which I think are particularly interesting when it comes to Frictionless Data: strongly typed and easy to validate dataset schema leading to a “light” version of semantic interoperability, helping data analysts, developers, even automated agents, to see at a glance how compatible datasets might be. Take a look at dataship, open power system data and other case studies at Frictionlessdata.io for examples. The other is the pipelines approach which, as a feature of Unix and other OS (opens new window) is the basis for an incredibly powerful system building tool, now laying the foundation of a rich and reliable world of shared data (opens new window).

At a more practical level, I have been using Data Packages to publish data for hackathons (opens new window), School of Data workshops (opens new window) and other activities in my Open Knowledge chapter, and regularly explaining the concepts and training people to use Frictionless Data tools in the Open Data module I teach at the Bern University of Applied Sciences (opens new window). I have built support for them into Dribdat (opens new window), a tool we use for connecting the dots between people, code and data.

Over the years, I have made small contributions to OKI’s codebases on projects like CKAN (opens new window). Contributing to the Frictionless Data project clears the way to the frontlines of development: putting better tools in users’ hands, committing directly to the needs of the community, setting an elevated expectation of responsibility and quality. That said, I am a novice in Julia. But my initial ambition is modest: make a working set of tools, produce a stable v1.0 specification (opens new window) release. Run tests, get reviewed, interact with the community, and iterate. This project will be a learning process, and my intention is to widen the goalposts as much as I can for others to follow.

The Julia language also needs to be better known, so I will start threads on the OKI forums (opens new window), at the School of Data (opens new window), in technical and academic circles. I am likewise really looking forward to representing Frictionless Data in the diverse and wide-ranging Julia community (opens new window), sharing whatever questions and needs arise both ways. The specifications, libraries and tools will help to preserve key information on widely used datasets, foster a more in-depth technical discussion between everyone involved in data sharing, and open the door to more critical feedback loops between creators, publishers and users of open data.

I will be developing the datapackage-jl (opens new window) and tableschema-jl (opens new window) libraries on GitHub, and you can follow me on GitHub (opens new window) to see how this develops and read stories about putting Frictionless Data libraries to use. Please feel free to write me a note (opens new window), send in your use case, respond to anything I’m working on or writing about, share a tricky dataset or any other kind of challenge - and let’s chat (opens new window)!

- + diff --git a/blog/2018/07/16/ori-hoch/index.html b/blog/2018/07/16/ori-hoch/index.html index 734fc6ec7..be32b86f7 100644 --- a/blog/2018/07/16/ori-hoch/index.html +++ b/blog/2018/07/16/ori-hoch/index.html @@ -35,7 +35,7 @@ - + @@ -108,6 +108,6 @@ (opens new window)

Tool Fund Grantee: Ori Hoch

Price icons created by Pixel perfect - Flaticon

This grantee profile features Ori Hoch for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

My name is Ori Hoch, I am 35 years old, living in Israel and married with 2 children. I recently took my family to Midburn - the Israeli regional Burning Man event where I juggled some fire clubs in the main burn ceremony. Through the Tool Fund, I am working on implementing the PHP libraries for Frictionless Data. I am also working on several other open source data projects: Open Knesset (opens new window), Open Budget (opens new window), Beit Hatfutsot (opens new window) - all projects are open source and fully transparent - both the code and the development process - which I think is a great way to work. I’m also very interested in community and teamwork - how to get a group of people working together on a common goal, a hard task in normal scenarios which grows even more complex when dealing with volunteers / open source contributors. Of course, besides all the philosophical ideals I’m also a hard-core technologist who loves diving into complex problems, finding and implementing the right solution.

I first heard about the Frictionless Data ecosystem from my activity in The Public Knowledge Workshop (opens new window) where I worked with Adam Kariv and Paul Walsh. I have a lot of experience working with data, and know many of the common problems and pitfalls. One of the major obstacles is interoperability between different data sources. Having the core Frictionless Data libraries available in different languages will allow for easier interoperability and integrations between sources.

At Beit Hatfutsot (opens new window) (The Museum of The Jewish People), we aggregate data from many sources, including some data from PHP frameworks such as MediaWiki / Wordpress. At the moment we ask developers of the external data sources to create a datapackage for us, based on a given schema. Frictionless Data libraries for PHP will make this much easier for people to do, and will have a huge effect in reducing errors.

In addition to interoperability, the Frictionless Data specifications and software are based on the combined experience of many individuals working on a variety of data projects. Anyone using the libraries and tools will benefit from these experiences and will avoid problems and pitfalls which other people encountered in the past.

I welcome PHP enthusiasts to join in the development effort of the tableschema (opens new window) and datapackage (opens new window) libraries which I am currently working on. Both repositories follow standard GitHub development flow using Issues, Pull Requests, Releases et al. Check the README and CONTRIBUTING files in the repositories above for more details and reach out to me or the rest of the Frictionless Data developer community on the active Gitter channel (opens new window).

I would also love to have PHP developers use the core libraries to write some more high-level tools. For example, consider an organization which has some data in their Wordpress / Drupal installation which they would like to publish or use with Frictionless Data compatible tools. Without a compatible plugin for their framework, they would have to either write some custom code or create the datapackage manually; both options are time-consuming and error-prone. A ready-to-use plugin for their framework that publishes a compliant datapackage would greatly simplify the process and ensure interoperability.

With the availability of the PHP libraries for Frictionless Data the task of developing such plugins will be greatly simplified. The libraries handle all the work of creating / loading datapackages and ensuring they conform to the specs, allowing the developer to focus on the plugin logic.

Additional possibilities for leveraging the PHP libraries:

  • Import plugins - for loading datapackages into a data store
  • Visualization tools to allow people to view and analyze data packages from PHP based code
  • Integration of existing Frictionless Data tools to be available for use from PHP, for example the datapackage-pipelines (opens new window) framework

Finally, I would like to thank Open Knowledge International and The Sloan foundation for the opportunity to work on the forefront of the open data eco-system. I think that the tools we are developing now will have tremendous effects on how we manage and use data in the future and we have not yet seen all the possible benefits and outcomes from this work.

- + diff --git a/blog/2018/07/16/point-location-data/index.html b/blog/2018/07/16/point-location-data/index.html index 23dba9799..d027c6ba4 100644 --- a/blog/2018/07/16/point-location-data/index.html +++ b/blog/2018/07/16/point-location-data/index.html @@ -30,7 +30,7 @@ - + @@ -251,6 +251,6 @@ ] }

** Thoughts **

# Frictionless data

# World Wide Web Consortium (W3C)

These documents advise on best practices related to the publication of data and spatial data on the web.

# Australian Government - CSV GEO AU

csv-geo-au (opens new window) is a specification for publishing point or region-mapped Australian geospatial data in CSV format to data.gov.au (opens new window) and other open data portals.

# IETF - GeoJSON

GeoJSON (opens new window) is a geospatial data interchange format based on JavaScript Object Notation (JSON).

# OGC - Simple Feature Access

The Open Geospatial Consortium - OpenGIS Simple Feature Access (opens new window) is also called ISO 19125. It provides a model for geometric objects associated with a Spatial Reference System.

Recommended reading: We recently commissioned research work to determine how necessary and useful it would be to create a Geo Data Package as a core Frictionless Data offering. Follow the discussions here on Discuss (opens new window) and read the final report into the spatial data package investigation by Steve Bennett (opens new window). Examples following the recommendations in this research will be added in due course.

- + diff --git a/blog/2018/07/16/publish-data-as-data-packages/index.html b/blog/2018/07/16/publish-data-as-data-packages/index.html index 97b921f04..67bd98788 100644 --- a/blog/2018/07/16/publish-data-as-data-packages/index.html +++ b/blog/2018/07/16/publish-data-as-data-packages/index.html @@ -30,7 +30,7 @@ - + @@ -109,6 +109,6 @@
  1. The datapackage.json is a small file in JSON format that gives a bit of information about your dataset. You’ll need to create this file and then place it in the directory you created.
  2. Don’t worry if you don’t know what JSON is - we provide some tools that can automatically create this file for you.
  3. There are 2 options for creating the datapackage.json:
    1. Use the Data Package Creator (opens new window) tool
      1. Just answer a few questions and give it your data files and it will spit out a datapackage.json for you to include in your project
    2. Use the Python (opens new window), JavaScript (opens new window), PHP (opens new window), Julia (opens new window), R (opens new window), Clojure (opens new window), Java (opens new window), Ruby (opens new window) or Go (opens new window) libraries for working with data packages.
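For readers who want to see what such a file looks like before picking a tool, here is a minimal Python sketch that builds a datapackage.json by hand using only the standard library. It is not how the Data Package Creator or the official libraries work internally; the helper names `build_descriptor` and `write_descriptor`, the package name `my-dataset` and the file `data/example.csv` are all hypothetical.

```python
import json
from pathlib import Path


def build_descriptor(name, csv_paths):
    """Return a minimal Data Package descriptor with one resource per CSV file."""
    resources = []
    for path in csv_paths:
        resources.append({
            "name": Path(path).stem.lower(),  # resource name derived from the file name
            "path": str(path),                # path relative to datapackage.json
            "format": "csv",
        })
    return {"name": name, "resources": resources}


def write_descriptor(directory, descriptor):
    """Write the descriptor as datapackage.json inside the dataset directory."""
    target = Path(directory) / "datapackage.json"
    target.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return target


# Hypothetical dataset with a single CSV file.
descriptor = build_descriptor("my-dataset", ["data/example.csv"])
print(json.dumps(descriptor, indent=2))
```

Calling `write_descriptor("path/to/your/dataset", descriptor)` would then place the file in the directory you created. The libraries listed above do considerably more, such as inferring a schema from the data.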

Recommended reading: Find out how to use Frictionless Data software to improve your data publishing workflow in our new and comprehensive Frictionless Data Field Guide.

- + diff --git a/blog/2018/07/16/validated-tabular-data/index.html b/blog/2018/07/16/validated-tabular-data/index.html index 70ed6cccd..3e4a436e3 100644 --- a/blog/2018/07/16/validated-tabular-data/index.html +++ b/blog/2018/07/16/validated-tabular-data/index.html @@ -38,7 +38,7 @@ - + @@ -119,6 +119,6 @@

Errors in data are not uncommon. They also often get in the way of quick and timely data analysis for many data users. What if there was a way to quickly identify errors in your data to accelerate the process by which you fix them before sharing your data or using it for analysis?

In this section, we will learn how to carry out one-time data validation using

Our working assumption is that you already know what a data schema and a data package are, and how to create them. If not, start here.

# One-time data validation with try.goodtables.io (opens new window)

Now that you have your data package you may want to check it for errors. We refer to this process as data validation. Raw data is often ‘messy’ or ‘dirty’, which means it contains errors and irrelevant bits that make it inaccurate and difficult to quickly analyse and draw insight from existing datasets. Goodtables exists to identify structural and content errors in your tabular data so they can be fixed quickly. As with other tools mentioned in this field guide, goodtables aims to help data publishers improve the quality of their data before the data is shared elsewhere and used for analysis, or archived.

Types of errors identified in the validation process

Here are some of the errors that try.goodtables.io (opens new window) highlights. A more exhaustive list is available here (opens new window).

Structural Errors
blank-header There is a blank header name. All cells in the header row must have a value.
duplicate-header There are multiple columns with the same name. All column names must be unique.
blank-row Rows must have at least one non-blank cell.
duplicate-row Rows can’t be duplicated.
extra-value A row has more columns than the header.
missing-value A row has fewer columns than the header.
Content Errors
schema-error Schema is not valid.
non-matching-header The header’s name in the schema is different from what’s in the data.
extra-header The data contains a header not defined in the schema.
missing-header The data doesn’t contain a header defined in the schema.
type-or-format-error The value can’t be cast based on the schema type and format for this field.
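To make the structural error codes above more concrete, the following Python sketch detects them in a small CSV string. It is an illustration only and not the goodtables implementation; the function name `structural_errors` and the sample data are made up for this example.

```python
import csv
import io


def structural_errors(csv_text):
    """Return (row_number, error_code) tuples for structural problems in a CSV string."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header = rows[0]
    errors = []

    # Header checks: every header cell needs a value, and names must be unique.
    seen_names = set()
    for name in header:
        if not name.strip():
            errors.append((1, "blank-header"))
        elif name in seen_names:
            errors.append((1, "duplicate-header"))
        seen_names.add(name)

    # Row checks: blank rows, duplicated rows, and rows of the wrong length.
    seen_rows = set()
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            errors.append((number, "blank-row"))
            continue
        if tuple(row) in seen_rows:
            errors.append((number, "duplicate-row"))
        seen_rows.add(tuple(row))
        if len(row) > len(header):
            errors.append((number, "extra-value"))
        elif len(row) < len(header):
            errors.append((number, "missing-value"))
    return errors


sample = "id,name,name\n1,Ann,x\n\n1,Ann,x\n2,Bob\n3,Cy,y,z\n"
for row_number, code in structural_errors(sample):
    print(row_number, code)
```

Row numbers count the header as row 1, so the first data row is row 2.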

Load tabular data for one-time validation

You can add a dataset for one-time validation on try.goodtables.io (opens new window) in two ways:

  • If your tabular data is publicly available online, obtain a link to the tabular data you would like to validate and paste it in the {Source} section.
  • Alternatively, click on the Upload file prompt in the {Source} section to load a tabular dataset from your local machine.

Validating data without a schema

In this section we will illustrate how to check tabular data for structural errors on try.goodtables.io (opens new window) where a data schema is not available. For this tutorial we will use a sample CSV file with errors (opens new window).

Copy and paste the file’s URL to the {Source} input. When you click on the {Validate} button, try.goodtables.io (opens new window) presents an exhaustive list of structural errors in your dataset.

Figure 1: Add dataset link in the Source field, or select the Upload file option.

If needed, you can disable two types of validation checks:

  • Ignore blank rows
    Use this checkbox to indicate whether blank rows should be considered as errors, or simply ignored. Check this option if missing data is a known issue that cannot be fixed immediately, for example if you are not the owner/publisher of the data.

  • Ignore duplicate rows
    Use this checkbox to indicate whether duplicate rows should be considered as errors, or simply ignored.

We will leave all boxes unchecked for our example. On validation, we receive a list of 12 errors, as shown in Figure 2 below.

Figure 2: dataset errors outlined on try.goodtables.io (opens new window).

try.goodtables.io (opens new window) points us to specific cells containing errors so they can be fixed easily. We can use this list as a guide to fix all errors in our data manually, and run a second validation test to confirm that all issues are resolved. If no errors are found, the resulting message will look like Figure 3 below:

Figure 3: valid data message on goodtables.io (opens new window).

Improving data quality is an iterative process that should involve data publishers and maintainers. Tools such as try.goodtables.io (opens new window) allow you to focus on complex questions, such as whether the presented data is correct, instead of wasting time on simple (but very common) errors like incorrect date formats.

Validating tabular data with a schema

A data schema contains information on the structure of your tabular data. Providing a data schema as part of the validation process on try.goodtables.io (opens new window) makes it possible to check your dataset for content errors. For example, a schema contains information on fields and their assigned data types, making it possible to highlight misplaced data, e.g. text in an amounts column where numeric data is expected. If you haven’t yet, learn how to create a data schema for your data collection before continuing with this section.
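The idea of checking content against a schema can be sketched in a few lines of Python. This is a simplified illustration, not what goodtables does internally: it handles only the `integer`, `number`, `date` and `string` types, assumes dates use the ISO `YYYY-MM-DD` form, and the names `cast_value`, `content_errors` and the sample schema are hypothetical.

```python
from datetime import date


def cast_value(value, field_type):
    """Cast a raw CSV string to the schema type, raising ValueError on failure."""
    if field_type == "integer":
        return int(value)
    if field_type == "number":
        return float(value)
    if field_type == "date":
        return date.fromisoformat(value)  # assumes ISO dates such as 2018-07-16
    return value  # "string" and any unhandled type pass through unchanged


def content_errors(header, rows, schema):
    """Return (row_number, error_code, field_name) tuples for schema mismatches."""
    errors = []
    expected = [field["name"] for field in schema["fields"]]

    # Compare the header in the data with the field names in the schema.
    for position, name in enumerate(expected):
        if position >= len(header):
            errors.append((1, "missing-header", name))
        elif header[position] != name:
            errors.append((1, "non-matching-header", name))
    for extra in header[len(expected):]:
        errors.append((1, "extra-header", extra))

    # Try to cast every cell to the type declared for its field.
    for number, row in enumerate(rows, start=2):
        for field, value in zip(schema["fields"], row):
            try:
                cast_value(value, field.get("type", "string"))
            except ValueError:
                errors.append((number, "type-or-format-error", field["name"]))
    return errors


schema = {"fields": [
    {"name": "id", "type": "integer"},
    {"name": "amount", "type": "number"},
    {"name": "paid", "type": "date"},
]}
header = ["id", "amount", "paid"]
rows = [
    ["1", "10.50", "2018-07-16"],
    ["2", "ten", "2018-07-17"],   # text in the amounts column
    ["x", "3", "16/07/2018"],     # non-numeric id and a non-ISO date
]
for error in content_errors(header, rows, schema):
    print(error)
```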

To test how this works, you can use:

In any given Data Package, the datapackage.json file contains the schema and the data folder contains tabular data to be validated against the schema.

Often, you will work in workflows that involve many datasets that are updated regularly. In such cases, one-time validation on try.goodtables.io (opens new window) is probably not the answer. But fear not! Goodtables can automate the validation process so that errors are checked for continually. Find out more in our continuous and automated data validation section.

# One-time data validation with goodtables command line tool

The same validations that we’ve done on try.goodtables.io (opens new window) can also be done on your local machine using goodtables. This is especially useful for big datasets, or if your data is not publicly accessible online. However, this is a slightly technical task, which requires basic knowledge of the command line (CLI). If you don’t know how to use the CLI, or are a bit rusty, we recommend reading the Introduction to the command-line tutorial (opens new window) before proceeding.

For this section, you will need:

Once Python is set up, open your Terminal and install goodtables using the package manager, pip. The command is pip install goodtables.

Figure 4: installing goodtables command-line tool with pip in Terminal.

To validate a data file, type goodtables followed by the path to your file, e.g. goodtables path/to/file.csv. You can pass multiple file paths one after the other, or even the path to a datapackage.json file.

For our first example, we will download and check this simple location CSV data file (opens new window) for errors. In the second instance, we will validate this Department of Data Expenses dataset, that contains errors (opens new window).

Figure 5: Validating data files using goodtables in Terminal.

You can see the list of options by running goodtables --help. The full documentation, including the list of validation checks that can be run, is available on the goodtables-py repository on GitHub (opens new window).
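The same checks can also be run from a Python script, which is handy if you want to post-process the results. The sketch below assumes the goodtables Python package exposes a `validate` function that returns a report dictionary with `tables`, `errors`, `code`, `row-number` and `message` keys, as described in the goodtables-py README; check that documentation for the exact shape in your installed version. The helper names and the sample report are hypothetical.

```python
def summarize(report):
    """Flatten a goodtables-style report into one printable line per error."""
    lines = []
    for table in report.get("tables", []):
        for error in table.get("errors", []):
            lines.append("{}: row {} [{}] {}".format(
                table.get("source"),
                error.get("row-number"),
                error.get("code"),
                error.get("message"),
            ))
    return lines


def validate_file(path):
    """Validate a file with goodtables (requires `pip install goodtables`)."""
    from goodtables import validate  # third-party package, imported only when needed
    return validate(path)


# Hand-written report in the assumed shape, so the summary can be tried
# without installing goodtables. The values are invented for illustration.
sample_report = {
    "valid": False,
    "error-count": 1,
    "tables": [{
        "source": "data/invalid.csv",
        "errors": [{
            "code": "blank-row",
            "row-number": 3,
            "message": "Row 3 is completely blank",
        }],
    }],
}
print("\n".join(summarize(sample_report)))
```

With goodtables installed, `summarize(validate_file("path/to/file.csv"))` would produce the same kind of listing for your own data.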

Congratulations, you now know how to validate your tabular data using the command-line!

If you regularly update your data or maintain many different datasets, running the validations manually can be time-consuming. The solution is to automate this process, so the data is validated every time it changes, ensuring the errors are caught as soon as possible. Find out how to do it in the “Automating the validation checks” section.

- + diff --git a/blog/2018/07/16/visible-findable-shareable-data/index.html b/blog/2018/07/16/visible-findable-shareable-data/index.html index 51b000035..74474e4bb 100644 --- a/blog/2018/07/16/visible-findable-shareable-data/index.html +++ b/blog/2018/07/16/visible-findable-shareable-data/index.html @@ -38,7 +38,7 @@ - + @@ -131,6 +131,6 @@
  • data/: All data files are contained in this folder. In our example, there are two: data/schools.csv and data/cities.csv.
  • docs/: Images, sample analysis, and other documentation files regarding the dataset. The main documentation is in README.md, but in this folder you can add any images used in the README, and other writings about the dataset.
  • scripts/: All scripts are contained in this folder. There could be scripts to scrape the data, join different files, clean them, etc. Depending on the programming language you use, you might also add requirements files like requirements.txt for Python, or package.json for NodeJS.
  • Makefile: The scripts are only part of the puzzle; we also need to know how to run them: in which order they should be executed, which one to run to update the data, and so on. You could document this information textually in the README.md file, but the Makefile allows you to have executable documentation. You can think of it as a script to run the scripts. If you have never written a Makefile, read Why Use Make (opens new window).
  • datapackage.json: This file describes the dataset’s metadata. For example, what the dataset is, where its files are, what they contain, what each column means (for tabular data), what the source, license, and authors are, and so on. As it’s a machine-readable specification, other software can import and validate your files. See HOW TO CREATE A DATA PACKAGE for instructions on writing this file.
  • README.md (opens new window): This is where the dataset is described for humans. We recommend the following sections:
    • Introduction: A short description of the dataset, what it contains, the time or geographical area it covers
    • Data: What is the data structure? Does it use any codes? How do you define missing values (e.g. ‘N/A’ or ‘-1’)?
    • Preparation: How was the data collected? How do I update the data? Was it modified in any way? If you have a Makefile, this section will mostly document how to run it. Otherwise you can describe how to run the scripts, or how to collect the data manually.
    • License: There are two issues here: the license of the data itself, and the license of the package you are creating (including any scripts). Our recommendation is to license the package you created as CC0 (opens new window), and add any relevant information or disclaimers regarding the source data’s license.

To summarize, these are the folders, files, and their respective contents in this structure:

Path Type Contents
data/ Data Dataset’s data files.
docs/ Documentation Images, analysis, and other documentation files.
scripts/ Scripts Scripts used for creating, modifying, or analysing the dataset.
Makefile Scripts Executable documentation on how to run the scripts.
datapackage.json Metadata Data Package descriptor file.
README.md (opens new window) Documentation Textual description of the dataset with description, preparation steps, license, etc.
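If you create datasets like this regularly, the layout summarized above can be scaffolded with a short script. The following Python sketch only creates empty placeholder folders and files that you then fill in; the function name `scaffold` and the constants are hypothetical.

```python
from pathlib import Path

# Folders and files from the summary table above.
LAYOUT_DIRS = ["data", "docs", "scripts"]
LAYOUT_FILES = ["Makefile", "datapackage.json", "README.md"]


def scaffold(root):
    """Create the dataset layout under root and return the files newly created."""
    root = Path(root)
    for folder in LAYOUT_DIRS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    created = []
    for filename in LAYOUT_FILES:
        path = root / filename
        if not path.exists():  # never overwrite files that are already there
            path.touch()       # empty placeholder, to be filled in by hand
            created.append(filename)
    return created
```

For example, `scaffold("schools-dataset")` creates the three folders and three empty files; running it again leaves existing files untouched.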

** Step 2. Upload the dataset to GitHub **

  1. Log in to (or create) an account on GitHub
  2. Create a new repository (opens new window)
    • Write a short description about the dataset
  3. On your repository page, click on the “Upload files” link
  4. Upload the files you created in the previous step

** (Optional) Step 3. Enable automatic tabular data validation **

You can automatically validate your tabular data files using goodtables.io (opens new window). This will take only a few minutes, and will ensure you’ll always know when there are errors with your dataset, maintaining its quality. Read the walkthrough here.

The sample dataset used in this example, List of schools in Birmingham, UK, is available in this repository (opens new window).

- + diff --git a/blog/2018/07/20/nimblelearn/index.html b/blog/2018/07/20/nimblelearn/index.html index c35d8799c..8acee90ce 100644 --- a/blog/2018/07/20/nimblelearn/index.html +++ b/blog/2018/07/20/nimblelearn/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

Nimble Learn - Data Package M (datapackage-m)

Data Package M (opens new window), also known as datapackage-m, is a set of functions written in Power Query M (opens new window) for working with Tabular Data Packages in Power BI Desktop (opens new window) and Power Query for Excel (opens new window) (also known as ‘Get & Transform Data’ in Excel 2016).

datapackage-m makes use of the Data Package, Data Resource, Tabular Data Package, Tabular Data Resource, and Table Schema specifications, enabling you to go from data to insight in Power BI and Excel, faster.

In 2014, while searching the web for high quality open data, we stumbled across the Frictionless Data project. On learning about Data Packages, we spent some time getting acquainted with the specs and began to use Tabular Data Packages for some internal projects. datapackage-m then started off as an internal tool at Nimble Learn for working with Tabular Data Packages.


How datapackage-m works in Power BI

datapackage-m now implements v1 of the Frictionless Data specs (opens new window) from a Tabular Data Package consumption perspective. By implementing a broad number of the specs, datapackage-m is able to extract the tables from most Tabular Data Packages (opens new window), or Data Packages with tabular resources, in seconds. These tables can be quickly loaded into a Power BI Data Model or an Excel Worksheet (or Data Model), ready for you to analyse. datapackage-m currently handles Gzip compressed resources and we’re looking into support for Zip. We have successfully tested datapackage-m with several Data Packages from Datahub (opens new window) and the Frictionless Data Example Data Packages (opens new window) GitHub repository.
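For comparison with the Power Query M functions, the general idea of extracting tables from a Tabular Data Package, including Gzip compressed resources, can be sketched in Python. This is not how datapackage-m is implemented. It assumes every resource has a single local, relative `path` to a UTF-8 CSV file, and the function name `read_resources` is hypothetical.

```python
import csv
import gzip
import json
from pathlib import Path


def read_resources(package_dir):
    """Load every CSV resource of a local data package into a list of row dicts."""
    package_dir = Path(package_dir)
    descriptor = json.loads(
        (package_dir / "datapackage.json").read_text(encoding="utf-8")
    )
    tables = {}
    for resource in descriptor["resources"]:
        path = package_dir / resource["path"]  # assumes one relative path per resource
        # Gzip compressed resources are opened transparently based on the extension.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", newline="") as handle:
            tables[resource["name"]] = list(csv.DictReader(handle))
    return tables
```

The result maps each resource name to its rows, which could then be handed to an analysis tool of your choice.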

In working with data, there are often many repetitive tasks required to get data into a state that can be analysed. Even when the requirement is just to profile and assess whether a new dataset is suitable for a given use case, a lot of time can be wasted getting it into good tabular shape. Data Packages are designed to alleviate this issue, and datapackage-m makes them available for use in Power BI and Excel.

We find that the Frictionless Data specs are simple to use from both a data publisher and data consumer perspective. We’ve seen a great number of other specifications that are feature-rich but too verbose. In contrast to these, the Frictionless Data specs are minimalist and support use cases where Data Packages are created using one’s favourite text editor.


How datapackage-m works in Excel

There’s an ongoing discussion around a Data Resource compression pattern which is important from a data publishing perspective i.e. due to ongoing file storage and bandwidth costs. Once this pattern is agreed upon and published, it would be good to see this added to the Data Resource (opens new window) and Tabular Data Resource (opens new window) specs not too long after.

Other than this, we would like to see another Data Package profile that extends the Tabular Data Package with semantic layer metadata. In addition to the Tabular Data Profile properties, this ‘Semantic Data Package’ would have properties for measure definitions, attribute hierarchies, and other semantic layer metadata. Something like this could be used to programmatically generate Semantic Data Models (opens new window) in a data analytics tool of choice and populate it with data from the tabular data directly.

There are many existing use cases for Tabular Data Packages (opens new window), and we see ‘Subject Area’ Tabular Data Packages as a significant additional use case that is worth exploring. By ‘Subject Area’, we mean a Tabular Data Package that combines relevant Tabular Data Resources from other high quality Tabular Data Packages. This would help to reduce the time spent seeking out related/relevant data for a given area of analysis and could save researchers tonnes of time, for example.

In addition to datapackage-m, Nimble Learn is working on a public-facing project that is focused on publishing pre-integrated open data from various sources as subject area Tabular Data Packages. In addition to this we plan on extending datapackage-m to adopt more Frictionless Data specifications. Keep an eye out for all these updates on GitHub (opens new window).

- + diff --git a/blog/2019/03/01/datacurator/index.html b/blog/2019/03/01/datacurator/index.html index 26807c03f..8b9452690 100644 --- a/blog/2019/03/01/datacurator/index.html +++ b/blog/2019/03/01/datacurator/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@

# Data Curator - share usable open data

Open data producers are increasingly focusing on improving open data so it can be easily used to create insight and drive positive change.

Open data is more likely to be used if data consumers can:

  • understand the structure of the data
  • understand the quality of the data
  • understand why and how the data was collected
  • look up the meaning of codes used in the data
  • access the data in an open machine-readable format
  • know how the data is licensed and how it can be reused

Data Curator enables open data producers to define all this information using their desktop computer, prior to publishing it on the Internet.

Data Curator uses the Frictionless Data specification (opens new window) and software to package the data and supporting information in a Tabular Data Package (opens new window).

Data Curator screenshot

# Using Data Curator

Here’s how to use Data Curator to share usable open data in a data package:

  1. Download Data Curator (opens new window) for Windows or macOS
  2. In Data Curator, either:
    • create some data
    • open an Excel sheet
    • open a separated value file (e.g. CSV, TSV)
  3. Follow the steps below…

# Describe the data

The Frictionless Data specification allows you to describe tabular data using a Table Schema (opens new window). A Table Schema allows each field in the data to be given:

  • a name, title and description
  • a data type (e.g. string, integer) and format (e.g. uri, email)
  • one or more constraints (e.g. required, unique) to limit data values and improve data validation

The Table Schema also allows you to describe the characters used to represent missing values (e.g. n/a, tba), primary keys, and foreign key relationships.

After adding data in Data Curator, to create a Table Schema:

  • Give your data a header row, if it doesn’t have one
  • Set the header row to give each field a name
  • Guess column properties to give each field a type and format
  • Set column properties to improve the data type and format guesses, and add a title, description and constraints
  • Set table properties to give the table a name, define missing values, a primary key, and foreign keys.
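As a rough picture of what these steps produce, here is a small Table Schema written as a Python dictionary and printed as JSON. The property names (`fields`, `constraints`, `missingValues`, `primaryKey`, `foreignKeys`) come from the Table Schema specification; the field names, the `status-codes` resource and all values are invented for illustration, and the exact output of Data Curator may differ.

```python
import json

table_schema = {
    "fields": [
        {
            "name": "id",
            "title": "Identifier",
            "description": "Unique number for each record",
            "type": "integer",
            "constraints": {"required": True, "unique": True},
        },
        {
            "name": "contact",
            "title": "Contact email",
            "type": "string",
            "format": "email",
        },
        {
            "name": "status_code",
            "title": "Status",
            "type": "string",
            "constraints": {"required": True},
        },
    ],
    # Strings that should be read as "no value" rather than as data.
    "missingValues": ["", "n/a", "tba"],
    "primaryKey": "id",
    # status_code must match a code in the (hypothetical) status-codes table.
    "foreignKeys": [
        {
            "fields": "status_code",
            "reference": {"resource": "status-codes", "fields": "code"},
        }
    ],
}

print(json.dumps(table_schema, indent=2))
```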

# Validate the data

Using Data Curator, you can validate whether the data complies with each field’s type, format and constraints. Errors found can be filtered in different ways so you can correct errors by row, by column or by error type.

In some cases data errors cannot be corrected, as they should be corrected in the source system and not as part of the data packaging process. If you’re happy to publish the data with errors, the error messages can be appended to the provenance information.

# Provide context

Data Curator lets you add provenance information to help people understand why and how the data was collected and determine if it is fit for their purpose.

Provenance information can be entered using Markdown (opens new window). You can preview the Markdown formatting in Data Curator.

Add provenance information screenshot

You should follow the Readme FAQ when writing provenance information or, even easier, cut and paste from this sample (opens new window).

# Explain the meaning of codes

Data Curator supports foreign key relationships between data. Often a set of codes is used in a column of data and the list of valid codes and their description is in another table. The Frictionless Data specification enables linking this data within a table or across two tables in the same data package.

We’ve implemented the Foreign Keys to Data Packages pattern (opens new window) so you can have foreign key relationships across two data packages. This is really useful if you want to share code-lists across organisations.

You can define foreign key relationships in Data Curator in the table properties and the relationships are checked when you validate the data.
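Conceptually, a foreign key check confirms that every code used in the data appears in the code list. A minimal Python sketch of that idea follows. It is not Data Curator's implementation, and the function name, field names and sample rows are hypothetical.

```python
def foreign_key_errors(rows, field, valid_codes):
    """Return (row_number, value) pairs whose value is missing from the code list."""
    valid = set(valid_codes)
    return [
        (number, row[field])
        for number, row in enumerate(rows, start=2)  # row 1 is the header row
        if row[field] not in valid
    ]


# Code list table and data table, as they might look after loading two CSV files.
status_codes = [
    {"code": "A", "label": "Active"},
    {"code": "C", "label": "Closed"},
]
records = [
    {"id": "1", "status_code": "A"},
    {"id": "2", "status_code": "X"},  # not in the code list
    {"id": "3", "status_code": "C"},
]

valid = [entry["code"] for entry in status_codes]
print(foreign_key_errors(records, "status_code", valid))
```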

# Save the data in an open format

Data Curator lets you save data as a comma, semicolon, or tab separated value file. A matching CSV Dialect (opens new window) is added to the data package.
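To show how a CSV Dialect relates to the file it describes, the sketch below writes a semicolon separated file with Python's csv module using settings taken from a dialect dictionary. The `delimiter`, `quoteChar`, `lineTerminator` and `header` properties are from the CSV Dialect specification; the function name and sample rows are hypothetical, and this is not Data Curator's own code.

```python
import csv
import io

# Dialect as it might appear in the data package descriptor.
dialect = {
    "delimiter": ";",
    "quoteChar": '"',
    "lineTerminator": "\r\n",
    "header": True,
}


def write_with_dialect(header, rows, dialect):
    """Serialize rows to CSV text using the settings from a CSV Dialect dict."""
    buffer = io.StringIO(newline="")  # keep line terminators exactly as written
    writer = csv.writer(
        buffer,
        delimiter=dialect["delimiter"],
        quotechar=dialect["quoteChar"],
        lineterminator=dialect["lineTerminator"],
    )
    if dialect.get("header", True):
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


text = write_with_dialect(
    ["code", "label"],
    [["A", "Active; current"], ["B", "Closed"]],
    dialect,
)
print(text)
```

Because the first label contains the delimiter, the writer quotes that value so a reader using the same dialect can split the row correctly.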

# Apply an open license

Applying a license, waiver, or public domain mark to a data package (opens new window) and its resources (opens new window) helps people understand how they can use, modify, and share the contents of the data package.

Apply open license to data package screenshot

Although there are many ways to apply a licence, waiver or public domain mark to a data package, Data Curator only allows you to use open licences - after all, its purpose is to share usable open data.

# Export the data package

To ensure only usable open data is shared, Data Curator applies some checks before allowing a data package to be exported. These go beyond the mandatory requirements* in the Frictionless Data specification.

To export a tabular data package, it must have:

  • a header row
  • a table schema*
  • a table (resource) name*
  • a data package name*
  • provenance information
  • an open licence applied to the data package

If a data package version is used, it must follow the data package version pattern (opens new window).

Before exporting a data package you should:

  • add a title and description to each field, table and data package
  • acknowledge any data sources and contributors
  • validate the data and add any known errors to the provenance information

The data package is exported as a datapackage.zip file that contains the:

  • data files in a /data directory
  • data package, table (resource), table schema, and csv dialect properties in a datapackage.json file
  • provenance information in a README.md file
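The layout of the exported archive can be reproduced with Python's zipfile module, which may help if you want to build or inspect such a file outside Data Curator. This is a sketch of the structure listed above, not Data Curator's export code; the function name `export_package` and its parameters are hypothetical.

```python
import json
import zipfile


def export_package(descriptor, data_files, readme_text, target):
    """Write a datapackage.zip with a descriptor, a README.md and a data directory.

    data_files maps file names (e.g. "codes.csv") to their text content.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("datapackage.json", json.dumps(descriptor, indent=2))
        archive.writestr("README.md", readme_text)  # provenance information
        for name, text in data_files.items():
            archive.writestr("data/" + name, text)
    return target
```

For example, `export_package({"name": "demo"}, {"codes.csv": "code\nA\n"}, "# Demo\n", "datapackage.zip")` writes an archive containing datapackage.json, README.md and data/codes.csv.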

# Share the data

Share the datapackage.zip with open data consumers by publishing it on the Internet or on an open data platform. Some platforms support uploading, displaying, and downloading data packages.

Open data consumers will be able to read the data package with one of the many applications and software libraries that work with data packages, including Data Curator.

# Get Started

Download Data Curator (opens new window) for Windows or macOS and start sharing usable open data.

# Who made Data Curator?

Data Curator was made possible with funding from the Queensland Government (opens new window) and the guidance of the Open Data Policy team within the Department of Housing and Public Works. We’re grateful for the ideas and testing provided by open data champions in the Department of Environment and Science, and the Department of Transport and Main Roads.

The project was led by Stephen Gates (opens new window) from the ODI Australian Network (opens new window). Software development was coordinated by Gavin Kennedy and performed by Matt Mulholland from the Queensland Cyber Infrastructure Foundation (opens new window) (QCIF).

Data Curator uses the Frictionless Data software libraries maintained by Open Knowledge International (opens new window) and we’re extremely grateful for the support provided by the team (opens new window).

Data Curator started life as Comma Chameleon (opens new window), an experiment (opens new window) by the ODI (opens new window). The ODI and the ODI Australian Network agreed to take the software in different directions (opens new window).

- + diff --git a/blog/2019/05/20/used-and-useful-data/index.html b/blog/2019/05/20/used-and-useful-data/index.html index 693577b84..333b51514 100644 --- a/blog/2019/05/20/used-and-useful-data/index.html +++ b/blog/2019/05/20/used-and-useful-data/index.html @@ -38,7 +38,7 @@ - + @@ -150,6 +150,6 @@ The Open Knowledge discussion platform is a great place to invoke and contribute to conversation on specific subjects. Dive in (opens new window)!

  • Gitter
    Gitter is a chat platform that’s well suited for more technical discussions around open data. If you are looking to engage technical data users, consider joining our Open Knowledge Foundation channel (opens new window) or the Frictionless Data project channel (opens new window).

  • In-person meetups
    Organizing and participating in meetups, hackathons and domain-specific conferences is a good way to engage with communities.

  • Community calls, webinars and podcasts

  • Finally, to maintain an active community of data users as a data publisher:

    • Keep your datasets updated and highlight changes that might be of interest to the community. For example, if the changes are relevant to a specific data request, reach out and let the user know.
    • Have a human representative play an active role in community activities. Bots can be fun and efficient, but they are limited and can get in the way of meaningful interactions.
    • Be flexible and transparent. Listen to your community's needs and respond appropriately and in a timely fashion, e.g. consider publishing datasets that are in high demand first, or more regularly. Archive, rather than delete, datasets, but if one must be deleted, issue a forewarning and explain why.
    • Set up a sharing system to regularly (e.g. fortnightly) showcase notable data use cases by the community to inspire other community members.
    - + diff --git a/blog/2019/07/02/stephan-max/index.html b/blog/2019/07/02/stephan-max/index.html index 82a9c7f99..8c12dbcf2 100644 --- a/blog/2019/07/02/stephan-max/index.html +++ b/blog/2019/07/02/stephan-max/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

    Tool Fund Grantee: Stephan Max

    This grantee profile features Stephan Max for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    # Meet Stephan Max

    Hi, my name is Stephan Max and I am a computer scientist based in Cologne, Germany. I’ve been in the industry for over 10 years now and have worked for all kinds of companies, ranging from my own startup (crowd-funded online journalism), to a big corporation (IBM), to an established African business data startup (Asoko Insight). I am now a filter engineer at eyeo trying to make the web a fair, open, and safe place for everybody.

    I love working with kids and teenagers, cooking, and doing music—I just recently started drum lessons!

    # How did you first hear about Frictionless Data?

    I’ve been following the work of the Open Knowledge Foundation for a while now and contributed to the German branch as a mentor for the teenage hackathon weekends project “Jugend Hackt” (Youth Hacks). I first heard about the Frictionless Data program when the OKF announced funding by the Sloan Foundation in 2018. After watching Serah Njambi Rono’s talk on Youtube (https://www.youtube.com/watch?v=3Ranx9Jz0Ro (opens new window)) and reading about the Reproducible Research Tool Fund on Twitter, I knew I wanted to contribute.

    # Why did you apply for a Tool Fund grant?

    I first heard about the concepts and challenges around Reproducible Research when taking the MOOC “Data Science” from Johns Hopkins University on Coursera. Since I had my fair share of work inside proprietary data formats and tools, I was happy to see that there are people out there making serious efforts to remedy the loss of attribution and data manipulation steps. After browsing through OKF’s Frictionless Data website, I was even happier that there are actual tools, libraries, and standards already available. Applying for the tool fund and contributing my own humble idea was a no-brainer for me.

    # What specific issues are you looking to address with the Tool Fund?

    My goal is to add a Data Package import/export add-on to Google Sheets. I understand that a lot of data wrangling is still done in Sheets, Excel, and files being swapped around. A lot of information is lost that way. Where did the data initially come from? How was it manipulated, cleaned, or otherwise altered? How can we feed spreadsheets back into a Reproducible Research pipeline? I think Data Packages is a brilliant format to model and preserve exactly that information. While I do not want to lure people away from the tools they are already familiar with, I think we can bridge the gap between Google Sheets and Frictionless Data by making Data Packages a first-class citizen.

    # How can the open data, open source, community engage with the work you are doing around Frictionless Data Google Sheets add-on?

    I think open source and open data are a unique and wonderful opportunity to get access to the “wisdom of the crowd” and to ensure that software and information are and remain accessible to everyone. In the first few weeks I will focus on getting a first prototype and sufficient documentation up, so you can all play with the Data Package import/export add-on as soon as possible. After that, I invite you to take a look at our GitHub repository (https://github.com/frictionlessdata/googlesheets-datapackage-tools (opens new window)), play around with the tool, and contribute. Raising an issue, opening a pull request, improving the documentation, giving feedback on the user experience: everything counts! I am so stoked to be part of this Frictionless Data journey and can’t wait to see what we will accomplish. Thank you very much in advance!

    - + diff --git a/blog/2019/07/03/nes/index.html b/blog/2019/07/03/nes/index.html index fdc02670b..6021d95b9 100644 --- a/blog/2019/07/03/nes/index.html +++ b/blog/2019/07/03/nes/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

    Tool Fund Grantee: Carlos Eduardo Ribas and João Alexandre Peschanski

    This grantee profile features Carlos Eduardo Ribas and João Alexandre Peschanski from the Neuroscience Experiments System (NES) for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    # Meet Carlos, João, and RIDC NeuroMat

    João Alexandre Peschanski is the Cásper Líbero (opens new window) Professor of Digital Media and Computational Journalism and the research supervisor of the dissemination team of the Research, Innovation and Dissemination Center for Neuromathematics (opens new window) (RIDC NeuroMat), from the São Paulo Research Foundation. He is also the president of the Wiki Movimento Brasil (opens new window), the Brazilian affiliate of the Wikimedia movement. As an academic, he has worked on open crowdsourcing resources as well as structured narratives and semantic web.

    Carlos Eduardo Ribas is the leading software developer at the RIDC NeuroMat. He holds a position at the University of São Paulo (opens new window) as a systems analyst. He is the development team leader of the Neuroscience Experiments System (opens new window).

    The RIDC NeuroMat is a research center established in 2013 at the University of São Paulo, in Brazil. Among the core missions of NeuroMat are developing open-source computational tools and keeping an active role in open knowledge, open science and scientific dissemination. The NeuroMat project was recently renewed until July 31, 2024.

    # How did you first hear about Frictionless Data and why did you apply for a Tool Fund grant?

    We learned about the Tool Fund from an announcement (opens new window) in Portuguese that was posted by Open Knowledge Brasil. The Frictionless Data Tool Fund grant is also an opportunity to connect with like-minded professionals and their projects, and eventually to build and support a community deeply engaged with the development of open science and tools.

    Public databases are seen as crucial by many members of the neuroscientific community as a means of moving forward more effectively in understanding the functioning and treatment of brain pathologies. However, open data alone is not enough: it should be created in a way that can be easily shared and used. Data and metadata should be readable by researchers and machines, and Frictionless Data can certainly help with this.

    In our case, NES and the NeuroMat Open Database were developed to establish a standard for data collection in neuroscientific experiments. The standardization of data collection is key for reproducible science. The advantage of the Frictionless Data approach for us is fundamentally to be able to standardize data opening and sharing within the scientific community.

    # What specific issues are you looking to address with the Tool Fund?

    NES is an open-source tool being developed that aims to assist neuroscience research laboratories in routine procedures for data collection. NES was developed to store a large amount of data in a structured way, allowing researchers to seek and share data and metadata of neuroscience experiments. To the best of our knowledge, there are no open-source software tools which provide a way to record data and metadata involved in all steps of an electrophysiological experiment and also register experimental data and its fundamental provenance information. With the anonymization of sensitive information, the data collected using NES can be publicly available through the NeuroMat Open Database (opens new window), which allows any researcher to reproduce the experiment or simply use the data in a different study.

    The system already has some features ready to use, such as Participant registration, Experiment management, Questionnaire management and Data exportation. Some types of data that NES deals with are tasks, stimuli, instructions, EEG, EMG, TMS and questionnaires. Questionnaires are produced with LimeSurvey (opens new window) (an open-source software).

    We propose to change NES to rely on the Frictionless Data philosophy. The data exportation module can be adjusted to reflect the set of specifications for data and metadata interoperability and to export in the Data Package format, and any other feature can be brought into accordance with the proposed philosophy. A major feature to be developed is a JSON “descriptor” file with initial information related to the experiment. However, as sensitive information may be present at this stage, public access to such data will be granted only after the anonymization and submission of the experiment to the NeuroMat Open Database.
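
    To make the idea of a JSON “descriptor” more concrete, here is a minimal sketch of what building one could look like in Python. It is not the NES implementation: the experiment name, file name and field names are hypothetical, and only the general shape (a `name`, plus `resources` that each carry a `path` and a `schema` with typed `fields`) follows the Data Package and Table Schema specifications.

```python
import json


def build_descriptor(experiment_name, title, csv_files):
    """Return a minimal Data Package descriptor as a plain dict.

    csv_files maps a resource name to (path, [(field_name, field_type), ...]).
    """
    resources = []
    for resource_name, (path, fields) in csv_files.items():
        resources.append({
            "name": resource_name,
            "path": path,
            "format": "csv",
            "schema": {
                "fields": [{"name": n, "type": t} for n, t in fields],
            },
        })
    return {"name": experiment_name, "title": title, "resources": resources}


# Hypothetical experiment export: every name below is illustrative only.
descriptor = build_descriptor(
    "example-eeg-experiment",
    "Example EEG experiment (anonymized)",
    {
        "participants": (
            "participants.csv",
            [("participant_code", "string"), ("age", "integer")],
        ),
    },
)

# The descriptor would normally be saved next to the data as datapackage.json.
print(json.dumps(descriptor, indent=2))
```

    Keeping the descriptor as a plain dictionary means it can be written out with the standard `json` module and later checked by whichever Frictionless Data tooling a lab prefers.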

    Bringing NES in line with the Frictionless Data philosophy opens up an opportunity for scientists not only to have access to a universe of well-documented and labeled data, but also to understand the process that generated this data.

    # How can the open data, open source, community engage with the work you are doing around Frictionless Data and NES?

    The source code is available on GitHub (opens new window) (documentation link (opens new window)). Development is done with the Django framework. The license is Mozilla Public License Version 2.0. NES is an open source project managed using the Git version control system, so contributing is as easy as forking the project and committing your enhancements.

    As the RIDC NeuroMat has published elsewhere (opens new window), the work on NES is part of a broader agenda for the development of a database that allows public access to neuroscientific data (physiological measures and functional assessments). We hope our engagement with the Frictionless Data community will open up possibilities of sharing and partnering up for moving this agenda forward.

    - + diff --git a/blog/2019/07/09/open-referral/index.html b/blog/2019/07/09/open-referral/index.html index 7f67ea2bf..a6eba1001 100644 --- a/blog/2019/07/09/open-referral/index.html +++ b/blog/2019/07/09/open-referral/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

    Tool Fund Grantee: Greg Bloom and Shelby Switzer

    This grantee profile features Greg Bloom & Shelby Switzer for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    # Meet Greg, Shelby, and Open Referral

    Shelby Switzer and Greg Bloom work with Open Referral (opens new window), which develops data standards and open source tools for health, human, and social services. For the Tool Fund, they will be building out datapackage support for all their interfaces, from the open source tools that transform and validate human services data to the Human Services API Specification. Greg is the founder of the Open Referral Initiative, and has experience in nonprofit communications, cooperative development, and community organizing. Shelby is a long-time civic tech contributor, and will be the lead developer on this project.

    I got my start in tech through civic tech and open data. After a variety of software development and API product management roles in my career, including most recently leading the API and integrations team at a healthcare technology company, I’ve returned to my roots to write about and contribute to open source, community-focused tech projects full-time. - Shelby

    Open Referral develops data standards and open platforms that make it easy to share and find information about community resources – i.e. the health, human, and social services available to people in need. The Open Referral Initiative is developing the Human Services Data Toolkit – a suite of open source data management tools that facilitate transformation, validation, and publication of standardized data about health, human, and social services. By leveraging the JSON datapackage specification across each of these components, we can provide a comprehensive approach to frictionless data management of information about any kind of community resources provisioned by governments, charity, and civic institutions.

    # Shelby, how did you hear about Frictionless Data?

    I think I first heard about Frictionless Data over the past year or two, just through working with Open Referral. I was doing research on what tools already existed out there for data munging and CSV processing, to help inform my own work with open data and specifically diverse sets of community resource data. First impressions? I thought it was awesome, and wanted to explore more to figure out how to incorporate some of FD’s specs and tools into my own pipelines.

    # What specific issues are you looking to address with the Tool Fund grant?

    I’m definitely excited about building out datapackage support for all our interfaces, from the open source tools that transform and validate human services data to the Human Services API Specification. This will help us plug-and-play tools much more efficiently to build pipelines customized to each deployment. A lot of our work is in Ruby, JavaScript, and PHP, so I think this will be an opportunity to help contribute some tools in those languages to the Frictionless Data ecosystem, for example a Ruby library for generating datapackages given an input directory or a library for generating a SQL Server database from a datapackage. We want to do more with our existing data pipeline tools, especially to link them together using the datapackage spec as a common exchange format. We’re also about to use some of these tools in specific projects in the US validating and federating community resource data sets, and we hoped that applying for a tool grant might help us have the runway to iterate on tool improvements based on what we learn from these deployments.
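
    As a rough illustration of the “generate a datapackage from an input directory” idea, here is a small sketch. It is written in Python rather than Ruby and is not one of the Open Referral tools: the function name, the package name and the demo file are all made up, and every column is simply typed as `string` instead of being inferred.

```python
import csv
import json
import tempfile
from pathlib import Path


def describe_directory(directory, package_name):
    """Build a Data Package descriptor dict covering every CSV file in a directory."""
    directory = Path(directory)
    resources = []
    for csv_path in sorted(directory.glob("*.csv")):
        # Only the header row is read; column types are not inferred here.
        with csv_path.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        resources.append({
            "name": csv_path.stem.lower(),
            "path": csv_path.name,
            "schema": {"fields": [{"name": col, "type": "string"} for col in header]},
        })
    return {"name": package_name, "resources": resources}


# Demo with a throwaway directory and a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "services.csv").write_text("id,name\n1,Food bank\n", encoding="utf-8")
    package = describe_directory(tmp, "example-community-resources")
    # Write the descriptor alongside the data, as a Data Package expects.
    Path(tmp, "datapackage.json").write_text(json.dumps(package, indent=2), encoding="utf-8")

print(json.dumps(package, indent=2))
```

    A real tool would also want type inference and validation, which is where the existing Frictionless Data libraries come in.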

    # How can the open data, open source, community engage with the work you are doing around Frictionless Data and the Human Services Data Toolkit?

    - + diff --git a/blog/2019/07/22/nimblelearn-dpc/index.html b/blog/2019/07/22/nimblelearn-dpc/index.html index 976615be4..5c40e1fa1 100644 --- a/blog/2019/07/22/nimblelearn-dpc/index.html +++ b/blog/2019/07/22/nimblelearn-dpc/index.html @@ -38,7 +38,7 @@ - + @@ -114,6 +114,6 @@ case-studies

    Data Package Connector (opens new window), also known as datapackage-connector, is a Power BI Custom Connector (opens new window) that enables you to quickly load one or more tables from Tabular Data Packages into Power BI. It builds on top of one of our other Frictionless Data projects, Data Package M (also known as datapackage-m), provides a user-friendly Power BI ‘Get Data’ experience, and allows these Power BI tables to be refreshed directly from Tabular Data Packages within the Power BI Service. This has been a sought-after capability because the Data Package M functions alone don’t currently support this scenario.

    When we first created datapackage-m, we thought it would be quite powerful if it was possible to include a ‘Get Data’ experience in Power BI for Tabular Data Packages, but this wasn’t possible with Power Query M functions alone. For those of you not too familiar with Power BI, the ‘Get Data’ experience is a user interface (UI) wizard that guides you through some simple steps to get data from supported data sources in Power BI. With datapackage-connector, we’ve introduced a ‘Get Data’ experience for Tabular Data Packages which makes it easier to build Power BI reports and dashboards from Tabular Data Packages. This is especially useful when a Tabular Data Package has several tables that you’d like to load into Power BI in one go.


    How datapackage-connector works in Power BI

    datapackage-m has one major limitation from a Power BI perspective: it doesn’t support the ability to refresh data from within the Power BI service and this means the data refreshes must be done from Power BI Desktop. datapackage-connector, being a Power BI connector, doesn’t have this limitation. This unlocks a new usage scenario where Power BI reports and dashboards can be built directly on top of Tabular Data Packages and kept up-to-date through scheduled data refreshes.


    datapackage-connector supports data refresh in the Power BI service

    datapackage-connector reuses the same Power Query M functions from datapackage-m and this means that it has the same level of Frictionless Data specs support. We’ll be keeping these two projects aligned as we further expand their support for the specs. Read more about datapackage-m here, and check out the documentation for datapackage-connector on our GitHub repo (opens new window).

    - + diff --git a/blog/2019/08/29/welcome-frictionless-fellows/index.html b/blog/2019/08/29/welcome-frictionless-fellows/index.html index 7051b7169..6225bd949 100644 --- a/blog/2019/08/29/welcome-frictionless-fellows/index.html +++ b/blog/2019/08/29/welcome-frictionless-fellows/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    A warm welcome to our Frictionless Data for Reproducible Research Fellows

    As part of our commitment to opening up scientific knowledge, we recently launched the Frictionless Data for Reproducible Research Fellows Programme (opens new window), which will run from mid-September until June 2020.

    We received over 200 impressive applications for the Programme, and are very excited to introduce the four selected Fellows:

    Monica Granados, a Mitacs Canadian Science Policy Fellow;
    Selene Yang, a graduate student researcher at the National University of La Plata, Argentina;
    Daniel Ouso, a postgraduate researcher at the International Centre of Insect Physiology and Ecology;
    Lily Zhao, a graduate student researcher at the University of California, Santa Barbara.

    Next month, the Fellows will be writing blogs to further introduce themselves to the Frictionless Data community, so stay tuned to learn more about these impressive researchers.

    The Programme will train early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows will learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content.

    As the programme progresses, we will be sharing the Fellows’ work on making research more reproducible with the Frictionless Data software suite by posting a series of blogs here and on the Fellows website (opens new window). In June 2020, the Programme will culminate in a community call where all Fellows will present what they have learned over the nine months: we encourage attendance by our community. If you are interested in learning more about the Programme, the syllabus (opens new window), lessons (opens new window), and resources (opens new window) are open.

    # More About Frictionless Data

    The Fellows Programme is part of the Frictionless Data for Reproducible Research project at Open Knowledge Foundation. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate data workflows in research contexts. Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. Frictionless Data’s other current projects include the Tool Fund (opens new window), in which four grantees are developing open source tooling for reproducible research. The Fellows Programme will be running until June 2020, and we will post updates on the Programme as it progresses.

    - + diff --git a/blog/2019/09/12/andre-heughebaert/index.html b/blog/2019/09/12/andre-heughebaert/index.html index 963c0fd22..d714d01e7 100644 --- a/blog/2019/09/12/andre-heughebaert/index.html +++ b/blog/2019/09/12/andre-heughebaert/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

    Tool Fund Grantee: André Heughebaert

    This grantee profile features André Heughebaert for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    # Meet André Heughebaert

    With 30+ years of experience in Software Development (mostly Database, GIS and webapps), I’ve seen a wide variety of technologies, programming languages and paradigm changes. I currently work as an IT Software Engineer at the Belgian Biodiversity Platform (opens new window) and as the Belgian GBIF Node manager. Today, my activities focus on Open Data advocacy and technical support for the publication and re-use of Open Data through the GBIF Network. This includes intensive use of Darwin Core standards and related Biodiversity tools. Since 2016, I have been the chair of the GBIF (opens new window) Participation Nodes Committee. Before that, I worked on Banking systems, early Digital TV and VoD servers, and e-Learning platforms. Last but certainly not least, I’m the proud father of four children. I live in Brussels and work for the federal public service.

    # How did you first hear about Frictionless Data?

    Acquainted with CKAN, I discovered Frictionless Data through Twitter and the OKF website. Soon after that, I published my first Data Package (opens new window) on historical movements of troops during the Napoleonic campaign of Belgium in 1815 underlying JunIBIS.be (opens new window), a website launched with a friend for the 200th anniversary of this event.

    # What specific issues are you looking to address with the Tool Fund?

    The suggested tool will automatically convert Darwin Core Archive into Frictionless Data Packages, offering new perspectives to the GBIF community of data publishers and users. I will especially pay attention to potential incompatibilities between the two standards. Limitations of the Darwin Core Star schema (opens new window) encouraged me to investigate emerging open data standards, and the Frictionless Data Tool Fund grant is an excellent opportunity for me to bridge these two Open Data tools ecosystems.

    # How can the open data, open source, community engage with the work you are doing around Frictionless Data, Darwin Core Archive and GBIF?

    I do hope the Frictionless and GBIF communities will help me with issuing/tracking and solving incompatibilities, and also to build up new synergies. You can engage with André’s Tool Fund at the Frictionless DarwinCore repository (opens new window).

    - + diff --git a/blog/2019/10/21/fellows-reflect-on-open-access-week/index.html b/blog/2019/10/21/fellows-reflect-on-open-access-week/index.html index f8ecf23e3..2f88998b4 100644 --- a/blog/2019/10/21/fellows-reflect-on-open-access-week/index.html +++ b/blog/2019/10/21/fellows-reflect-on-open-access-week/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    The Fellows reflect on Open Access Week

    The theme of this year’s Open Access Week (opens new window) is “Open for Whom”, which inspired us to reflect on what Open Access means, why it is important, and especially how people are (positively and negatively) affected by openness in science. Below you will find short thoughts from our four Fellows:

    # Sele

    The privilege of science. We speak from the opening of the production of knowledge, however many times we do not analyze what it really means how privileged this access is. Open Access to whom? Could we also add the “by whom”? What do we produce, what do we share, who enables this? This week leaves me more doubts and reflections, because it is not enough to think about who we deposit the knowledge to, we have to analyze our place from where we stand when we share it.

    # Monica

    Open science touts the tantalizing prospect of making science accessible to anyone, anywhere in the world. Yet simply making data and code freely available doesn’t make science easier to access for everyone, everywhere. To me this year’s Open Access Week theme is challenging the open science community to think about how we rebuild the inequitable system we are dismantling. How do we make science both physically reachable and comprehensible, while not putting the burden of transforming the system on marginalized groups? These are, perhaps, the most important questions for the open science community and we must address them before we can hope to make real progressive change to the way knowledge is created and shared.

    # Ouso

    Open Access Week is an important event for open access awareness. It is to highlight advances in the realms of modern scientific research practices and communication. But, I wish one thing for you this week, that Open Access (OA) should bother you. Is your scientific practise in line with it? The week’s theme question - “Open for whom?” eagerly arrested my attention; what is its meaning from my perspective? I would love to pick your mind on it too, but I’ll stick to mine for this bit. Foremost, let me clarify, in its classical meaning OA is unrestricted access to literature, yet here I will mean unrestricted access to all products of research, more or less synonymous to Open Science. My personal summary of it is #NROA – No Restrictions, Only Attribution. However, I must remind us of the need to remain ethical which, I think, is an aspect of the “for whom?”. This question insinuates that certain risks are associated with openness, signalling a red light, and inevitably invoking fear. The fears are chiefly fuelled by ignorance on the many protective options available with openness. In most of my interactions regarding openness, the elephant in the room is security. Of the common triad of Open Science, Open- access (literature), source, and data, the latter is most affected with regards to security. Some people profit from data without consent, at the expense of the public. Also, mainly from the aspect of the former two, some researchers have feared the hijacking of their in-development ideas from open spaces, missing out on well-deserved recognition/attribution. This has led only to preferential access to closed research groups. Openness is very welcoming, attracting with it people/entities with good and bad intentions indiscriminately, but mitigations have been put or are in the process of being put in place. 
    These include data protection laws (opens new window), policies (opens new window) for anonymisation and confidentiality in research, portals for preregistration (opens new window) of research ideas and design, and micro-publishers like flushPub (opens new window) for quick sharing of knowledge.

    # Lily

    A question I ask other scientists as part of my research is: “Who or what do you consider to be the beneficiaries of your research?”. The answers range from ‘the broader academic community’, to ‘local communities’, to ‘everyone’. I then ask a follow-up question, “How successful do you think you have been in reaching this audience?”. Interestingly, scientists who answer that their beneficiaries are other academics are more likely to consider themselves successful in reaching their target audience, citing scientific publications leading to their success. Scientists trying to reach broader audiences often feel that they are unsuccessful in having their work affect the general public or local residents near their study site. They mention limited funding, not having enough time, and feeling awkward stepping outside their comfort zone as things hindering them. However, in my opinion, they are brave in sharing these sentiments. Wanting to reach a broader audience and recognizing one’s limitations is an important step in the spirit of Open Access Week. Sele mentioned that we can consider not only ‘open for whom’ but ‘open by whom’. By explicitly considering these questions early on, in an inclusive, iterative and transparent manner, researchers, practitioners, and communities can build a more equitable and streamlined pipeline from research to impact.

    - + diff --git a/blog/2020/01/22/frictionless-darwincore/index.html b/blog/2020/01/22/frictionless-darwincore/index.html index c602429f6..3a45bd495 100644 --- a/blog/2020/01/22/frictionless-darwincore/index.html +++ b/blog/2020/01/22/frictionless-darwincore/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

    Frictionless DarwinCore developed by André Heughebaert

    This blog is part of a series showcasing projects developed during the 2019 Frictionless Data Tool Fund.

    Originally published at https://blog.okfn.org/2019/12/09/andre-heughebaert-frictionless-darwincore/ (opens new window)

    The 2019 Frictionless Data Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.

    André Heughebaert is an open biodiversity data advocate in his work and his free time. He is an IT Software Engineer at the Belgian Biodiversity Platform and is also the Belgian GBIF (Global Biodiversity Information Facility) Node manager. During this time, he has worked with the Darwin Core Standards and Open Biodiversity data on a daily basis. This work inspired him to apply for the Tool Fund, where he has developed a tool to convert DarwinCore Archives into Frictionless Data Packages.

    The DarwinCore Archive (DwCA) is a standardised container for biodiversity data and metadata largely used amongst the GBIF community, which consists of more than 1,500 institutions around the world. The DwCA is used to publish biodiversity data about observations, collection specimens, species checklists and sampling events. However, this domain-specific standard has some limitations, mainly the star schema (core table + extensions), rules that are sometimes too permissive, and a lack of controlled vocabularies for certain terms. These limitations encouraged André to investigate emerging open data standards. In 2016, he discovered Frictionless Data and published his first data package on historical data from the 1815 Napoleonic Campaign of Belgium. He was then encouraged to create a tool that would, in part, build a bridge between these two open data ecosystems.

    As a result, the Frictionless DarwinCore tool converts DwCA into Frictionless Data Packages, and also gives access to the vast Frictionless Data software ecosystem enabling constraints validation and support of a fully relational data schema. Technically speaking, the tool is implemented as a Python library, and is exposed as a Command Line Interface. The tool automatically converts:

    • DwCA data schema into datapackage.json
    • EML metadata into human readable markdown readme file
    • data files, when necessary (that is, when default values are described)

    The resulting zip file complies with both DarwinCore and Frictionless specifications.
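
    Because the output is an ordinary zip file containing a datapackage.json, it can be inspected with nothing but the Python standard library. The sketch below does not use the Frictionless DarwinCore API. It builds a tiny stand-in archive in memory (the package and resource names are made up) and assumes the descriptor sits at the root of the archive.

```python
import io
import json
import zipfile


def read_descriptor(zip_source):
    """Return the parsed datapackage.json from a zipped Data Package.

    Assumes the descriptor is stored at the root of the archive.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open("datapackage.json") as handle:
            return json.load(handle)


# Build a tiny stand-in archive in memory; a real one would come from the tool.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("datapackage.json", json.dumps({
        "name": "example-dwca",
        "resources": [{"name": "occurrence", "path": "occurrence.csv"}],
    }))
buffer.seek(0)

descriptor = read_descriptor(buffer)
for resource in descriptor["resources"]:
    print(resource["name"], resource["path"])
```

    The same function accepts a file path instead of an in-memory buffer, so it could be pointed at a converted archive on disk.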

    Frictionless DarwinCore Flow

    André hopes that bridging the two standards will give an excellent opportunity for the GBIF community to provide open biodiversity data to a wider audience. He says this is also a good opportunity to discover the Frictionless Data specifications and assess their applicability to the biodiversity domain. In fact, on 9th October 2019, André presented the tool at a GBIF Global Nodes meeting. It was perceived by the nodes managers community as an exploratory and pioneering work. While the command line interface offers a simple user interface for non-programmers, others might prefer the more flexible and sophisticated Python API. André encourages anyone working with DarwinCore data, including all data publishers and data users of GBIF network, to try out the new tool.

    “I’m quite optimistic that the project will feed the necessary reflection on the evolution of our biodiversity standards and data flows.”

    To get started, installation of the tool is done through a single pip install command (full directions can be found in the project README). Central to the tool is a table of DarwinCore terms linking a Data Package type, format and constraints for every DwC term. The tool can be used as a CLI directly from your terminal window or as a Python library for developers. The tool can work with either locally stored or online DwCA. Once converted to a Tabular Data Package, the DwC data can then be ingested and further processed by software such as Goodtables, OpenRefine or any other Frictionless Data software.

    André has aspirations to take the Frictionless DarwinCore tool further by encapsulating the tool in a web-service that will directly deliver Goodtables reports from a DwCA, which will make it even more user-friendly. Additional ideas for further improvement would be including an import pathway for DarwinCore data into OpenRefine, which is a popular tool in the GBIF community. André’s long term hope is that the Data Package will become an optional format for data download on GBIF.org (opens new window).

    workflow

    Further reading:

    Repository: https://github.com/frictionlessdata/FrictionlessDarwinCore (opens new window)

    Project blog: https://andrejjh.github.io/fdwc.github.io/ (opens new window)

    - + diff --git a/blog/2020/01/22/open-referral-tool/index.html b/blog/2020/01/22/open-referral-tool/index.html index 250397696..95f7f43af 100644 --- a/blog/2020/01/22/open-referral-tool/index.html +++ b/blog/2020/01/22/open-referral-tool/index.html @@ -35,7 +35,7 @@ - + @@ -112,6 +112,6 @@

    This blog is part of a series showcasing projects developed during the 2019 Tool Fund.

    Originally published at https://blog.okfn.org/2020/01/15/frictionless-data-tool-fund-update-shelby-switzer-and-greg-bloom-open-referral/ (opens new window)

    The 2019 Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.

    Open Referral creates standards for health, human, and social services data – the data found in community resource directories used to help find resources for people in need. In many organizations, this data lives in a multitude of formats, from handwritten notes to Excel files on a laptop to Microsoft SQL databases in the cloud. For community resource directories to be maximally useful to the public, this disparate data must be converted into an interoperable format. Many organizations have decided to use Open Referral’s Human Services Data Specification (HSDS) as that format. However, to accurately represent this data, HSDS uses multiple linked tables, which can be challenging to work with. To make this process easier, Greg Bloom and Shelby Switzer from Open Referral decided to implement datapackage bundling of their CSV files using the Frictionless Data Tool Fund.

    To accurately represent the relationships between organizations, the services they provide, and the locations where those services are offered, Open Referral’s Human Services Data Specification (HSDS) makes sense of disparate data by linking multiple CSV files together with foreign keys. Open Referral used Frictionless Data’s datapackage to specify the tables’ contents and relationships in a single machine-readable file, so that this standardized format could transport HSDS-compliant data in a way that all of the teams who work with this data can use: CSVs of linked data.
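To make the idea of linked CSVs concrete, the sketch below builds a small datapackage descriptor in Python, in which one table points to another through a Table Schema `foreignKeys` entry, and then checks that every reference resolves. The resource and field names are simplified stand-ins, not the actual HSDS tables, and `check_foreign_keys` is a hypothetical helper rather than part of any Frictionless library.

```python
import json

# Simplified stand-in for an HSDS-style package: two CSVs linked by a foreign key.
descriptor = {
    "name": "hsds-example",
    "resources": [
        {
            "name": "organizations",
            "path": "organizations.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "string"},
                    {"name": "name", "type": "string"},
                ],
                "primaryKey": "id",
            },
        },
        {
            "name": "services",
            "path": "services.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "string"},
                    {"name": "organization_id", "type": "string"},
                    {"name": "name", "type": "string"},
                ],
                "primaryKey": "id",
                "foreignKeys": [
                    {
                        "fields": "organization_id",
                        "reference": {"resource": "organizations", "fields": "id"},
                    }
                ],
            },
        },
    ],
}


def check_foreign_keys(package):
    """List foreign keys that point at a missing resource or field."""
    field_names = {
        res["name"]: {field["name"] for field in res["schema"]["fields"]}
        for res in package["resources"]
    }
    problems = []
    for res in package["resources"]:
        for key in res["schema"].get("foreignKeys", []):
            target = key["reference"]["resource"]
            target_field = key["reference"]["fields"]
            if target not in field_names:
                problems.append(f"{res['name']}: unknown resource {target!r}")
            elif target_field not in field_names[target]:
                problems.append(f"{res['name']}: unknown field {target}.{target_field}")
    return problems


print(json.dumps(descriptor["resources"][1]["schema"]["foreignKeys"], indent=2))
print(check_foreign_keys(descriptor))
```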

    In the Tool Fund, Open Referral worked on their HSDS Transformer tool, which enables a group or person to transform data into an HSDS-compliant data package, so that it can then be combined with other data or used in any number of applications. The HSDS-Transformer is a Ruby library that can be used during the extract, transform, load (ETL) workflow of raw community resource data. This library extracts the community resource data, transforms that data into HSDS-compliant CSVs, and generates a datapackage.json that describes the data output. The Transformer can also output the datapackage as a zip file, called HSDS Zip, enabling systems to send and receive a single compressed file rather than multiple files. The Transformer can be spun up in a docker container — and once it’s live, the API can deliver a payload that includes links to the source data and to the configuration file that maps the source data to HSDS fields. The Transformer then grabs the source data and uses the configuration file to transform the data and return a zip file of the HSDS-compliant datapackage.

    DemoApp
    A demo app consuming the API generated from the HSDS Zip

    The Open Referral team has also been working on projects related to the HSDS Transformer and HSDS Zip. For example, the HSDS Validator checks that a given datapackage of community service data is HSDS-compliant. Additionally, they have used these tools in the field with a project in Miami. For this project, the HSDS Transformer was used to transform data from a Microsoft SQL Server into an HSDS Zip. Then that zipped datapackage was used to populate a Human Services Data API with a generated developer portal and OpenAPI Specification.

    Further, as part of this work, the team also contributed to the original source code for the datapackage-rb Ruby gem. They added a new feature to infer a datapackage.json schema from a given set of CSVs, so that you can generate the json file automatically from your dataset.
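The inference feature itself lives in the Ruby gem, but the general idea can be sketched in a few lines of Python. This is an illustration only, not the datapackage-rb implementation: `infer_type` and `infer_resource` are hypothetical helpers that guess a type per column from sample values.

```python
import csv
import io


def infer_type(values):
    """Guess a Table Schema type from a column's string values."""
    non_empty = [value for value in values if value != ""]
    if not non_empty:
        return "string"
    # Try the strictest type first, then fall back.
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_resource(name, csv_text):
    """Build a resource descriptor (name, path, schema) from CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    fields = []
    for index, column in enumerate(header):
        values = [row[index] for row in body if index < len(row)]
        fields.append({"name": column, "type": infer_type(values)})
    return {"name": name, "path": f"{name}.csv", "schema": {"fields": fields}}


sample = "id,name,capacity\n1,Food Bank,40\n2,Shelter,12.5\n"
resource = infer_resource("locations", sample)
print(resource["schema"]["fields"])
```

A real implementation would also need to handle dates, booleans and missing-value markers, which this sketch leaves out.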

    Greg and Shelby are eager for the Open Referral community to use these new tools and provide feedback. To use these tools currently, users should either be a Ruby developer who can use the gem as part of another Ruby project, or be familiar enough with Docker and HTTP APIs to start a Docker container and make an HTTP request to it. You can use the HSDS Transformer as a Ruby gem in another project or as a standalone API. In the future, the project might expand to include hosting the HSDS Transformer as a cloud service that anyone can use to transform their data, eliminating many of these technical requirements.

    Interested in using these new tools? Open Referral wants to hear your feedback. For example, would it be useful to develop an extract-transform-load API, hosted in the cloud, that enables recurring transformation of nonstandardized human service directory data source into an HSDS-compliant datapackage? You can reach them via their GitHub repos.

    Further reading:

    Repository: https://github.com/openreferral/hsds-transformer (opens new window)
    HSDS Transformer: https://openreferral.github.io/hsds-transformer/ (opens new window)

    - + diff --git a/blog/2020/01/23/nes-tool/index.html b/blog/2020/01/23/nes-tool/index.html index 8eb806a0c..443b34e5b 100644 --- a/blog/2020/01/23/nes-tool/index.html +++ b/blog/2020/01/23/nes-tool/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

    Neuroscience Experiments System Tool Fund

    This blog is part of a series showcasing projects developed during the 2019 Tool Fund.

    Originally published at https://blog.okfn.org/2019/12/16/neuroscience-experiments-system-frictionless-tool/ (opens new window)

    The 2019 Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.

    The Research, Innovation and Dissemination Center for Neuromathematics (RIDC NeuroMat) is a research center established in 2013 by the São Paulo Research Foundation (FAPESP) at the University of São Paulo, in Brazil. A core mission of NeuroMat is the development of open-source computational tools to aid in scientific dissemination and advance open knowledge and open science. To this end, the team has created the Neuroscience Experiments System (NES), which is an open-source tool to assist neuroscience research laboratories in routine procedures for data collection. To more effectively understand the function and treatment of brain pathologies, NES aids in recording data and metadata from various experiments, including clinical data, electrophysiological data, and fundamental provenance information. NES then stores that data in a structured way, allowing researchers to seek and share data and metadata from those neuroscience experiments. For the 2019 Tool Fund, the NES team, particularly João Alexandre Peschanski, Cassiano dos Santos and Carlos Eduardo Ribas, proposed to adapt their existing export component to conform to the Frictionless Data specifications.

    Public databases are seen as crucial by many members of the neuroscientific community as a means of moving science forward. However, simply opening up data is not enough; it should be created in a way that can be easily shared and used. For example, data and metadata should be readable by both researchers and machines, yet they typically are not. When the NES team learned about Frictionless Data, they were interested in trying to implement the specifications to help make the data and metadata in NES machine readable. For them, the advantage of the Frictionless Data approach was to be able to standardize data opening and sharing within the neuroscience community.

    Before the Tool Fund, NES had an export component that set up a file with folders and documents with information on an entire experiment (including data collected from participants, device metadata, questionnaires, etc.), but they wanted to improve this export to be more structured and open. By implementing Frictionless Data specifications, the resulting export component includes the Data Package (datapackage.json) and the folders/files inside the archive, with a root folder called data. With this new “frictionless” export component, researchers can transport and share their export data with other researchers in a recognized open standard format (the Data Package), facilitating the understanding of that exported data. They have also implemented Goodtables into the unit tests to check data structure.

    The RIDC NeuroMat team’s expectation is that many researchers, particularly neuroscientists and experimentalists, will have an interest in using the freely available NES tool. With the anonymization of sensitive information, the data collected using NES can be publicly available through the NeuroMat Open Database, allowing any researcher to reproduce the experiment or simply use the data in a different study. In addition to storing collected experimental data and being a tool for guiding and documenting all the steps involved in a neuroscience experiment, NES has an integration with the Neuroscience Experiment Database, another NeuroMat project, based on a REST API, where NES users can send their experiments to become publicly available for other researchers to reproduce them or to use as inspiration for further experiments.

    export
    Screenshot of the export of an experiment
    data
    Screenshot of the export of data on participants
    tree
    Picture of a hypothetical export file tree of type Per Experiment after the Frictionless Data implementation

    # Further reading

    - + diff --git a/blog/2020/02/10/frictionless-data-pipelines-for-open-ocean/index.html b/blog/2020/02/10/frictionless-data-pipelines-for-open-ocean/index.html index 3c742c385..3b5c5e859 100644 --- a/blog/2020/02/10/frictionless-data-pipelines-for-open-ocean/index.html +++ b/blog/2020/02/10/frictionless-data-pipelines-for-open-ocean/index.html @@ -35,7 +35,7 @@ - + @@ -117,6 +117,6 @@ Remove parenthesis and units from column names (field descriptions and units captured in metadata).
    Remove spaces from column names
    The web application, named Laminar, built on top of DPP helps Data Managers at BCO-DMO perform these operations in a consistent way. First, Laminar prompts us to name and describe the current pipeline being developed, and assumes that the data manager wants to load some data in to start the pipeline, and prompts for a source location.

    Laminar

    After providing a name and description of our DPP workflow, we provide a data source to load, and give it the name, ‘nfix’.

    In subsequent pipeline steps, we refer to ‘nfix’ as the resource we want to transform. For example, to convert the latitude and longitude into decimal degrees, we add a new step to the pipeline, select the ‘Convert to decimal degrees’ processor (a proxy for our custom processor ‘convert_to_decimal_degrees’), select the ‘nfix’ resource, select a field from that ‘nfix’ data source, and specify the Python regex pattern identifying where the values for the degrees, minutes and seconds can be found in each value of the latitude column.

    processor step
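The conversion behind this step can be sketched in plain Python. The snippet below is not BCO-DMO's processor; the `DMS_PATTERN` regex, the input formats, and the `to_decimal_degrees` helper are assumptions chosen to show how named groups for degrees, minutes and seconds feed the arithmetic.

```python
import re

# Assumed input shapes such as "42 21 30 N" or "70° 53' 12.5\" W".
DMS_PATTERN = re.compile(
    r"(?P<degrees>\d+)\D+"
    r"(?P<minutes>\d+)\D+"
    r"(?P<seconds>\d+(?:\.\d+)?)"
    r"[^\dNSEW]*(?P<hemisphere>[NSEW])?"
)


def to_decimal_degrees(text):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = DMS_PATTERN.search(text)
    if match is None:
        raise ValueError(f"cannot parse coordinate: {text!r}")
    value = (
        int(match.group("degrees"))
        + int(match.group("minutes")) / 60
        + float(match.group("seconds")) / 3600
    )
    # Southern and western hemispheres are negative by convention.
    if match.group("hemisphere") in ("S", "W"):
        value = -value
    return value


print(round(to_decimal_degrees("42 21 30 N"), 6))
print(round(to_decimal_degrees("70° 53' 12.5\" W"), 6))
```

Letting the data manager supply the pattern, as Laminar does, means the same processor can handle datasets that write coordinates in different ways.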

    Similarly, in step 7 of this pipeline, we want to generate an ISO 8601-compliant UTC datetime value by combining the pre-existing ‘Date’ and ‘Local Time’ columns. This step is depicted below:

    date processing step
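As an illustration of what such a step computes, here is a minimal Python sketch that combines a date and a local time into an ISO 8601 UTC timestamp. The input formats and the fixed UTC offset are assumptions made for the example; the actual column formats and time zone handling in the ‘nfix’ dataset are not stated here.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(date_text, time_text, utc_offset_hours):
    """Combine a local date and time into an ISO 8601 UTC string."""
    # Assumed formats: "YYYY-MM-DD" for the date and "HH:MM" for the time.
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(f"{date_text} {time_text}", "%Y-%m-%d %H:%M")
    local = local.replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 22:30 local time at UTC-4 falls on the next day in UTC.
print(to_iso_utc("2019-07-04", "22:30", -4))  # 2019-07-05T02:30:00Z
```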

    After the pipeline is completed, the interface displays all steps, and lets the data manager execute the pipeline by clicking the green ‘play’ button at the bottom. This button then generates the pipeline-spec.yaml file, executes the pipeline, and can display the resulting dataset.

    all steps

    data

    The resulting DPP workflow contained 223 lines across this 12-step operation. For a data manager, the web application reduces the chance of the errors that could creep in if this pipeline were written by hand. Ultimately, our work with OKF helped us develop processors that follow the DPP conventions.

    Our goal for the pilot project with OKF was to have BCO-DMO data managers using Laminar to process 80% of the data submissions we receive. The pilot was so successful that data managers have processed 95% of new data submissions to the repository using the application.

    This is exciting from a data management processing perspective because the use of Laminar is more sustainable, and acted to bring the team together to determine best strategies for processing, documentation, etc. This increase in consistency and efficiency is welcomed from an administrative perspective and helps with the training of any new data managers coming to the team.

    The OKF team are excellent partners, who were the catalysts to a successful project. The next steps for BCO-DMO are to build on the success of The Frictionlessdata Data Package Pipelines by implementing the Frictionlessdata Goodtables specification for data validation to help us develop submission guidelines for common data types. Special thanks to the OKF team – Lilly Winfree, Evgeny Karev, and Jo Barrett.

    - + diff --git a/blog/2020/03/18/frictionless-data-pilot-study/index.html b/blog/2020/03/18/frictionless-data-pilot-study/index.html index e9010c0aa..d8fe201dd 100644 --- a/blog/2020/03/18/frictionless-data-pilot-study/index.html +++ b/blog/2020/03/18/frictionless-data-pilot-study/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

    Frictionless Public Utility Data - A Pilot Study

    This blog post describes a Frictionless Data Pilot with the Public Utility Data Liberation project. Pilot projects are part of the Frictionless Data for Reproducible Research project (opens new window). Written by Zane Selvans, Christina Gosnell, and Lilly Winfree.

    The Public Utility Data Liberation project, PUDL (opens new window), aims to make US energy data easier to access and use. Much of this data, including information about the cost of electricity, how much fuel is being burned, powerplant usage, and emissions, is not well documented or is in difficult-to-use formats. Last year, PUDL joined forces with the Frictionless Data for Reproducible Research team as a Pilot project to release this public utility data. PUDL takes the original spreadsheets, CSV files, and databases and turns them into unified Frictionless tabular data packages (https://frictionlessdata.io/docs/tabular-data-package/ (opens new window)) that can be used to populate a database, or read in directly with Python, R, Microsoft Access, and many other tools.

    Catalyst Logo

    # What is PUDL?

    The PUDL project, which is coordinated by Catalyst Cooperative (opens new window), is focused on creating an energy utility data product that can serve a wide range of users. PUDL was inspired to make this data more accessible because the current US utility data ecosystem is fragmented, and commercial products are expensive. There are hundreds of gigabytes of information available from government agencies, but they are often difficult to work with, and different sources can be hard to combine.

    PUDL users include researchers, activists, journalists, and policy makers. They have a wide range of technical backgrounds, from grassroots organizers who might only feel comfortable with spreadsheets, to PhDs with cloud computing resources, so it was important to provide data that would work for all users.

    Before PUDL, much of this data was freely available to download from various sources, but it was typically messy and not well documented. This led to a lack of uniformity and reproducibility amongst projects that were using this data. The users were scraping the data together in their own way, making it hard to compare analyses or understand outcomes. Therefore, one of the goals for PUDL was to minimize these duplicated efforts, and enable the creation of lasting, cumulative outputs.

    # What were the main Pilot goals?

    The main focus of this Pilot was to create a way to openly share the utility data in a reproducible way that would be understandable to PUDL’s many potential users. The first change Catalyst identified they wanted to make during the Pilot was with their data storage medium. PUDL was previously creating a PostgreSQL database as the main data output. However, many users, even those with technical experience, found setting up the separate database software a major hurdle that prevented them from accessing and using the processed data. They also desired a static, archivable, platform-independent format. Therefore, Catalyst decided to transition PUDL away from PostgreSQL, and instead try Frictionless Tabular Data Packages. They also wanted a way to share the processed data without needing to commit to long-term maintenance and curation, meaning they needed the outputs to continue being useful to users even if they only had minimal resources to dedicate to the maintenance and updates. The team decided to package their data into Tabular Data Packages and identified Zenodo as a good option for openly hosting that packaged data.

    Catalyst also recognized that most users only want to download the outputs and use them directly, and did not care about reproducing the data processing pipeline themselves, but it was still important to provide the processing pipeline code publicly to support transparency and reproducibility. Therefore, in this Pilot, they focused on transitioning their existing ETL pipeline from outputting a PostgreSQL database, that was defined using SQLAlchemy, to outputting datapackages which could then be archived publicly on Zenodo. Importantly, they needed this pipeline to maintain the metadata, information about data type, and database structural information that had already been accumulated. This rich metadata needed to be stored alongside the data itself, so future users could understand where the data came from and understand its meaning. The Catalyst team used Tabular Data Packages to record and store this metadata (see the code here: https://github.com/catalyst-cooperative/pudl/blob/master/src/pudl/load/metadata.py (opens new window)).

    Another complicating factor is that many of the PUDL datasets are fairly entangled with each other. The PUDL team ideally wanted users to be able to pick and choose which datasets they actually wanted to download and use without requiring them to download it all (currently about 100GB of data when uncompressed). However, they were worried that if single datasets were downloaded, the users might miss that some of the datasets were meant to be used together. So, the PUDL team created information, which they call “glue”, that shows which datasets are linked together and that should ideally be used in tandem.

    The culmination of this Pilot was a release of the PUDL data (access it here – https://zenodo.org/record/3672068 (opens new window) and read the corresponding documentation here – https://catalystcoop-pudl.readthedocs.io/en/v0.3.2/ (opens new window)), which includes integrated data from the EIA Form 860, EIA Form 923, the EPA Continuous Emissions Monitoring System (CEMS), the EPA Integrated Planning Model (IPM), and FERC Form 1.

    # What problems were encountered during this Pilot?

    One issue that the group encountered during the Pilot was that the data types available in Postgres are substantially richer than those natively in the Tabular Data Package standard. However, this issue is an endemic problem of wanting to work with several different platforms, and so the team compromised and worked with the least common denominator. In the future, PUDL might store several different sets of data types for use in different contexts, for example, one for freezing the data out into data packages, one for SQLite, and one for Pandas.

    Another problem encountered during the Pilot resulted from testing the limits of the draft Tabular Data Package specifications. There were aspects of the specifications that the Catalyst team assumed were fully implemented in the reference (Python) implementation of the Frictionless toolset, but were in fact still works in progress. This work led the Frictionless team to start a documentation improvement project, including a revision of the specifications website to incorporate this feedback.

    Through the pilot, the teams worked to implement new Frictionless features, including the specification of composite primary keys and foreign key references that point to external data packages. Other new Frictionless functionality that was created with this Pilot included partitioning of large resources into resource groups in which all resources use identical table schemas, and adding gzip compression of resources. The Pilot also focused on implementing more complete validation through goodtables, including bytes/hash checks, foreign keys checks, and primary keys checks, though there is still more work to be done here.
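To show what a bytes/hash check involves, here is a small Python sketch that uses only the standard library rather than goodtables itself. It relies on the Data Package convention that a resource may carry `bytes` and `hash` properties, where a hash without an algorithm prefix is taken to be MD5; `describe_bytes` and `check_resource` are hypothetical helper names.

```python
import hashlib


def describe_bytes(data):
    """Return the `bytes` and `hash` properties for a resource's raw content."""
    return {
        "bytes": len(data),
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
    }


def check_resource(resource, data):
    """Compare raw content against the recorded size and hash; return errors."""
    errors = []
    if "bytes" in resource and resource["bytes"] != len(data):
        errors.append(f"expected {resource['bytes']} bytes, got {len(data)}")
    if "hash" in resource:
        algorithm, _, expected = resource["hash"].rpartition(":")
        algorithm = algorithm or "md5"  # a bare hash value is treated as MD5
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected:
            errors.append(f"{algorithm} hash mismatch")
    return errors


content = b"plant_id,year\n1,2018\n"
resource = {"name": "plants", "path": "plants.csv", **describe_bytes(content)}
print(check_resource(resource, content))          # []
print(check_resource(resource, content + b"x"))   # two errors: size and hash
```

Recording the size and hash when the package is published lets a later download be checked for truncation or silent changes before any analysis runs.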

    # Future Directions

    A common problem with using publicly available energy data is that the federal agencies creating the data do not use version control or maintain change logs for the data they publish, but they do frequently go back years after the fact to revise or alter previously published data — with no notification. To combat this problem, Catalyst is using data packages to encapsulate the raw inputs to the ETL process. They are setting up a process which will periodically check to see if the federal agencies’ posted data has been updated or changed, create an archive, and upload it to Zenodo. They will also store metadata in non-tabular data packages, indicating which information is stored in each file (year, state, month, etc.) so that there can be a uniform process of querying those raw input data packages. This will mean the raw inputs won’t have to be archived alongside every data release. Instead one can simply refer to these other versioned archives of the inputs. Catalyst hopes these version controlled raw archives will also be useful to other researchers.

    Another next step for Catalyst will be to make the ETL and new dataset integration more modular to hopefully make it easier for others to integrate new datasets. For instance, they are planning on integrating the EIA 861 and the ISO/RTO LMP data next. Other future plans include simplifying metadata storage, using Docker to containerize the ETL process for better reproducibility, and setting up a Pangeo (opens new window) instance for live interactive data access without requiring anyone to download any data at all. The team would also like to build visualizations that sit on top of the database, making an interactive, regularly updated map of US coal plants and their operating costs, compared to new renewable energy in the same area. They would also like to visualize power plant operational attributes from EPA CEMS (e.g., ramp rates, min/max operating loads, relationship between load factor and heat rate, marginal additional fuel required for a startup event…).

    Have you used PUDL? The team would love to hear feedback from users of the published data so that they can understand how to improve it, based on real user experiences. If you are integrating other US energy/electricity data of interest, please talk to the PUDL team about whether they might want to integrate it into PUDL to help ensure that it’s all more standardized and can be maintained long term. Also let them know what other datasets you would find useful (E.g. FERC EQR, FERC 714, PHMSA Pipelines, MSHA mines…). If you have questions, please ask them on GitHub (https://github.com/catalyst-cooperative/pudl (opens new window)) so that the answers will be public for others to find as well.

    - + diff --git a/blog/2020/03/20/joining-the-frictionless-data-team/index.html b/blog/2020/03/20/joining-the-frictionless-data-team/index.html index c2abe0365..38c78eb74 100644 --- a/blog/2020/03/20/joining-the-frictionless-data-team/index.html +++ b/blog/2020/03/20/joining-the-frictionless-data-team/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Joining the Frictionless Data Team

    Hi there, my name is Gift Egwuenu (opens new window) and I’m super excited to share that I joined Datopian (opens new window) as a Frontend Developer and Developer Evangelist! 🎉

    Frictionless Data (opens new window) is an open-source toolkit that brings simplicity and grace to the data experience. We want every Data Engineer or Data Scientist to know about it and benefit from it.

    Part of my job involves spreading the word about Frictionless Data and encouraging community involvement by sharing what you can achieve with the toolkit 😃

    My other day-to-day activities include the following and more:

    • Working on Frictionless Data tools
    • Working closely and interacting with the Frictionless Data Community via chats, remote hangouts, and in-person events
    • Writing documentation, guides and blog posts for Frictionless Data

    I’m glad I get to do this as a full-time job because I’m passionate about teaching and learning 🚀 and I’m excited to be a part of the Frictionless Data community (opens new window) where I get to contribute, share, learn and interact with the data community.

    - + diff --git a/blog/2020/04/16/annoucing-frictionless-data-virtual-hangout/index.html b/blog/2020/04/16/annoucing-frictionless-data-virtual-hangout/index.html index 5fee63a4c..4e57618f0 100644 --- a/blog/2020/04/16/annoucing-frictionless-data-virtual-hangout/index.html +++ b/blog/2020/04/16/annoucing-frictionless-data-virtual-hangout/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    Photo by William White on Unsplash

    We are thrilled to announce we’ll be hosting a virtual community hangout to share recent developments in the Frictionless Data community. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.

    Here are some key discussions we hope to cover:

    • Introductions & share the purpose of this hangout.
    • Share the update on the new website release and general Frictionless Data related updates.
    • Have community members share their thoughts and general feedback on Frictionless Data.
    • Share information about CSV Conf.

    The hangout is scheduled to happen on 20th April 2020 at 5 pm CET. If you would like to attend, you can sign up for the event in advance here. (opens new window) Everyone is welcome.

    Looking forward to seeing you there!

    - + diff --git a/blog/2020/04/23/table-schema-catalog/index.html b/blog/2020/04/23/table-schema-catalog/index.html index 257c0a471..ebf0ac49f 100644 --- a/blog/2020/04/23/table-schema-catalog/index.html +++ b/blog/2020/04/23/table-schema-catalog/index.html @@ -33,7 +33,7 @@ - + @@ -114,6 +114,6 @@ Example: Validata (opens new window), an adaptation of Goodtables for French open data.

  • Problem: Sharing open data standards
    Solution: Schema Catalog
    Example: SCDL (opens new window), Schema.data.gouv.fr (opens new window), Schemas.frictionlessdata.io (opens new window)

  • There’s an ongoing conversation about this project on Frictionless Data Forum (opens new window) and it’s open to feedback and contribution.

    - + diff --git a/blog/2020/04/28/recap-post-frictionless-data-hangout-april-2020/index.html b/blog/2020/04/28/recap-post-frictionless-data-hangout-april-2020/index.html index ca20a7279..8b8e4c0d0 100644 --- a/blog/2020/04/28/recap-post-frictionless-data-hangout-april-2020/index.html +++ b/blog/2020/04/28/recap-post-frictionless-data-hangout-april-2020/index.html @@ -38,7 +38,7 @@ - + @@ -116,6 +116,6 @@ community-hangout

    The first edition of the Frictionless Data Community Hangout (opens new window) was held on 20 April 2020. It was a huge success and a great time spent with members of the community.
    We had over 16 guests join the event - the highlight from this event was community interaction. We had several people ask questions regarding things they needed clarity on about Frictionless Data and people shared what they are currently working on.

    The event started with members of the community doing introductions across the room so we get to know each other better. People were also interested in knowing:

    • how Frictionless Data is different from Pandas
    • how we are moving towards Open Science

    Another topic that surfaced was how people are using Frictionless Data. A great example here is Johan Richer’s use case. Johan Richer (opens new window), who works with the Open Data France team, shared a proposal for building a community led project called Table Schema Catalog (opens new window). This project will serve as a single source of truth and a collection of table schemas from different organizations making all table schemas discoverable and usable. Here’s a great opportunity for the community to show some support! Johan is looking for collaborators, so if this project sounds interesting to you, go find more details on the Frictionless Data Forum (opens new window) and get yourself involved.

    Finally, we rounded off the hangout with Lilly from the Open Knowledge Foundation (opens new window) team. She shared information about the upcoming CSV Conf (opens new window) on May 13-14 2020 and the Frictionless Data Tool Fund (opens new window), which, by the way, is still open for applications until May 17 2020.

    The event went pretty well thanks to everyone that showed up - I think it’s a great start to cultivating community growth on Frictionless Data. We’ve scheduled another hangout on May 21, 2020. Early registration is open: register now (opens new window) so you don’t miss out. We are also opening up spots for people in the community to share what they are working on and anything related to Frictionless Data that’ll benefit the entire community. If this sounds appealing to you - reach out to us on Discord (opens new window) and we’ll set it up.

    For more updates on the project, join our online community on Discord (opens new window) and follow @frictionlessd8a (opens new window) on twitter!

    Thank you, and we look forward to seeing you at our next event!

    - + diff --git a/blog/2020/04/30/frictionless-data-workshop/index.html b/blog/2020/04/30/frictionless-data-workshop/index.html index 8bcb76900..701f733c9 100644 --- a/blog/2020/04/30/frictionless-data-workshop/index.html +++ b/blog/2020/04/30/frictionless-data-workshop/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Join our free virtual Frictionless Data workshop on 20th May

    Join us on 20th May at 4pm BST/10am CDT for a Frictionless Data workshop led by the Reproducible Research fellows. This 90-minute long workshop will cover an introduction to the open source Frictionless Data tools. Participants will learn about data wrangling, including how to document metadata, package data into a datapackage, write a schema to describe data and validate data. The workshop is suitable for beginners and those looking to learn more about using Frictionless Data. It will be presented in English, but you can ask questions in English or Spanish.

    The fellows programme is part of the Frictionless Data for Reproducible Research project (opens new window) overseen by the Open Knowledge Foundation (opens new window). This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate data workflows in research contexts.

    At its core, Frictionless Data is a set of specifications (opens new window) for data and metadata interoperability, accompanied by a collection of software libraries (opens new window) that implement these specifications, and a range of best practices for data management. The core specification, the Data Package, is a simple and practical “container” for data and metadata. This workshop will be led by the members of the first cohort of the fellows programme: Lily Zhao, Daniel Ouso, Monica Granados, and Selene Yang. You can read more about their work during this programme here: http://fellows.frictionlessdata.io/blog/ (opens new window).

    Additionally, applications are now open for the second cohort of fellows. Read more about applying here: https://blog.okfn.org/2020/04/27/apply-now-to-become-a-frictionless-data-reproducible-research-fellow/ (opens new window)

    - + diff --git a/blog/2020/05/01/announcing-new-website/index.html b/blog/2020/05/01/announcing-new-website/index.html index 901e802f0..387f446b6 100644 --- a/blog/2020/05/01/announcing-new-website/index.html +++ b/blog/2020/05/01/announcing-new-website/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Announcing Our New Website Release

    We’re excited to announce the launch of our newly designed Frictionless Data website. The goal of the rebranding was to better communicate our brand values and improve the user experience. We want Frictionless Data to be wildly successful – we want people to not only know about us, but also use our tools by default.

    Frictionless Data Homepage
    Screenshot of Frictionless Data Homepage

    We’ve improved the layout of our content and made general changes to our brand logo, design, and overall site structure - the navigation is now more accessible, with an integrated sidebar so you can reach key items easily and get more from a quick read.

    Revamped Frictionless Brand Logo
    Revamped Frictionless Brand Logo

    We have a new Team page (opens new window) with a list of Core Team Members, Tool Fund Partners, and Reproducible Research Fellows contributing effort to the project. There are also many other smaller, but impactful changes, all aiming to make the experience of the Frictionless Data website much better for you.

     Team Page
    Frictionless Data Team Page

    In our bid to increase the adoption of our tooling and specifications, we are also rewriting our documentation. This effort will produce a new subpage called the Guide (opens new window) - its first section is already published on the website. Furthermore, we’ll be releasing How-to sections that walk our users through the steps required to solve a real-world data problem.

    We hope you find our new website fresher, cleaner and clearer. If you have any feedback and/or improvement suggestions, please let us know on our Discord Channel (opens new window) or on Twitter (opens new window).

    - + diff --git a/blog/2020/05/20/frictionless-data-may-hangout/index.html b/blog/2020/05/20/frictionless-data-may-hangout/index.html index 2280d34b0..eb4123b6c 100644 --- a/blog/2020/05/20/frictionless-data-may-hangout/index.html +++ b/blog/2020/05/20/frictionless-data-may-hangout/index.html @@ -35,7 +35,7 @@ - + @@ -112,6 +112,6 @@ community-hangout

    We are hosting another round of our virtual community hangout to share recent developments in the Frictionless Data community and to give community members a chance to connect with each other. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.

    Photo by Perry Grone on Unsplash

    The hangout is scheduled for 21st May 2020 at 5 pm BST. If you would like to attend the hangout, you can sign up for the event here (opens new window).

    Looking forward to seeing you there!

    - + diff --git a/blog/2020/05/22/etalab-case-study-schemas-data-gouv-fr/index.html b/blog/2020/05/22/etalab-case-study-schemas-data-gouv-fr/index.html index 306c29fdc..ed81717a2 100644 --- a/blog/2020/05/22/etalab-case-study-schemas-data-gouv-fr/index.html +++ b/blog/2020/05/22/etalab-case-study-schemas-data-gouv-fr/index.html @@ -35,7 +35,7 @@ - + @@ -110,6 +110,6 @@

    schema.data.gouv.fr - An Open Data Schema Catalog for France

    In June 2019, Etalab (opens new window), a department of the French interministerial digital service (DINUM), launched schema.data.gouv.fr (opens new window), a platform listing schemas for France. It could be described as what Johan Richer recently called a schema catalog (opens new window). This project is an initiative of data.gouv.fr (opens new window), the French open data platform, which is developed and maintained by Etalab.

    schema.data.gouv.fr homepage

    # What’s a schema?

    A schema declares a data model clearly and precisely: it lists the fields and their types in a structured and consistent manner, according to a specification. For example, Table Schema (opens new window) is a simple language to declare a schema for tabular data.
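
    To make this concrete, here is a minimal sketch of a Table Schema descriptor written as a Python dictionary. The field names are invented for illustration; only the keys (`fields`, `name`, `type`, `constraints`, `primaryKey`) come from the Table Schema specification, and the `field_types` helper is hypothetical.

```python
import json

# Hypothetical Table Schema descriptor; the field names are made up for
# illustration, while the keys follow the Table Schema specification.
schema = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"required": True}},
        {"name": "name", "type": "string"},
        {"name": "postal_code", "type": "string"},
        {"name": "opened_on", "type": "date"},
    ],
    "primaryKey": "id",
}


def field_types(descriptor):
    """Return a mapping of field name to declared type (illustrative helper)."""
    return {field["name"]: field["type"] for field in descriptor["fields"]}


# A descriptor is plain JSON, so it can be stored next to the data it describes.
print(json.dumps(schema, indent=2))
print(field_types(schema))
```

    Because the descriptor is plain JSON, it can be versioned, reviewed and published alongside the dataset.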

    Schemas are well suited for a wide range of applications: validating data against a schema, documenting a data model, consolidating data from multiple sources, generating example datasets, or proposing tailored input forms. This wide range of applications makes schemas an important tool for both producers and reusers.

    # Advancing open data quality

    A common complaint of open data reusers has been the lack of quality of the data and data structure changes over time, without notice. The OKFN spoke about this issue in mid-2017 in a blog post, Open data quality – the next shift in open data? (opens new window)

    With schema.data.gouv.fr, Etalab promotes high-quality open data: producers are encouraged to discuss and come up with an appropriate schema for the data they want to publish, and to document it with a recognised specification. Producers will then be able to make sure that the data they publish conforms to the schema over time. Reusers benefit from high-quality documentation, a stable data structure, and increased quality of the data.
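
    As a rough illustration of what checking conformance involves, the sketch below validates a CSV file against a small schema using only the Python standard library. It is not the validator used by schema.data.gouv.fr; the schema, the data and the `validate_csv` function are assumptions made for this example, and a real project would more likely rely on an existing Frictionless Data library.

```python
import csv
import io

# Hypothetical schema and data, invented for illustration.
schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "city", "type": "string"},
    ]
}


def validate_csv(text, descriptor):
    """Return a list of readable errors; an empty list means the data conforms."""
    errors = []
    expected = [field["name"] for field in descriptor["fields"]]
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    if header != expected:
        errors.append(f"header {header} does not match schema {expected}")
        return errors
    # The header is line 1, so data rows start at line 2.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            errors.append(f"row {row_number}: expected {len(expected)} cells, got {len(row)}")
            continue
        for field, value in zip(descriptor["fields"], row):
            if field["type"] == "integer":
                try:
                    int(value)
                except ValueError:
                    errors.append(f"row {row_number}: {field['name']}={value!r} is not an integer")
    return errors


good = "id,city\n1,Paris\n2,Lyon\n"
bad = "id,city\nx,Paris\n"
print(validate_csv(good, schema))
print(validate_csv(bad, schema))
```

    Running a check like this each time a dataset is updated is one way a producer could notice structure changes before reusers do.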

    # Impacts

    The first impact of the launch of schema.data.gouv.fr (opens new window) has been to put the challenge of open data quality at the forefront. It acknowledges that this is not a solved problem and that producers should embrace schemas, validators, documentation, and automated testing to raise the quality of the data they publish. It’s also a recognition of the efforts already made by the community, for example the “Socle commun des données locales” (Common Ground of Local Data) by OpenDataFrance (opens new window).

    To help producers discover schemas and how they can be helpful, we published in March 2020 a long guide (opens new window) going over the steps producers are encouraged to follow when creating a schema: discovery, discussions, implementation, publication and finally referencing the schema on schema.data.gouv.fr (opens new window).

    Since the launch, producers worked with their reusers and published various schemas: carpooling places (opens new window) or defibrillators (opens new window) to name a few. People had in-depth discussions about their data model, encouraged by the thoroughness of the Table Schema specification. Producers worked hard to clean their data and finally reached a point where their dataset is 100% aligned with the schema, without any errors.

    # What’s next

    Here are a few things we are working on and hope to be able to finish in the coming years.

    # Improved data models defined in the law

    Right now, when data models are introduced by law, the data model is often described by a table. We’d like to offer a schema when these laws are published, to ease adoption by the community and improve discoverability.

    # Integration with data.gouv.fr (opens new window)

    The schema.data.gouv.fr (opens new window) initiative is mainly based on published datasets on the French open data platform data.gouv.fr (opens new window). However, these tools are still quite separate today. In the coming months, we would like to strengthen the link between schema.data.gouv.fr (opens new window) and data.gouv.fr (opens new window) by promoting existing schemas directly on the open data platform.

    First, we would like to inform users of the existence of a consolidated dataset based on an existing schema and provide them with its quality report. Such a feature is newly available on schema.data.gouv.fr (opens new window). The same feature will arrive soon on data.gouv.fr (opens new window).

    Screenshot to be added

    Second, we’re looking into integrating schemas into the data publishing process on data.gouv.fr (opens new window). We could help users by letting them know that a schema corresponding to their dataset already exists. We could suggest what changes to make to get their data directly validated. We already started doing this with a simple implementation: we post comments on datasets which are supposed to follow a schema, letting producers know if the data is valid and, if not, enabling them to access a report to troubleshoot.

    Another possibility would be to offer a new service on data.gouv.fr (opens new window) such as the generation of data from an automatically generated form. This is the goal of the ongoing development of CSV-GG (opens new window), which makes it possible to generate a form from an existing Table Schema. This could help users to directly produce validated data.

    Screenshot to be added

    # Automation

    In the longer term, we also plan to automate data consolidation based on a schema as much as possible. For that, we need to better know and understand the resources available on the platform. This could be done by systematically analyzing the content of each new resource and trying to extract metadata such as headers or the type of data in each column.

    These metadata could then be used to identify datasets with similar structures and link them to an existing schema or propose to create a new one if it does not already exist.
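
    The idea can be sketched as follows, using only the Python standard library. This is an illustration under our own assumptions, not the analysis pipeline planned for data.gouv.fr: the function names, the three inferred types and the use of a Jaccard similarity on column names are all choices made for this example.

```python
import csv
import io


def infer_type(values):
    """Guess a column type (integer, number or string) from its non-empty values."""
    non_empty = [value for value in values if value != ""]
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_fields(text):
    """Extract the header and an inferred type for each column of a CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data)) if data else [() for _ in header]
    return [{"name": name, "type": infer_type(column)} for name, column in zip(header, columns)]


def header_overlap(fields, descriptor):
    """Jaccard similarity between inferred column names and a schema's field names."""
    found = {field["name"] for field in fields}
    expected = {field["name"] for field in descriptor["fields"]}
    if not found | expected:
        return 0.0
    return len(found & expected) / len(found | expected)


# Hypothetical resource and candidate schema, invented for illustration.
resource = "id,price,label\n1,2.5,a\n2,3,b\n"
candidate = {"fields": [{"name": "id"}, {"name": "price"}, {"name": "name"}]}

fields = infer_fields(resource)
print(fields)
print(header_overlap(fields, candidate))
```

    A resource whose overlap with an existing schema is high could be suggested for linking, while a group of resources that resemble each other but match no schema could prompt the creation of a new one. Note that naive inference treats a column of postal codes as integers, which is one reason an explicit schema remains useful.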

    We could also take advantage of the tool CSVAPI (opens new window), which is currently in use on data.gouv.fr (opens new window) to preview the data of a specific dataset. CSVAPI could evolve to offer new features such as highlighting quality problems directly in the dataset or navigating through different datasets with the same, or partially the same, structure. The schema associated with a dataset could also improve the preview by associating a type with each field. For example, a postal code could be recognized as such and the leading zero would not be cropped.
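
    The postal code case is easy to reproduce. In the sketch below, the schema, the sample row and the `read_typed` helper are invented for illustration and say nothing about how CSVAPI itself works; the point is only that a declared `string` type keeps the value intact, whereas treating it as a number drops the leading zero.

```python
import csv
import io

# Hypothetical schema and data, with made-up values.
schema = {
    "fields": [
        {"name": "postal_code", "type": "string"},
        {"name": "population", "type": "integer"},
    ]
}

# Minimal mapping from declared types to Python casts (illustrative subset).
CASTS = {"string": str, "integer": int, "number": float}


def read_typed(text, descriptor):
    """Read CSV text and cast each cell according to the schema's declared type."""
    casts = {field["name"]: CASTS[field["type"]] for field in descriptor["fields"]}
    reader = csv.DictReader(io.StringIO(text))
    return [{key: casts[key](value) for key, value in row.items()} for row in reader]


rows = read_typed("postal_code,population\n01000,41000\n", schema)
print(rows[0]["postal_code"])  # kept as text, leading zero preserved
print(int("01000"))            # what a numeric guess would produce instead
```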

    # Conclusion

    All of the features mentioned in this article are intended to promote the usefulness and the value of schemas and lead to the creation of new ones. We hope this will result in an increase in the overall quality of the data hosted on data.gouv.fr (opens new window).

    Furthermore, we strongly believe that these features will help to link different users and producers with similar interests and therefore be in line with the community-based nature of data.gouv.fr (opens new window).

    - + diff --git a/blog/2020/06/05/june-virtual-hangout/index.html b/blog/2020/06/05/june-virtual-hangout/index.html index 1ca107529..cd5607b10 100644 --- a/blog/2020/06/05/june-virtual-hangout/index.html +++ b/blog/2020/06/05/june-virtual-hangout/index.html @@ -35,7 +35,7 @@ - + @@ -112,6 +112,6 @@ community-hangout

    We are hosting a virtual community hangout to share recent developments in the Frictionless Data community and to give community members a chance to connect with each other. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.

    Photo by Perry Grone on Unsplash

    The hangout is scheduled for 25th June 2020 at 5 pm BST / 4 PM UTC. If you would like to attend the hangout, you can sign up for the event using this form (opens new window).

    Looking forward to seeing you there!

    # Community Hangout Recording

    If you missed the community hangout and would like to catch up on what was discussed, here’s a recording of the hangout.

    - + diff --git a/blog/2020/06/26/csvconf-frictionless-recap/index.html b/blog/2020/06/26/csvconf-frictionless-recap/index.html index b314c6d91..aecdd793a 100644 --- a/blog/2020/06/26/csvconf-frictionless-recap/index.html +++ b/blog/2020/06/26/csvconf-frictionless-recap/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    csv,conf,v5 Frictionless Data talks and recap

    csv,conf,v5 (opens new window), which occurred virtually in May 2020, featured several talks about using Frictionless Data, and was also organized by two members of the Frictionless Data team, Lilly Winfree and Jo Barratt. csv,conf is a community conference that brings diverse groups together to discuss data topics, and features stories about data sharing and data analysis from science, journalism, government, and open source. Over the years we have had over a hundred different talks from a huge range of speakers, most of which you can still watch back on our YouTube Channel (opens new window).

    COVID-19 threw a wrench in our plans for csv,conf,v5, and we ended up converting the conference to a virtual event. We were looking forward to our first conference in Washington DC, but unfortunately, like many other in-person events, this was not going to be possible in 2020. However, there were many positive outcomes of moving to a virtual conference. For instance, the number of attendees quadrupled (over 1000 people registered!) and people were able to attend from all over the world.

    During the conference, there were several talks showcasing Frictionless Data. Two of the Frictionless Data Fellows, Monica Granados and Lily Zhao, presented a talk (“How Frictionless Data Can Help You Grease Your Data (opens new window)”) that had over 100 people watching live, which is many more than would have been at their talk in person. Other related projects gave talks that incorporated Frictionless Data, such as Christina Gosnell and Pablo Virgo from Catalyst Cooperative discussing “Getting climate advocates the data they need. (opens new window)” I also recommend watching “Data and Code for Reproducible Research (opens new window)” by Lisa Federer and Maryam Zaringhalam, and “Low-Income Data Diaries - How “Low-Tech” Data Experiences Can Inspire Accessible Data Skills and Tool Design (opens new window)” by David Selassie Opoku. You can see the full list of talks, with links to slides and videos, on the csv,conf website: https://csvconf.com/speakers/ (opens new window).

    If you are planning on organizing a virtual event, you can read more about how csv,conf,v5 was planned here: https://csvconf.com/going-online (opens new window).

    We hope to see some of you next year for csv,conf,v6!

    - + diff --git a/blog/2020/07/10/tool-fund-intermine/index.html b/blog/2020/07/10/tool-fund-intermine/index.html index e32a9e642..6ce3970bf 100644 --- a/blog/2020/07/10/tool-fund-intermine/index.html +++ b/blog/2020/07/10/tool-fund-intermine/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Adding Data Package Specifications to InterMine’s im-tables

    This grantee profile features Nikhil Vats for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    # Meet Nikhil Vats

    I am an undergraduate student pursuing BE Computer Science and MSc Economics from BITS Pilani, India. My open-source journey started as a Google Summer of Code student with Open Bioinformatics Foundation (opens new window) in 2019 and currently, I am a mentor at InterMine (opens new window) for Outreachy. I’ve been working part-time as a full-stack web developer for the last two years. The latest project that I worked on was DaanCorona (opens new window) (daan is a Hindi word which means donation) - a non-profit initiative to help small businesses affected by Coronavirus in India. Through the Frictionless Data Tool Fund, I would like to give back to the open-source community by adding data package specifications to InterMine’s im-tables. Also, I love animals, music and cinema!

    # How did you first hear about Frictionless Data?

    I first heard about Frictionless Data from my mentor Yo Yehudi. She had sent an email to the InterMine community explaining the Frictionless Data initiative. The introductory video of Frictionless Data by Rufus Pollock inspired me deeply. I researched the Frictionless Data Specifications, Data Packages, and other tools and was amazed by how useful they can be while working with data. I wanted to contribute to Frictionless Data because I loved its design philosophy and the plethora of potential tools that can go a long way in changing how we produce, consume, and reuse data in research.

    # What specific issues are you looking to address with the Tool Fund?

    InterMine is an open-source biological data warehouse. Over thirty different InterMine instances exist and can be viewed using InterMine’s web interface im-tables (opens new window), a JavaScript-based component that displays query results as a table. The export functionality of im-tables supports common formats like CSV, TSV, and JSON. Whilst this is standardized across different instances of InterMine, exported data doesn’t conform to any specific standards, which creates friction, especially when integrating with other tools. Adding data package specifications and integrating with Frictionless Data specifications will ensure seamless integration, reusability, and sharing of data among individuals and apps, and will affect a broad number of InterMines based in research institutes around the world. In the long run, I would also like to develop and add a specification for InterMine’s data to the Frictionless Data registry.
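
    To give a sense of what adding a data package to an export could mean, here is a minimal sketch that builds a Data Package descriptor for a single exported CSV file. It is written in Python for brevity and is not the im-tables implementation; the `describe_export` function, the file name and the column names are all hypothetical, while the descriptor keys (`name`, `resources`, `path`, `format`, `schema`) come from the Data Package specification.

```python
import json


def describe_export(path, columns, name):
    """Build a minimal Data Package descriptor for one exported CSV file.

    `columns` is a list of (column name, Table Schema type) pairs.
    """
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": path,
                "format": "csv",
                "schema": {
                    "fields": [
                        {"name": column, "type": column_type}
                        for column, column_type in columns
                    ]
                },
            }
        ],
    }


# Hypothetical export: file name and columns are invented for illustration.
descriptor = describe_export(
    "results.csv",
    [("Gene.symbol", "string"), ("Gene.length", "integer")],
    "gene-query-results",
)

# The descriptor would typically be saved as datapackage.json next to the CSV.
print(json.dumps(descriptor, indent=2))
```

    Shipping a descriptor like this with each export gives downstream tools the column names and types without having to guess them from the file.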

    # How can the open data, open source, or open science communities engage with the work you are doing?

    I will be working on the im-tables (opens new window) and intermine (opens new window) GitHub repositories, writing blogs every month to share my progress. I also plan to write documentation, tutorials, and contributing guidelines to help new contributors get started easily. I want to encourage and welcome anyone who wants to contribute or get started with open-source to work on this project. I’ll be happy to help you get familiar with InterMine and this project. You can get in touch here (opens new window) or here (opens new window). Lastly, I welcome everyone to try out and use the features added during this project to make data frictionless, usable, and open!

    - + diff --git a/blog/2020/07/16/tool-fund-polar-institute/index.html b/blog/2020/07/16/tool-fund-polar-institute/index.html index eacd2cbcf..f2357ef8b 100644 --- a/blog/2020/07/16/tool-fund-polar-institute/index.html +++ b/blog/2020/07/16/tool-fund-polar-institute/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    schema-collaboration Tool Fund

    This grantee profile features Carles Pina Estany for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    # Meet Carles Pina Estany

    I’m Carles and I’m currently working part-time for the Swiss Polar Institute (opens new window) as a software engineer. I’m not a scientist but I like working with scientists, for science institutions, in research, education and with free/open source software. You can read more about me on my website: https://carles.pina.cat/ (opens new window).

    One of the tasks in the institute is to publish data and encourage researchers to provide detailed metadata. Often this metadata is written by researchers together with a data manager and without a tool in place to do this, it can become tricky to keep track of versions and progress. Frictionless Data schemas provide a model on which the metadata can be written to ensure it is machine-readable and standardised but completing the metadata in JSON files is not very user-friendly. My Tool Fund project, schema-collaboration, will help data managers and researchers collaborate easily to document data following the already-existing Frictionless Data schemas datapackage and tableschema.

    # How did you first hear about Frictionless Data?

    We have had Frictionless Data on our radar for about a year. Lilly Winfree’s talk at FOSDEM 2020 (opens new window) gave us a good insight into how it could be used and we realised that it was a good fit. Recently we have been improving the way that we describe data for a collaborating organisation: Frictionless Data was a natural way to go and we started using it to describe all datasets. create.frictionlessdata.io (opens new window) was a good start for creating a first draft of the tableschema and datapackage but we missed a tool to collaborate with the researchers when describing a data set.

    # What specific issues are you looking to address with the Tool Fund?

    Collaboration between data managers and researchers needs to be as easy as possible for both sides. Currently there is no tool to collaboratively document tabular data and data packages easily. Using this tool the researcher will be able to enter the information in a controlled manner and the data manager will be able to give feedback on what’s missing or what needs to be changed through a common platform.

    Hopefully this will lead to more productive use of time for both sides and having the data described with machine-readable Frictionless Data schemas will make it easier to validate, reuse and have consistent documentation. The tool will be based on datapackage-ui (opens new window) for the frontend, allowing all those involved to collaborate on the metadata through a user-friendly UI. Django will be used for the backend and Docker will be used for installation and deployments.

    # How can the open data, open source, or open science communities engage with the work you are doing?

    This project will be based on datapackage-ui (opens new window) so using this tool and opening and fixing issues would be useful contributions to the project.

    Feel free to submit issues, ideas and PRs on the GitHub repository schema-collaboration (opens new window) or on Discord (opens new window), and test the project on the staging deployment when available.

    - + diff --git a/blog/2020/07/21/data-matrices-pilot/index.html b/blog/2020/07/21/data-matrices-pilot/index.html index dcfeee862..6217fd023 100644 --- a/blog/2020/07/21/data-matrices-pilot/index.html +++ b/blog/2020/07/21/data-matrices-pilot/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Clarifying the semantics of data matrices and results tables - a Frictionless Data Pilot

    As part of the Frictionless Data for Reproducible Research project, funded by the Sloan Foundation, we have started a Pilot collaboration with the Data Readiness Group at the Department of Engineering Science of the University of Oxford; the group will be represented by Dr. Philippe Rocca-Serra, an Associate Member of Faculty. This Pilot will focus on removing the friction in reported scientific experimental results by applying the Data Package specifications.

    Publishing of scientific experimental results is frequently done in ad-hoc ways that are seldom consistent. For example, results are often deposited as idiosyncratic sets of Excel files or tabular files that contain very little structure or description, making them difficult to use, understand and integrate. Interpreting such tables requires human expertise, which is both costly and slow, and leads to low reuse. Ambiguous tables of results can lead researchers to rerun analysis or computation over the raw data before they understand the published tables. This current approach is broken, does not fit users’ data mining workflows, and limits meta-analysis. A better procedure for organizing and structuring information would reduce unnecessary use of computational resources, which is where the Frictionless Data project comes into play. This Pilot collaboration aims to help researchers publish their results in a more structured, reusable way.

    In this Pilot, we will use (and possibly extend) Frictionless tabular data packages (opens new window) to devise both generic and specialized templates. These templates can be used to unambiguously report experimental results. Our short term goal from this work is to develop a set of Frictionless Data Packages for targeted use cases where impact is high. We will focus first on creating templates for statistical comparison results, such as differential analysis, enrichment analysis, high-throughput screens, and univariate comparisons, in genomics research by using the STATO ontology (opens new window) within tabular data packages.

    Our longer term goals are that these templates will be incorporated into publishing systems to allow for clearer reporting of results, more knowledge extraction, and more reproducible science. For instance, we anticipate that this work will allow for increased consistency of table structure in publications, as well as increased data reuse owing to predictable syntax and layout. We also hope this work will ease the creation of linked data graphs from tables of results thanks to clarified semantics.

    An additional goal is to create code that is compatible with R’s ggplot2 library (opens new window), which would allow for easy generation of data analysis plots. To this end, we plan on working with R developers in the future to create a package that will generate Frictionless Data compliant data packages.

    This work has recently begun, and will continue throughout the year. We have already met with some challenges, such as working on ways to transform, or normalize, data and ways to incorporate RDF linked data (you can read our related conversations in GitHub (opens new window)). We are also working on how to define a ‘generic’ table layout definition, which is broad enough to be reused in as wide a range of situations as possible.

    If you are interested in staying up to date on this work, we encourage you to check out these repositories: https://gitlab.com/datascriptor/datascriptor-fldatapackages (opens new window) and https://github.com/ISA-tools/frictionless-collab (opens new window). Additionally, we will (virtually) be at the eLife Sprint in September to work on closely related work, which you can read about here: https://sprint.elifesciences.org/data-paper-skeleton-tools-for-life-sciences/ (opens new window). Throughout this Pilot, we are planning on reaching out to the community to test these ideas and get feedback. Please contact us on GitHub or in Discord (opens new window) if you are interested in contributing.

    - + diff --git a/blog/2020/08/03/tool-fund-cambridge-neuro/index.html b/blog/2020/08/03/tool-fund-cambridge-neuro/index.html index 88ca6ae95..2b54a5663 100644 --- a/blog/2020/08/03/tool-fund-cambridge-neuro/index.html +++ b/blog/2020/08/03/tool-fund-cambridge-neuro/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Analysis of spontaneous activity patterns in developing neural circuits using Frictionless Data tools

    This grantee profile features Stephen Eglen for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    # Meet Stephen Eglen

    I am a Reader in Computational Neuroscience at the University of Cambridge. A large part of my work involves analysing neuronal recordings taken from high-throughput recording devices, such as multi-electrode arrays. Despite these arrays having been in use for many years, there are still no standard formats for exchanging data, and so we spend lots of time simply reformatting data as we pass it around different groups, or use different analysis tools. Our earlier work (Eglen et al 2014, https://doi.org/10.1186/2047-217X-3-3 (opens new window)) used HDF5; the aim of our current project is to evaluate the use of Frictionless Data as a common format for the analysis of our spontaneous activity recordings, both past and present. The bulk of the work this summer will be done by a talented Natural Science undergraduate at Cambridge, Alexander Shtyrov.

    # How did you first hear about Frictionless Data?

    I had the good fortune to meet Dr Rufus Pollock in 2015 at a scientific meeting where I was presenting our work from 2014 and he was presenting an introduction to Frictionless Data. We then developed a case study (circa 2016) using a simpler data set (the spatial distribution of neurons in the retina). Skipping forward a few years, I saw the call for applications from Frictionless Data and decided it might be a good time to see how the project had developed. Rather than developing further tools, after discussions with the Frictionless Data team, we decided to make a case study for the application of these tools.

    # What specific issues are you looking to address with the Tool Fund?

    Our goals are:

    1. Convert our existing datasets (Eglen et al 2014) into Frictionless Data containers.
    2. Compare the relative merits of the containers vs HDF5 for storing “medium-sized” (megabytes, rather than gigabytes) data files. Aspects to consider will include portability, efficiency and ease of access.
    3. Develop a case study for analysing spontaneous activity patterns with a generative approach to model the underlying neuronal networks. This code has been developed by colleagues at Cambridge in Matlab, but has yet to be tested on our spontaneous activity patterns.
    4. Write up our findings for publication in a peer-reviewed journal.

    # How can the open data, open source, or open science communities engage with the work you are doing?

    We have a GitHub repository, but it is currently private (shared also with Frictionless Data) as it contains some recent datasets relating to human patients that are not yet ready to be shared. We hope to release it as soon as we can, where it will be linked to from my home page: https://sje30.github.io (opens new window). We aim to share all our findings from this project for the benefit of the community.

    - + diff --git a/blog/2020/08/17/frictionless-wheat/index.html b/blog/2020/08/17/frictionless-wheat/index.html index 168f23de1..a57aa7b02 100644 --- a/blog/2020/08/17/frictionless-wheat/index.html +++ b/blog/2020/08/17/frictionless-wheat/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Frictionless Data for Wheat

    This grantee profile features Simon Tyrrell, Xingdong Bian, and Robert Davey for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    # Meet the Grassroots team

    Hi I’m Simon Tyrrell and I’m a research software engineer having spent most of my career in academia. My first degree was in Maths and I did my PhD in Cheminformatics, both done at the University of Sheffield. After some postdoctoral fellowships in Computational Chemistry, I now happily reside in the field of Bioinformatics here at the Earlham Institute (EI) writing software to a diet of tea and loud guitars, both listened to and played.

    Xingdong Bian is a member of the Data Infrastructure group (opens new window). He joined the Earlham Institute in January 2010 and was involved in the development of EI’s Laboratory Information Management System (MISO) and the TGAC Browser. He has worked on solutions for data visualisation, managing servers, genomic databases and bioinformatics tools. Xingdong is now working mainly on the Grassroots project as a research software engineer. He has a BSc in Computer Science from the University of Sheffield and an MSc in Software Engineering from the University of York.

    Robert Davey leads the Data Infrastructure group at the Earlham Institute and is the PI for the Grassroots project. He has a PhD in Computer Science from the University of East Anglia, undertaken at the Roberts lab in the National Collection of Yeast Cultures (opens new window). Rob leads a number of large computing infrastructure development and deployment projects, is a certified Software Carpentry (opens new window) Instructor and Trainer, an editorial board member for Nature Scientific Data, and a Software Sustainability Institute (opens new window) Fellow.

    Together Xingdong and I work in Robert Davey’s team at the Earlham Institute developing Grassroots. This is a set of middleware tools for sharing bioinformatics data and services so that users and developers can do scientific analyses as easily as possible.

    # How did you first hear about Frictionless Data?

    We have always been big believers in the FAIR data principles and when we saw a tweet about the Frictionless Data tool fund, the more that we read about it, the more it seemed to be exactly what we were after! Even without the fund, it is likely to have been something that we would have looked to implement anyway.

    # What specific issues are you looking to address with the Tool Fund?

    As part of the Designing Future Wheat (DFW) project, we currently have two different repositories: the DFW data portal (opens new window), using iRODS (opens new window) with mod_eirods_dav (opens new window), and a digital repository (opens new window) using CKAN (opens new window). Both of these contain a wide variety of heterogeneous data such as genetic sequences, field trial experiment results, images, spreadsheets, publications, etc., and we are trying to standardise how to expose these datasets and their associated metadata. This is where Frictionless Data comes in! The ability to have consistent methods of accessing this information should make it easier for other researchers and data scientists to access and do some great work with all of this data.

    # How can the open data, open source, or open science communities engage with the work you are doing?

    We firmly believe in open source and open data and everything that we create is freely available. We plan to build a selection of Frictionless Data tools and make them available on our existing data portals so people can try them out and give any feedback. These will be rolled out incrementally so that progress is visible from early on. Our initial set of work will focus on extending the DFW data portal that uses one of our existing tools, eirods-dav (https://github.com/billyfish/eirods-dav (opens new window)) which is a tool for exposing the data in an iRODS repository in a user-friendly way with rich APIs for developers and data scientists too. So if anyone has any feedback, ideas, suggestions, rants 😃, please raise an issue at the GitHub repo; the more, the merrier!

    - + diff --git a/blog/2020/08/27/august-virtual-hangout/index.html b/blog/2020/08/27/august-virtual-hangout/index.html index aaab7612a..994709dcc 100644 --- a/blog/2020/08/27/august-virtual-hangout/index.html +++ b/blog/2020/08/27/august-virtual-hangout/index.html @@ -35,7 +35,7 @@ - + @@ -113,6 +113,6 @@ community-hangout

    We are hosting a virtual community hangout to share recent developments in the Frictionless Data community and it’s also an avenue to connect with other community members. This will be a 1-hour meeting where community members come together to discuss key topics in the data community.

    Photo by Perry Grone on Unsplash

    The hangout is scheduled for 27th August 2020 at 5 PM BST / 4 PM UTC. If you would like to attend the hangout, you can sign up for the event using this form (opens new window).

    Looking forward to seeing you there!

    # Community Hangout Recording

    If you missed the community hangout and would like to catch up on what was discussed, here’s a recording of the hangout.

    Here is a short summary of what we were up to:

    # Technical presentation on frictionless-py

    We also made available a technical presentation of a new tool we are working on: frictionless-py (opens new window). If you would like to delve deeper into the nuts and bolts of it, here it is for your enjoyment!

    - + diff --git a/blog/2020/09/01/hello-fellows-cohort2/index.html b/blog/2020/09/01/hello-fellows-cohort2/index.html index d4326ff81..2a14e1b06 100644 --- a/blog/2020/09/01/hello-fellows-cohort2/index.html +++ b/blog/2020/09/01/hello-fellows-cohort2/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Say hello to the second cohort of Frictionless Fellows!

    We are very excited to introduce the newest Fellows for Cohort 2 of the Frictionless Data Reproducible Research Fellows Programme (opens new window)! Over the next nine months, these eight early career researchers will be learning about open science, data management, and how to use Frictionless Data tooling in their work to make their data more open and their research more reusable. As an introduction, each Fellow has written a short blog about themselves and their goals. Read below to meet the Fellows and click on their individual blogs to learn more about them!


    Katerina picture

    Hi everyone, my name is Katerina Drakoulaki, I am from Greece and Cyprus, and I’m currently doing my PhD at the National and Kapodistrian University of Athens. My PhD combines all my interests: linguistics, language disorders, music cognition, and working with children! Research reproducibility is important in order to reliably identify and provide intervention to children with difficulties. Read more about Katerina here. (opens new window)


    Evelyn picture

    Hello everybody! I’m Evelyn Night, an MSc student at the University of Nairobi (opens new window) and a research fellow at the International Center of Insect Physiology and Ecology (opens new window). Growing up in a tiny village in Kano plains of Western Kenya, I always had a passion for learning. Fast forward through the years I find my way into academia pursuing a master’s degree and characterizing insect pollinator communities using morphometric and molecular tools for my thesis. My goal is to improve agricultural research capacity in the country and to also enhance formation of policies that would ensure increase in agricultural productivity. Read more about Evelyn here. (opens new window)


    Dani picture

    Hi everyone! I’m Dani, a cognitive neuroscientist and open science enthusiast. I live and work in San Sebastian, a beautiful city by the sea in northern Spain. We have a responsibility to overcome the current incentive system in the Academy to provide more honest, accessible, and quality research. I look forward to learning more about Frictionless Data tools and incorporating them into my work so that my research is open to everyone. Read more about Dani here. (opens new window)


    Kate picture

    Hello hi! I’m Kate Bowie, a 28-year-old midwesterner studying the human microbiome, or the collection of bacteria that live in and on the human body. As I dive deeper into the field of microbiome science, I am becoming an advocate for putting resources and time into improving research reproducibility. I wanted to become a Frictionless Fellow so that I could learn tools to help microbiome science data workflows become more reproducible and engage in the open science community. Read more about Kate here. (opens new window)


    Sam picture

    Hello! My name is Sam Wilairat. I am currently earning a Master of Library and Information Science degree (MLIS) and have an interest in data librarianship. As a fellow, I’m hoping to learn frictionless data principles and tools to ultimately promote them at my institution via education and outreach to researchers. I believe Open Science is the future and the more people embrace it, the more equitable and innovative research will be! Read more about Sam here. (opens new window)


    Anne picture

    Hey everyone, I’m Anne! I’m a graduate student based in Geneva, Switzerland who was born and bred in a few places across the United States (including New York, Chicago, Houston, and Washington DC!). Here in Switzerland, I study international institutions with the eye of an anthropologist or sociologist, through long-term ethnographic research. I’m excited to learn how to apply the Frictionless Data tools in my work throughout these nine months, and to experiment with new forms of conveying social science research in the process.


    Ritwik picture

    Hi, Ritwik here! I am based near Delhi, India and am doing my masters in Sustainable buildings, Energy conservation and Climate Change from the International Institute of Information Technology Hyderabad. It is very important that the research which is carried out in this domain is reproducible and available to all so we can use it to spread awareness among people. Read more about Ritwik here. (opens new window)


    Jacqueline picture

    Hi! My name is Jacqueline. I am a Master’s Candidate and Interdisciplinary Innovation Fellow in the Department of Computer and Information Science at the University of Pennsylvania. I applied to be a Reproducible Research Fellow to build space into my research process for actively exploring open science and reproducibility issues. As a scientist, I consider it an obligation to share my knowledge as widely and freely as possible and to ensure that my findings can be vetted through replication studies and other important checks. Read more about Jacqueline here. (opens new window)

    - + diff --git a/blog/2020/09/16/goodtables-bcodmo/index.html b/blog/2020/09/16/goodtables-bcodmo/index.html index 207198db2..f2e78a78c 100644 --- a/blog/2020/09/16/goodtables-bcodmo/index.html +++ b/blog/2020/09/16/goodtables-bcodmo/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Goodtables - Expediting the data submission and submitter feedback process

    This post was originally published on the BCO-DMO blog (opens new window).

    Earlier this year, the Biological and Chemical Oceanography Data Management Office (BCO-DMO) (opens new window) completed a pilot project with the Open Knowledge Foundation (OKF) (opens new window) to streamline the data curation processes for oceanographic datasets using Frictionless Data Pipelines (FDP) (opens new window). The goal of this pilot was to construct reproducible workflows that transformed the original data submitted to the office into archive-quality, FAIR-compliant (opens new window) versions. FDP lets a user define an order of processing steps to perform on some data, and the project developed new processing steps specific to the needs of these oceanographic datasets. These ordered processing steps are saved into a configuration file that is then available to be used anytime the archived version of the dataset must be reproduced. The primary value of these configuration files is that they capture the curation process at BCO-DMO and make it transparent. Subsequently, we found additional value internally by using FDP in three other areas. First, the pipelines made the curation process across our data managers much more consistent than the ad-hoc data processing scripts they individually produced before FDP. Second, we found that data managers saved time because they could reuse pre-existing pipelines to process newer versions submitted for pre-existing datasets. Finally, the configuration files helped us keep track of what processes were used in case a bug or error was ever found in the processing code. This project exceeded our goal of using FDP on at least 80% of data submissions to BCO-DMO: we now use it almost 100% of the time.
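The idea of an ordered, replayable list of processing steps can be sketched in a few lines of plain Python. This is only an illustration and not the actual Frictionless Data Pipelines API: the step names (`rename_field`, `strip_whitespace`), the `PROCESSORS` registry, the `run_pipeline` helper, and the sample data are all hypothetical.

```python
import json


# Hypothetical processors: each takes a list of row dicts plus parameters
# and returns a new list of row dicts, leaving the input untouched.
def rename_field(rows, old, new):
    renamed = []
    for row in rows:
        renamed.append({(new if key == old else key): value for key, value in row.items()})
    return renamed


def strip_whitespace(rows, field):
    return [{**row, field: row[field].strip()} for row in rows]


# Registry that maps the step names used in the configuration to functions.
PROCESSORS = {"rename_field": rename_field, "strip_whitespace": strip_whitespace}


def run_pipeline(config, rows):
    """Apply each configured step, in order, to the rows."""
    for step in config["steps"]:
        processor = PROCESSORS[step["run"]]
        rows = processor(rows, **step.get("parameters", {}))
    return rows


# The configuration is plain data, so it can be stored next to the dataset
# and replayed whenever the archived version must be reproduced.
CONFIG_JSON = """
{
  "steps": [
    {"run": "rename_field", "parameters": {"old": "Lat", "new": "latitude"}},
    {"run": "strip_whitespace", "parameters": {"field": "station"}}
  ]
}
"""

config = json.loads(CONFIG_JSON)
submitted = [{"Lat": "41.5", "station": " A1 "}]
archived = run_pipeline(config, submitted)
print(archived)
```

Because the steps live in a configuration file rather than in a one-off script, running the same configuration on the same submission gives the same archived result, which is the reproducibility property described above.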

    As a major deliverable from BCO-DMO’s recent NSF award (opens new window), the office planned to refactor its entire data infrastructure using techniques that would allow BCO-DMO to respond more rapidly to technological change. Using Frictionless Data as a backbone for data transport is a large piece of that transformation. Continuing to work with OKF, we focused on how to improve the data submission process at BCO-DMO.

    Duplication error

    Goodtables noticed a duplicate row in an uploaded tabular data file.

    Part of what makes BCO-DMO a successful data curation office is our hands-on work helping researchers achieve compliance with the NSF’s Sample and Data Policy coming from their Ocean Sciences division (opens new window). Yet, a steady and constant queue of data submissions means that it can take some weeks before our data managers can thoroughly review data submissions and provide necessary feedback to submitters. In response, BCO-DMO has been creating a lightweight web application for submitting data while ensuring such a tool preserves the easy experience of submitting data that presently exists. Working with OKF, we wanted to expedite the data review process by providing data submitters with as much immediate feedback as possible by using Frictionless Data’s GoodTables project (opens new window).

    Through a data submission platform, researchers would be able to upload data to BCO-DMO and, if tabular, get immediate feedback from Goodtables about whether it was correctly formatted or whether any other quality issues existed. With these reports at their disposal, submitters could update their submissions without having to wait for a BCO-DMO data manager to review. For small and minor changes this saves the submitter the headache of having to wait for simple feedback. The goal is to catch submitters at a time where they are focused on this data submission so that they don’t have to return weeks later and reconstitute their headspace around these data again. We catch them when their head is in the game.

    Goodtables provides us a framework to branch out beyond simple tabular validation by developing data profiles. These profiles would let a submitter specify the type of data they are submitting. Is the data a bottle or CTD file? Does it contain latitude, longitude, time, or depth observations? These questions, optional for submitters to answer, would provide even further validation steps to get improved feedback immediately. For example, specifying that a file contains latitude or longitude columns would make it possible to check whether all values fall within valid bounds. Similar checks could detect that a depth column contains values above the surface, or that the column pertaining to the time of an observation has inconsistent formatting across some of the rows. BCO-DMO can expand on this platform to continue to add new and better quality checks that submitters can use.

    Out-of-bounds longitude
    Goodtables noticed a longitude that is outside a range of -180 to 180. This happened because BCO-DMO recommends using decimal degrees format between -180 and 180 and defined a Goodtables check for longitude fields.
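A range check like the longitude one described above can be sketched in plain Python. This is an illustration only, not the Goodtables custom-check API or BCO-DMO's actual code: the function name `check_range`, the layout of the error records, and the sample rows are assumptions.

```python
def check_range(rows, field, minimum, maximum):
    """Return one error record per row whose value is missing, non-numeric, or out of range."""
    errors = []
    # Data rows start at 2 because row 1 of the file is the header.
    for row_number, row in enumerate(rows, start=2):
        raw = row.get(field)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors.append({"row": row_number, "field": field,
                           "message": f"not a number: {raw!r}"})
            continue
        if not minimum <= value <= maximum:
            errors.append({"row": row_number, "field": field,
                           "message": f"{value} is outside {minimum} to {maximum}"})
    return errors


rows = [
    {"latitude": "41.52", "longitude": "-70.67"},
    {"latitude": "41.60", "longitude": "289.33"},  # 0 to 360 convention, rejected here
]

for error in check_range(rows, "longitude", -180, 180):
    print(error)
```

The same helper covers latitude by calling it with bounds of -90 and 90, so a data profile could be expressed as a list of such (field, minimum, maximum) entries.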

    - + diff --git a/blog/2020/09/17/tool-fund-metrics/index.html b/blog/2020/09/17/tool-fund-metrics/index.html index cceec3a97..732d5ac53 100644 --- a/blog/2020/09/17/tool-fund-metrics/index.html +++ b/blog/2020/09/17/tool-fund-metrics/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Metrics in Context

    This grantee profile features Asura for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    # Meet Asura

    Hallihallöchen meine Lieben!

    I’m Asura and I’m a doctoral student at Simon Fraser University, Vancouver. I’m working in the muddy area between data science, communication, and philosophy in order to explore questions of power and systemic inequality within scholarly communication. This means that I work at the ScholCommLab as a data scientist, while exploring the philosophical issues in my doctoral project. Concretely, I am intending to develop an analytic framework for the study of citations as infrastructure building on critical feminist theory and Science and Technology Studies (STS). However, I remain a coder and tinkerer at heart, which is how I ended up working with Frictionless Data on Metrics in Context.

    # How did you first hear about Frictionless Data?

    I first heard about Frictionless Data at the pre-csv,conf,v4 meetup hosted by Open Knowledge Foundation in 2019. I remember being quite impressed by the basic premise of Frictionless, although I hadn’t grasped the full picture of the technicalities yet. During the main conference I then learnt about more opportunities to get involved such as the Fellowship and the Tool Fund. I left csv,conf with great impressions and plans to work out an application but then life a.k.a my PhD happened… I had forgotten about Frictionless Data, until I recently found out that the Tool Fund is going into its second round. At the time I had started working with the Make Data Count team on data citations, then ideas and topics fell into place, and here I am now!

    # What specific issues are you looking to address with the Tool Fund?

    In this project, I want to address a common theme within the critique of modern technology in our data-driven world: the lack of context for data and, often related, biases in databases. Algorithmic and database biases have moved into the spotlight of critical thought on how technology exacerbates systemic inequalities. Following these insights, I want to address the need for different (rather than simply more) context and metadata for scholarly metrics in the face of racial, gender, and geographic biases which plague modern academia.

    It isn’t controversial to say that scholarly metrics have become an integral part of scholarship and probably they are here to stay. Controversy usually comes into play once we discuss how and for which purposes metrics are used. This typically refers to the (mis)use of citation counts and citation-based indicators [1] for research assessment and governance, which also led to a considerable number of initiatives and movements calling for a responsible use of metrics [2]. However, I would like to take a step back and redirect the attention to the origin of the data underlying citation counts.

    These conversations about the inherent biases of citation databases are not entirely new and scholars across disciplines have been highlighting the consequential systemic issues. However, in this project I am not proposing a solution to overcome or abolish these biases per se, but rather I want to shine light on the opaque mechanisms of capturing metrics which lead to the aforementioned inequalities. In other words, I propose to develop an open data standard [3] for scholarly metrics which documents the context in which the data was captured. This metadata describes the properties of the capturing apparatus of a scholarly event (e.g., a citation, news mention, or tweet of an article) such as the limitations of document coverage (what kind of articles are indexed?), the kind of events captured (tweets, retweets, or maybe both?) or other technicalities (is Facebook considered as a whole or only a subset of public pages?).
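To make the idea a little more concrete, a context descriptor for a single metric source might look something like the sketch below. The standard is only being proposed here, so every field name (`source`, `event_types`, `coverage`, `limitations`) and every example value is a hypothetical placeholder, not part of any published specification.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MetricContext:
    """Hypothetical description of how a scholarly metric was captured."""
    source: str                  # name of the capturing service (placeholder)
    event_types: list            # which kinds of events are counted
    coverage: str                # which documents are indexed at all
    limitations: list = field(default_factory=list)  # known blind spots


context = MetricContext(
    source="example-tweet-tracker",
    event_types=["tweet", "retweet"],
    coverage="journal articles with a DOI",
    limitations=["only public accounts are captured"],
)

# Serialise the context so it can travel alongside the metric counts.
descriptor = json.dumps(asdict(context), indent=2)
print(descriptor)
```

Shipping such a descriptor next to the raw counts would let a reader see, for instance, that retweets are included and that articles without a DOI were never eligible to be counted.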

    While metrics in context don’t remove systemic inequality, they make the usually hidden and inaccessible biases visible and explicit. In doing so, they facilitate conversations about structural issues in academia and eventually contribute to the development of better infrastructures for the future.

    # How can the open data, open source, or open science communities engage with the work you are doing?

    Metrics in Context will be conducted fully in the open, which means that all resources will be available on GitHub and I will do my best to transparently document progress and decisions.

    The project is organized in three parts (roughly breaking down into conceptual questions, technical implementation, and scholarly application) and I invite all of you to leave your ideas, thoughts, and critiques via email or a GitHub issue.

    You can see the full roadmap with a detailed breakdown of tasks here: https://github.com/Bubblbu/metrics-in-context/issues/2 (opens new window)


    1. There is extensive literature for the critique of indicators such as the h-index or Journal Impact Factor. See Haustein and Larivière (2015) for an overview.
    2. See DORA and the Leiden Manifesto for two prominent examples of responsible research metrics initiatives.
    3. I am expecting references to this xkcd comic on standards: https://xkcd.com/927/ (opens new window)

    - + diff --git a/blog/2020/10/08/frictionless-framework/index.html b/blog/2020/10/08/frictionless-framework/index.html index 32ddde4e6..c8465a569 100644 --- a/blog/2020/10/08/frictionless-framework/index.html +++ b/blog/2020/10/08/frictionless-framework/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@

    # Frictionless Framework

    We are excited to announce our new high-level Python framework, frictionless-py: https://github.com/frictionlessdata/frictionless-py (opens new window). Frictionless-py was created to simplify the overall user experience of working with Frictionless Data in Python. It provides several high-level improvements in addition to many low-level fixes. Read more details below, or watch this intro video by Frictionless developer Evgeny: https://youtu.be/VPnC8cc6ly0 (opens new window)

    # Why did we write new Python code?

    Frictionless Data has been in development for almost a decade, with global users and projects spanning domains from science to government to finance. However, our main Python libraries (datapackage, goodtables, tableschema, tabulator) were originally built with some inconsistencies that have confused users over the years. We had started redoing our documentation for our existing code, and realized we had a larger issue on our hands - mainly that the disparate Python libraries had overlapping functionalities and we were not able to clearly articulate how they all fit together to form a bigger picture. We realized that overall, the existing user experience was not where we wanted it to be. Evgeny, the Frictionless Data technical lead developer, had been thinking about ways to improve the Python code for a while, and the outcome of that work is frictionless-py.

    # What happens to the old Python code (datapackage-py, goodtables-py, tableschema-py, tabulator-py)? How does this affect current users?

    Datapackage-py (see details (opens new window)), tableschema-py (see details (opens new window)), tabulator-py (see details (opens new window)) still exist, will not be altered, and will be maintained. If your project is using this code, these changes are not breaking and there is no action you need to take at this point. However, we will be focusing new development on frictionless-py, and encourage you to consider starting to experiment with or work with frictionless-py during the last months of 2020 and migrate to it starting from 2021 (here is our migration guide) (opens new window). The one important thing to note is that goodtables-py has been subsumed by frictionless-py (since version 3 of Goodtables). We will continue to bug-fix goodtables@2.x in this branch (opens new window) and it is also still available on PyPI (opens new window) as it was before. Please note that the frictionless@3.x API is not stable as we are continuing to work on it at the moment. We will release frictionless@4.x by the end of 2020 to be the first SemVer/stable version.

    # What does frictionless-py do?

    Frictionless-py has four main functions for working with data: describe, extract, validate, and transform. These are inspired by typical data analysis and data management methods.

    Describe your data: You can infer, edit and save metadata of your data tables. This is a first step for ensuring data quality and usability. Frictionless metadata includes general information about your data like textual description, as well as field types and other tabular data details.

    Extract your data: You can read your data using a unified tabular interface. Data quality and consistency are guaranteed by a schema. Frictionless supports various file protocols like HTTP, FTP, and S3 and data formats like CSV, XLS, JSON, SQL, and others.

    Validate your data: You can validate data tables, resources, and datasets. Frictionless generates a unified validation report, as well as supports a lot of options to customize the validation process.

    Transform your data: You can clean, reshape, and transfer your data tables and datasets. Frictionless provides a pipeline capability and a lower-level interface to work with the data.
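To give a feel for what "describe" and "validate" mean in practice, here is a small standard-library sketch of the two ideas. It is a conceptual illustration only and not how frictionless-py is implemented: `infer_type`, `describe_csv`, and `validate_csv` are hypothetical stand-ins, and the three type names are a simplification chosen for the example.

```python
import csv
import io

CASTS = {"integer": int, "number": float, "string": str}


def infer_type(values):
    """Pick the narrowest type that fits every non-empty value."""
    for name in ("integer", "number"):
        try:
            for value in values:
                if value not in ("", None):
                    CASTS[name](value)
            return name
        except ValueError:
            continue
    return "string"


def describe_csv(text):
    """Infer a simple schema (field names and types) from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {"fields": [{"name": name, "type": infer_type([row[name] for row in rows])}
                       for name in reader.fieldnames or []]}


def validate_csv(text, schema):
    """Return a report listing every cell that does not match the schema."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    # Data rows start at 2 because row 1 of the file is the header.
    for row_number, row in enumerate(reader, start=2):
        for field in schema["fields"]:
            value = row.get(field["name"])
            if value in ("", None):
                continue  # empty cells are not type-checked in this sketch
            try:
                CASTS[field["type"]](value)
            except ValueError:
                errors.append({"row": row_number, "field": field["name"], "value": value})
    return {"valid": not errors, "errors": errors}


GOOD = "id,depth\n1,10.5\n2,20\n"
BAD = "id,depth\n1,10.5\ntwo,20\n"

schema = describe_csv(GOOD)      # infer metadata once
print(schema)
print(validate_csv(BAD, schema)) # reuse it to check a later file
```

The point of the sketch is the workflow rather than the code: metadata inferred in the describe step becomes the contract that later extraction and validation steps rely on.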

    Additional features:

    • Powerful Python framework
    • Convenient command-line interface
    • Low memory consumption for data of any size
    • Reasonable performance on big data
    • Support for compressed files
    • Custom checks and formats
    • Fully pluggable architecture
    • The included API server
    • More than 1000 tests

    # How can users get started?

    We recommend that you begin by reading the Getting Started Guide (opens new window) and the Introduction Guide (opens new window). We also have in-depth documentation for Describing Data (opens new window), Extracting Data (opens new window), Validating Data (opens new window), and Transforming Data (opens new window).

    # How can you give us feedback?

    What do you think? Let us know your thoughts, suggestions, or issues by joining us in our community chat on Discord (opens new window) or by opening an issue in the frictionless-py repo: https://github.com/frictionlessdata/frictionless-py/issues (opens new window).

    # FAQs

    # Where’s the documentation?

    Are you a new user? Start here: Getting Started (opens new window) & Introduction Guide (opens new window)
    Are you an existing user? Start here: Migration Guide (opens new window)
    The full list of documentation can be found here: https://github.com/frictionlessdata/frictionless-py#documentation (opens new window)

    # What’s the difference between datapackage and frictionless?

    In general, frictionless is our new generation software while tabulator/tableschema/datapackage/goodtables are our previous generation software. Frictionless has a lot of improvements over them. Please see this issue for the full answer and a code example: https://github.com/frictionlessdata/frictionless-py/issues/428 (opens new window)

    # I’ve spotted a bug - where do I report it?

    Let us know by opening an issue in the frictionless-py repo: https://github.com/frictionlessdata/frictionless-py/issues (opens new window). For tabulator/tableschema/datapackage issues, please use the corresponding issue tracker and we will triage it for you. Thanks!

    # I have a question - where do I get help?

    You can ask us questions in our Discord chat and someone from the main developer team or from the community will help you. Here is an invitation link: https://discord.com/invite/j9DNFNw (opens new window). We also have a Twitter account (@frictionlessd8a) (opens new window) and community calls where you can come meet the team and ask questions: http://frictionlessdata.io/events/ (opens new window).

    # I want to help - how do I contribute?

    Amazing, thank you! We always welcome community contributions. Start here (https://frictionlessdata.io/contribute/ (opens new window)) and here (https://github.com/frictionlessdata/frictionless-py/blob/master/CONTRIBUTING.md (opens new window)) and you can also reach out to Evgeny (@roll) or Lilly (@lwinfree) on GitHub if you need help.

    - + diff --git a/blog/2020/10/19/fellows-reflect-on-open-access-week/index.html b/blog/2020/10/19/fellows-reflect-on-open-access-week/index.html index e5e8cbaf8..99e8dd09d 100644 --- a/blog/2020/10/19/fellows-reflect-on-open-access-week/index.html +++ b/blog/2020/10/19/fellows-reflect-on-open-access-week/index.html @@ -38,7 +38,7 @@ - + @@ -114,6 +114,6 @@ fellows

    The theme of this year’s Open Access Week (opens new window) is “Open with Purpose: Taking Action to Build Structural Equity and Inclusion”. How can we be more purposeful in the open space? How can we work towards true equity and inclusion? The following blog is a compilation of the Fellows’ thoughts and reflections on this theme.

    # Katerina

    When I read this year’s theme I wondered how I could relate to it. Inclusion was the word that made it for me. At first I thought about how the Fellowship itself was inclusive for me, a person with a humanities background who had not had the chance to receive any institutional or structured support when it comes to programming and data management. Afterwards, what came to mind was how inclusive the things I’m currently learning on the programme are with regard to the populations I work with in my clinical role. Cognitive accessibility is an effort to make online content more accessible to persons with overall cognitive difficulties, that is, difficulties with memory, attention, and language. These are not rare difficulties, as they characterize individuals with learning difficulties (developmental language disorder, dyslexia), autism spectrum disorder, attention deficit-hyperactivity disorder (ADHD), dementia, aphasia and other cognitive difficulties following a stroke. I discovered a lot of initiatives and guidelines on how online content could be more accessible: using alternatives to text, such as figures, audio, or a simpler layout, making content appear in predictable ways, giving more time to individuals to interact with the content, and focusing on readability of the content, among others. In sum, many individuals among us have difficulties accessing online content in an optimal way. More information about what we can do about it here (opens new window) and here (opens new window).

    # Dani

    Once again, we see academia and the overall scientific research environment engaged in a discussion about who should bear the costs of scientific publications. Few have welcomed with open arms the new agreement (opens new window) between a few German institutions and the Nature Publishing group. The obvious gap between what the prestigious publishing group demands and what researchers can afford has turned the news into some sort of bad joke. However, it seems that many have accepted by now other relatively cheaper Open Access publishing arrangements. At least, relatively cheaper for them. Research funding is nowadays so scarce and precarious in many countries that a simple article processing charge of 1200€ will prevent researchers from submitting to such a journal. No doubt there is good will in those who fight to make the current publishing model more open. However, I can’t help but feel there is a lack of awareness of the financial gap involved in setting an acceptable threshold for article processing charges that are based on the standards of the world’s major economic powers.

    # Sam

    Libraries spend an enormous amount of money paying journal subscription fees in order to give their patrons access to cutting edge research. Imagine a world in which paywalls are a thing of the past and these thousands of dollars currently reserved at every library for journal subscription costs could be redistributed. Librarians need to support Open Access and to publicly reject the current systems in place that restrict access to information for the majority of the global community. Librarians should stop and ask themselves, what are the long term effects of supporting the current system? What historic injustices are being perpetuated by paying for standard subscription-based journals? If librarianship is based upon providing equitable service to all information users, supporting Open Access is a necessity.

    # Anne

    My colleagues and I have been having interesting discussions about what Open Access means in the context of our respective disciplines, and so many of them have boiled down to funding models, and how to make sure that the (financial) incentives are in the right place. So when I approached these questions of structural equity and inclusion, I wondered how we can balance the ideals of open access that allow for creative collaboration, open knowledge, and more equitable contributions (all things that brought us all together at OKF) with the necessary requirements of funding and the pressure to publish. In my own discipline, these debates have been happening for a long time (opens new window), and were recently brought to light because of an experimental Open Access journal called HAU (opens new window), which was founded by the late David Graeber (opens new window). Furthermore, as a journalist, I tend not to equate open access with accessibility more generally, because making something available or open doesn’t necessarily mean that it will be used (let alone understood by a wider audience!). This is the integral role that journalism can play within the open access academic community, in my view: through increased data literacy, visualisation tools, and what I call “translation through storytelling”. This is what drew me to #dataviz, and why I’m creating interactive visualisations of human rights data from the United Nations with OKF. While the Universal Periodic Review is well-known for being one of the most inclusive and equitable venues at the UN, few know about it outside of Geneva. So as Open Access Week comes to a close, I’ve been starting to re-think the movement as “open, accessible, fundable, and understandable”. Maybe it’s not as catchy, but it’s what I hope to embody!

    # Ritwik

    Whenever I see terms such as ‘Open Access’ and ‘Open Science’, I think about how we can change the current research environment so that people can extract meaning from the open research space, learn more about it, and move beyond conventional ‘Research Journals’. One way to advance structural and racial equity in research is to invest in Open Science infrastructure, services, and capacity building, including open translation services and tools like GitHub, to lower the language barrier. Not every potential reader of openly available science is fluent in English, and automatic translation is not always correct, but even rough translations can still convey the overall meaning. We can draw on open source development programs to empower organisations like CC Extractor and other free local translation software, so that we can include languages like Spanish, Italian, Hindi, Japanese and other native languages, and everyone can break those barriers and understand literature published in different languages. Similarly, we should provide sustainable funding mechanisms and foster decentralized, community-owned and community-run non-profit open source initiatives in this space. Finally, we should apply an inclusive, holistic approach to science and research in the sense of Open Scholarship, one that includes human value education, open scholcomm and open education with a view to teaching in the seminar and classroom: basically the whole variety of research and teaching practices that define academic life but remain underrepresented in the larger debate around Open Science.

    # Evelyn

    ‘Equity’ and ‘inclusion’ are two words that I know too well given the yawning gaps that exist between the haves and have-nots in African society. Research is indeed the core of any society, identifying calamities and solving them in the most sustainable of ways. These two words therefore occupy an integral space in the open research arena, since structural equity and inclusion would mean that research knowledge is given for free, irrespective of any societal construct, for productive downstream research. Although open access has been lauded for promoting access to high quality research at no cost, authors have so far faced sky-high publishing costs that have limited the number of papers that make it into the open, especially in low and middle income regions like Africa. The need to subsidize open publishing costs is thus apparent, with the overall goal of strengthening research capacity and impactful research so that such regions can be on a par with the rest of the world in research and development. Research societies and governments need to forge bilateral pacts whose main purpose is to encourage open access by introducing waivers on publishing costs and by curbing predatory journals, which more often than not damage the reputation of scientists. Indeed, achieving structural equity and inclusion will require that authors and users of scientific papers alike get to disseminate and access knowledge for free.

    # Kate

    In organizing my ideas for a coherent reflection on the theme of this year’s Open Access Week, I thought of recent news out of the United Kingdom. The UK’s National Institute for Health Research (NIHR) recently revealed new measures that no longer require universities to hold memberships of specific charters and concordats in order to receive grant funding. This may seem like a move towards removing roadblocks to funding; however, membership of these charters, such as the Athena SWAN Charter and the Race Equality Charter, provides universities with strategies to identify and address institutional and cultural barriers. In a 2020 world in which the Open Access community picks a theme that specifically mentions “structural equity and inclusion” as its goals, institutions of power, like the UK’s NIHR, seem tone-deaf in no longer requiring charters to guide them in those structures. I commend the Open Access community for leading the way by prioritizing equity and inclusion in its pursuit of shared knowledge, and I believe we should all challenge institutions like the UK’s NIHR to embrace the values of the open access community.

    # Jacqueline

    As a machine learning researcher, I find that this year’s Open Access Week theme resonates. Open access, structural equity, and inclusion should be explicit goals in artificial intelligence (AI) research. To quote the Algorithmic Justice League, “Technology should serve all of us. Not just the privileged few.” However, the demographics of the AI community do not reflect societal diversity, and this can allow algorithms to reinforce harmful systemic biases. But even if we know who is writing the algorithms that affect our lives, we often don’t know how these predictive systems make their decisions. A recent response to a Google Health closed source tool for breast cancer screening argues that failing to release code and training data undermines the scientific value, transparency, and reproducibility of AI systems. Ironically, however, this well-worded argument lies behind a paywall that limits transparency by design. Competing views on closed access AI publishing are captured in the 2018 boycott of Nature Machine Intelligence, its coverage in the scientific media, and the journal’s rebuttal. Whether you stand by Plan S or not, open conversations around the ethics of access and transparency are important steps toward safe, equitable, and inclusive AI.

    - + diff --git a/blog/2020/10/28/october-virtual-hangout/index.html b/blog/2020/10/28/october-virtual-hangout/index.html index 10f33cb8e..b3c50bc9b 100644 --- a/blog/2020/10/28/october-virtual-hangout/index.html +++ b/blog/2020/10/28/october-virtual-hangout/index.html @@ -35,7 +35,7 @@ - + @@ -112,6 +112,6 @@ community-hangout

    # Did you miss our October community call?

    We had a great presentation by Keith Hughitt, who told us about his work on using Frictionless to create infrastructure for sharing biology and genomics data packages. You can watch his presentation here:

    # Other agenda items of note included:

    # Join us next month!

    Our next meeting will be on 19th November. You can sign up here: https://forms.gle/5HeMrt2MDCYSYWxT8. We’ll discuss new features of frictionless-py, and there will be time for your updates too. Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    Here is the recording of the full call:

    As always, join us on Discord or Twitter to say hi or ask any questions!

    - + diff --git a/blog/2020/11/18/dryad-pilot/index.html b/blog/2020/11/18/dryad-pilot/index.html index 735b2a04c..724add737 100644 --- a/blog/2020/11/18/dryad-pilot/index.html +++ b/blog/2020/11/18/dryad-pilot/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Dryad and Frictionless Data collaboration

    By Tracy Teal; originally posted in the Dryad blog: https://blog.datadryad.org/2020/11/18/frictionless-data/

    Guided by our commitment to make research data publishing more seamless and also re-usable, we are thrilled to partner with Open Knowledge Foundation and the Frictionless Data team to enhance our submission processes. By integrating the Frictionless Data toolkit, Dryad will be able to give authors direct feedback on the structure of the tabular files they upload. This will also allow file-level metadata to be created automatically at upload and made available for download with published datasets.

    We are excited to get moving on this project and, with support from the Sloan Foundation, Open Knowledge Foundation has just announced a job opening to contribute to this work. Please check out the posting and circulate it to any developers who may be interested in building out this functionality with us: https://okfn.org/about/jobs/

    Stay tuned for a project update in July 2021!

    - + diff --git a/blog/2020/11/19/november-virtual-hangout/index.html b/blog/2020/11/19/november-virtual-hangout/index.html index 6847a6124..924d16db2 100644 --- a/blog/2020/11/19/november-virtual-hangout/index.html +++ b/blog/2020/11/19/november-virtual-hangout/index.html @@ -35,7 +35,7 @@ - + @@ -112,6 +112,6 @@ community-hangout

    # A recap from our November community call

    This time around, we were offered a fantastic presentation by Costas Simatos, the team leader of the ISA2 Interoperability Test Bed Action at the European Commission! He revealed some powerful tools to validate data against specifications, including the following:

    If you would like to dive deeper and watch Costas’ presentation, here it is:

    # Other agenda items from our hangout

    # Join us next month!

    Our next meeting will be on December 17. You can sign up here. We’ll discuss using the Frictionless Data Package for the web archive data package (WACZ format), and make space to talk about geospatial data standards, coping with Covid, and a member’s platform dedicated to open data hackathons!

    As always, there will be time for your updates too. Do you want to share something with the community? Let us know when you sign up!

    # Call recording

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord or Twitter to say hi or ask any questions. See you there!

    - + diff --git a/blog/2020/11/26/fellows-packaging/index.html b/blog/2020/11/26/fellows-packaging/index.html index 0333cf277..659b5f950 100644 --- a/blog/2020/11/26/fellows-packaging/index.html +++ b/blog/2020/11/26/fellows-packaging/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Packaging Research Data with the Frictionless Fellows

    Have you ever been looking at a dataset and had no idea what the data values mean? What units are being used? What does that acronym in the first column mean? What is the license for this data?

    These are all very common issues that make data hard to understand and use. At Frictionless Data, we work to solve these issues by packaging data with its metadata - aka the description of the data. To help you package your data, we have code in several languages and a browser tool, called Data Package Creator.
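    As a rough illustration of what packaging data with its metadata means, the sketch below builds a minimal Data Package descriptor using only the Python standard library. The sample CSV, the dataset name `site-temperatures`, the licence, and the helper names `infer_type` and `build_descriptor` are all hypothetical. This is not how the Data Package Creator or any Frictionless library works internally, and the type inference is deliberately crude.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a real CSV file on disk.
CSV_TEXT = "site,temperature_c\nA,12.5\nB,14.0\n"


def infer_type(values):
    """Guess a field type ("integer", "number" or "string") from string values."""
    try:
        for value in values:
            int(value)
        return "integer"
    except ValueError:
        pass
    try:
        for value in values:
            float(value)
        return "number"
    except ValueError:
        return "string"


def build_descriptor(name, path, csv_text, license_name):
    """Describe one tabular resource: its columns, their types, and a licence."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = [
        {"name": column, "type": infer_type([row[column] for row in rows])}
        for column in reader.fieldnames
    ]
    return {
        "name": name,
        "licenses": [{"name": license_name}],
        "resources": [
            {"name": name, "path": path, "schema": {"fields": fields}}
        ],
    }


descriptor = build_descriptor(
    "site-temperatures", "data.csv", CSV_TEXT, "CC-BY-4.0"
)
# This JSON is what would typically be saved next to the data as datapackage.json.
print(json.dumps(descriptor, indent=2))
```

    The descriptor answers the questions raised above in one place: which columns exist, what type each holds, and under which licence the data is shared. Units and acronyms could be recorded the same way in free-text description fields.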

    Our Reproducible Research Fellows recently learned all about packaging their data by using the Data Package Creator. To help others learn how they too can package their data, the Fellows wrote about packaging their data in blogs that you can read below!


    # Data Package is Valid! By Ouso Daniel (Cohort 1)

    “To quality-check the integrity of your data package creation, you must validate it before downloading it for sharing, among many things. The best you can get from that process is “Data package is valid!”. What about before then?”


    # Combating other people’s data by Monica Granados (Cohort 1)

    “Follow the #otherpeoplesdata on Twitter and in it you will find a trove of data users trying to make sense of data they did not collect. While the data may be open, having no metadata or information about what variables mean, doesn’t make it very accessible….Without definitions and an explanation of the data, taking the data out of the context of my experiment and adding it to something like a meta-analysis is difficult. Enter Data packages.”


    # Data Package Blog by Lily Zhao (Cohort 1)

    “When I started graduate school, I was shocked to learn that seafood is actually the most internationally traded food commodity in the world….However, for many developing countries being connected to the global seafood market can be a double-edged sword….Over the course of my master’s degree, I developed a passion for studying these issues, which is why I am excited to share with you my experience turning some of the data my collaborators into a packaged dataset using the Open Knowledge Foundation’s Datapackage tool.”


    # How do we package data, and why is it important to organise the supermarket bag? By Sele Yang (Cohort 1)

    “Packaging data about abortion from OpenStreetMap. This is a post to share with you the process and steps for creating datapackages. What is this? A datapackage is basically a form of packaging that streamlines the way we share and replicate data. It is like a data container ready to be transported along the knowledge highway (geeky, right).”


    # So you want to get your data package validated? By Katerina Drakoulaki (Cohort 2)

    “Have you ever found any kind of dataset, (or been given one by your PI/collaborator) and had no idea what the data were about? During my PhD I’ve had my fair share of not knowing how code works, or how stimuli were supposed to be presented, or how data were supposed to be analysed….The datapackage tool tries to solve one of these issues, more specifically creating packages in which data make sense, and have all the explanations (metadata) necessary to understand and manipulate them.”


    # Constructing a basic data package in Python by Jacqueline Maasch (Cohort 2)

    “As a machine learning researcher, I am constantly scraping, merging, reshaping, exploring, modeling, and generating data. Because I do most of my data management and analysis in Python, I find it convenient to package my data in Python as well. The screenshots below are a walk-through of basic data package construction in Python.”


    # Sharing data from your own scientific publication by Dani Alcalá-López (Cohort 2)

    “What better way to start working with open data than by sharing a Data Package from one of my own publications? In this tutorial, I will explain how to use the Frictionless Data tools to share tabular data from a scientific publication openly. This will make it easier for anyone to reuse this data.”


    # Data Package Blog by Sam Wilairat (Cohort 2)

    “As a library science student with an interest in pursuing data librarianship, learning how to create, manage, and share frictionless data is important. These past few months I’ve been learning about Frictionless Data and how to use Frictionless Data Tools to support reproducible research….To learn how to use the Frictionless Data Tools, I decided to pursue an independent project and am working on creating a comprehensive dataset of OER (open educational resources) health science materials that can be filtered by material type, media format, topic, and more.”


    # Let’s Talk Data Packaging by Evelyn Night (Cohort 2)

    “A few weeks ago I met data packages for the first time and I was intrigued since I had spent too much time in the past wrangling missing and inconsistent values. Packaging data therefore taught me that arranging and preserving data does not have to be tedious anymore. Here, I show how I packaged a bit of my data (unpublished) into a neat json document using the Data Package creator. I am excited to show you just how much I have come from knowing nothing to being able to package and extract the json output.”


    # [Data]packaging human rights with the Universal Periodic Review by Anne Lee Steele (Cohort 2)

    “All of the records for the Universal Periodic Review have been uploaded online, and are available for the public. However, it’s not likely that the everyday user would be able to make heads or tails of what it actually means….The way I think about it, the Data Package is a way of explaining the categories used within the data itself, in case someone besides an expert is using them. While sections like “Recommendation” and “Recommending State” may be somewhat self-explanatory, I can imagine that this will get way more complicated with purely numerical data.”


    # Creating a datapackage for microbial community data (and a phyloseq object) by Kate Bowie (Cohort 2)

    “I study bacteria, and lucky for me, bacteria are everywhere….My lab often tries many different ways to handle the mock [bacteria] community, so it’s important that the analysis be documented and reproducible. To address this, I decided to generate a data package using a tool created by the Open Knowledge Foundation. Here is my experience creating a data package of our data, the metadata, and associated software.”


    # Using Weather and Rainfall Data to Validate by Ritwik Agarwal (Cohort 2)

    “I am using a data resource from Telangana Open Data…it is an open source data repository commissioned by the state government here in India and basically it archives and stores Weather, Topological, Agriculture and Infrastructure data which then can be used by research students and stakeholders keen to study and make reports in it….CSV files are very versatile, but cannot handle the metadata with all the necessary context. We need to make sure that people can find our data and the information they need to understand our data. That’s where the Data Package comes in!”

    - + diff --git a/blog/2020/12/17/december-virtual-hangout/index.html b/blog/2020/12/17/december-virtual-hangout/index.html index 9c62daabd..5b7a98a20 100644 --- a/blog/2020/12/17/december-virtual-hangout/index.html +++ b/blog/2020/12/17/december-virtual-hangout/index.html @@ -35,7 +35,7 @@ - + @@ -114,6 +114,6 @@

    # A recap from our December community call

    We had a presentation about “using frictionless data package for web archive data package (WACZ format)”. More details in this GitHub issue.

    If you would like to dive deeper and watch Ilya’s presentation, you can find it here:

    # Other agenda items from our hangout

    # Join us next time!

    Our next meeting will be announced in January 2021! You can sign up here to be notified when the hangout is scheduled. We’ll make space to talk about geospatial data standards, coping with Covid, and a member’s platform dedicated to open data hackathons!

    As always, there will be time for your updates too. Do you want to share something with the community? Let us know when you sign up!

    # Call recording

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord or Twitter to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/01/13/partnering-with-odi/index.html b/blog/2021/01/13/partnering-with-odi/index.html index 91ce6ae40..c0cbb80c6 100644 --- a/blog/2021/01/13/partnering-with-odi/index.html +++ b/blog/2021/01/13/partnering-with-odi/index.html @@ -35,7 +35,7 @@ - + @@ -116,6 +116,6 @@ We are currently looking for novice and intermediate users to help us review our documentation, in order to make it more useful for you and all our future users.
    For every user session you take part in, you will be given £50 for your time and feedback.
    Are you interested? Then fill in this form.

    # More about Frictionless Data

    Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.

    - + diff --git a/blog/2021/01/18/schema-collaboration/index.html b/blog/2021/01/18/schema-collaboration/index.html index ff4d20dfc..3a8a3338c 100644 --- a/blog/2021/01/18/schema-collaboration/index.html +++ b/blog/2021/01/18/schema-collaboration/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Schema Collaboration

    This blog is part of a series showcasing projects developed during the 2020 Tool Fund. The Tool Fund provided five mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.

    # What problem does Schema-Collaboration solve?

    As a software engineer, I’ve spent more than a decade developing software used by researchers or data managers using different technologies. I have been involved in free software communities and projects for more than 20 years.

    Whilst working for a polar research institute, we saw the opportunity to take advantage of Frictionless data packages to describe datasets in a machine readable way ready for publication. But it was difficult for data managers and researchers to collaborate effectively on this, particularly when one or both groups were not familiar with Frictionless schemas. We needed a way for researchers submitting datasets to get feedback from the data managers to ensure that the dataset’s schema was correct.

    # How does Schema-Collaboration make collaborating easier?

    The Frictionless Data Package Creator is a very good Web-based tool for creating schemas, but out of the box it didn’t help with the collaboration part. The solution in this tool fund was to build a system that uses Data Package Creator to enable data managers and researchers to create and share dataset schemas, edit them, post messages and export the schemas in different formats (text, Markdown, PDF). To encourage collaboration within a project, multiple researchers can work on the same schema. Being able to view the description in human-readable formats makes it easier to spot mistakes and to integrate with third-party data repositories.

    From a data manager’s perspective the tool allows them to keep tabs on the datasets being managed and their progress. It prevents details getting lost in emails and hopefully provides a nicer interface to encourage better collaboration.

    In other words: think of a very simplified “Google Docs” specialised for data packages.

    # Who can use Schema-Collaboration?

    The tool is designed to help data managers(*) and researchers document data packages. The documentation (which is based on Frictionless schemas) needs to be started by the data manager who then sends the URL to the researchers allowing them to edit the schema.

    *: or anybody who wants to collaborate on creating a data package.

    Data managers can view a list of datapackages within the Schema-Collaboration tool.

    # How can I use this tool?

    To evaluate the tool it is possible to use the public demo server or to install it locally on a computer.

    It was packaged in a Docker container to make it easier to install on servers. There is full documentation available.

    Once the tool is installed it is used via a Web browser both by data managers and researchers.

    You can view details about the datapackage, including comments from the data manager or other users, and also edit the datapackage.

    # Future plans for Schema-Collaboration

    We plan to install schema-collaboration at the Swiss Polar Institute, where it will be used to describe polar data sets.

    In the upcoming January Frictionless Data community call (sign up here to join), I will do a demo and I would really appreciate feedback. Please feel free to use it and add issues (bugs or ideas) in the GitHub repository.

    # Tech stack

    For the curious: schema-collaboration is developed using Python and Django and uses the django-crispy-forms package to create the forms. It supports sqlite3 and MariaDB databases.

    # Thanks to…

    In order to integrate Data Package Creator with schema-collaboration, some changes were needed in the Data Package Creator. Evgeny (@roll on GitHub/Discord) from the Frictionless Data project made those changes and helped with the integration. Thank you very much!

    Further reading:

    GitHub repository: https://github.com/frictionlessdata/schema-collaboration

    Meet Carles Pina Estany: https://frictionlessdata.io/blog/2020/07/16/tool-fund-polar-institute/#meet-carles-pina-estany

    - + diff --git a/blog/2021/01/26/sara-petti/index.html b/blog/2021/01/26/sara-petti/index.html index ba7d943df..fbbcecd6d 100644 --- a/blog/2021/01/26/sara-petti/index.html +++ b/blog/2021/01/26/sara-petti/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Meet the new Frictionless Data Community Manager

    Hi everyone,

    I am Sara Petti, the new Frictionless Data Community Manager. After a very happy decade in Brussels, I moved to Hamburg, Germany in February last year, just in time to live through the global pandemic in a brand-new place. With social life put on hold, I finally decided to start something I had wanted to do for quite some time: learn to code in order to do data visualisation. Right now I am learning Python slowly but surely, and when I am not going bananas over cleaning data I draw comics or experiment with fermenting items in my kitchen (mostly vegetables). So if you are also a passionate breeder of lactobacillus bacteria, we should definitely get in touch!

    Back in Brussels I worked with public libraries, developing projects with them, but also advocating for them to be on the EU agenda. Talking with some of the most innovative librarians, I became well aware of the importance of granting free access to information and knowledge to everyone in order to empower citizens and foster democracy. I started to take an interest in the whole open movement, monitoring projects and policy development, and quickly became passionate about it.

    I think the real turning point for me was when I got to know an amazing project on opening air quality data developed by some public libraries in Colombia and tried to replicate it in Europe. At that point, I really understood the implications of the open movement: when data is opened, citizens are able to take ownership of the subjects that matter to them and to advocate for policy improvement. Sadly, more often than not, when data is made available, it is not directly reusable.

    Frictionless Data provides tools that improve the quality of open data, making it more useful to society and reusable by a wide range of people. I think that this idea of serving society with a free and open service that people can use to empower themselves is what attracted me at first. Providing those kinds of services with no barriers to entry (and no friction, if you will) should be the purpose of any institution. This is why I am very excited to join the Open Knowledge Foundation team working on this amazing project, doing what I think I can do best: interact with people and create links between them. My plan for the upcoming months is to build a proactive community around this project and engage with them. So if you are interested in knowing more about Frictionless Data, or you are already using our tools and would like to connect, email me: sara.petti@okfn.org or connect with the project on Twitter or Discord.

    - + diff --git a/blog/2021/01/30/fellows-validation/index.html b/blog/2021/01/30/fellows-validation/index.html index 27eb55322..958f6d2a4 100644 --- a/blog/2021/01/30/fellows-validation/index.html +++ b/blog/2021/01/30/fellows-validation/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Learning how to validate research data - A Fellows blog

    Have you ever heard a data horror story about Excel automatically changing all numbers into dates without so much as a warning? Have you ever accidentally entered a wrong data value into a spreadsheet, or accidentally deleted a cell? What if there was an easy way to detect errors in data types and content? Well there is! That is the main goal of Goodtables, the Frictionless data validation service, and also the Frictionless-py validate function. Interested in learning more about how you can validate your data? Read on to see how the Frictionless Fellows validated their research data and learn their tips and tricks!
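    To make the idea of detecting errors in data types and content concrete, here is a minimal sketch written with only the Python standard library. It is not how Goodtables or the Frictionless-py validate function is implemented; the sample table, the `check_table` helper and the choice of checks (blank headers, duplicate rows, non-integer values) are assumptions made purely for illustration.

```python
import csv
import io

# Hypothetical sample: a blank header, a duplicated row, and a non-numeric count.
CSV_TEXT = "id,,count\n1,a,10\n1,a,10\n2,b,ten\n"


def check_table(csv_text, integer_columns):
    """Return a list of (row_number, message) problems found in the table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    problems = []

    # Structural check: every column needs a name.
    for position, name in enumerate(header, start=1):
        if not name.strip():
            problems.append((1, f"blank header in column {position}"))

    seen = set()
    for row_number, row in enumerate(body, start=2):
        # Structural check: an exact repeat of an earlier row.
        key = tuple(row)
        if key in seen:
            problems.append((row_number, "duplicate row"))
        seen.add(key)

        # Content check: the named columns must hold whole numbers.
        for name in integer_columns:
            value = row[header.index(name)]
            if not value.lstrip("-").isdigit():
                problems.append(
                    (row_number, f"'{value}' in column '{name}' is not an integer")
                )
    return problems


problems = check_table(CSV_TEXT, integer_columns=["id", "count"])
for row_number, message in problems:
    print(f"row {row_number}: {message}")
```

    Reporting the row number alongside each message mirrors what the Fellows describe below: a validator is most useful when it points to the exact location of the error.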

    TIP

    Click on the links below to read the whole blog.

    # Don’t you wish your table was as clean as mine? By Monica Granados (Cohort 1)

    “How many times have you gotten a data frame from a colleague or downloaded data that had missing values? Or it’s missing a column name? Do you wish you were never that person? Well introducing Goodtables – your solution to counteracting bad data frames! As part of the inaugural Frictionless Data Fellows, I took Goodtables out for a spin.”

    # Validating data one package at a time by Sele Yang (Cohort 1)

    “I worked with the database I have been using for the programme, which is in my Github repository. It is a geographic database of abortion clinics downloaded from OpenStreetMap through OverpassTurbo….Goodtables is a very powerful tool that gives us the possibility of constant, simple validation to keep our databases in optimal condition, not only for our own work but also for their reproduction and use by other people.”

    # Tabular data: Before you use the data by Ouso Daniel (Cohort 1)

    “I want to talk about goodtables, a Frictionless data (FD) tool for validating tabular data sets. As hinted by the name, you only want to work on/with tabular data in good condition; the tool highlights errors in your tabular dataset, with the precision of the exact location of your error. Again, the beautiful thing about FD tools is that they don’t discriminate on your preferences, it encompasses the Linux-based CLI, Python, GUI folks, among other languages.”

    # Data Validation Of My Interview Dataset Using Goodtables by Lily Zhao (Cohort 1)

    “I used goodtables to validate the interview data we gathered as part of the first chapter of my PhD. These data were collected in Mo’orea, French Polynesia where we interviewed both residents and scientists regarding the future of research in Mo’orea….Amplifying local involvement and unifying the perspectives of researchers and coastal communities is critical not only in reducing inequity in science, but also in securing lasting coral reef health.”

    # Walking through the frictionless framework by Jacqueline Maasch (Cohort 2)

    “While the GoodTables web server is a convenient tool for automated data validation, the frictionless framework allows for validation right within your Python scripts. We’ll demonstrate some key frictionless functionality, both in Python and command line syntax. As an illustrative point, we will use a CSV file that contains an invalid element – a remnant of careless file creation.”

    # Validating your data before sharing with the community by Dani Alcalá-López (Cohort 2)

    “Once we have decided to share our data with the rest of the world, it is important to make sure that other people will be able to reuse it. This means providing as much metadata as possible, but also checking that there are no errors in the data that might prevent others from benefiting from our data. Goodtables is a simple tool that you can use both on the web and in the command-line interface to carry out this verification process.”

    # Goodtables blog by Sam Wilairat (opens new window) (Cohort 2)

    “Now let’s try validating the same data using the Goodtables command line tool! ….Once the installation is complete, type “goodtables path/to/file.csv”. You will either receive a green message stating that the data is valid, or a red message, like the one I have shown below, showing that the data is not valid!”

    # Using goodtables to validate metadata from multiple sequencing runs by Kate Bowie (opens new window) (Cohort 2)

    “Here, I will show you how I used a schema and GoodTables to make sure my metadata files could be combined, so I can use them for downstream microbial diversity analysis….It’s extremely helpful that GoodTables pointed this [error] out, because if I tried to combine these metadata files in R with non-matching case as it is here, then it would create TWO separate columns for the metadata….Now I will be able to combine these metadata files together and it will make my data analysis pipeline a lot smoother.”

    # Reflecting on ‘datafication’, data prep, and UTF-8 with goodtables.io by Anne Lee Steele (opens new window) (Cohort 2)

    “Before I knew it, it was 2021, and revisiting my data in the new year has made me realize just how much time and efforts goes into cleaning, structuring, and formatting datasets – and how much more goes into making them understandable for others (i.e. through Frictionless’ data-package). I’d always thought of these processes as a kind of black box, where ‘data analysis’ simply happens. But in reality, it’s the fact that we’ve been spending so much time on preparatory work that points to how important these processes actually are: and how much goes into making sure that data can be used before analyzing it in the first place.”

    # Validate it the GoodTables way! By Evelyn Night (opens new window) (Cohort 2)

    “Errors may sometimes occur while describing data in a tabular format and these could be in the structure; such as missing headers and duplicated rows, or in the content for instance assigning the wrong character to a string. Some of these errors could be easily spotted by naked eyes and fixed during the data curation process while others may just go unnoticed and later impede some downstream analytical workflows. GoodTables are handy in flagging down common errors that come with tabular data handling as it recognises these discrepancies fast and efficiently to enable users debug their data easily. ”

    # Using the frictionless framework for data validation by Katerina Drakoulaki (opens new window) (Cohort 2)

    “Thus, similar to what the data package creator and goodtables.io (opens new window) does, frictionless detects your variables and their names, and infers the type of data. However, it detected some of my variables as strings, when they are in fact integers. Of course, goodtables did not detect this, as my data were generally -in terms of formatting- valid. Not inferring the right type of data can be a problem both for future me, but also for other people looking at my data.”
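
The string-versus-integer mismatch described in the last excerpt can be illustrated with a small sketch. This is not how frictionless infers types internally: `infer_column_type` and `infer_types` are hypothetical helpers built on the Python standard library, and they assume a column counts as an integer column only when every non-empty cell parses as an integer. Under that assumption a single stray value such as `n/a` is enough to turn a whole column into strings.

```python
import csv
import io


def infer_column_type(values):
    """Return "integer", "number" or "string" for a list of cell strings.

    Hypothetical helper for illustration only; real tools use richer rules.
    """
    cells = [v.strip() for v in values if v.strip() != ""]
    if not cells:
        return "string"

    def all_parse(cast):
        # True only if every non-empty cell can be cast without error.
        for cell in cells:
            try:
                cast(cell)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    return "string"


def infer_types(csv_text):
    """Map each header of a CSV document to its inferred type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return {
        name: infer_column_type([row[i] for row in body if i < len(row)])
        for i, name in enumerate(header)
    }


# Made-up sample data: the "age" column contains one non-numeric cell.
SAMPLE = "participant,age,score\np01,34,0.5\np02,n/a,0.75\n"
print(infer_types(SAMPLE))
```

Checking the inferred types against what you expect, and fixing the schema by hand where they differ, is one way to catch this kind of problem before sharing a dataset.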

    - + diff --git a/blog/2021/02/03/january-virtual-hangout/index.html b/blog/2021/02/03/january-virtual-hangout/index.html index 423c77621..e6a69e8e0 100644 --- a/blog/2021/02/03/january-virtual-hangout/index.html +++ b/blog/2021/02/03/january-virtual-hangout/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    # A recap from our January community call

    On January 28th we had our first Frictionless Data Community Call for 2021. It was great to see it was so well attended!

    We heard a presentation by Carles Pina i Estany on schema-collaboration, a system that uses Data Package Creator to enable data managers and researchers to create and share dataset schemas, edit them, post messages and export the schemas in different formats (text, Markdown, PDF). Before this tool was developed, researchers communicated with a data manager via email for each datapackage they were publishing, which made the whole process considerably slower and more difficult.

    To discover more about schema-collaboration, have a look at it on GitHub (opens new window) or read the blog (opens new window) Carles wrote about the project. If you would like to dive deeper and watch Carles’ presentation, you can find it here:

    # Other agenda items from our hangout

    # News from the community

    Giuseppe Peronato and cividi (opens new window) started using Frictionless Data for data pipelines using (Geo-)Spatial datasets, e.g. raster data and GeoJSONs. You can have a look here (opens new window). They have also been looking more closely at the Creator’s UI library in a prototype (opens new window) with researchers, and releasing a QGIS plugin (opens new window) for Frictionless Data.

    Thorben started working on the official vaccination publication by the German Federal Health Authority. The publication is replaced daily, so a Data Package Pipeline run by a GitHub Action saves it as a Data Package. If you are interested, have a look here (opens new window).

    # Join us next month!

    Our next meeting will be on 25th February. Don’t miss the opportunity to get a code demonstration on frictionless.py (opens new window) by our very own Evgeny Karev (@roll). You can sign up here (opens new window).

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/02/04/tableschema-to-template/index.html b/blog/2021/02/04/tableschema-to-template/index.html index 3820b4e37..fec838b7e 100644 --- a/blog/2021/02/04/tableschema-to-template/index.html +++ b/blog/2021/02/04/tableschema-to-template/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    HuBMAP - Table Schema generating an Excel template

    HuBMAP (Human BioMolecular Atlas Program (opens new window)) is creating an open, global atlas of the human body at the cellular level. To do this, we’re incorporating data from dozens of different assay types, and as many institutions. Each assay type has its own metadata requirements, and Frictionless Table Schemas are an important part of our validation framework, to ensure that the metadata supplied by the labs is good.

    That system has worked well, as far as it goes, but when there are errors, it’s a pain for the labs to read the error message, find the original TSV, scroll to the appropriate row and column, re-enter, re-save, re-upload… and hopefully not repeat! To simplify that process, we’ve made tableschema-to-template (opens new window): it takes a Table Schema as input, and returns an Excel template with embedded documentation and some basic validations.

    pip install tableschema-to-template

    ts2xl.py schema.yaml new-template.xlsx

    It can be used either as a command-line tool, or as a python library. Right now the generated Excel files offer pull-downs for enum constraints, and also check that floats, integers, and booleans are the correct format, and that numbers are in bounds. Adding support for regex pattern constraints is a high priority for us… What features are important to you? Issues and PRs are welcome at the GitHub repo (opens new window).
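
To make the kinds of checks mentioned above more concrete, here is a minimal sketch in plain Python. It does not use tableschema-to-template or Excel at all: it only shows how `enum`, `minimum` and `maximum` constraints from a Table Schema could be applied to cell values. The schema, its field names and the helpers `check_cell` and `check_row` are assumptions made up for this illustration.

```python
# Hypothetical schema for illustration; these field names are assumptions.
SCHEMA = {
    "fields": [
        {"name": "assay_type", "type": "string",
         "constraints": {"enum": ["imaging", "sequencing"]}},
        {"name": "donor_age", "type": "integer",
         "constraints": {"minimum": 0, "maximum": 120}},
        {"name": "is_targeted", "type": "boolean"},
    ]
}


def parse_boolean(raw):
    """Cast "true"/"false" (any case) to bool, else raise ValueError."""
    lowered = raw.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(raw)
    return lowered == "true"


CASTS = {"string": str, "integer": int, "number": float, "boolean": parse_boolean}


def check_cell(field, raw):
    """Return an error message for one cell, or None if it passes."""
    try:
        value = CASTS[field["type"]](raw)
    except ValueError:
        return f"{field['name']}: {raw!r} is not a valid {field['type']}"
    rules = field.get("constraints", {})
    if "enum" in rules and value not in rules["enum"]:
        return f"{field['name']}: {raw!r} is not one of {rules['enum']}"
    if "minimum" in rules and value < rules["minimum"]:
        return f"{field['name']}: {raw!r} is below the minimum {rules['minimum']}"
    if "maximum" in rules and value > rules["maximum"]:
        return f"{field['name']}: {raw!r} is above the maximum {rules['maximum']}"
    return None


def check_row(schema, row):
    """Collect the error messages for one row (a dict of column -> string)."""
    results = (check_cell(f, row.get(f["name"], "")) for f in schema["fields"])
    return [message for message in results if message is not None]


good_row = {"assay_type": "imaging", "donor_age": "42", "is_targeted": "true"}
bad_row = {"assay_type": "imaging", "donor_age": "130", "is_targeted": "yes"}
print(check_row(SCHEMA, good_row))
for message in check_row(SCHEMA, bad_row):
    print(message)
```

An Excel template moves the same idea to data entry time, so that a value outside the allowed set or range is caught before the file is ever uploaded.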

    - + diff --git a/blog/2021/02/26/halfway-odi/index.html b/blog/2021/02/26/halfway-odi/index.html index d6879c10f..bc1707358 100644 --- a/blog/2021/02/26/halfway-odi/index.html +++ b/blog/2021/02/26/halfway-odi/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    How we are improving the quality and interoperability of Frictionless Data

    Originally published: https://blog.okfn.org/2021/02/25/how-we-are-improving-the-quality-and-interoperability-of-frictionless-data/ (opens new window)

    As we announced in January (opens new window), the Open Knowledge Foundation (opens new window) has been awarded funds from the Open Data Institute (opens new window) to improve the quality and interoperability of Frictionless Data. We are halfway through the process of reviewing our documentation and adding new features to Frictionless Data, and wanted to give a status update showing how this work is improving the overall Frictionless experience.

    We have already done four feedback sessions and have been delighted to meet 16 users from very diverse backgrounds and different levels of expertise using Frictionless Data, some of whom we knew and some not. In spite of the variety of users, it was very interesting to see a widespread consensus on the way the documentation can be improved. You can have a look at a few of the community PRs here (opens new window) and here (opens new window).

    We are very grateful to all the Frictionless Data users who took part in our sessions - they helped us see all of our guides with fresh eyes. It was very important for us to do this review together with the Frictionless Data community because they are (together with those to come) the ones who will benefit from it, and so are best placed to flag issues and propose changes.

    Every comment is being carefully reviewed at the moment and the new documentation will soon be released.

    # What are the next steps?

    • We are going to have 8 to 12 more users giving us feedback in the coming month.
    • We are also adding a FAQ section based on the questions we got from our users in the past.

    If you have any feedback and/or improvement suggestions, please let us know on our Discord Channel (opens new window) or on Twitter (opens new window).

    # More about Frictionless Data

    Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.

    - + diff --git a/blog/2021/03/01/february-virtual-hangout/index.html b/blog/2021/03/01/february-virtual-hangout/index.html index cb16480e4..1cf6b5ea4 100644 --- a/blog/2021/03/01/february-virtual-hangout/index.html +++ b/blog/2021/03/01/february-virtual-hangout/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    # A recap from our February community call

    In our February Community Call we had a top-notch code demonstration of the new frictionless.py (opens new window) framework by our Frictionless Data senior developer Evgeny Karev. We had been very much looking forward to presenting the new framework to you all and we were very pleased that so many of you joined us. If you would like to know more about it, you can explore the new Frictionless Python framework through the documentation portal (opens new window) or on GitHub (opens new window).

    If you couldn’t make it to the call, or you are just curious and would like to go over the presentation again, here it is:

    # Other agenda items from our hangout

    Open Data Day (opens new window) is fast approaching with over 200 events organised online on March 6th. Together with the Frictionless Data Fellows (opens new window) we will be celebrating open research data. Join us online from 3pm UTC. RSVP here (opens new window) for the link to join this virtual event. This event is open to everyone.

    # Join us next month!

    Our next meeting will be on 25th March. We will hear about Hackathons to facilitate the creation of web tools to create field-specific FAIR archive files from Oleg Lavrovsky and Giuseppe Peronato.

    You can sign up here (opens new window).

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/03/05/frictionless-data-for-wheat/index.html b/blog/2021/03/05/frictionless-data-for-wheat/index.html index 4bb981b4c..edbd6bf02 100644 --- a/blog/2021/03/05/frictionless-data-for-wheat/index.html +++ b/blog/2021/03/05/frictionless-data-for-wheat/index.html @@ -38,7 +38,7 @@ - + @@ -116,6 +116,6 @@ Figure1: A Data Package generated automatically by mod_eirods_dav

    imgblog2
    Figure2: Tabular Data Package generated automatically by mod_eirods_dav

    # Adding CKAN support

    The second tool for which we have implemented Frictionless Data support is the DFW CKAN website, which we primarily use to store publications from the project output. It currently holds over 300 entries, and as the collection keeps growing we needed a more manageable way to integrate its data, especially with the other systems that our collaborators use across their projects.

    So we built a simple Python Django webapp to do this:

    imgblog3

    By querying the REST API provided by CKAN, the web app gets each dataset’s metadata as JSON and uses the Frictionless CKAN Mapper (opens new window) to convert it into a datapackage.json that conforms to the Frictionless Data standard. If any of the resources under a dataset is a CSV file, its headings are extracted as the tabular data package schema (opens new window) and integrated into the datapackage.json file itself. As well as providing the datapackage.json file as a download through the Django web app, it is also possible to push the datapackage.json back to CKAN as a resource file on the dataset’s page. This requires a CKAN user key with the relevant permissions.
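
The conversion step can be sketched roughly as follows. This is a simplified illustration and not the Frictionless CKAN Mapper itself: `ckan_to_datapackage` and `csv_header_schema` are hypothetical helpers, the example dataset is made up, and the CSV text is passed in directly so that the sketch needs no network access.

```python
import csv
import io
import json


def csv_header_schema(csv_text):
    """Build a minimal Table Schema from the header row of a CSV document."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # Only names are recorded; Table Schema treats a missing type as "string".
    return {"fields": [{"name": column} for column in header]}


def ckan_to_datapackage(dataset, csv_texts=None):
    """Convert a CKAN dataset dict into a Data Package descriptor dict.

    csv_texts maps a resource URL to already-downloaded CSV text (an assumed
    input for this sketch).
    """
    csv_texts = csv_texts or {}
    resources = []
    for res in dataset.get("resources", []):
        entry = {
            "name": res.get("name", ""),
            "path": res["url"],
            "format": res.get("format", "").lower(),
        }
        if entry["format"] == "csv" and res["url"] in csv_texts:
            entry["schema"] = csv_header_schema(csv_texts[res["url"]])
        resources.append(entry)
    return {
        "name": dataset["name"],
        "title": dataset.get("title", ""),
        "description": dataset.get("notes", ""),
        "resources": resources,
    }


# Made-up CKAN metadata, shaped like the dataset JSON returned by CKAN.
example_dataset = {
    "name": "example-dataset",
    "title": "Example dataset",
    "notes": "A made-up entry for illustration.",
    "resources": [
        {"name": "yields", "url": "https://example.org/yields.csv", "format": "CSV"},
        {"name": "paper", "url": "https://example.org/paper.pdf", "format": "PDF"},
    ],
}
example_csv = {"https://example.org/yields.csv": "plot,variety,yield\n1,A,7.2\n"}

package = ckan_to_datapackage(example_dataset, example_csv)
print(json.dumps(package, indent=2))
```

Only the CSV resource receives a `schema` entry; other resources are listed with their path and format alone.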

    imgblog4

    # How can you try this tool?

    The tool can be used by accessing its REST interface:

    • /convert?q={ckan-dataset-id} - convert CKAN dataset json to datapackage json e.g. /convert?q=0c03fa08-2142-426b-b1ca-fa852f909aa6
    • /convert_resources?q={ckan-dataset-id} - convert CKAN dataset json to datapackage json with resources; if any of the resource files are CSV files, their tabular data package schemas are generated too. e.g. /convert_resources?q=grassroots-frictionless-data-test
    • /convert_push?q={ckan-dataset-id}&key={ckan-user-key} - push the generated datapackage.json to the CKAN entry.
      An example REST query page:

    imgblog5

    It is also possible to run your own local deployment of the tool by downloading the web app from its GitHub repository, installing the requirements, and running the server with

    $ python manage.py runserver 8000
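
With a local server listening on port 8000, the three endpoints listed above could be addressed as in the sketch below. The base URL and the `build_url` helper are assumptions for illustration, and the user key shown is a placeholder; the endpoint names and the `q` and `key` parameters are the ones described above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumption: a local deployment


def build_url(endpoint, dataset_id, user_key=None):
    """Return the URL for one of the conversion endpoints."""
    params = {"q": dataset_id}
    if user_key is not None:
        # Only /convert_push needs the CKAN user key.
        params["key"] = user_key
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


print(build_url("convert", "0c03fa08-2142-426b-b1ca-fa852f909aa6"))
print(build_url("convert_resources", "grassroots-frictionless-data-test"))
print(build_url("convert_push", "grassroots-frictionless-data-test", "PLACEHOLDER-KEY"))
```

Using `urlencode` rather than string concatenation keeps dataset identifiers and keys correctly escaped if they contain reserved characters.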

    Our collaborators can utilise the datapackage.json and integrate the CKAN entries to their own tools or project with ease as it conforms to the Frictionless Data standard.

    # Next Steps for Frictionlessly Designing Future Wheat

    It has been a hugely positive step to implement support for Frictionless Data Packages and we’ve already used these packages ourselves after two of our servers decided to fall over within three days of each other! Our future plans are to add support for further metadata keys within the datapackage.json files and expose more datasets as Frictionless Data Packages. For the CKAN side, there are a few improvements that can be made in future: firstly, make the base CKAN URL configurable in a config file, so the tool can be used for any CKAN website. Secondly, create a Dockerfile that includes the whole Django app, so it is more portable and easier to deploy. You can keep track of the project at the following links:

    - + diff --git a/blog/2021/03/10/fellows-reproducing/index.html b/blog/2021/03/10/fellows-reproducing/index.html index abceb4ff2..1fea65404 100644 --- a/blog/2021/03/10/fellows-reproducing/index.html +++ b/blog/2021/03/10/fellows-reproducing/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Is reproducing someone else’s research data a Frictionless experience?

    The “reproducibility crisis” is a hot topic in scientific research these days. Can you reproduce published data from another laboratory? Can you follow the published scientific methods and get the same result? Unfortunately, the answer to these questions is often no.

    One of the goals of Frictionless Data is to help researchers make their work more reproducible. To achieve this, we focus on making data more understandable (make sure to document your metadata!), of higher quality (via validation checks), and easier to reuse (by standardization and packaging).

    As a test of these reproducibility measures, we tasked the Frictionless Fellows with reproducing each others’ data packages! This was a great learning experience for the Fellows and revealed some important lessons about how to make their data more (re)usable. Click on the blog links below to read more about their experiences!


    # Reproduciendo un viaje a Mo’rea by Sele Yang (opens new window) (Cohort 1)

    “My journey through Lily’s data took me to Mo’rea, French Polynesia, where she used different tools to collect a total of 175 interviews with residents as well as researchers from the region…To reproduce Lily’s data, I initially used the DataPackage Creator tool to load her raw information and start reviewing the data type specifications that the tool created automatically.”


    # Packaging Ouso’s Data by Lily Zhao (opens new window) (Cohort 1)

    “This week I had the opportunity to work with my colleague’s data. He created a Datapackage which I replicated. In doing so, I learned a lot about the Datapackage web interface….Using these data Ouso and his co-authors evaluate the ability of high-resolution melting analysis to identify illegally targeted wildlife species.”


    # Data Barter: Real-life data interactions by Ouso Daniel (opens new window) (Cohort 1)

    “Exchanging data packages and working backwards from them is an important test in the illustration of the overall goal of the Frictionless Data initiative. Remember, FD seeks to facilitate and promote open and reproducible research, consequently promoting collaboration. By trying to reproduce Monica’s work I was able to capture an error, which I highlighted for her attention, thus improved the work. Exactly how science is supposed to work!”


    # On README files, sharing data and interoperability by Anne Lee Steele (opens new window) (Cohort 2)

    “One of the goals of the Frictionless Data Fellowship has been to help us make our research more interoperable, which is another way of saying: something that other researchers can use, even if they have entirely different systems or tools with which they approach the same topic….What if researchers of all types wrote prototypical “data packages” about their research, that gave greater context to their work, or explained its wider relevance? In my fields, many researchers tend to find this in ‘the art of the footnote’, but this type of informal knowledge or context is not operationalized in any real way.”


    # Using Frictionless tools to help you understand open data by Dani Alcalá-López (opens new window) (Cohort 2)

    “A few weeks ago, the fellows did an interesting exercise: We would try to replicate each other’s DataPackages in pairs. We had spent some time before creating and validating DataPackages with our own data. Now it was the time to see how would it be to work with someone else’s. This experience was intended to be a way for us to check how it was to be at the other side.”


    # Validating someone else’s data! By Katerina Drakoulaki (opens new window) (Cohort 2)

    “The first thing I did was to go through the README file on my fellow’s repository. Since the repository was in a completely different field, I really had to read through everything very carefully, and think about the terms they used….Validating the data (to the extent that it was possible after all) was easy using the goodtables tools.”


    # Reproducing Jacqueline’s Datapackage and Revalidating her Data! By Sam Wilairat (opens new window) (Cohort 2)

    “Using Jacqueline’s GitHub repository, Frictionless Data Package Creator, and Goodtables, I feel that I can confidently reuse her dataset for my own research purposes. While there was one piece of metadata missing from her dataset, her publicly published datapackage .JSON file on her repository helped me to quickly figure out how to interpret the unlabeled column. I also feel confident that the data is valid because after doing a visual scan of the dataset, I used the Goodtables tool to double check that the data was valid!”


    # Reproducing a data package by Jacqueline Maasch (opens new window) (Cohort 2)

    “Is it easy to reproduce someone else’s data package? Sometimes, but not always. Tools that automate data management can standardize the process, making reproducibility simpler to achieve. However, accurately anticipating a tool’s expected behavior is essential, especially when mixing technologies.”


    # Validating data from Daniel Alcalá-López by Evelyn Night (opens new window) (Cohort 2)

    “In a fast paced research world where there’s an approximate increase of 8-9% in scientific publications every year, an overload of information is usually fed to the outside world. Unfortunately for us, most of this information is often wasted due to the reproducibility crisis marred by data or code that’s often locked away. We explored the question, ‘how reproducible is your data?’ by exchanging personal data and validating them according to the instructions that are outlined in the fellows’ recent goodtables blogs.”

    - + diff --git a/blog/2021/03/29/february-virtual-hangout/index.html b/blog/2021/03/29/february-virtual-hangout/index.html index e12250798..c007dbfad 100644 --- a/blog/2021/03/29/february-virtual-hangout/index.html +++ b/blog/2021/03/29/february-virtual-hangout/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    # A recap from our March community call

    On our last Frictionless Data community call on March 25th, we dealt with a very current topic thanks to Thorben Westerhuys, who presented his project on Frictionless Vaccination data.

    To compensate for the lack of time perspective in the government data, Thorben has developed a spatiotemporal tracker for state level covid vaccination data, which takes the data provided by the government, reformats it and makes it available to everyone in a structured, more machine readable form.

    To discover more about this great project, have a look at it on GitHub (opens new window). If you would like to dive deeper and discover all the project’s applications, you can watch Thorben’s presentation here:

    # Other agenda items from our hangout

    csv,conf,v6 (opens new window) is happening on May 4-5. Registrations are open. Don’t forget to book your place!

    # Join us next month!

    Our next meeting will be on April 29th. We will hear a presentation from the Frictionless Fellows. You can sign up here (opens new window).

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/04/13/data-package-for-intermine/index.html b/blog/2021/04/13/data-package-for-intermine/index.html index ddd6f96f3..2f7748b95 100644 --- a/blog/2021/04/13/data-package-for-intermine/index.html +++ b/blog/2021/04/13/data-package-for-intermine/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Frictionless Data Package for InterMine

    This blog is part of a series showcasing projects developed during the 2020-2021 Tool Fund. The Tool Fund provided five mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This Fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.

    My name is Nikhil and I am a pre-final year student pursuing an M.Sc. in Economics and a B.E. in Computer Science at BITS Pilani, India. For my Frictionless Data Tool Fund, I worked with InterMine (opens new window) which is an open-source biological data warehouse and offers a webapp to query and download that data in multiple formats like CSV, TSV, JSON, XML, etc. However, it is sometimes difficult for new users to understand the InterMine data since it is complex and structured. Also, for developers to contribute to InterMine in a more effective way, they need to understand the data and its structure at the core of InterMine, and this can be difficult for new developers.

    To help resolve these user needs, my solution was to design a data package for InterMine and give users the option to download the data package along with the results of any query. This would help them understand the structure of the results like class and attributes by describing all the attributes and summarizing other important information such as data sources, primary key(s), etc. Also, other fields like the version of app, link to query and timestamp can help them trace any potential errors. The new feature to export data packages is available in both the old version of InterMine webapps and the new version (BlueGenes). Users can use any of the apps to build a query and then go to the results page, where they can click on the export button, which provides the option to export Frictionless Data Package (see the images below for detailed steps).
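
As a rough idea of what such a descriptor might contain, here is a hedged sketch in Python. It is not the format that InterMine actually exports: the `build_query_package` helper, the example values, and the `app_version` and `query_url` properties are assumptions for illustration, while `created`, `sources`, `resources`, `schema` and `primaryKey` are standard Data Package and Table Schema properties.

```python
import json
from datetime import datetime, timezone


def build_query_package(query_url, app_version, columns, primary_key, sources):
    """Describe one query result as a Data Package descriptor dict.

    columns is a list of (class name, attribute name, type) tuples.
    """
    fields = [
        {"name": f"{cls}.{attr}", "type": field_type}
        for cls, attr, field_type in columns
    ]
    return {
        "name": "query-results",
        "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "sources": [{"title": title} for title in sources],
        # Custom properties (assumed names, not part of the specification):
        "app_version": app_version,
        "query_url": query_url,
        "resources": [
            {
                "name": "results",
                "path": "results.csv",
                "schema": {"fields": fields, "primaryKey": primary_key},
            }
        ],
    }


# Made-up example values.
descriptor = build_query_package(
    query_url="https://example.org/query",
    app_version="1.0.0",
    columns=[("Gene", "symbol", "string"), ("Gene", "length", "integer")],
    primary_key=["Gene.symbol"],
    sources=["Example source"],
)
print(json.dumps(descriptor, indent=2))
```

Keeping the query link, application version and timestamp next to the schema is what lets a later reader trace a result back to the query that produced it.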

    Within InterMine, there are over 30 mines that provide biological data for organisms like flies, humans, rats, etc. For this Frictionless Tool Fund, the target audience is the InterMine community, whether it’s researchers in institutes around the world or Google Summer of Code and Outreachy applicants who can understand the process of querying and the structure of data to kickstart their contribution.

    While this Tool Fund is over, a future idea to improve this work is adding class and attribute descriptions in the data package using the configuration files in the InterMine codebase. The class description file already exists but we need to add the attribute descriptions. Another possible future expansion would be integrating this feature with one of the frictionless tools, like Goodtables. For more details, see the images below and read the documentation for the tool here (opens new window).

    Screenshot 1 : Step 1 to export data package
    screenshot1

    Screenshot 2 : Step 2 to export data package
    screenshot2

    Screenshot 3 : A sample data package
    screenshot3

    - + diff --git a/blog/2021/04/14/new-data-documentation-portal/index.html b/blog/2021/04/14/new-data-documentation-portal/index.html index 190f1ecd8..7e8af2b2f 100644 --- a/blog/2021/04/14/new-data-documentation-portal/index.html +++ b/blog/2021/04/14/new-data-documentation-portal/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Unveiling the new Frictionless Data documentation portal

    Originally published: https://blog.okfn.org/2021/04/14/unveiling-the-new-frictionless-data-documentation-portal/ (opens new window)

    Have you used Frictionless Data documentation in the past and been confused or wanted more examples? Are you a brand new Frictionless Data user looking to get started learning?

    We invite you all to read our new and improved documentation portal (opens new window)! Thanks to a fund that the Open Knowledge Foundation was awarded (opens new window) from the Open Data Institute (opens new window), we have completely reworked the guides of our Frictionless Data Framework website (opens new window) according to the suggestions from a cohort of users gathered in several feedback sessions throughout the months of February and March.

    We cannot stress enough how precious those feedback sessions have been to us. They were an excellent opportunity to connect with our users and reflect together with them on how to make all our guides more useful for current and future users. The enthusiasm and engagement that the community showed for the process was great to see and reminded us that the link with the community should be at the core of open source projects.

    We were amazed by the amount of extremely useful inputs that we got. While we are still digesting some of the suggestions and working out how to best implement them, we have made many changes to make the documentation a smoother, Frictionless experience.

    # So what’s new?

    A common theme from the feedback sessions was that it was sometimes difficult for novice users to understand the whole potential of the Frictionless specifications. To help make this clearer, we added a more detailed explanation, user examples and user stories to our Introduction (opens new window). We also added some extra installation tips and a troubleshooting section to our Quick Start guide (opens new window).

    The users also suggested several code changes, like more realistic code examples, better explanations of functions, and the ability to run code examples in both the Command Line and Python. This last suggestion was prompted because most of the guides use a mix of Command Line and Python syntax, which was confusing to our users. We have clarified that by adding a switch in the code snippets that allows users to work with pure Python syntax or pure Command Line syntax (when possible), as you can see here (opens new window). We also put together an FAQ section (opens new window) based on questions that were often asked on our Discord chat (opens new window). If you have suggestions for other common questions to add, let us know!

    The documentation revamping process also included the publication of new tutorials. We worked on two new Frictionless tutorials, which are published under the Notebooks link in the navigation menu. While working on those, we got inspired by the feedback sessions and realised that it made sense to give our community the possibility to contribute to the project with some real life examples of Frictionless Data use. The user selection process has started and we hope to get the new tutorials online by the end of the month, so stay tuned!

    # What’s next?

    Our commitment to continually improving our documentation is not over with this project coming to an end! Do you have suggestions for changes you would like to see in our documentation? Please reach out to us or open a pull request (opens new window) to contribute. Everyone is welcome to contribute! Learn how to do it here (opens new window).

    # Thanks, thanks, thanks!

    Once again, we are very grateful to the Open Data Institute for giving us the chance to focus on this documentation in order to improve it. We cannot thank enough all our users who took part in the feedback sessions: your contributions were precious.

    # More about Frictionless Data

    Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.

    - + diff --git a/blog/2021/05/03/april-virtual-hangout/index.html b/blog/2021/05/03/april-virtual-hangout/index.html index 6a1ca9a8a..d359d8d82 100644 --- a/blog/2021/05/03/april-virtual-hangout/index.html +++ b/blog/2021/05/03/april-virtual-hangout/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On our last Frictionless Data community call on April 29th we had an interactive session with our great Frictionless Data Fellows: Daniel Alcalá López, Kate Bowie, Katerina Drakoulaki, Anne Lee, Jacqueline Maasch, Evelyn Night and Samantha Wilairat.

    The Fellows are early career researchers recruited to become champions of the Frictionless Data tools and approaches in their field. During the nine months of their fellowship, which started in August 2020, the Fellows learned how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. It was a real pleasure to work with this amazing cohort. Sadly the fellowship is coming to an end, but we are sure we will hear a lot from them in the future.

    You can learn more about them here (opens new window), and read all the great blogs they wrote here (opens new window).

    If you would like to hear directly from the Fellows about their experience with Frictionless Data and what the fellowship meant for them, you can have a look at the presentation they made during the community call here below:

    # Other agenda items from our hangout

    csv,conf,v6 (opens new window) is happening on May 4-5. It is free and virtual - register here (opens new window). There are two Frictionless sessions:

    • May 4th: Frictionless Data workshop led by the Reproducible Research fellows, don’t miss the opportunity to meet the Fellows again!
    • May 5th: Frictionless Data for Wheat by Simon Tyrrell

    Full programme here: https://csvconf.com/speakers (opens new window)

    # News from the Community

    Oleg Lavrovsky presented instant APIs for small Frictionless Data-powered apps. Here (opens new window) is an example app developed during the latest Swiss OpenGLAM hackathon. To know more about it, you can also check:

    # Join us next month!

    Our next meeting will be on May 27th. We will hear a presentation from Simon Tyrrell on his Tool Fund project - Frictionless Data for Wheat. You can sign up here (opens new window).

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/06/01/may-virtual-hangout/index.html b/blog/2021/06/01/may-virtual-hangout/index.html index bc07c2eb0..3db77de2f 100644 --- a/blog/2021/06/01/may-virtual-hangout/index.html +++ b/blog/2021/06/01/may-virtual-hangout/index.html @@ -38,7 +38,7 @@ - + @@ -117,6 +117,6 @@

    On our last Frictionless Data community call on May 27th we had Simon Tyrrell and Xingdong Bian from the Earlham Institute giving a presentation on Frictionless Data for Wheat. The project was developed during the Frictionless Toolfund 2020-2021.

    Simon and Xingdong are part of Designing Future Wheat, a research group studying how to increase the amount of wheat that is produced in a field in order to meet the global demand by 2050. To run the project, they collect a great amount of data and large-scale datasets, which are shared with a large number of different users. Frictionless Data is used to make that data available, usable and interoperable for everyone.

    You can learn more about the Designing Future Wheat project here (opens new window). If you would like to dive deeper and discover all about the Frictionless implementation, you can watch Simon’s and Xingdong’s presentation here:

    # Other agenda items from our hangout

    We are super happy to share with you Frictionless Repository - a GitHub Action for the continuous data validation of your repo (opens new window).
    We are actively looking for feedback, so please let us know what you think.

    # Join us next month!

    Our next meeting will be on June 24th. We will hear a presentation from Nikhil Vats on Frictionless Data Package for InterMine. You can sign up here (opens new window).

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

     

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/06/16/new-changes-to-the-website/index.html b/blog/2021/06/16/new-changes-to-the-website/index.html index 5ba90cb9e..9a15e3691 100644 --- a/blog/2021/06/16/new-changes-to-the-website/index.html +++ b/blog/2021/06/16/new-changes-to-the-website/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Announcing New Changes to Our Website

    Have you noticed some changes to our website? Building upon last year’s website redesign (opens new window), we have finished making some new changes that we are very excited to tell you about! When we started reviewing our documentation for the Frictionless Python Framework (opens new window) with the support of the ODI (opens new window) back in January, we quickly realised that our main website could benefit from some revamping as well, in order to make it more user-friendly and easier to navigate.

    We needed to clarify the relationship between our main project website and the website of all our Frictionless standards, software, and specifications, which all had different layouts and visual styles. The harmonisation process is still ongoing, but we are already very happy with the fact that the new website offers a comprehensive view of all our tools.

    It was important for us that people visiting our website for the very first time could quickly understand what Frictionless Data is and how it can be useful to them. We did that through a reorganisation of the homepage and the navigation, which was a bit confusing for some users. We also updated most of the text to better reflect the current status of the project and to clearly state what Frictionless Data is. Users should now be able to understand at a glance that Frictionless is composed of two main parts, software (opens new window) and standards (opens new window), which makes it more accessible for a broad range of people working with data.


    Users will also easily find examples of projects and collaborations that adopted Frictionless (opens new window), which can be very useful to better understand the full potential of the Frictionless toolkit.

    Our goal with this new website is to give visitors an easier way to learn about Frictionless Data, and to encourage them to try it out and join our great community. The new architecture should reflect that, and should make it easier for people to understand that Frictionless Data is a progressive open-source framework for building data infrastructure, aiming at making it easier to work with data. As this is an open-source project, we welcome and cherish everybody’s contribution. Speaking of which, we would love to hear your feedback! Let us know what you think about the new website, whether you have any comments, and whether you see any further improvements we could make. We have created a GitHub issue (opens new window) you can use to give us your thoughts.

    Thank you!

    - + diff --git a/blog/2021/06/22/livemark/index.html b/blog/2021/06/22/livemark/index.html index b5fce6310..a539c24b6 100644 --- a/blog/2021/06/22/livemark/index.html +++ b/blog/2021/06/22/livemark/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Welcome Livemark - the New Frictionless Data Tool

    We are very excited to announce that a new tool has been added to the Frictionless Data toolkit: Livemark. What is that? Livemark is a great tool that allows you to publish data articles very easily, giving you the possibility to see your data live on a working website in the blink of an eye.

    # How does it work?

    Livemark is a Python library generating a static page that extends Markdown with interactive charts, tables, scripts, and much much more. You can use the Frictionless framework as a frictionless variable to work with your tabular data in Livemark.

    Livemark offers a series of useful features, like automatically generating a table of contents and providing a scroll-to-top button when you scroll down your document. You can also customise the layout of your newly created webpage.

    # How can you get started?

    Livemark is very easy to use. We invite you to watch this great demo by developer Evgeny Karev:

     

    You can also have a look at the documentation on GitHub (opens new window).

    # What do you think?

    If you create a site using Livemark, please let us know! Frictionless Data is an open source project, therefore we encourage you to give us feedback. Let us know your thoughts, suggestions, or issues by joining us in our community chat on Discord (opens new window) or by opening an issue in the GitHub repo (opens new window).

    - + diff --git a/blog/2021/06/25/june-virtual-hangout/index.html b/blog/2021/06/25/june-virtual-hangout/index.html index 0a4af0b11..bbd510c45 100644 --- a/blog/2021/06/25/june-virtual-hangout/index.html +++ b/blog/2021/06/25/june-virtual-hangout/index.html @@ -38,7 +38,7 @@ - + @@ -116,6 +116,6 @@ community-hangout

    At our last Frictionless Data community call on June 24th we had Nikhil Vats giving a presentation on Frictionless Package for InterMine. The project was developed during the Frictionless Toolfund 2020-2021.

    InterMine is an open source biological data warehouse that creates databases of biological data accessed by sophisticated web query tools. Nikhil worked on the Frictionless Data Package integration, which is extremely helpful for users, as it describes all the fields of their query, specifically: name of field, type of field, class path, field and class ontology link.

    You can learn more about the Data Package for InterMine project here (opens new window). If you would like to dive deeper and discover all about the Frictionless implementation, you can watch Nikhil Vats’ presentation here:

    # Other agenda items from our hangout

    # Linked data support

    Nikhil’s presentation naturally led to a discussion on adding support for linked data and ontologies to Frictionless Data. On several occasions the community has shown interest in extending Frictionless specifications by incorporating standard attributes like ontology terms for improved interoperability. There have also been several discussions about supporting JSON-LD or RDF in the main specifications for improved data linking and querying. Would this help your work? Let us know what you think and whether you would be interested in participating in this project.

    # New tool: Livemark

    We are super happy to share with you the newest entry in the Frictionless Data toolkit: Livemark - a static page generator with built-in tables and charts support (with support for data processing and validation with Frictionless): https://frictionlessdata.github.io/livemark/ (opens new window)

    To know more about it, check out our latest blog (opens new window) (featuring a great demo by developer Evgeny Karev).

    As usual, we would love to hear what you think, so please share your thoughts, comments and feedback with us.

    # News from the community

    Michael Amadi from Nimble Learn presented the Open Data Blend project (opens new window) - a set of open data services that aim to make large and complex UK open data easier to analyse. Open Data Blend’s bulk data API is built on the Frictionless Data specs. Keep an eye out for an upcoming blog with more details!

    Frictionless contributor Peter Desmet proposed to start a Frictionless Data community on Zenodo. We are currently discussing the best way to do that on Discord (opens new window) in the datasets channel. Join us there if you are interested or have ideas!

    # Join us next month!

    Our next meeting will be on July 29th. We will hear a presentation from Dave Rowe on Public Libraries Open Data Schema. You can sign up here (opens new window).

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/06/28/frictionless-specs-european-commission/index.html b/blog/2021/06/28/frictionless-specs-european-commission/index.html index 4f42cef1f..a8f1ea49f 100644 --- a/blog/2021/06/28/frictionless-specs-european-commission/index.html +++ b/blog/2021/06/28/frictionless-specs-european-commission/index.html @@ -38,7 +38,7 @@ - + @@ -119,6 +119,6 @@ tabular-data

    Do you remember Costas Simatos (opens new window)? He introduced the Frictionless Data community to the Interoperability Test Bed (opens new window) (ITB), an online platform that can be used to test systems against technical specifications. Curious minds will find a recording of his presentation on the subject on YouTube (opens new window). Among the tools it offers is a CSV validator (opens new window) which relies on the Table Schema specifications (opens new window). Those specifications filled a gap that RFC 4180 (opens new window) didn’t address by providing a structured way of defining the content of individual fields in terms of data types, formats and constraints. This was reported as a clear benefit of the Frictionless specifications back in 2020, when a beta version of the CSV validator was launched (opens new window).


    Frictionless specifications are flexible while allowing users to define unambiguously the expected content of a given field. They were therefore officially adopted to realise the validator for the Kohesio pilot phase of 2014-2020 (opens new window), Kohesio (opens new window) being the “Project Information Portal for Cohesion Policy”. The Table Schema specifications made it easy and convenient for the Interoperability Test Bed to establish constraints and describe the data to be validated in a concise way based on an initial set of CSV syntax rules (opens new window), converting written and mostly non-technical definitions to their Frictionless equivalent. Using simple JSON objects, Frictionless specifications allowed the ITB to enforce data validation in multiple ways, as can be observed from the schema used for the CSV validator (opens new window). The following list highlights the core aspects of the Table Schema standard that the ITB took advantage of:

    • Dates can be defined with string formatting (e.g. %d/%m/%Y stands for day/month/year);
    • Constraints can indicate whether a column can contain empty values or not;
    • Constraints can also specify a valid range of values (e.g. "minimum": 0.0 and "maximum": 100.0);
    • Constraints can specify an enumeration of valid values to choose from (e.g. "enum" : ["2014-2020", "2021-2027"]);
    • Constraints can be specified in custom ways, such as with regular expressions (opens new window) for powerful string matching capabilities;
    • Data types can be enforced for any column;
    • Columns can be forced to adopt a specific name and a description can be provided for each one of them.
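To make the list above more concrete, here is a minimal sketch in plain Python of how such a descriptor could be written and checked. The `type`, `format` and constraint keys (`required`, `minimum`, `maximum`, `enum`, `pattern`) come from the Table Schema standard, but the field names and the `check_value` and `check_row` helpers are hypothetical: they are neither the ITB's actual schema nor the Frictionless library's implementation.

```python
import re
from datetime import datetime

# Hypothetical Table Schema descriptor. The field names are invented for
# illustration and are not taken from the ITB's actual schema.
SCHEMA = {
    "fields": [
        {"name": "start_date", "type": "date", "format": "%d/%m/%Y",
         "constraints": {"required": True}},
        {"name": "cofinancing_rate", "type": "number",
         "constraints": {"minimum": 0.0, "maximum": 100.0}},
        {"name": "period", "type": "string",
         "constraints": {"enum": ["2014-2020", "2021-2027"]}},
        {"name": "country_code", "type": "string",
         "constraints": {"pattern": "[A-Z]{2}"}},
    ]
}


def check_value(field, raw):
    """Return a list of problems found in one cell (empty list means valid)."""
    constraints = field.get("constraints", {})
    if raw == "":
        return ["value is required"] if constraints.get("required") else []
    # Cast the raw string according to the declared data type.
    try:
        if field["type"] == "date":
            value = datetime.strptime(raw, field.get("format", "%Y-%m-%d")).date()
        elif field["type"] == "number":
            value = float(raw)
        else:
            value = raw
    except ValueError:
        return [f"cannot be read as {field['type']}"]
    problems = []
    if "minimum" in constraints and value < constraints["minimum"]:
        problems.append("below minimum")
    if "maximum" in constraints and value > constraints["maximum"]:
        problems.append("above maximum")
    if "enum" in constraints and value not in constraints["enum"]:
        problems.append("not in enum")
    if "pattern" in constraints and not re.fullmatch(constraints["pattern"], raw):
        problems.append("does not match pattern")
    return problems


def check_row(schema, row):
    """Map each invalid field name to its list of problems."""
    report = {}
    for field in schema["fields"]:
        problems = check_value(field, row.get(field["name"], ""))
        if problems:
            report[field["name"]] = problems
    return report
```

Calling `check_row(SCHEMA, row)` with a dictionary of raw CSV strings returns an empty dictionary for a valid row. In practice the same descriptor would be saved as JSON and handed to a Frictionless-aware validator rather than checked by hand.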

    Because these specifications can be expressed as portable text files, they became part of a multitude of tools to provide greater convenience to users and the validation process has been documented extensively (opens new window). JSON code snippets from the documentation highlight the fact that this format conveys all the necessary information in a readable manner and lets users extend the original specifications as needed. In this particular instance, the CSV validator can be used as a Docker image (opens new window), as part of a command-line application (opens new window), inside a web application (opens new window) and even as a SOAP API (opens new window).

    Frictionless specifications were the missing piece of the puzzle that enabled the ITB to rely on a well-documented set of standards for their data validation needs. But there is more on the table (no pun intended): whether you need to manage files, tables or entire datasets, there are Frictionless standards to cover you. As the growing list of adopters and collaborations demonstrates, there are many use cases to make a data project shine with Frictionless.

    Are you working on a great project that should become the next glowing star in the world of Frictionless Data? Feel free to reach out to spread the good news!

    - + diff --git a/blog/2021/07/02/farewell-fellows/index.html b/blog/2021/07/02/farewell-fellows/index.html index fcfafe40f..13a8ab01a 100644 --- a/blog/2021/07/02/farewell-fellows/index.html +++ b/blog/2021/07/02/farewell-fellows/index.html @@ -38,7 +38,7 @@ - + @@ -117,6 +117,6 @@ “The fellowship was both exhilarating and educative. I got to engage in Open Science conversations, learned about and used frictionless tools like the Data Package Creator and Goodtables. I also navigated the open data landscape using CLI, Python, and git. I also got to engage in the Frictionless Community calls where software geniuses presented their work and also held Open science-centered conversations. These discussions enhanced my understanding of the Open Science movement and I felt a great honor to be involved in such meetings. I learned so much that the 9 months flew by.”

  • A fellowship concludes - by Jacqueline Maasch (opens new window)
    “It is hard to believe that my time as a Reproducible Research Fellow is over. I am most grateful for this program giving me a dedicated space in which to learn, a community with which to engage, and language with which to arm myself. I have been exposed to issues in open science that I had never encountered before, and have had the privilege of discussing these issues with people from across the world. I will miss the journal clubs the most!”

  • My experience in the fellows program - a reflection - by Katerina Drakoulaki (opens new window)
    “I got into the fellowship just with the hope of getting the opportunity to learn things I didn’t have the opportunity to learn on my own. That is, I did not have specific expectations, I was (and still am) grateful to be in. I feel that all the implicit expectations I might have had are all fulfilled. I got an amazing boost in my digital skills altogether and I know exactly why (no I did not gain a few IQ points). I was in a helpful community and I matured in a way that enabled me to have more of a growth mindset. I also saw other people ‘fail’, as in having their code not working and having to google the solution! I have to say all the readings, the discussions, the tutorials, the Frictionless tools have been amazing, but this shift in my mindset has been the greatest gift the fellowship has given me.”

  • Thank you Fellows! As a bonus, here are the reflections from the first cohort of Fellows: https://blog.okfn.org/2020/06/09/reflecting-on-the-first-cohort-of-frictionless-data-reproducible-research-fellows/ (opens new window)

    - + diff --git a/blog/2021/07/12/open-data-blend/index.html b/blog/2021/07/12/open-data-blend/index.html index 4fee98ee0..1a815bed1 100644 --- a/blog/2021/07/12/open-data-blend/index.html +++ b/blog/2021/07/12/open-data-blend/index.html @@ -38,7 +38,7 @@ - + @@ -127,6 +127,6 @@ # Check the contents of the dataframe df_date

    You can learn more about the opendatablend package here (opens new window).

    To further reduce the time to value and to make the open data insights more accessible, the Open Data Blend Analytics (opens new window) service can be used with business intelligence (BI) tools like Excel, Power BI Desktop, and Tableau Desktop to directly analyse the data over a live connection. Depending on the use case, this can remove the need to work with the data files altogether.


    # Want to Learn More About Open Data Blend?

    You can visit the Open Data Blend website here (opens new window) to learn more about the services. We also have some comprehensive documentation available here (opens new window), and the Frictionless Data specific documentation can be found here (opens new window). If you would like to contribute to the project, you can find out how here (opens new window).

    Follow us on Twitter @opendatablend (opens new window) to get our latest news, feature highlights, thoughts, and tips.

    - + diff --git a/blog/2021/07/21/frictionless-repository/index.html b/blog/2021/07/21/frictionless-repository/index.html index 9ce121d22..93e5610df 100644 --- a/blog/2021/07/21/frictionless-repository/index.html +++ b/blog/2021/07/21/frictionless-repository/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Frictionless Repository

    Are you looking for a way to automate the validation workflows of your datasets? Look no further, Frictionless Repository is here!

    We are very excited to announce that a new tool has been added to the Frictionless Data toolkit: Frictionless Repository. This is a GitHub Action that continuously validates the data in your repository. It helps ensure the quality of your data by quickly reporting any problems with your datasets.

    # How does it work?

    Every time you add or update any tabular data file in your repository, Frictionless Repository runs a validation. Missing header? Data type mismatch? You will get a neat, visual, human-readable validation report straight away, which will show any problems your data may have. The report lets you spot immediately where the error occurred, making it extremely easy to correct it. You can even get a Markdown Badge to display in your repository to show that your data is valid.
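To give a feel for the kinds of problems mentioned above, here is a small standard-library Python sketch that flags blank header cells and type mismatches in a CSV file. It only illustrates the idea: the `find_problems` helper and its "first value decides the column type" heuristic are assumptions made for this example, not how Frictionless Repository is actually implemented.

```python
import csv
import io


def find_problems(csv_text):
    """Report blank header cells and type mismatches in numeric-looking columns.

    A column is treated as numeric when its first non-empty value parses as a
    number; this heuristic is an assumption made for the sketch.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []

    # Header check: every column should have a non-blank label.
    for position, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"blank header in column {position}")

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    # Type check: once a column starts with a number, all of its other
    # non-empty cells are expected to be numbers too.
    for index in range(len(header)):
        cells = [(line, row[index]) for line, row in enumerate(body, start=2)
                 if index < len(row) and row[index] != ""]
        if cells and is_number(cells[0][1]):
            for line_number, cell in cells[1:]:
                if not is_number(cell):
                    problems.append(
                        f"type mismatch at row {line_number}, column {index + 1}"
                    )
    return problems
```

The row numbers count the header as row 1, so a reader can jump straight to the offending line in the file, in the same spirit as the validation report described above.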

    Frictionless Repository only requires a simple installation. It is completely serverless, and it doesn’t rely on any third-party hardware except for the GitHub infrastructure.

    # Let’s go!

    Before you get started, have a look at developer Evgeny Karev’s demo:

     

    We also encourage you to check out the dedicated documentation website (opens new window), to get more detailed information.

    # What do you think?

    If you use Frictionless Repository, please let us know! Frictionless Data is an open source project, therefore we encourage you to give us feedback. Let us know your thoughts, suggestions, or issues by joining us in our community chat on Discord (opens new window) or by opening an issue in the GitHub repo (opens new window).

    - + diff --git a/blog/2021/08/02/apply-fellows/index.html b/blog/2021/08/02/apply-fellows/index.html index 9001a297b..b1671f3b2 100644 --- a/blog/2021/08/02/apply-fellows/index.html +++ b/blog/2021/08/02/apply-fellows/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Apply Now - become a Frictionless Data Reproducible Research Fellow

    The Frictionless Data Reproducible Research Fellows Program (opens new window), supported by the Sloan Foundation, aims to train graduate students, postdoctoral scholars, and early career researchers to become champions for open, reproducible research using Frictionless Data tools and approaches in their field.

    # Apply today to join the Third Cohort of Frictionless Data Fellows!

    Fellows will learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content. In addition to mentorship, we are providing Fellows with stipends of $5,000 to support their work and time during the nine-month long Fellowship. We welcome applications using this form (opens new window) from 4th August until 31st August 2021, with the Fellowship starting in October. We value diversity and encourage applicants from communities that are under-represented in science and technology, people of colour, women, people with disabilities, and LGBTI+ individuals. Questions? Please read the FAQ (opens new window), and feel free to email us (frictionlessdata@okfn.org) if your question is not answered in the FAQ.

    # Frictionless Data for Reproducible Research

    The Fellowship is part of the Frictionless Data for Reproducible Research (opens new window) project at Open Knowledge Foundation (opens new window), and is the third iteration. Frictionless Data aims to reduce the friction often found when working with data, such as when data is poorly structured, incomplete, hard to find, or is archived in difficult to use formats. This project, funded by the Sloan Foundation and the Open Knowledge Foundation, applies our work to data-driven research disciplines, in order to help researchers and the research community resolve data workflow issues. At its core, Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The core specification, the Data Package, is a simple and practical “container” for data and metadata. The Frictionless Data approach aims to address identified needs for improving data-driven research such as generalized, standard metadata formats, interoperable data, and open-source tooling for data validation.

    # Fellowship program

    During the Fellowship, our team will be on hand to work closely with you as you complete the work. We will help you learn Frictionless Data tooling and software, and provide you with resources to help you create workshops and presentations. Also, we will announce Fellows on the project website and will be publishing your blogs and workshops slides within our network channels. We will provide mentorship on how to work on an Open project, and will work with you to achieve your Fellowship goals. You can read more about the first two cohorts of the Programme in the Fellows blog: http://fellows.frictionlessdata.io/blog/ (opens new window).

    # How to apply

    The Fellowship is open to early career researchers, such as graduate students and postdoctoral scholars, anywhere in the world, and in any scientific discipline. Successful applicants will be enthusiastic about reproducible research and open science, have some experience with communications, writing, or giving presentations, and have some technical skills (basic experience with Python, R, or Matlab for example), but do not need to be technically proficient. If you are interested, but do not have all of the qualifications, we still encourage you to apply (opens new window). We welcome applications using this form (opens new window) from 4th August until 31st August 2021.

    If you have any questions, please email the team at frictionlessdata@okfn.org and check out the Fellows FAQ section (opens new window). Apply (opens new window) soon, and share with your networks!

    - + diff --git a/blog/2021/08/06/recap-community-calls/index.html b/blog/2021/08/06/recap-community-calls/index.html index b652ee05a..bde656ccd 100644 --- a/blog/2021/08/06/recap-community-calls/index.html +++ b/blog/2021/08/06/recap-community-calls/index.html @@ -38,7 +38,7 @@ - + @@ -117,6 +117,6 @@

    We are halfway through 2021 (aka 2020 part two), and we thought it would be a good moment to look back at all that has happened in the Frictionless Community over these past 6 months. We’re so grateful for everyone in the community - thanks for your contributions, discussions, and participation! A big part of the community is our monthly call, so in case you’ve missed any of the community calls of 2021, here is a quick recap.

    We started the year with a great presentation by Carles Pina i Estany. Carles is a very active member of our community and also a tool-fund grantee. He presented his tool-fund project: Frictionless schema-collaboration (opens new window). What is that? It’s a system that uses Data Package Creator to enable data managers and researchers to create and share dataset schemas, edit them, post messages and export the schemas in different formats (like text, Markdown or PDF). It is a very useful tool because, previously, researchers had to communicate with data managers via email for each data package they were publishing. Frictionless schema-collaboration makes that communication easier and faster.

    February was a great month: we started improving the documentation of the Frictionless Framework website (opens new window) together with the community, and we had a brilliant code demonstration of the newly-released Frictionless Python Framework by senior developer Evgeny Karev at the monthly community call. How great was that? That particular call broke our attendance record, and it was fantastic to have so many of you there! In case you were not there, we recorded Evgeny’s demo and you can watch it on YouTube.

    March marked one year since the beginning of the Covid-19 pandemic in Europe and the Americas. It seemed fair to dedicate that community call to Covid-19 data, so we had Thorben Westerhuys presenting his project on Frictionless vaccination data. Thorben developed a spatiotemporal tracker for state level covid vaccination data in Germany (opens new window) to solve the problems linked to governments publishing vaccination data not parsed for machines. His vaccination scraper takes that data, reformats it and makes it available to everyone in a structured, more machine readable form.

    At the end of April we had an interactive session with the Frictionless Fellows (opens new window). Daniel Alcalá López, Kate Bowie, Katerina Drakoulaki, Anne Lee, Jacqueline Maasch, Evelyn Night and Samantha Wilairat took some time to tell the community about their journey through Open Science. They also shared with the community some of the things they learnt during their nine-month fellowship and how they plan to integrate them into their work. This cohort of fellows made us very proud; they were a true joy to work with. Keep an eye on them all, they will be leaders in Open Science! And in case you are interested in becoming a Frictionless Fellow, we are currently recruiting the 3rd cohort. More info on the programme and how to apply here (opens new window).

    During the April call we also got a short presentation on instant APIs for small Frictionless Data-powered apps by Oleg Lavrovsky. Oleg is also an active member of our community, you have probably already met him at many of our calls.

    May started gloriously with csv,conf, where we had two talks on Frictionless Data. One was by the Fellows, and the other one was by Simon Tyrrell. On top of the one at csv,conf, Simon gave a presentation together with Xingdong Bian about their Frictionless Data for Wheat project (opens new window) at the monthly call. Simon and Xingdong are researchers at the Earlham Institute, and they are both tool-fund grantees, like Carles. They presented their project to the community and explained how they use Frictionless Data to make their large amount of data available, usable and interoperable for everyone.

    The last call we had was in June, also featuring a tool-fund grantee: Nikhil Vats. Nikhil presented the Frictionless Data Package integration he developed for InterMine (opens new window), an open source biological data warehouse that creates databases of biological data accessed by sophisticated web query tools. Nikhil’s integration makes users’ queries more useful, as it describes all the fields of their query, specifically: name of field, type of field, class path, field and class ontology link.
    In the same call, Michael Amadi announced the release of Open Data Blend, a great project using Frictionless Data. If you find it cool and would like to know more about it, read this case-study (opens new window), but also make sure you don’t miss the October community call, because we will be hearing a presentation on it!

    July’s call was canceled last minute, but it has been rescheduled to August 12th, and it’s going to be extremely interesting! In case you did not sign up yet, please do here (opens new window). We will be hearing from Dave Rowe (aka Libraries Hacked (opens new window)) and how he uses Frictionless Data specs and standards for public libraries open data.
    This first 2021 semester was also great because we completed our website redesign (opens new window) and we added two great tools to the Frictionless Data toolkit: Livemark (opens new window) and Frictionless Repository (opens new window). These tools get better and better every day thanks to the precious contributions of the community. Thanks to you all for making the Frictionless Data project so great. Nothing could have happened without you!

    - + diff --git a/blog/2021/08/09/dryad-pilot/index.html b/blog/2021/08/09/dryad-pilot/index.html index f99abc61b..12f1b8e46 100644 --- a/blog/2021/08/09/dryad-pilot/index.html +++ b/blog/2021/08/09/dryad-pilot/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Frictionless Data and Dryad join forces to validate research data

    What happens to scientific data after it is generated? The answer is complicated - sometimes that data is shared with other researchers, sometimes it is hidden away on a private hard drive. Sharing research data is a key part of open science, the movement to make research more accessible and usable by everyone to drive faster advances in science. A great way to share research data is to upload it to a repository, but simply uploading data is not the final step here. Ideally, the uploaded data will be of high quality - that is, it won’t have errors or missing data, and it will have enough descriptive information that other researchers can also use it! Over the last 6 months, we collaborated with the data repository Dryad to make it easier for researchers to upload their high quality data for sharing.

    Dryad (opens new window) is a community-led data repository that allows researchers to submit data from any field, which not only promotes open science, but also helps researchers comply with open data policies from funders and journals. Because Dryad accepts all kinds of data, they need to curate that data for quality, ensure that it does not present risk, and check that it has metadata comprehensive enough for others to reuse it. We quickly realized our shared goals, and formed a Pilot collaboration to add Frictionless validation functionality to the Dryad data upload page. Both teams agreed how important it is to give researchers immediate feedback about their data as they are submitting it, so they can make edits in that moment and learn about data best practices.

    The outcome of this collaboration is a revamped upload page for the Dryad application. Researchers uploading tabular data (CSV, XLS, XLSX) under 25MB will have the files automatically validated using the Frictionless tool. These checks are based on the built-in validation of Frictionless Framework (read the validation guide here (opens new window)), and include checking for data errors such as blank cells, missing headers, or incorrectly formatted data. The Frictionless report will help guide researchers on which issues should be resolved, allowing researchers to edit and re-upload files before submitting their dataset for curation and publication.

    When a data file is uploaded, researchers can see if the data passed the Tabular Data Checks or if there are any issues. Clicking “View 1 Issues” shows more details describing the error.

    This uploaded data file has a blank header. With this information, the researcher can fix the error and re-upload the data.
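
    The kind of checks described above (blank headers, blank cells, rows of the wrong width) can be approximated with a short sketch. This is not the Dryad integration and not the Frictionless Framework itself: it uses only Python's standard library, and the function name `check_table` and the issue labels are hypothetical, chosen for illustration.

```python
import csv
import io


def check_table(text):
    """Return a list of (row_number, column_number, issue) tuples.

    Illustrative only: the issue labels are hypothetical and are not
    the error codes reported by the Frictionless Framework.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [(0, 0, "empty-file")]
    issues = []
    header = rows[0]
    # Row 1 is the header: every column needs a non-blank label.
    for col, label in enumerate(header, start=1):
        if not label.strip():
            issues.append((1, col, "blank-header"))
    # Data rows: flag rows whose width differs from the header, and blank cells.
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            issues.append((row_number, 0, "wrong-cell-count"))
        for col, cell in enumerate(row, start=1):
            if not cell.strip():
                issues.append((row_number, col, "blank-cell"))
    return issues


# A tiny made-up file with a blank header (column 2) and a blank cell (row 3).
sample = "id,,name\n1,a,Ada\n2,,Grace\n"
report = check_table(sample)
```

    Each entry in `report` points at a row and column, which is the information a researcher needs to fix the file and upload it again. The framework's own `validate` function covers these cases and many more.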

    This work was funded by the Sloan Foundation as part of the Frictionless Data for Reproducible Research project. This project was truly collaborative: most of the technical work was completed by contractor Cassiano Reinert Novais dos Santos with supervision and support from the Dryad team (Daniella Lowenberg, Scott Fisher, Ryan Scherle, and the CDL UX team, Rachael Hu and John Kratz), as well as support from the Frictionless team (Evgeny Karev, Lilly Winfree, and Sara Petti). If you have any feedback on the Dryad upload page, please let us know!

    - + diff --git a/blog/2021/08/16/august-12-call/index.html b/blog/2021/08/16/august-12-call/index.html index e7eb3df88..c5021041e 100644 --- a/blog/2021/08/16/august-12-call/index.html +++ b/blog/2021/08/16/august-12-call/index.html @@ -38,7 +38,7 @@ - + @@ -120,6 +120,6 @@ More info here (opens new window)
    You can apply via this form (opens new window).

    # Join us in 2 weeks!

    Yes, that’s right, August is our lucky month: we have not one but two community calls! Our next meeting will be in just 2 weeks, on August 26th. We will hear a presentation from
    Amber York and Adam Shepherd from BCO-DMO on Frictionless Data Pipelines. You can sign up here: (opens new window)

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/08/31/august-26-community-call/index.html b/blog/2021/08/31/august-26-community-call/index.html index 14933cc7f..ae21013ca 100644 --- a/blog/2021/08/31/august-26-community-call/index.html +++ b/blog/2021/08/31/august-26-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -118,6 +118,6 @@ Together with the Frictionless Data team at Open Knowledge Foundation, BCO-DMO developed Laminar, a web application to create Frictionless Data Package Pipelines. Laminar helps data managers process data efficiently while recording the provenance of their activities to support reproducibility of results

    You can learn more on the project here (opens new window). If you would like to dive deeper and discover all about Frictionless Data Pipelines, you can watch Amber York’s and Adam Shepherd’s presentation:

    # Other agenda items from our hangout

    # Frictionless Hackathon on 7-8 October!

    Join the Frictionless Data community for a two-day virtual event to create new project prototypes based on existing Frictionless open source code. It’s going to be fun!
    We are currently accepting project submissions, so if you have a cool project in mind based on existing Frictionless open source code, this could be an excellent opportunity to prototype it, together with other Frictionless users from all around the world. You can pitch anything: your idea doesn’t need to be complete or fully planned. We can also help you formulate a project if you have an idea but aren’t sure about it. You can also submit ideas for existing projects you need help with!

    Use this form (opens new window) to submit your project.
    Keep an eye on the website (opens new window) for more info.

    # Join us next month!

    Our next meeting will be on September 30th, exceptionally one hour later than usual. We will hear a presentation from Daniella Lowenberg and Cassiano Reinert Novais dos Santos on the Frictionless Data validation implemented for the Dryad application.

    You can sign up here (opens new window).

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/09/30/hackathon-preview/index.html b/blog/2021/09/30/hackathon-preview/index.html index b845113ec..61f05832a 100644 --- a/blog/2021/09/30/hackathon-preview/index.html +++ b/blog/2021/09/30/hackathon-preview/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Why you should join the Frictionless Data Hackathon

    The Frictionless Data Online Hackathon is fast approaching and we just can’t wait for it to start!

    If you are not sure yet whether to participate, bear in mind that it will be a great opportunity to test some of the newest Frictionless tools, like Livemark and Repository, and to play around with Frictionless-py and other new Frictionless code. It will also be a great chance for you to meet other Frictionless users and contributors from all around the world and build a project prototype together.

    Not convinced yet? Go and explore the proposed projects on the Dashboard (opens new window)! You will see, there is a project for every taste, so surely there must be one that sounds right for you!

    Are you a big fan of geodata? In that case you will probably want to join the frictionless-geojson team (opens new window), which is planning to create a frictionless-py plugin to add support for reading, writing and inlining GeoJSON. If you are a devoted CKAN user who would like to see more Frictionless functionality in it, you may decide to join the Data package manager for CKAN project (opens new window).

    In case you read our blog about Livemark (opens new window) and have been intrigued by this new Frictionless tool ever since, your moment has come! You can finally try it out by joining the Citation Context Reports (opens new window) project, the Dataset List (opens new window) project, or the Frictionless Community Insights (opens new window) project. If you are interested in dataset discoverability and linkage, you may want to join the Things not Datasets (opens new window) team.

    Oh, and please let us know in advance if you are a big bug smasher! You will be a coveted participant for all projects and we need to make sure everybody gets a fair share of your skills, including us in our effort to improve the Frictionless Python Framework (opens new window).

    But enough of describing the projects, instead hear about them directly from the people who proposed them:

    Hurry up and register for the hackathon if you haven’t done so yet. Registration is open only until the end of this week via this form (opens new window).

    More information on the Frictionless Data Hackathon is available on the dedicated webpage (opens new window). You can also follow news on the day itself through Twitter (opens new window): #FrictionlessHackathon and #FrictionlessHack2021.

    - + diff --git a/blog/2021/10/06/september-community-call/index.html b/blog/2021/10/06/september-community-call/index.html index 8695b4e25..2d65e616c 100644 --- a/blog/2021/10/06/september-community-call/index.html +++ b/blog/2021/10/06/september-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -118,6 +118,6 @@ Go and explore the dashboard to know more about all the projects we plan to work on.
    For general information, just go to the dedicated page (opens new window).
    We are accepting last minute registrations via this form (opens new window), so hurry up if you want to be on board!

    # Join us next month!

    Our next meeting will be on October 28th. We will hear a presentation from Michael Amadi on Open Data Blend datasets powered by Frictionless Data.

    Ahead of our next call, you can learn more about Open Data Blend here (opens new window)

    You can sign up here: (opens new window)

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/10/13/hackathon-wrap/index.html b/blog/2021/10/13/hackathon-wrap/index.html index c09d46147..d26e6e58b 100644 --- a/blog/2021/10/13/hackathon-wrap/index.html +++ b/blog/2021/10/13/hackathon-wrap/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Wrapping up the Frictionless Hackathon

    The first (of many we hope!) Frictionless Data Hackathon is over, and it was great! Many thanks to all who helped make it such a success the past week.

    The prize for the best project, voted by the participants, went to the DPCKAN team. Well done André, Andrés, Carolina, Daniel, Francisco and Gabriel!
    “I feel pretty happy after this frictionless hackathon experience. We’ve grown in 2 days more than it could have been possible in one month. The knowledge and experience exchange was remarkable,” said the winning team.

    It was also great to see participants who had never taken part in a hackathon before being enthusiastic about it. “I loved the helpfulness of the community members, as well as the diversity of participants.”

    “It was such a great opportunity to network with other people interested in data quality and open data!”

    “It was amazing to see a weightless tool used in development. I want to learn more about it and integrate it into my projects.”

    Over 20 people signed up for the hackathon from Africa, Asia, Europe, South America and North America. We had a very diverse audience and saw a lot of new faces. The event ran from 7th to 8th October on our Discord server. The result of those 2 days of intense collaboration was four great projects:

    # DPCKAN

    The DPCKAN project was proposed by a team working on the data portal of the state of Minas Gerais in Brazil. To ensure quality metadata and automate the publishing process, the team decided to develop a tool that would allow publishing and updating datasets described with Frictionless Standards in a CKAN instance.

    The main objectives for the hackathon were to refine the package update functions and clean up the documentation.

    You can check out the project’s GitHub repository (opens new window) to see the improvements that were made during the hackathon.

    # Frictionless Tutorials

    The main objective of this project was to write new tutorials using the Python Frictionless Framework. The team not only created a tutorial, but also wrote more detailed instructions (opens new window) on how to create new tutorials for future contributors.

    You can have a look at the tutorial written during the hackathon here (opens new window).

    # Covid tracker

    The main objective of this project was to test Livemark, one of the newest Frictionless tools, with real data and provide an example of all its functionalities. Besides the charts and tables, the information is available on an interactive map, which also takes into account the accuracy of the official data.

    You can have a look at the Covid Tracker here (opens new window).

    # Frictionless Community Insight

    The objective of this project, proposed by the Frictionless core team, was to build a Livemark (opens new window) website telling a story about the Frictionless Data community using the data from the community survey we ran in September.

    The main goals for the hackathon were to clean the data from the survey, visualise it and display it as a story on the Livemark website.

    You can have a look at the draft website (opens new window).

    Four other great projects started the hackathon but did not finish it:

    • Dataset List, another Livemark project to list all the datapackages on GitHub
    • Frictionless Geojson, an extension to add GeoJSON read and write support in frictionless-py
    • Improve Frictionless Data Python Framework, a project to get familiar with the codebase
    • Citation Context Reports, a project to create Frictionless data schemas for scholarly citations data

    Interestingly, one of the participants started his own project during the hackathon, building a Discord-Matrix bridge to allow Frictionless users and contributors to join the community Discord chat using an open standard. Even though the Matrix bridge did not take part in the voting, it is still a notable project. If you are interested in knowing more about it you can have a look at this GitHub issue (opens new window).

    On the last day of the hackathon, one hour before the end of the event, the teams pitched their projects. Here’s a recording of the event if you missed it and want to have a look:

    Thanks again to all those who took part in the hackathon and contributed with their time and enthusiasm to make it so great. We can’t wait for the next hack already!

    - + diff --git a/blog/2021/11/03/october-community-call/index.html b/blog/2021/11/03/october-community-call/index.html index 3adbf7c92..9810793cb 100644 --- a/blog/2021/11/03/october-community-call/index.html +++ b/blog/2021/11/03/october-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On our last Frictionless Data community call on October 28th we had Michael Amadi from Nimble Learn giving a presentation on Open Data Blend and their Frictionless Data journey.

    Open Data Blend is a set of open data services that aim to make large and complex UK open data easier to analyse. The Open Data Blend datasets have two interfaces: a UI and an API, both powered by Frictionless Data. The datasets themselves are built on top of three Frictionless Data specifications: data package, data resource and table schema; and they incorporate some Frictionless Data patterns.
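
    To show how the three specifications fit together, here is a minimal sketch of a descriptor built in Python. It is not taken from Open Data Blend: the dataset name, file path and field names are hypothetical. Only the nesting is the point: a Table Schema describes the columns of a Data Resource, and a Data Package groups one or more resources.

```python
import json

# Hypothetical dataset: the names, path and fields are made up for illustration.
table_schema = {
    "fields": [
        {"name": "date", "type": "date"},
        {"name": "region", "type": "string"},
        {"name": "incidents", "type": "integer"},
    ]
}

data_resource = {
    "name": "incidents",
    "path": "data/incidents.csv",
    "schema": table_schema,  # Table Schema: describes the resource's columns
}

data_package = {
    "name": "example-open-data",
    "resources": [data_resource],  # Data Package: groups one or more resources
}

# The descriptor is plain JSON, conventionally stored as datapackage.json.
descriptor = json.dumps(data_package, indent=2)
field_names = [
    field["name"] for field in data_package["resources"][0]["schema"]["fields"]
]
```

    Because the descriptor is plain JSON, both a UI and an API can read the same file to learn what a dataset contains before touching the data itself.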

    The project addresses some of the main open data challenges:

    • Large data volumes that are difficult to manage due to their size
    • Overwhelming complexity in data analysis
    • Open data shared in sub-optimal file formats for data analysis (e.g. PDFs)
    • When companies and organisations aggregate data, refine it and add value to it, they often don’t openly share the cleaned data

    You can learn more on the project here (opens new window). If you would like to dive deeper and discover all about how Open Data Blend uses the Frictionless Data toolkit, you can watch Michael Amadi’s presentation here:

    # Other agenda items from our hangout

    • Senior developer Evgeny Karev presented Livemark at PyData on October 29th. If you missed it and want to have a look, check out the recording here (opens new window) (for Livemark jump to 1:03:03).
    • The third cohort of Frictionless Fellows officially kicked off in mid-October. You will get to meet them next year during one of our community calls. Meanwhile, stay tuned to know more about them!
    • We don’t have any presentation planned for the December community call yet. Would you like to present something? Drop us a line to let us know!

    # Join us next month!

    Next community call is one week earlier than usual (to avoid conflict with American Thanksgiving), on November 18th. We will hear a presentation from Peter Desmet on Frictionless Data exchange format for camera trapping data.

    You can sign up here (opens new window).

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/11/23/november-community-call/index.html b/blog/2021/11/23/november-community-call/index.html index e0ff946ea..6d8c9a39f 100644 --- a/blog/2021/11/23/november-community-call/index.html +++ b/blog/2021/11/23/november-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On our last Frictionless Data community call on November 18th we had Peter Desmet from the Research Institute for Nature and Forest (INBO) giving a presentation on Frictionless Data exchange format for camera trapping data.

    Camera trapping is a non-invasive wildlife monitoring technique that has generated more and more data in the last few years. Darwin Core, a well established standard in the biodiversity field, does not capture the full scope of camera trapping data (e.g. it does not express your camera setup) and is therefore not ideal. To tackle this problem, the camera trap data package was developed, using Frictionless Data standards. The camera trap data package is both a model and a format to exchange camera trapping data, and it is designed to capture all the essential data and metadata of camera trap studies.

    The camera trap data package model includes:

    • Metadata about the project
    • Deployments, with information about the location, the camera and the time
    • Media, including the file URL, the timestamp and whether it is a sequence
    • Observations about the file (Is it blank? What kind of animal can we see? etc…)

    The format is similar to a Frictionless Data data package. It includes: metadata about the project and the data package structure, csv files for the deployments, the media captured in the deployments, and the observations in those media.
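
    The structure described above can be sketched as a data package descriptor with three CSV resources linked by foreign keys. This is a heavily trimmed illustration, not the official camera trap data package definition: the field names and the helper `dangling_references` are assumptions. The `foreignKeys` layout follows the Table Schema specification, and the helper only handles single-field keys.

```python
# Hypothetical, trimmed descriptor: field names are illustrative only.
descriptor = {
    "name": "example-camera-trap-study",
    "resources": [
        {
            "name": "deployments",
            "path": "deployments.csv",
            "schema": {
                "fields": [{"name": "deploymentID", "type": "string"}],
                "primaryKey": "deploymentID",
            },
        },
        {
            "name": "media",
            "path": "media.csv",
            "schema": {
                "fields": [
                    {"name": "mediaID", "type": "string"},
                    {"name": "deploymentID", "type": "string"},
                ],
                "primaryKey": "mediaID",
                "foreignKeys": [
                    {
                        "fields": "deploymentID",
                        "reference": {"resource": "deployments", "fields": "deploymentID"},
                    }
                ],
            },
        },
        {
            "name": "observations",
            "path": "observations.csv",
            "schema": {
                "fields": [
                    {"name": "observationID", "type": "string"},
                    {"name": "mediaID", "type": "string"},
                ],
                "foreignKeys": [
                    {
                        "fields": "mediaID",
                        "reference": {"resource": "media", "fields": "mediaID"},
                    }
                ],
            },
        },
    ],
}


def dangling_references(descriptor, tables):
    """List (resource, row_index, field, value) for foreign keys with no match."""
    problems = []
    for resource in descriptor["resources"]:
        for key in resource["schema"].get("foreignKeys", []):
            field = key["fields"]
            target = key["reference"]
            known = {row[target["fields"]] for row in tables[target["resource"]]}
            for index, row in enumerate(tables[resource["name"]]):
                if row[field] not in known:
                    problems.append((resource["name"], index, field, row[field]))
    return problems


# Made-up rows standing in for the three CSV files.
tables = {
    "deployments": [{"deploymentID": "d1"}],
    "media": [
        {"mediaID": "m1", "deploymentID": "d1"},
        {"mediaID": "m2", "deploymentID": "d9"},  # points at a missing deployment
    ],
    "observations": [{"observationID": "o1", "mediaID": "m1"}],
}
problems = dangling_references(descriptor, tables)
```

    Declaring the links in the descriptor means any consumer of the package can check that every media file belongs to a known deployment, and every observation to a known media file, without project-specific code.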

    If you would like to dive deeper and discover all about the Frictionless Data exchange format for camera trapping data, you can watch Peter Desmet’s presentation here:

    You can also find Peter’s presentation deck here (opens new window).

    # Other agenda items from our hangout

    We are part of the organisation of the FOSDEM DevRoom Open Research Tools & Technologies this year too. We would love to have someone from the Frictionless community giving a talk. If you are interested please let us know! We are very happy to help you structure your idea, if needed. Calls for participation will be issued soon. Keep an eye on this page (opens new window).

    # Join us next month!

    Next community call is one week earlier than usual, on December 16th, because of the Winter holidays. Keith Hughitt is going to present some ideas around representing data processing flows as a DAG inside of a datapackage.json, and tools for interacting with and visualizing such DAGs.

    You can sign up here (opens new window).

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2021/12/07/_3rd-cohort-fellows/index.html b/blog/2021/12/07/_3rd-cohort-fellows/index.html index 0f658ba26..67a0af2c6 100644 --- a/blog/2021/12/07/_3rd-cohort-fellows/index.html +++ b/blog/2021/12/07/_3rd-cohort-fellows/index.html @@ -38,7 +38,7 @@ - + @@ -119,6 +119,6 @@ To know more about Melvin click here (opens new window).

    Hello! My name is Kevin Kidambasi (KK). I was born and raised in Vihiga County of western Kenya. Currently, I live in Nairobi, the capital city of Kenya. I am a master’s student at Jomo Kenyatta University of Agriculture and Technology (JKUAT), registered at the department of Biochemistry. My MSc research at the International Centre of Insect Physiology and Ecology (icipe) focuses on the role of haematophagous camel-specific biting keds (Hippobosca camelina) in disease transmission in Laisamis, Marsabit County of northern Kenya. My broad research interest focuses on studying host-pathogen interactions to understand infection mechanisms of diseases in order to discover novel control and treatment targets.

    I am interested in improving research reproducibility because it allows other researchers to confirm the accuracy of my data and correct any bias, as well as validate the relevance of the conclusions drawn from the results. This also allows data to be analyzed in different ways and thus give new insights and lead the research in new directions. In addition, improving research reproducibility would allow the scientific community to understand how the conclusions of a study were made and pinpoint any mistakes in data analyses. In general, research reproducibility enhances openness, research collaboration, and data accessibility, which in turn increases public trust in science and hence permits the public's participation in and support for research. This enables public understanding of how research is conducted and its importance.
    Read more about Kevin here (opens new window).

    Greetings! My name is Lindsay Gypin, she/her. I grew up in Denver, Colorado and began my career as a K-12 educator. I taught high school English and worked as a school librarian before becoming disillusioned with the politicization of public education and determining my skills were better suited for work in public libraries. Attending library school after having worked in libraries for so many years, I found myself drawn to courses in the research data management track of librarianship, and in qualitative research methods. I recently became a Data Services Librarian at the University of North Carolina Greensboro, where I hope to assist scholars in making their research data more open and accessible.

    For some time, I have wanted to build a reproducible workflow to uncover systemic bias in library catalogs. I’m hoping the Fellows Programme will help me build the foundation to do so.
    To learn more about Lindsay click here (opens new window).

    - + diff --git a/blog/2021/12/17/december-community-call/index.html b/blog/2021/12/17/december-community-call/index.html index b25a3f0df..6ceac2baf 100644 --- a/blog/2021/12/17/december-community-call/index.html +++ b/blog/2021/12/17/december-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On the last Frictionless Data community call of the year, on December 16th, we had Keith Hughitt from the National Cancer Institute (NCI) sharing (and demoing) his ideas around representing data processing flows as a DAG (Directed Acyclic Graph) inside of a datapackage.json, and tools for interacting with and visualizing such DAGs.

    Keith started thinking about this when he realised that cleaning and processing data are not obvious processes. On the contrary, there is a lot of bias in them. The decisions made to clean the raw data are not generally included in the publications and are not made available in any transparent way. To allow collaboration and reproducibility, Keith thought of embedding an annotated data provenance DAG in a datapackage.json using the Frictionless specs.

    The basic process Keith has in mind to solve this problem is:

    • The data provenance is encoded as a DAG in the metadata
    • For each step in the processing workflow, the previous DAG is copied and extended
    • Each node of the DAG represents a dataset at a particular stage of processing, and it can be associated with annotations and views
    • Datapackages are generated and associated with each node
    • A web UI reads the metadata and renders the DAG.
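
    The copy-and-extend step in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not Keith's implementation: the `provenance` property, its `nodes` and `edges` layout, and both function names are hypothetical and are not part of the Data Package specification.

```python
import copy


def extend_provenance(descriptor, node_id, parents, annotation):
    """Return a copy of the descriptor whose provenance DAG has one more node.

    "provenance" is a hypothetical custom property used for illustration.
    New nodes may only point back to existing nodes, so no cycle can form.
    """
    new = copy.deepcopy(descriptor)
    dag = new.setdefault("provenance", {"nodes": {}, "edges": []})
    if node_id in dag["nodes"]:
        raise ValueError(f"node {node_id!r} already exists")
    for parent in parents:
        if parent not in dag["nodes"]:
            raise ValueError(f"unknown parent {parent!r}")
        dag["edges"].append([parent, node_id])
    dag["nodes"][node_id] = {"annotation": annotation}
    return new


def processing_order(descriptor):
    """Return node ids so that every dataset comes after its inputs."""
    dag = descriptor["provenance"]
    remaining = {node: 0 for node in dag["nodes"]}
    for _, child in dag["edges"]:
        remaining[child] += 1
    ready = sorted(node for node, count in remaining.items() if count == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for parent, child in dag["edges"]:
            if parent == node:
                remaining[child] -= 1
                if remaining[child] == 0:
                    ready.append(child)
    return order


# Each processing step copies the previous descriptor and adds one node.
raw = extend_provenance({"name": "example"}, "raw", [], "counts as downloaded")
filtered = extend_provenance(raw, "filtered", ["raw"], "dropped low-count rows")
normalized = extend_provenance(filtered, "normalized", ["filtered"], "log-transformed")
order = processing_order(normalized)
```

    Because each step returns a new descriptor, the data package written at any stage carries the full history that led to it, which is what a web UI would need in order to render the graph.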

    If you would like to dive deeper and discover all about representing data processing flows as DAG inside of a Data Package, you can watch Keith Hughitt’s presentation here:

    If you find this idea interesting, come and talk to Keith on Discord (opens new window)! He would love to hear what you think and if you have other ideas in mind.

    # Other agenda items from our hangout

    We are part of the organisation of the FOSDEM (opens new window) Thematic Track Open Research Tools & Technologies this year too. We would love to have someone from the Frictionless community giving a talk. The deadline has been extended and you have time until December 23rd to submit a talk proposal! More info at this page (opens new window).

    # Join us next month!

    Next community call is next year, on January 21st. Francisco Alves, from the DPCKAN team who won the Frictionless Data hackathon back in October, is going to present their prototype and how it evolved.

    You can sign up here: (opens new window)

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2022/01/12/frictionless-dp-for-nih-cfde-project/index.html b/blog/2022/01/12/frictionless-dp-for-nih-cfde-project/index.html index ef5263489..786a821db 100644 --- a/blog/2022/01/12/frictionless-dp-for-nih-cfde-project/index.html +++ b/blog/2022/01/12/frictionless-dp-for-nih-cfde-project/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@

    Scientific work produces a wealth of data every year - ranging from electrical signals in neurons to maze-running in mice to hospital readmission counts in patients. Taken as a whole, this data could be queried to discover new connections that could lead to new breakthroughs – how does that increased neuronal activity lead to better memory performance in a mouse, and does that relate to improved Alzheimer’s outcomes in humans? The data is there, but it is often difficult to find and mobilize.

    A main reason that this data is under-utilized is that datasets are often created in fragmented, domain-specific, or proprietary formats that aren’t easily used by others. The Frictionless Data team has been working with Dr. Philippe Rocca-Serra on some of these key challenges: increasing data set discoverability and highlighting how disparate data can be combined. Establishing a dataset catalogue, or index, represents a solution for helping scientists discover data. But this requires some level of data standardization from different sources. To accomplish this, Dr. Rocca-Serra with the NIH Common Fund Data Ecosystem (NIH CFDE) opted for the Frictionless Data for Reproducible Research Project at the Open Knowledge Foundation (OKF).

    The NIH Common Fund Data Ecosystem (opens new window) project launched in 2019 with the aim of providing a data discovery portal in the form of a single venue where all data coordinating centers (DCC) funded by the NIH would index their experimental metadata. Therefore, the NIH-CFDE (opens new window) is meant to be a data catalogue (Figure 1), allowing users to search the entire set of NIH funded programs from one single data aggregating site. Achieving this goal is no mean feat, requiring striking a balance between functional simplicity and useful detail. Data extraction from individual coordinating centers (for example LINCS DCC) into the selected format should be as straightforward as possible yet the underlying object model needs to be rich enough to allow meaningful structuring of the information.

    Figure 1

    Figure 1 shows the landing page of the NIH-CFDE data portal, which by default welcomes visitors with a histogram detailing the dataset distribution based on data types and file counts. These settings may be changed to show sample counts, species or anatomical location, for instance.
    url: https://www.nih-cfde.org/ (opens new window)

    Furthermore, it is highly desirable to ensure that structural and content validation is performed prior to upload, so only valid submissions are sent to the Deriva-based NIH CFDE catalogue. How could the team achieve these goals while keeping the agility and flexibility required to allow for iterations to occur, adjustments to be made, and integration of user feedback to be included without major overhauls?

    Owing to the nature of the defined backend, the Deriva System, and the overall consistency of data stored by most DCCs, an object model was built around key objects, connected together via linked tables, very much following the RDBMS / OLAP cubes paradigm (opens new window).

    With this as a background, the choice of using the OKF Frictionless Data Package framework (opens new window) came to the fore. The Frictionless specifications are straightforward to understand and are supported by libraries available in different languages, allowing creation, I/O operations and validation of object models as well as instance data.

    Frictionless specifications offer several features which assist several aspects of data interoperation and reuse. The tabular data is always shipped with a JSON-formatted definition of the field headers. Each field is typed to a data type but can also be marked up with an RDF type (the rdfType property). Terminology harmonization relies on 4 resources: NCBI Taxonomy for species descriptions, UBERON for anatomical terms, OBI for experimental methods, and EDAM for data types and file formats. Regular expressions can be specified by the data model for input validation, and last but not least, the declaration of missing information can be made explicit and specific. The CFDE CrossCut Metadata Model (C2M2) relies on Frictionless specifications to define the objects and their relations (Figure 2).
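
    A small sketch may help make these features concrete. The schema below is hypothetical and is not taken from C2M2: the field names, the pattern and the rdfType URI are made up. It uses Table Schema properties (`type`, `rdfType`, `constraints.pattern`, `missingValues`), while the `check_row` helper is a simplified stand-in for a real validator, written with the standard library only.

```python
import re

# Hypothetical table: field names, pattern and rdfType URI are illustrative.
schema = {
    "fields": [
        {
            "name": "sample_id",
            "type": "string",
            "constraints": {"required": True, "pattern": "S[0-9]{4}"},
        },
        {
            "name": "species",
            "type": "string",
            "rdfType": "http://example.org/vocab/Species",
        },
        {"name": "age_weeks", "type": "integer"},
    ],
    # Strings that should be read as "no value" rather than as data.
    "missingValues": ["", "NA"],
}


def check_row(row, schema):
    """Return a list of error messages for one row of string values."""
    errors = []
    missing = set(schema.get("missingValues", [""]))
    for field in schema["fields"]:
        name = field["name"]
        value = row.get(name, "")
        constraints = field.get("constraints", {})
        if value in missing:
            if constraints.get("required"):
                errors.append(f"{name}: required value is missing")
            continue
        pattern = constraints.get("pattern")
        # The pattern must match the whole value, not just a part of it.
        if pattern and not re.fullmatch(pattern, value):
            errors.append(f"{name}: {value!r} does not match {pattern}")
        if field["type"] == "integer":
            try:
                int(value)
            except ValueError:
                errors.append(f"{name}: {value!r} is not an integer")
    return errors


good = check_row({"sample_id": "S0042", "species": "Mus musculus", "age_weeks": "NA"}, schema)
bad = check_row({"sample_id": "42", "species": "", "age_weeks": "ten"}, schema)
```

    Here `NA` in `age_weeks` is accepted as an explicit missing value, while `42` fails the identifier pattern and `ten` fails the integer type. The `rdfType` annotation plays no part in validation; it tells consumers which concept the column refers to.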

    Figure 2

    Figure 2 shows the latest version of the NIH CFDE data model, where the central objects that enable data discovery are identified: study, biomaterial, biosample and file, each coming with a tight, essential set of attributes, some of which are associated with controlled vocabularies. url: https://docs.nih-cfde.org/en/latest/c2m2/draft-C2M2_specification/ (opens new window)
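
    To make the features listed above more concrete, here is a minimal sketch of a Table Schema style descriptor with a typed field, an rdfType annotation, a regular expression constraint and explicit missing values. The field names, the pattern and the rdfType URI are assumptions made up for illustration and are not taken from the C2M2 specification. The small checker is plain Python and is not the Frictionless library itself.

```python
import json
import re

# Hypothetical Table Schema descriptor; the field names and the rdfType URI
# are illustrative and are not taken from the C2M2 specification.
schema = {
    "fields": [
        {
            "name": "biosample_id",
            "type": "string",
            "constraints": {"required": True, "pattern": "[A-Z]{3}[0-9]{4}"},
        },
        {
            "name": "species",
            "type": "string",
            "rdfType": "http://example.org/terms/Species",
        },
        {"name": "age_weeks", "type": "integer"},
    ],
    # Strings that should be read as "no value" rather than as data.
    "missingValues": ["", "NA", "not collected"],
}


def check_row(row, schema):
    """Return a list of human-readable problems found in one row (a dict)."""
    problems = []
    missing = set(schema.get("missingValues", [""]))
    for field in schema["fields"]:
        name = field["name"]
        value = row.get(name, "")
        constraints = field.get("constraints", {})
        if value in missing:
            if constraints.get("required"):
                problems.append(f"{name}: value is required")
            continue
        pattern = constraints.get("pattern")
        # The pattern has to match the whole value, not just a part of it.
        if pattern and not re.fullmatch(pattern, value):
            problems.append(f"{name}: {value!r} does not match {pattern}")
        if field["type"] == "integer":
            try:
                int(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not an integer")
    return problems


good = {"biosample_id": "ABC1234", "species": "Homo sapiens", "age_weeks": "NA"}
bad = {"biosample_id": "abc-1", "species": "Mus musculus", "age_weeks": "ten"}

print(json.dumps(schema["fields"][0], indent=2))
print(check_row(good, schema))
print(check_row(bad, schema))
```

    In this sketch, "NA" in an optional field is accepted as declared missing information, while the same value in a required field is reported as a problem.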

    Researchers can submit their metadata to the portal via the Datapackage Submission System (opens new window) (Figure 3). By incorporating Frictionless specifications to produce a common metadata model and applying a thin layer of semantic harmonization on core biological objects, we are closer to the goal of making available an aggregated data index that increases visibility, reusability and clarity of access to a wealth of experimental data. The NIH CFDE data portal currently indexes over 2 million data files, mainly from RNA-Seq and imaging experiments from 9 major NIH programs: a treasure trove for data miners.

    Figure 3

    Figure 3 shows the architecture of the software components supporting the overall operation, from ETL from the individual DCC into the NIH CFDE data model to the validation and upload component.
    url: https://docs.nih-cfde.org/en/latest/cfde-submit/docs/ (opens new window)

    - + diff --git a/blog/2022/01/18/frictionless-planet/index.html b/blog/2022/01/18/frictionless-planet/index.html index 5c344e586..b08923ddc 100644 --- a/blog/2022/01/18/frictionless-planet/index.html +++ b/blog/2022/01/18/frictionless-planet/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Frictionless Planet – Save the Date

    Originally published: https://blog.okfn.org/2022/01/10/frictionless-planet-save-the-date/ (opens new window)

    We believe that an ecosystem of organisations combining tools, techniques and strategies to transform datasets relevant to the climate crisis into applied knowledge and actionable campaigns can get us closer to the Paris agreement goals. Today, scientists, academics and activists are working against the clock to save us from the greatest catastrophe of our times. But they are doing so under-resourced, siloed and disconnected. Sometimes even facing physical threats or achieving very local, isolated impact. We want to reverse that by activating a cross-sectoral sharing process of tools, techniques and technologies to open the data and unleash the power of knowledge to fight against climate change. We already started with the Frictionless Data process – collaborating with researcher groups to better manage ocean research data (opens new window) and openly publish cleaned, integrated energy data (opens new window) – and we want to expand an action-oriented alliance leading to cross regional, cross sectoral, sustainable collaboration. We need to use the best tools and the best minds of our times to fight the problems of our times.

    We consider you and your organisation to be leading thinkers, doers and communicators who leverage technology and creativity in a unique way, with the potential to bring about meaningful change. We would love to invite you to an initial brainstorming session as we think of common efforts, a sustainability path and a road of action for the next three years and beyond.

    What will we do together during this brainstorming session? Our overarching goal is to make open climate data more useful. To that end, during this initial session, we will conceptualise ways of cleaning and standardising open climate data, creating more reproducible and efficient methods of consuming and analysing that data, and focus on ways to put this data into the hands of those that can truly drive change.

    # WHAT TO BRING?

    • An effort-idea that is effective and you feel proud of at the intersection of digital and climate change.
    • A data problem you are struggling with.
    • Your best post-holidays smile.

    # When?

    13:30 GMT – 20 January – Registration open here (opens new window). SOLD OUT

    20:30 GMT – 21 January – Registration open here (opens new window).

    Limited slots, 25 attendees per session.

    - + diff --git a/blog/2022/02/02/january-community-call/index.html b/blog/2022/02/02/january-community-call/index.html index 252e82672..be79a1c56 100644 --- a/blog/2022/02/02/january-community-call/index.html +++ b/blog/2022/02/02/january-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -117,6 +117,6 @@

    On January 27th, for the first Frictionless Data community call of the year, we heard a presentation on the Data Package Manager for CKAN (DPCKAN) from Francisco Alves - leader of the proactive transparency policy in the Brazilian State of Minas Gerais.

    You may remember Francisco and DPCKAN from the Frictionless Data Hackathon (opens new window) back in October 2021, where his team won the hack with this very project.

    # So what is DPCKAN?

    It all started with the will to publish all the raw data on the Fiscal Transparency portal of the State of Minas Gerais, which is built on a CKAN (opens new window) instance, as open data following the Frictionless standards.

    Francisco and his team wanted to install a data package, and be able to work with it locally. They also wanted to have the ability to partially update a dataset already uploaded in CKAN without overwriting it (this particular feature was developed during the Hackathon). That’s how the Data Package Manager was born. It is now in active development.

    # And what’s next?

    Francisco and his team would like to:

    • Make it possible to read a data package directly from CKAN,
    • Make CKAN Datastore respect the Frictionless table schema types
    • Have human readable metadata visualisation
    • Contribute back upstream to Frictionless Data, CKAN, etc.

    Francisco also gave a quick demo of what DPCKAN looks like. You can watch the full presentation (including the demo):

    If you are interested in DPCKAN, come and talk to Francisco on Discord (opens new window)! You can also check out the presentation slides in this GitHub repository (opens new window).

    # Other agenda items from our hangout

    This year as well, we are helping organise the FOSDEM (opens new window) Thematic Track Open Research Tools & Technologies.
    Join us on February 5th! Among the many interesting talks, you will have the opportunity to catch senior developer Evgeny Karev presenting the newest Frictionless tool: Livemark (opens new window).
    Have a look at the programme (opens new window). The event is free of charge and there is no need to register. You can just log in to the talks that you like.

    # Join us next month!

    The next community call is on February 24th. We don’t have a presentation scheduled yet, so if you have a project that you would like to present to the community, this could be your chance! Email us if you have something in mind: sara.petti@okfn.org.

    You can sign up for the call already here: (opens new window)

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2022/02/07/libraries-hacked/index.html b/blog/2022/02/07/libraries-hacked/index.html index eb8a1700f..20e5974e5 100644 --- a/blog/2022/02/07/libraries-hacked/index.html +++ b/blog/2022/02/07/libraries-hacked/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Libraries Hacked

    I started the Libraries Hacked (opens new window) project in 2014. Inspired by ‘tech for good’ open data groups and hackathons, I wanted to explore how libraries could leverage data for innovation and service improvement. I had already been involved in the work of the group Bath Hacked (opens new window), and worked at the local Council in Bath, releasing large amounts of open data that was well used by the community. That included data such as live car park occupancy, traffic surveys, and air quality monitoring.

    Getting involved in civic data publishing led me to explore data software, tools, and standards. I’ve used the Frictionless standards of Table Schema and CSV Dialect, as well as the code libraries that can be utilised to implement these. Data standards are an essential tool for data publishers in order to make data easily usable and reproducible across different organisations.

    Public library services in England are managed by 150 local government organisations. The central government department for Digital, Culture, Media, and Sport (DCMS) hold responsibility for superintending those services. In September 2019 they convened a meeting about public library data.

    Library data, of many kinds, is not well utilised in England.

    • Lack of public data. There are relatively few library services sharing data about themselves for public use.
    • Low expectations. There is no guidance on what data to share. Some services will publish certain datasets, but these will likely be different to the ones others publish.
    • Few standards. The structure of any published data will be unique to each library service. For example, there are published lists of library branches from Nottinghamshire County Council (opens new window) and North Somerset Council (opens new window). Both are out of date, and have different fields, field names, field types, and file formats.

    The meeting discussed these issues, amongst others. The problems are understood, but difficult to tackle, as no organisation has direct responsibility for library data. There are also difficult underlying causes - low skills and funding being two major ones.

    Large scale culture change will take many years. But to begin some sector-led collaborative work, a group of the attendees agreed to define the fields for a core selection of library datasets. The project would involve data practitioners from across English library services.

    The datasets would cover:

    • Events: the events that happen in libraries, their attendance, and outcomes
    • Library branches: physical building locations, opening hours, and contact details
    • Loans: the items lent from libraries, with counts, time periods, and categories
    • Stock: the number of items held in libraries, with categories
    • Mobile library stops: locations of mobile library stops, and their timetabled frequency
    • Physical visits: how many people visit library premises
    • Membership: counts of people who are library members, at small-area geographies.

    These can be split into 3 categories:

    • Registers. Data that should be updated when it changes. A list of library branches is a permanent register, to be updated when there are changes to those branches.
    • Snapshot. Data that is released as a point in time representation. Library membership will be continually changing, but a snapshot of membership counts should be released at regular intervals.
    • Time-series. Data that is new every time it is published. Loans data should be published at regular intervals, each published file being an addition to the existing set.

    To work on these, we held an in-person workshop at the DCMS offices. This featured an exciting interruption by a fire drill, and we had to relocate to a nearby café (difficult for a meeting with many people held in London!). We also formed an online group using Slack to trial and discuss the data.

    # Schemas and Frictionless Data

    The majority of our discussions were practical rather than technical, such as what data would be most useful, whether or not it was currently used locally by services, and common problems.

    However, to formalise how data should be structured, it became clear that it would be necessary to create technical ‘data schemas’.

    It can be easy to decide on the data you want, but fail to describe it properly. For example, we could provide people with a spreadsheet that included a column title such as ‘Closed date’. I’d expect people to enter a date in that column, but we’d end up with all kinds of formats.

    The Table Schema (opens new window) specification for defining data, from Frictionless Data, provided a good option for tackling this problem. Not only would it allow us to create a detailed description for the data fields, but we could use other Frictionless tools such as Good Tables (opens new window). This would allow library services to validate their data before publishing. Things like mismatching date formats would be picked up by the validator, and it would give instructions for how to fix the issue. We would also provide ‘human-readable’ guidance on the datasets.
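
    The ‘Closed date’ problem can be sketched in a few lines of Python. This is an illustration only: the field descriptor is written in the style of Table Schema, but its name and format are assumptions rather than copies of the published library schemas. The check uses the standard library instead of Good Tables.

```python
from datetime import datetime

# Hypothetical field descriptor in the style of Table Schema; the field name
# and format are illustrative, not copied from the published library schemas.
closed_date_field = {
    "name": "Closed date",
    "type": "date",
    "format": "%Y-%m-%d",
}


def validate_dates(values, field):
    """Return (row_number, value, message) for each value in the wrong format."""
    errors = []
    for row_number, value in enumerate(values, start=2):  # row 1 is the header
        if value == "":
            continue  # an empty cell is treated here as "no closed date"
        try:
            datetime.strptime(value, field["format"])
        except ValueError:
            errors.append(
                (row_number, value, f"expected a date like {field['format']}")
            )
    return errors


column = ["2019-03-31", "31/03/2019", "", "March 2019"]
for error in validate_dates(column, closed_date_field):
    print(error)
```

    The two values that do not follow the declared format are reported with their row numbers, which is the kind of feedback a publisher needs in order to fix a file before releasing it.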

    Frictionless Data is an Open Knowledge Foundation (opens new window) project, and using tools from an internationally renowned body was also a good practice. The schemas are UK-centric but could be adapted and reused by international library services.

    The schemas are all documented at Public Library Open Data (opens new window), including guidance, links to sample data, and the technical definition files.

    # Lessons learned

    The initial datasets are not comprehensive. They are designed to be a starting point, allowing more to be developed from service requirements.

    They are overly focussed towards ‘physical’ library services. It wasn’t long after these meetings that public libraries adjusted to provide all-digital services due to lockdowns. There is nothing here to cover valuable usage datasets like the video views that library services receive on YouTube and Facebook.

    There are some that have become even more important. The physical visits schema describes how to structure library footfall data, allowing for differences in collection methods and intervals. This kind of data is now in high demand, to analyse how library service visits recover.

    Some of the discussions we had were fascinating. It was important to involve the people who work with this data on a daily basis. They will know how easy it is to extract and manipulate, and many of the pitfalls that come with interpreting it.

    # Complexity

    There was often a battle between complexity and simplicity. Complex data is good, it often means it is more robust, such as using external identifiers. But simplicity is also good, for data publishers and consumers.

    Public library services will primarily employ data workers who are not formally trained in using data. Where there are complex concepts (e.g. Table Schema itself), they are used because they make data publishing easier and more consistent.

    Public data should also be made as accessible as possible for the public, while being detailed enough to be useful. In this way the data schemas tend towards simplicity.

    # Standards not standardisation

    There is a difference between a standard format for data, and standardised data. The schemas are primarily aimed at getting data from multiple services into the same format, to share analysis techniques between library services, and to have usable data when merged with other services.

    There were some cases where we decided against standardising the actual data within data fields. For example, there is a column in the loans and the stock datasets called ‘Item type’. This is a category description of the library item, such as ‘Adult fiction’. In some other previous examples of data collection this data is standardised into a uniform set of categories, in order to make it easily comparable.

    That kind of exercise defies reality though. Library services may have their own set of categories, many of them interesting and unique. To use a standard set would mean that library services would have to convert their underlying data. As well as extra work, it would be a loss of data. It would also mean that library services would be unlikely to use the converted data themselves. Why use such data if it doesn’t reflect what you actually hold?

    The downside is that anyone analysing combined data would have to decide themselves how to compare data in those fields. However, that would be at least a clear task for the data analyst - and would most likely be an easier exercise to do in bulk.

    # Detail

    In my ideal world, data would be as detailed as possible. Instead of knowing how many items a library lent every month, I want that data for every hour. In fact I want to have every lending record! But feasibly that would make the data unwieldy and difficult to work with, and wouldn’t be in-line with the statistics libraries are used to.

    We primarily made decisions based upon what library services already do. In a lot of cases this was data aggregated into monthly counts, with fields such as library branch and item type used to break down that data.
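
    As a rough illustration of that level of detail, the sketch below turns individual lending records into monthly counts broken down by library branch and item type. The records and column choices are invented for the example and do not come from any library service.

```python
from collections import Counter

# Hypothetical individual loan records: (date, library branch, item type).
loans = [
    ("2021-01-04", "Central", "Adult fiction"),
    ("2021-01-19", "Central", "Adult fiction"),
    ("2021-01-20", "Central", "Children's non-fiction"),
    ("2021-02-02", "Northside", "Adult fiction"),
]


def monthly_counts(records):
    """Count loans per (month, branch, item type)."""
    counts = Counter()
    for date, branch, item_type in records:
        month = date[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        counts[(month, branch, item_type)] += 1
    return counts


for (month, branch, item_type), count in sorted(monthly_counts(loans).items()):
    print(month, branch, item_type, count)
```

    Item types are kept exactly as the service records them, in line with the decision not to standardise the values inside that field.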

    # The future

    The initial meetings were held over two years ago, and it seems longer than that! A lot has happened in the meantime. We are still in a global pandemic that from library perspectives has de-prioritised anything other than core services.

    However, there are good examples of the data in action. Barnet libraries publish 5 out of the 7 data schemas (opens new window) on a regular basis.

    I have also been creating tools that highlight how the data can be used such as Library map (opens new window) and Mobile libraries (opens new window).

    There is national work underway that can make use of these schemas. The British Library is working on a Single Digital Presence (opens new window) project that will require data from library services in a standard form.

    Internationally there are calls for more public library open data. The International Federation of Library Associations and Institutions (IFLA) has released a statement on Open Library Data (opens new window) calling for “governments to ensure, either directly or through supporting others, the collection and open publication of data about libraries and their use”. It would be great to work with organisations like IFLA to promote schemas that could be reused internationally as well as for local services. There could also be the opportunity to use other Frictionless Data tools to aid in publishing data, such as DataHub (opens new window).

    Hopefully in the future there can be workshops, training events, and conferences that allow these data schemas to be discussed and further developed.

    - + diff --git a/blog/2022/02/10/nasa-earth-mission-science/index.html b/blog/2022/02/10/nasa-earth-mission-science/index.html index 2fe604b19..bb1b590c7 100644 --- a/blog/2022/02/10/nasa-earth-mission-science/index.html +++ b/blog/2022/02/10/nasa-earth-mission-science/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Frictionless response to NASA Earth mission science data processing

    We are very excited to announce that we responded to a request for information (opens new window) that was recently published by NASA for its Earth System Observatory (ESO) (opens new window).

    What is ESO? It is a set of (mainly satellite) missions providing information on planet Earth, which can guide efforts related to climate change, natural hazard mitigation, fighting forest fires, and improving real-time agricultural processes.

    With this request for information, ESO wants to gather expert advice on ways to find a more integrated approach to enhance data architecture efficiency and promote the open science principles.

    We believe Frictionless Data would benefit the mission science data processing in several ways. Here’s how:

    First, Frictionless automatically infers metadata and schemas from a data file, and allows users to edit that information. Creating good metadata is vital for downstream data users – if you can’t understand the data, you can’t use it (or can’t easily use it). Similarly, having a data schema is useful for interoperability, promoting the usefulness of datasets.
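
    The idea of inferring a schema can be illustrated with a small, simplified sketch. It is not how Frictionless itself performs inference (the Frictionless tooling offers this through its describe functionality); it only shows the principle of guessing a type per column from the data. The sample data and column names are made up.

```python
import csv
import io


def infer_type(values):
    """Guess the narrowest type that fits every non-empty value in a column."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for v in present:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Build a minimal Table Schema style descriptor from CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(header):
        column = [row[index] for row in body]
        fields.append({"name": name, "type": infer_type(column)})
    return {"fields": fields}


sample = "station,temperature,readings\nA1,12.5,3\nB2,,4\n"
print(infer_schema(sample))
```

    The inferred descriptor is only a starting point. As the paragraph above says, the user would then edit it, for example to add descriptions or to correct a type that was guessed wrongly.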

    The second Frictionless function we think will be helpful is data validation. Frictionless validates both the structure and content of a dataset, using built-in and custom checks. For instance, Frictionless will check for missing values, incorrect data types, or other constraints (e.g. temperature data points that exceed a certain threshold). If any errors are detected, Frictionless will generate a report for the user detailing the error so the user can fix the data during processing.
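
    A hedged sketch of what such checks and the resulting report might look like is given below. The threshold, the column name and the error labels are assumptions for illustration; the code is plain Python and does not use the Frictionless validation API.

```python
# Hypothetical checks for a column of temperature readings in degrees Celsius.
MAX_TEMPERATURE = 60.0


def validate_rows(rows):
    """Return a report: one dict per problem, with row number and error type."""
    report = []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        raw = row.get("temperature", "")
        if raw == "":
            report.append({"row": row_number, "error": "missing-value"})
            continue
        try:
            value = float(raw)
        except ValueError:
            report.append({"row": row_number, "error": "type-error", "value": raw})
            continue
        if value > MAX_TEMPERATURE:
            report.append(
                {"row": row_number, "error": "constraint-error", "value": raw}
            )
    return report


rows = [
    {"station": "A1", "temperature": "12.5"},
    {"station": "B2", "temperature": ""},
    {"station": "C3", "temperature": "warm"},
    {"station": "D4", "temperature": "512"},
]
for problem in validate_rows(rows):
    print(problem)
```

    Each entry in the report points at a row and names the kind of error, so the data can be fixed during processing rather than discovered downstream.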

    Finally, users can write reproducible data transformation pipelines with Frictionless. Writing declarative transform pipelines allows humans and machines to understand the data cleaning steps and repeat those processes if needed in the future. Collectively, these functions create well documented, high quality, clean data that can then be used in further downstream analysis.
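
    The following sketch shows what "declarative" means in practice: the pipeline is described as data (a list of steps) and a small runner applies it. The step names and the runner are hypothetical and are not the Frictionless transform API.

```python
# A pipeline described as data: each step names an operation and its options.
# The step names here are hypothetical and are not the Frictionless step API.
pipeline = [
    {"step": "rename-field", "old": "temp", "new": "temperature"},
    {"step": "drop-rows-where-empty", "field": "temperature"},
    {"step": "cast-number", "field": "temperature"},
]


def run_pipeline(rows, steps):
    """Apply each declared step in order and return the new list of rows."""
    for step in steps:
        kind = step["step"]
        if kind == "rename-field":
            rows = [
                {(step["new"] if k == step["old"] else k): v for k, v in r.items()}
                for r in rows
            ]
        elif kind == "drop-rows-where-empty":
            rows = [r for r in rows if r[step["field"]] != ""]
        elif kind == "cast-number":
            rows = [{**r, step["field"]: float(r[step["field"]])} for r in rows]
        else:
            raise ValueError(f"unknown step: {kind}")
    return rows


raw = [{"site": "A1", "temp": "12.5"}, {"site": "B2", "temp": ""}]
print(run_pipeline(raw, pipeline))
```

    Because the list of steps is ordinary data, it can be saved next to the dataset as a record of the cleaning that was done and re-run later.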

    We provided them with two examples of relevant collaboration:

    # Use Case 1

    The Biological and Chemical Oceanography Data Management Office (BCO-DMO) (opens new window) cleans and hosts a wide variety of open oceanography data sets for use by researchers. A main problem for them was that the data being submitted was messy and not standardized, and it was time consuming and difficult for their data managers to clean it in a reproducible, documented way. They implemented Frictionless code to create a new data transformation pipeline that ingests the messy data, performs defined cleaning/transforming steps, documents those steps, and produces a cleaned, standardized dataset. It also produces a (human and machine-readable) document detailing all the transformation steps so that downstream users can understand what happened to the data and undo/repeat if necessary. This process not only helps data managers clean data faster and more efficiently, it also drives open science by making the hosted data more understandable and usable while preserving provenance.

    More info on this use case here (opens new window).

    # Use Case 2

    Dryad (opens new window) is a biological data repository with a large user base. In our collaboration, their main issue was that they do not have the people-power to curate all the submitted datasets, so they implemented Frictionless tooling to help data submitters curate their data as they submit it. When data is submitted on the Dryad platform, Frictionless performs validation checks, and generates a report if any errors are found. The data submitter can then fix that error (e.g. there are no headers in row 1) and resubmit. Creating easy-to-understand error reports helps submitters understand how to create more useable, standardized data, and also frees up valuable time for the Dryad data management team. Ultimately, now the Dryad data repository hosts higher quality open science data.

    More info on this use case here (opens new window).


    Are there other ways you think Frictionless Data could help the ESO project? Let us know!

    Image used: Antarctica Eclipsed. NASA image courtesy of the DSCOVR EPIC team. NASA Earth Observatory images by Joshua Stevens, using Landsat data from the U.S. Geological Survey. Story by Sara E. Pratt.

    - + diff --git a/blog/2022/03/03/community-call-february/index.html b/blog/2022/03/03/community-call-february/index.html index 25b7eaec8..72725f780 100644 --- a/blog/2022/03/03/community-call-february/index.html +++ b/blog/2022/03/03/community-call-february/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On our second community call of the year, on February 24th, we had Ilya Kreymer and Ed Summers from Webrecorder (opens new window) updating us on their effort to standardise the WACZ format, which they had already discussed with us at an early development stage in the community call of December 2020 (you can read the blog here (opens new window)).

    Webrecorder is a suite of open source tools and packages to capture interactive websites and replay them at a later time as accurately as possible. They created the WACZ format to have a portable format for archived web content that can be distributed and contain additional useful metadata about the web archives, using the Frictionless Data Package standard.

    Ed & Ilya also hoped to discuss with the community the possibility of signing these Data Packages, in order to provide an optional mechanism to make web archives bundled in WACZ more trusted, because a cryptographic proof of who the author of a Data Package is might be interesting for other projects as well. Unfortunately the call was rather empty. Maybe it was because of the change of time, but in case there are other reasons why you did not come, please let us know (dropping an email at sara.petti@okfn.org or with a direct message on Discord/Matrix).

    We did record the call though, so in case anyone is interested in having that discussion, we could always try to have it asynchronously on Discord (opens new window) or Matrix (opens new window).

    Their current proposal to create signed WACZ packages is summarised on GitHub (opens new window), so you can always reach out to them there as well.
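
    For readers who want a feel for what signing a Data Package descriptor involves, here is a minimal sketch. It is not the Webrecorder proposal: the descriptor is invented, and a keyed digest (HMAC) from the Python standard library stands in for a real public-key signature, which is what would be needed to prove authorship to third parties.

```python
import hashlib
import hmac
import json

# Hypothetical descriptor; the package name and resource path are placeholders.
descriptor = {
    "name": "example-archive",
    "resources": [{"name": "pages", "path": "pages.jsonl"}],
}


def canonical_bytes(obj):
    """Serialise with sorted keys so the same content gives the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(obj):
    """SHA-256 digest of the canonical form of the descriptor."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


def sign(obj, key):
    """Keyed digest used as a stand-in for a real public-key signature."""
    return hmac.new(key, canonical_bytes(obj), hashlib.sha256).hexdigest()


def verify(obj, key, signature):
    """Check that the descriptor still matches the signature."""
    return hmac.compare_digest(sign(obj, key), signature)


key = b"demo-key-not-for-real-use"
signature = sign(descriptor, key)
print(verify(descriptor, key, signature))

# Any change to the descriptor invalidates the signature.
tampered = {**descriptor, "name": "something-else"}
print(verify(tampered, key, signature))
```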

    # Join us next month!

    Next community call is on March 31st. We are going to hear from Johan Richer from Multi, who is going to present the latest prototype of Etalab and his theory of portal vs catalogue (opens new window)

    You can sign up for the call already here: (opens new window)

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    As usual, you can join us on Discord (opens new window), Matrix (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2022/03/05/fellows-datapackage/index.html b/blog/2022/03/05/fellows-datapackage/index.html index 5aac2c48a..3529e282e 100644 --- a/blog/2022/03/05/fellows-datapackage/index.html +++ b/blog/2022/03/05/fellows-datapackage/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Why is it important to package data?

    With the Frictionless Data Reproducible Research Fellows Programme, supported by the Sloan Foundation and Open Knowledge Foundation, we are recruiting and training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science.

    As part of their training, we asked the 3rd cohort of Frictionless Fellows to package their research data in Frictionless Data Packages (opens new window). Here’s what they reported on their experience:

    # Victoria

    Constantly under the impression that I’m six months behind on lab work, I am capital Q - Queen - of bad data practices. My computer is a graveyard of poorly labeled .csv files, featuring illustrative headers such as “redo,” “negative pressure why?” and “weird - see notes.” I was vaguely aware of the existence of data packages, but like learning Italian or traveling more, implementing them in my workflow got slotted in the category of “would be nice if I had the time.” That clemency, however, was not extended to my research lifeblood - molecular spectroscopy databases, you disorganised beauties you - nor to collaborators who often invoked the following feeling:

    fellows-img-1

    Particularly in fields where measurables aren’t tangible macro concepts (see: population) but abstract and insular conventions with many varied representations, clear descriptors of multivariate data are a must in order for that data to be easily used and reproduced. This is where data packages come in; they bundle up your data with a human and machine readable file containing, at minimum, standardised information regarding structure and contents. In this lil’ post here, we’re going to walk through this process together by packaging data together with its metadata, and then validating the data using Frictionless tools.

    Keep on reading about Victoria’s experience packaging data in her blog here (opens new window).

    # Lindsay

    The first tenet of the American Library Association’s Bill of Rights states: “Books and other library resources should be provided for the interest, information, and enlightenment of all people of the community the library serves” (American Library Association). Libraries are supposed to be for everyone. Unfortunately, like many other institutions, libraries were founded upon outdated and racist patriarchal heteronormative ideals that ostracise users from marginalized backgrounds. Most academic libraries in the United States use the Library of Congress Classification System to organize books, a system that inadvertently centers christian, heterosexual white males. Critical librarianship, or critical cataloging is “a movement of library workers dedicated to bringing social justice principles into our work in libraries” critlib (opens new window). I would like to use data science principles to explore bias in library MARC (machine readable catalog) records.

    Read Lindsay’s Data Package blog here (opens new window).

    # Zarena

    As a social science researcher studying the research landscape in Central Asian countries, I decided to share a part of my dataset with key bibliometric information about the journal articles published by Kyrgyzstani authors between 1991-2021. The data I am going to share comes from the Lens (opens new window) platform. To ensure the data quality, and to comply with the FAIR principles (opens new window), before sharing my data, I created a data package that consists of the cleaned raw data, metadata (opens new window), and schema (opens new window).

    I tested two methods to create such a package. First, I tried to use the data package programming libraries (opens new window). This method lets you do more than just create a data package (e.g., describe, extract, transform, and validate your data). But I found the programming libraries a bit complicated. So, I ended up using the second method, that is the browser tool Frictionless Data Package Creator (opens new window). It lets you create a data package without any technical knowledge. The tool is comparatively simple and easy to navigate. It allows you to clean your dataset, change datatypes, provide a short description to your data as well as to add and edit associated metadata…

    Keep on reading about how Zarena packaged her data in her blog here (opens new window).

    # Kevin

    My research aims at understanding the transmission mechanisms of neglected vector-borne diseases. I mostly deal with data on the distribution and diversity of vectors of diseases and their infection status. The metadata would include but not be limited to the date of sample collection, location and GPS coordinates of the sites of sample collection, type of sample (blood or fly sample), the concentration of RNA or DNA extracted from the samples, and the infection status of the samples (whether the samples are infected with pathogens or not) as well as the blood meal sources of the insect vectors. All these datasets are supposed to be presented in a way that it can be understood by whoever accesses it and that information regarding the licensing and other attribution information can easily be accessed. One way to reduce friction when dealing with such huge datasets is to put them in a container that groups all the descriptive data and schema together. A schema tells us how the data is structured and the type of content that is expected in it. All this is contained in a data package that can be generated by a data package creator.

    I am going to take you through a step by step process on how I created a data package for my dataset on sandflies diversity, infection status, and their blood-meal sources, using Frictionless Data Package Creator…

    Read Kevin’s blog here (opens new window) to know more about how he created data packages for his data.

    # Guo Qiang

    The dataset I am going to package is from a project which we have recently completed – “Menopausal hormone therapy and women’s health: An umbrella review (opens new window)” which summarizes the clinical evidence on various health effects of menopausal hormone therapy in menopausal women. The full datasets are publicly available in the Open Science Framework (opens new window). I am going to use one of the datasets –All-Cause Mortality.xlsx, which summarizes all the clinical trials published until 2017 investigating the effect of menopausal hormone therapy on all-cause mortality in menopausal women – to illustrate the process of creating a Data Package.

    As the Data Package Creator currently accepts only .csv format, first I need to convert All-Cause Mortality.xlsx to .csv format…

    Keep on reading about Guo Qiang’s experience of packaging his data in his blog here (opens new window).

    # Melvin

    Being a soil science student, I felt using soil data would be useful for me to better understand this process of packaging data for future use. I got data on the impact of fertiliser recommendations on yield and felt it would be great to use it. However, this wasn’t such a good idea as I got so many error messages and clean-ups to do to suit the tabular data accepted by the data package creator (create.frictionlessdata.io (opens new window)). Similarly, in case you want to create a data package using someone else’s data, it should either have a licence or you should ask for permission to use the data. Afterwards, I got around to working with a different data set that was more straightforward and easy to work with. The data was on the infection prevalence of ‘Ca. Anaplasma camelii’ in camels and camel keds evaluated in different seasons within a year…

    To read about the errors that Melvin got and what she learned from them, read her blog here (opens new window).


    You can read all the Frictionless Data Fellows’ blogs on the dedicated website: https://fellows.frictionlessdata.io/ (opens new window)

    - + diff --git a/blog/2022/03/09/save-our-planet/index.html b/blog/2022/03/09/save-our-planet/index.html index c85695f56..90c800915 100644 --- a/blog/2022/03/09/save-our-planet/index.html +++ b/blog/2022/03/09/save-our-planet/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@

    During these past tumultuous years, it has been striking to witness the role that information has played in furthering suffering: misinformation, lack of data transparency, and closed technology have worsened the pandemic, increased political strife, and hurt climate policy. Building on these observations, the team at Open Knowledge Foundation are refocusing our energies on how we can come together to empower people, communities, and organisations to create and use open knowledge to solve the most urgent issues of our time, including climate change, inequality, and access to knowledge. Undaunted by these substantial challenges, we entered 2022 with enthusiasm for finding ways to work together, starting with climate data.

    To start this year fresh and inspired, we convened two gatherings of climate researchers, activists, and organisations to brainstorm ways to collaborate to make open climate data more usable, accessible, and impactful. Over 30 experts attended the two sessions, from organisations around the world, and we identified and discussed many problems in the climate data space. We confirmed our initial theory that many of us are working siloed and that combining skills, knowledge and networks can result in a powerful alliance across tech communities, data experts and climate crisis activists.

    Now, we want to share with you some common themes from these sessions and ask: how can we work together to solve these pressing climate issues?

    A primary concern of attendees was the disconnect between how (and why) data is produced and how data can (and should) be used. This disconnect shows up as frictions for data use: we know that much existing “open” data isn’t actually usable. During the call, many participants mentioned they frequently can’t find open data, and even when they can find it, they can’t easily access it. Even when they can access the data, they often can’t easily use it.

    So why is it so hard to find, access, and use climate data? First, climate data is not particularly well standardised or curated, and data creators need better training in data management best practices. Another issue is that many climate data users don’t have technical training or knowledge required to clean messy data, greatly slowing down their research or policy work.

    # How will the Open Knowledge Foundation fix the identified problems? Skills, standards and community.

    An aim for this work will be to bridge the gaps between data creators and users. We plan to host several workshops in the future to work with both these groups, focusing on identifying both skills gaps and data gaps, then working towards capacity building.

    Our goal with capacity building will be to give a data platform to those most affected by climate change. How do we make it easier for less technical or newer data users to effectively use climate data? Our future workshops will focus on training data creators and users with the Open Knowledge Frictionless Data tooling (opens new window) to better manage data, create higher quality data, and share data in impactful ways that will empower trained researchers and activists alike. For instance, the Frictionless toolbox can help data creators generate clean data that is easy to understand, share, and use, and the new Frictionless tool Livemark can help data consumers easily share climate data with impactful visualisations and narratives.

    Another theme that emerged from the brainstorm sessions was the role data plays in generating knowledge versus the role knowledge plays in generating data, and how this interplay can be maximised to create change. For instance, we need to take a hard look at how “open” replicates cycles of inequalities. Several people brought up the great work citizen scientists are doing for climate research, but noted that these efforts are rarely recognised by governments or other official research channels. So much vital data on local impacts of climate change are being lost as they aren’t being incorporated into official datasets. How do we make data more equitable, ensuring that those being most affected by climate change can use data to tell their stories?

    We call on data organisations, climate researchers, and activists to join us in these efforts. How can we best work together to solve pressing climate change issues? Would you like to partner with us for workshops, or do you have other ideas for collaborations? Let us know! We would like to give our utmost thanks to the organisations that joined our brainstorming sessions for paving the way in this important work. To continue planning this work, we are creating a space to talk in our Frictionless Data community chat, and we invite all interested parties to join us. We are currently migrating our community from Discord to Slack. We encourage you to join the Slack channel, which will soon be populated with all Frictionless community members: https://join.slack.com/t/frictionlessdata/shared_invite/zt-14x9bxnkm-2y~uQcmmrqarSP2kV39_Kg (opens new window)
    (We also have a Matrix mirror if you prefer Matrix: https://matrix.to/#/#frictionless-data:matrix.org (opens new window))

    Finally, we’d like to share this list of resources that attendees shared during the calls:

    - + diff --git a/blog/2022/04/13/march-community-call/index.html b/blog/2022/04/13/march-community-call/index.html index 5e0083b9a..f223d9ce6 100644 --- a/blog/2022/04/13/march-community-call/index.html +++ b/blog/2022/04/13/march-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -116,6 +116,6 @@ community-hangout

    At our last community call on March 31st, we had a discussion with Johan Richer from Multi (opens new window) around his theory of portal vs catalogue.

    The discussion started with a presentation of the latest catalogue prototype by Etalab (opens new window) currently in development: https://github.com/etalab/catalogage-donnees (opens new window). Data cataloguing has become a major component of open data policies in France, but there are issues related to the maintainability of the catalogue and the traceability of the data.

    In the beginning the data producers were also the data publishers, and therefore the purpose of a portal was to catalogue, publish, and store the data. Recently the process became more complicated, and the cataloguing became a prerequisite to publication. Instead of publishing by default, data producers want to make sure that the data is clean before injecting it into the portal. This started a new workflow of internal data management, that the portals were not made for. So how can we restore the broken link between catalogue and portal? Johan thinks data lineage is key.

    If you want to know more about it, you can go and have a look at Johan’s presentation here (opens new window) (in French, but here’s a shortcut to the Google translation (opens new window) if you’d rather have it in English), or watch the recording:

    # News from the community

    Our community chat has moved from Discord to Slack! In the community survey we ran last year, many people suggested moving to Slack, and the terms of service are definitely better (ranking B vs E for Discord, according to https://tosdr.org/ (opens new window)). We will also be able to organise questions & answers better, and that will definitely be an added value for the community.

    To join our community chat: https://frictionlessdata.slack.com/messages/general (opens new window)

    # Join us next month!

    Next community call is on April 28th. We are going to hear about open science practices at the Turing Way from former Frictionless Fellow Anne Lee Steele.
    You can sign up for the call already here: (opens new window)

    Do you want to share something with the community? Let us know when you sign up!

    # Call recording:

    On a final note, here is the recording of the full call:

    Join us on Slack (opens new window) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2022/05/05/april-community-call/index.html b/blog/2022/05/05/april-community-call/index.html index e62b2e7fc..abad63fd7 100644 --- a/blog/2022/05/05/april-community-call/index.html +++ b/blog/2022/05/05/april-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -116,6 +116,6 @@ community-hangout

    At our last community call on April 28th, we had a discussion around open science best practices and the Turing Way with Anne Lee Steele, who, you might remember, was part of the second cohort of Frictionless Fellows (opens new window).

    The Turing Way is an open source and community-led handbook for reproducible, ethical and collaborative research. It is composed of more than 240 pages created by ~300 researchers over the course of 3 years, written collaboratively via GitHub PRs, in contrast to the notion of single/small-authorship papers.

    There is currently an effort to have the Turing Way develop meta-practices that can be applied to other areas as well. One example is documentation.

    A great outcome of the call was the proposal to have a closer cooperation between the Frictionless Data community and the Turing Way’s one, possibly developing a chapter for Open Infrastructures for research to contribute upstream. This chapter would set the context and provide a vision for how to evaluate tools and platforms with a Turing Way perspective on reproducibility, ethical alternatives and collaboration in practice. For more info about this proposal, check this issue (opens new window).

    If you want to know more about the Turing Way, have a look at the project website (opens new window). You can also check out the full recording of the call:

    # News from the community

    • You’re all invited to join the Frictionless Fellows for a free virtual workshop on Open Science best practices on May 25 at 2pm UTC!
      In this beginner-friendly workshop, Fellows will demonstrate how to use the Frictionless tools to make research data more understandable, usable, and open. You will learn how to use the Frictionless non-coding tools to manipulate metadata and schemas (and why that is important!) and how to validate data in a hands-on format. Learn more & sign up on the Fellows website: https://fellows.frictionlessdata.io/ (opens new window).

    • Reminder that our community chat has moved to Slack. Join us there (opens new window). We now also have a fully operating Matrix bridge (opens new window), so if you prefer you can join us from there as well.

    # Join us next month!

    Next community call is on May 26th. We are going to hear Nick Kellett from Deploy Solutions explain to us how to build citizen science and climate change solutions, using Frictionless.

    You can sign up for the call already here: (opens new window)

    Do you want to share something with the community? Let us know when you sign up.

    Join us on Slack (opens new window) (also via Matrix (opens new window)) or Twitter (opens new window) to say hi or ask any questions. See you there!

    - + diff --git a/blog/2022/05/15/fellows-reproducing-datapackages/index.html b/blog/2022/05/15/fellows-reproducing-datapackages/index.html index 09d14aad1..fcebcc9cb 100644 --- a/blog/2022/05/15/fellows-reproducing-datapackages/index.html +++ b/blog/2022/05/15/fellows-reproducing-datapackages/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Is reproducing someone else’s research data a Frictionless experience? (pt.2)

    Is reproducing someone else’s research data a Frictionless experience? As we have seen with all the previous cohorts of Frictionless Fellows (you can read the blog here (opens new window)), more often than not it is sadly not the case.

    To prove that the “reproducibility crisis” is a real problem in scientific research at the moment, we challenged the Fellows to exchange their data to see if they could reproduce each other’s Data Packages. Read about their experience:

    # Melvin

    We had an interesting task for our frictionless fellow activity that involved exchanging our data sets with our fellow colleagues (pairwise) and trying to reproduce their work. My partner for this assignment was Lindsay, who is a librarian.

    In data science, replicability and reproducibility are some of the keys to data integrity. It creates more opportunities for new insights and reduces errors. In order to ensure reproducibility of data, one must first make sure that the raw data is available. In this regard, my partner Lindsay shared with me her data that was on her Github account to facilitate the process.

    This process and activity were really useful and humbling. As we got to discuss our data sets with Lindsay, I realized key things such as Tidy data principles, which was the highlight for me in this whole process, besides the point that it’s not easy to understand someone else’s data without further metadata to accompany the data set. Imagine the frustration researchers go through trying to understand and reproduce other people’s data without more information on the data.
    Read Melvin’s blog (opens new window) to see how she managed to reproduce her fellows’ data package.

    # Victoria

    My data package partner, Zarena is an awesome social scientist in the human rights sphere. She has a background in mental health research and interests ranging from epistemic injustice to intersectionality - two terms I had to double check my understanding of. In poking around Zarena’s profile, particularly interesting was her focus on mad studies (opens new window), a young interdisciplinary field dealing with identity and the marginalisation of individuals with alternative mental states. This idea - broadly accepting a spectrum of human states instead of subjecting them to a black/white absolute interpretation - was completely new to me and fascinating! But being a social theory noob, I suspected to encounter a barrier to understanding her data.

    Zarena’s data was publicly available in her GitHub fellows repository. I clocked a couple of things off the bat: the repo contained a csv called “data-dp.csv”, as well as a README.md (opens new window) and several schema files. When in doubt of where to start, a good place to look is the README.
    Read Victoria’s blog (opens new window) to see how she reproduced her fellow’s data packages.

    # Kevin

    Data reproducibility is where other researchers use the same data to attain the same results by using the same methods. Research reproducibility allows other scientists to gain new insights from your data as well as improve the quality of research by checking the correctness of your findings. The aim of this assignment was to try and reproduce my colleague’s data package and validate the tabular data using frictionless browser tools, that is, data package creator and good tables, respectively.

    First, Guo-Qiang shared with me the links to his datasets and the data package, which I freely accessed from his GitHub repository. His data was a summary of clinical evidence of various health effects of menopausal hormone therapy in menopausal women.
    Read Kevin’s blog (opens new window) to see how he managed to reproduce Guo-Qiang’s datapackages.

    # Zarena

    Before joining the Frictionless Data Fellowship Programme, I did not realise the importance of research reproducibility. To tell the truth, I really did not have such a concept in my professional vocabulary despite having an MSc degree in Social Science Research Methods and working in different social research projects. But, maybe, that was the reason why I did not know this concept and never practised it in my research projects. Like many of my social science colleagues, especially the ones working with qualitative - and often sensitive - data, for me it was important to ensure that data I collect are safely stored in a password-protected platform and then - upon completion of a project - are deleted. But now working for the Frictionless Data Fellowship Programme and managing different sorts of data, including bibliometric metadata, I see that if we want social sciences and humanities to progress, it is vital to integrate such practices as reproducing, replicating, and reusing data into our research.

    So, in this blog (opens new window), I will try to explain my first attempt to reproduce my Frictionless fellow’s dataset, which is openly shared in the GitHub repository (opens new window).

    # Lindsay

    Our most recent Frictionless Fellows project is to trade data and create a Data Package using another Fellow’s data. I traded data with the fabulous Melvin! Melvin is a pathologist and soil scientist.

    While this seems like a fun project, I was frustrated at first. I had to find my partner’s data. After reading her Data Package Blog (opens new window) and poking around on GitHub (opens new window), I could not find her data. I eventually realized: we are mimicking the process of reusing reproducible research data. The first hurdle any researcher must overcome is finding the data.
    Read Lindsay’s blog (opens new window) to understand what happened while reproducing Melvin’s Data Packages.

    # About the Frictionless Data Fellowship

    With the Frictionless Data Reproducible Research Fellows Programme, supported by the Sloan Foundation and Open Knowledge Foundation, we are recruiting and training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. To know more about the programme, visit the dedicated website (opens new window).

    - + diff --git a/blog/2022/05/24/tu-delft-training/index.html b/blog/2022/05/24/tu-delft-training/index.html index 69e656eaa..79cb52801 100644 --- a/blog/2022/05/24/tu-delft-training/index.html +++ b/blog/2022/05/24/tu-delft-training/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Workshop on FAIR and Frictionless Workflows for Tabular Data

    Originally published on: https://community.data.4tu.nl/2022/05/19/workshop-on-fair-and-frictionless-workflows-for-tabular-data/ (opens new window)

    4TU.ResearchData and Frictionless Data joined forces to organize the workshop “FAIR and frictionless workflows for tabular data” (opens new window). The workshop took place on 28 and 29 April 2022 in an online format.

    On 28 and 29 April we ran the workshop “FAIR and frictionless workflows for tabular data” in collaboration with members of the Frictionless Data project team (opens new window).

    This workshop was envisioned as a pilot to create training on reproducible and FAIR tools that researchers can use when working with tabular data, from creation to publication. The programme was a mixture of presentations, exercises and hands-on live coding sessions. We got a lot of inspiration from The Carpentries (opens new window) style of workshops and tried to create a safe, inclusive and interactive learning experience for the participants.

    The workshop started with an introduction to Reproducible and FAIR research given by Eirini Zormpa (opens new window) (Trainer at 4TU.ResearchData), who also introduced learners to best practices for data organization of tabular data based on the Data Carpentry for Ecologists lesson (opens new window). You can have a look at Eirini’s slides here (opens new window).

    The introduction was followed by a hands-on session exploring the Frictionless Data framework (opens new window). The Frictionless Data project has developed a full data management framework for Python to describe, extract, validate, and transform tabular data following the FAIR principles. Lilly Winfree (opens new window) used Jupyter Notebook to introduce learners to the different tools, as it helps visualize the steps of the workflow. You can access the presentation and the notebook (and all the materials of the workshop) used by Lilly in this GitHub repository (opens new window).
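
    For readers who have not seen the framework, the sketch below illustrates the general idea behind validating tabular data against a schema. It uses only the Python standard library and is not the Frictionless API, which does considerably more; the sample table, the column names, and the `validate_rows` helper are all hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical sample table, for illustration only.
CSV_TEXT = """record_id,date,weight
1,2022-04-28,40.5
2,2022-04-29,heavy
"""

# Hypothetical schema: each column name maps to a function that parses
# a cell and raises ValueError when the value has the wrong type.
SCHEMA = {"record_id": int, "date": date.fromisoformat, "weight": float}


def validate_rows(text, schema):
    """Return (line_number, field, value) for every cell that fails to parse."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        for field, cast in schema.items():
            try:
                cast(row[field])
            except (ValueError, TypeError, KeyError):
                errors.append((line_number, field, row.get(field)))
    return errors


errors = validate_rows(CSV_TEXT, SCHEMA)
print(errors)
```

    Here the second data row is reported because `heavy` cannot be read as a number, which is the kind of problem a validation step is meant to surface before data is published.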

    During the hands-on coding session, the learners practiced what they were learning on an example dataset from ecology (source of the dataset: Data Carpentry for Ecologists (opens new window)). Later in the workshop, Katerina Drakoulaki, Frictionless Data fellow and helper, also gave an example of how to apply the framework tools to a dataset coming from the computational musicology field (opens new window).

    We concluded the workshop with a presentation about Data Publication (opens new window) by Paula Martinez Lavanchy (opens new window), Research Data Officer at 4TU.ResearchData. The presentation focused on why researchers should publish their data, how to select the data to publish and how to choose a good data repository that helps implement the FAIR principles to the researchers’ data. Paula also briefly demoed the features of 4TU.ResearchData using the repository sandbox (opens new window).

    Besides the instructors, we also had a great team of helpers that were there in case the learners encountered any technical problems or had questions during the live coding session. We would like to give a big thank you to: Nicolas Dintzner – TU Delft Data Steward of the Faculty of Technology, Policy & Management, Katerina Drakoulaki – Postdoctoral researcher at NKUA & Frictionless Data Fellow, Aleksandra Wilczynska – Data Manager at TU Delft Library & the Digital Competence Center and Sara Petti – Project Manager at Open Knowledge Foundation.

    image

    Image: Top-left: Eirini Zormpa – Trainer of RDM and Open Science at TU Delft Library & 4TU.ResearchData, Top-right: Lilly Winfree – Product Manager of Frictionless Data at the Open Knowledge Foundation, Bottom: Katerina Drakoulaki – Postdoctoral researcher at NKUA & Frictionless Data fellow.

    Nineteen learners joined the workshop. The audience had a broad range of backgrounds with both researchers and support staff (e.g. data curator, research data manager, research software engineer, data librarian, etc.) represented. The workshop received quite positive feedback. Most of the learners’ expectations were fulfilled (79%) and they would recommend the workshop to other researchers (93%). It was also nice to know that most of the learners felt that they can apply what they learned immediately and they felt comfortable learning in the workshop.

    image

    Images: Feedback training event

    This feedback from the learners has helped us to start thinking about how to improve future runs of the workshop. For example, we used less time than we had planned, which creates the opportunity to provide instruction on more features of the framework or to add more exercises or practice time. The learners also indicated they would have liked to have a common document (e.g. Google doc or HackMD) to share reference material and to document the code that the instructor was typing in case they got lost.

    Even though there is room for improvement, the learners appreciated the highly practical approach of the workshop, the space they had to practice what they learned and the overall quality of the Frictionless Data framework tools. Here are some of the strengths that learners mentioned:

    ‘Hands-on, can start using what I learned immediately’

    ‘Practical experience with the framework and working on shared examples.’

    ‘Machine readable data and packaging for interoperability through frictionless’

    ‘Very clear content. Assured assistance in case of technical problems. Adherence to timelines with breaks. Provided many in-depth links. Friendly atmosphere.’

    We at the 4TU.ResearchData team greatly enjoyed this collaboration that allowed us to help build the skills that researchers and other users of the repository need to make research data findable, accessible, interoperable and reusable (FAIR).

    - + diff --git a/blog/2022/06/01/deploy-solutions/index.html b/blog/2022/06/01/deploy-solutions/index.html index 40cfd6ba1..90fe5b1e8 100644 --- a/blog/2022/06/01/deploy-solutions/index.html +++ b/blog/2022/06/01/deploy-solutions/index.html @@ -38,7 +38,7 @@ - + @@ -116,6 +116,6 @@ community-hangout

    At our last community call on May 26th, we heard about citizen science and climate change solutions using Frictionless Data from Nick Kellett, Pan Khantidhara and Justin Mosbey from Deploy Solutions (opens new window).

    Deploy Solutions builds software that can help with climate change disruptions, and they are using Frictionless Data to help! They develop cloud-hosted solutions using big data from satellites, and, since 2019, they have adopted a citizen focus in climate change research.
    They researched and identified the main problems that prevent people and communities from acting in case of climate change disasters:

    • Citizens feel overwhelmed by the volume of information received.
    • They feel the information they get is not personalised to their needs.
    • Authorities have difficulties directly collaborating and sharing information with citizens.

    The solution they propose is the creation of a complete map-centred web-application that can be built very quickly (~4 hours) with basic functionalities to provide basic and reliable information for disaster response, while allowing users to upload citizen science observations.

    The app takes Earth observations imagery from satellites, and associates them with imagery that citizens are taking on the ground, to check that the machine learning algorithms applied are correctly predicting the disaster extent.

    It also visualises the data coming in to look for trends, gathering historic data and comparing with what is predicted. The quantity of information needed for such an app is huge, and more often than not, it comes from different sources and does not follow any standards. It is therefore tricky to describe it and validate it. You might have guessed it by now: Frictionless Data is helping with that.

    If you are interested in knowing more about Deploy Solutions and how they are using Frictionless Data, you can watch the full presentation (including Pan Khantidhara’s demo!):

    If you have questions or feedback, you can let us know in Slack (opens new window), or you can reach out to Deploy Solutions directly.

    # Join us next month!

    Next community call is on June 30th. Join us to meet the 3rd cohort of Frictionless Fellows and hear about their reproducibility and open science journey!

    You can sign up for the call already here: (opens new window)

    Do you want to share something with the community? Let us know when you sign up.

    Would you like to present at one of the next community calls? Please fill out this form (opens new window).

    Join our community on Slack (opens new window) (also via Matrix (opens new window)) or Twitter (opens new window). See you there!

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2022/07/04/june-community-call/index.html b/blog/2022/07/04/june-community-call/index.html index 89999c8eb..cf4f2c64f 100644 --- a/blog/2022/07/04/june-community-call/index.html +++ b/blog/2022/07/04/june-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On June 30th we had a very special community call. Instead of a project presentation this time we had the chance to meet the 3rd cohort of Frictionless Fellows (opens new window) and hear about their reproducibility and open science journey.

    The fellows are a group of early career researchers interested in learning about open science and data management by using the Frictionless Data tools in their own research projects. Melvin Adhiambo, Lindsay Gypin, Kevin Kidambasi, Victoria Stanley, and Guo-Qiang Zhang are almost at the end of their nine-month fellowship. During the past nine months they have learnt open science principles and how to discuss them (especially with colleagues who are not convinced yet!). They also learnt data management skills, and how to correctly use metadata and data schemas. Besides using the Frictionless Data browser tools, there was also a coding component to the fellowship, as they used the Frictionless Python tools as well.

    The Fellows also ran workshops and wrote great blog posts during the last nine months. You can read them here (opens new window).

    If you are interested in knowing more about the fellows’ research field and what being a Frictionless Data Fellow meant for them, go and watch the full recording of the call:

    # Join us next month!

    Next community call is on July 28th. Join us to hear David Raznick telling us about Flatterer (opens new window), a new tool that helps convert JSON into tabular data.

    You can sign up for the call already here. (opens new window)

    Do you want to share something with the community? Let us know when you sign up.

    Would you like to present at one of the next community calls? Please fill out this form (opens new window).

    Join our community on Slack (opens new window) (also via Matrix (opens new window)) or Twitter (opens new window). See you there!

    - + diff --git a/blog/2022/07/05/frictionless-planet-conversation/index.html b/blog/2022/07/05/frictionless-planet-conversation/index.html index 9cf024adf..ecdcb04ce 100644 --- a/blog/2022/07/05/frictionless-planet-conversation/index.html +++ b/blog/2022/07/05/frictionless-planet-conversation/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Frictionless Planet and Lacuna Fund discuss gaps in climate datasets for machine learning

    Originally published on: https://blog.okfn.org/2022/07/05/frictionless-planet-and-lacuna-fund-discuss-gaps-in-climate-datasets-for-machine-learning/ (opens new window)

    On 24 June we hosted a conversation with the Lacuna Fund about datasets for climate change where we heard all about the Lacuna Fund’s recently launched Request for Proposals around Datasets for Climate Applications. We were joined by climate data users and creators from around the globe. This conversation is a part of Open Knowledge Foundation’s recent work on building a Frictionless Planet by using open tools and design principles to tackle the world’s largest problems, including climate change.

    A lacuna is a gap, a blank space or a missing part of an item. Today there are gaps in the datasets that are available to train and evaluate machine learning models. This is especially true when it comes to specific populations and geographies. The Lacuna Fund was created to support data scientists in closing those gaps in machine learning datasets needed to better understand and tackle urgent problems in their communities, like those linked to the climate crisis.

    Lacuna Fund is currently accepting proposals for two climate tracks: Climate & Energy (opens new window) and Climate & Health (opens new window). The first track looks at the intersection between energy, climate, and green recovery, and the second focuses on health and strategies to mitigate the impact of the climate crisis. Proposals should focus on machine learning datasets: collecting and annotating new data, annotating and releasing existing data, or expanding existing datasets and increasing their usability. Lacuna Fund’s guiding principles include equity, ethics, and a participatory approach, and those values are very important for this work. Accordingly, proposals should include a plan for data management and licensing, privacy, and how the data will be shared. The target audience for this call is data scientists, with a focus on under-represented communities in Africa, Asia, and Latin America.

    During the call, we also asked whether participants have specific data gaps in their fields, such as a lack of data on how extreme heat events affect human health. The response was a strong “Yes”! Participants described working in “data deserts” where data is often missing, leading to less accurate machine learning algorithms. Another common issue is data quality and trust in data, especially from “official” sources. Tackling data transparency will be important for creating impactful climate policy. We’d like to ask you the same question: If your group could have access to one data set that would have a large impact on your work, what is that data set?

    - + diff --git a/blog/2022/07/14/flatterer/index.html b/blog/2022/07/14/flatterer/index.html index c14b92051..b70e60df3 100644 --- a/blog/2022/07/14/flatterer/index.html +++ b/blog/2022/07/14/flatterer/index.html @@ -38,7 +38,7 @@ - + @@ -114,6 +114,6 @@ case-studies

    Originally posted on: https://medium.com/opendatacoop/announcing-flatterer-converting-structured-data-into-tabular-data-c4652eae27c9 (opens new window)

    In this blog post, we introduce Flatterer - a new tool that helps convert JSON into tabular data. To hear more about Flatterer, sign up (opens new window) to join David Raznick at the Frictionless Data community call on July 28th.

    Open data needs to be available in formats people want to work with. In our experience at Open Data Services (opens new window), we’ve found that developers often want access to structured data (for example, JSON) while analysts are used to working with flat data (in CSV files or tables).

    More and more data is being published as JSON, but for most analysts this isn’t particularly useful. For many, working with JSON means needing to spend time converting the structured data into tables before they can get started.

    That’s where Flatterer (opens new window) comes in. Flatterer is an opinionated JSON to CSV/XLSX/SQLITE/PARQUET converter. It helps people to convert JSON into relational, tabular data that can be easily analysed. It’s fast and memory efficient, and can be run either in the command line (opens new window) or as a Python library (opens new window). The Python library supports creating data frames for all the flattened data, making it easy to analyse and visualise.

    # What does it do?

    With Flatterer you can:

    • easily convert JSON to flat relational data such as CSV, XLSX, Database Tables, Pandas Dataframes and Parquet;
    • convert JSON into data packages, so you can use Frictionless data to convert into any database format;
    • create a data dictionary that contains metadata about the conversion, including fields contained in the dataset, to help you understand the data you are looking at;
    • create a new table for each one-to-many relationship, alongside _link fields that help to join the data together.
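
The last point in the list above, one table per one-to-many relationship joined by _link fields, can be illustrated with a few lines of plain Python. This is only a simplified sketch of the idea, not Flatterer's actual implementation or API: the function name `flatten`, the `_link_<parent table>` naming, and the sample records are assumptions made for illustration.

```python
from typing import Any

Row = dict[str, Any]


def flatten(records: list[Row], main_table: str = "main") -> dict[str, list[Row]]:
    """Split nested records into flat tables joined by _link fields (illustrative only)."""
    tables: dict[str, list[Row]] = {}

    def add_row(table: str, link: str, parent_links: Row, obj: Row) -> None:
        # Every row gets its own _link plus the links of all its ancestors.
        row: Row = {"_link": link, **parent_links}
        tables.setdefault(table, []).append(row)
        fill(row, table, link, parent_links, obj, prefix="")

    def fill(row: Row, table: str, link: str, parent_links: Row, obj: Row, prefix: str) -> None:
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                # One-to-one nesting stays in the same row, with a prefixed column name.
                fill(row, table, link, parent_links, value, prefix=f"{name}_")
            elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
                # One-to-many nesting becomes a child table that points at its parent.
                child_links = {**parent_links, f"_link_{table}": link}
                for index, child in enumerate(value):
                    add_row(name, f"{link}.{name}.{index}", child_links, child)
            else:
                row[name] = value

    for index, record in enumerate(records):
        add_row(main_table, str(index), {}, record)
    return tables


# Hypothetical sample data, invented for this sketch.
sample = [{"id": "a", "owner": {"name": "X"}, "games": [{"title": "g1"}, {"title": "g2"}]}]
tables = flatten(sample)
```

Here `tables["main"]` holds one row per record, while `tables["games"]` holds one row per nested game, each carrying a `_link_main` value that matches the `_link` of its parent row.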

    # Why we built it

    When you receive a JSON file where the structure is deeply nested or not well specified, it’s hard to determine what the data contains. Even if you know the JSON structure, it can still be time-consuming to work out how to flatten the JSON into a relational structure that suits data analysis and fits into a data pipeline.
    Flatterer aims to be the first tool to go to when faced with this problem. Although you may still need to handwrite code, Flatterer has a number of benefits over most handwritten approaches:

    • it’s fast – written in Rust but with Python bindings for ease of use. It can be 10x faster than hand written Python flattening;
    • it’s memory efficient – Flatterer uses a custom streaming JSON parser, which means that a long list of objects nested within the JSON will be streamed, so less data needs to be loaded into memory at once;
    • it gives you fast, memory efficient output to CSV/XLSX/SQLITE/PARQUET;
    • it uses best practice that has been learnt from our experience flattening JSON countless times, such as generating keys to link one-to-many tables to their parents.

    # Using Flatterer in the OpenOwnership data pipeline

    As an example, we’ve used Flatterer (opens new window) to help OpenOwnership (opens new window) create a data pipeline to make information about who owns and controls companies available in a variety of data formats (opens new window). In the example below, we’ve used Flatterer to convert beneficial ownership data from the Register of Enterprises of the Republic of Latvia and the OpenOwnership Register from JSON into CSV, SQLite, Postgresql, Big Query and Datasette formats.

    img-1-flatterer

    Alongside converting the data into different formats, Flatterer has created a data dictionary that shows the fields contained in the dataset, alongside the field type and field counts. In the example below, we show how this dictionary interprets person_statement fields contained in the Beneficial Ownership Data Standard.

    img-2-flatterer

    Finally, you can see Flatterer has created special _link fields, to help with joining the tables together. The example below shows how the _link field helps join entity identifiers (opens new window) to statements about beneficial ownership.

    img-3-flatterer

    # What’s next?

    Next, we’ll be working to make Flatterer more user friendly. We’ll be exploring creating a desktop interface, improving type guessing for fields, and giving more summary statistics about the input data. We welcome feedback on the tool through GitHub (opens new window), and are really interested to find out what kind of improvements you’d like to see.

    More information about using Flatterer is available on deepnote (opens new window). To hear more about Flatterer, you can join David Raznick at Frictionless Data’s monthly community call on July 28th.

    At Open Data Services Cooperative we’re always happy to discuss how developing or implementing open data standards could support your goals. Find out more about our work (opens new window) and get in touch (opens new window).

    - + diff --git a/blog/2022/07/20/lilly-message-to-community/index.html b/blog/2022/07/20/lilly-message-to-community/index.html index 4b9d76b57..8b26e5a5f 100644 --- a/blog/2022/07/20/lilly-message-to-community/index.html +++ b/blog/2022/07/20/lilly-message-to-community/index.html @@ -38,7 +38,7 @@ - + @@ -111,6 +111,6 @@ (opens new window)

    Thank you from Lilly - A message to the community

    Price icons created by Pixel perfect - Flaticon

    Dear Frictionless community,

    I’m writing to let you all know that this is my final week working on Frictionless Data with Open Knowledge Foundation. It has been a true pleasure to get to interact with you all over the last four years! Rest assured that Frictionless Data is in good hands with the team at Open Knowledge (Evgeny, Sara, Shashi, Edgar, and the rest of the OKF tech team).

    What’s next for me? I’m still staying in the data space, moving to product at data.world (did you know they export data as datapackages?)! Maybe you’ll see me presenting a demo at an upcoming Frictionless community call 😉

    If you’ll allow me to reminisce for a few minutes, here are some of my favourite Frictionless memories from my time working on this project:

    The Frictionless Hackathon: In October 2021, we hosted the first-ever Frictionless Hackathon (virtually of course), and it was so cool to see all the projects and contributors from around the world! You can read all about it in the summary blog here (opens new window). Should we do another Hackathon? Let Sara know what you think! (Special shout-out to Oleg who set up the Hackathon software and inspired the entire event!)

    Pilot collaborations: We started our first Reproducible Research pilot collaboration with the Biological and Chemical Oceanographic Data Management Office (BCO-DMO) team in 2019, and learned so much from this implementation! This resulted in a new data processing pipeline for BCO-DMO data managers that used Frictionless to reproducibly clean and document data. This work ultimately led to the creation of the Frictionless Framework. You can check out all the other Pilots on the Adoption page (opens new window) too.

    Fellows: Getting to mentor and teach 17 Fellows was truly a spectacular experience. These current (and future) leaders in open science and open scholarship are people to keep an eye on – they are brilliant! You can read all about their experience as Fellows on their blog (opens new window).

    The Frictionless Team at OKF: I’ve been very lucky to get to work with the best team while being at OKF! Many of you already know how helpful and smart my colleagues are, but in case you don’t know, I will tell you! Evgeny has been carefully leading the technical development of Frictionless with a clear vision, making my job easy and fun. Sara has transformed how the community feels and works, which is no small feat! Shashi and Edgar have only been working on the project for less than a year, but their contributions to the code base and to help answer questions have already made a big impact! I will miss working with these excellent humans, and all of you in the community that have made Frictionless a special place!

    Thank you all for being a part of the Frictionless community and for working with me in the past! I wish you all the best, and maybe I will see some of you in Buenos Aires in April for csv,conf,v7 (opens new window)?

    Cheers!

    Lilly (opens new window)

    - + diff --git a/blog/2022/08/03/community-call-july-flatterer/index.html b/blog/2022/08/03/community-call-july-flatterer/index.html index 62bc07000..970a765d3 100644 --- a/blog/2022/08/03/community-call-july-flatterer/index.html +++ b/blog/2022/08/03/community-call-july-flatterer/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On the last community call on July 28th, we heard David Raznick (an ex OKFer, now working at Open Data Services (opens new window)) presenting Flatterer, a tool he developed to convert structured JSON data into tabular data, using Frictionless Data specifications.

    David has been working with many different open data standards that rely on deeply nested JSON. To make data in these standard formats more human readable, users often flatten JSON files with flattening tools, but the results are very large spreadsheets, which can be difficult to work with.

    Flattening tools are also often used to unflatten tabular data into JSON. That way, data initially written in a more human readable format can then be used according to the standards. Unfortunately the result is not optimal: the output of flattening tools is often not user-friendly, and the user would probably still need to tweak it by hand, for example by modifying header names and/or the way tables are joined together.

    Flatterer aims to make these processes easier and faster. It can convert your JSON file, in the blink of an eye, into the tabular format of your choice: csv, xlsx, parquet, postgres and sqlite. Flatterer will convert your JSON file into a main table and one-to-many tables, with keys that link those tables to their parents. That way the data is tidy and easier to work with.

    If you are interested in knowing more about Flatterer, have a look at David’s presentation and demo:

    You can also read more about the project here: https://flatterer.opendata.coop/ (opens new window), or have a look at the project documentation (opens new window).

    # Join us next month!

    Next community call is on August 25th. Frictionless Data developer Shashi Gharti will discuss with the community a tool she would like to add to the Frictionless Framework. Stay tuned to know more!

    You can sign up for the call already here (opens new window).

    Do you want to share something with the community? Let us know when you sign up.

    Would you like to present at one of the next community calls? Please fill out this form (opens new window).

    Join our community on Slack (opens new window) (also via Matrix (opens new window)) or Twitter (opens new window). See you there!

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2022/08/29/frictionless-framework-release/index.html b/blog/2022/08/29/frictionless-framework-release/index.html index 94ba7c301..58bd72dc7 100644 --- a/blog/2022/08/29/frictionless-framework-release/index.html +++ b/blog/2022/08/29/frictionless-framework-release/index.html @@ -38,7 +38,7 @@ - + @@ -217,6 +217,6 @@ it’s not possible to mix dicts and classes in methods like package.add_resource
    it’s not possible to export an invalid descriptor
    This separation might require a few additional lines of code, but it gives us much less fragile programs in the end. It’s especially important for software integrators who want to be sure that they write working code. At the same time, for quick prototyping and discovery, Frictionless still provides high-level actions like the validate function that are more forgiving regarding user input.

    # Static Typing

    One of the most important consequences of “fixing” state management in Frictionless is our new ability to provide static typing for the framework codebase. This work is in progress, but we have already added a lot of types and the codebase successfully passes pyright validation. We highly recommend enabling pyright in your IDE to see all the type problems in advance:

    type-error

    # Livemark Docs

    We’re happy to announce that we’re finally ready to drop a JavaScript dependency for the docs generation as we migrated it to Livemark. Moreover, Livemark’s ability to execute scripts inside the documentation and other nifty features like simple Tabs or a reference generator will save us hours and hours for writing better docs.

    # Script Execution

    livemark-1

    # Reference Generation

    livemark-2

    # Happy Contributors

    We hope that the Livemark docs-writing experience will make our contributors happier and allow us to grow our community of Frictionless Authors and Users. Let’s chat in our Slack (opens new window) if you have questions or just want to say hi.

    Read Livemark Docs (opens new window) for more information.

    - + diff --git a/blog/2022/08/30/community-call-github-integration/index.html b/blog/2022/08/30/community-call-github-integration/index.html index 0463d6739..ef013fccc 100644 --- a/blog/2022/08/30/community-call-github-integration/index.html +++ b/blog/2022/08/30/community-call-github-integration/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On the last community call on August 25th, we had our very own Frictionless Data developer Shashi Gharti presenting to the community the new Frictionless GitHub integration, to read and write data packages from/to GitHub repositories.

    Besides reading and writing packages, the integration also allows the creation of containers for data packages: the catalog (opens new window), a list of packages from multiple repositories in GitHub. To select which repository you want to be in the catalog, you can use any GitHub qualifier.

    The Frictionless GitHub integration is part of the beta release of Frictionless Framework version 5 (opens new window).

    If you are interested in knowing more about the Frictionless GitHub integration, have a look at Shashi’s presentation and demo:

    You can also check out Shashi’s slides (opens new window) or have a look at the project documentation (opens new window). If you use the Frictionless Framework v5 and its GitHub integration, please let us know! And if you have any feedback, feel free to open an issue in the repository (opens new window).

    # Join us next month!

    Next community call is on September 29th. Frictionless Data lead developer Evgeny Karev will be presenting the Frictionless Framework version 5, so make sure not to miss it!

    You can sign up for the call already here (opens new window).

    Do you want to share something with the community? Let us know when you sign up.

    Would you like to present at one of the next community calls? Please fill out this form (opens new window).

    Join our community on Slack (opens new window) (also via Matrix (opens new window)) or Twitter (opens new window). See you there!

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2022/09/15/deploy-solutions/index.html b/blog/2022/09/15/deploy-solutions/index.html index 583531ae7..7c0872416 100644 --- a/blog/2022/09/15/deploy-solutions/index.html +++ b/blog/2022/09/15/deploy-solutions/index.html @@ -38,7 +38,7 @@ - + @@ -138,6 +138,6 @@ You can see a full video of this process on our video showcase page: https://showcase.oasis.climatechange.ca (opens new window)

    # Want to Learn More About Our Software Prototype and Climate Change Software R&D?

    We’ve created a new service to share our climate change software development knowledge. It’s called “OASIS: Software Solutions for Climate Change Problems”.
    The OASIS service includes a free weekly newsletter and premium subscription service. It shares unique insights, content, resources, and guidance on how to use software and data from Earth and space to build solutions for climate change problems.
    Learn more, and subscribe to the free newsletter, at https://oasis.climatechange.ca (opens new window).

    - + diff --git a/blog/2022/09/20/mysociety-workflow/index.html b/blog/2022/09/20/mysociety-workflow/index.html index 3a2d689dc..5345709ca 100644 --- a/blog/2022/09/20/mysociety-workflow/index.html +++ b/blog/2022/09/20/mysociety-workflow/index.html @@ -38,7 +38,7 @@ - + @@ -113,6 +113,6 @@

    Publishing and analysing data: mySociety's workflow

    Originally published on: https://www.mysociety.org/2022/09/13/publishing-and-analysing-data-our-workflow/ (opens new window)

    I recently blogged about the data we’re publishing and making use of in mySociety’s climate programme (opens new window) (and how we want to help people make use of it!). This blog post explores behind the scenes how we’re managing that data, using the GitHub ecosystem and Frictionless Data standards to validate and publish data.

    # How we’re handling common data analysis and data publishing tasks.

    Generally we do all our data analysis in Python and Jupyter notebooks. While we have some analysis using R, we have more Python developers and projects, so this makes it easier for analysis code to be shared and understood between analysis and production projects.

    Following the same basic ideas as (and stealing some folder structure from) the cookiecutter data science (opens new window) approach that each small project should live in a separate repository, we have a standard repository template (opens new window) for working with data processing and analysis.

    The template defines a folder structure, and standard config files for development in Docker and VS Code. A shared data_common library provides a base Docker image (for faster access to new repos), along with common tools and utilities that are shared between projects for dataset management. This includes helpers for managing dataset releases, and for working with our charting theme. The use of Docker means that the development environment and the GitHub Actions environment can be kept in sync – and so processes can easily be shifted to a scheduled task as a GitHub Action.

    The advantage of this common library approach is that it is easy to update the set of common tools from each new project, but because each project is pegged to a commit of the common library, new projects get the benefit of advances, while old projects do not need to be updated all the time to keep working.

    This process can run end-to-end in GitHub – where the repository is created in GitHub, Codespaces can be used for development, automated testing and building happen with GitHub Actions and the data is published through GitHub Pages. The use of GitHub Actions especially means testing and validation of the data can live on GitHub’s infrastructure, rather than requiring additional work for each small project on our servers.

    # Dataset management

    One of the goals of this data management process is to make it easy to take a dataset we’ve built for our purposes, and make it easily accessible for re-use by others.

    The data_common library contains a dataset command line tool – which automates the creation of various config files, publishing, and validation of our data.

    Rather than reinventing the wheel, we use the frictionless data standard (opens new window) as a way of describing the data. A repo will hold one or more data packages (opens new window), which are a collection of data resources (opens new window) (generally a CSV table). The dataset tool detects changes to the data resources, and updates the config files. Changes between config files can then be used for automated version changes.

    mysociety-img-1

    # Data integrity

    Leaning on the frictionless standard for basic validation that the structure is right, we use pytest (opens new window) to run additional tests on the data itself. This means we define a set of rules that the dataset should pass (eg ‘all cells in this column contain a value’), and if it doesn’t, the dataset will not validate and will fail to build.
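
A rule such as ‘all cells in this column contain a value’ can be written as an ordinary pytest test. The sketch below is not taken from the mySociety repositories: the column names, the inline sample data and the helper names (`load_rows`, `empty_cells`) are hypothetical, and a real test would read the published CSV resource instead.

```python
import csv
import io

# Hypothetical stand-in for a CSV resource; a real test would open the file in the repo.
SAMPLE_CSV = """authority_code,official_name
AAA,Example Council A
BBB,Example Council B
"""


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def empty_cells(rows: list[dict[str, str]], column: str) -> list[int]:
    """Return the 1-based data row numbers whose value in `column` is blank."""
    return [
        number
        for number, row in enumerate(rows, start=1)
        if not (row.get(column) or "").strip()
    ]


def test_official_name_is_always_filled() -> None:
    # pytest collects functions named test_*; a failing assert fails the build.
    rows = load_rows(SAMPLE_CSV)
    assert empty_cells(rows, "official_name") == []
```

Returning the offending row numbers, rather than a bare True or False, makes the failure message point directly at the rows that need fixing.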

    This is especially important because we have datasets that are fed by automated processes, read external Google Sheets, or accept input from other organisations. The local authority codes dataset (opens new window) has a number of tests (opens new window) to check authorities haven’t been unexpectedly deleted, that the start date and end dates make sense, and that only certain kinds of authorities can be designated as the county council or combined authority overlapping with a different authority. This means that when someone submits a change to the source dataset, we can have a certain amount of faith that the dataset is being improved because the automated testing is checking that nothing is obviously broken.

    The automated versioning approach means the defined structure of a resource is also a form of automated testing. Generally following the semver rules for frictionless data (opens new window) (with the exception that adding a new column after the last column is not a major change), the dataset tool will try to determine whether a change from the previous version is a MAJOR (backward compatibility breaking), MINOR (new resource, row or column), or PATCH (correcting errors) change. Generally, we want to avoid major changes, and the automated action will throw an error if this happens. If a major change is required, this can be done manually. The fact that external users of the file can peg their usage to a particular major version means that changes can be made knowing nothing is immediately going to break (even if data may become more stale in the long run).
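
The version rules described above can be sketched as a small classifier. This is a hypothetical simplification, not the logic of the dataset tool itself: it compares the ordered (name, type) column lists of a single resource, takes a `rows_added` flag instead of inspecting the data, and does not model new resources. The function names `classify_change` and `bump` are invented for the example.

```python
def classify_change(old_fields, new_fields, rows_added=False):
    """Classify a change to one resource as MAJOR, MINOR or PATCH.

    `old_fields` and `new_fields` are ordered lists of (name, type) tuples.
    """
    old, new = list(old_fields), list(new_fields)
    # Removing, renaming, retyping or reordering an existing column breaks
    # consumers, so anything other than "old columns, then extras" is MAJOR.
    if new[: len(old)] != old:
        return "MAJOR"
    # Columns appended after the last column, or new rows, are MINOR.
    if len(new) > len(old) or rows_added:
        return "MINOR"
    # Same structure and no new rows: treat as corrections to existing values.
    return "PATCH"


def bump(version, change):
    """Apply a MAJOR, MINOR or PATCH change to a 'major.minor.patch' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "MAJOR":
        return f"{major + 1}.0.0"
    if change == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

An automated action following the approach in the text could raise an error whenever `classify_change` returns "MAJOR", leaving that bump to be made by hand.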

    mysociety-img-2

    # Data publishing and accessibility

    The frictionless standard allows an optional description for each data column. We make this required, so that each column needs to have been given a human readable description for the dataset to validate successfully. Internally, this is useful as enforcing documentation (and making sure you really understand what units a column is in), and means that it is much easier for external users to understand what is going on.
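
Making the optional description mandatory amounts to one extra check over the Table Schema descriptor, whose `fields` entries carry `name`, `type` and an optional `description`. The following is a minimal sketch of such a check; the function name and the sample schema are hypothetical and are not taken from the dataset tool.

```python
def fields_missing_description(schema: dict) -> list[str]:
    """Return the names of fields whose description is absent or blank."""
    return [
        field.get("name", f"<field {index}>")
        for index, field in enumerate(schema.get("fields", []))
        if not str(field.get("description") or "").strip()
    ]


# Hypothetical Table Schema descriptor, invented for illustration.
example_schema = {
    "fields": [
        {"name": "code", "type": "string", "description": "Identifier of the area"},
        {"name": "area", "type": "number"},
        {"name": "notes", "type": "string", "description": "  "},
    ]
}

missing = fields_missing_description(example_schema)
```

A build step could then refuse to publish whenever `missing` is non-empty, which is one way to enforce the documentation rule described above.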

    Previously, we were uploading the CSVs to GitHub repositories and leaving it at that – but GitHub isn’t friendly to non-developers, and clicking a CSV file opens it up in the browser rather than downloading it.

    To help make data more accessible, we now publish a small GitHub Pages site for each repo, which allows small static sites to be built from the contents of a repository (the EveryPolitician project (opens new window) also used this approach). This means we can have fuller documentation of the data, better analytics on access, sign-posting to surveys, and better sign-posted links to downloading multiple versions of the data.

    mysociety-img-3

    The automated deployment means we can also very easily create Excel files that package together all resources in a package into the same file, and include the metadata about the dataset, as well as information on how users can tell us how they’re using it.

    Publishing in an Excel format acknowledges a practical reality that lots of people work in Excel. CSVs don’t always load nicely in Excel, and since Excel files can contain multiple sheets, we can add a cover page that makes it easier to use and understand our data by packaging all the explanations inside the file. We still produce both CSVs and XLSX files – and can now do so with very little work.

    mysociety-img-4

    For developers who are interested in making automated use of the data, we also provide a small package (opens new window) that can be used in Python or as a CLI tool to fetch the data, and instructions on the download page on how to use it (opens new window).

    mysociety-img-5

    At mySociety Towers, we’re fans of Datasette (opens new window), a tool for exploring datasets. Simon Willison recently released Datasette Lite (opens new window), a version that runs entirely in the browser. That means that just by publishing our data as a SQLite file, we can add a link so that people can explore a dataset without leaving the browser. You can even create shareable links for queries: for example, all current local authorities in Scotland (opens new window), or local authorities in the most deprived quintile (opens new window). This lets us do some very rapid prototyping of what a data service might look like, just by packaging up some of the data using our new approach.
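
Producing a SQLite file from a CSV resource needs nothing beyond the Python standard library. The sketch below is an assumption about how such a step could look, not mySociety's actual build code: the function name, the table name and the sample rows are hypothetical, every column is created without a declared type, and table and column names are assumed to be trusted, since they are interpolated into the SQL.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(csv_text: str, table: str, conn: sqlite3.Connection) -> None:
    """Load CSV text into a new table, using the header row as column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # The remaining rows of the reader are inserted one by one.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# An in-memory database keeps the example self-contained; passing a file path
# to sqlite3.connect() would instead produce a file that can be published.
connection = sqlite3.connect(":memory:")
csv_to_sqlite("code,nation\nA1,Scotland\nB2,Wales\n", "authorities", connection)
scottish = connection.execute(
    "SELECT code FROM authorities WHERE nation = ?", ("Scotland",)
).fetchall()
```

The final query mirrors the kind of shareable filter mentioned above, such as selecting the authorities in one nation.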

    mysociety-img-6

    # Data analysis

    Something in use in a few of our repos is the ability to automatically deploy analysis of the dataset when it is updated.

    Analysis of the dataset can be designed in a Jupyter notebook (including tables and charts) – and this can be re-run and published on the same GitHub Pages deploy as the data itself. For instance, the UK Composite Rural Urban Classification (opens new window) produces this analysis (opens new window). For the moment, this is just replacing previous automatic README creation – but in principle makes it easy for us to create simple, self-updating public charts and analysis of whatever we like.

    # Bringing it all back together and keeping people up to date with changes

    The one downside of all these datasets living in different repositories is making them easy to discover. To help out with this, we add all data packages to our data.mysociety.org (opens new window) catalogue (itself a Jekyll site that updates via GitHub Actions) and have started a lightweight data announcement email list (opens new window). If you have got this far, and want to see more of our data in future – sign up (opens new window)!

    - + diff --git a/blog/2022/11/02/october-call/index.html b/blog/2022/11/02/october-call/index.html index 2454136cb..1394e1a84 100644 --- a/blog/2022/11/02/october-call/index.html +++ b/blog/2022/11/02/october-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On our last community call on October 27th, we had our very own Frictionless Data developer Shashi Gharti presenting to the community the Frictionless Zenodo integration, to read and write data packages from and to Zenodo.

    The integration is currently in development, but we decided to present this feature already in order to gather feedback from the community. It was a great idea because we got a lot of very useful input from all of you. Also, how wonderful to see the community! We had really missed you all in the last two months, since we had to cancel the September call.

    Back to Shashi’s presentation: what is Zenodo? For those of you who don’t know it, Zenodo is an open repository, allowing researchers to deposit papers, datasets, software, reports, etc.

    Many members of our community are active users of Zenodo, and have asked for a plugin which would make it easier to use Frictionless Data and Zenodo together. Since our aim with Frictionless Data is to make data more easily shareable, transportable and interoperable, this feature made a lot of sense.

    Similarly to the GitHub integration Shashi presented in August (opens new window), the Zenodo integration will work with Frictionless-py v5, and has 3 different features to write data, read data and create a catalog from multiple Zenodo entries, searchable

    If you are interested in knowing more about the feature, have a look at Shashi’s presentation and demo:

    You can also check out Shashi’s slides (opens new window). If you use the Frictionless Framework v5 and its Zenodo integration, please let us know! We would love to hear what you think. And if you have any feedback, feel free to open an issue in the repository (opens new window).

    # Other news from the community

    # Join us next month!

    Next community call is on December 1st (we are pushing it back one week because of the US Thanksgiving).

    You can sign up for the call already here (opens new window).

    Do you want to share something with the community? Let us know when you sign up.

    Would you like to present at one of the next community calls? Please fill out this form (opens new window).

    Join our community on Slack (opens new window) (also via Matrix (opens new window)) or Twitter (opens new window). See you there!

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2022/12/07/community-call/index.html b/blog/2022/12/07/community-call/index.html index be6dd0371..ddb9142cf 100644 --- a/blog/2022/12/07/community-call/index.html +++ b/blog/2022/12/07/community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On our last community call on December 1st, we heard about the new Frictionless Data - CKAN integration from senior developer Edgar Zanella.

    As this is a much-awaited integration that the community has long been requesting, there are several projects aiming to integrate Frictionless Data with CKAN:

    1. Datapackager CKAN Extension - allowing the import of Data Packages directly to CKAN, and the export of any dataset in your portal as a Data Package
    2. CKAN Validation Extension - providing all the Frictionless Framework validation functionalities to your CKAN portal
    3. CKAN Data Portal supported by Frictionless Framework - providing an easy way to load Data Packages to and from your CKAN portal, using CKAN control
    4. Frictionless CKAN Mapper - a small Python library working behind the scenes to convert dataset formats from CKAN to Frictionless Packages, and vice versa.

    Check out Edgar’s presentation to know more about these projects and to see them demoed:

    If you use any Frictionless Data - CKAN integration, please let us know! We would love to hear what you think.

    Here are all the repos:

    # Join us next month!

    Next community call is on December 22nd. We don’t have any presentation scheduled yet, so if you have a cool project that you would like to show to the community, just let us know! You can fill out this form (opens new window), or come and tell us on our community chat on Slack (opens new window) (also via Matrix (opens new window)). See you there!

    Also, you can sign up for the call already here (opens new window).

    Do you want to share something with the community? Let us know when you sign up.

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2023/01/06/datapackage-as-a-service/index.html b/blog/2023/01/06/datapackage-as-a-service/index.html index 16f2a3c8b..7e7eb586b 100644 --- a/blog/2023/01/06/datapackage-as-a-service/index.html +++ b/blog/2023/01/06/datapackage-as-a-service/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On December 22nd, for our last community call of the year, we had a nice discussion with Oleg Lavrovsky, an old friend of Open Knowledge Foundation, board member of the Swiss chapter, and valued member of the Frictionless Data community, about Data Package as a Service.

    Oleg, together with Thorben Westerhuys (remember the spatiotemporal COVID-19 vaccination tracker he presented in March 2021 (opens new window)?), already made a first attempt at this in 2019, as you can see in this GitHub repo here (opens new window). The repository works as a template to create a quick API around your Frictionless Data Package. This solution is based on the Falcon micro framework (opens new window) and the Pandas Data Package Reader (opens new window).

    More recently, Edgar Zanella from the Frictionless Data core team also worked on an experimental solution (opens new window), converting a Data Package to a SQLite database and using Datasette (opens new window) to have a JSON API (opens new window) over the data. The advantage of this solution is that the way of querying the data is going to be familiar to those who know SQL (opens new window).
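
    To give a feel for the idea, here is a minimal sketch using only the Python standard library. It is not Edgar’s solution and does not use Datasette: the table name, column names and sample rows are made up, and every column is simply stored as text. It loads one CSV resource into an in-memory SQLite database and returns a query result as JSON.

```python
import csv
import io
import json
import sqlite3

# Hypothetical stand-in for one tabular resource of a Data Package.
CSV_TEXT = "id,species,specimens\n1,fern,12\n2,moss,3\n"


def load_resource(conn, table, csv_text):
    """Load CSV text into a SQLite table (all columns stored as TEXT)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', records)


def query_as_json(conn, sql, params=()):
    """Run a SQL query and return the rows as a JSON array of objects."""
    cursor = conn.execute(sql, params)
    names = [column[0] for column in cursor.description]
    return json.dumps([dict(zip(names, row)) for row in cursor.fetchall()])


conn = sqlite3.connect(":memory:")
load_resource(conn, "herbarium", CSV_TEXT)
result = query_as_json(
    conn,
    "SELECT species FROM herbarium WHERE CAST(specimens AS INTEGER) > ?",
    (10,),
)
print(result)  # [{"species": "fern"}]
```

    A real service would put an HTTP layer in front of `query_as_json`, which is the part Datasette provides out of the box.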

    Then in November 2022, during the GLAMhack 2022 in Mendrisio, an API for Frictionless Data Packages was needed again to be able to sort data and view it on a map. The end result was a Living Herbarium app (opens new window).

    So Oleg decided to pitch the idea of Frictionless Data Packages as services, as a challenge at the DINAcon hacknights (opens new window) in Bern. The challenge was not picked by anyone at the hackathon itself, but it sparked a conversation in our community chat (opens new window).

    If you are also interested in joining the conversation, just get on the thread in the community chat. If you need a bit of context, you can of course rewatch Oleg’s presentation:

    It was also noted during the call that 2 other excellent ways to get a quick API for Frictionless Data Packages are:

    • The Flat Data project (opens new window), developed on top of an idea by Simon Willison, which lets you (among other things) get a quick API for your Data Package.

    • CKAN, since CKAN provides APIs. For example via CKAN-embed (opens new window), a widget for embedding live data searches from CKAN data portals into external websites.

    # Join us next month!

    Next community call is on January 26th and we are going to hear about Frictionless Data and DCAT from Matteo Fortini.

    You can sign up for the call already here (opens new window). Do you want to share something with the community? Let us know when you sign up.

    And if you have a cool project that you would like to show to the community, please let us know! You can just fill out this form (opens new window), or come and tell us on our community chat on Slack (opens new window) (also via Matrix (opens new window)). See you there!

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2023/01/31/frictionless-at-fosdem/index.html b/blog/2023/01/31/frictionless-at-fosdem/index.html index d67e6d577..5435e8955 100644 --- a/blog/2023/01/31/frictionless-at-fosdem/index.html +++ b/blog/2023/01/31/frictionless-at-fosdem/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@

    We are very excited to announce that we are going back to FOSDEM this year! Frictionless Data Technical Leader at Open Knowledge Foundation, Evgeny Karev, will give an overview of the main functionalities of the upcoming and much awaited Frictionless Application.

    The talk will be featured in the Open Research Tools and Technology devroom, alongside other super interesting talks. Have a look at the schedule here (opens new window) to know more. This devroom provides a place and time to discuss the issues related to the creation and usage of open research technologies, with the ambition to foster discussions between designers, developers and users, bridging multiple knowledge-based communities together, and with the broader FLOSS community.

    While FOSDEM is going back to Brussels for a physical event this year, the decision was made to keep part of the programme online to allow a broader audience to join. The Frictionless Application overview will be online as well, so you can follow it and participate in the discussion from wherever you are, even if you are not in Brussels.

    This will be our 4th year at FOSDEM, and we are very proud to have been part of the Open Research devroom since its creation 4 years ago. To celebrate this, let’s look back at some of the great moments we shared:

    In 2020 Lilly Winfree presented on-site “Frictionless Data for Reproducible Research” (you can watch the video recording here (opens new window)), a talk in which she discussed the technical ideas behind Frictionless Data for research, and showcased collaborative use cases, particularly the BCO-DMO pilot (opens new window), which was quite new at the time. Lilly showed how implementing Frictionless Data tooling in their data ingest pipelines allowed BCO-DMO to integrate disparate data while maintaining quality metadata in an easy-to-use interface. You can find all the info about this talk in the FOSDEM archive here (opens new window).

    In 2021, for the first online FOSDEM due to COVID-19, Carles Pina i Estany presented Schema-collaboration (opens new window), an open tool for reproducible research that helps data managers and researchers collaborate on documenting datasets, built using the Frictionless Data specifications and software. Schema-collaboration was developed during the 2020 Tool Fund. For the FOSDEM presentation Carles gave an overview of the tool and did a demo as well.
    Find all the info (including a video recording of the presentation) on the FOSDEM archive here (opens new window).

    In 2022 our Technical Lead, Evgeny Karev, presented the newly released (at the time) Livemark (opens new window), a tool to publish data articles with interactive tables, charts, and other elements very easily, without leaving a text editor.
    Have a look at the presentation and find all the information about it on the FOSDEM archive here (opens new window).

    And what about this year? We wanted to release the Frictionless Application for FOSDEM, but we had to push that back a little bit. But as we said above, Evgeny is still giving an overview of the Application (opens new window) and some of its main features, which you should definitely not miss! See you online on Saturday, and happy FOSDEM weekend to you all!

    - + diff --git a/blog/2023/02/06/community-call/index.html b/blog/2023/02/06/community-call/index.html index f297ea048..8d96fe4ea 100644 --- a/blog/2023/02/06/community-call/index.html +++ b/blog/2023/02/06/community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    At our last community call on January 26th, we had Matteo Fortini from the Italian National Department of Digital Transformation, who led a discussion about DCAT and Frictionless Data.

    Open data is key to ensure transparency and accountability, understand the world, and have an economy of data. The open data publishing chain in Europe starts with distribution of datasets that go into a national catalogue, which is then harvested by an EU catalogue - all this enabled by metadata.

    In practice, Matteo and his colleagues would publish the data (e.g. on the Next Generation EU funds, or on the National Population Registry) as Frictionless Data with DCAT metadata, a format that is mandatory to get into the EU catalogue.

    The data is gathered on GitHub (a CKAN instance is sadly not available yet) through scripts that are run every day. The data is published in both CSV and JSON format, with foreign keys to other tabular data (e.g. geographical data for municipalities) and Frictionless metadata to have a standard way to document all the different attributes of the data, to enforce constraints, and ensure data quality in general. On top of that there is the Italian DCAT_AP, and the mandatory attributes for metadata.
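
    As an illustration of what constraints and foreign keys look like in practice, here is a small sketch. The descriptor follows the Table Schema layout (`constraints` on fields, `foreignKeys` with a `reference`), but the resource names, field names and rows are invented for the example, and the checker is a toy written for this post, not the validation performed by the Frictionless Framework.

```python
# Hypothetical reference table (e.g. geographical data for municipalities).
municipalities = [
    {"code": "001", "name": "Alpha"},
    {"code": "002", "name": "Beta"},
]

# Hypothetical Table Schema for a table that points at the one above.
residents_schema = {
    "fields": [
        {"name": "municipality_code", "type": "string",
         "constraints": {"required": True}},
        {"name": "residents", "type": "integer",
         "constraints": {"minimum": 0}},
    ],
    "foreignKeys": [
        {"fields": "municipality_code",
         "reference": {"resource": "municipalities", "fields": "code"}},
    ],
}

residents = [
    {"municipality_code": "001", "residents": 1200},
    {"municipality_code": "999", "residents": 50},  # unknown municipality
    {"municipality_code": "002", "residents": -3},  # below the minimum
]


def check_rows(rows, schema, resources):
    """Return (row index, field name, rule) for every violated rule."""
    errors = []
    for index, row in enumerate(rows):
        for field in schema["fields"]:
            value = row.get(field["name"])
            rules = field.get("constraints", {})
            if rules.get("required") and value in (None, ""):
                errors.append((index, field["name"], "required"))
            if "minimum" in rules and value is not None and value < rules["minimum"]:
                errors.append((index, field["name"], "minimum"))
        for key in schema.get("foreignKeys", []):
            reference = key["reference"]
            allowed = {r[reference["fields"]] for r in resources[reference["resource"]]}
            if row.get(key["fields"]) not in allowed:
                errors.append((index, key["fields"], "foreignKey"))
    return errors


errors = check_rows(residents, residents_schema, {"municipalities": municipalities})
print(errors)
# [(1, 'municipality_code', 'foreignKey'), (2, 'residents', 'minimum')]
```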

    While DCAT is very useful to understand the content, the themes, and the licences, Frictionless Data goes down to attribute descriptions, data types and constraints. So what Matteo would like to have in the future is one type of metadata that would cover both the data description and attributes, and the catalogue information.

    Some efforts were already made in the past by community members Augusto Herrmann and Ayrton Bourne to map data packages to DCAT (as documented in this issue (opens new window)). Now Matteo and his colleagues are actively looking for other people who would be interested in creating a working group about this, to try to get to some kind of shared standard.

    Other community members present at the call shared their own experience with Frictionless and DCAT:

    The German State of Schleswig-Holstein shared a very interesting example (opens new window) from their portal. As they did not find a good way to attach the Frictionless Specification to the DCAT Distribution, they created a separate distribution for the Frictionless Tabular Data Resource. Switzerland took the same approach, linking the Frictionless Specification as a separate distribution, as you can see in this example (opens new window). They are unsure about this approach though, as it seems to be a misuse of the DCAT Class.

    To make Frictionless Data more interoperable with other semantic web standards, Dan Feder suggested creating an RDF or JSON-LD specification, something that had already been discussed in the past, as documented in this issue (opens new window).

    Do you have anything to add to this? Are you interested in joining the open discussion? Let us know in our community chat on Slack (opens new window) or Matrix (opens new window).

    If you want to know more about Matteo’s presentation, here’s the recording:

    # Join us next month!

    Next community call is on February 23rd and we are going to hear about the database curation software for the World Glacier Monitoring Service (WGMS) from Ethan Welty.

    You can sign up for the call already here (opens new window). Do you want to share something with the community? Let us know when you sign up.

    And if you have a cool project that you would like to show to the community, please let us know! You can just fill out this form (opens new window), or come and tell us on our community chat.

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2023/03/01/february-community-call/index.html b/blog/2023/03/01/february-community-call/index.html index daaf6f699..3217c38da 100644 --- a/blog/2023/03/01/february-community-call/index.html +++ b/blog/2023/03/01/february-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    At our last community call on February 28th, we heard about generating spreadsheet templates from Tabular Data Package metadata from Ethan Welty.

    Ethan works for the World Glacier Monitoring Service (WGMS) (opens new window), which maintains and curates a single long-running dataset (with entries dating back to 1894!) combining satellite measurements with manual submissions from scientists around the world who go out to glaciers and measure the mass changes on the ground.

    One of their biggest challenges is that parts of the data are not machine-generated, but entered by humans. It is therefore important to review the data submissions to try and catch any possible errors. To do that, Ethan adopted the Frictionless Tabular Data Package approach, moving as much of the organisation logic and data management as possible into centralised metadata.

    Plus, to help people doing their data entry, they have spreadsheet templates automatically generated. The file is built in a markup language, and is generated from the validation pipeline (which works in a slightly different way than in Frictionless Data, as it scales to a much longer pipeline). The template generator, called Tablecloth, currently supports Excel, as that is what most people who work with the WGMS are comfortable using, and it is soon going to support Google Sheets too.
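
    The general idea of deriving a data entry template from metadata can be sketched in a few lines. This is not how Tablecloth works internally (Tablecloth produces Excel files, and its own logic is not described here): the sketch below writes plain CSV text instead, and the schema and field names are hypothetical.

```python
import csv
import io

# Hypothetical Table Schema for a submission table.
schema = {
    "fields": [
        {"name": "glacier_id", "type": "integer", "constraints": {"required": True}},
        {"name": "survey_date", "type": "date", "constraints": {"required": True}},
        {"name": "mass_change", "type": "number"},
    ]
}


def build_template(schema):
    """Return CSV text with a header row and a row of data entry hints."""
    names, hints = [], []
    for field in schema["fields"]:
        required = field.get("constraints", {}).get("required", False)
        names.append(field["name"])
        hints.append(field["type"] + (" (required)" if required else ""))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(names)
    writer.writerow(hints)
    return buffer.getvalue()


template = build_template(schema)
print(template)
# glacier_id,survey_date,mass_change
# integer (required),date (required),number
```

    Because the template is generated from the same metadata that drives validation, the two cannot drift apart, which is the main appeal of the approach.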

    If you want to know more about Tablecloth and are interested in having a look at the demo Ethan did on the call, go ahead and have a look at the recording of the presentation:

    You can also check out Tablecloth on GitHub (opens new window) and GitLab (opens new window).

    # Join us next month!

    Next community call is on March 30th and guess what? We do not have any presentations scheduled yet! So this could be your moment to come and tell us about your project! If you are interested in doing so just fill out this form (opens new window), or come and tell us on our community chat.

    You can sign up for the call already here (opens new window). Do you want to share something with the community? Let us know when you sign up.

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2023/04/06/march-community-call/index.html b/blog/2023/04/06/march-community-call/index.html index 8ba56f9cf..3440811de 100644 --- a/blog/2023/04/06/march-community-call/index.html +++ b/blog/2023/04/06/march-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    At our last community call on March 30th, our very own Evgeny Karev, tech lead of the Frictionless Data project at Open Knowledge Foundation (opens new window), presented the new Frictionless command line features.

    The new commands have been developed as part of the effort of building recommended data workflows for different needs, and might be particularly useful for data wrangling and data exploration. Here they are:

    • List function is a new command to quickly see lists of resources in a dataset.
    • Describe, an old command actually, but that can be part of the exploration workflow as it infers Table Schemas for all tabular resources.
    • Extract, also an old command, can be used to understand what kind of data is in the table, and get a preview of it.
    • Explore, to use in combination with Visidata (opens new window) to edit tables directly in the command line.
    • Query which will put a dataset into a SQLite database, with everything indexed, adding nice functionalities, like the possibility of saving queries as CSV files.
    • Script is a feature that allows dataset indexing and will create Pandas dataframes for you.
    • Convert, a work-in-progress command that can be used to convert from one format to the other, something that was historically done with the Extract function in the Framework.
    • Publish is also a work-in-progress command, and you can use it to upload your dataset to a data portal (e.g. a CKAN instance) just providing an API key.
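
    To make the “infers Table Schemas” part of Describe a bit more concrete, here is a deliberately simplified sketch of type inference over a CSV. It is an illustration only: the function names are made up, it only knows three types, and it is not the algorithm used by the Frictionless Framework.

```python
import csv
import io


def infer_type(values):
    """Guess the narrowest of integer, number or string that fits all values."""
    present = [value for value in values if value != ""]
    if not present:
        return "string"  # nothing to go on, so fall back to the widest type
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in present:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Build a minimal Table Schema-like dict from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    columns = list(zip(*records))
    return {
        "fields": [
            {"name": name, "type": infer_type(column)}
            for name, column in zip(header, columns)
        ]
    }


inferred = infer_schema("id,name,price\n1,apple,0.5\n2,pear,1\n")
print(inferred)
# {'fields': [{'name': 'id', 'type': 'integer'}, {'name': 'name', 'type': 'string'}, {'name': 'price', 'type': 'number'}]}
```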

    To better understand how you can use all these new commands, have a look at Evgeny’s presentation and demo:

    # Join us next month!

    Next community call is on April 27th. Keith Hughitt will share with us his ideas on how to improve support for non-tabular data, a proposed abstract data model, and a specification for describing the relationship between datasets.

    Do you have something you would like to present to the community at one of the upcoming calls? Let us know via this form (opens new window), or come and tell us on our community chat on Slack (opens new window) (also accessible via a Matrix bridge (opens new window) if you prefer to use an open protocol).

    You can sign up for the call already here (opens new window). Do you want to share something with the community? Let us know when you sign up.

    # Call Recording

    On a final note, here is the recording of the full call, including the short presentation and community discussion on the project governance:

    - + diff --git a/blog/2023/05/08/april-community-call/index.html b/blog/2023/05/08/april-community-call/index.html index 40bda0897..625efd19c 100644 --- a/blog/2023/05/08/april-community-call/index.html +++ b/blog/2023/05/08/april-community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    At our last community call on April 27th Keith Hughitt presented his ideas on how we can improve support for non-tabular data, and on how we could build a specification for describing the relationship between datasets. It took me some time to write this recap blog, because some of the reflections that Keith shared with us resonated very much with some of the thinking we have been doing at Open Knowledge Foundation around governance. I had explained during the March community call (opens new window) that the governance of the specs has been recently unblocked, and we are starting to think about how to get to v2. It was actually Keith who urged me to do that presentation to clarify the project governance (and I am so glad he did!).

    Keith’s main goals are pretty clear:

    1. He wants datasets to be self-contained and well defined enough to be combinable with minimal effort. Datasets should function like Lego blocks, which is the way Frictionless Data works too.
    2. He wants transparency on how the data is processed and communicated, as this is key to reproducibility.

    At the moment the Frictionless Data specs have a strong focus on tabular data, and Keith would like to extend that same kind of support to other types of data as well. Having some kind of common spec would be very useful for all those who work with more than one type of data, and he feels something can be done to make that work easier.

    # So what does Keith have in mind?

    He argues that we should separate the description of structure (data types) and domain (fields that are included in one discipline). This is easy to achieve because Frictionless is modular by design.

    We should take some intentional action to design a high-level model, so that even if we leave it to community members to build domain-specific specs, the core Frictionless team at Open Knowledge Foundation would oversee that they all still have a common core data model which allows all the different extensions to interact easily.

    Keith suggests using a mix-in approach, where the domain-specific schema would be made by combining specs (data type/structure + data domain). This would make sense to avoid redundancy in the code structure.

    It would be important to have a working group with representatives from different disciplines, working in different capacities, to build this common data model together in a way that really fits the needs of everyone (or at least finds some minimal common ground). This is exactly the direction we would like the project to move in. We are working on it, so stay tuned!

    Meanwhile, if you want to know more about Keith’s ideas, you can watch the recording of his presentation:

    # Join us next month!

    Next community call is on May 25th.

    Do you have something you would like to present to the community at one of the upcoming calls? Let us know via this form (opens new window), or come and tell us on our community chat on Slack (opens new window) (also accessible via a Matrix bridge (opens new window) if you prefer to use an open protocol).

    You can sign up for the call already here (opens new window). Do you want to share something with the community? Let us know when you sign up.

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2023/05/12/csv-conf/index.html b/blog/2023/05/12/csv-conf/index.html index 84c4572fc..f32662997 100644 --- a/blog/2023/05/12/csv-conf/index.html +++ b/blog/2023/05/12/csv-conf/index.html @@ -38,7 +38,7 @@ - + @@ -119,6 +119,6 @@ Watch the recording of Augusto’s talk here (opens new window).

    FastETL

    Defining The Turing Way by Malvika Sharan, Melissa Black, and Esther Plomp:
    Malvika Sharan, Melissa Black, and Esther Plomp discussed The Turing Way, an initiative aimed at providing reproducible, ethical, and inclusive research practices. In their talk, Frictionless Data was acknowledged as a useful tool to achieve these goals. By adopting Frictionless Data’s principles and tools, researchers can ensure data integrity, enhance collaboration, and promote openness in their work.
    Watch the recording of the talk here (opens new window). And if you are interested in knowing more about the Turing Way, go and have a look at the recap blog (opens new window) of the community call of April last year, with Turing Way’s community manager Anne Steele.

    Defining the Turing Way

    - + diff --git a/blog/2023/06/05/community-call-fastetl/index.html b/blog/2023/06/05/community-call-fastetl/index.html index e4dde17cf..4b582c0e8 100644 --- a/blog/2023/06/05/community-call-fastetl/index.html +++ b/blog/2023/06/05/community-call-fastetl/index.html @@ -38,7 +38,7 @@ - + @@ -116,6 +116,6 @@ community-hangout

    At our last community call on May 25th, Augusto Herrmann presented FastETL, a free and open source software library for Apache Airflow that makes it easier to integrate heterogeneous data sources and to publish open data (e.g. to CKAN data portals).

    Augusto told us how the data engineering team at the Secretariat for Management and Innovation in the Brazilian federal government has been using FastETL in combination with the Frictionless Framework and Tabular Data Packages to process data pipelines and publish open data.

    Augusto and his team have developed FastETL, among other things, to be able to periodically synchronise data sources in the data lake, publish open data on open data portals, and be notified about publications in the official gazette.

    Some of the things that you can do with FastETL are:

    • Full or incremental replication of tables in SQL Server and Postgres databases (and MySQL sources).
    • Load data from GSheets and from spreadsheets on Samba/Windows networks.
    • Extract CSVs from SQL.
    • Query the Brazilian National Official Gazette’s API, and get a notification when there is a new publication in the Official Gazette.
    • Use CKAN or dados.gov.br (opens new window)’s API to update dataset metadata.
    • Use Frictionless Tabular Data Packages to write data dictionaries in OpenDocument Text format.

    Would you like to know more? You can have a look at Augusto’s slides on his website here (opens new window), or check out the FastETL GitHub Repository (opens new window).
    And if you want to better understand how to use FastETL, have a look at Augusto’s presentation, with some great data pipeline examples:

    # Join us next month!

    Next community call is on June 29th, and it will be a hands-on session on strange datasets and how to describe them! Jesper Zedlitz from the German federal state of Schleswig-Holstein will be bringing one. Let us know if you would also like to bring a dataset to this call, by emailing Sara Petti sara.petti[at]okfn.org (opens new window).

    Do you have something you would like to present to the community at one of the upcoming calls? Let us know via this form (opens new window), or come and tell us on our community chat on Slack (opens new window) (also accessible via a Matrix bridge (opens new window) if you prefer to use an open protocol).

    You can sign up for the call already here (opens new window). Do you want to share something with the community? Let us know when you sign up.

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2023/07/05/community-call/index.html b/blog/2023/07/05/community-call/index.html index 6c58dea11..34a80b29f 100644 --- a/blog/2023/07/05/community-call/index.html +++ b/blog/2023/07/05/community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On June 29th we had our last monthly call, and it was kind of a special one! Instead of the usual project presentation, we had a hands-on session on strange datasets and how to describe them.

    Our community member Jesper Zedlitz regularly comes across very weird datasets in his day-to-day work, and had asked in the May community call whether it was possible to bring some of them to the call and try to make sense of them together with the community. This turned out to be an excellent idea for a fun call!

    So what kind of problems is Jesper encountering?

    • Sometimes we have extra information on the dataset, the licence, etc. at the beginning and comments at the end of the csv, so some rows need to be ignored. This is easy to do for the top part of the dataset, but it’s harder for the bottom part. Something we will definitely need to think about for the next iteration of the Frictionless specs, for example by giving the possibility to have a “headline row”, or something like that. This was a common problem for other community members too.

    • Sometimes we don’t have any information at all: Jesper showed us some CSVs without any header row, where it’s up to you to figure out what kind of data is in there.

    • The dialect (e.g. weird delimiters) and character encoding are sometimes tricky too, but that’s already easy to manage with the Frictionless specs.
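
    The kinds of problems listed above can be worked around by hand with very little code. The sketch below uses only the Python standard library and made-up sample data; `read_table` and its parameters are hypothetical names for this example, not part of the Frictionless specs or Framework. It handles a non-UTF-8 encoding, a semicolon delimiter, and extra lines both before and after the actual table.

```python
import csv

# Hypothetical export: two lines of preamble, a ";"-delimited table,
# and a trailing comment line, encoded as Latin-1 rather than UTF-8.
RAW = (
    "Titel: Übersicht\n"
    "Licence: CC0\n"
    "id;wert\n"
    "1;10\n"
    "2;20\n"
    "# end of file comment\n"
).encode("latin-1")


def read_table(data, encoding="utf-8", delimiter=",", skip_top=0, skip_bottom=0):
    """Decode bytes, drop leading and trailing lines, and parse the rest as CSV."""
    lines = data.decode(encoding).splitlines()
    body = lines[skip_top:len(lines) - skip_bottom]
    return list(csv.DictReader(body, delimiter=delimiter))


rows = read_table(RAW, encoding="latin-1", delimiter=";", skip_top=2, skip_bottom=1)
print(rows)
# [{'id': '1', 'wert': '10'}, {'id': '2', 'wert': '20'}]
```

    Note that the number of trailing lines has to be known in advance here, which is exactly why the bottom part of a file is the harder case to describe in metadata.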

    Do you want to know more about the strange datasets that Jesper has shown us during the call? Then you should watch the full recording of the call:

    # Join us in August!

    Exceptionally we won’t have any community call in July, so see you all on August 31st!

    Do you have something you would like to present to the community at one of the upcoming calls? Let us know via this form (opens new window), or come and tell us on our community chat on Slack (opens new window) (also accessible via a Matrix bridge (opens new window) if you prefer to use an open protocol).

    You can sign up for the call already here (opens new window). Do you want to share something with the community? Let us know when you sign up.

    - + diff --git a/blog/2023/10/02/community-call/index.html b/blog/2023/10/02/community-call/index.html index 089c17e12..abebea966 100644 --- a/blog/2023/10/02/community-call/index.html +++ b/blog/2023/10/02/community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    After 2 long months of absence, our monthly community call was finally back on September 28th, with some very exciting news! Our Tech Lead Evgeny Karev presented the work that has absorbed so much of his and Shashi Gharti’s time in the last months: the Frictionless no-code application Open Data Editor (opens new window).

    The problem that inspired this new tool is that, still today, there is no easy tool to manage and publish data for those who don’t have technical skills. The new Open Data Editor offers the possibility to access all Frictionless functionalities without having to write a single line of code or open your shell. Like most Frictionless products, Open Data Editor focuses on tabular data, and it can easily open big files because it uses a database under the hood (similarly to CKAN). You can use it to edit metadata, and declare some rules for opening that you can share with your collaborators, making your data more reproducible.

    You can use it to create data visualisations with VegaLite, but Open Data Editor also has AI support, which can create charts for you in case you don’t know how to use the VegaLite specifications. You can also publish data stories, and much much more! Check out Evgeny’s presentation to see all the great features of the Open Data Editor:

    The application is still a work in progress, but if you would like to try it out that’s of course absolutely possible, and we would love it if you could give us feedback, so please let us know if you spot anything weird. To make the experience smoother, we have a detailed documentation website (opens new window) you can consult.

    # Join us in October!

    Next community call is on October 26th. Join us to hear exciting news about the Frictionless specs (opens new window) update!

    Do you have something you would like to present to the community at one of the upcoming calls? Let us know via this form (opens new window), or come and tell us on our community chat on Slack (opens new window) (also accessible via a Matrix bridge (opens new window) if you prefer to use an open protocol).

    You can sign up for the call already here (opens new window). Do you want to share something with the community? Let us know when you sign up.

    # Call Recording

    On a final note, here is the recording of the full call:

    - + diff --git a/blog/2023/11/06/community-call/index.html b/blog/2023/11/06/community-call/index.html index c48243c2f..dcd098d1a 100644 --- a/blog/2023/11/06/community-call/index.html +++ b/blog/2023/11/06/community-call/index.html @@ -38,7 +38,7 @@ - + @@ -115,6 +115,6 @@ community-hangout

    On our last community call on October 25th, we started discussing the Frictionless specs update with the community. Thanks to the generous support of NLnet, the Frictionless core team, together with a working group composed of members of the community, will be focusing on this for the coming months.

    # What is this update all about?

    Our main goal is to make the current Frictionless specs a finished product. We have a list of GitHub issues that we will use as a starting point for this iteration, but we would like to follow decisions made by the working group.

    Please note, there will be no breaking changes (we can hear your sigh of relief!).

    As a next step, we will write a separate blog post that will serve as a reference for the overarching goals and roadmap of the project.

    The core Frictionless team at Open Knowledge Foundation will also draft a governance model for the review process and for how changes get merged. Ideally, we would like to build and test a new governance model that delegates more decisions to the community and that stays in place beyond the v2 release, to improve the project’s sustainability.

    Another key goal is to increase diversity, so that a broader range of perspectives is represented when decisions are made. We have a couple of ideas in mind, but we welcome any suggestions you may have.

    # Join us in November!

    The next community call is on November 30th. Join us to hear all the exciting news about the Frictionless specs update!

    Do you have something you would like to present to the community at one of the upcoming calls? Let us know via this form, or come and tell us on our community chat on Slack (also accessible via a Matrix bridge if you prefer to use an open protocol).

    You can already sign up for the call here. Do you want to share something with the community? Let us know when you sign up.

    # Call Recording

    Here is the recording of the full call:

    # Thank you

    On a final note, we would like to thank all the community members who joined the call and who keep these discussions alive, and those who expressed their interest in joining the specs working group. Without you, none of this would be possible.

    - + diff --git a/blog/index.html b/blog/index.html index e8f72a26a..d3025d14b 100644 --- a/blog/index.html +++ b/blog/index.html @@ -30,7 +30,7 @@ - + @@ -190,6 +190,6 @@
  • Matrix
  • - + diff --git a/blog/page/10/index.html b/blog/page/10/index.html index 57c08e30d..2043b25cb 100644 --- a/blog/page/10/index.html +++ b/blog/page/10/index.html @@ -30,7 +30,7 @@ - + @@ -183,6 +183,6 @@
  • Matrix
  • - + diff --git a/blog/page/11/index.html b/blog/page/11/index.html index 27a36ae49..d5de6adce 100644 --- a/blog/page/11/index.html +++ b/blog/page/11/index.html @@ -30,7 +30,7 @@ - + @@ -184,6 +184,6 @@
  • Matrix
  • - + diff --git a/blog/page/12/index.html b/blog/page/12/index.html index 294d463ef..0e3477f47 100644 --- a/blog/page/12/index.html +++ b/blog/page/12/index.html @@ -30,7 +30,7 @@ - + @@ -128,12 +128,6 @@
    July 16, 2018 by Frictionless Data

    This guide explores the options available to represent point location data in a CSV file within a Data Package.

    -

  • Packaging your Data -
    - July 16, 2018 by Frictionless Data -

    You can package any kind of data as a Data Package.

  • Validated tabular data
    July 16, 2018 by Frictionless Data @@ -143,7 +137,13 @@ Goodtables CLI field-guide -

    When it comes to validating tabular data, you have some great tools at your disposal. We take a look at a couple of ways to utilise goodtables.

  • When it comes to validating tabular data, you have some great tools at your disposal. We take a look at a couple of ways to utilise goodtables.

  • Packaging your Data +
    + July 16, 2018 by Frictionless Data +

    You can package any kind of data as a Data Package.

    +

  • Matrix
  • - + diff --git a/blog/page/13/index.html b/blog/page/13/index.html index 6279dad9a..0ec2e8f03 100644 --- a/blog/page/13/index.html +++ b/blog/page/13/index.html @@ -30,7 +30,7 @@ - + @@ -182,6 +182,6 @@
  • Matrix
  • - + diff --git a/blog/page/14/index.html b/blog/page/14/index.html index a75e28c81..c00cd4514 100644 --- a/blog/page/14/index.html +++ b/blog/page/14/index.html @@ -30,7 +30,7 @@ - + @@ -174,6 +174,6 @@
  • Matrix
  • - + diff --git a/blog/page/15/index.html b/blog/page/15/index.html index 47550bc94..698103316 100644 --- a/blog/page/15/index.html +++ b/blog/page/15/index.html @@ -30,7 +30,7 @@ - + @@ -173,6 +173,6 @@
  • Matrix
  • - + diff --git a/blog/page/16/index.html b/blog/page/16/index.html index 4b7fb8c6f..060b0ce68 100644 --- a/blog/page/16/index.html +++ b/blog/page/16/index.html @@ -30,7 +30,7 @@ - + @@ -128,12 +128,7 @@ March 28, 2017 by Ida Lucente

    Turnkey data to data science, analytics and software teams in healthcare industry.

  • Open Power System Data -
    - November 15, 2016 by Lion Hirth and Ingmar Schlecht -

    A free-of-charge and open platform providing the data needed for power system analysis and modeling.

  • Turnkey data to data science, analytics and software teams in healthcare industry.

  • Dataship
    November 15, 2016 by Frictionless Data

    A way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free.

  • A way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free.

  • Open Power System Data +
    + November 15, 2016 by Lion Hirth and Ingmar Schlecht +

    A free-of-charge and open platform providing the data needed for power system analysis and modeling.

  • Tesera
    November 15, 2016 by Spencer Cox
  • Matrix
  • - + diff --git a/blog/page/17/index.html b/blog/page/17/index.html index f5382a854..ab5c52088 100644 --- a/blog/page/17/index.html +++ b/blog/page/17/index.html @@ -30,7 +30,7 @@ - + @@ -133,6 +133,6 @@
  • Matrix
  • - + diff --git a/blog/page/2/index.html b/blog/page/2/index.html index 293156e40..65e7acd8e 100644 --- a/blog/page/2/index.html +++ b/blog/page/2/index.html @@ -30,7 +30,7 @@ - + @@ -182,6 +182,6 @@
  • Matrix
  • - + diff --git a/blog/page/3/index.html b/blog/page/3/index.html index 38c53211f..38a228345 100644 --- a/blog/page/3/index.html +++ b/blog/page/3/index.html @@ -30,7 +30,7 @@ - + @@ -184,6 +184,6 @@
  • Matrix
  • - + diff --git a/blog/page/4/index.html b/blog/page/4/index.html index 5eca489e9..a8170ed05 100644 --- a/blog/page/4/index.html +++ b/blog/page/4/index.html @@ -30,7 +30,7 @@ - + @@ -182,6 +182,6 @@
  • Matrix
  • - + diff --git a/blog/page/5/index.html b/blog/page/5/index.html index 9f90224ac..6b80acb33 100644 --- a/blog/page/5/index.html +++ b/blog/page/5/index.html @@ -30,7 +30,7 @@ - + @@ -182,6 +182,6 @@
  • Matrix
  • - + diff --git a/blog/page/6/index.html b/blog/page/6/index.html index be04dae47..8a155a4e3 100644 --- a/blog/page/6/index.html +++ b/blog/page/6/index.html @@ -30,7 +30,7 @@ - + @@ -188,6 +188,6 @@
  • Matrix
  • - + diff --git a/blog/page/7/index.html b/blog/page/7/index.html index 92c77ba58..77329689d 100644 --- a/blog/page/7/index.html +++ b/blog/page/7/index.html @@ -30,7 +30,7 @@ - + @@ -180,6 +180,6 @@
  • Matrix
  • - + diff --git a/blog/page/8/index.html b/blog/page/8/index.html index 662ff965f..97c51fab1 100644 --- a/blog/page/8/index.html +++ b/blog/page/8/index.html @@ -30,7 +30,7 @@ - + @@ -180,6 +180,6 @@
  • Matrix
  • - + diff --git a/blog/page/9/index.html b/blog/page/9/index.html index dd8cbe312..274d82f21 100644 --- a/blog/page/9/index.html +++ b/blog/page/9/index.html @@ -30,7 +30,7 @@ - + @@ -178,6 +178,6 @@
  • Matrix
  • - + diff --git a/design/index.html b/design/index.html index c332fb300..c46b8e2a1 100644 --- a/design/index.html +++ b/design/index.html @@ -30,7 +30,7 @@ - + @@ -111,6 +111,6 @@

    # Dark Logotype



    - + diff --git a/development/architecture/index.html b/development/architecture/index.html index 6726bda31..bd5314e97 100644 --- a/development/architecture/index.html +++ b/development/architecture/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

    # Frictionless Architecture

    Design


    - + diff --git a/development/process/index.html b/development/process/index.html index c53bf7db4..aa0b0df39 100644 --- a/development/process/index.html +++ b/development/process/index.html @@ -30,7 +30,7 @@ - + @@ -105,6 +105,6 @@

    # Frictionless Process

    This document proposes a process for working on the technical side of the Frictionless Data project. The goal is to keep things manageable at minimal cost.

    # Project

    What is specific to this project is the large number of components and actors (repositories, issues, contributors, etc.). The process should handle this effectively.

    # Process

    The main idea is to focus on getting things done and to reduce the cost of maintaining the process, rather than trying to fully mimic popular methodologies. We borrow ideas from several methodologies.

    # Roles

    • Product Owner (PO)
    • Product Manager (PM)
    • Developer Advocate (DA)
    • Technical Lead (TL)
    • Senior Developer (SD)
    • Junior Developer (JD)

    # Board

    We use a kanban board located at https://github.com/orgs/frictionlessdata/projects/2?fullscreen=true to work on the project. The board has the following columns (ordered by issue stage):

    • Backlog - unprocessed issues without labels and processed issues with labels
    • Priority - prioritized issues planned for the next iterations (estimated and assigned)
    • Current - current iteration issues promoted on iteration planning (estimated and assigned)
    • Review - issues under review process
    • Done - completed issues

    # Workflow

    Work on the project is a continuous process split into two-week iterations, separated by iteration planning sessions (which include a retrospective):

    • Within an iteration, assigned persons work on their current issues, and a subset of roles processes and prioritizes issues
    • During iteration planning, the team moves issues from the Priority column to the Current column and assigns persons. Instead of estimating each issue, the assigned person approves the amount of work for the current iteration as a high-level estimate.

    # Milestones

    As milestones we use concrete achievements, e.g. from our roadmap. These could be tools or spec versions like “spec-v1”. We don’t use workflow-related milestones like “current” or “backlog”; those are managed via the board’s labeling system.

    # Labels

    Aside from internal waffle labels and helper labels like “question”, we use core color-coded labels based on SemVer. The main point of processing issues from Inbox to Backlog is to add one of these labels, because we need to plan releases, announce breaking changes, etc.:

    labels

    # Assignments

    Every issue in the Current column should be assigned to a person, meaning “this person should do some work on this issue to unblock it”. The assigned person should re-assign the issue to whoever is currently blocking it. This provides a good real-time overview of the project.

    # Analysis

    After planning, it’s highly recommended that the assigned person write a short plan for solving the issue (it could be a list of steps) and ask someone to check it. This work could also be done at an earlier stage by a subset of roles.

    # Branching

    We use Git Flow with some simplifications (see OKI coding standards). The master branch should always be “green” on tests, and new features and fixes should come in through pull requests. Committing directly to master may be allowed for a subset of roles in some cases.

    # Pull Requests

    A pull request should be visually merged on the board with the corresponding issue by including the sentence “It fixes #issue-number” in the pull request description (the initial comment). If there is no corresponding issue for the pull request, it should be handled as an issue itself, including labeling.
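As an illustration of the convention above, here is a minimal sketch that extracts the issue numbers from a pull request description. The helper name `linked_issues` and the regular expression are assumptions made for this example; they are not part of any Frictionless or GitHub tooling.

```python
import re
from typing import List

# Hypothetical pattern mirroring the "It fixes #issue-number" sentence.
FIXES_PATTERN = re.compile(r"\bIt fixes #(\d+)", re.IGNORECASE)


def linked_issues(description: str) -> List[int]:
    """Return the issue numbers referenced by "It fixes #N" sentences."""
    return [int(number) for number in FIXES_PATTERN.findall(description)]


description = "Add a CSV dialect check.\n\nIt fixes #123"
print(linked_issues(description))
```

A description without the sentence yields an empty list, which a reviewer could treat as a signal that the pull request needs to be labeled and handled as an issue itself.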

    # Reviews

    After sending a pull request, the author should assign it to another person to ask for a code review. After the review, the code should be merged into the codebase by the pull request author (or by a person with sufficient rights).

    # Documentation

    By default, documentation for a tool should be written in README.md, without additional files and folders. It should be clean and well-structured. The API should be documented in the code as docstrings. We compile project-level docs automatically.

    # Testing

    Tests should be written following the OKI coding standards. Start writing tests from the top (matching high-level requirements) and work down to the bottom (if needed). The most high-level tests are implemented as test suites at the project level (integration tests between different tools).

    # Releasing

    We use SemVer for versioning and GitHub Actions for testing and releasing/deployments. We prefer short release cycles (features and fixes can be released immediately). Releases should be configured using tags, based on the package example workflows provided by OKI.

    The release process:

    • merge changes to the main branch on GitHub
      • use “Squash and Merge”
      • use clean commit message
    • pull the changes locally
    • update the software version according to SemVer rules
      • in Python projects we use <name>/assets/VERSION
      • in JavaScript projects we use the standard package.json
    • update the CHANGELOG file with information about new features or important changes
    • run main release (it will release automatically)
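The "update the software version according to SemVer rules" step above can be sketched as follows. This is a minimal illustration, assuming plain MAJOR.MINOR.PATCH version strings without pre-release or build suffixes; the function name `bump_version` is hypothetical and is not the project's release tooling.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a "major", "minor", or "patch" change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":  # breaking change: reset minor and patch
        return f"{major + 1}.0.0"
    if change == "minor":  # backwards-compatible feature: reset patch
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backwards-compatible fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("1.4.2", "minor"))
```

The three change types correspond to the SemVer-based labels mentioned in the Labels section, so the label on an issue indicates which part of the version to increment.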

    # References


    - + diff --git a/development/roadmap/index.html b/development/roadmap/index.html index 0e0121d8e..3b9eb7a4a 100644 --- a/development/roadmap/index.html +++ b/development/roadmap/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

    # Frictionless Roadmap


    - + diff --git a/feed.atom b/feed.atom index 0a52d6647..57c0a3593 100644 --- a/feed.atom +++ b/feed.atom @@ -2,7 +2,7 @@ https://frictionlessdata.io Frictionless Data - 2023-11-07T09:32:15.664Z + 2023-11-15T16:02:05.855Z https://github.com/webmasterish/vuepress-plugin-feed diff --git a/hackathon/README(pt-br).html b/hackathon/README(pt-br).html index eb787f121..6cd38808b 100644 --- a/hackathon/README(pt-br).html +++ b/hackathon/README(pt-br).html @@ -30,7 +30,7 @@ - + @@ -105,6 +105,6 @@

    # Join the Frictionless Data community for a two-day virtual Hackathon on 7 and 8 October!

    Registration is now open via this form: https://forms.gle/ZhrVfSBrNy2UPRZc9

    # What is a hackathon?

    You will work with a group of other Frictionless users to create new prototypes based on the project’s open source code. For example: use the new Livemark tool to create data storytelling websites, or React Components to add a data validation layer to your application.

    # Who can participate in this hackathon?

    We are looking for contributions of all sizes and skill levels! Some skills you could bring include: programming in Python (or other languages too!), writing documentation, project management, design, lots of ideas, and lots of enthusiasm! You will be on a team, so you can learn from and help each other. You don’t need to be familiar with Frictionless yet - you can learn during the event.

    # Why should I participate?

    First of all, because it will be fun! You will meet other Frictionless users and learn something new. This is also an opportunity to have the continuous support of the core Frictionless team in bringing your prototype to life. In addition, there will be prizes (details coming soon).

    # When will the hackathon take place?

    The hackathon will be virtual and will take place on 7 and 8 October. The event will start in the early morning of 7 October and end in the afternoon of 8 October, Brazilian time (exact times will be announced soon). This will allow people from around the world to participate at a time that works for them. We will use GitHub and Zoom to coordinate and work virtually. Teams will be able to form ahead of time so they can get organized and be ready when the time comes.

    # I want to sign up!

    Use this form to register. The event will be free, and we will also have some scholarships for participants who would otherwise be unable to attend. Apply for a US$300 scholarship using this scholarship form.

    # Which projects will be at the Hackathon?

    Projects range from a GeoJSON plug-in for frictionless-py, to Python code for working with Datapackages in CKAN, to building a static site that lists all Frictionless datasets on GitHub, to creating new tutorials for Frictionless code.
    All projects will be added to the event dashboard at https://frictionless-hackathon.herokuapp.com/event/1#top, maintained by DribDat.
    Interested in working on your own project? Send us an email!

    # I have questions…

    Email frictionlessdata@okfn.org if you have questions or want to support the Hackathon.


    - + diff --git a/hackathon/index.html b/hackathon/index.html index b15399753..0281dcc14 100644 --- a/hackathon/index.html +++ b/hackathon/index.html @@ -30,7 +30,7 @@ - + @@ -105,6 +105,6 @@

    # Join the Frictionless Data community for a two-day virtual Hackathon on 7-8 October!

    Registration is now open using this form: https://forms.gle/ZhrVfSBrNy2UPRZc9

    See the Participation Guide at the bottom for more info!

    # What’s a hackathon?

    You’ll work within a group of other Frictionless users to create new project prototypes based on existing Frictionless open source code. For example, use the new Livemark tool to create websites that display data-driven storytelling, or use Frictionless React Components to add data validation to your application.

    # Who should participate in this hackathon?

    We’re looking for contributions of all sizes and skill levels! Some skills that you could bring include: coding in Python (other languages supported too!), writing documentation, project management, having ideas, design skills, and general enthusiasm! You’ll be in a team, so you can learn from each other and help each other. You don’t have to be familiar with Frictionless yet - you can learn that during the event.

    # Why should I participate?

    First of all, it will be fun! You’ll meet other Frictionless users and learn something new. This is also an opportunity where you’ll have the uninterrupted support of the Frictionless core team to help you realize your prototype. Also, there will be prizes (details to be announced later).

    # When will the hackathon occur?

    The hackathon will be virtual and will take place on 7-8 October. The event will start at 9am CEST on 7 October, and will end at 6pm CEST on 8 October. This will allow people from around the world to participate during a time that works for them. We will be using GitHub and Zoom to coordinate and work virtually. Teams will be able to form before the event so you can start coordinating early and hit the ground running.

    # Sign me up!

    Use this form to register. The event will be free, and we will also have some scholarships for attendees who would otherwise be unable to attend. Apply for a $300 scholarship using this scholarship form.

    # What projects will be at the Hackathon?

    Projects will range from a GeoJSON Plugin for frictionless-py, to Python code to work with Datapackages in CKAN, to creating a static site to list all the Frictionless datasets on GitHub, to creating new tutorials for Frictionless code.
    All of the projects will be added to the event dashboard at https://frictionless-hackathon.herokuapp.com/event/1#top, powered by DribDat.
    Interested in working on your own project? Email us!

    # I have questions…

    Please email us at frictionlessdata@okfn.org if you have questions or would like to support the Hackathon.

    # Participation Guide

    (Here is a link to the Guide)


    - + diff --git a/img/blog/ Specs-Update-2023.png b/img/blog/ Specs-Update-2023.png new file mode 100644 index 000000000..6e4fe7f64 Binary files /dev/null and b/img/blog/ Specs-Update-2023.png differ diff --git a/index.html b/index.html index 5b02a116f..410ac82a7 100644 --- a/index.html +++ b/index.html @@ -33,7 +33,7 @@ - + @@ -113,6 +113,6 @@
  • Matrix
  • - + diff --git a/introduction/index.html b/introduction/index.html index 6a2663cf3..8ad3d53a0 100644 --- a/introduction/index.html +++ b/introduction/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

    # Frictionless Data

    Get a quick introduction to Frictionless in “5 minutes”.

    Frictionless Data is a progressive open-source framework for building data infrastructure – data management, data integration, data flows, etc. It includes various data standards and provides software to work with data.

    TIP

    This introduction assumes some basic knowledge about data. If you are new to working with data we recommend starting with the first module, “What is Data?”, at School of Data.

    # Why Frictionless?

    The Frictionless Data project aims to make it easier to work with data - by reducing common data workflow issues (what we call friction). Frictionless Data consists of two main parts, software and standards.

    Structure

    # Frictionless Software

    The software is based on a suite of data standards that have been designed to make it easy to describe data structure and content so that data is more interoperable, easier to understand, and quicker to use. There are several aspects to the Frictionless software, including two high-level data frameworks (for Python and JavaScript), 10 low-level libraries for other languages, like R, and also visual interfaces and applications. You can read more about how to use the software (and find documentation) on the projects page.

    For example, here is a validation report created by the Frictionless Repository software. Data validation is one of the main focuses of Frictionless Data, and this report is a good visual representation of how the project can help reveal common problems when working with data.

    Report

    # Frictionless Standards

    The Standards (aka Specifications) help to describe data. The core specification is called a Data Package, which is a simple container format used to describe and package a collection of data files. The format provides a contract for data interoperability that supports frictionless delivery, installation and management of data.

    A Data Package can contain any kind of data. At the same time, Data Packages can be specialized and enriched for specific types of data so there are, for example, Tabular Data Packages for tabular data, Geo Data Packages for geo data, etc.

    To learn more about Data Packages and the other specifications, check out the projects page or watch this video to learn more about the motivation behind packaging data.

    # How can I use Frictionless?

    You can use Frictionless to describe your data (add metadata and schemas), validate your data, and transform your data. You can also write custom data standards based on the Frictionless specifications. For example, you can use Frictionless to:

    • easily add metadata to your data before you publish it.
    • quickly validate your data to check the data quality before you share it.
    • build a declarative pipeline to clean and process data before analyzing it.

    Usually, new users start by trying out the software. The software gives you an ability to work with Frictionless using visual interfaces or programming languages.

    As a new user you might not need to dive too deeply into the standards, as our software encapsulates their concepts. On the other hand, once you feel comfortable with Frictionless Software, you might start reading the Frictionless Standards to get a better understanding of what happens under the hood or to create your metadata descriptors more proficiently.

    # Who uses Frictionless?

    The Frictionless Data project has a very diverse audience, ranging from climate scientists, to humanities researchers, to government data centers.

    Audience

    During our project development we have had various collaborations with institutions and individuals. We keep track of our Pilots and Case Studies with blog posts, and we welcome our community to share their experiences using our standards and software. Generally speaking, you can apply Frictionless in almost every field where you work with data. Your Frictionless use case could range from a simple data table validation to writing complex data pipelines.

    # Ready for more?

    As a next step, we recommend you start using one of our Software projects, get to know our Standards, or read about other users’ experiences in the Pilots and Case Studies sections. Also, we welcome you to reach out on Slack or Matrix to say hi or ask questions!


    - + diff --git a/people/index.html b/people/index.html index 026fce3a7..2881dfafa 100644 --- a/people/index.html +++ b/people/index.html @@ -30,7 +30,7 @@ - + @@ -179,6 +179,6 @@
    Lily Zhao

    Lily Zhao

    Work
    Work icons created by Freepik - Flaticon
    Reproducible Research Fellow 2019-2020
    City
    World icons created by Freepik - Flaticon
    USA

    # Code Contributors

    Frictionless Data is possible thanks to our awesome contributor community. You can click on the pictures below to see code contributions in detail. This is only a subset of all the people working on the project - please take a look at our GitHub Organization to view more. Are you interested in contributing? Check out our Contributing page to get started.

    project

    website

    specs

    datahub.io

    frictionless-py

    frictionless-js

    frictionless-r

    datapackage-py

    tableschema-py

    datapackage-js

    tableschema-js

    datapackage-rb

    tableschema-rb

    datapackage-php

    tableschema-php

    datapackage-java

    tableschema-java

    datapackage-go

    tableschema-go

    datapackage-r

    tableschema-r

    datapackage-swift

    tableschema-swift

    datapackage-jl

    tableschema-jl

    datapackage-clj

    tableschema-clj


    - + diff --git a/projects/index.html b/projects/index.html index ad72fce05..ba1033d34 100644 --- a/projects/index.html +++ b/projects/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

    # Frictionless Projects

    Open source projects for working with data.

    The Frictionless Data project provides a rich set of open source projects for working with data. There are tools, a visual application, and software for many programming platforms.

    TIP

    This document is an overview of the Frictionless Projects - for more in-depth information, please click on one of the projects below and you will be redirected to a corresponding documentation portal.

    # Software and Standards

    The following core Frictionless Projects are developed by the core Frictionless Team:

    Frictionless Application

    Data management application for Browser and Desktop for working with tabular data.

    Frictionless Framework

    Python framework to describe, extract, validate, and transform tabular data.

    Livemark

    Static site generator that extends Markdown with charts, tables, scripts, and more.

    Frictionless Repository

    GitHub Action allowing you to validate tabular data on every commit to your repository.

    Frictionless Standards

    Lightweight yet comprehensive data standards such as Data Package and Table Schema.

    Datahub

    A web platform built on Frictionless Data that allows discovering, publishing, and sharing data.

    # Which software is right for me?

    Choosing the right tool for the job can be challenging. Here are our recommendations:

    # Visual Interfaces

    If you prefer to use a visual interface:

    • Frictionless Application (coming soon): We’re working on our brand-new Frictionless Application that will be released in 2021. Until then, you can use Data Package Creator to create and edit data packages and Goodtables On-Demand for data validation.
    • Frictionless Repository: For ensuring the quality of your data on GitHub, Frictionless provides Frictionless Repository. This creates visual quality reports and validation statuses on GitHub every time you commit your data.
    • Datahub: For discovering, publishing, and sharing data we have Datahub, which is built on Frictionless software. Using this software as a service, you can sign in and find, share, and publish quality data.

    # Command-line Interfaces

    If you like to write commands in the command-line interface:

    • Frictionless Framework: For describing, extracting, validating, and transforming data, Frictionless provides the Frictionless Framework’s command-line interface. Using the “frictionless” command you can achieve many goals without needing to write Python code.
    • Livemark: For data journalists and technical writers we have a project called Livemark. Using the “livemark” command in the CLI you can publish a website that incorporates Frictionless functions and is powered by markdown articles.
    • Datahub: Frictionless provides a command-line tool called Data, which is an important part of the Datahub project. The “data” command is available for a JavaScript environment and it helps you to interact with data stored on Datahub.

    # Programming Languages

    If you want to use or write your own Frictionless code:

    • Frictionless Framework: For general data programming in Python, the Frictionless Framework is the way to go. You can describe, extract, validate, and transform your data. It’s also possible to extend the framework by adding new validation checks, transformation steps, etc. In addition, there is a lightweight version of the framework written in JavaScript.
    • Frictionless Universe: For Frictionless implementations in other languages like R or Java and visual components, we have Frictionless Universe. Each library provides metadata validation and editing along with other low-level data operations like reading or writing tabular files.

    # Which standard is right for me?

    To help you pick a standard to use, we’ve categorized them according to how many files you are working with.

    # Collection of Files

    If you have more than one file:

    • Data Package: Use a Data Package for describing datasets of any file format. Data Package is a basic container format for describing a collection of data in a single “package”. It provides a basis for convenient delivery, installation and management of datasets.
    • Fiscal Data Package: For fiscal data, use a Fiscal Data Package. This lightweight and user-oriented format is for publishing and consuming fiscal data. It is concerned with how fiscal data should be packaged and with providing the means for publishers to best convey the meaning of the data, so that it can be optimally used by consumers.
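To make the "container" idea above more concrete, here is a minimal sketch of a Data Package descriptor built with the Python standard library only. The package name, resource names, and file paths are invented for illustration, and the helper `resource_paths` is hypothetical; the Frictionless libraries offer much richer tooling for the same task.

```python
import json

# Hypothetical descriptor: a Data Package lists its files under "resources".
descriptor = {
    "name": "example-package",
    "title": "Example Package",
    "resources": [
        {"name": "cities", "path": "data/cities.csv", "format": "csv"},
        {"name": "notes", "path": "docs/notes.pdf", "format": "pdf"},
    ],
}


def resource_paths(package: dict) -> list:
    """Return the path of every resource listed in the descriptor."""
    return [resource["path"] for resource in package["resources"]]


# The descriptor is typically stored as "datapackage.json" next to the data.
print(json.dumps(descriptor, indent=2))
print(resource_paths(descriptor))
```

Because the descriptor is plain JSON, it can describe files of any format, which is what lets a single package carry both a CSV table and a PDF document.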

    # Individual File

    If you need to describe an individual file:

    • Data Resource: Use Data Resource for describing individual files. Data Resource is a format to describe and package a single data resource of any file format, such as an individual table or file. It can also be extended for specific use cases.
    • Tabular Data Resource: For tabular data, use the Data Resource extension called Tabular Data Resource. Tabular Data Resource describes a single tabular data resource such as a CSV file. It includes support for metadata and schemas to describe the data content and structure.
    • Table Schema: To describe only the schema of a tabular data file, use Table Schema. Table Schema is a format to declare a schema for tabular data. The schema is designed to be expressible in JSON. You can have a schema as independent metadata or use it with a Tabular Data Resource.
    • CSV Dialect: To specify the CSV dialect within a schema, use CSV Dialect. This defines a format to describe the various dialects of CSV files in a language agnostic manner. This is important because CSV files might be published in different forms, making it harder to read the data without errors. CSV Dialect can be used with a Tabular Data Resource to provide additional information.
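
To make the relationship between these standards more concrete, here is a minimal sketch of how a Table Schema and a CSV Dialect can sit inside a Tabular Data Resource, which in turn sits inside a Data Package. It uses only the Python standard library. The file name `cities.csv`, the field names, and the package name are hypothetical, and the property names reflect one reading of the v1 specifications, so check them against the specifications before relying on them.

```python
import json

# Table Schema: declares the fields (columns) of a tabular file.
# The field names and types here are made up for illustration.
schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "city", "type": "string"},
        {"name": "population", "type": "integer"},
    ]
}

# CSV Dialect: describes how the CSV file is written
# (assumed here: semicolon-delimited, with a header row).
dialect = {"delimiter": ";", "header": True}

# Tabular Data Resource: one tabular file plus its schema and dialect.
resource = {
    "name": "cities",
    "path": "cities.csv",  # hypothetical file
    "profile": "tabular-data-resource",
    "schema": schema,
    "dialect": dialect,
}

# Data Package: a container describing a collection of resources.
package = {
    "name": "example-package",  # hypothetical package name
    "resources": [resource],
}

# The descriptor is plain JSON, conventionally saved as datapackage.json.
descriptor = json.dumps(package, indent=2)
print(descriptor)
```

The schema could equally be kept as a standalone JSON document and used on its own, as the Table Schema entry above describes.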

    - + diff --git a/rss.xml b/rss.xml index b13104fab..4a5e93670 100644 --- a/rss.xml +++ b/rss.xml @@ -4,7 +4,7 @@ Frictionless Data https://frictionlessdata.io Data software and standards - Tue, 07 Nov 2023 09:32:15 GMT + Wed, 15 Nov 2023 16:02:05 GMT http://blogs.law.harvard.edu/tech/rss https://github.com/webmasterish/vuepress-plugin-feed diff --git a/tag/Clojure/index.html b/tag/Clojure/index.html index b6cf563fa..9e974ef6a 100644 --- a/tag/Clojure/index.html +++ b/tag/Clojure/index.html @@ -30,7 +30,7 @@ - + @@ -108,6 +108,6 @@
    May 7, 2018 by Matt Thompson

    A guide on how to use datapackage with Clojure

    - + diff --git a/tag/Command-line/index.html b/tag/Command-line/index.html index ba3afd3ec..d4fe6e82a 100644 --- a/tag/Command-line/index.html +++ b/tag/Command-line/index.html @@ -30,7 +30,7 @@ - + @@ -114,6 +114,6 @@ field-guide

    Concerned that your data is just not being used? We've got some great tips and best practices to improve the uptake of your data.

    - + diff --git a/tag/Data CLI/index.html b/tag/Data CLI/index.html index 175d81ff2..2faddcb41 100644 --- a/tag/Data CLI/index.html +++ b/tag/Data CLI/index.html @@ -30,7 +30,7 @@ - + @@ -114,6 +114,6 @@ Data CLI

    A way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free.

    - + diff --git a/tag/Data Package Creator/index.html b/tag/Data Package Creator/index.html index 60e310196..22764f271 100644 --- a/tag/Data Package Creator/index.html +++ b/tag/Data Package Creator/index.html @@ -30,7 +30,7 @@ - + @@ -126,6 +126,6 @@ field-guide

    There's an art to creating a good collection of data. Improve the quality of your datasets by making use of schemas, metadata, and data packages.

    - + diff --git a/tag/Data Package/index.html b/tag/Data Package/index.html index 3981ddd5e..09d3fb5bf 100644 --- a/tag/Data Package/index.html +++ b/tag/Data Package/index.html @@ -30,7 +30,7 @@ - + @@ -114,6 +114,6 @@ Data CLI

    A way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free.

    - + diff --git a/tag/Go/index.html b/tag/Go/index.html index 2020a8ee1..873618442 100644 --- a/tag/Go/index.html +++ b/tag/Go/index.html @@ -30,7 +30,7 @@ - + @@ -108,6 +108,6 @@
    February 16, 2018 by Daniel Fireman

    A guide on how to use datapackage with Go

    - + diff --git a/tag/Goodtables CLI/index.html b/tag/Goodtables CLI/index.html index 101982f40..044208e86 100644 --- a/tag/Goodtables CLI/index.html +++ b/tag/Goodtables CLI/index.html @@ -30,7 +30,7 @@ - + @@ -112,6 +112,6 @@ field-guide

    When it comes to validating tabular data, you have some great tools at your disposal. We take a look at a couple of ways to utilise goodtables.

    - + diff --git a/tag/Goodtables/index.html b/tag/Goodtables/index.html index cdedb536c..df6e801f4 100644 --- a/tag/Goodtables/index.html +++ b/tag/Goodtables/index.html @@ -30,7 +30,7 @@ - + @@ -110,6 +110,6 @@ field-guide

    Getting your data out into the world is a crucial step towards its being used and useful. We walk through the steps to publishing on the top data platforms.

    - + diff --git a/tag/Java/index.html b/tag/Java/index.html index 15a626eda..8d0d03a85 100644 --- a/tag/Java/index.html +++ b/tag/Java/index.html @@ -30,7 +30,7 @@ - + @@ -108,6 +108,6 @@
    April 28, 2018 by Georges Labrèche

    A guide on how to use datapackage with Java

    - + diff --git a/tag/JavaScript/index.html b/tag/JavaScript/index.html index 296357905..1bd2ce52c 100644 --- a/tag/JavaScript/index.html +++ b/tag/JavaScript/index.html @@ -30,7 +30,7 @@ - + @@ -108,6 +108,6 @@
    April 4, 2018 by Frictionless Data

    - + diff --git a/tag/Python/index.html b/tag/Python/index.html index 6ab6e1300..d270fc1a2 100644 --- a/tag/Python/index.html +++ b/tag/Python/index.html @@ -30,7 +30,7 @@ - + @@ -117,6 +117,6 @@
    July 21, 2016 by Frictionless Data

    A guide on how to create datapackages in Python

    - + diff --git a/tag/R/index.html b/tag/R/index.html index 318b8fa24..39d593c49 100644 --- a/tag/R/index.html +++ b/tag/R/index.html @@ -30,7 +30,7 @@ - + @@ -111,6 +111,6 @@
    February 14, 2018 by Kleanthis Koupidis

    A guide on how to create a datapackage with R

    - + diff --git a/tag/case-studies/index.html b/tag/case-studies/index.html index 82ce8c781..77d7d0c5b 100644 --- a/tag/case-studies/index.html +++ b/tag/case-studies/index.html @@ -30,7 +30,7 @@ - + @@ -135,6 +135,6 @@
    July 20, 2018 by Michael Amadi

    Nimble Learn's datapackage-m is a set of functions for working with Tabular Data Packages in Power BI Desktop and Power Query for Excel.

    - + diff --git a/tag/case-studies/page/2/index.html b/tag/case-studies/page/2/index.html index 2241495b8..6641efc77 100644 --- a/tag/case-studies/page/2/index.html +++ b/tag/case-studies/page/2/index.html @@ -30,7 +30,7 @@ - + @@ -128,10 +128,7 @@

    Allow users to download a version of a data.world dataset that retains the structured metadata and schema for offline analysis

  • John Snow Labs
    March 28, 2017 by Ida Lucente

    Turnkey data to data science, analytics and software teams in the healthcare industry.

  • Open Power System Data -
    November 15, 2016 by Lion Hirth and Ingmar Schlecht

    A free-of-charge and open platform providing the data needed for power system analysis and modeling.

  • Turnkey data to data science, analytics and software teams in the healthcare industry.

  • Dataship
    November 15, 2016 by Frictionless Data

    A way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free.

  • - +

    A way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free.

  • Open Power System Data +
    November 15, 2016 by Lion Hirth and Ingmar Schlecht

    A free-of-charge and open platform providing the data needed for power system analysis and modeling.

  • + diff --git a/tag/case-studies/page/3/index.html b/tag/case-studies/page/3/index.html index 56091ec81..d7c3ffa7c 100644 --- a/tag/case-studies/page/3/index.html +++ b/tag/case-studies/page/3/index.html @@ -30,7 +30,7 @@ - + @@ -108,6 +108,6 @@
    November 15, 2016 by Spencer Cox

    Creating data-driven applications in the cloud.

    - + diff --git a/tag/community-hangout/index.html b/tag/community-hangout/index.html index 1fae2130e..231f01150 100644 --- a/tag/community-hangout/index.html +++ b/tag/community-hangout/index.html @@ -30,7 +30,7 @@ - + @@ -155,6 +155,6 @@ community-hangout

    At our last community call Frictionless Data senior developer Edgar Zanella presented to the community the Frictionless - CKAN integration...

    - + diff --git a/tag/community-hangout/page/2/index.html b/tag/community-hangout/page/2/index.html index 89e1e7da3..bf433f1a2 100644 --- a/tag/community-hangout/page/2/index.html +++ b/tag/community-hangout/page/2/index.html @@ -30,7 +30,7 @@ - + @@ -155,6 +155,6 @@ community-hangout

    At our Frictionless Data community call we had Keith Hughitt presenting some ideas around representing data processing flows as a DAG inside of a datapackage.json...

    - + diff --git a/tag/community-hangout/page/3/index.html b/tag/community-hangout/page/3/index.html index 6db259f3e..3e69ed843 100644 --- a/tag/community-hangout/page/3/index.html +++ b/tag/community-hangout/page/3/index.html @@ -30,7 +30,7 @@ - + @@ -155,6 +155,6 @@ community-hangout

    On our last Frictionless Data community call on March 25th, we dealt with a very current topic thanks to Thorben Westerhuys, who presented his project on Frictionless Vaccination data.

    - + diff --git a/tag/community-hangout/page/4/index.html b/tag/community-hangout/page/4/index.html index 377d86a29..cde6ba976 100644 --- a/tag/community-hangout/page/4/index.html +++ b/tag/community-hangout/page/4/index.html @@ -30,7 +30,7 @@ - + @@ -155,6 +155,6 @@ community-hangout

    Invitation to our first virtual hangout in April 2020

    - + diff --git a/tag/csv/index.html b/tag/csv/index.html index f96f81ece..8f37c4504 100644 --- a/tag/csv/index.html +++ b/tag/csv/index.html @@ -30,7 +30,7 @@ - + @@ -109,6 +109,6 @@ csv

    This page provides an overview of the CSV (Comma Separated Values) format for data.

    - + diff --git a/tag/datapackage/index.html b/tag/datapackage/index.html index c52d6b2f8..25d229a6a 100644 --- a/tag/datapackage/index.html +++ b/tag/datapackage/index.html @@ -30,7 +30,7 @@ - + @@ -109,6 +109,6 @@ datapackage

    You can package any kind of data as a Data Package.

    - + diff --git a/tag/events/index.html b/tag/events/index.html index e58723d32..37553828d 100644 --- a/tag/events/index.html +++ b/tag/events/index.html @@ -30,7 +30,7 @@ - + @@ -151,6 +151,6 @@
    January 31, 2023 by Sara Petti

    We are very excited to announce that we are going back to FOSDEM this year!

    - + diff --git a/tag/events/page/2/index.html b/tag/events/page/2/index.html index e94de7f75..ce27da59b 100644 --- a/tag/events/page/2/index.html +++ b/tag/events/page/2/index.html @@ -30,7 +30,7 @@ - + @@ -153,6 +153,6 @@ community-hangout

    At our Frictionless Data community call we had a discussion with Johan Richer from Multi.coop around his theory of portal vs catalogue.

    - + diff --git a/tag/events/page/3/index.html b/tag/events/page/3/index.html index 39084c3e2..12aba19cc 100644 --- a/tag/events/page/3/index.html +++ b/tag/events/page/3/index.html @@ -30,7 +30,7 @@ - + @@ -149,6 +149,6 @@ community-hangout

    At our Frictionless Data community call we had Amber York and Adam Shepherd from BCO-DMO giving a presentation on Frictionless Data Pipelines for Ocean Science...

    - + diff --git a/tag/events/page/4/index.html b/tag/events/page/4/index.html index 57fb14128..8da71b383 100644 --- a/tag/events/page/4/index.html +++ b/tag/events/page/4/index.html @@ -30,7 +30,7 @@ - + @@ -155,6 +155,6 @@ community-hangout

    - + diff --git a/tag/events/page/5/index.html b/tag/events/page/5/index.html index 7148dd454..bf6753045 100644 --- a/tag/events/page/5/index.html +++ b/tag/events/page/5/index.html @@ -30,7 +30,7 @@ - + @@ -141,6 +141,6 @@ community-hangout

    Invitation to our first virtual hangout in April 2020

    - + diff --git a/tag/fellows/index.html b/tag/fellows/index.html index 4aed49821..c10b31890 100644 --- a/tag/fellows/index.html +++ b/tag/fellows/index.html @@ -30,7 +30,7 @@ - + @@ -135,6 +135,6 @@
    September 1, 2020 by Lilly Winfree

    We are very excited to introduce the newest Fellows for Cohort 2 of the Frictionless Data Reproducible Research Fellows Programme...

    - + diff --git a/tag/fellows/page/2/index.html b/tag/fellows/page/2/index.html index 9ab760afb..a08307a72 100644 --- a/tag/fellows/page/2/index.html +++ b/tag/fellows/page/2/index.html @@ -30,7 +30,7 @@ - + @@ -111,6 +111,6 @@
    August 29, 2019 by Lilly Winfree

    We are very excited to introduce the Frictionless Data for Reproducible Research Fellows Programme

    - + diff --git a/tag/field-guide/index.html b/tag/field-guide/index.html index ee522b9b0..c381a1b38 100644 --- a/tag/field-guide/index.html +++ b/tag/field-guide/index.html @@ -30,7 +30,7 @@ - + @@ -143,6 +143,6 @@ field-guide

    There's an art to creating a good collection of data. Improve the quality of your datasets by making use of schemas, metadata, and data packages.

    - + diff --git a/tag/goodtables.io/index.html b/tag/goodtables.io/index.html index 099d7d24a..0afc395e5 100644 --- a/tag/goodtables.io/index.html +++ b/tag/goodtables.io/index.html @@ -30,7 +30,7 @@ - + @@ -135,6 +135,6 @@ Data CLI

    A way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free.

    - + diff --git a/tag/grantee-profiles/index.html b/tag/grantee-profiles/index.html index 7ebd61699..8e0e1f61c 100644 --- a/tag/grantee-profiles/index.html +++ b/tag/grantee-profiles/index.html @@ -30,7 +30,7 @@ - + @@ -109,6 +109,6 @@ grantee-profiles

    This grantee profile features Georges Labreche for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    - + diff --git a/tag/index.html b/tag/index.html index 923ab6408..6034c445a 100644 --- a/tag/index.html +++ b/tag/index.html @@ -30,11 +30,11 @@ - + - - + + diff --git a/tag/licenses/index.html b/tag/licenses/index.html index 4c705d2d3..fb5017162 100644 --- a/tag/licenses/index.html +++ b/tag/licenses/index.html @@ -30,7 +30,7 @@ - + @@ -108,6 +108,6 @@
    March 27, 2018 by Frictionless Data

    A guide on applying licenses, waivers or public domain marks to datapackages

    - + diff --git a/tag/news/index.html b/tag/news/index.html index e776f4b72..cc883b22d 100644 --- a/tag/news/index.html +++ b/tag/news/index.html @@ -30,7 +30,7 @@ - + @@ -135,6 +135,6 @@
    January 13, 2021 by Sara Petti

    - + diff --git a/tag/news/page/2/index.html b/tag/news/page/2/index.html index 2c915c97b..e1dd9730c 100644 --- a/tag/news/page/2/index.html +++ b/tag/news/page/2/index.html @@ -30,7 +30,7 @@ - + @@ -111,6 +111,6 @@
    May 1, 2020 by Gift Egwuenu

    In this article, we talk about the reasons we decided to redesign our website, with a few highlights of the changes made.

    - + diff --git a/tag/pilot/index.html b/tag/pilot/index.html index 9f65efb77..972df02b6 100644 --- a/tag/pilot/index.html +++ b/tag/pilot/index.html @@ -30,7 +30,7 @@ - + @@ -137,6 +137,6 @@
    December 12, 2017 by Brook Elgie (OKI), Paul Walsh (OKI)

    Using Frictionless Data software to assess and report on data quality and make a case for generating visualizations with ensuing data and metadata.

    - + diff --git a/tag/pilot/page/2/index.html b/tag/pilot/page/2/index.html index 1efccdfb9..c968641cc 100644 --- a/tag/pilot/page/2/index.html +++ b/tag/pilot/page/2/index.html @@ -30,7 +30,7 @@ - + @@ -117,6 +117,6 @@
    June 26, 2017 by Sam Payne (PNNL), Joon-Yong Lee (PNNL), Dan Fowler (OKI)

    Using goodtables to validate metadata stored as part of a biological application on GitHub.

    - + diff --git a/tag/specifications/index.html b/tag/specifications/index.html index b8a6d5bcc..78ee3bd85 100644 --- a/tag/specifications/index.html +++ b/tag/specifications/index.html @@ -30,7 +30,7 @@ - + @@ -114,6 +114,6 @@ tabular-data

    The Frictionless specifications are helping with simplifying data validation for applications in production at the European Union.

    - + diff --git a/tag/specs/index.html b/tag/specs/index.html index e1a9ca44b..76db6e558 100644 --- a/tag/specs/index.html +++ b/tag/specs/index.html @@ -30,7 +30,7 @@ - + @@ -110,6 +110,6 @@ views

    Producers and consumers of data want to have their data presented in tables and graphs -- "views" on the data. This outlines a proposal for a Frictionless approach to this, including a spec and tooling.

    - + diff --git a/tag/table-schema/index.html b/tag/table-schema/index.html index d4b1023c5..a5c48d0cd 100644 --- a/tag/table-schema/index.html +++ b/tag/table-schema/index.html @@ -30,7 +30,7 @@ - + @@ -117,6 +117,6 @@
    April 23, 2020 by Johan Richer

    The issue of open data quality has been a prominent subject of discussion for years past.

    - + diff --git a/tag/tabular-data/index.html b/tag/tabular-data/index.html index 063f4ca4c..64c4016be 100644 --- a/tag/tabular-data/index.html +++ b/tag/tabular-data/index.html @@ -30,7 +30,7 @@ - + @@ -114,6 +114,6 @@ tabular-data

    The Frictionless specifications are helping with simplifying data validation for applications in production at the European Union.

    - + diff --git a/tag/team/index.html b/tag/team/index.html index e12878485..773a7c935 100644 --- a/tag/team/index.html +++ b/tag/team/index.html @@ -30,7 +30,7 @@ - + @@ -111,6 +111,6 @@
    March 20, 2020 by Gift Egwuenu

    Introducing a new team member - Gift Egwuenu

    - + diff --git a/tag/tool-fund/index.html b/tag/tool-fund/index.html index 674b1a405..12ad3f5c3 100644 --- a/tag/tool-fund/index.html +++ b/tag/tool-fund/index.html @@ -30,7 +30,7 @@ - + @@ -137,6 +137,6 @@ tool-fund

    This blog is part of a series showcasing projects developed during the 2019 Frictionless Data Tool Fund.

    - + diff --git a/tag/tool-fund/page/2/index.html b/tag/tool-fund/page/2/index.html index faf6e7345..d2d79a7a3 100644 --- a/tag/tool-fund/page/2/index.html +++ b/tag/tool-fund/page/2/index.html @@ -30,7 +30,7 @@ - + @@ -129,6 +129,6 @@ tool-fund

    This grantee profile features Matt Thompson for our series of Frictionless Data Tool Fund posts, written to shine a light on Frictionless Data’s Tool Fund grantees, their work and to let our technical community know how they can get involved.

    - + diff --git a/tag/try.goodtables.io/index.html b/tag/try.goodtables.io/index.html index 739f92b8d..d87d5a206 100644 --- a/tag/try.goodtables.io/index.html +++ b/tag/try.goodtables.io/index.html @@ -30,7 +30,7 @@ - + @@ -112,6 +112,6 @@ field-guide

    When it comes to validating tabular data, you have some great tools at your disposal. We take a look at a couple of ways to utilise goodtables.

    - + diff --git a/tag/validator/index.html b/tag/validator/index.html index e3a3a7eba..41ce96403 100644 --- a/tag/validator/index.html +++ b/tag/validator/index.html @@ -30,7 +30,7 @@ - + @@ -114,6 +114,6 @@ tabular-data

    The Frictionless specifications are helping with simplifying data validation for applications in production at the European Union.

    - + diff --git a/tag/views/index.html b/tag/views/index.html index 9a4bf9927..172453ecf 100644 --- a/tag/views/index.html +++ b/tag/views/index.html @@ -30,7 +30,7 @@ - + @@ -110,6 +110,6 @@ views

    Producers and consumers of data want to have their data presented in tables and graphs -- "views" on the data. This outlines a proposal for a Frictionless approach to this, including a spec and tooling.

    - + diff --git a/universe/index.html b/universe/index.html index 506a322d0..3597fbac6 100644 --- a/universe/index.html +++ b/universe/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

    Last Updated: 11/28/2022, 12:11:26 PM
    - + diff --git a/work-with-us/code-of-conduct/index.html b/work-with-us/code-of-conduct/index.html index d272afb29..e25ec912a 100644 --- a/work-with-us/code-of-conduct/index.html +++ b/work-with-us/code-of-conduct/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

    # Code of Conduct

    # Introduction

    The goal of this Code of Conduct is to make explicit the type of participation that is expected, and the behaviour that is unacceptable. These guidelines are to be adhered to by all Frictionless Data team members, all partners on a given project, and all other participants.

    This Code of Conduct applies to all the projects that Frictionless Data hosts/organises and describes the standards of behaviour that we expect all our partners to observe when taking part in our projects. We expect all voices to be welcomed at our events and strive to empower everyone to feel able to participate fully.

    # This Code is applicable to

    • All public areas of participation, including but not limited to discussion forums, mailing lists, issue trackers, social media, and in-person venues such as conferences and workshops.
    • All private areas of participation, including but not limited to email and closed platforms such as Slack or Matrix.
    • Any project that Frictionless Data leads on or partners in.

    # What we expect

    The following behaviours are expected from all project participants, including Frictionless Data core team members, project partners, and all other participants.

    • Lead by example by being considerate in your actions and decisions.
    • Be respectful in speech and action, especially in disagreement.
    • Refrain from demeaning, discriminatory, or harassing behaviour and speech.
    • We all make mistakes, and when we do, we take responsibility for them.
    • Be mindful of your fellow participants. If someone is in distress, or if someone is in violation of these guidelines, reach out.

    # What we find unacceptable

    We do not tolerate harassment of participants at our events in any form. Harassment includes offensive verbal comments, deliberate intimidation, harassing photography or recording, inappropriate physical contact and unwanted sexual attention. Anything that makes someone feel uncomfortable could be deemed harassment. For more information and examples about what constitutes harassment, please refer to OpenCon’s Code of Conduct in Brief and the Gathering for Open Source Hardware’s examples of behaviour.

    This non-exhaustive list shows examples of behaviours that are unacceptable from all participants:

    • Violence and threats of violence.
    • Derogatory comments of any form, including related to gender and expression, sexual orientation, disability, mental illness, neuro(a)typicality, physical appearance, body size, race, religion, age, or socio-economic status.
    • Sexual images or behaviour.
    • Posting or threatening to post other people’s personally identifying information (“doxing”).
    • Deliberate misgendering or use of former names, or improper titles.
    • Inappropriate photography or recording.
    • Physical contact without affirmative consent.
    • Unwelcome sexual attention. This includes sexualised comments or jokes, inappropriate touching, groping, and unwelcome sexual advances.
    • Deliberate intimidation, stalking or following (online or in person).
    • Sustained disruption of conference events, including talks and presentations.
    • Advocating for, or encouraging, any of the above behaviour.

    # Consequences of unacceptable behaviour

    Unacceptable behaviour from any participant, including those with decision-making authority, in any public or private forum around projects we are involved in will not be tolerated.

    Anyone asked to stop unacceptable behaviour is expected to comply immediately.

    If a participant engages in unacceptable behaviour, any action deemed appropriate will be taken, up to and including a temporary ban, permanent expulsion from participatory forums, or reporting to local law enforcement for criminal offences.

    # Reporting

    If you are subject to, or witness, unacceptable behaviour, or have any other concerns, please email frictionlessdata@okfn.org. We will handle all reports with discretion, and you can report anonymously if you wish using this form.

    In your report, please do your best to include:

    • Your contact information (unless you wish to report anonymously)

    • Identifying information (e.g. names, nicknames, pseudonyms) of the participant who has violated the Code of Conduct
    • The behaviour that was in violation
    • The approximate time of the behaviour
    • If possible, where the Code of Conduct violation happened
    • The circumstances surrounding the incident
    • Other people involved in the incident
    • If you believe the incident is ongoing, please let us know
    • If there is a publicly available record (e.g. mailing list record), please include a link
    • Any additional helpful information

    We will fully investigate any reports, follow up with the reporter (unless the report is anonymous), and work with the reporter to decide what action to take. If the complaint is about someone on the response team, that person will recuse themselves from handling the response.

    # Confidentiality

    All reports will be kept confidential. When we discuss incidents with people who are reported, we will anonymize details as much as we can to protect reporter privacy. In some cases we may determine that a public statement will need to be made. If that’s the case, the identities of all victims and reporters will remain confidential unless those individuals instruct us otherwise.

    # License and attribution

    This Code of Conduct is distributed under a Creative Commons Attribution-ShareAlike license. It draws heavily on the Open Knowledge Foundation Code of Conduct, which is based on the Mozilla Code of Conduct, the School of Data Code of Conduct, and the csv,conf Code of Conduct.


    - + diff --git a/work-with-us/contribute/index.html b/work-with-us/contribute/index.html index 87b9855b4..2086bcad0 100644 --- a/work-with-us/contribute/index.html +++ b/work-with-us/contribute/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

    # Contribute

    # Introduction

    We welcome contributions – and you don’t have to be a software developer to get involved! The first step to becoming a Frictionless Data contributor is to become a Frictionless Data user. Please read the following guidelines, and feel free to reach out to us if you have any questions. Thanks for your interest in helping make Frictionless awesome!

    # General Guidelines

    # Reporting a bug or issue:

    We use GitHub as a code and issues hosting platform. To report a bug or propose a new feature, please open an issue. For issues with a specific code repository, please open an issue in that specific repository’s tracker on GitHub. For example: https://github.com/frictionlessdata/frictionless-py/issues

    # Give us feedback/suggestions/propose a new idea:

    What if the issue is not a bug but a question? Please head to the discussion forum. This is an excellent place to give us thorough feedback about your experience as a whole. In the same way, you may participate in existing discussions and make your voice heard.

    # Pull requests:

    For pull requests, we ask that you first create an issue and then create a pull request linked to that issue. Look for issues with “help wanted” or “first-time contributor.” We welcome pull requests from anyone!

    # Specific guidelines:

    Each individual software project has more specific contribution guidelines that you can find in the README in the project’s repository. For example: https://github.com/frictionlessdata/frictionless-js#developers

    # Documentation

    Are you seeking to advocate and educate people in the data space? We always welcome contributions to our documentation! You can help improve our documentation by opening pull requests if you find typos, have ideas to improve the clarity of the document, or want to translate the text to a non-English language. You can also write tutorials (like this one: Frictionless Describe and Extract Tutorial). Let us know if you would like to contribute or if you are interested but need some help!

    # Share your work with us!

    Are you using Frictionless with your data? Have you spoken at a conference about using Frictionless? We would love to hear about it! We also have opportunities for blog writing and presenting at our monthly community calls - contact us to learn more!


    - + diff --git a/work-with-us/events/index.html b/work-with-us/events/index.html index e0822d2d3..d57a61adc 100644 --- a/work-with-us/events/index.html +++ b/work-with-us/events/index.html @@ -30,7 +30,7 @@ - + @@ -103,6 +103,6 @@ (opens new window)

    # Events Calendar

    # Introduction

    The Frictionless Data calendar lists our upcoming events, including webinars and virtual hangouts.

    # Frictionless Data Monthly Community Call

    Join the vibrant Frictionless Data community every last Thursday of the month on a call to hear about recent project developments! You can sign up here: https://forms.gle/rtK7xZw5vrwouTE98

    # Calendar

    TIP

    You can add any upcoming event to your calendar by clicking on a specific event and selecting copy to my calendar.


    - + diff --git a/work-with-us/get-help/index.html b/work-with-us/get-help/index.html index 03c748514..49903ca95 100644 --- a/work-with-us/get-help/index.html +++ b/work-with-us/get-help/index.html @@ -30,7 +30,7 @@ - + @@ -104,6 +104,6 @@ Blog

    # Need Help?

    We're happy to provide support! Please reach out to us by using one of the following methods:

    # Community Support

    You can ask any questions in our Slack Community Chat room (the Chat room is also accessible via Matrix). You can also start a thread in GitHub Discussions. Frictionless is a big community of people with expertise in different domains. Feel free to ask us any questions!

    # School of Data

    School of Data is a project overseen by the Open Knowledge Foundation consisting of a network of individuals and organizations working on empowering civil society organizations, journalists and citizens with the skills they need to use data effectively. School of Data provides data literacy trainings and resources for learning how to work with data.

    School of Data

    Professional, timely support is available on a paid basis from the creators of Frictionless Data at Datopian and Open Knowledge Foundation. Please get in touch via:

    Datopian
    Open Knowledge Foundation: frictionlessdata@okfn.org


    - +