Infer packaging from related products in the category #55

Open
Tracked by #121
CloCkWeRX opened this issue Jul 29, 2020 · 8 comments
Comments

CloCkWeRX commented Jul 29, 2020

What

  • There are a number of categories where a product's physical properties require a common style of packaging. A Robotoff task is perhaps the best fit, since it allows human confirmation.

Examples

  • Milk comes in a bottle or a carton, made of plastic, glass or cardboard.
  • Eggs are frequently in a cardboard or plastic carton.
  • Pasta sauce is typically in a glass jar or bottle.

A lot of this can probably be guessed from looking at the most common packaging for a given (specific) category.

We'd also want a way to exclude some categories where the category is really broad, like "Fruits".
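A minimal sketch of that idea, assuming each product is a dict carrying `categories_tags` and `packaging_tags` lists. The helper names, the `BROAD_CATEGORIES` exclusion set and the sample tags are illustrative assumptions, not an existing Robotoff or openfoodfacts-server API.

```python
from collections import Counter, defaultdict

# Hypothetical exclusion list: categories too broad to imply a packaging.
BROAD_CATEGORIES = {"en:fruits"}


def packaging_by_category(products, excluded=BROAD_CATEGORIES):
    """Count packaging tags per category, skipping excluded categories."""
    counts = defaultdict(Counter)
    for product in products:
        for category in product.get("categories_tags", []):
            if category in excluded:
                continue
            counts[category].update(product.get("packaging_tags", []))
    return counts


def guess_packaging(counts, category):
    """Return the most frequent packaging tag for a category, or None."""
    ranked = counts.get(category)
    if not ranked:
        return None
    return ranked.most_common(1)[0][0]


# Made-up sample data, for illustration only.
sample = [
    {"categories_tags": ["en:eggs"], "packaging_tags": ["en:cardboard"]},
    {"categories_tags": ["en:eggs"], "packaging_tags": ["en:cardboard"]},
    {"categories_tags": ["en:eggs"], "packaging_tags": ["en:plastic"]},
    {"categories_tags": ["en:fruits"], "packaging_tags": ["en:plastic"]},
]
counts = packaging_by_category(sample)
print(guess_packaging(counts, "en:eggs"))    # en:cardboard
print(guess_packaging(counts, "en:fruits"))  # None (excluded as too broad)
```

A guess like this would still need human confirmation before being applied to a product.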

Part of

@teolemon
Member

@CloCkWeRX this is a topic I'd like to get moving ASAP. Can you create a list of categories with strong correlations?
Possibly also, for each category, a list of packagings ordered by frequency, broken down by country (I think packaging sometimes changes from country to country).
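One way such a per-country ranking could be computed is sketched below. It assumes each product dict carries `countries_tags`, `categories_tags` and `packaging_tags` lists; the function name and the sample data are made up for illustration.

```python
from collections import Counter, defaultdict


def ranked_packaging(products):
    """Map (country, category) to packaging tags ordered by frequency."""
    counts = defaultdict(Counter)
    for product in products:
        tags = product.get("packaging_tags", [])
        for country in product.get("countries_tags", []):
            for category in product.get("categories_tags", []):
                counts[(country, category)].update(tags)
    # most_common() returns (tag, count) pairs, most frequent first.
    return {key: counter.most_common() for key, counter in counts.items()}


# Made-up sample data, for illustration only.
products = [
    {"countries_tags": ["en:france"], "categories_tags": ["en:milks"],
     "packaging_tags": ["en:carton"]},
    {"countries_tags": ["en:france"], "categories_tags": ["en:milks"],
     "packaging_tags": ["en:carton"]},
    {"countries_tags": ["en:france"], "categories_tags": ["en:milks"],
     "packaging_tags": ["en:bottle"]},
    {"countries_tags": ["en:australia"], "categories_tags": ["en:milks"],
     "packaging_tags": ["en:bottle"]},
]
ranked = ranked_packaging(products)
print(ranked[("en:france", "en:milks")])
```

A "strong correlation" could then be defined as the top tag covering some minimum share of a category's products; that threshold is left open here.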

@teolemon
Member

en:eggs
associated_packagings:en en:Cardboard to recycle

Or, perhaps easier, a collaborative spreadsheet that we can iterate on?

@teolemon
Member

@CloCkWeRX
Author

CloCkWeRX commented Jul 30, 2020

Pipeline I'm setting up for this:

For each category:
 data = fetch https://world.openfoodfacts.org/state/packaging-code-completed/category/(categoryname).json
 packaging = data.jsonpath('$.products..packaging_tags.*') # Produces something like below

 packaging_counts = Hash.new(0) # default of 0, so += 1 works for tags not seen yet
 packaging.each {|pkg| packaging_counts[pkg] += 1 }

image

which is a bit easier than doing it all manually.
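For comparison, here is a rough Python 3 sketch of the same pipeline. It assumes the response is a JSON object with a `products` list whose items may carry a `packaging_tags` list, which is what the JSONPath expression above implies. The function names are hypothetical, and `fetch_category` is defined but not called, so the example runs offline on a made-up response.

```python
import json
from collections import Counter
from urllib.request import urlopen

# URL pattern taken from the pipeline above; {} is the category name.
BASE_URL = (
    "https://world.openfoodfacts.org/state/"
    "packaging-code-completed/category/{}.json"
)


def fetch_category(category):
    """Download and parse the first page of results for one category."""
    with urlopen(BASE_URL.format(category)) as response:
        return json.load(response)


def count_packaging(data):
    """Count every packaging tag across the products in a parsed response."""
    return Counter(
        tag
        for product in data.get("products", [])
        for tag in product.get("packaging_tags", [])
    )


# Made-up response, so no network access is needed to try the counting step.
sample_response = {
    "products": [
        {"packaging_tags": ["en:glass", "en:jar"]},
        {"packaging_tags": ["en:glass"]},
        {"code": "0000"},  # a product without packaging tags is skipped
    ]
}
packaging_counts = count_packaging(sample_response)
print(packaging_counts.most_common())
```

`Counter` plays the role of the counting hash, so missing keys need no special handling.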

Which is best: Ruby, Python, Perl or Node, assuming this would run periodically?

@CloCkWeRX
Author

https://github.com/CloCkWeRX/openfoodfacts-packaging-from-category - Python 2.7 version, a simple command-line tool that outputs JSON. It only reads the first page of results right now.

image

@CloCkWeRX
Author

CloCkWeRX commented Jul 30, 2020

Okay, a few more tweaks: it now handles pagination (stops after 3 pages), world or country, etc.

Now just need to generate the most popular categories and generate a bunch of static JSON :)

image

May need to let you specify "skip anything < X occurrences" or "maximum products" as arguments.
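A sketch of how those limits might fit together, not the code in the repository. `fetch_page` is a hypothetical callable that returns the parsed response for a page number, and an empty `products` list is assumed to mean there are no more pages.

```python
from collections import Counter


def collect_counts(fetch_page, max_pages=3, max_products=None, min_count=1):
    """Tally packaging tags page by page, then drop rare tags."""
    counts = Counter()
    seen = 0
    for page in range(1, max_pages + 1):
        products = fetch_page(page).get("products", [])
        if not products:
            break  # assumed end of results
        for product in products:
            if max_products is not None and seen >= max_products:
                break
            counts.update(product.get("packaging_tags", []))
            seen += 1
        if max_products is not None and seen >= max_products:
            break
    # "skip anything < X occurrences"
    return {tag: n for tag, n in counts.items() if n >= min_count}


# Made-up pages standing in for real responses.
pages = {
    1: {"products": [{"packaging_tags": ["en:glass"]},
                     {"packaging_tags": ["en:glass", "en:jar"]}]},
    2: {"products": [{"packaging_tags": ["en:plastic"]}]},
}


def fake_fetch(page):
    return pages.get(page, {"products": []})


print(collect_counts(fake_fetch))
print(collect_counts(fake_fetch, min_count=2))
```

Passing the fetcher in as an argument keeps the counting logic testable without network access.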

@CloCkWeRX
Author

@raphael0202 - Python isn't my default language. If you have some time, would you mind looking over https://github.com/CloCkWeRX/openfoodfacts-packaging-from-category and suggesting better or more idiomatic ways to structure this?

In my head, this is something of a static JSON generator: push all the results to git, which makes them easy to expose via a static web server for openfoodfacts-server to query from the browser, or to curl into Elasticsearch. It could probably also publish into an SQLite DB or another datastore (more like Robotoff).
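The static-generator idea could look roughly like this. The one-file-per-category layout and the file naming (colon replaced by an underscore) are assumptions made for the example, not an agreed format.

```python
import json
import tempfile
from pathlib import Path


def write_static_json(results, out_dir):
    """Write one JSON file per category and return the paths written.

    `results` maps a category tag to its packaging counts.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for category, counts in sorted(results.items()):
        # Hypothetical naming: "en:eggs" becomes "en_eggs.json".
        target = out_path / (category.replace(":", "_") + ".json")
        target.write_text(
            json.dumps(counts, indent=2, sort_keys=True), encoding="utf-8"
        )
        written.append(target)
    return written


# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_static_json({"en:eggs": {"en:cardboard": 2}}, tmp)
    names = [p.name for p in paths]
    loaded = json.loads(paths[0].read_text(encoding="utf-8"))

print(names)
print(loaded)
```

The resulting directory could be committed to git and served as-is by any static web server.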

@teolemon
Member

@stephanegigandet
