Implement structure to save Inspection digitalization efficiency #147

Open
2 tasks done
Tracked by #120
Francois-Werbrouck opened this issue Sep 5, 2024 · 3 comments · May be fixed by #160
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@Francois-Werbrouck
Contributor

Francois-Werbrouck commented Sep 5, 2024

Context

With Fertiscan being an AI-powered solution, we need to quantify the efficiency of the models. To do so, it is necessary to numerically compare the original_dataset with the user-verified data. We will evaluate each inspection with multiple Levenshtein distance scores.

TODO

  • Create a column in inspection_factual to save the efficiency
  • Create a trigger to calculate the efficiency and save the data (see the SQL sketch after the diagram below)

Doc

```mermaid
erDiagram
    inspection_factual {
        uuid inspection_id PK
        uuid inspector_id
        uuid label_info_id
        uuid time_id FK
        uuid sample_id
        uuid company_id
        uuid manufacturer_id
        uuid picture_set_id
        timestamp inspection_date
        json original_dataset
        uuid verification_id
    }

    verification_dimension {
        uuid id PK
        int score
        int label_info_lev_total
        int label_name_lev
        int label_reg_num_lev
        int label_lot_num_lev
        int metrics_lists_modif
        int metrics_lev
        int manufacturer_field_edited
        int manufacturer_lev_total
        int company_field_edited
        int company_lev_total
        int instructions_en_lists_modif
        int instructions_fr_lists_modif
        int instructions_en_lev
        int instructions_fr_lev
        int cautions_en_lists_modif
        int cautions_fr_lists_modif
        int cautions_en_lev
        int cautions_fr_lev
        int guaranteeds_en_lists_modif
        int guaranteeds_fr_lists_modif
        int guaranteeds_en_lev
        int guaranteeds_fr_lev
    }

    inspection_factual ||--|| verification_dimension : "evaluate"
```
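As a minimal sketch of what the TODO could look like in SQL: it assumes the fuzzystrmatch extension for levenshtein(), and the JSON paths inside original_dataset, the abbreviated column list, and the function/trigger names are placeholders rather than the final design.

```sql
-- Minimal sketch only: abbreviated column list, placeholder JSON paths and names.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;  -- provides levenshtein()

CREATE TABLE IF NOT EXISTS verification_dimension (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),  -- gen_random_uuid() needs PostgreSQL 13+ (or pgcrypto)
    score int,
    label_name_lev int
    -- ... remaining *_lev / *_lists_modif columns from the diagram above
);

CREATE OR REPLACE FUNCTION compute_verification_scores() RETURNS trigger AS $$
DECLARE
    v_id uuid := gen_random_uuid();
BEGIN
    -- Compare the original digitalized value with the user-verified value.
    -- Both JSON paths are illustrative placeholders, not the real structure.
    INSERT INTO verification_dimension (id, label_name_lev)
    VALUES (
        v_id,
        levenshtein(
            COALESCE(NEW.original_dataset -> 'original' ->> 'name', ''),
            COALESCE(NEW.original_dataset -> 'verified' ->> 'name', '')
        )
    );
    NEW.verification_id := v_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER verification_scores_trg
    BEFORE INSERT OR UPDATE ON inspection_factual
    FOR EACH ROW EXECUTE FUNCTION compute_verification_scores();
```

Note that the 255-character limitation discussed below would still apply to levenshtein() in this sketch.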
@Francois-Werbrouck
Contributor Author

Current known 'issues' that still need to be resolved:

  • Text over 255 characters can't be compared and throws errors
  • We should find a way to make the edited boolean usable
  • Score calculation is still not implemented

Francois-Werbrouck added a commit that referenced this issue Oct 2, 2024
@Francois-Werbrouck
Contributor Author

Francois-Werbrouck commented Oct 4, 2024

> Text over 255 characters can't be compared and throws errors

An avenue was found here if we don't want to partition our data. I'm also facing role/permission issues; I've opened a ticket with the Database Server Admins.
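One possible stopgap (not necessarily the avenue linked above, and lossy by design) is a small wrapper that truncates both inputs, since fuzzystrmatch's levenshtein() rejects arguments longer than 255 characters:

```sql
-- Hypothetical helper: levenshtein() from fuzzystrmatch errors out when either
-- input exceeds 255 characters, so truncate both sides first. This is a lossy
-- approximation; chunking the text or switching to pg_trgm are alternatives.
CREATE OR REPLACE FUNCTION safe_levenshtein(a text, b text) RETURNS int AS $$
    SELECT levenshtein(left(COALESCE(a, ''), 255), left(COALESCE(b, ''), 255));
$$ LANGUAGE sql IMMUTABLE;
```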

> We should find a way to make the edited boolean usable

I've started experimenting with pg_trgm. I still need to find a relevant threshold and figure out how to handle new additions to the arrays.
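A rough sketch of the pg_trgm idea; the cutoff is an arbitrary starting point for experimentation, not a validated threshold:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical "edited" check: consider a field unchanged when the trigram
-- similarity between the original and verified text stays above a threshold.
-- The 0.9 cutoff is a placeholder to experiment with, not a validated value.
SELECT similarity('Guaranteed minimum analysis', 'Guaranteed minimal analysis') >= 0.9 AS unchanged;
```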

@ChromaticPanic

ChromaticPanic commented Oct 31, 2024

As a data analyst, I'm not sure of the utility of storing this data. I think this might be an unnecessary increase in database complexity. There are multiple ways to look at data, and if we encode this in the database it becomes too cumbersome to try out different evaluation metrics. A lot of the algorithms are already implemented in Pandas or R. It's much easier to pull the data and run the needed analytics in Jupyter notebooks: it would be much faster to iterate on algorithm changes, and much easier to build dashboards to look at trends.

So this issue specifically is just about Levenshtein distances. I think this is something easy enough to calculate when we want to see the information; run a Jupyter notebook once a week if we need to look at trends. We also do not want to unnecessarily increase our storage footprint: if recent metrics are what matter (current model performance), then storing all the extra old data is just wasted space.
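For illustration, the on-demand computation could be as simple as a query run from a notebook; the JSON paths below are placeholders and would need to match the actual structure of original_dataset and the verified data:

```sql
-- Hypothetical ad-hoc query: compute the distance at read time instead of
-- persisting it. The JSON paths stand in for original vs. verified values.
SELECT
    inspection_id,
    levenshtein(
        left(COALESCE(original_dataset -> 'original' ->> 'name', ''), 255),
        left(COALESCE(original_dataset -> 'verified' ->> 'name', ''), 255)
    ) AS label_name_lev
FROM inspection_factual
WHERE inspection_date >= now() - interval '7 days';
```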

There are other metrics we could evaluate. For example, we could have a metric that detects whether fields are being swapped.

So one trade-off here is storage vs. runtime compute. This calculation is cheap enough, even on bulk data, that it doesn't make sense to pre-calculate it and store it in the database.

The other trade-off is flexibility, in terms of having multiple metrics and making changes to them. What happens when we change the schema? Then all the previously precalculated metrics become incomparable to the new ones.

Scaling is another trade-off. Compute is easy to scale vertically and horizontally, while our DB instances can scale vertically but are not set up for horizontal scaling and are much harder to scale that way.
